

  • In a previous video, we looked at how CPUs can use caches to speed up accesses to memory.

  • So, the CPU has to fetch things from memory; it might be a bit of data, it might be an instruction

  • And it goes through the cache to try and access it.

  • And the cache keeps a local copy in fast memory to try and speed up the accesses

  • But what we didn't talk about is: What does a CPU do with what it's fetched from memory

  • what is it actually doing and how does it process it?

  • So the CPU is fetching values from memory.

  • We'll ignore the cache for now, because it doesn't matter if the CPU has a cache or not

  • it's still gonna do roughly the same things

  • And we're also gonna look at very old CPUs

  • the sort of things that are in 8-bit machines

  • purely because they're simpler to deal with

  • and simpler to see what's going on

  • The same ideas still apply to an ARM CPU today, or an x86 chip

  • or whatever it is you got in your machine.

  • Modern CPUs use what's called the von Neumann architecture

  • and what this basically means is that you have a CPU

  • and you have a block of memory.

  • And that memory is connected to the CPU by two buses

  • Each is just a collection of several wires connecting the two

  • And again we're looking at old-fashioned machines. On a modern machine it gets a bit more complicated

  • But the idea, the principle, is the same.

  • So we have an address bus

  • and the idea is that the CPU can generate a number in here in binary

  • to access any particular value in here.

  • So we say that the first one is at address 0

  • and we're gonna use a 6502 as an example

  • We'll say that the last one is at address 65535 in decimal, or FFFF in hexadecimal

  • So we can generate any of these numbers on 16 bits of this address bus

  • to access any of the individual bytes in this memory
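To make that arithmetic concrete, here's a quick sketch (in Python, which is my choice of illustration rather than anything from the video) of what 16 address lines buy you:

```python
# A 16-bit address bus is 16 wires, each carrying a 0 or 1,
# so it can express 2**16 distinct addresses.
ADDRESS_LINES = 16
num_addresses = 2 ** ADDRESS_LINES

first_address = 0x0000             # the first byte of memory
last_address = num_addresses - 1   # the last byte

print(num_addresses)      # 65536 addressable bytes
print(last_address)       # 65535 in decimal
print(hex(last_address))  # 0xffff in hexadecimal
```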

  • How do we get the data between the two? Well we have another bus

  • which is called the data bus, which connects the two together

  • Now the reason why this is a von Neumann machine

  • is because this memory can contain both the program

  • i.e. the bytes that make up the instructions that the CPU can execute

  • and the data

  • So the same block of memory contains some bytes

  • which contain program instructions

  • some bytes which contain data

  • And the CPU, if you wanted it to, could treat the program as data

  • or treat the data as program

  • Although if you did that, it would probably crash
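That point can be illustrated with a small sketch (again Python, my illustration): the bytes in memory are just numbers, and whether they are program or data depends entirely on how the CPU reaches them.

```python
# In a von Neumann machine nothing in memory marks a byte as
# "program" or "data" -- it's all just numbers.
memory = [0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60]  # a six-byte 6502 program

# Byte 1 here is the operand of an instruction -- treated as data,
# it's the ASCII code for a capital C:
print(chr(memory[1]))   # prints: C

# But if the program counter ever pointed at that byte, the CPU
# would try to execute 0x43 as an opcode instead -- likely crashing.
```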

  • So what we've got here is an old BBC Micro using a 6502 CPU

  • and we're gonna just write a very, very simple machine code program

  • that, well, just prints out the letter C, for Computerphile

  • So if you assemble it, we're using hexadecimal

  • we've started our program at 084C

  • So that's the address where our program is being created

  • And our program is very simple

  • It loads one of the CPU's registers

  • which is just basically a temporary data store that you can use

  • and this one is called the accumulator

  • with the ASCII code 67, which represents a capital C

  • and then it says: jump to the subroutine at this address

  • which will print out that particular character

  • And then we tell it we want to stop so we gotta return

  • from subroutine. And if we run this

  • and type in the address, so we're at ... 84C

  • then you'll see that it prints out the letter C

  • and then we get a prompt to carry on doing things

  • So our program, we write it in assembly language

  • which we can understand as humans

  • -ish. LDA: Load Accumulator; JSR: Jump to Subroutine

  • RTS: Return from Subroutine

  • You get the idea once you've done it a few times

  • And the computer converts this into a series of numbers, in binary

  • The CPU is working in binary but to make it easier to read we display it as hexadecimal

  • So our program becomes: A9, 43

  • 20 EE FF 60

  • That's the program we've written
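Those six bytes can be written down directly (a hand-assembled sketch in Python; the opcode values A9, 20, and 60 are the genuine 6502 ones from the program above):

```python
# Hand-assembled version of the program:
#   LDA #$43   -> A9 43     (load accumulator with 0x43, ASCII 'C')
#   JSR $FFEE  -> 20 EE FF  (the 6502 stores addresses low byte first)
#   RTS        -> 60        (return from subroutine)
program = bytes([0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60])

print(program.hex(' ').upper())  # prints: A9 43 20 EE FF 60
print(chr(program[1]))           # the data byte is ASCII 'C'
```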

  • And the CPU, when it runs it, needs to fetch those bytes from memory

  • into the CPU

  • Now, how does it do that?

  • To get the first byte we need to put the address: 084C on the address bus

  • and a bit later on, the memory will send back the byte that represents the instruction: A9

  • Now, how does the CPU know where to get these instructions from?

  • Well, it's quite simple. Inside the CPU

  • there is a register, which we call the program counter, or PC on a 6502

  • or, on something like an x86 machine, it's known as the instruction pointer.

  • And all that does is store the address of the next instruction to execute

  • So when we were starting up here, it would have 084C in it

  • That's the address of the instruction we want to execute

  • So when the CPU wants to fetch the instruction it's gonna execute

  • It puts that address on the address bus

  • and the memory then sends the instruction back to the CPU

  • So the first thing the CPU is gonna do to run our program

  • is to fetch the instruction

  • and the way it does that is by putting the address from

  • the program counter onto the address bus

  • and then fetching the actual instruction

  • So the memory provides it, but the CPU then reads that in

  • on its input on the data bus

  • Now it needs to fetch the whole instruction that the CPU is gonna execute

  • and on the example we saw there it was relatively straightforward

  • because the instruction was only a byte long

  • Not all CPUs are that simple

  • Some CPUs vary these things, so this hardware can actually be quite complicated

  • so it needs to work out how long the instruction is

  • So it could be as short as one byte

  • it could be as long on some CPUs as 15 bytes

  • and you sometimes don't know how long it's gonna be until you've read a few of the bytes

  • So this hardware can range from relatively trivial to quite complex

  • So an ARM CPU makes it very, very simple: it says all instructions are 32 bits long

  • So the Archimedes over there can fetch the instruction very, very simply

  • 32 bits

  • On something like an x86, it can be any length up to 15 bytes or so

  • and so this becomes more complicated, you have to sort of work out

  • what it is until you've got it
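On the 6502 itself, the first byte (the opcode) is enough to tell you the total instruction length, so the "how long is it?" logic can be a table lookup — a minimal sketch covering just the three opcodes from our program:

```python
# On the 6502 the opcode alone determines the instruction length,
# so decode can be a simple table lookup (three opcodes shown).
LENGTH = {
    0xA9: 2,  # LDA immediate: opcode + 1 operand byte
    0x20: 3,  # JSR absolute: opcode + 2 address bytes
    0x60: 1,  # RTS: opcode only
}

program = [0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60]
pc = 0
while pc < len(program):
    opcode = program[pc]
    print(hex(opcode), "is", LENGTH[opcode], "bytes long")
    pc += LENGTH[opcode]  # step over the whole instruction
```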

  • But we fetch the instruction

  • So in the example we've got, we've got A9 here

  • So we now need to work out what A9 does

  • Well, we need to decode it into what we want the CPU to actually do

  • So we need to have another bit of our CPU's hardware

  • which we're dedicating to decoding the instruction

  • So we have a part of the CPU which is fetching it

  • and part of the CPU which is then decoding it

  • So the A9 comes into the decode logic

  • And it says: Well okay, that's a load instruction.

  • So I need to fetch a value from memory

  • which was the 43

  • the ASCII code for the capital letter C that we saw earlier

  • So we need to fetch something else from memory

  • We need to access memory again, and we need to work out what address

  • that's gonna be.

  • We also then need to, once we've got that value,

  • update the right register to store that value

  • So we've gotta do things in sequence.

  • So part of the Decode logic is to take the single instruction byte,

  • or however many bytes it is,

  • and work out what's the sequence that we need to drive the other bits of the CPU to do

  • And so that also means that we have another bit of the CPU

  • which is the actual bit that does things,

  • which is gonna be all the logic which actually executes instructions

  • So we start off by fetching it

  • and then once we've fetched it we can start decoding it

  • and then we can execute it

  • And the decode logic is responsible for saying:

  • Put out the address of where you want to load the value from

  • and then store it, once it's been loaded into the CPU

  • So you're doing things in order:

  • We have to fetch it first

  • and we can't decode it until we've fetched it

  • and we can't execute things until we've decoded it
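The fetch, decode, execute sequence above can be sketched as a toy interpreter for just our three-instruction program (a sketch, not a real 6502 emulator — the OS print routine at FFEE is faked in Python):

```python
def run(memory, pc):
    """Toy fetch-decode-execute loop for three 6502 opcodes."""
    accumulator = 0
    output = []
    while True:
        opcode = memory[pc]          # FETCH: read the byte the PC points at
        pc += 1
        if opcode == 0xA9:           # DECODE: it's LDA immediate...
            accumulator = memory[pc] # EXECUTE: fetch operand into the register
            pc += 1
        elif opcode == 0x20:         # JSR absolute (address low byte first)
            target = memory[pc] | (memory[pc + 1] << 8)
            pc += 2
            if target == 0xFFEE:     # fake the OS "print character" routine
                output.append(chr(accumulator))
        elif opcode == 0x60:         # RTS: treat as "stop" in this toy
            return ''.join(output)

# Load the program at address 084C, as in the video.
memory = {0x084C + i: b
          for i, b in enumerate([0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60])}
print(run(memory, 0x084C))  # prints: C
```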

  • So, at any one time, we'll probably find on a simple CPU

  • that quite a few of the bits of the CPU wouldn't actually be doing anything

  • So, while we're fetching the value from memory

  • to work out how we're gonna decode it

  • the decode and the execute logic aren't doing anything

  • They're just sitting there, waiting for their turn

  • And then, when we decode it, it's not fetching anything

  • and it's not executing anything

  • So we're sort of moving through these different states one after the other

  • And that takes different amounts of time

  • If we're fetching 15 bytes it's gonna take longer than if we're fetching one

  • decoding it might well be shorter

  • than if we're fetching something from memory, cos' this is all inside the CPU

  • And the execution depends on what's actually happening

  • So your CPU will work like this: It will go through each phase,

  • then once it's done that, it'll start on the next clock tick

  • All of the CPU is synchronized to a clock,

  • which just keeps things moving in sequence

  • and you can build a CPU that way. Something like the 6502 worked like that

  • But, as we said, lots of the bits of the CPU aren't actually doing anything at any one time

  • which is a bit wasteful of the resources

  • So is there another way you can do this?

  • And the answer is yes! You can do what's called

  • a sort of pipelined model of a CPU

  • So what you do here is, you still have the same 3 bits of the CPU

  • But you say: Okay, so we gotta fetch (f)

  • instruction one

  • In the next bit of time, I'm gonna start decoding this one

  • So, I'm gonna start decoding instruction one

  • But I'm gonna say: I'm not using the fetch logic here,

  • so I'm gonna have this start to get things ready

  • and, start to do things ahead of schedule

  • I'm also at the same time gonna fetch instruction 2

  • So now I'm doing two things: two bits of my CPU are in use at the same time

  • I'm fetching the next instruction, while decoding the first one

  • And once we've done decoding, I can start executing the first instruction

  • So I execute that

  • But at the same time, I can start decoding instruction 2

  • and hopefully, I can start fetching instruction 3

  • So what? It is still taking the same amount of time to execute that first instruction

  • So the beauty is when it comes to executing instruction two

  • it completes exactly one cycle after the first one

  • rather than having to wait for it to go through the fetch and decode and execute cycles

  • we can just execute it as soon as we've finished instruction one

  • So each instruction still takes the same amount of time

  • it's gonna take, say, three clock cycles to go through the CPU

  • but because we've sort of pipelined it together

  • they actually appear to execute one after each other

  • so it appears to execute one clock cycle after each other
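That timing claim is easy to check with a little arithmetic, under the simplifying assumption (mine, for illustration) that every stage takes exactly one clock cycle and nothing stalls:

```python
def cycles_unpipelined(n, stages=3):
    # Each instruction goes through fetch, decode, execute
    # before the next one even starts.
    return n * stages

def cycles_pipelined(n, stages=3):
    # The first instruction takes `stages` cycles to come out,
    # then one instruction completes every cycle after that.
    return stages + (n - 1)

print(cycles_unpipelined(4))  # 12 cycles without pipelining
print(cycles_pipelined(4))    # 6 cycles: results appear one cycle apart
```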

  • And we could do this again So we could start decoding

  • instruction 3 here

  • at the same time as we're executing instruction two

  • Now there can be problems

  • This works for some instructions, but say this instruction

  • said "store this value in memory"

  • Now you've got a problem

  • You've only got one address bus and one data bus

  • so you can only access or store one thing in memory at a time

  • You can't execute a store instruction and fetch the next instruction from memory at the same time

  • So you wouldn't be able to fetch it until the next clock cycle

  • So we fetch instruction four there

  • while executing instruction three

  • But we can't decode anything here

  • So in this clock cycle, we can decode instruction four

  • and fetch instruction five

  • but we can't execute anything

  • We've got what's called a "bubble" in our pipelines,

  • or a pipeline stall

  • because at this point, the design of the CPU doesn't let us

  • fetch an instruction

  • and execute an instruction at the same time

  • It's one of what are called "pipeline hazards"

  • that you can get when designing a pipeline CPU

  • because the design of the CPU doesn't let you

  • do the things you need to do

  • at the same time. So you have to

  • delay things, which means that you get a bubble

  • So, you can't quite get up to one instruction per cycle

  • efficiency

  • But you can certainly get closer

  • than you could if you just had everything

  • to do one instruction at a time.



Inside the CPU - Computerphile

  • Published by dearjane on January 14, 2021