In a previous video, we looked at how CPUs can use caches to speed up accesses to memory. The CPU has to fetch things from memory; it might be a bit of data, it might be an instruction. It goes through the cache to try and access it, and the cache keeps a local copy in fast memory to speed up those accesses. But what we didn't talk about is: what does the CPU do with what it's fetched from memory? What is it actually doing, and how does it process it?

So the CPU is fetching values from memory. We'll ignore the cache for now, because whether the CPU has a cache or not, it's still gonna do roughly the same things. We're also gonna look at very old CPUs, the sort of thing you find in 8-bit machines, purely because they're simpler to deal with and it's simpler to see what's going on. The same ideas still apply to an ARM CPU today, or an x86 chip, or whatever it is you've got in your machine.

Modern CPUs use what's called the von Neumann architecture, and what this basically means is that you have a CPU and you have a block of memory, and that memory is connected to the CPU by two buses, each of which is just a collection of several wires connecting the two. Again, we're looking at old-fashioned machines; on a modern machine it gets a bit more complicated, but the principle is the same.

So we have an address bus, and the idea is that the CPU can generate a number on it, in binary, to access any particular value in memory. We say that the first byte is at address 0, and, since we're gonna use a 6502 as an example, the last one is at address 65535 in decimal, or FFFF in hexadecimal. So we can generate any of these numbers on the 16 bits of the address bus to access any individual byte in this memory. How do we get the data between the two? Well, we have another bus, called the data bus, which connects the two together.

Now, the reason why this is a von Neumann machine is that this memory can contain both the program, i.e.
the bytes that make up the instructions that the CPU can execute, and the data. So the same block of memory contains some bytes which are program instructions and some bytes which are data, and the CPU, if you wanted it to, could treat the program as data, or treat the data as program. (If you do that, it would probably crash.)

So what we've got here is an old BBC Micro using a 6502 CPU, and we're gonna write a very, very simple machine code program whose whole operation is just to print out the letter C, for Computerphile. If you assemble it (we're using hexadecimal), we've started our program at 084C; that's the address where our program is being created. And our program is very simple. It loads one of the CPU's registers (which is basically a temporary data store that you can use; this one is called the accumulator) with the ASCII code 67, which represents a capital C. Then it says: jump to the subroutine at this address, which will print out that character. And then we tell it we want to stop, so we've got a return from subroutine. If we run this and type in the address, so we're at 84C, you'll see that it prints out the letter C, and then we get a prompt so we can carry on doing things.

So we write our program in assembly language, which we can understand as humans, ish: LDA, load accumulator; JSR, jump to subroutine; RTS, return from subroutine. You get the idea once you've done it a few times. And the computer converts this into a series of numbers, in binary. The CPU works in binary, but to make it easier to read we display it as hexadecimal. So our program becomes: A9 43 20 EE FF 60. That's the program we've written. And when the CPU runs it, it needs to fetch those bytes from memory into the CPU. Now, how does it do that?
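As a sketch (not a real assembler), the translation from those three mnemonics into the bytes A9 43 20 EE FF 60 can be written out in Python. The opcode values are the real 6502 ones from the example above, and &FFEE is the address of the print-character routine the program jumps to:

```python
# A minimal sketch, not a real 6502 assembler: just the mapping from the
# three mnemonics in the example to the bytes the CPU actually fetches.
OPCODES = {
    "LDA #": 0xA9,  # load the accumulator with an immediate value
    "JSR":   0x20,  # jump to the subroutine at a 16-bit address
    "RTS":   0x60,  # return from subroutine
}

def assemble():
    program = []
    program.append(OPCODES["LDA #"])
    program.append(0x43)             # ASCII code for 'C' (67 in decimal)
    program.append(OPCODES["JSR"])
    # 6502 addresses are stored low byte first, so the print-character
    # routine at &FFEE becomes the two bytes EE, FF.
    program.append(0xEE)
    program.append(0xFF)
    program.append(OPCODES["RTS"])
    return program

print(" ".join(f"{b:02X}" for b in assemble()))  # → A9 43 20 EE FF 60
```

The little-endian address bytes are why the listing shows "20 EE FF" rather than "20 FF EE".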
To get the first byte, we need to put the address 084C on the address bus, and a bit later on the memory will send back the byte that represents the instruction: A9. Now, how does the CPU know where to get these instructions from? Well, it's quite simple. Inside the CPU there is a register which, on a 6502, we call the program counter, or PC; on something like an x86 machine it's known as the instruction pointer. And all that does is store the address of the next instruction to execute. So when we were starting up here, it would have 084C in it: that's the address of the instruction we want to execute. When the CPU wants to fetch the instruction it's gonna execute, it puts that address on the address bus, and the memory then sends the instruction back to the CPU. So the first thing the CPU does to run our program is fetch the instruction, and the way it does that is by putting the address from the program counter onto the address bus; the memory provides the instruction, and the CPU reads it in on its input on the data bus.

Now it needs to fetch the whole instruction it's gonna execute, and in the example we saw that was relatively straightforward, because the instruction was only a byte long. Not all CPUs are that simple. Some CPUs vary the length of their instructions, so this hardware can actually be quite complicated, because it needs to work out how long the instruction is. It could be as short as one byte; it could be as long, on some CPUs, as 15 bytes; and you sometimes don't know how long it's gonna be until you've read a few of the bytes. Or this hardware can be relatively trivial: an ARM CPU makes it very, very simple and says all instructions are 32 bits long, so the Archimedes over there can fetch an instruction very, very simply: 32 bits. On something like an x86, an instruction can be any length up to 15 bytes or so, and this becomes more complicated; you have to sort of work out what it is as you read it, until you've got it all. But we fetch the
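The fetch step described above can be sketched in a few lines of Python, with memory as a simple address-to-byte table holding our six program bytes (the addresses match the example; the `fetch` helper itself is just illustrative):

```python
# A sketch of the fetch step: the program counter supplies the address,
# the memory sends back the byte, and the PC moves on to the next byte.
memory = {0x084C: 0xA9, 0x084D: 0x43, 0x084E: 0x20,
          0x084F: 0xEE, 0x0850: 0xFF, 0x0851: 0x60}

pc = 0x084C                # program counter: address of the next byte to fetch

def fetch():
    global pc
    byte = memory[pc]      # address goes out on the address bus,
    pc += 1                # the byte comes back on the data bus
    return byte

opcode = fetch()
print(f"{opcode:02X}")     # → A9, the first byte of our program
```

After that one fetch the PC already holds 084D, ready for the next byte.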
instruction. So in the example, we've got A9, and we now need to work out what A9 does. We need to decode it into what we want the CPU to actually do. So we need another bit of the CPU's hardware which we dedicate to decoding the instruction: we have a part of the CPU which fetches the instruction, and a part of the CPU which then decodes it. So the A9 comes into the decode logic, and it says: well, okay, that's a load instruction, so I need to fetch a value from memory, which was the 43, the ASCII code for the capital letter C that we saw earlier. So we need to fetch something else from memory; we need to access memory again, and we need to work out what address that's gonna be. And then, once we've got that value, we need to update the right register to store it.

So we've gotta do things in sequence. Part of the decode logic's job is to take the instruction byte (or however long the instruction is) and work out the sequence in which we need to drive the other bits of the CPU. And that also means we have another bit of the CPU, the bit that actually does things, which is all the logic that executes instructions. So we start off by fetching the instruction; once we've fetched it, we can start decoding it; and then we can execute it. The decode logic is responsible for saying: put out the address you want to load the value from, and then, once it's been loaded into the CPU, store it in the right register.

So you're doing things in order: we have to fetch the instruction first, we can't decode it until we've fetched it, and we can't execute it until we've decoded it. And this means that, at any one time on a simple CPU, quite a few bits of the CPU aren't actually doing anything. While we're fetching the value from memory, to work out how we're gonna decode it, the decode and the execute logic aren't doing anything; they're just sitting there, waiting for their turn. And then, when
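That decode-then-execute sequence for our LDA instruction can be sketched like this (the layout is simplified: the two program bytes sit at address 0, and only this one opcode is decoded):

```python
# A sketch of decode driving execute for the LDA # instruction from the
# example: decode recognises the opcode, then sequences the two execute
# steps (fetch the operand byte from memory, then update the register).
memory = [0xA9, 0x43]         # LDA #&43, placed at address 0 for simplicity
pc = 0
accumulator = 0

opcode = memory[pc]; pc += 1  # fetch: read the opcode byte

if opcode == 0xA9:            # decode: it's "load accumulator, immediate"
    operand = memory[pc]      # execute, step 1: fetch the operand from memory
    pc += 1
    accumulator = operand     # execute, step 2: store it in the accumulator

print(chr(accumulator))       # → C
```

Note the second memory access: even this one-instruction example has to go back out on the bus for its operand before the register can be updated.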
we decode it, it's not fetching anything and it's not executing anything. So we're sort of moving through these different states one after the other, and each takes a different amount of time. If we're fetching 15 bytes, it's gonna take longer than if we're fetching one; decoding might well be shorter than fetching something from memory, because it all happens inside the CPU; and the execution time depends on what's actually happening. So your CPU will work like this: it goes through each phase, and once it's done, it starts the next one on the next clock tick. The whole CPU is synchronized to a clock, which just keeps things moving in sequence. And you can build a CPU that way; something like the 6502 worked like that.

But, as we said, lots of the CPU isn't actually doing anything at any given time, which is a bit wasteful of the resources. So is there another way you can do this? And the answer is yes: you can build what's called a pipelined model of a CPU. What you do here is keep the same three bits of the CPU, but you say: okay, I'm gonna fetch (F) instruction one; in the next bit of time, I'm gonna start decoding instruction one, but I'm not using the fetch logic, so I'm gonna have it start to do things ahead of schedule: at the same time, I'm gonna fetch instruction two. So now I'm doing two things, two bits of my CPU in use at the same time: I'm fetching the next instruction while decoding the first one. And once we've finished decoding, I can start executing the first instruction. So I execute that, but at the same time I can start decoding instruction two, and hopefully I can start fetching instruction three. So what?
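To put a number on how wasteful the simple model is, here is a sketch that assumes, purely for illustration, that every stage takes exactly one clock cycle (in reality they vary, as described above):

```python
# A sketch of the simple, non-pipelined model: each instruction walks
# through fetch, decode and execute before the next one even starts,
# so two of the three parts of the CPU are idle in any given cycle.
def sequential_cycles(num_instructions):
    cycles = 0
    for _ in range(num_instructions):
        for stage in ("fetch", "decode", "execute"):
            cycles += 1       # the other two stages sit idle this cycle
    return cycles

print(sequential_cycles(5))   # → 15 cycles for five instructions
```

Three cycles per instruction, with two-thirds of the hardware waiting its turn: that's the cost the pipeline is about to claw back.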
It still takes the same amount of time to execute that first instruction. But the beauty is that when it comes to executing instruction two, it completes exactly one cycle after the first, rather than having to wait to go through the whole fetch, decode and execute cycle: we can execute it as soon as we've finished instruction one. So each instruction still takes the same amount of time, say three clock cycles, to go through the CPU, but because we've pipelined them together, they appear to execute one after another, one clock cycle apart. And we can do this again: we can start decoding instruction three at the same time as we're executing instruction two.

Now, there can be problems. This works for some instructions, but say this instruction says "store this value in memory". Now you've got a problem. You've only got one address bus and one data bus, so you can only access or store one thing in memory at a time. You can't execute a store instruction and fetch from memory in the same cycle, so you wouldn't be able to fetch the next instruction until the next clock cycle. So we fetch instruction four there, while executing instruction three, but we can't decode anything here; and in this clock cycle we can decode instruction four and fetch instruction five, but we can't execute anything. We've got what's called a "bubble" in our pipeline, or a pipeline stall, because at this point the design of the CPU doesn't let us fetch an instruction and execute an instruction at the same time. This is one of what are called "pipeline hazards", which you get when designing a pipelined CPU: the design doesn't let you do the things you need to do at the same time, so you have to delay things, which means you get a bubble. So you can't quite get up to one instruction per cycle, but you can certainly get closer than you could if you did everything one instruction at a time.
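The whole pipeline, bubble included, can be simulated in a few lines. This is a sketch under the same one-cycle-per-stage assumption as before, with the structural hazard modelled as: while a "store" is in the execute stage, the fetch stage has to sit out that cycle because both would need the single memory bus:

```python
# A sketch of a three-stage pipeline. Each clock tick, instructions move
# fetch -> decode -> execute; a "store" in execute needs the memory bus,
# so fetch must stall that cycle, creating a bubble.
def run_pipeline(program):
    """program: list of mnemonics, e.g. ["add", "store", ...].
    Returns one (fetch, decode, execute) tuple per clock cycle, each slot
    holding a 1-based instruction number or None (a bubble)."""
    fetch = decode = execute = None
    next_instr = 0
    trace = []
    while next_instr < len(program) or fetch or decode:
        # On the clock tick, everything moves one stage along.
        execute, decode, fetch = decode, fetch, None
        # Structural hazard: a store in execute ties up the memory bus.
        store_in_execute = execute is not None and program[execute - 1] == "store"
        if not store_in_execute and next_instr < len(program):
            next_instr += 1
            fetch = next_instr
        trace.append((fetch, decode, execute))
    return trace

for cycle, stages in enumerate(run_pipeline(["add", "add", "store", "add", "add"]), 1):
    print(cycle, stages)
```

Running it shows five hazard-free instructions finishing in 7 cycles (versus 15 one at a time), while putting a store third in the list costs one extra cycle: the bubble rippling through the pipeline.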