Placeholder Image

字幕列表 影片播放

  • Things we talked about a spectrum meltdown and they rely on some of the more advanced ways that the CPU operates

  • It's probably worth diving down and actually looking at how a CPU actually executes the code be right

  • I mean, we've touched on this before we did a video on pipelining we did a video on caching, but also delve down and see

  • What happens below the surface when we actually get our CPU to execute our code?

  • Let's start by having a simple example:

  • A line of code that we might want to look at what happens. Let's take a line of code that takes a variable

  • Let's take a line of code. It's gonna add up A plus B plus C plus D

  • Times e so I've written this in this sort of see like language

  • So we're gonna do this calculation now as I'm sure most of us are aware

  • When we take that and put into our C compiler run it it gets converted into the machine code that the CPU executes

  • so we take that client of code, and then we'd have to

  • Convert that into the machine code, and then the CPU

  • Executes that machine code so a program like this would end up looking and I'm going to use arm assembly here

  • Just because I know it better than the anything else perhaps for the first instruction. We would load the value for memory of a

  • Into registers, let's pick our zero. We've got 14 or so of them

  • We can use the 16 of them

  • but some of them get used for different things that we don't really use so although the value of a

  • Into our zero next thing we want to do is you want to add that to the value of B

  • Then after make sure we'll get the operator precedence right so we can load the value of B into a register

  • So let's loading the value of C

  • Here into another register

  • And we might as well do D. And E. As well so load or three come on D. And

  • We'll load our four

  • With E as well, and now we can start

  • Adding these things up multiplying them to produce

  • The actual result we want now we're going to make sure we get the precedence right

  • But we could either start by adding a and B together then add on C. And then

  • Multiply D. And E and have them together or we could do that one first

  • I'm just going to start going from left to right as long as the math is right

  • We'll get the right result so we'll add together a and B now

  • I put those two values in r0 and r1 and we need to store the results somewhere

  • We are going to need the value of a again after this, so we'll reuse the register R

  • 0 so we're saying put into R 0

  • the value of R 0

  • plus R

  • 1 so this is adding together storing the result in R. 0 so we now added a and B together

  • We want to add on C. And so we could do the same thing add

  • to R 0

  • The value in R. 0 which is now because of this instruction a plus B want to add on the value in R

  • 2 there's now about a plus B plus C in

  • Our 0 now we need to do the multiplication

  • And we need to do that separately before we add it on so we get the right result so we'll multiply

  • And we'll see we've got an arm, too cheap here, so we've got the multiply instruction there

  • And we need to put the results on whether it's use our 5 D. Which we put in R. 3 and E

  • Which we put in R?

  • 4 and then we want to add the result of that onto the value

  • In our 0 and now our 0 contains the result of a plus B. Plus C plus D times E. And

  • We could then store that

  • back into X

  • So that line of code there at one line of C code would become what 1 2 3 4 5 6 7 8 9 10

  • different lines an assembler and I've numbered them because I'm going to

  • Refer to them at different times so we can say searching one instruction 5 etc to refer to the different ones now

  • We might expect that our CPU will just xu instruction 1 the new instruction 2 instruction 3 instruction 405 and so on in order

  • To generate the result and some cpus do in fact work exactly like that, but actually if you think about

  • What the cpus and what these descriptions are actually doing you might think well actually?

  • when I get this first one I've got to go an access memory and

  • As we talked about in the caching video many years ago, cache is perhaps a an old-fashioned English word

  • but it basically just means a small place where we can store things so you might use it to store your hidden treasure if you're

  • a pirate or to store

  • Your food for winter on a modern CPU probably say around 200

  • Nanoseconds to actually go and get the value out of your main memory and load it into the register now of course

  • If these are already cached in the same bit of memory, then you may find that these all execute very quickly

  • We don't know that this isn't the only way we could write this program because if we take this instruction here instruction 6

  • Where we do the add of r0 and r1 to add up a and B. Well. We've got those two values here

  • They're already in the registers at this point in the program

  • So there's nothing to stop us moving this instruction up there

  • and it would still have exactly the same effect so instruction 6 could be moved to me between instructions 2 & 3

  • And then we do the next instruction which was the same as instruction 3 here?

  • which would be LDR

  • R to come of the values in memory that's representing the letter the variable see how exactly the same effect. We just moved that

  • Instruction earlier so you could rewrite this program in various different ways now

  • Why is that interesting?

  • well when we think about how a CPU is designed and that you will have

  • various different what impress be termed execution units within there now one of them is what's generally referred to as the

  • ALU or the arithmetic and logic unit and that's the bit of your CPU that does

  • Addition it does subtraction it does sort of logical operators and or and so on

  • But you also have other bits inside there

  • And one of the bits you'll often have in a modern CPU is it part of your CPU that handles loading and storing

  • Values from memory sometimes interact sometimes they don't now

  • Assuming that they are separate parts of the CPU if we look back at our instructions here. We execute instruction 1

  • It uses a load store. You need to get a value for memory we execute instruction 2

  • It uses the load store unit to get a value for memory instruction 3

  • It uses a load store unit to get a value for memory for uses the load store unit to get a value for memory

  • 5 uses the load store unit to get a value for a memory 6

  • changes and uses the ALU as 2 7 8 & 9

  • before insertion turn uses the load store unit so we've got a pretty sequential series the first 5

  • instructions all execute using the load store part of the CPU the next four instructions execute using the ALU and

  • The final instruction again uses the load store unit but as we said we can reorder that

  • into this version here using instructions w x y and z

  • Differentiate them and we execute the first instruction instruction w uses a load store unit instruction X

  • Uses a load store unit instruction Y uses the ALU restrictions ed uses the load store unit

  • Okay, what difference does that make well let's think about what's happening when we're using the load store unit

  • the ALU isn't being used that part of the CV is just sitting there not being used and

  • When we're using the ALU the load store units sitting there not being used, that's what we saw there

  • But does that have to be the case could we actually design it, and you probably guess the answer is that yes?

  • We can so that

  • While the load store unit say is being used that we can run the instructions on the ALU part as well

  • I'd turn the paper round and I'm going to draw

  • This as a sort of timeline so these are our two units and we've got time running along this side as well

  • I'm using the computer for our paper in a

  • Radically different orientation, but never mind, so we're going to execute the instructions

  • On here and the first thing that happens is that we execute instruction W

  • No problem

  • That's going to take certain amount of time to actually that's using the load store unit to execute it

  • These are being fetched and decoded and sort of executed by the different execution units we then execute the next instruction

  • which is

  • X and we couldn't execute this any earlier because the load store unit was being used to execute that one so no difference than what?

  • We had before we're using this one after the other we now come to

  • execute

  • The add instruction now we can't execute this any earlier than this point in time

  • Because this depends on the value of registers r0 and r1 which aren't set?

  • until this point so we need those two values so we can start doing

  • instruction why here now actually

  • It's an ADD it's not going to take as long as fetching things from memory because it's all inside the CPU so we can use

  • A smaller box and we can put instruction Y there and this depends on the value being fetched from there

  • And I'm just going to show this as an arrow here, but the next instruction load

  • r2 comma C

  • well

  • I doesn't depend on anything except the value in there Marie and our load/store units not being used

  • So if we build our CPU right? There's nothing to start that

  • Instruction being executed at the same time and that means that actually when we come to the next instruction

  • Which would be which will be the best instruction to execute next in this example. Let's go back to our program

  • We've executed instructions one to six and three already

  • That's w x y&z we've rewritten the mass let's put instruction seven here

  • What was instruction seven and this is now going to become?

  • I'm gonna have to use it's gonna become instruction a I'll hopefully remember to say instruction a but

  • You can guess the colonics are referring to a on its own is probably the variable if not is probably the instruction so we can

  • now execute

  • instruction a and again instruction a depends on two things

  • It depends on the value of R. 0 which is going to come from this instruction so we have to have that ready

  • But it also depends on the value of R

  • 2 which is coming from this instruction so we have to have that ready as well so it can actually happen any point before

  • This point in time so this would be the LDR R 2 comma dot and this is the add R

  • 0 and this is the next add, but again we can start trying to leave more the instructions because I okay well

  • That's what instruction for here at the same time. We'll call this instruction B

  • And so on we put that at that point we can execute

  • Instruction B at the same time as we do way and I'm really confusing myself with pens here and so again

  • We've saved some time because rather than having to execute that in the same thing we can do these two things at the same time

  • Now to be able to do this we need these instructions need to execute on different execution units we couldn't for example

  • Execute to add instructions at the same time because we haven't got to Al use well, though

  • There's no reason why you can build a CPU with two Al use if you look at modern

  • CPU designs from Intel AMD arm and cetera they all have often have multiple Al used or allow you to do just that

  • but because the different types we can execute them at the same time and the reason we can do that is because

  • They don't depend

  • on the results of one to work out the other so they're working on different things and they're using different parts of the CPU and

  • The CPU that enables you to do this is what's known as a superscalar

  • CPU because it can run multiple instructions at the same time will you continue doing this and we'd end up we execute instruction

  • B then we've got to execute instruction C

  • instruction D

  • uses a

  • Multiply and actually on a CPU probably got a separate execution unit which does and multiplies because you can actually do them faster that way

  • So you have a multiply unit as well so we can execute that multiply D up there

  • We think well okay?

  • Can we do the other at the same time well no because we need the result of that as well so we can then execute

  • the ad down here before finally, and it just fits on the paper like that so we can actually squash things up and

  • we're going to save some time because if you think about it you have the original order of the program and

  • Here's one. I made earlier

  • All right, or as in I'm just about to draw and Shawn will do some very clever

  • Cutting so even if we had a superscalar processor. We've only got one load store unit we've only got one

  • Al you really got one multiply unit we wouldn't have any opportunities with this program

  • To run two instructions at the same time so this version of the program would

  • Still take ten instructions this one still takes ten instructions, but with a superscalar processor we have the opportunity to sort of

  • execute two instructions at the same time because they use different bits of the CPU now you need to design the CPU to allow that

  • but that enables us to

  • Speed things up a little bit because while this is working to get the value for memory. We can execute some more

  • instructions

  • Now that's all very well and superscalar processors started to appear in

  • the mid 90s things like the six eight thousand and sixty the

  • Pentium I think was superscalar

  • But they require the code to be written in a way

  • That enables this to happen so this program wouldn't have been able to do anything

  • This one would but as we said when we were developing this we could work out which

  • Instructions we could move around to get that speed up based on

  • What those instructions depended on so this instruction?

  • We said what what six became why only depended on the values of R

  • 0 and R 1 which has been set by instructions 1 & 2 so we can move that earlier

  • Without affecting anything in our program because it only depended on those 2 values

  • so we can either do this in the compiler or by hand if you write in the assembly yourself like we just did here or

  • It's possible to let the CPU work it out, and so what a modern CPU does what's called an out of water CPU is

  • Reorders the instructions without

  • supposedly breaking the rules of

  • What each instruction does so it'll still execute it as if it was written like this?

  • And it won't change break any of the rules of that

  • but it will say well hang on it will spot that this instruction could happen earlier and

  • So move it earlier to gain some of that parallelism in fact then execute them together at the same time

  • And that works generally get well

  • But as we saw with things like Spector and meltdown if you allow things to happen too far earlier and start doing what's called speculative

  • Evaluation where you say okay?

  • I've got the stuff. I need to execute it now

  • I don't if I need the result but I might do so I'll execute it anyway, and then if I need it

  • I've already done it and if I don't need it while I was still waiting for this to come in anyway

  • So it doesn't matter that I've done it. I've not lost

  • Any time well

  • Then it's turned out that you can have sort of side channels where you can sort of see that that's happened or not

  • Which is caused a few issues with computing?

  • It goes along here like this

  • Intersects the curve somewhere else flips over and it's over here, so this is for G

  • Now we won't look at any more right the edge of a formula for this is just

  • mathematics to do with lines and the tangent of this curve

  • It's actually not very complicated the point is that what we're doing is by multiplying G

Things we talked about a spectrum meltdown and they rely on some of the more advanced ways that the CPU operates

字幕與單字

單字即點即查 點擊單字可以查詢單字解釋

A2 初級

CPU失靈 - Computerphile (CPUs Are Out of Order - Computerphile)

  • 1 0
    林宜悉 發佈於 2021 年 01 月 14 日
影片單字