
  • The following content is provided under a Creative

  • Commons license.

  • Your support will help MIT OpenCourseWare

  • continue to offer high quality educational resources for free.

  • To make a donation or to view additional materials

  • from hundreds of MIT courses, visit MIT OpenCourseWare

  • at ocw.mit.edu.

  • CHARLES E. LEISERSON: OK, let's get started.

  • On Thursday, we are really privileged to have

  • Jon Bentley, who is one of the masters of performance

  • engineering, come and give us a guest lecture.

  • In 1982, he wrote this wonderful little book

  • from which we adapted the Bentley

  • rules that you folks have seen.

  • And he's also famous for things like kd-trees.

  • Who's seen kd-trees?

  • Yeah, so he invented kd-trees.

  • And who's ever used the master method of recurrence solving?

  • He invented the master method of recurrence solving.

  • And so he's just a wonderful, wonderful fellow,

  • with lots and lots of insights, and just a superior performance

  • engineer.

  • And he's going to give a guest lecture on Thursday.

  • And so I would encourage you to bring your friends.

  • Bring friends maybe who dropped the class,

  • and, if any of you have UROPs, other people

  • in your research group, whether they're graduate students,

  • they're all welcome to come.

  • Because this is really one of the opportunities

  • you get to actually meet one of the legends of the field,

  • if you will.

  • And you'll see he's very approachable.

  • He's very approachable.

  • So anyway, my advertisement for Jon Bentley.

  • Good.

  • So let's start.

  • I want to talk about speculative parallelism for a little bit,

  • because that's kind of what you need to make the code go fast.

  • So if you look at code like alpha beta,

  • it turns out that it's inherently serial.

  • That is, if you try to do things in parallel,

  • then you miss the cutoffs.

  • And if you miss the cutoffs, then you

  • start doing work that you wouldn't necessarily

  • need to do.

  • And so the only way to sort of make this type of code run

  • fast is to use speculation.

  • Which is to say, that you're going

  • to guess that you can do stuff in parallel

  • and it'll be worthwhile, and then, occasionally, you're

  • going to be wrong.

  • But the trick is, well, how do you minimize the wasted effort

  • when things go wrong?

  • So let's just take a look at a very simple example

  • of speculative parallelism, which is thresholding a sum.

  • So here we have a little program that is returning a boolean.

  • It takes as input an array of size n.

  • So thresholding a sum-- we have an array

  • of length n and a limit.

  • And these are all unsigned integers, say.

  • And I start out the sum at 0, and what I'm going to do

  • is add up all the elements of the array.

  • And if it exceeds the limit, then I'm going to return.
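  • In C, the loop just described might look like the following sketch (the identifiers sum_exceeds, A, n, and limit are illustrative, not the exact slide code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the thresholding-a-sum loop from the lecture.
   Names are illustrative, not the exact slide code. */
bool sum_exceeds(const unsigned int *A, size_t n, unsigned int limit) {
    unsigned int sum = 0;           /* start the sum at 0 */
    for (size_t i = 0; i < n; ++i)
        sum += A[i];                /* add up all the elements */
    return sum > limit;             /* did we exceed the limit? */
}
```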

  • And so how might I optimize this code a little bit?

  • Yeah.

  • AUDIENCE: Split up [INAUDIBLE] four processes, [INAUDIBLE]

  • split up an array [INAUDIBLE].

  • CHARLES E. LEISERSON: No, let's say one processor.

  • On one processor, what might you do?

  • AUDIENCE: In the middle of the loop,

  • you can check [INAUDIBLE].

  • CHARLES E. LEISERSON: Yeah, once you

  • check so that once you would exceed it,

  • why keep adding the other numbers?

  • So here's a code that does that-- quit early

  • if the partial sum ever exceeds the threshold.
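  • A sketch of that early-exit version (again with illustrative names, not the exact slide code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Early-exit sketch: quit as soon as the running sum exceeds
   the limit, rather than adding the remaining elements. */
bool sum_exceeds_early(const unsigned int *A, size_t n, unsigned int limit) {
    unsigned int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += A[i];
        if (sum > limit)    /* the extra (mostly predictable) branch */
            return true;    /* no point adding the rest */
    }
    return false;
}
```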

  • This isn't necessarily an optimization.

  • Because notice that now in the optimization,

  • not only do I have a memory reference and an addition,

  • I also now have an unpredictable branch.

  • Actually, it's a predictable branch.

  • But it's still one more step, because it's always

  • going to predict that it's not exceeding until it actually

  • does exceed.

  • So that'll be pretty predictable.

  • But still, I've added something into the loop

  • so it may be slower.

  • How might I mitigate that problem?

  • So I don't want to add too much into the inner loop.

  • Yeah.

  • AUDIENCE: It could have an inner loop

  • that goes [INAUDIBLE] them.

  • CHARLES E. LEISERSON: Yeah.

  • So basically, I can do what's called strip mining.

  • I replace the loop of n iterations with a loop of,

  • let's say, n over four iterations,

  • with an inner loop of four iterations.

  • And every fourth iteration, I check

  • to see whether or not I've exceeded,

  • so that the cost of the check is going to be relatively

  • small at that point.
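  • A sketch of the strip-mined loop, checking only once every four additions (for simplicity this assumes n is a multiple of 4; the names are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

/* Strip-mined sketch: an inner loop of 4 additions, with the
   early-exit check only once per strip, so the per-element cost
   of the check is small.  Assumes n is a multiple of 4. */
bool sum_exceeds_strips(const unsigned int *A, size_t n, unsigned int limit) {
    unsigned int sum = 0;
    for (size_t i = 0; i < n; i += 4) {   /* n/4 outer iterations */
        for (size_t j = 0; j < 4; ++j)
            sum += A[i + j];              /* 4 additions per strip */
        if (sum > limit)                  /* check every 4th iteration */
            return true;
    }
    return false;
}
```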

  • So there are ways of making this sort of thing go fast.

  • So now, we want to make this operate in parallel.

  • And the problem when I operate in parallel

  • is that as I'm adding stuff up, all the parallel updates to the sum would race with each other.

  • So I'm going to do this here with a reducer.

  • And so, basically, I'm going to add up

  • the values in the reducer.

  • But now, I'd like to do the same thing

  • of being able to quit early.

  • And so the question is, well, how

  • can we parallelize a short-circuited loop?

  • And so, the way I'm going to do this-- and this

  • is sort of by hand here, but it's

  • going to give you-- so actually, as you

  • know, underneath the loop is really a divide

  • and conquer loop.

  • And so I could write it as a parallel loop.

  • And now, it becomes, I think, a little bit clearer

  • how this works. In this case, what I'm

  • going to do is return the value of the sum.

  • And now the question is, well, how

  • can I quit early and save the work if I exceed the threshold?

  • Understanding that when I'm executing

  • this in something like Cilk, it's

  • going to tend to go down one path.

  • And often, it's going to be a serial execution.

  • So I'd like to quit early if it's possible.

  • So here's one way of doing it, which

  • is that I add an abort flag to the code, which

  • is going to say whether I should keep going

  • or whether I've actually exceeded at that point.

  • And so I'm going to use that recursively.

  • And now, I take a look, for each time through the loop--

  • or through the divide and conquer--

  • to see whether the sum is greater than a limit.

  • And if I haven't already set the abort flag,

  • then I set the abort flag to be true.
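  • The abort-flag scheme can be sketched as a divide-and-conquer in C. In the lecture the two halves run in parallel under Cilk and the writes to the flag race benignly; this sketch is serial so the structure is easy to follow, and the names are illustrative:

```c
#include <stdbool.h>
#include <stddef.h>

/* Divide-and-conquer sum with an abort flag.  In the Cilk version
   the two recursive calls would be spawned in parallel, and the
   flag would be written by racing strands -- a benign race, since
   every writer writes true.  This serial sketch shows the structure. */
static unsigned int dac_sum(const unsigned int *A, size_t lo, size_t hi,
                            unsigned int limit, bool *abort_flag) {
    if (*abort_flag || hi <= lo)   /* already exceeded: quit early */
        return 0;                  /* and save the remaining work  */
    if (hi - lo == 1)
        return A[lo];
    size_t mid = lo + (hi - lo) / 2;
    /* In Cilk, the first call would be a cilk_spawn. */
    unsigned int sum = dac_sum(A, lo, mid, limit, abort_flag)
                     + dac_sum(A, mid, hi, limit, abort_flag);
    if (sum > limit && !*abort_flag)   /* test before setting, so the   */
        *abort_flag = true;            /* line can stay in shared state */
    return sum;
}

bool sum_exceeds_abort(const unsigned int *A, size_t n, unsigned int limit) {
    bool abort_flag = false;
    unsigned int sum = dac_sum(A, 0, n, limit, &abort_flag);
    return abort_flag || sum > limit;
}
```

  • Testing the flag before writing it is the point discussed next: readers can all hold the cache line in a shared state, whereas every write forces one processor at a time to acquire it exclusively.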

  • Why do I bother testing the abort flag

  • before I set it to true?

  • So notice that setting the abort flag is a race,

  • isn't it-- a determinacy race.

  • Because-- great.

  • Thank you.

  • It's because all the stuff operating in parallel

  • is trying to set the abort flag whenever something aborts.

  • I can have two guys who are in parallel setting

  • the abort flag.

  • But if they're setting it, they're

  • setting it to the same value.

  • So it's a benign race, assuming your memory model is such

  • that you can actually set the values.

  • So what happens here when--

  • why do you bother checking this?

  • So what happens when several guys in parallel

  • want to write to the same variable?

  • This is quiz 1 stuff.

  • Yeah.

  • AUDIENCE: Cache bouncing.

  • CHARLES E. LEISERSON: Yeah.

  • You can have it bouncing along the cache.

  • It will serialize it.

  • Because if they're all trying to write,

  • then they have to get exclusive access for it

  • to write and modify it.

  • And so that happens-- boom, boom, boom-- one at a time.

  • And so by checking it first, it can be in a shared state,

  • and then one guy clobbers it, and then

  • it will update all the other ones.

  • So it makes it so we don't continually

  • have true-sharing conflicts.