The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

CHARLES E. LEISERSON: OK, let's get started. On Thursday, we are really privileged to have Jon Bentley, who is one of the masters of performance engineering, come and give us a guest lecture. In 1982, he wrote this wonderful little book from which we adapted the Bentley rules that you folks have seen. And he's also famous for things like kd-trees. Who's seen kd-trees? Yeah, so he invented kd-trees. And who's ever used the master method of recurrence solving? He invented the master method of recurrence solving. And so he's just a wonderful, wonderful fellow, with lots and lots of insights, and just a superior performance engineer. And he's going to give a guest lecture on Thursday. And so I would encourage you to bring your friends -- maybe friends who dropped the class. If any of you have UROPs, or other people in your research group, whether they're graduate students or not, they're all welcome to come. Because this is really one of the opportunities you get to actually meet one of the legends of the field, if you will. And you'll see he's very approachable. So anyway, that's my advertisement for Jon Bentley.

Good. So let's start. I want to talk about speculative parallelism for a little bit, because that's what you need to make this code go fast. If you look at code like alpha-beta, it turns out that it's inherently serial. That is, if you try to do things in parallel, then you miss the cutoffs. And if you miss the cutoffs, then you start doing work that you wouldn't necessarily need to do. And so the only way to make this type of code run fast is to use speculation.
Which is to say, you're going to guess that you can do stuff in parallel and that it'll be worthwhile, and then, occasionally, you're going to be wrong. The trick is, how do you minimize the wasted effort when things go wrong? So let's take a look at a very simple example of speculative parallelism, which is thresholding a sum. Here we have a little program that returns a boolean. It takes as input an array of length n and a limit -- and these are all unsigned integers, say. I start the sum out at 0, and I add up all the elements of the array; if the sum exceeds the limit, then I return. So how might I optimize this code a little bit? Yeah.

AUDIENCE: Split up [INAUDIBLE] four processes, [INAUDIBLE] split up an array [INAUDIBLE].

CHARLES E. LEISERSON: No, let's stay on one processor. On one processor, what might you do?

AUDIENCE: In the middle of the loop, you can check [INAUDIBLE].

CHARLES E. LEISERSON: Yeah -- once you would exceed the limit, why keep adding the other numbers? So here's code that does that: quit early if the partial sum ever exceeds the threshold. This isn't necessarily an optimization, though. Because notice that now, in addition to a memory reference and an addition, I also have a branch in the loop. Actually, it's a predictable branch, because it's always going to predict that the sum hasn't exceeded the limit until it actually does exceed it. So it'll be pretty predictable. But it's still one more step in the loop, so this may be slower. How might I mitigate that problem? I don't want to add too much into the inner loop. Yeah.

AUDIENCE: You could have an inner loop that goes [INAUDIBLE] them.

CHARLES E. LEISERSON: Yeah. So basically, I can do what's called strip mining.
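The serial variants discussed above might look roughly like the following. This is a sketch in plain C; the function and variable names are my own, not necessarily those on the lecture slides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Naive version: always sums all n elements, then checks the limit.
bool sum_exceeds(const uint32_t *A, size_t n, uint64_t limit) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += A[i];
    return sum > limit;
}

// Early-exit version: quit as soon as the partial sum exceeds the limit,
// at the cost of one (highly predictable) extra branch per iteration.
bool sum_exceeds_early(const uint32_t *A, size_t n, uint64_t limit) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += A[i];
        if (sum > limit) return true;
    }
    return false;
}

// Strip-mined version: check the threshold only once every four
// iterations, so the cost of the check is amortized over the inner loop.
bool sum_exceeds_strip(const uint32_t *A, size_t n, uint64_t limit) {
    uint64_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
        if (sum > limit) return true;   // check every fourth iteration
    }
    for (; i < n; ++i)                  // handle leftover iterations
        sum += A[i];
    return sum > limit;
}
```

The inner loop of four is just an illustrative choice; a real implementation would tune the strip length so the branch cost is negligible relative to the adds.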
I replace the loop of n iterations with a loop of, let's say, n over four iterations, with an inner loop of four iterations. And every fourth iteration, I check to see whether or not I've exceeded the limit, so that the cost of the check is relatively small at that point. So that's how to make the serial code go fast.

Now, we want to make this operate in parallel. And the problem when I operate in parallel is that everybody is adding into the same sum, so I'm going to do it here with a reducer -- I'm going to add up the values in the reducer. But now, I'd like to do the same thing of being able to quit early. And so the question is, how can we parallelize a short-circuited loop? The way I'm going to do this -- and this is sort of by hand here, but it's going to give you the idea -- is to observe that, as you know, underneath, a parallel loop is really divide-and-conquer. So I can write it explicitly as a divide-and-conquer. And now it becomes, I think, a little bit clearer how, in this case, I'm going to return the value of the sum. And the question is, how can I quit early and save the work if I exceed the threshold? Understanding that when I'm executing this in something like Cilk, it's going to tend to go down one path, and often it's going to be a serial execution -- I'd like to quit early if possible.

So here's one way of doing it: I add an abort flag to the code, which says whether I should keep going or whether I've actually exceeded the threshold at that point. And I'm going to use that recursively. Each time through the divide-and-conquer, I take a look to see whether the sum is greater than the limit, and if I haven't already set the abort flag, then I set it to true. Why do I bother testing the abort flag before I set it? Notice that setting the abort flag is a race, isn't it -- a determinacy race. Because -- great. Thank you.
It's because the stuff operating in parallel is all trying to set the abort flag whenever something aborts. I can have two guys in parallel setting the abort flag. But if they're setting it, they're setting it to the same value. So it's a benign race, assuming your memory model is such that you can actually set the values. So why do I bother checking the flag before setting it? What happens when several guys in parallel want to write to the same variable? This is quiz 1 stuff. Yeah.

AUDIENCE: Cache bouncing.

CHARLES E. LEISERSON: Yeah, you can have the cache line bouncing back and forth, and the writes will serialize. Because if they're all trying to write, then each one has to get exclusive access to the line in order to write and modify it. And so that happens -- boom, boom, boom -- one at a time. By checking the flag first, the line can stay in a shared state; then one guy clobbers it, and that single write updates all the other copies. So it makes it so we don't continually have true-sharing conflicts.
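The abort-flag idea above can be sketched as follows. This is my own serial sketch in plain C, not the code from the lecture slides: in real Cilk code, the two recursive calls would be spawned in parallel and the sum would be accumulated in a reducer, but the recursion is written serially here so the structure of the early quit and the check-before-set is clear.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Divide-and-conquer sum over A[lo, hi) with an abort flag.  Once any
// subcomputation's sum exceeds the limit, it sets the flag, and other
// subcomputations that see the flag quit without doing their work.
static uint64_t sum_range(const uint32_t *A, size_t lo, size_t hi,
                          uint64_t limit, volatile bool *abort_flag) {
    if (lo >= hi || *abort_flag)   // empty range, or someone already aborted
        return 0;
    if (hi - lo == 1)
        return A[lo];
    size_t mid = lo + (hi - lo) / 2;
    // In Cilk this would be: left = cilk_spawn sum_range(...); cilk_sync;
    uint64_t left  = sum_range(A, lo,  mid, limit, abort_flag);
    uint64_t right = sum_range(A, mid, hi,  limit, abort_flag);
    uint64_t sum = left + right;
    // Check before setting: writing true twice is a benign race, but
    // skipping the redundant write lets the cache line stay in a shared
    // state instead of ping-ponging between processors in exclusive mode.
    if (sum > limit && !*abort_flag)
        *abort_flag = true;
    return sum;
}

bool sum_exceeds_dc(const uint32_t *A, size_t n, uint64_t limit) {
    volatile bool abort_flag = false;
    uint64_t sum = sum_range(A, 0, n, limit, &abort_flag);
    return abort_flag || sum > limit;
}
```

Note that once the flag is set, subtrees that quit early return 0, so the final sum is no longer meaningful -- but that's fine, because the flag already tells us the threshold was exceeded.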