  • The following content is provided under a Creative

  • Commons license.

  • Your support will help MIT OpenCourseWare

  • continue to offer high quality educational resources for free.

  • To make a donation or to view additional materials

  • from hundreds of MIT courses, visit MIT OpenCourseWare

  • at ocw.mit.edu.

  • PROFESSOR: Hey, everybody.

  • It's my pleasure once again to welcome

  • TB Schardl, who is the author of your Tapir compiler,

  • to talk about the Cilk runtime system.

  • TAO SCHARDL: Thanks, Charles.

  • Can everyone hear me in the back? Seems good?

  • OK.

  • Thanks for the introduction.

  • Today I'll be talking about the Cilk runtime system.

  • This is pretty exciting for me.

  • This is a lecture that's not about compilers.

  • I get to talk about something a little different for once.

  • It should be a fun lecture.

  • Recently, as I understand it, you've

  • been looking at storage allocation,

  • both in the serial case as well as the parallel case.

  • And you've already done Cilk programming for a while,

  • at this point.

  • This lecture, honestly, is a bit of a non sequitur

  • in terms of the overall flow of the course.

  • And it's also an advanced topic.

  • The Cilk runtime system is a pretty complicated piece

  • of software.

  • But nevertheless, I believe you should have enough background

  • to at least start to understand and appreciate

  • some of the aspects of the design of the Cilk runtime

  • system.

  • So that's why we're talking about that today.

  • Just to quickly recall something that you're all,

  • I'm sure, intimately familiar with by this point, what's

  • Cilk programming all about?

  • Well, Cilk is a parallel programming language

  • that allows you to make your software run faster

  • using parallel processors.

  • And to use Cilk, it's pretty straightforward.

  • You may start with some serial code that

  • runs in some running time-- we'll denote that as Ts

  • for certain parts of the lecture.

  • If you wanted to run in parallel using Cilk,

  • you just insert Cilk keywords in choice locations.

  • For example, you can parallelize the outer loop

  • in this matrix multiply kernel, and that will let your code run

  • in time Tp on P processors.

  • And ideally, Tp should be less than Ts.
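
To make this concrete, here is a minimal sketch of the kind of parallelization being described: a square matrix-multiply kernel whose outer loop is parallelized with cilk_for. The function and variable names are illustrative, not the lecture's exact code.

    #include <cilk/cilk.h>
    #include <stdint.h>

    // Hedged sketch: C += A * B over n-by-n matrices in row-major
    // order. The only change from serial code is the cilk_for on
    // the outer loop; the runtime schedules the iterations across
    // available processors.
    void mm(double *restrict C, const double *restrict A,
            const double *restrict B, int64_t n) {
      cilk_for (int64_t i = 0; i < n; ++i)   // iterations may run in parallel
        for (int64_t j = 0; j < n; ++j)
          for (int64_t k = 0; k < n; ++k)
            C[i*n + j] += A[i*n + k] * B[k*n + j];
    }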

  • Now, just adding keywords is all you

  • need to do to tell Cilk to execute

  • the computation in parallel.

  • What does Cilk do in light of those keywords?

  • At a very high level, Cilk and specifically its runtime system

  • takes care of the task of scheduling and load

  • balancing the computation on the parallel processors

  • and on the multicore system in general.

  • So after you've denoted logical parallelism in the program

  • using the keywords cilk_spawn, cilk_sync, and cilk_for,

  • the Cilk scheduler maps that computation

  • onto the processors.

  • And it does so dynamically at runtime,

  • based on whatever processing resources happen

  • to be available, and Cilk uses a randomized work-stealing

  • scheduler, which guarantees that the mapping is efficient

  • and the execution runs efficiently.
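
For reference, the guarantee being alluded to is the classic bound for randomized work stealing: if the computation has work T1 (the serial running time of the parallelized code) and span T∞ (the length of the longest chain of dependencies), then the scheduler completes it on P processors in expected time

    Tp ≤ T1/P + O(T∞),

which is nearly perfect linear speedup whenever the parallelism T1/T∞ is much larger than P.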

  • Now you've all been using the Cilk platform for a while.

  • In its basic usage, you write some Cilk code, possibly

  • by parallelizing ordinary serial code,

  • you feed that to a compiler, you get a binary,

  • you run the binary with some particular input

  • on a multicore system.

  • You get parallel performance.

  • Today, we're going to look at how exactly Cilk works.

  • What's the magic that goes on, hidden

  • by the boxes on this diagram?

  • And the very first thing to note is that this picture

  • is a little bit--

  • the first simplification that we're going to break

  • is that it's not really just Cilk source and the Cilk

  • compiler.

  • There's also a runtime system library, libcilkrts.so, in case

  • you've seen that file or messages

  • about that file on your system.

  • And really it's the compiler and the runtime library,

  • that work together to implement Cilk's runtime system,

  • to do the work stealing and do the efficient scheduling

  • and load balancing.

  • Now we might suspect that if you just take a look at the code

  • that you get when you compile a Cilk program,

  • that might tell you something about how Cilk works.

  • Here's C pseudocode for the results when you compile

  • a simple piece of Cilk code.

  • It's a bit complicated.

  • I think that's fair to say.

  • There's a lot going on here.

  • There is one function in the original program,

  • now there are two.

  • There's some new variables, there's

  • some calls to functions that look a little bit strange,

  • there's a lot going on in the compiled results.
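
As a rough illustration, here is a hedged sketch of that compiled result, written in the spirit of the Cilk Plus runtime ABI. The exact type names, struct fields, and calling conventions vary by compiler version, so treat this as pseudocode rather than actual compiler output.

    #include <stdint.h>

    // Pseudocode sketch: compiling "int64_t x = cilk_spawn fib(n-1);"
    // roughly splits fib into two functions and threads a stack-frame
    // descriptor (sf) through runtime calls. Names are illustrative.
    static void fib_spawn_helper(int64_t *x, int64_t n);

    int64_t fib(int64_t n) {
      if (n < 2) return n;
      __cilkrts_stack_frame sf;        // runtime bookkeeping for this frame
      __cilkrts_enter_frame(&sf);
      int64_t x, y;
      if (!CILK_SETJMP(sf.ctx))        // save a continuation a thief can resume
        fib_spawn_helper(&x, n - 1);   // run the spawned child
      y = fib(n - 2);
      if (sf.flags & CILK_FRAME_UNSYNCHED)   // cilk_sync: wait for children
        if (!CILK_SETJMP(sf.ctx))
          __cilkrts_sync(&sf);
      int64_t result = x + y;
      __cilkrts_pop_frame(&sf);
      __cilkrts_leave_frame(&sf);
      return result;
    }

    static void fib_spawn_helper(int64_t *x, int64_t n) {
      __cilkrts_stack_frame sf;
      __cilkrts_enter_frame_fast(&sf);
      __cilkrts_detach(&sf);           // make the parent's continuation stealable
      *x = fib(n);
      __cilkrts_pop_frame(&sf);
      __cilkrts_leave_frame(&sf);
    }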

  • This isn't exactly easy to interpret or understand,

  • and this doesn't even bring into the picture the runtime system

  • library.

  • The runtime system library, you can find the source

  • code online.

  • It's a little less than 20,000 lines of code.

  • It's also kind of complicated.

  • So rather than dive into the code directly,

  • what we're going to do today is an attempt

  • at a top-down approach to understanding

  • how the Cilk runtime system works,

  • and some of the design considerations.

  • So we're going to start by talking about some

  • of the required functionality that we need out of the Cilk

  • runtime system, as well as some performance

  • considerations for how the runtime system should work.

  • And then we'll take a look at how the worker deques in Cilk

  • get implemented, how spawning actually works,

  • how stealing a computation works,

  • and how synchronization works within Cilk.

  • That all sound good?

  • Any questions so far?

  • This should all be review, more or less.

  • OK, so let's talk a little bit about required functionality.

  • You've seen this picture before, I hope.

  • This picture illustrated the execution model

  • of a Cilk program.

  • Here we have everyone's favorite exponential time Fibonacci

  • routine, parallelized using Cilk.

  • This is not an efficient way to compute Fibonacci numbers,

  • but it's a nice didactic example for understanding

  • parallel computation, especially the Cilk model.
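
For reference, the routine on the slide is essentially the standard parallel fib, shown here with illustrative types:

    #include <cilk/cilk.h>
    #include <stdint.h>

    // Exponential-time Fibonacci, parallelized: the spawned call to
    // fib(n-1) may run in parallel with the call to fib(n-2), and
    // cilk_sync waits for the spawned child before x is used.
    int64_t fib(int64_t n) {
      if (n < 2) return n;
      int64_t x = cilk_spawn fib(n - 1);
      int64_t y = fib(n - 2);
      cilk_sync;
      return x + y;
    }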

  • And as we saw many lectures ago, when

  • you run this program on a given input,

  • the execution of the program can be

  • modeled as a computation dag.

  • And this computation dag unfolds dynamically

  • as the program executes.

  • But I want to stop and take a hard look

  • at exactly what that dynamic execution looks like when we've

  • got parallel processors and work stealing all coming into play.

  • So we'll stick with this Fibonacci routine,

  • and we'll imagine we've just got one processor on the system,

  • to start.

  • And we're just going to use this one

  • processor to execute fib(4).

  • And it's going to take some time to do it,

  • just to make the story interesting.

  • So we start executing this computation,

  • and that one processor is just going to execute the Fibonacci

  • routine from beginning up to the Cilk spawn statement,

  • as if it's ordinary serial code, because it

  • is ordinary serial code.

  • At this point the processor hits the Cilk spawn statement.

  • What happens now?

  • Anyone remember?

  • What happens to the dag?

  • AUDIENCE: It branches down [INAUDIBLE]

  • TAO SCHARDL: It branches downward and spawns

  • another process, more or less.

  • The way we model that--

  • the Cilk spawn is of the routine fib of n minus 1.

  • In this case, that'll be fib(3).

  • And so, like an ordinary function call,

  • we're going to get a brand new frame for fib(3).

  • And that's going to have some strand that's

  • available to execute.

  • But the spawn is not your typical function call.

  • It actually allows some other computation to run in parallel.

  • And so the way we model that in this picture

  • is that we get a new frame for fib(3).

  • There's a strand available to execute there.

  • And the continuation, the green strand,

  • is now available in the frame fib(4).

  • But no one's necessarily executing it.

  • It's just kind of faded in the picture.

  • So once the spawn has occurred, what's

  • the processor going to do?

  • The processor is actually going to dive in and start

  • executing fib(3), as if it were an ordinary function call.

  • Yes, there's a strand available within the frame of fib(4),

  • but the processor isn't going to worry about that strand.

  • It's just going to say, oh, fib(4) calls fib(3),

  • and it's going to start computing fib(3).

  • Sound good?

  • And so the processor dives down from pink

  • strand to pink strand.

  • The instruction pointer for the processor

  • returns to the beginning of the fib routine,

  • because we're now calling fib once again.

  • And this process repeats.

  • It executes the pink strand up until the Cilk spawn,

  • just like ordinary serial code.

  • The spawn occurs-- and we've already seen this picture

  • before--

  • the spawn allows another strand to execute in parallel.

  • But it also creates a frame for fib(2).

  • And the processor dives into fib(2),

  • resetting the instruction pointer to the beginning of fib,

  • P1 executes up to the spawn.

  • Once again, we get another strand to execute,

  • as well as an invocation of fib(1).

  • Processor dives even further.

  • So that's fine.

  • This is just the processor doing more or less

  • ordinary serial execution of this fib routine,

  • but it's also allowing some strands

  • to be executed in parallel.

  • This is the one processor situation,

  • looks pretty good so far.

  • Right, and in the fib(1) case, it

  • doesn't make it as far through the pink strand

  • because, in fact, we hit the base case.

  • But now let's bring in some more processors.

  • Suppose that another processor finally

  • shows up, says I'm bored, I want to do some work,

  • and decides to steal some computation.

  • It's going to discover the green strand in the frame fib(4),

  • and P2 is just going to jump in there

  • and start executing that strand.

  • And if we think really hard about what this means,

  • P2 is another processor on the system.

  • It has its own set of registers.

  • It has its own instruction pointer.

  • And so what Cilk somehow allows to happen

  • is for P2 to just jump right into the middle

  • of this fib(4) routine, which is already executing.

  • It just sets the instruction pointer

  • to point at that green instruction,

  • at the call to fib of n minus 2.

  • And it's just going to pick up where processor 1 left off,

  • when it executed up to this point in fib(4), somehow.
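
One way to picture that "somehow" is setjmp/longjmp: the victim saves its register state at the spawn, and the thief restores that state and jumps into the middle of the frame. The single-threaded toy below illustrates just the control flow; the real runtime does this across workers with its own context-switching primitives, since standard setjmp/longjmp is not valid across threads.

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf ctx;   // stands in for the context saved in the frame

    int main(void) {
      // "Victim" at the spawn point: setjmp saves the registers and
      // returns 0, so execution proceeds into the spawned child.
      if (setjmp(ctx) == 0) {
        printf("victim: executing the spawned child\n");
        // "Thief": restore the saved context and jump back into the
        // middle of the frame, picking up where the victim left off.
        longjmp(ctx, 1);
      }
      // longjmp makes setjmp "return" 1, so control resumes here:
      // this is the continuation, the green strand.
      printf("thief: resumed the continuation\n");
      return 0;
    }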

  • In this case, it executes fib of n minus 2.

  • That calls fib(2), creates a new strand,

  • it's just an ordinary function call.

  • It's going to descend into that new frame.

  • It's going to return to the beginning of fib.

  • All that's well and good.

  • Another processor might come along and steal another piece

  • of the computation.

  • It steals another green strand, and so once again,

  • this processor needs to jump into the middle of an executing

  • function.

  • Its instruction pointer is just going

  • to point at this call of the fib of n minus 2.

  • Somehow, it's going to have the state of this executing

  • function available, despite having independent registers.

  • And it needs to just start from this location,

  • with all the parameters set appropriately,

  • and start executing this function

  • as if it's an ordinary function.

  • It calls fib of n minus 2; here n is 3, so it calls fib(1).

  • And now these processors might start executing in parallel.