字幕列表 影片播放 列印英文字幕 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu. CHARLES LEISERSON: It is my great pleasure to welcome Jon Bentley, now retired from Bell Labs. Jon was my PhD thesis supervisor at Carnegie Mellon. I actually had two supervisors. The other one was HT Kung, who is now at Harvard. I guess people flee Carnegie Mellon like the plague or something. So Jon is, as you know because you've studied some of his work, is a pioneer in software performance engineering. And he's going to talk today about a particularly neat piece of algorithmic engineering sets that centers around the so-called traveling salesperson problem, which is an NP-hard problem. NP-complete problem in fact. And so, without further ado, Jon, why don't you tell us what you've got to say? JON BENTLEY: As Charles mentioned, I want to talk with you-- I want to tell you a story about a cool problem. This is a problem that I first heard when I was a young nerd-- not much older than this little pile of nerds in front of me-- in high school, the traveling salesperson problem. Who here has heard of the TSP at some point? I heard about this in high school, one of the things you read about it in the math books. And a few years later, I had a chance to work on it. In 1980, I was doing some consulting, and I said, well, what you need to do is solve a TSP. Then I went home and realized that all of the stuff that I learned about it was sort of relevant but it didn't solve the problem, so I started working on it then. Our colleague, Christos Papadimitriou, who's been at Berkeley for a long time after being at a lot of other places, once told me the TSP is not a problem. It is an addiction. So I've been hooked for coming up on 40 years now. And I want to tell you one story about a really cool program I wrote. Because this is one of the-- I've been paid to be a computer programmer for coming up on 50 years, since I've been doing it for 48 years now. This is probably the most fun, the coolest program I've written over a couple day period. I want to tell you a story. Start off with what recursive generation is. Then the TSP, what it is. Then I'll start with one program, and we'll make it faster and faster and faster. Again, I spend my whole life squeezing performance. This is the biggest squeeze ever. And then some principles behind that. We'll start, though, with how do you enumerate all the elements in a set? If I want to count-- enumerate the guys between 1 and a hundred, I just count. That's no big deal. But how can I, for instance, enumerate all subsets of the set from the integers from 1 to 5? How many subsets are there of integers from 1 to 5? AUDIENCE: 2 to the 5. JON BENTLEY: Pardon me? AUDIENCE: 2 to the 5. JON BENTLEY: 2 to the 5, 32. But how do you say which ones they are? How do you go through and count them? Well, you have to decide how you represent it. You guys know all about set representations. We'll stick with bit vectors for the time being. An iterative solution is you just count-- 0, 1, 2, 3, 4, 5, up to 31. That's pretty easy. But what does it mean to count? What does it mean to go from one integer to the next? How do you go from a given integer to the next one? What's the rule for that? It's pretty easy, actually. You just scan over all the 0's, turning the-- you start at the right-hand side, the least significant digit, scan over all the 0's, turn it to 1. Oh, I lied to you. You scan over all the 1's, turning them to 0 until you get to the first 0. And then you turn that to a 1. So this one goes to 10. This one goes to 11. This one goes-- that one becomes 0, that one becomes 0. Then it becomes 00100. So a pretty easy algorithm. You could do it that way. Just scan over all the 1's, turn them to 0's, take that first one and flip it around. But that doesn't generalize nicely. We're going to see a method that generalizes very nicely. This is a recursive solution to enumerate all 2 to the n subsets of a set of size n. And the answer is this all sets of size m is just put a 0 at this end, enumerate all sets of size m minus 1. How many of these will there be? 2 to the m minus 1. How many of those 2 to the m minus 1? What do they add up to? 2 to the m. But all of these have the 0 at that end, and the one at that end. Everyone see that recursive sketch and how that works? Here's the example. A period with 0's at this end and you fill it out. You have the 1 at that and you fill that out. If you do that, you notice that in fact we're just counting backwards-- 000, 001, 010, 3, 4, 5, 6, 7. That's the algorithm. And the cool thing is the code is really simple. I could probably write a program like that in most languages and get it correct. So if m equals 0 in generate all subsets of size m, this doesn't occur at 1. You have a pointer going down the array. Otherwise, set the rightmost bit to 0, generate all subsets recursively, set it to 1, do it again recursively. That's a starting program. If you understand this, everything else is going to be pretty straightforward. If you don't, please speak up. One thing that-- you people have suffered the tragedy of 14 or 15 or 16 years of educational system that has sort of beaten the creativity out of you and you're afraid to speak up. So even if something-- even if I'm up here spouting total bullshit, you'll ignore that fact and just sort of politely stare at me and nod. But this is important. I want you to understand this. If you don't understand this, speak now or forever hold it. Anyone have any questions? Please, please. AUDIENCE: What does mean, [INAUDIBLE]?? JON BENTLEY: I'm sorry. Why did we set p to the-- AUDIENCE: [INAUDIBLE]. JON BENTLEY: So here, first I go out to the extreme rightmost and I set it to 0. Then I recursively fill those in. Then I change it from a 0 to a 1 there, and I fill all those in. So this is a program that will go through, and as it enumerates a subset, it will call the visit procedure a total of 2 to the m times, then it comes down to the bottom of the recursion. Thank you, great question. Any other questions about how this works? OK, we'll come back to this. The traveling salesperson problem. I apologize. I will really try to say the traveling salesperson problem, but I will slip because I was raised with this being the traveling salesman problem. No connotations, no intentionality there, just senility galloping along. It's a cool problem. Abraham Lincoln faced this very problem in the years 1847 to 1853 when he-- everyone here has heard of circuit courts? Why do they call them circuit courts? Because the court used to go out and ride a circuit to go to a whole bunch of cities. Now people in the cities come to the court. But back in the day, in 1847 to 1853, Lincoln and all of his homies would hop on their horses-- a judge, defense lawyers, prosecutors-- and go around and ride the circuit here. And so this is the actual route that they rode where they wanted to do this effectively. It would be really stupid to start here in Springfield and go over there, then to come back here, then to go over there back and forth. What they did was try to find a circuit that minimized the total amount of distance traveled. That is the traveling salesperson problem. We're given a set of n things. It might be a general graph. These happen to be in the plane. But you really-- helicopter service was really bad in those days, so they didn't fly there from point to point. Whether they stayed on roads, what really matters here is the graph embedded in here. I'm going to speak at this. Everything I say will be true for a graph. It will be true for geometry. I'll be sloppy about that. We'll see interfaces, how you handle both, but just cut me some slack for that. I have actually face this myself when I worked on a system where we had a big mechanical plotter and we wanted to draw these beautiful maps where the maps would fill in dots. They happened to be precincts. Some of them were red, some of them were blue. And you wanted to draw all the red dots first and go around here. And, in fact, the plotter would draw this red dot, then that red dot, then this one, then that one. The plotter took an hour to draw the map. I was consulted on this. Aha, you have a traveling salesperson problem. I went down. I reduced the length to about 1/10 of the original length. If it took an hour before, how long would it take now? Well, it took about half an hour. And the reason is that the plotter took about half of its time moving around about, 30 minutes moving around, and about 30 minutes just drawing the symbols. I didn't reduce the time drawing the symbols at all, but I reduced the time moving things around from about 30 minutes to about 3 minutes. That was still a big difference. So I fixed it there. When I worked at Bell Labs, we had drills that would go around, laser drills, move around on printed circuit boards to drill holes. They wanted to move it that way I talked to people at General Motors at one point. On any day, they might have a thousand cars go through an assembly line. Some of the cars are red, some are white, some are green, some are yellow. You have to change the paint. Some of them have V6, some of them have V8. Some of them are two doors, some of them are four doors. In what order do you want to build those cars? Well, every time you change one part of the car, you have to change the line. And what you want to do is examine, as it were, all n factorial permutations of putting the cars through and choose the one that involves the minimum amount of change. And the change from one car to another is a well-defined function. Everyone see how that weird TSP is in fact a TSP? So all of these are cool problems. Furthermore, as a computer scientist, it's the prototypical problem. It's the E. coli of algorithmic problems. It was literally one of the first problems to be proven to be NP-hard. Held-Karp gave a polynomial time algorit