字幕列表 影片播放 列印英文字幕 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: Hi. I'm Srini Devadas. I'm a professor of electrical engineering and computer science. I'm going to be co-lecturing 6.006-- Introduction to Algorithms-- this term with professor Erik Domane. Eric, say hi. ERIK DOMANE: Hi. [LAUGHTER] PROFESSOR: And we hope you're going to have a fun time in 6.006 learning a variety of algorithms. What I want to do today is spend literally a minute or so on administrative details, maybe even less. What I'd like to do is to tell you to go to the website that's listed up there and read it. And you'll get all information you need about what this class is about from a standpoint of syllabus; what's expected of you; the problem set schedule; the quiz schedule; and so on and so forth. I want to dive right in and tell you about interesting things, like algorithms and complexity of algorithms. I want to spend some time giving you an overview of the course content. And then we're going to dive right in and look at a particular problem of peak finding-- both the one dimensional version and a two dimensional version-- and talk about algorithms to solve this peak finding problem-- both varieties of it. And you'll find that there's really a difference between these various algorithms that we'll look at in terms of their complexity. And what I mean by that is you're going to have different run times of these algorithms depending on input size, based on how efficient these algorithms are. And a prerequisite for this class is 6.042. And in 6.042 you learned about asymptotic complexity. And you'll see that in this lecture we'll analyze relatively simple algorithms today in terms of their asymptotic complexity. And you'll be able to compare and say that this algorithm is fasten this other one-- assuming that you have large inputs-- because it's asymptotically less complex. So let's dive right in and talk about the class. So the one sentence summary of this class is that this is about efficient procedures for solving problems on large inputs. And when I say large inputs, I mean things like the US highway system, a map of all of the highways in the United States; the human genome, which has a billion letters in its alphabet; a social network responding to Facebook, that I guess has 500 million nodes or so. So these are large inputs. Now our definition of large has really changed with the times. And so really the 21st century definition of large is, I guess, a trillion. Right? Back when I was your age large was like 1,000. [LAUGHTER] I guess I'm dating myself here. Back when Eric was your age, it was a million. Right? [LAUGHTER] But what's happening really the world is moving faster, things are getting bigger. We have the capability of computing on large inputs, but that doesn't mean that efficiency isn't of paramount concern. The fact of matter is that you can, maybe, scan a billion elements in a matter of seconds. But if you had an algorithm that required cubic complexity, suddenly you're not talking about 10 raised to 9, you're talking about 10 raised to 27. And even current computers can't really handle those kinds of numbers, so efficiency is a concern. And as inputs get larger, it becomes more of a concern. All right? So we're concerned about-- --efficient procedures-- for solving large scale problems in this class. And we're concerned about scalability, because-- just as, you know, 1,000 was a big number a couple of decades ago, and now it's kind of a small number-- it's quite possible that by the time you guys are professors teaching this class in some university that a trillion is going to be a small number. And we're going to be talking about-- I don't know-- 10 raised to 18 as being something that we're concerned with from a standpoint of a common case input for an algorithm. So scalability is important. And we want to be able to track how our algorithms are going to do as inputs get larger and larger. You going to learn a bunch of different data structures. We'll call them classic data structures, like binary search trees, hash tables-- that are called dictionaries in Python-- and data structures-- such as balanced binary search trees-- that are more efficient than just the regular binary search trees. And these are all data structures that were invented many decades ago. But they've stood the test of time, and they continue to be useful. We're going to augment these data structures in various ways to make them more efficient for certain kinds of problems. And while you're not going to be doing a whole lot of algorithm design in this class, you will be doing some design and a whole lot of analysis. The class following this one, 6.046 Designing Analysis of Algorithms, is a class that you should take if you like this one. And you can do a whole lot more design of algorithms in 6.046. But you will look at classic data structures and classical algorithms for these data structures, including things like sorting and matching, and so on. And one of the nice things about this class is that you'll be doing real implementations of these data structures and algorithms in Python. And in particular are each of the problem sets in this class are going to have both a theory part to them, and a programming part to them. So hopefully it'll all tie together. The kinds of things we're going to be talking about in lectures and recitations are going to be directly connected to the theory parts of the problem sets. And you'll be programming the algorithms that we talk about in lecture, or augmenting them, running them. Figuring out whether they work well on large inputs or not. So let me talk a little bit about the modules in this class and the problem sets. And we hope that these problem sets are going to be fun for you. And by fun I don't mean easy. I mean challenging and worthwhile, so at the end of it you feel like you've learned something, and you had some fun along the way. All right? So content wise-- --we have eight modules in the class. Each of which, roughly speaking, has a problem set associated with it. The first of these is what we call algorithmic thinking. And we'll kick start that one today. We'll look at a particular problem, as I mentioned, of peak finding. And as part of this, you're going to have a problem set that's going to go out today as well. And you'll find that in this problem set some of these algorithms I talk about today will be coded in Python and given to. A couple of them are going to have bugs in them. You'll have to analyze the complexity of these algorithms; figure out which ones are correct and efficient; and write a proof for one of them. All right? So that's sort of an example problem set. And you can expect that most of the problem sets are going to follow that sort of template. All right. So you'll get a better sense of this by the end of the day today for sure. Or a concrete sense of this, because we'll be done with lecture and you'll see your first problem set. We're going to be doing a module on sorting and trees. Sorting you now about, sorting a bunch of numbers. Imagine if you had a trillion numbers and you wanted to sort them. What kind of algorithm could use for that? Trees are a wonderful data structure. There's different varieties, the most common being binary trees. And there's ways of doing all sorts of things, like scheduling, and sorting, using various kinds of trees, including binary trees. And we have a problem set on simulating a logic network using a particular kind of sorting algorithm in a data structure. That is going to be your second problem set. And more quickly, we're going to have modules on hashing, where we do things like genome comparison. In past terms we compared a human genome to a rat genome, and discovered they were pretty similar. 99% similar, which is kind of amazing. But again, these things are so large that you have to have efficiency in the comparison methods that you use. And you'll find that if you don't get the complexity low enough, you just won't be able to complete-- your program won't be able to finish running within the time that your problem set is do. Right? Which is a bit of a problem. So that's something to keep in mind as you test your code. The fact is that you will get large inputs to run your code. And you want to keep complexity in mind as you're coding and thinking about the pseudocode, if you will, of your algorithm itself. We will talk about numerics. A lot of the time we talk about such large numbers that 32 bits isn't enough. Or 64 bits isn't enough to represent these numbers. These numbers have thousands of bits. A good example is RSA encryption, that is used in SSL, for example. And when you go-- use https on websites, RSA is used at the back end. And typically you work with prime numbers that are thousands of bits long in RSA. So how do you handle that? How does Python handle that? How do you write algorithms that can deal with what are called infinite precision numbers? So we have a module on numerics in the middle of the term that talks about that. Graphs, really a fundamental data structure in all of computer science. You might have heard of the famous Rubik's cube assignment from . 006 a 2 by 2 by 2 Rubik's cube. What's the minimum number of moves necessary to go from a given starting configuration to the final end configuration, where all of the faces-- each of the faces has uniform color? And that can be posed as a graph problem. We'll probably do that one this term. In previous terms we've done other things like the 15 puzzle. And so some of these are tentative. We definitely know what the first problem set is like, but the rest of them are, at this moment, tentative. And to finish up shortest paths. Again in terms past we've asked you to write code using a particular algorithm that finds the shortest path from Caltech to MIT. This time we may do things a little bit differently. We were thinking maybe we'll give you a street map of Boston and go figure out if Paul Revere used the shortest path to get to where he was going, or things like that. We'll try and make it fun. Dynamic programming is an important algorithm design technique that's used in many, many problems. And it can be used to do a variety of things, including image compression. How do you compress an image so the number of pixels