MICHAEL D. SMITH: This afternoon I have the pleasure of introducing Mark Zuckerberg, which is one of our guest speakers this semester to come and talk a little bit about computer science in the real world. As most of you probably know, as you guys all do this much more than I do, founder of Facebook.com, which is a social networking program, whatever you want to call it. Used at over 2000 schools across the nation, and possibly the world too. Is it the world too, or just the nation? >> MARK ZUCKERBERG: [INAUDIBLE]. >> MICHAEL D. SMITH: OK. So good influence for doing some things in computer science. He's going to tell us some of the background of it and what's been important and so forth. So please join me in welcoming. >> MARK ZUCKERBERG: Yo. All right, cool. This is the first time I've ever had to hold one of these things. So I'm just going to attach it really quickly, one second. All right. Can you hear? Is this good? Is this amplified at all? >> AUDIENCE: Yeah. MARK ZUCKERBERG: All right. Sweet. This is like one of the first times I've been to a lecture at Harvard. I guess what's probably going to be most useful for you guys is if I just take you through some of the courses that I took at Harvard where I actually did go to lecture sometimes. I was joking. And sort of, like, how different decisions that I had to make when I was moving along with Facebook got impacted by different stuff that I was learning in the classes that I was taking. And if all goes according to plan, then maybe some of you guys will come out of this thinking that taking CS or engineering stuff at Harvard is actually sort of useful. So that's the game plan. >> I think that this is slotted for two hours. There's no way I'm going to speak for two hours. I'll probably speak for like 20 minutes, or 15 minutes, and then I'll just let you guys ask questions. Because I'm sure you guys have more interesting stuff to ask me than I could come up with to talk about myself. 
>> So I guess I'll just kind of get started. When I was here, I started off taking 121. I never actually took 50. You should have gotten the other guy who was doing Facebook, Dustin Moskovitz, who was my roommate. When we got started the site was written in PHP, which isn't something that you learned in one of these classes. But fortunately, if you have a good background in C, the syntax is very similar, and you can pick it up in a day or two. >> So I started writing the site and launched it at Harvard in February 2004. So I guess almost two years ago now. And within a couple of weeks, a few thousand people had signed up. And we started getting some emails from people at other colleges asking for us to launch it at their schools. >> And I was taking 161 at the time. So I don't know if you guys know the reputation of that course, but it was kind of heavy. It was a really fun course, but it didn't leave me with much time to do anything else with Facebook. So my roommate Dustin, who I guess had just finished CS50, was like, hey, I want to help out. I want to do the expansion and help you figure out how to do the stuff. So I was like, you know, that's pretty cool dude, but you don't really know any PHP or anything like that. So that weekend he went home, bought the book Perl for Dummies, came back and was like, alright, I'm ready to go. I was like dude, the site is written in PHP, not Perl, but you know, that's cool. >> So he picked up PHP over a few days because, I promise that if you have a good background in C, then PHP is a very simple thing to pick up. And he just kind of went to work. So I mean, the first big decision that we really had to make was in how to kind of expand the architecture to go from the single school type set up that we had when it was just at Harvard to something that supported multiple schools. 
>> So this was a decision that had to be made on a bunch of levels, both in the product and how we wanted privacy to work, but I think that one really important decision that's helped us scale pretty well is how we decided to distribute the data. >> So I don't know how much of complexity stuff like big O notation you guys do in this class. So I mean, one of the most complicated computations that we do on the site is the computation to tell how you're connected to people. >> Because if you can imagine, that's stored as sort of a series of undirected-- it's not weighted-- so undirected, unweighted pairs of ID numbers of people in the database. Then if you want to figure out who is friends with someone, you have to look at all their friends. Right? So that's maybe like 100 or 200 people. >> But then if you want to figure out who's a friend of a friend, or what the closest connection is there, then you kind of have to look at the 100 or 200 friends of each of those friends. So it becomes at each level there's another factor of n multiplied in, where n is the number of friends that each of your friends has. So you can see that this kind of becomes exponentially difficult to solve for the shortest path between people. So if you're just looking for a friend of a friend, that's n squared. If you're looking for a friend of a friend of a friend, that's n cubed. And that's something that traditionally was pretty difficult for a lot of the predecessor sites to Facebook. And for example Friendster had large problems with this because they were trying to compute paths six degrees out, or like seven degrees out. >> And that's something that when you're doing like n seventh, that just is really very hard and it took down their site for a while. So one of the things that we kind of had in mind when we were figuring out how to do this was how do you distribute the database in such a way that this computation becomes manageable. 
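The connection computation described here can be sketched as a breadth-first search over undirected, unweighted pairs of ID numbers. This is only an illustration, not the site's actual code (which the talk says was written in PHP); the function names `build_adjacency` and `degrees_of_separation` and the `max_depth` cutoff are assumptions made for the example. Each extra level of depth multiplies the work by roughly the number of friends per person, which is the n squared, n cubed growth mentioned in the talk.

```python
from collections import defaultdict, deque


def build_adjacency(pairs):
    """Turn undirected, unweighted (id, id) pairs into a friend lookup table."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degrees_of_separation(adjacency, source, target, max_depth=3):
    """Return the shortest number of friendship hops from source to target.

    Returns None if target is not reachable within max_depth hops.
    max_depth is a hypothetical cutoff: every extra level costs roughly
    another factor of n (friends per person) in the worst case.
    """
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the cutoff
        for friend in adjacency.get(person, ()):
            if friend == target:
                return depth + 1
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return None


# Usage: a chain of five people, 1-2-3-4-5.
friendships = build_adjacency([(1, 2), (2, 3), (3, 4), (4, 5)])
print(degrees_of_separation(friendships, 1, 3))  # friend of a friend
```

With the default cutoff of three hops, person 5 is out of range for person 1 in this chain, which is the kind of bound that keeps the search from growing to the six or seven degrees mentioned above.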
>> So what we decided was that everyone on the site does most of their activity at the school that they're kind of based at. So if you're at Harvard, then most of the people who you're going to be seeing and transacting with on the site are going to be at Harvard. It's actually probably like 90% of the stuff that you do on the site. >> So we decided to split up the databases and create one instance of MySQL database for each school in the network. And in doing that, if you notice the paths that we compute are only within the school. So instead of say, like now we're at six million users, and instead of having to do n cubed over some portion of six million, it's just n cubed over 10,000, which is a much more manageable type of computation. >> So that was sort of the first big architectural decision that we had to make that contributed to us not dying a few months later. And it was probably a pretty important one. >> So when we first set up the site we had just one computer that we were running. It wasn't in our dorm room. We were renting it. I kind of learned my lesson for trying to run a site out of my dorm room a few months earlier, and Harvard almost tried to kick me out. >> So I ended up renting a server off site this time. And I guess running originally the database and the web server. So Apache is what we were using in this instance to serve the pages from the same machine. And because we distributed the databases in the way that we did, we were able to, as time went on, just add more machines linearly and sort of grow the site without having any kind of exponential expansion on the amount of machinery that we had. >> But after we hit about like 30 or 50 schools, we started realizing that we could start getting more performance out of MySQL or Apache. Some of the way that stuff was set up just wasn't as optimal as it could be. 
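The per-school split described here can be sketched as follows. This is a rough illustration under stated assumptions, not the actual architecture: the class `SchoolShards` and its methods are hypothetical, and each in-memory dictionary stands in for what the talk describes as one MySQL instance per school. The point it shows is that a friend-of-a-friend lookup only ever touches the data for one school, so its cost depends on that school's size rather than on the whole network.

```python
from collections import defaultdict


class SchoolShards:
    """One independent friendship store per school.

    Each inner dictionary is a stand-in for a separate per-school
    database; nothing here talks to a real MySQL server.
    """

    def __init__(self):
        # school name -> (person id -> set of friend ids)
        self._shards = defaultdict(lambda: defaultdict(set))

    def add_friendship(self, school, a, b):
        """Record an undirected friendship inside one school's shard."""
        shard = self._shards[school]
        shard[a].add(b)
        shard[b].add(a)

    def friends_of_friends(self, school, person):
        """Second-degree connections, computed only within the given school."""
        shard = self._shards.get(school, {})
        direct = shard.get(person, set())
        result = set()
        for friend in direct:
            result |= shard.get(friend, set())
        # Exclude the person and the people they are already friends with.
        return result - direct - {person}


# Usage: two schools whose data never mix during a lookup.
shards = SchoolShards()
shards.add_friendship("harvard", 1, 2)
shards.add_friendship("harvard", 2, 3)
shards.add_friendship("penn_state", 3, 4)
print(shards.friends_of_friends("harvard", 1))
```

Friendships that cross schools are not modeled in this sketch; the talk only says that the computed paths stay within a school.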
>> So for example, when you have MySQL machines and Apache running on the same server, then if something happens to that server, then not only does the database for that school or the schools on that server just stop kind of responding in a way that will get you anything useful, but you can't even load any web pages. So you get page not founds. And that kind of sucks. >> But another issue is that the variance in the use from school to school is also not going to be perfect. So some schools are always going to have heavier use. We have schools now like Penn State that have 50,000 users. And then the majority of the schools still have less than 2000 users. Because there's a lot of small schools and a lot of schools