
  • MICHAEL D. SMITH: This afternoon I have the pleasure

  • of introducing Mark Zuckerberg, who is one of our guest speakers

  • this semester to come and talk a little bit about computer science

  • in the real world.

  • As most of you probably know, as you guys all do this much more

  • than I do, he's the founder of Facebook.com, which is a social networking

  • program, whatever you want to call it.

  • Used at over 2000 schools across the nation, and possibly the world too.

  • Is it the world too, or just the nation?

  • >> MARK ZUCKERBERG: [INAUDIBLE].

  • >> MICHAEL D. SMITH: OK.

  • So good influence for doing some things in computer science.

  • He's going to tell us some of the background of it

  • and what's been important and so forth.

  • So please join me in welcoming.

  • >> MARK ZUCKERBERG: Yo.

  • All right, cool.

  • This is the first time I've ever had to hold one of these things.

  • So I'm just going to attach it really quickly, one second.

  • All right.

  • Can you hear?

  • Is this good?

  • Is this amplified at all?

  • >> AUDIENCE: Yeah.

  • >> MARK ZUCKERBERG: All right.

  • Sweet.

  • This is like one of the first times I've been to a lecture at Harvard.

  • I guess what's probably going to be most useful for you guys is if I just

  • take you through some of the courses that I took at Harvard where I actually

  • did go to lecture sometimes.

  • I was joking.

  • And sort of, like, how different decisions

  • that I had to make when I was moving along with Facebook

  • got impacted by different stuff that I was learning in the classes

  • that I was taking.

  • And if all goes according to plan, then maybe some of you guys

  • will come out of this thinking that taking CS or engineering stuff

  • at Harvard is actually sort of useful.

  • So that's the game plan.

  • >> I think that this is slotted for two hours.

  • There's no way I'm going to speak for two hours.

  • I'll probably speak for like 20 minutes, or 15 minutes,

  • and then I'll just let you guys ask questions.

  • Because I'm sure you guys have more interesting stuff

  • to ask me than I could come up with to talk about myself.

  • >> So I guess I'll just kind of get started.

  • When I was here, I started off taking 121.

  • I never actually took 50.

  • You should have gotten the other guy who was

  • doing Facebook, Dustin Moskovitz, who was my roommate.

  • When we got started the site was written in PHP, which isn't something

  • that you learned in one of these classes.

  • But fortunately, if you have a good background in C,

  • the syntax is very similar, and you can pick it up in a day or two.

  • >> So I started writing the site and launched it at Harvard

  • in February 2004.

  • So I guess almost two years ago now.

  • And within a couple of weeks, a few thousand people had signed up.

  • And we started getting some emails from people

  • at other colleges asking for us to launch it at their schools.

  • >> And I was taking 161 at the time.

  • So I don't know if you guys know the reputation of that course,

  • but it was kind of heavy.

  • It was a really fun course, but it didn't leave me with much time

  • to do anything else with Facebook.

  • So my roommate Dustin, who I guess had just finished CS50,

  • was like, hey, I want to help out.

  • I want to do the expansion and help you figure out how to do the stuff.

  • So I was like, you know, that's pretty cool dude,

  • but you don't really know any PHP or anything like that.

  • So that weekend he went home, bought the book Perl for Dummies,

  • came back and was like, all right, I'm ready to go.

  • I was like dude, the site is written in PHP, not Perl, but you know,

  • that's cool.

  • >> So he picked up PHP over a few days because, I

  • promise, if you have a good background in C, then

  • PHP is a very simple thing to pick up.

  • And he just kind of went to work.

  • So I mean, the first big decision that we really had to make

  • was in how to kind of expand the architecture

  • to go from the single-school type setup that we had when it was just at Harvard

  • to something that supported multiple schools.

  • >> So this was a decision that had to be made on a bunch of levels,

  • both in the product and how we wanted privacy to work,

  • but I think that one really important decision that's

  • helped us scale pretty well is how we decided to distribute the data.

  • >> So I don't know how much complexity stuff, like big O notation, you guys

  • have covered in this class.

  • So I mean, one of the most complicated computations that we do on the site

  • is the computation to tell how you're connected to people.

  • >> Because if you can imagine, that's stored

  • as sort of a series of undirected-- it's not weighted-- so undirected,

  • unweighted pairs of ID numbers of people in the database.

  • Then if you want to figure out who is friends with someone,

  • you have to look at all their friends.

  • Right?

  • So that's maybe like 100 or 200 people.

  • >> But then if you want to figure out who's a friend of a friend,

  • or what the closest connection is there, then you kind of

  • have to look at the 100 or 200 friends of each of those friends.

  • So at each level there's another factor of n multiplied in, where

  • n is the number of friends that each of your friends has.

  • So you can see that this kind of becomes exponentially

  • difficult to solve for the shortest path between people.

  • So if you're just looking for a friend of a friend, that's n squared.

  • If you're looking for a friend of a friend of a friend, that's n cubed.

  • And that's something that traditionally was

  • pretty difficult for a lot of the predecessor sites to Facebook.

  • And for example Friendster had large problems with this

  • because they were trying to compute paths six degrees out,

  • or like seven degrees out.

  • >> And that's something that, when you're doing like n to the seventh,

  • just is really very hard, and it took down their site for a while.

  • So one of things that we kind of had in mind when we were figuring out

  • how to do this was how do you distribute the database in such a way

  • that this computation becomes manageable.
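
A minimal Python sketch of the computation described above, assuming friendships are stored as undirected, unweighted pairs of user IDs; the data and names here are illustrative, not Facebook's actual code. A breadth-first search fans out one level of friends per step, so exploring k levels touches on the order of n to the k users, where n is a typical friend count.

    from collections import deque

    # Hypothetical friendship data: undirected, unweighted pairs of user IDs,
    # as described in the talk. Index each pair in both directions.
    FRIEND_PAIRS = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 4)]

    friends = {}
    for a, b in FRIEND_PAIRS:
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)

    def shortest_path_length(source, target, max_depth=6):
        """Shortest friend path by breadth-first search.

        Each level can multiply the frontier by roughly n (the average
        number of friends), so k levels cost on the order of n**k, which
        is why computing six or seven degrees out gets expensive fast.
        """
        if source == target:
            return 0
        seen = {source}
        frontier = deque([(source, 0)])
        while frontier:
            user, depth = frontier.popleft()
            if depth >= max_depth:
                continue
            for friend in friends.get(user, ()):
                if friend == target:
                    return depth + 1
                if friend not in seen:
                    seen.add(friend)
                    frontier.append((friend, depth + 1))
        return None  # no path within max_depth degrees

    print(shortest_path_length(1, 4))  # 2, via user 5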

  • >> So what we decided was that everyone on the site

  • does most of their activity at the school that they're kind of based at.

  • So if you're at Harvard, then most of the people

  • who you're going to be seeing and transacting with on the site

  • are going to be at Harvard.

  • It's actually probably like 90% of the stuff that you do on the site.

  • >> So we decided to split up the databases and create

  • one instance of MySQL database for each school in the network.

  • And in doing that, if you notice the paths that we compute

  • are only within the school.

  • So now we're at, say, six million users,

  • and instead of having to do n cubed over some portion of six million,

  • it's just n cubed over 10,000, which is a much more

  • manageable type of computation.
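
A rough sketch of that per-school split, in Python rather than the PHP the site was written in; the school names, host names, and helper below are hypothetical, not the actual configuration. The point is only that routing every query to the shard for the user's own school keeps the n-cubed path search confined to a campus of roughly 10,000 users.

    # Hypothetical mapping from each school to its own MySQL instance.
    SCHOOL_DB_HOSTS = {
        "harvard": "db-harvard.example.internal",
        "pennstate": "db-pennstate.example.internal",
        "stanford": "db-stanford.example.internal",
    }

    def db_host_for(school):
        """Return the database host that holds this school's data.

        Friend paths are only computed within a school, so the
        friend-of-friend-of-friend search runs over ~10,000 users
        instead of the whole multi-million-user network.
        """
        return SCHOOL_DB_HOSTS[school]

    # A Harvard user's reads and writes all go to the Harvard shard.
    print(db_host_for("harvard"))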

  • >> So that was sort of the first big architectural decision

  • that we had to make that contributed to us not dying a few months later.

  • And it was probably a pretty important one.

  • >> So when we first set up the site we had just one computer that we were running.

  • It wasn't in our dorm room.

  • We were renting it.

  • I kind of learned my lesson from trying to run a site out of my dorm

  • room a few months earlier, and Harvard almost tried to kick me out.

  • >> So I ended up renting a server off site this time.

  • And I guess it was originally running both the database and the web server.

  • So Apache is what we were using in this instance

  • to serve the pages from the same machine.

  • And because we distributed the databases in the way that we did,

  • we were able to, as time went on, just add more machines linearly and sort of

  • grow the site without having any kind of exponential expansion

  • on the amount of machinery that we had.

  • >> But after we hit about like 30 or 50 schools,

  • we started realizing that we could start getting more performance out

  • of MySQL or Apache.

  • Some of the way that stuff was set up just wasn't as optimal as it could be.

  • >> So for example, when you have MySQL and Apache

  • running on the same server, then if something happens to that server,

  • then not only does the database for that school or the schools

  • on that server just stop kind of responding

  • in a way that will get you anything useful,

  • but you can't even load any web pages.

  • So you get page not founds.

  • And that kind of sucks.
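
One way to picture the failure mode being described, as a hedged Python sketch with made-up host names: once the web tier and the database live on separate machines, a dead database shard can be handled with an error page from the still-running web server, instead of the whole host disappearing and returning page-not-found errors.

    import socket

    def shard_is_reachable(host, port=3306, timeout=1.0):
        """Cheap liveness check against a school's MySQL host."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def render_profile(school_db_host):
        # With Apache and MySQL on the same machine, a failed server means
        # no pages at all; with them separated, the web server can still
        # return something useful when one school's database is down.
        if not shard_is_reachable(school_db_host):
            return "Sorry, this school's data is temporarily unavailable."
        return "...profile page rendered from the database..."

    print(render_profile("db-harvard.example.internal"))  # hypothetical host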

  • >> But another issue is that the variance in the use from school to school

  • is also not going to be perfect.

  • So some schools are always going to have heavier use.

  • We have schools now like Penn State that have 50,000 users.

  • And then the majority of the schools still have less than 2000 users.

  • Because there's a lot of small schools and a lot of schools