Containerization and Orchestration of CS50 IDE - Cloud Native Revolution 2019

  • So our next presenters are ready.

  • We are having Dr. David Malan and Kareem Zidane from Harvard University.

  • Dr. David J. Malan is a Gordon McKay Professor of the Practice of Computer Science at Harvard University.

  • He teaches Computer Science 50, otherwise known as CS50, which is Harvard University's largest course, one of Yale University's largest courses, and edX's largest massive open online course.

  • He also teaches at Harvard Business School, Harvard Law School, Harvard Extension School, and Harvard Summer School.

  • Kareem Zidane is a software developer, system administrator, and teaching fellow for CS50 at Harvard University.

  • He is a self-taught programmer from Egypt who discovered computer science, including CS50 itself, online.

  • He is the chief architect of the CS50 IDE.

  • A fun fact for both of them: when asked what food they would never get tired of eating, Dr. David responded seafood, and Kareem responded Italian food.

  • Please join me in welcoming Dr. David Malan and Kareem Zidane.

  • Thank you.

  • I see it there, but not here.

  • Okay.

  • Thank you all so much.

  • I'll admit this is kind of intimidating.

  • Coming just minutes after a talk on whether we're alone in the universe.

  • Our talk is actually a little more down to earth.

  • We're here to talk about integrated development environments, and my name is David Malan.

  • I teach an introductory course at Harvard called CS50, Computer Science 50, which is our introduction to computer science for majors and non-majors.

  • And I'm here with a colleague, Kareem Zidane, who was actually a student of ours online years ago.

  • But he's since joined us in Cambridge, Massachusetts, to work on the very set of problems that we're here to talk a little bit about.

  • So I thought I'd tee up what it is we do and what problems it is that we aspire to solve.

  • And Kareem will walk us through some of the solutions we ultimately arrived at. So at the start of the semester, we have some 800 students in this undergraduate class.

  • It happens to be Harvard's largest class, predominantly non majors.

  • But computer science is now the second largest major at Harvard, and so it's a pretty large demographic of students.

  • The course we teach each term we similarly make freely available to anyone online, not only as a massive open online course via edX at that URL, including all of the materials, all of the software, all of the sample solutions and the like, but also all of the courseware is available as OpenCourseWare, in the spirit of MIT's tradition, such that anyone on the Internet can download and access any of these materials at all.

  • Beyond, though, being just available to students online, we've focused really on the community aspect and building up local support structures, so to speak, around the world. Each red dot above represents a high school that happens to have adapted or adopted this course, CS50, in some form, and each of the blue dots represents a university where some faculty or student group or nonprofit in the area has similarly done the same, to provide a localized, in-person support structure for students following along and teaching themselves computer science.

  • The scale, then, that we operate on is fairly large, not only on campus but off campus.

  • We might have 1,000 students among the undergraduates, Harvard's Extension School, and also down the road at Yale University, where we have a cohort of students as well, and then online.

  • Over the past several years we have had some one million registrants in the course, not all of whom are engaged at any given moment in time.

  • But quite a few do.

  • In fact, on any given day we might have several thousand students using the course's cloud-based infrastructure that we'll soon address.

  • Over the course of a month, we might have some 30,000 active participants banging on the servers and the virtual machines.

  • And now the containers that we use to support those students' work. To give you a sense of where it all began, though: it all began much more modestly, much more locally, in Cambridge, Massachusetts.

  • So back in the day, late eighties, early nineties, we just had an on-campus cluster of servers on which students each had usernames and NFS-mounted home directories in which they could do their work. Our class is predominantly software-focused.

  • We tend to use tools like GCC and GDB and the like, as we focus on C and then, later in the semester, transition to other languages like Python.

  • So it sufficed early on and to this day to really just give students a terminal window.

  • And so back in the day, they all telnetted, until the world figured out SSH, at which point they SSHed, to centralized servers where they would do all of their work and submit all of their work.

  • And this was implemented on campus with a bunch of UNIX servers in front of which was a load balancer of some sort.

  • And then students were distributed round-robin among those servers to balance the load.

  • The fact, though, that the university ran this meant that, frankly, the software was always several years out of date.

  • We, as an individual course, did not have administrative access or sudo access to any of the accounts.

  • It was hard, frankly, to support students optimally if we just didn't have the software and the access that we wanted to be able to answer questions, help them diagnose bugs, and the like.

  • And so cloud computing caught our eye pretty early on, when AWS was just gaining steam. For us, back in 2008, we hopped on board EC2, the Elastic Compute Cloud, and at the time there were terribly few services on Amazon's menu.

  • There was no elastic load balancing.

  • There was no auto scaling or the like.

  • Anything you wanted to build, you had to build yourself by wiring together some of the more primitive services.

  • And so we did exactly that.

  • Pretty much re-architecting that same topology, this time, though, in the cloud, gaining for ourselves sudo access and the ability to scale it up and down.

  • Depending on the day or the night of the week, when a homework assignment was due. This, however, was not without costs.

  • In fact, I think it took us a little too long in the process to appreciate that the wonderful thing Harvard was providing was system administrators who used to run all of this for us.

  • So all of a sudden, now it was us working not only 9 to 5 during business hours, but also 5 to 9 when the students were actually on these systems doing their work, particularly late into the night.

  • So not without some hidden human costs early on.

  • Also, at the same time, because of the OpenCourseWare community, we had no ability to provide students online with their own Harvard University accounts, just by nature of not being able to scale, and for reasons of privacy and security.

  • So they weren't able to telnet or SSH to the same clusters on campus or off.

  • And so we eventually transitioned to a client-side virtual machine, an appliance, that students could download.

  • And so for several years we gave them VirtualBox, or VMware Fusion or Workstation, and the like, so they downloaded an environment onto their own Mac or PC on which they then did all that same work. It gave us, additionally, a nice graphical interface, which opened up new capabilities, since we were no longer restricted to just a command-line environment.

  • But it, too, was not without its challenges.

  • Frankly, VirtualBox back in the day wasn't terribly reliable.

  • It was not uncommon for students, at the end of the workday, in the middle of a problem set, to close their laptop and, boom, the virtual hard disk was bricked because it wasn't unmounted correctly.

  • And so those kinds of headaches, and Windows and macOS particulars, just got in the way of maintaining a common infrastructure for students in a way that just worked.

  • And so we transitioned ultimately to a cloud-based IDE in 2015.

  • And that's where we are now and where the story picks up where we went back into the cloud.

  • This time, though, we focused more on software as a service, using an open-source tool called Cloud9, formerly implemented by a team at a startup, now part of Amazon Web Services, and began to add some pedagogically motivated, JavaScript-based plug-ins to this environment, and ultimately to build out the back end that provided students with the same illusion that they had access to their own account, or really their own server, in the cloud.

  • For those unfamiliar, this is a screenshot of Cloud9, or specifically CS50's incarnation thereof, with some pedagogical simplifications and UI modifications.

  • It provides students with a familiar tab-based code editor like you would find on any modern system, and provides them, most importantly, with a terminal window, so they have full-fledged access to the underlying system.

  • A file browser at top left, but then also some more teacher-friendly and programmer-friendly features, like the ability to chat in real time with someone with whom they might be pair programming, or a teacher who might want to help them remotely.

  • It provides them with an interactive graphical debugger, with the ability to step through their code line by line, and the ability to roll back in time via version control, but this time graphically, so students can see where they left off just minutes ago.

  • And ultimately, here's a screenshot of that same debugger that allows them to step through stack frames and the like.

  • But Amazon's Cloud9, too, out of the box connects only by default to EC2 servers, and it's designed to do this so that you spin up your own EC2 server, you run this web-based UI, and from there you get to do anything you want on it by way of this cloud-based IDE.

  • So it was up to us, though, to decide how best to provide students with that same UI. We didn't want them signing up for AWS with their own credit cards and the like, administering their own servers, because we would be regressing back to that point where we didn't have that same access ourselves.

  • So we tried a number of versions of solutions to this, to build out a cloud-based back end for Amazon's front end, including EC2 for compute and EBS for storage, then S3 alone for persistent storage.

  • Until, finally, we happened on Kubernetes, and Docker containers more generally, which was an extension of what we were already starting to do on campus for our own internal development.

  • And that's the story where Kareem will now pick up, as to the problems we encountered along the way and the solutions we ultimately forged using Kubernetes and containers for CS50.

  • Thank you, David.

  • So among the reasons why we adopted Cloud9 as the base software for the CS50 IDE is that, one, it has a nice plug-in model, and it allows us to extend the software in a way that was quite handy for students who are first coming into computer science or programming. It might be overwhelming for them to see the actual interface of Cloud9 or other IDEs. So let me explain briefly how Cloud9 kind of works.

  • Cloud9 has the UI that we saw moments ago, and then there's the compute resource that this UI connects to, where the actual programs are running and where the actual files are stored, and this connection happens to be an SSH connection by default. In the case of Cloud9, this compute resource happens to be an EC2 instance by default.

  • So our first approach was to basically build our own very simple, very basic orchestration system that uses EC2 for compute and uses EBS for storage.

  • So we would have a pool of EC2 instances and a bunch of users in some availability zone on AWS, where each user gets one EBS volume.

  • And then we would basically reuse these EC2 instances, or terminate them and bring some up, between different users.

  • Of course, if we zoom out a little more, we have more than one availability zone per region, and we have multiple regions that we want to operate in.

  • OK, so let's walk through some of the high-level key steps in this implementation. We would generate a federated user, which is basically generating credentials to grant these users access to certain resources that we want them to use on our own AWS account, because this is all running in our own account.

  • We would get an available EC2 instance from the pool, the reason being that it makes the process faster: we have instances ready to be used that we can assign to users directly.

  • We would create an EBS volume and then attach it to the instance. We would format the EBS volume, so create a file system and so on, and then we would mount the volume in the instance.

  • We would run some commands to start the Docker container and to authorize the public SSH key for the Cloud9 environment.

  • To connect to this container, we would get the public host name for the EC2 instance, and then finally we would connect the Cloud9 environment to the instance, or to the container in this instance, and then redirect the user to the environment.
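
For concreteness, here is a rough sketch of that provisioning flow using boto3; the pool tag, volume size, and the commented-out helper step are hypothetical stand-ins, not CS50's actual code.

```python
# Hypothetical sketch of the pre-Kubernetes flow described above, using boto3.
# Resource names (the "ide-idle" tag, volume size, helper) are illustrative.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
sts = boto3.client("sts")

def provision_ide(user_id: str, public_key: str) -> str:
    # 1. Generate temporary, scoped credentials for the user (federated access).
    creds = sts.get_federation_token(Name=user_id[:32], DurationSeconds=3600)

    # 2. Grab an idle instance from a pre-warmed pool (tagged ahead of time).
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag:pool", "Values": ["ide-idle"]},
                 {"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    instance = reservations[0]["Instances"][0]
    az = instance["Placement"]["AvailabilityZone"]

    # 3. Create and attach a per-user EBS volume in the *same* availability zone.
    volume = ec2.create_volume(AvailabilityZone=az, Size=5, VolumeType="gp2")
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
    ec2.attach_volume(VolumeId=volume["VolumeId"],
                      InstanceId=instance["InstanceId"], Device="/dev/sdf")

    # 4. Format and mount the volume, start the Docker container, and authorize
    #    the Cloud9 public SSH key -- in practice this meant running commands on
    #    the instance itself (e.g. via SSM); elided here.
    # run_on_instance(instance["InstanceId"], ["mkfs.ext4 /dev/xvdf", ...])

    # 5. Return the public host name used for the Cloud9 SSH connection.
    return instance["PublicDnsName"]
```

Step 4 is exactly the part that required running commands on the instance directly, which is one of the pain points discussed next.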

  • So this is just so you have a mental picture of how this is all implemented.

  • We have the Cloud9 environment, the front end, connecting by SSH to a container running in an EC2 instance, and then we use EBS for persisting students' files, for storage.

  • All right, so what are some of the challenges that we ran into when using this approach?

  • Well, first of all, we had to maintain a pool of EC2 instances across different availability zones and across different regions, which wasn't quite easy to do: we had to maintain state somewhere about this pool of instances. We had to get down to the level of running commands on the EC2 instances directly, to start the containers and to authorize the public SSH key, using another Amazon Web Service called SSM, which wasn't quite abstract enough.

  • We had to allocate an entire EC2 instance for each student, for each Cloud9 environment, even if they were not using the entire EC2 instance in terms of resources, CPU and memory and such. And then, at the end of the user session, we had to terminate the EC2 instance, which was a bit time-consuming because we also had to wait for the EBS volume to be detached.

  • And another challenge with that is that when the user comes in next time for a new session, they would get a different host name, which was quite annoying for students who were working on a project and getting a different host name every time.

  • And it was also quite challenging for us, because the Cloud9 environment now needs different parameters for the SSH connection.

  • And so the final thing is that we had to remove the EC2 instances temporarily from the pool to update the Docker images that back the IDE.

  • Okay, what are some challenges that we had with EBS for storage?

  • So again, we used EBS for persisting students' files in the IDE.

  • We also had to provision and allocate an entire EBS volume for each student, even if the student is not really using the entire EBS volume, which wasn't quite cost-effective.

  • And we had to assign every user a single availability zone, which was quite limiting, because that means that even if, in the pool somewhere else, in some other availability zone, there was an instance available, we couldn't assign this instance to the user, because it had to be in the same availability zone as the EBS volume.

  • And so, in an attempt to resolve the storage challenge, we moved to S3 for storage instead.

  • And while this actually helped sort of eliminate the limitation of having to be in the same availability zone, it introduced a few challenges of its own.

  • So here's a mental picture of the same architecture, basically, with S3 used for storage instead of EBS, and some of the challenges that this introduced.

  • So we would basically create a folder, or prefix, in an S3 bucket for each student, and then we would grant them read/write permission to this prefix.

  • One of the challenges is that we had to set up credentials to be able to download this data and upload this data periodically, and we had to refresh these credentials across sessions.

  • It would be nice if we didn't have to do that.

  • Of course, we had to download the user's data initially, which was quite time-consuming. It made the IDE slower to start.

  • We had to upload the data periodically, which wasn't quite robust.

  • So anything could go wrong, and the user could potentially lose some data. And then we could not find an easy way to limit the size of this prefix in S3: by design, S3 is probably meant to be sort of flexible in terms of how much you can store.

  • So we could not really limit storage in these prefixes.
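
A minimal sketch of what the per-user S3 prefix approach implies, assuming a hypothetical bucket name and layout; real code would also need the periodic scheduling, credential refresh, and error handling mentioned above.

```python
# Sketch of the S3-backed approach: each user gets a prefix in one bucket,
# files are pulled down at session start and pushed back periodically.
import boto3, os

s3 = boto3.client("s3")
BUCKET = "ide-user-files"   # hypothetical bucket name

def download_user_files(user_id: str, workdir: str) -> None:
    # Pull everything under the user's prefix into the local workspace.
    prefix = f"users/{user_id}/"
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            rel = obj["Key"][len(prefix):]
            dest = os.path.join(workdir, rel)
            os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
            s3.download_file(BUCKET, obj["Key"], dest)

def upload_user_files(user_id: str, workdir: str) -> None:
    # Periodically push the workspace back; note there is no built-in way to
    # cap how much a prefix can hold, which was one of the pain points above.
    prefix = f"users/{user_id}/"
    for root, _, files in os.walk(workdir):
        for name in files:
            path = os.path.join(root, name)
            key = prefix + os.path.relpath(path, workdir)
            s3.upload_file(path, BUCKET, key)
```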

  • So at this point, we started considering other services for our implementation, and all we really wanted is a service that we could ask to give us some container that ideally has a fixed host name that we can use, that has persistent storage, and that's reachable from the Internet.

  • So we considered services like AWS Fargate, which is a managed container orchestration system on AWS.

  • But unfortunately, two downsides of AWS Fargate are that it doesn't support image caching, and that made it slower to start IDEs; when we tested it, it took anywhere from two to three minutes to start each container, which was very slow. And it doesn't support persistent storage. So those were the problems with AWS Fargate.

  • Another service that we considered is AWS ECS, which is a little bit less managed a version of AWS Fargate, where we actually own and manage a bunch of container hosts.

  • But we were using ECS for some other application at the time, and we found a limitation that wasn't going to play nicely in this architecture.

  • The limitation that we found is that the more concurrent containers or tasks we had running in ECS, the slower it took to start new ECS tasks or containers, and that, of course, was going to affect the IDE in this case. And then, finally, we considered Kubernetes.

  • Kubernetes is another open-source orchestration system, and here are some steps for how we create an IDE with the Kubernetes approach.

  • So we create a federated user like before, to grant the user permission to use some resources.

  • We create a namespace per user, and then we create a persistent volume claim. This is basically a claim for storage: we ask for a certain amount of storage to be available for this user.

  • We authorize the public SSH key like before. We create a single-container pod, we mount the persistent volume on it, we mount the SSH key to be able to authorize the SSH connection, and then we connect the Cloud9 environment and finally redirect the user.
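
Here is a hedged sketch of those steps using the official Kubernetes Python client; the namespace prefix, image name, storage size, and mount paths are illustrative assumptions rather than CS50's actual manifests.

```python
# Per-user IDE provisioning on Kubernetes: namespace, PVC, SSH key, pod.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster
core = client.CoreV1Api()

def create_ide(user_id: str, public_key: str) -> None:
    ns = f"user-{user_id}"  # hypothetical naming scheme

    # Namespace per user.
    core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=ns)))

    # Persistent volume claim: "ask for a certain amount of storage" per user.
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="workspace"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            resources=client.V1ResourceRequirements(requests={"storage": "1Gi"}),
        ),
    )
    core.create_namespaced_persistent_volume_claim(ns, pvc)

    # Store the user's public SSH key so the pod can authorize the connection.
    core.create_namespaced_secret(ns, client.V1Secret(
        metadata=client.V1ObjectMeta(name="ssh-key"),
        string_data={"authorized_keys": public_key},
    ))

    # Single-container pod running the IDE image, with volume and key mounted.
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="ide", labels={"app": "ide"}),
        spec=client.V1PodSpec(containers=[client.V1Container(
            name="ide",
            image="example/cs50-ide:latest",  # hypothetical image name
            ports=[client.V1ContainerPort(container_port=22)],
            volume_mounts=[
                client.V1VolumeMount(name="workspace", mount_path="/home/ubuntu"),
                client.V1VolumeMount(name="ssh-key", mount_path="/etc/ssh/keys"),
            ],
        )], volumes=[
            client.V1Volume(name="workspace",
                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                    claim_name="workspace")),
            client.V1Volume(name="ssh-key",
                secret=client.V1SecretVolumeSource(secret_name="ssh-key")),
        ]),
    )
    core.create_namespaced_pod(ns, pod)
```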

  • So what challenges did we have with Kubernetes? Actually, let's talk more about what problems Kubernetes helped us solve. So previously, where we had to maintain this pool of EC2 instances across different availability zones and regions, Kubernetes actually manages the nodes for us.

  • So if a node goes down, if an EC2 instance goes down, it will bring it back up automatically, which is quite nice.

  • Well, previously we had to go down to the level of running commands on each EC2 instance; we don't really have to do that anymore, because we use the Kubernetes API to ask for a container, and we don't worry about how this container is created at all, how the volumes are mounted, and so on and so forth.

  • We used to allocate an entire EC2 instance for each user, and now we're actually able to run multiple containers on one instance and have them share resources, ultimately.

  • And at the end of the session, we used to terminate each EC2 instance and EBS volume and get a different host name next time and everything.

  • And right now we actually have a deterministic host name per user, no matter what container they use, thanks to a reverse proxy that's implemented as an application running in the same Kubernetes cluster, and the cluster's DNS as well. And finally, we used to remove an EC2 instance from the pool to update the Docker image.

  • Well, now we can do that very easily, using something called a DaemonSet in Kubernetes, which is essentially a container that runs on every instance, every worker node that we have in the cluster, and effectively pulls the image down.
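
A small sketch of such a pre-pull DaemonSet via the Python client; the image name and namespace are assumptions, and the `sleep` container is just one common way to keep the image cached on every node.

```python
# A trivial DaemonSet whose only job is to ensure the IDE image is already
# pulled on every worker node, so new user pods start quickly after an update.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

IDE_IMAGE = "example/cs50-ide:latest"  # hypothetical image name

prepull = client.V1DaemonSet(
    metadata=client.V1ObjectMeta(name="ide-image-prepull", namespace="kube-system"),
    spec=client.V1DaemonSetSpec(
        selector=client.V1LabelSelector(match_labels={"app": "ide-image-prepull"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "ide-image-prepull"}),
            spec=client.V1PodSpec(containers=[client.V1Container(
                name="prepull",
                image=IDE_IMAGE,
                # Pulling the image is the point; afterwards just idle.
                command=["sleep", "infinity"],
            )]),
        ),
    ),
)
apps.create_namespaced_daemon_set("kube-system", prepull)
```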

  • So these are some challenges that we had with storage, EBS and S3, and we resolved these challenges using a tool called Portworx. And again, persistent storage on Kubernetes is not an easy thing to do as well.

  • And so Portworx really helped us sort of solve some of these problems.

  • And the way Portworx works is that it basically creates a set of EBS volumes for us.

  • We don't really care about how it creates them; we just give it the credentials to create them.

  • We don't care which region, we don't care which availability zone, and then it gives us basically a storage pool.

  • A virtual storage pool that we can just use to allocate virtual volumes from. And then, finally, we reuse the capacity in this storage pool by basically taking snapshots of these virtual volumes to S3, deleting them, and then later, when the user comes in, we basically create a virtual volume again from that snapshot.

  • Another advantage of Portworx is that storage is thinly provisioned, and that means that if we ask for one gigabyte for one user, for example, it doesn't necessarily mean that we're going to lose one gigabyte of the storage capacity until the user is actually using one gigabyte.

  • So that was quite an optimal thing to have. And so here is how this is implemented.

  • Here's a diagram of how this is implemented on the Kubernetes cluster: we have the Cloud9 environment, like before, connecting to a container in an EC2 instance by SSH, except that we have a proxy now in the middle that helps resolve this host name issue and allows us to have a deterministic host name for each user, no matter what container they're using.

  • And then this is, of course, all part of a cluster.

  • And then we use Portworx for storage.
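
As a sketch of how that storage layer can be expressed, here is a Portworx-backed StorageClass and a per-user claim against it, using the in-tree Portworx provisioner name; the class name, parameters, and namespace are illustrative assumptions, not CS50's exact configuration.

```python
# Thin-provisioned per-user storage drawn from a Portworx pool.
from kubernetes import client, config

config.load_kube_config()
storage = client.StorageV1Api()
core = client.CoreV1Api()

# StorageClass backed by Portworx; volumes carved out of the shared pool are
# thin provisioned, so 1Gi requested does not consume 1Gi until written.
sc = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="ide-storage"),
    provisioner="kubernetes.io/portworx-volume",
    parameters={"repl": "2", "fs": "ext4"},
)
storage.create_storage_class(sc)

# A per-user claim against that class (namespace assumed to exist already).
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="workspace"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="ide-storage",
        resources=client.V1ResourceRequirements(requests={"storage": "1Gi"}),
    ),
)
core.create_namespaced_persistent_volume_claim("user-jharvard", pvc)
```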

  • All right, so I guess one main takeaway here from this talk is that we are kind of using Kubernetes in a way that it's not usually used for, but it's lending itself really nicely to our solution.

  • Here's a graph of the number of users that we've seen since July, I think, and we can see anywhere from 4,000 to maybe 8,500 users every day, which is quite a scale, and it scales really nicely.

  • And here's some future work as well. So one of the challenges that we have with this implementation, even with the previous implementation, is fraud detection and, you know, maybe more monitoring.

  • So it turns out that some users, some bad people out there in the world, use the IDE for things that it's not meant to be used for, for example to mine Bitcoin or to DDoS a server somewhere.

  • And so we want to get better at detecting this.

  • We are using AWS Gateway, another Amazon service that helps us monitor some of the traffic that's going into and out of the cluster.

  • We have some other mechanisms in place as well to monitor some file names and process names and so on.

  • We want to use multiple clusters across different regions, and finally, we want to have multiple IDEs per user, which is, I think, quite easy to implement in this model.

  • All right, so that's it for the CS50 IDE. I encourage you all to go ahead and try it out at ide.cs50.io. Let me invite David back on stage.
