  • [MUSIC PLAYING]

  • DAVID MALAN: All right.

  • This is CS50, and where we left off last time

  • was here, focusing on data structures.

  • And indeed, one of the last data structures we looked at

  • was that of a hash table.

  • But that was the result of a progression of data structures

  • that we began with this thing here, an array.

  • Recall that an array was a data structure that was first introduced

  • back in week two of CS50, and it was advantageous at the time,

  • because it allowed us to do things efficiently, like binary search,

  • and it was very easy to use with its square bracket notation

  • and adding integers or strings or whatever it is.

  • But it had limitations, recall.

  • And among those limitations were its lack

  • of resizeability, its lack of dynamism.

  • We had to decide in advance how big we wanted this data structure to be,

  • and if we wanted it to be any bigger, or for that matter any smaller,

  • we would have to resize it ourselves dynamically and copy

  • all of the old elements into a new array, and then go about our business.
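
  • A minimal sketch in C of that manual resizing, assuming a heap-allocated array of ints (the sizes and values here are purely illustrative):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // An array of 3 ints, allocated on the heap.
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return 1;
    }
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;

    // To "grow" it, allocate a bigger array...
    int *tmp = malloc(4 * sizeof(int));
    if (tmp == NULL)
    {
        free(list);
        return 1;
    }

    // ...copy every old element into it...
    for (int i = 0; i < 3; i++)
    {
        tmp[i] = list[i];
    }

    // ...add the new element, free the old array, and go about our business.
    tmp[3] = 4;
    free(list);
    list = tmp;

    for (int i = 0; i < 4; i++)
    {
        printf("%i\n", list[i]);
    }
    free(list);
}
```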

  • And so we introduced last time this thing here instead,

  • a linked list that addresses that problem by having us on demand

  • allocate these things that we called nodes,

  • storing inside them an integer or really any data type that we want,

  • but connecting those nodes with these arrows pictured

  • here, specifically connecting them or threading them together using something

  • called pointers.

  • Pointers, recall, are just addresses of those nodes in memory.

  • So while we pay a bit of a price in terms of more memory in order

  • to link these nodes together, we gain this flexibility,

  • because now when we want to grow or shrink this kind of data structure,

  • we simply use our friend malloc or free or similar functions still.
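
  • As a rough sketch in C, such a node and an insertion at the front of a list might look like this (the names are illustrative, not the lecture's own code):

```c
#include <stdlib.h>

// A node stores some data (here an int) plus a pointer, i.e. the address,
// of the next node in the list.
typedef struct node
{
    int number;
    struct node *next;
}
node;

// Insert a new value at the front of a list, returning the new head,
// or NULL if malloc fails. Growing the list is just one malloc away.
node *insert(node *list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->next = list;
    return n;
}
```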

  • But then, using linked lists-- and at a lower level,

  • pointers-- as a new building block, did we begin to solve other problems.

  • We considered the problem of a stack of trays in the cafeteria,

  • and we presented an abstract data type known as a stack.

  • And a stack supports operations like push and pop.

  • But what's interesting about a stack for our purposes

  • recall is that we don't need to commit necessarily

  • to implementing it one way or another.

  • Indeed, we can abstract away the underlying implementation

  • details of a stack, and implement it using an array if we want,

  • if we find that easier or convenient.

  • Or for that matter we can implement it using a linked list,

  • if we want that additional ability to grow and shrink.

  • And so the data type itself in a stack has these two operations, push and pop,

  • but they're independent, ultimately, of how we actually

  • implement things underneath the hood.
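
  • For instance, a push and pop built atop a linked list might look roughly like this (a sketch of one possible implementation, not the only way to do it):

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// The stack itself is just a pointer to its top node.
typedef struct
{
    node *top;
}
stack;

// Push: allocate a new node and make it the new top (last in...).
bool push(stack *s, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    n->next = s->top;
    s->top = n;
    return true;
}

// Pop: remove and return the most recently pushed value (...first out).
// Assumes the caller has already checked that the stack isn't empty.
int pop(stack *s)
{
    node *n = s->top;
    int number = n->number;
    s->top = n->next;
    free(n);
    return number;
}
```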

  • And that holds true as well for this thing here.

  • A line, or more properly a queue, whereby

  • instead of having this last in, first out or LIFO property,

  • we want something more fair in the human world, a first in, first out.

  • So that when you enqueue or dequeue some piece

  • of data, whatever was enqueued first is the first thing

  • to get out of that queue as well.

  • And here too did we see that we could implement these things using a linked

  • list or using an array, and I would wager there are yet

  • other possible implementations as well.
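
  • Here, for instance, is a sketch of an enqueue and dequeue built atop a fixed-size array (a circular buffer), one of those possible implementations:

```c
#include <stdbool.h>

#define CAPACITY 50   // the fixed size is the price of the array-based approach

typedef struct
{
    int numbers[CAPACITY];
    int front;   // index of the oldest element
    int size;    // how many elements are currently enqueued
}
queue;

// A queue should start out zeroed, e.g., queue q = {0};

// Enqueue: add a value at the back of the line; fails if the array is full.
bool enqueue(queue *q, int number)
{
    if (q->size == CAPACITY)
    {
        return false;
    }
    q->numbers[(q->front + q->size) % CAPACITY] = number;
    q->size++;
    return true;
}

// Dequeue: remove and return whatever was enqueued first (FIFO).
// Assumes the caller has already checked that the queue isn't empty.
int dequeue(queue *q)
{
    int number = q->numbers[q->front];
    q->front = (q->front + 1) % CAPACITY;
    q->size--;
    return number;
}
```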

  • And then we transitioned from these abstract data types

  • to another sort of paradigm for building a data structure in memory.

  • Rather than just linking things together in a unidirectional way, so to speak,

  • with a linked list, we introduced trees, and we introduced things

  • like binary search trees, that so long as you

  • keep these data structures pretty well balanced, such that the height of them

  • is logarithmic and is not linear like a linked list,

  • can we achieve the kind of efficiency that we saw back in week zero

  • when we did binary search on a phone book.

  • But now, thanks to these pointers and thanks to malloc and free

  • can we grow and shrink the data structure without committing in advance

  • to an actual fixed size array.
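
  • A sketch of such a tree's node and its search, whose running time stays logarithmic so long as the tree stays balanced:

```c
#include <stdbool.h>

// Each node stores a value plus pointers to a left and a right subtree,
// with smaller values to the left and larger values to the right.
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

// Binary search on the tree: O(log n) steps if the tree is balanced.
bool search(node *tree, int number)
{
    if (tree == NULL)
    {
        return false;
    }
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    else
    {
        return true;
    }
}
```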

  • And similarly did we solve another real world problem.

  • Recall that a few weeks ago we looked at forensics,

  • and most recently did we look at compression, both of which

  • happen to involve files.

  • And in this case, the goal was to compress information,

  • ideally losslessly, without throwing away any of the underlying information.

  • And thanks to Huffman coding did we see one technique

  • for doing that, whereby instead of using seven or eight bits for every letter

  • or punctuation symbol in some text, we can instead come up with our own coding

  • in which we use one bit, like a 1, to represent a super common letter like e,

  • and two or three or four or more bits for the less

  • common letters in our world.
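
  • To make that concrete with a made-up code (not actually derived from any real text's letter frequencies): if e were 0, t were 10, a were 110, and x were 111, then the word "tee" would need just 4 bits-- 10, 0, 0-- instead of the 24 bits of 8-bit ASCII.

  • And because no code is a prefix of any other, a reader can still decode such a stream of bits unambiguously.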

  • And then again we came to hash tables.

  • And a hash table too is an abstract data type that we could implement using an array

  • or using a linked list or using an array and a linked list.

  • And indeed, we looked first at a hash table as little more than an array.

  • But we introduced this idea of a hash function, that

  • allows you, given some input, to decide on some output,

  • an index-- a numeric value, typically-- that allows

  • you to decide where to put some value.

  • But if you use something like an array, of course,

  • you might paint yourself into a corner, such

  • that you don't have enough room ultimately for everything.

  • And so we introduced separate chaining, whereby a hash table in this form

  • is really just an array, pictured here vertically, and a set of linked

  • lists hanging off that array, pictured here horizontally,

  • that allows us to get some pretty good efficiency in terms

  • of hashing, finding the chain that we want in pretty much constant time,

  • and then maybe incurring a bit of linear cost

  • if we actually have a number of collisions.
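
  • A sketch of separate chaining in C, with a deliberately toy hash function (a real one would spread inputs out far more evenly):

```c
#include <ctype.h>
#include <stdbool.h>
#include <strings.h>

#define N 26   // number of buckets in the array

// Each bucket's chain is a linked list of these nodes.
typedef struct node
{
    char word[46];   // room for a longish word plus '\0'
    struct node *next;
}
node;

// The hash table itself: an array of pointers, each the head of a chain.
node *table[N];

// Toy hash function: bucket words by their first letter.
unsigned int hash(const char *word)
{
    return toupper((unsigned char) word[0]) - 'A';
}

// Look up a word: hash to the right chain in constant time, then walk
// that chain linearly, the cost of any collisions.
bool check(const char *word)
{
    unsigned int index = hash(word) % N;
    for (node *n = table[index]; n != NULL; n = n->next)
    {
        if (strcasecmp(n->word, word) == 0)
        {
            return true;
        }
    }
    return false;
}
```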

  • Now today, after leaving behind these data structures-- among them

  • a trie, which recall was our last data structure that allowed us

  • in theory in constant time to look up or insert or even delete words in a data

  • structure, depending only on the length of the string,

  • not how many strings were in there-- do we continue to use these ideas,

  • these building blocks, these data structures.
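
  • A sketch of a trie node and its lookup for lowercase words, where the number of steps depends only on the word's length:

```c
#include <stdbool.h>

#define ALPHABET 26

// Each trie node has one child pointer per letter of the alphabet, plus a
// flag marking whether some word ends at this node.
typedef struct trie
{
    struct trie *children[ALPHABET];
    bool is_word;
}
trie;

// Look up a word (assumed to be lowercase a-z): one step per letter,
// no matter how many words the trie contains.
bool lookup(const trie *root, const char *word)
{
    if (root == NULL)
    {
        return false;
    }
    const trie *current = root;
    for (int i = 0; word[i] != '\0'; i++)
    {
        int index = word[i] - 'a';
        if (index < 0 || index >= ALPHABET || current->children[index] == NULL)
        {
            return false;
        }
        current = current->children[index];
    }
    return current->is_word;
}
```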

  • But now today we literally leave behind the world of C

  • and start to enter the world of web programming,

  • or really the world of web pages and dynamic outputs

  • and databases, ultimately, and all of the things that most of us

  • are familiar with every day.

  • But it turns out that this time we don't have to leave behind those ingredients.

  • Indeed, something like this, which you'll soon

  • know as HTML-- the language in which web pages are written-- HyperText Markup

  • Language-- even this textual document, which seems to have a bit of structure

  • to it, as you might glean here from the indentation, can underneath the hood

  • be itself represented as a tree.

  • A DOM, or Document Object Model. And indeed, we'll

  • see now some real world, very modern applications of the same data

  • structures in software that we ourselves use.
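
  • Purely to connect that back to the trees above, here is a hedged sketch of how such a document's nodes might be represented in C (the real DOM that browsers expose is far richer than this):

```c
// A sketch of a document's structure as a tree: like the trees above,
// except each node can have any number of children.
typedef struct element
{
    char *tag;                   // e.g., "html", "body", "p"
    char *text;                  // text content, if any
    struct element **children;   // array of pointers to child elements
    int num_children;
}
element;

// Count every element in the tree by recursing over the children.
int count(const element *e)
{
    if (e == NULL)
    {
        return 0;
    }
    int total = 1;
    for (int i = 0; i < e->num_children; i++)
    {
        total += count(e->children[i]);
    }
    return total;
}
```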

  • Because today, we look at how the internet works,

  • and in turn how we actually build software atop it.

  • But first, a teaser.

  • [VIDEO PLAYBACK]

  • [MUSIC PLAYING]

  • -He came with a message, with a protocol all his own.

  • He came to a world of cruel firewalls, uncaring routers,

  • and dangers far worse than death.

  • He's fast.

  • He's strong.

  • He's TCP/IP, and he's got your address.

  • Warriors of the Net.

  • [END PLAYBACK]

  • DAVID MALAN: All right, so coming soon is how the internet works.

  • And it's not quite like that.

  • But we'll see in a bit more detail.

  • But let's consider first something a little more familiar, if abstractly,

  • like our own home.

  • So odds are, before coming to a place like this,

  • you had internet access at home or at school or at work or the like.

  • And inside of that building-- let's call it your home--

  • you had a number of devices.

  • Maybe a laptop, maybe a desktop, maybe both, maybe multiple.

  • And you had some kind of internet service provider, Comcast or Verizon

  • or companies like that, that actually run some kind of wired connection,

  • typically-- though it could be wireless-- into your home,

  • and via that connection are you on your laptop or desktop able to get out

  • onto the internet.

  • Well it turns out that the internet itself is a pretty broad term.

  • The internet is really just this interconnection

  • of lots of different networks.

  • Harvard here has a network.

  • Yale has a network.

  • Google has a network.

  • Facebook has a network.

  • Your home has a network and the like.

  • And so the internet really is the interconnection

  • of all of those physical networks.

  • And on top of this internet do there run services, things like the web,

  • or the World Wide Web.

  • Things like email.

  • Things like Facebook Messenger.

  • Things like Skype.

  • And any number of applications that we use every day

  • run on top of this physical layer known as the internet.

  • But how does this internet itself work?

  • Well, when you first plug in your computer to a home modem

  • that you might get from Verizon or Comcast-- it might be a cable

  • modem or a DSL modem or another technology still-- or more commonly

  • these days, you connect wirelessly, such that your Mac or PC

  • laptop connects somehow wirelessly to this device, what actually happens?

  • Like the first time you have internet installed in your home,

  • how does your computer know how to connect to that device,

  • and how does that device know how to get your laptop's data

  • to and from the rest of the internet?

  • Well, odds are you know on your Mac or PC

  • you at least get to choose the name of your network,

  • whether it's Harvard University or Yale or Linksys or AirPort Extreme

  • or whatever it is at home, and then once you're connected to that,

  • it turns out that there's special software running

  • on this device in your home called a router.

  • And actually, it can be called any number of things.

  • But one of its primary functions is to route information,

  • and also to assign certain settings to your computer.

  • Indeed, running inside of this so-called router in your home

  • typically is a protocol, a special type of software called DHCP-- Dynamic Host

  • Configuration Protocol.

  • And this is just a fancy way of saying that that little device in your home

  • knows how to get you onto the internet.

  • And how does it do that?

  • Well, the first time you turn on your Mac or PC

  • and connect to your home network-- or Harvard's or Yale's for that matter--

  • you are assigned, thanks to this technology DHCP, an IP address,

  • a numeric address, something of the form something

  • dot something dot something dot something that uniquely in theory

  • identifies your computer on the internet,

  • so long as your computer speaks this protocol IP, or the Internet Protocol.

  • And we'll see in a bit that IP and TCP-- more

  • commonly known together as TCP/IP-- are really just a set of conventions

  • that governs how computers talk to each other on the internet.

  • And the first way they do that is by agreeing upon in advance

  • what each of their addresses look like.

  • Now, these addresses are actually changing in format over time,

  • because frankly, we're running out of these addresses.

  • But the most common address right now still

  • is an IP version 4, or IPv4, address, that is literally of the form something dot

  • something dot something dot something.

  • And so when your computer first turns on in your home network,

  • you are given a number that looks a little something like that.

  • And via that address now can you talk to other computers on the internet,

  • because this is like your from address in the physical world,

  • and you can receive responses from computers on the internet,

  • because they now know you via this address.

  • So much like the CS building here is at 33 Oxford Street,

  • Cambridge, Massachusetts, or the CS building at Yale

  • was at 51 Prospect Street, New Haven, Connecticut, much as those

  • addresses uniquely identified those two buildings,

  • so do IP addresses in the world of computers uniquely identify computers.

  • So here for instance just happens to be by convention

  • what most of Harvard's own IP addresses look like.

  • Now that I'm on this network here, odds are my IP address starts

  • with 140.247 dot something dot something,

  • or 128.103 dot something dot something.

  • Or at New Haven at Yale, it might look like 130.132 dot

  • something dot something, or 128.36 dot something dot something.

  • And it turns out that each of these somethings

  • simply is by definition a number between 0 and 255.

  • 0 to 255.

  • I feel like we've heard these numbers before.

  • And indeed, if you can count from 0 to 255,

  • that means you're using, what, 8 bits.

  • And so each of these numbers is 8 bits: 8 plus 8 plus 8 plus 8.

  • So that's 32 bits.

  • And indeed, an IP address typically these days-- at least version 4--

  • is a 32-bit value, which means there can be in total no more than 4 billion

  • or so computers on the internet.
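
  • A small C sketch of that arithmetic, packing four 8-bit numbers into one 32-bit value (the particular address here is made up, though 140.247 is Harvard's prefix as above):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    // Four octets, each 0 to 255, so 8 bits apiece. The address is made up.
    uint8_t octets[4] = {140, 247, 1, 2};

    // Pack them into a single value: 8 + 8 + 8 + 8 = 32 bits.
    uint32_t address = (uint32_t) octets[0] << 24 |
                       (uint32_t) octets[1] << 16 |
                       (uint32_t) octets[2] << 8 |
                       (uint32_t) octets[3];

    printf("%u.%u.%u.%u is %u as a 32-bit value\n",
           (unsigned) octets[0], (unsigned) octets[1],
           (unsigned) octets[2], (unsigned) octets[3],
           (unsigned) address);

    // With 32 bits there are 2^32 possible addresses: about 4 billion.
    printf("Total possible IPv4 addresses: %llu\n", 1ULL << 32);
}
```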

  • And we're actually starting to bump up against that, because everything

  • these days seems to be on the internet, whether it's your phone, laptop,

  • or even some smart device in your home.

  • And so there is a way to mitigate that.

  • It turns out that your computer, even if you're on campus,

  • might not quite have one of those Harvard or Yale IPs.

  • You might instead have, depending on where you are on campus, a private IP