  • [MUSIC PLAYING]

  • DAVID MALAN: All right.

  • This is CS50, and last time where we left off

  • was here, focusing on data structures.

  • And indeed, one of the last data structures we looked at

  • was that of a hash table.

  • But that was the result of a progression of data structures

  • that we began with this thing here, an array.

  • Recall that an array was a data structure that we actually introduced

  • back in week two of CS50, but it was advantageous at the time,

  • because it allowed us to do things efficiently, like binary search,

  • and it was very easy to use with its square bracket notation

  • and adding integers or strings or whatever it is.

  • But it had limitations, recall.

  • And among those limitations were its lack

  • of resizeability, its lack of dynamism.

  • We had to decide in advance how big we wanted this data structure to be,

  • and if we wanted it to be any bigger, or for that matter any smaller,

  • we would have to resize it ourselves dynamically and copy

  • all of the old elements into a new array, and then go about our business.

  • And so we introduced last time this thing here instead,

  • a linked list that addresses that problem by having us on demand

  • allocate these things that we called nodes,

  • storing inside them an integer or really any data type that we want,

  • but connecting those nodes with these arrows pictured

  • here, specifically connecting them or threading them together using something

  • called pointers.

  • Whereby pointers are just addresses of those nodes in memory.

  • So while we pay a bit of a price in terms of more memory in order

  • to link these nodes together, we gain this flexibility,

  • because now when we want to grow or shrink this kind of data structure,

  • we simply use our friend malloc or free or similar functions still.

  • But then, using linked lists-- and at a lower level,

  • pointers as a new building block-- we began to solve other problems.

  • We considered the problem of a stack of trays in the cafeteria,

  • and we presented an abstract data type known as a stack.

  • And a stack supports operations like push and pop.

  • But what's interesting about a stack for our purposes

  • recall is that we don't need to commit necessarily

  • to implementing it one way or another.

  • Indeed, we can abstract away the underlying implementation

  • details of a stack, and implement it using an array if we want,

  • if we find that easier or convenient.

  • Or for that matter we can implement it using a linked list,

  • if we want that additional ability to grow and shrink.

  • And so the data type itself in a stack has these two operations, push and pop,

  • but they're independent, ultimately, of how we actually

  • implement things underneath the hood.

  • And that holds true as well for this thing here.

  • A line, or more properly a queue, whereby

  • instead of having this last in, first out or LIFO property,

  • we want something more fair in the human world, a first in, first out.

  • So that when you enqueue or dequeue some piece

  • of data, whatever was enqueued first is the first thing

  • to get out of that queue as well.

  • And here too did we see that we could implement these things using a linked

  • list or using an array, and I would wager there are yet

  • other possible implementations as well.
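
The two abstract data types just described can be sketched in a few lines. This is only an illustration, assuming Python rather than the C used in the course so far: a Python list stands in for the array behind the stack, and `collections.deque` stands in for the queue. The item names are made up.

```python
from collections import deque

# A stack: last in, first out (LIFO). A Python list stands in for the array.
stack = []
stack.append("tray 1")   # push
stack.append("tray 2")   # push
top = stack.pop()        # pop returns the most recently pushed item

# A queue: first in, first out (FIFO). deque removes from the front efficiently.
queue = deque()
queue.append("Alice")    # enqueue
queue.append("Bob")      # enqueue
first = queue.popleft()  # dequeue returns whoever was enqueued first
```

Either structure could be swapped for a linked list without changing how push, pop, enqueue, or dequeue behave from the caller's point of view.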

  • And then we transitioned from these abstract data types

  • to another sort of paradigm for building a data structure in memory.

  • Rather than just linking things together in a unidirectional way, so to speak,

  • with a linked list, we introduced trees, and we introduced things

  • like binary search trees, that so long as you

  • keep these data structures pretty well balanced, such that the height of them

  • is logarithmic and is not linear like a linked list,

  • can we achieve the kind of efficiency that we saw back in week zero

  • when we did binary search on a phone book.

  • But now, thanks to these pointers and thanks to malloc and free

  • can we grow and shrink the data structure without committing in advance

  • to an actual fixed size array.

  • And similarly did we solve another real world problem.

  • Recall that a few weeks ago we looked at forensics,

  • and most recently did we look at compression, both of which

  • happen to involve files.

  • And in this case, the goal was to compress information,

  • ideally losslessly, without throwing away any of the underlying information.

  • And thanks to Huffman coding did we see one technique

  • for doing that, whereby instead of using seven or eight bits for every letter

  • or punctuation symbol in some text, we can instead come up with our own coding

  • that uses one bit, like a 1, to represent a super common letter like e,

  • and two or three or four or more bits for the less

  • common letters in our world.

  • And then again we came to hash tables.

  • And a hash table too is an abstract data type that we could implement using an array

  • or using a linked list or using an array and a linked list.

  • And indeed, we looked first at a hash table as little more than an array.

  • But we introduced this idea of a hash function, that

  • allows you, given some input, to decide on some output,

  • an index, a numeric value, typically, that allows

  • you to decide where to put some value.

  • But if you use something like an array, of course,

  • you might paint yourself into a corner, such

  • that you don't have enough room ultimately for everything.

  • And so we introduced separate chaining, whereby a hash table in this form

  • is really just an array, pictured here vertically, and a set of linked

  • lists hanging off that array, pictured here horizontally,

  • that allows us to get some pretty good efficiency in terms

  • of hashing, finding the chain that we want in pretty much constant time,

  • and then maybe incurring a bit of linear cost

  • if we actually have a number of collisions.
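
Separate chaining as described here can be sketched as follows. This is a minimal illustration, not the course's implementation: the `Node` and `HashTable` names, the 26-bucket size, and the first-letter hash function are all assumptions chosen to keep the example short.

```python
class Node:
    """One link in a chain: a word plus a reference to the next node."""
    def __init__(self, word, next_node=None):
        self.word = word
        self.next = next_node


class HashTable:
    def __init__(self, buckets=26):
        self.table = [None] * buckets  # the array: one chain head per bucket

    def _hash(self, word):
        # Toy hash function (an assumption): bucket by first letter.
        return (ord(word[0].lower()) - ord("a")) % len(self.table)

    def insert(self, word):
        i = self._hash(word)
        self.table[i] = Node(word, self.table[i])  # prepend in constant time

    def contains(self, word):
        node = self.table[self._hash(word)]
        while node is not None:  # linear walk, but only along this one chain
            if node.word == word:
                return True
            node = node.next
        return False


table = HashTable()
for w in ["apple", "avocado", "banana"]:
    table.insert(w)
```

Here "apple" and "avocado" collide in the same bucket, so looking up either one walks a chain of length two, while "banana" sits alone in its own chain.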

  • Now today, after leaving behind these data structures-- among them

  • a trie, which recall was our last data structure that allowed us

  • in theory in constant time to look up or insert or even delete words in a data

  • structure, depending only on the length of the string,

  • not how many strings were in there-- do we continue to use these ideas,

  • these building blocks, these data structures.

  • But now today we literally leave behind the world of C

  • and start to enter the world of web programming,

  • or really the world of web pages and dynamic outputs

  • and databases, ultimately, and all of the things that most of us

  • are familiar with every day.

  • But it turns out that this time we don't have to leave behind those ingredients.

  • Indeed, something like this, which you'll soon

  • know as HTML-- the language in which web pages are written-- HyperText Markup

  • Language-- even this textual document, which seems to have a bit of structure

  • to it, as you might glean here from the indentation, can underneath the hood

  • be itself represented as a tree.

  • A DOM, or Document Object Model, but indeed, we'll

  • see now some real world, very modern applications of the same data

  • structures in software that we ourselves use.

  • Because today, we look at how the internet works,

  • and in turn how we actually build software atop it.

  • But first, a teaser.

  • [VIDEO PLAYBACK]

  • [MUSIC PLAYING]

  • -He came with a message, with a protocol all his own.

  • He came to a world of cruel firewalls, uncaring routers,

  • and dangers far worse than death.

  • He's fast.

  • He's strong.

  • He's TCP/IP, and he's got your address.

  • Warriors of the Net.

  • [END PLAYBACK]

  • DAVID MALAN: All right, so coming soon is how the internet works.

  • And it's not quite like that.

  • But we'll see in a bit more detail.

  • But let's consider first something a little more familiar, if abstractly,

  • like our own home.

  • So odds are, before coming to a place like this,

  • you had internet access at home or at school or at work or the like.

  • And inside of that building-- let's call it your home--

  • you had a number of devices.

  • Maybe a laptop, maybe a desktop, maybe both, maybe multiple.

  • And you had some kind of internet service provider, Comcast or Verizon

  • or companies like that, that actually run some kind of wired connection,

  • typically-- though it could be wireless-- into your home,

  • and via that connection are you on your laptop or desktop able to get out

  • onto the internet.

  • Well it turns out that the internet itself is a pretty broad term.

  • The internet is really just this interconnection

  • of lots of different networks.

  • Harvard here has a network.

  • Yale has a network.

  • Google has a network.

  • Facebook has a network.

  • Your home has a network and the like.

  • And so the internet really is the interconnection

  • of all of those physical networks.

  • And on top of this internet, do there run services, things like the web

  • or the world wide web.

  • Things like email.

  • Things like Facebook Messenger.

  • Things like Skype.

  • And any number of applications that we use every day

  • run on top of this physical layer known as the internet.

  • But how does this internet itself work?

  • Well, when you first plug in your computer to a home modem

  • that you might get from Verizon or Comcast-- it might be a cable

  • modem or a DSL modem or another technology still-- or more commonly

  • these days, you connect wirelessly, such that your Mac or PC

  • laptop connects somehow wirelessly to this device, what actually happens?

  • Like the first time you have internet installed in your home,

  • how does your computer know how to connect to that device,

  • and how does that device know how to get your laptop's data

  • to and from the rest of the internet?

  • Well, odds are you know on your Mac or PC

  • you at least get to choose the name of your network,

  • whether it's Harvard University or Yale or LinkSys or Airport Extreme

  • or whatever it is at home, and then once you're connected to that,

  • it turns out that there's special software running

  • on this device in your home called a router.

  • And actually, it can be called any number of things.

  • But one of its primary functions is to route information,

  • and also to assign certain settings to your computer.

  • Indeed, running inside of this so-called router in your home

  • typically is a protocol, a special type of software called DHCP-- Dynamic Host

  • Configuration Protocol.

  • And this is just a fancy way of saying that little device in your home

  • knows how to get you onto the internet.

  • And how does it do that?

  • Well, the first time you turn on your Mac or PC

  • and connect to your home network-- or Harvard's or Yale's for that matter--

  • you are assigned, thanks to this technology, DHCP, an IP address,

  • a numeric address, something of the form something

  • dot something dot something dot something that uniquely in theory

  • identifies your computer on the internet,

  • so long as your computer speaks this protocol IP, or the Internet Protocol.

  • And we'll see in a bit that IP and TCP-- or more

  • commonly known as TCP/IP-- is really just a set of conventions

  • that governs how computers talk to each other on the internet.

  • And the first way they do that is by agreeing in advance

  • on what each of their addresses looks like.

  • Now, these addresses are actually changing in format over time,

  • because frankly, we're running out of these addresses.

  • But the most common address right now still

  • is an IP version 4, or V4 address, that is literally of the form something dot

  • something dot something dot something.

  • And so when your computer first turns on in your home network,

  • you are given a number that looks a little something like that.

  • And via that address now can you talk to other computers on the internet,

  • because this is like your from address in the physical world,

  • and you can receive responses from computers on the internet,

  • because they now know you via this address.

  • So much like the CS building here is at 33 Oxford Street

  • Cambridge, Massachusetts, or the CS building at Yale

  • is at 51 Prospect Street, New Haven, Connecticut, much as those

  • addresses uniquely identify those two buildings,

  • so do IP addresses in the world of computers uniquely identify computers.

  • So here for instance just happens to be by convention

  • what most of Harvard's own IP addresses look like.

  • Now that I'm on this network here, odds are my IP address starts

  • with 140.247 dot something dot something,

  • or 128.103 dot something dot something.

  • Or at New Haven at Yale, it might look like 130.132 dot

  • something dot something, or 128.36 dot something dot something.

  • And it turns out that each of these somethings

  • simply is by definition a number between 0 and 255.

  • 0 to 255.

  • I feel like we've heard these numbers before.

  • And indeed, if you can count from 0 to 255,

  • that means you're using what? 8 bits.

  • And so each of these numbers is 8 bits plus 8 plus 8 plus 8.

  • So that's 32 bits.

  • And indeed, an IP address typically these days-- at least version 4--

  • is a 32-bit value, which means there can be in total no more than 4 billion

  • or so computers on the internet.
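
The arithmetic above can be checked directly. The sketch below uses Python's standard `ipaddress` module; the address `140.247.0.1` is a hypothetical one picked from the 140.247 range mentioned earlier, not an address shown in the lecture.

```python
import ipaddress

# Four numbers of 8 bits each make a 32-bit address.
total = 2 ** 32  # how many distinct 32-bit values exist

# Hypothetical address in the 140.247 range mentioned above.
addr = ipaddress.IPv4Address("140.247.0.1")
octets = list(addr.packed)  # the four 8-bit pieces, each between 0 and 255
as_int = int(addr)          # the same address viewed as one 32-bit integer
```

`total` works out to 4,294,967,296, which is the "4 billion or so" ceiling on IPv4 addresses.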

  • And we're actually starting to bump up against that, because everything

  • these days seems to be on the internet, whether it's your phone, laptop,

  • or even some smart device in your home.

  • And so there is a way to mitigate that.

  • It turns out that your computer, even if you're on campus,

  • might not quite have one of those Harvard or Yale IPs.

  • You might instead have depending on where you are on campus a private IP

  • address, or if you're in your home, you similarly

  • might have one of these addresses.

  • And these are private in the sense that they

  • are used to route information within your home or within your school

  • or within your company, but these addresses are not

  • meant to be used by the outside world.
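
Whether an address falls in one of the reserved private ranges can be checked with the standard `ipaddress` module. The first three addresses below are hypothetical examples; the last is the Google address that appears later in this lecture.

```python
import ipaddress

# The first three are made-up examples; the last appears later in the lecture.
examples = ["192.168.1.10", "10.0.0.5", "140.247.0.1", "172.217.4.36"]

# is_private is True for addresses reserved for use inside a home,
# school, or company network, such as 10.x.x.x and 192.168.x.x.
private = {text: ipaddress.ip_address(text).is_private for text in examples}
```

The first two come back as private, while the campus-style and Google addresses do not.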

  • Instead, what you get from Harvard or Yale

  • or Comcast or Verizon when you connect to their network

  • typically is at least the ability to have one or more public IP

  • addresses that the rest of the world knows you by.

  • So what does this actually mean?

  • Well, sometimes it doesn't really mean anything at all.

  • And in fact, if you look at popular media today or various television

  • shows, you'll see that IP is either miscommunicated or outright

  • misunderstood.

  • Let's take a look.

  • [VIDEO PLAYBACK]

  • -It's a 32-bit IPv4 address.

  • -IP, as in the internet?

  • -Private network.

  • To meet is private network.

  • It's just so amazing.

  • It's in their IP address.

  • She's letting us watch what she's doing in real time.

  • [END PLAYBACK]

  • DAVID MALAN: No, no, that is not what a hacker does in real time,

  • and that is not how you watch a hacker in real time.

  • Indeed, if you zoom in on this screen here,

  • you'll see that what's actually being looked at

  • has nothing to do with networking per se.

  • This is actually programming code written

  • in a language called Objective-C, which happens

  • to be used conventionally for Mac applications

  • or more recently iOS applications.

  • And of all the things for them to have pulled out,

  • they use this code, which has to be something

  • related to some kind of drawing program insofar as it's talking about crayons.

  • Moreover, if you actually look at one of the other scenes from this show,

  • this was the IP address in question.

  • This too is not technically accurate.

  • What's wrong with this IP address in this frame here from the show?

  • Yeah, so if the numbers in an IP address can only be from 0 to 255,

  • 275 is definitely too big.

  • Now, in their defense, this is probably a good thing,

  • because now they're not broadcasting some random, unsuspecting person's

  • actual IP address.

  • But there too there's a technical limitation.
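
The rule being applied here, four dot-separated numbers that are each between 0 and 255, can be written as a small check. This is a sketch: the function name `is_valid_ipv4` and the sample address starting with 275 are illustrative assumptions, since the exact address from the show is not given in the transcript.

```python
def is_valid_ipv4(text):
    """Return True if text looks like four dot-separated numbers in 0..255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each piece must consist only of plain decimal digits.
        if not (part.isascii() and part.isdigit()):
            return False
        # Each piece must fit in 8 bits.
        if not 0 <= int(part) <= 255:
            return False
    return True


tv_address_ok = is_valid_ipv4("275.3.6.28")     # hypothetical TV-style address
real_address_ok = is_valid_ipv4("172.217.4.36")
```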

  • But of course, we humans, when we visit websites using Safari or Chrome or IE

  • or Edge or whatever, we rarely if ever type

  • in the address of websites or servers by these numeric IP addresses.

  • Rather, we seem to use more user-friendly words,

  • like www.google.com, or harvard.edu, or yale.edu, or facebook.com, or the like.

  • And thankfully, there exists in the world

  • another system, another technology known as DNS-- Domain Name System.

  • And what DNS does is it simply converts numeric IP addresses

  • to more human-friendly host names, or fully qualified domain names.

  • Which is to say when I first sit down at my Mac or my PC on my home network

  • or Harvard's or Yale's and I type in something like www.google.com and hit

  • Enter, the way that my computer actually talks to google.com

  • is by way of those numeric IP addresses.

  • But the way my Mac or PC figures out what that IP address is of google.com

  • is it asks the local operating system-- Mac OS or Windows--

  • and if Mac OS or Windows doesn't know, my operating system asks

  • Harvard's network or Yale's network or Comcast's network,

  • wherever I physically am, because each of those networks

  • has their own DNS server, whose purpose in life is to convert IP

  • addresses to host names and host names to IP addresses.

  • And in the event that Comcast or Yale or Harvard, wherever I am,

  • doesn't know the answer to what is the IP address for www.google.com,

  • there exist root servers in the world.

  • Servers that are globally administered that, at the end of the day,

  • can at least help those DNS servers figure out what the answers are.
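
The chain of "ask locally, then ask the network's DNS server, then ask further up" can be modeled with a toy resolver. This is a deliberate simplification and not how real DNS software is written: the `Resolver` class is hypothetical, and real root servers refer resolvers onward to other servers rather than holding the final answer themselves.

```python
class Resolver:
    """A toy resolver: answer from a local cache, else ask an upstream resolver."""

    def __init__(self, name, records=None, upstream=None):
        self.name = name
        self.cache = dict(records or {})
        self.upstream = upstream

    def lookup(self, host):
        if host in self.cache:
            return self.cache[host]
        if self.upstream is None:
            raise KeyError(f"{self.name} cannot resolve {host}")
        answer = self.upstream.lookup(host)
        self.cache[host] = answer  # remember the answer for next time
        return answer


# Simplification: the top of the chain already knows the answer.
top = Resolver("top", {"www.google.com": "172.217.4.36"})
campus = Resolver("campus", upstream=top)
laptop = Resolver("laptop", upstream=campus)

ip = laptop.lookup("www.google.com")
```

After the first lookup, both the laptop and the campus resolver have cached the answer, which is why a later query can be answered without going all the way up again.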

  • And indeed, when you buy or when you really

  • rent a domain name, among the things you're doing

  • is informing the world via a set of standards

  • what your server's IP addresses are.

  • And so that's exactly what Google and others have done.

  • But of course, the data at the end of the day still has to get from my laptop

  • to Google.

  • And then my search results have to get from Google to me.

  • And how does that happen?

  • I mean, most of Google's servers are probably

  • out in Mountain View, California or maybe here on the East Coast somewhere,

  • if they have multiple servers.

  • Or maybe somewhere else in the world.

  • And indeed, big companies these days have servers all over the place.

  • So how does one little old laptop know how

  • to request search results from Google or how

  • to request my news feed from Facebook or how to do any number of other things

  • on the internet?

  • Well it does it by way of these things called routers.

  • It turns out that between me and most any other point on the internet,

  • there's one or more routers-- special servers

  • that could be this big, this big, any number of sizes these days.

  • They're just computers that typically live in data centers of some sort.

  • And these routers' purpose in life is to quite simply route information.

  • So when my Mac wants to talk to google.com,

  • my Mac constructs what we call a packet of information inside of which

  • is my request.

  • Give me all of your search results for cats,

  • for instance, if that's what I'm searching for.

  • And that packet is handed off to the nearest router.

  • That router happens to be, at this point in the story, at Harvard here.

  • Harvard has its own routers.

  • And Harvard's routers are somehow wired or wirelessly connected

  • to other routers in the world.

  • And those routers, typically no more than 30 routers away,

  • can get my data by routing it, routing it, routing it, routing it, routing it,

  • until it eventually reaches its correct destination.

  • In its simplest form, what you can think of these routers

  • as doing is looking at those IP addresses-- something

  • dot something dot something dot something-- and deciding, based

  • on those numbers, which direction to go.

  • So maybe if my IP address starts arbitrarily with 1,

  • maybe the packet should go that way to that router.

  • If it starts with 2, it should go that way

  • and be routed to that router, or that way, or that way.

  • It doesn't really matter.

  • This all happens dynamically thanks to software.

  • But routers just use those IP addresses to decide

  • which way to route your information.
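
The idea of a router picking a direction from the leading numbers of an address can be sketched as a lookup in a small routing table. The table below is entirely hypothetical (the network ranges and hop names are made up for illustration); the rule used, preferring the most specific matching prefix, is the usual longest-prefix-match convention.

```python
import ipaddress

# Hypothetical routing table: (network prefix, where to send matching packets).
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway"),
    (ipaddress.ip_network("140.247.0.0/16"), "campus router"),
    (ipaddress.ip_network("172.217.0.0/16"), "router toward Google"),
]


def next_hop(destination):
    """Pick the route whose prefix matches the destination most specifically."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


google_hop = next_hop("172.217.4.36")
other_hop = next_hop("31.13.80.36")
```

An address that matches no specific prefix falls through to the default route, which is how a router can forward packets for destinations it knows nothing special about.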

  • And we can actually see this.

  • Let me go ahead into CS50 IDE, and Macs and PCs and other computers

  • have the same software.

  • This will allow me to do a number of things at my command line here.

  • For instance, suppose that I wanted to check

  • what the IP address is for google.com.

  • Because if I want to send Google a letter, like a packet of information

  • requesting a whole bunch of search results about cats,

  • I need to know their IP address.

  • So what I can do at the command line here

  • is run a command that's pretty popular called nslookup-- name server lookup.

  • And I can type in something like www.google.com Enter,

  • and voilà, I seem to get the answer here that Google's IP address is apparently

  • 172.217.4.36.

  • And I know that answer, because Harvard's server--

  • and I know it's Harvard, because it starts with 140.247-- Harvard's DNS

  • server somewhere here on campus just knew that result.

  • But it's non-authoritative, in the sense that Harvard does not run google.com.

  • But Harvard has previously asked Google or someone else

  • for Google's IP address.

  • And so Harvard is answering the question for me, but not authoritatively.

  • It's a delegate who is relaying that information to me.

  • Now, suppose I want to do this for another site.

  • Let me go ahead and run nslookup on, say, www.facebook.com.

  • And you'll see here that Facebook's IP address is apparently 31.13.80.36.

  • And there's some more cleverness going on here.

  • It turns out there's other types of DNS records

  • or entries, starmini.c10r.facebook.com.

  • I don't really know what that means.

  • Facebook's a big enough company that there's probably

  • a lot more complexity going on.

  • But just out of curiosity, let me go ahead and copy this IP address here.

  • And in a browser, go to http:// that IP address.

  • Enter.

  • And voilà, I make my way to Facebook.com.

  • But it would be pretty bad for business if everyone in the world

  • had to know that Facebook's IP address is this.

  • Back in the day when people still used phone numbers,

  • you might have services like 1-800-COLLECT, C-O-L-L-E-C-T,

  • these mnemonics, so that it was easier for humans to remember phone numbers.

  • Thankfully, DNS does all of this automatically.

  • We just have to remember facebook.com, and DNS

  • does that conversion even more dynamically than the old school

  • 1-800-COLLECT tricks that the world adopted.

  • So that's how my computer would get the TO address.

  • So at this point in the story, if I want to send a request to google.com--

  • and this is just an envelope in which I might send a letter--

  • I need to have two pieces of information.

  • I need to have the TO address here, which for Google recall--

  • let me look it up again-- is 172.217.4.36.

  • And so I'm going to put that in the TO field of this envelope.

  • And now I need to know my own IP address.

  • So it turns out my computer has its own IP address.

  • And so when I send this request over the internet to Google,

  • I'm going to need to include my own IP address, which Windows or Mac

  • OS knows for me.

  • And so in the top corner of this envelope might

  • I write my actual IP address as well.

  • So now I have to actually route this information.

  • I first have to write Google a note, and I

  • might say on this blank sheet of paper, search for cats.

  • So this might be my search request.

  • And I'm going to go ahead and just bundle this

  • up, put this inside of this envelope.
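
The envelope can be pictured as a small record with a FROM address, a TO address, and the note inside. This is only a sketch of the metaphor, not the real IP packet layout; the `Packet` class is hypothetical, and the sender address `140.247.0.1` is a made-up example in the 140.247 range.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str      # FROM address, written in the corner of the envelope
    dst: str      # TO address
    payload: str  # the note placed inside the envelope


request = Packet(
    src="140.247.0.1",    # hypothetical address of my own computer
    dst="172.217.4.36",   # Google's address as looked up above
    payload="search for cats",
)
```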

  • But now I need to send this envelope or this so-called packet of information

  • to www.google.com.

  • And who knows where they are?

  • Maybe they're in California.

  • Maybe they're here on the East Coast.

  • Maybe they're somewhere else.

  • How do I route this information?

  • Well, turns out that Harvard has a router, again,

  • and Harvard's routers know of other routers.

  • And in turn, using the same command prompt,

  • can we actually see the path that my data should

  • take if I trace the route one query at a time from here to www.google.com.

  • And now what you see, one row at a time, is the following.

  • The first hop between me and Google is apparently this router here.

  • Row number one, mr-sc-1-gw-vl427.fas.net.harvard.edu.

  • Don't quite understand all of that, but I

  • do know just from knowing the people there, MR is the machine room.

  • So here at Harvard Science Center, there is a room with machines.

  • And that's where this server apparently is.

  • SC means Science Center.

  • GW by convention means gateway, which is just

  • a synonym for router, this kind of device.

  • And then I don't know what VL427 means.

  • But I do know that if we continue to the next hop here,

  • row two, Core Science Center gateway, or Core Science Center router.

  • So one router is connected to another router.

  • The third hop to which my data is delivered

  • is bdrgw2, which I know by convention means border gateway.

  • And so this data is being passed from hop one to two to three.

  • And once it goes there, it goes to hop four or router

  • number four, which is nox1sumgw.

  • So nox is the northern crossroads, which is a common peering point here

  • in the Northeast of the US, which just means lots of different internet

  • service providers interconnect their cabling and their technology

  • so as to route data to and from locations.

  • That's apparently where we're connected here.

  • Then I don't know where row five is, but it

  • looks like it's owned by Internet2, which

  • is a fast level of internet service that a lot of universities use.

  • Then routers 6, 7, 8, 9, 10, and 11 don't even disclose that they have names.

  • And they might not.

  • Routers don't and computers don't need to have

  • domain names or human-friendly terms, it's just useful for us humans.

  • But then lastly in hop 12, we finally make our way to whatever this is,

  • which seems to be some kind of synonym or alias for one of Google's servers.

  • So it seems that in just 12 hops, I can get data from here to Google.

  • And you know how long it takes to get from here to Google, wherever they are?

  • 9 milliseconds in total.

  • That's pretty darn fast to make a request from my computer

  • to some other computer, especially when that computer could be most anywhere

  • in the world or in the country.

  • Now, there's a lot of variability.

  • If you look at each of these rows-- 1.5 milliseconds, 1.9, 2.9, 25, 25, 25.

  • These aren't cumulative.

  • What my computer is doing is sending a packet to the first router,

  • then to the second router, then to the third router,

  • and measuring each time how long it takes.

  • So you really just get a rough sense, an average of sorts,

  • based on running this command like this.

  • So it seems to take between 10 and 30 milliseconds

  • to get my data from me to Google.

  • Now, I don't know where Google's servers are,

  • but I do know that UC Berkeley is in California,

  • and their servers I do think are in California.

  • So let's do another by tracing the route to www.berkeley.edu

  • where some of our friends there are.

  • That was super fast, even though it still took some 93 milliseconds.

  • So I'm going to infer that the server of Google's

  • that I'm talking to isn't all the way in California,

  • because to get to California in reality seems to take a good 100 or 90

  • milliseconds.

  • But let's see what we can glean here.

  • So Machine Room Science Center.

  • It's a core gateway.

  • It's a border gateway to Northern Crossroads, to an unnamed server.

  • Don't know what this one is.

  • But I can guess maybe what this is.

  • And notice in particular, router number six jumps from seven

  • milliseconds to like 49.

  • That's a pretty good distance.

  • And indeed, if you look at the name here, Hous, this I'm guessing

  • is a router that's in Houston, Texas, halfway across the country.

  • After that, maybe Los Angeles here in step 8.

  • And that, indeed takes a little more time.

  • So you can probably infer that it's farther away.

  • No name, no name.

  • This one here, I'm not really sure.

  • But now we seem to be in Berkeley's campus and CalWeb-- California web,

  • their server farm production.

  • Indeed, it takes some 90 milliseconds in total to get to Berkeley.

  • What about MIT?

  • MIT should be pretty close.

  • Let's do a trace route to MIT.edu.

  • And it takes-- all right, so it seems that two routers between us and MIT

  • aren't even cooperating, and that's their prerogative.

  • Not actually responding to our requests.

  • And so in about 10 milliseconds, we get to MIT's server,

  • which seems to be hosted by a third party

  • company called Akamai, which is a content delivery network,

  • among other things.

  • Which means MIT has outsourced to some third party

  • the physical hosting of their servers, which is not uncommon.

  • But let's do one more.

  • Let's do one like for CNN, but not here in the US.

  • But maybe .co.jp for the Japanese version of CNN's website.

  • Let's go ahead and run this.

  • Initially following the same route, Machine Room, Core Gateway, border.

  • And then voilà, 189 milliseconds later, we seem to have gotten to Japan.

  • But what can we glean from these numbers?

  • I'm not quite sure where all of these hops are.

  • But what is interesting to me is this one here between routers 8 and 9, what

  • do you notice?

  • That's a sizable jump in time.

  • And it's not a fluke.

  • It's not an anomaly, because indeed, it seems to persist.

  • So if we go farther and farther into this trace, then

  • indeed it's staying at 170 plus milliseconds.

  • So what do you think is in between routers number 8 and 9?

  • What would be between these?

  • I dare say there's an entire ocean between them.
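
Spotting the ocean crossing amounts to finding the largest increase between consecutive hops. The timings below are hypothetical, shaped only roughly like the trace described (small values near campus, then a jump to 170-plus milliseconds at hop 9); they are not the actual numbers from the screen.

```python
# Hypothetical per-hop times in milliseconds; index 0 is hop 1.
rtts = [1.5, 1.9, 2.9, 7.0, 7.2, 8.1, 9.0, 12.4, 171.3, 173.0, 189.0]


def largest_jump(times):
    """Return (hop_number, increase) for the biggest rise between consecutive hops."""
    jumps = [(times[i] - times[i - 1], i + 1) for i in range(1, len(times))]
    increase, hop = max(jumps)
    return hop, increase


hop, increase = largest_jump(rtts)
```

With these made-up values the biggest rise lands on hop 9, that is, on the link between routers 8 and 9.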

  • And we can see that thanks to this animation here,

  • there's a whole lot going on between points A and B,

  • including sometimes some pretty big cables and some pretty big oceans.

  • Let's take a look.

  • [MUSIC PLAYING]

  • All right, there's something about really cool music

  • that makes lines cool.

  • But indeed, those pictures capture the complexity

  • of all the wiring that's actually interconnecting all of the continents

  • and countries of the world that actually explains more technically some

  • of those differences in timings.

  • But at the end of the day, this packet has to get somewhere.

  • And suppose it does make its way over to Google's servers,

  • and Google receives this packet of information,

  • realizes, oh, someone is searching for cats again.

  • What does Google actually do in order to respond to that request?

  • Well, it turns out that Google too is going to use a whole bunch of packets.

  • And whereas previously, it was their address in the TO field

  • and my address in the FROM field, now they're

  • just going to simply reverse this so that the TO field now is to me,

  • the FROM field is from Google.

  • And inside of this envelope is going to be their various search results.

  • Now, it turns out we found one such search result here.

  • So suppose Google has decided to send me back this search result.

  • Maybe I was feeling lucky and clicked that button.

  • So I just get back one result. They're going to put the cat into the envelope.

  • But sometimes, the data is pretty big.

  • Sometimes this image might be kilobytes, megabytes, or if it's a video file,

  • could be gigabytes large.

  • And it would be kind of rude if Google, in order

  • to send me a really big response, shoved a really big piece of information

  • in its packet and then clogged the internet so-called tubes on their way

  • back to my laptop, thereby preventing anyone else from talking to Google

  • or nearby websites at that same moment in time.

  • So indeed, what Google and what many websites do

  • is they leverage a feature of IP, and its sister protocol

  • TCP that lets us fragment this.

  • And indeed, they will take this perfectly nice picture of a cat,

  • and they will fragment it, thanks to IP, into maybe four different pieces, each

  • of which is smaller than the original.

  • And inside of this envelope then goes one piece at a time.

  • And so if I put one such piece in this first envelope.

  • I can then clearly proceed to transmit this much more efficiently.

  • And then if I do the same with a second and a third

  • and maybe a fourth envelope, now Google can respond with one, two, three

  • and maybe more packets of information that make their way on the internet,

  • not even necessarily following the same path.

  • In fact, there's no guarantee that A to B

  • is going to be the same route as B to A. Things change dynamically over time.

  • But Google's going to have to include a little bit

  • more information on this envelope.

  • It's not sufficient anymore just to send me four envelopes.

  • What else had they probably best do so that I can actually

  • see my cat when it gets back to me?

  • I've got to know how many packets they sent me,

  • and I need to know in what order.

  • So it turns out that what Google is probably going to do

  • is something like this, write on this envelope the number of the packet

  • and really how many there are.

  • And this is a bit of a white lie, it's actually

  • done a little differently thanks to some other fields

  • that are inside of this envelope.

  • But we can think of it really as 1/4, 2/4, 3/4, 4/4,

  • so that if I only get two of these envelopes

  • or three of these envelopes or four, I now know definitively, wait a minute,

  • I only got 3/4 of my cat.

  • And moreover, the ones I did get, I know the order in which

  • I can reassemble those packets.
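
The "1/4, 2/4, 3/4, 4/4" labeling can be sketched in a few lines of Python. This follows the simplified envelope picture only; as noted above, real IP and TCP use different header fields. The names `fragment` and `reassemble` and the 12-byte piece size are assumptions made for the example.

```python
import random


def fragment(data: bytes, size: int):
    """Split data into (number, total, chunk) tuples, numbered from 1."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    total = len(chunks)
    return [(n, total, chunk) for n, chunk in enumerate(chunks, start=1)]


def reassemble(packets):
    """Rebuild the original data, or return None if any piece is missing."""
    if not packets:
        return None
    total = packets[0][1]
    by_number = {n: chunk for n, _, chunk in packets}
    if set(by_number) != set(range(1, total + 1)):
        return None  # e.g. only 3/4 of the cat arrived
    return b"".join(by_number[n] for n in range(1, total + 1))


cat = b"pretend these bytes are a picture of a cat"
packets = fragment(cat, 12)   # four pieces for this 42-byte message
random.shuffle(packets)       # pieces may arrive in any order
print(reassemble(packets) == cat)
```

Because every piece carries both its own number and the total, the receiver can put the pieces back in order and can also tell when one never arrived.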

  • Now, I mentioned this other protocol, TCP,

  • that indeed often works in conjunction with IP.

  • And you can think of IP as giving you features like addressing, assigning

  • every computer in the world a unique address, and fragmentation,

  • being able to chop things up.

  • But TCP further allows us to associate sequence numbers with packets

  • that allows me the receiver to know, wait a minute,

  • I'm missing one or more packets.

  • So TCP is often said to guarantee delivery, and it is this protocol.

  • So long as your Mac or your PC or your computer

  • supports it, which they all do these days.

  • If it determines, hey, wait a minute, I'm missing this packet,

  • TCP is the protocol, the set of conventions, that say Google,

  • I need this packet again or these packets again,

  • and they will be retransmitted.
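
The receiver's side of that conversation can be sketched roughly as follows in Python. This is a simplification: real TCP numbers bytes rather than whole packets and relies on acknowledgments and timers. `missing_packets` is a hypothetical helper for illustration.

```python
def missing_packets(received_numbers, total):
    """Return the sorted packet numbers (1..total) that never arrived."""
    return sorted(set(range(1, total + 1)) - set(received_numbers))


received = [1, 2, 4]   # packet 3 was lost somewhere along the way
to_request_again = missing_packets(received, total=4)
print("Ask the sender to retransmit:", to_request_again)
```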

  • Now, you pay a price in terms of performance,

  • because now you might have to wait for the rest of the cat.

  • So there might be a bit of a latency in order to get back that response.

  • And that might not always be desirable.

  • And indeed, I can think of some scenarios,

  • like if you're watching a baseball game on TV or soccer or football

  • where you're watching a live stream-- or maybe it's the Oscars or the Emmys,

  • or something live, where you really want to stay in sync with that broadcast,

  • even if sometimes there's network issues or there's

  • buffering-- you don't necessarily want it to buffer.

  • You don't necessarily want lost information to be retransmitted.

  • You'd rather just lose a few seconds of the show

  • so that at least you're staying current, especially if you're there

  • with a bunch of other people and it would be just silly if you

  • gradually over time drift out of date.

  • And so the rest of the world is finished watching the show or the game,

  • and you're still chugging along.

  • So as an alternative to TCP, there's other protocols, one of which

  • is called UDP that's very often used for live streaming

  • and for video and applications like that, where you really just want

  • the software to forge ahead, rather than wait for some new data

  • to get transmitted.

  • But there's other things we can do with the internet.

  • And indeed, there's lots of things we ourselves do every day.

  • It's not just the web, like in downloading cats from Google.

  • But there's email, and there's Skype, and Facebook Messenger,

  • and any number of other services.

  • So how in the world does a computer upon receiving a packet of information

  • know if it is an email or if it is a web page, or put more concretely,

  • how do I know if I should show this user this cat in his or her email program

  • or in his or her browser, which might be the same?

  • In other words, how do I distinguish between one type of program

  • running on the internet from another?

  • Well, turns out that TCP also provides a standardization of services.

  • And that is just a fancy way of saying that in addition to saying

  • on this envelope to who it is and what number it is and from whom it is,

  • I also need to uniquely identify the type of service

  • whose information is in that packet.

  • And I do this just by writing a number.

  • And I typically write one of these numbers.

  • 80 if that packet is meant to be web information.

  • So HTTP is the string that most of us type most every day-- or at least

  • see these days, even though our browsers generally fill it in

  • if we don't explicitly type it.

  • It turns out that the world decided years ago

  • that if you want to send information from yourself

  • to a web server like Google to request cats,

  • you had better write the number 80 in the TO field in addition

  • to Google's IP address.

  • This way, Google knows it's not an email destined for Gmail,

  • knows it's not a message destined for Google Hangouts or the like.

  • Google servers can actually distinguish this

  • as an HTTP request or web request from any number of other services.

  • If you're using encryption, HTTPS, that special number

  • that the world standardized on is 443.

  • You rarely see this, but it's on the envelopes

  • that your Macs or PCs are actually sending to Google servers.

  • Meanwhile, there's other port numbers, so to speak.

  • If you've ever heard of FTP, file transfer protocol.

  • This is software that's not recommended anymore,

  • because it's completely unencrypted.

  • But it's still unfortunately popular in some applications

  • or with some less expensive web services.

  • 21 is the number that identifies that service.

  • And that just means inside of this packet

  • is information related to transferring files, not a web page per se.

  • 22, SSH, Secure Shell.

  • This is a very popular protocol, at least among computer scientists

  • and others, that allows you to run commands on your Mac or PC

  • on a remote server, but in an encrypted way.

  • And those kinds of packets contain the number 22.

  • SMTP-- Simple Mail Transfer Protocol-- is what's generally used

  • for outbound email.

  • So if you send an email, your envelopes have 25 on them.

  • And then lastly, DNS is again that service

  • that converts host names to IP addresses and vice versa.

  • So when your Mac or PC asks the world, hey, wait a minute,

  • what is the IP address for www.google.com?

  • That envelope has the number 53 on the outside.

  • And dot dot dot, there's dozens or even hundreds of these other things,

  • for Skype and for Google Hangouts and the like.

  • But these here are just some of the most common.
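
The port numbers mentioned above fit naturally into a lookup table. Here is a small Python sketch; the dictionary contains only the ports named in this lecture, and `service_for` is a hypothetical helper name.

```python
# Well-known port numbers mentioned in the lecture.
WELL_KNOWN_PORTS = {
    21: "FTP",
    22: "SSH",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    443: "HTTPS",
}


def service_for(port: int) -> str:
    """Return the service name conventionally associated with a port."""
    return WELL_KNOWN_PORTS.get(port, "unknown")


print(service_for(80))
print(service_for(443))
```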

  • So the envelope, at the end of the day, has

  • a decent amount of information on it.

  • The TO address, the FROM address, and that TO address furthermore

  • has a port number associated with it.

  • And then, if it's been fragmented especially,

  • there's got to be some kind of number that

  • identifies the packet itself so that you can detect if something is missing.
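
Put together, the information written on one envelope could be modeled like this in Python. The field names are assumptions chosen to mirror the envelope metaphor rather than actual IP or TCP header names, and the addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    to_address: str     # destination IP address
    to_port: int        # identifies the service, e.g. 80 for HTTP
    from_address: str   # sender's IP address
    number: int         # which piece this is, e.g. 1 of 4
    total: int          # how many pieces there are in all
    payload: bytes      # the contents of the envelope


piece = Envelope(
    to_address="192.0.2.1",        # placeholder address
    to_port=80,
    from_address="198.51.100.7",   # placeholder address
    number=1,
    total=4,
    payload=b"first quarter of the cat",
)
print(f"{piece.number}/{piece.total} to {piece.to_address}:{piece.to_port}")
```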

  • But there's kind of a side effect, or really a feature

  • of having this level of detail on each of these envelopes.

  • You've probably heard of a firewall.

  • Maybe not in the real world.

  • In the real world, a firewall is literally

  • a wall that's meant to block fire, typically

  • in like strip malls and offices or stores that

  • are next to each other physically.

  • A firewall is meant to keep a fire that breaks out

  • in one store from traveling into another store, creating even more damage.

  • But in the software world, a firewall is a piece of software

  • that really keeps packets out that you don't want coming in,

  • or keeps packets in that you don't want going out.

  • So a firewall might be used by parents to prevent kids

  • from accessing Facebook or Google, or silly things

  • during the day for instance, if they want them focusing on other things.

  • It might be used by universities or corporations

  • to block access to certain websites that you simply

  • don't want your students or your staff actually accessing.

  • It might be used to keep corporate data inside,

  • so that nothing accidentally leaks out-- financial information, or emails,

  • or the like.

  • You can use a firewall to block outbound access as well.

  • But this invites the question then, how is a firewall implemented?

  • Well, it's not all that hard, really.

  • Because if the internet is just a whole bunch of these packets flying

  • back and forth between computers, between routers, leaving and entering

  • our own network, whether that's my home or my campus or my company,

  • I could just have my routers, for instance,

  • look at every one of those envelopes, look at the TO address,

  • maybe look at the FROM address, and just blacklist certain addresses.

  • Indeed, if I know that I don't want my employees accessing Facebook,

  • I could, for instance, just say to my routers, configure my routers,

  • do not allow any data going to or from IP address 31.13.80.36.

  • Now, it might be easier said than done, because in reality, Facebook probably

  • has multiple IP addresses.

  • So we might have to grow this list or dig a little deeper in order

  • to block them.
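
The address-based rule just described might look something like this in Python. It is a sketch of the idea only, not how any particular router is configured; `BLOCKED_ADDRESSES` and `allowed` are hypothetical names, and the non-blocked addresses are documentation placeholders.

```python
import ipaddress

# Addresses we refuse to exchange packets with. In practice the list would
# need to grow, since a large site typically has many addresses.
BLOCKED_ADDRESSES = {ipaddress.ip_address("31.13.80.36")}


def allowed(from_ip: str, to_ip: str) -> bool:
    """Return False if either end of the packet is on the blocklist."""
    src = ipaddress.ip_address(from_ip)
    dst = ipaddress.ip_address(to_ip)
    return src not in BLOCKED_ADDRESSES and dst not in BLOCKED_ADDRESSES


print(allowed("192.0.2.10", "31.13.80.36"))    # outbound to a blocked address
print(allowed("192.0.2.10", "198.51.100.7"))   # neither end is blocked
```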

  • And better yet, we could potentially look inside of the envelopes themselves

  • to see, is this a Facebook packet?

  • But if they're using encryption, which they do by default

  • these days, that might not really be feasible.

  • So we can have kind of a heavy-handed solution

  • there, and just block everything we think is Facebook.com.

  • But certainly, things might leak out potentially over time if things change.

  • But what else could we do?

  • Suppose that I really don't want people Skyping during the day,

  • or I don't want people using Facebook Messenger,

  • or some software that has its own unique TCP port number

  • that some company or the world has standardized on.

  • You could block all outbound email by just blocking port 25, it would seem,

  • or a few other ports that are popular.

  • You could block all web access by blocking 80 and 443.

  • You could block all DNS traffic, if you really want.

  • And indeed, a lot of companies do this, especially

  • like Starbucks kind of places, internet cafes in airports and the like.

  • Sometimes they only want you using their DNS server,

  • not your own company's or your own home's.

  • And so they can block access to any DNS server other than their own.
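
The port-based rules above can be sketched in Python as well. This assumes a simplified firewall that only sees the destination address and port of outbound packets; `ALLOWED_DNS_SERVER` is a made-up placeholder for the network's own DNS server.

```python
BLOCKED_PORTS = {25}                # e.g. block outbound email (SMTP)
DNS_PORT = 53
ALLOWED_DNS_SERVER = "192.0.2.53"   # hypothetical: the network's own DNS server


def allow_outbound(to_ip: str, to_port: int) -> bool:
    """Decide whether an outbound packet may leave the network."""
    if to_port in BLOCKED_PORTS:
        return False
    # DNS is allowed only when it goes to the network's own server.
    if to_port == DNS_PORT and to_ip != ALLOWED_DNS_SERVER:
        return False
    return True


print(allow_outbound("198.51.100.7", 25))    # outbound email
print(allow_outbound("198.51.100.7", 53))    # someone else's DNS server
print(allow_outbound("192.0.2.53", 53))      # the permitted DNS server
print(allow_outbound("198.51.100.7", 443))   # encrypted web traffic
```

Adding 80 and 443 to `BLOCKED_PORTS` would block web access in the same way.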

  • This is unfortunately often or sometimes for advertising

  • reasons, so that they can actually keep track of what you're accessing

  • and where and why-- or where, at least.

  • But it's all possible technologically with this underneath the hood.

  • So what are some of the defenses in place,

  • especially when you want to visit some site that isn't necessarily encrypted?

  • Or maybe you want to visit some site that is blocked,

  • and you want to simply be able to work around this, because you're traveling

  • or you need to be able to access something privately

  • at your home or your work.

  • Well it turns out, that there are services called VPNs or Virtual Private

  • Networks.

  • And Harvard has one VPN at vpn.harvard.edu.

  • And Yale has one as well at access.yale.edu.

  • And this is simply software that you generally download to your phone

  • or your computer that allows you to connect via some protocol and some port

  • to your company or to your home's network, but in an encrypted way.

  • So a VPN gives you an encrypted tunnel, so to speak,

  • so that you are connected to the internet.

  • That's a precondition.

  • You have to get on the internet itself.

  • But then you configure your Mac or PC to route

  • all-- in theory-- of your internet traffic through the VPN.

  • So even if I'm just visiting Gmail or Facebook or whatever on my Mac,

  • if I'm connected to Harvard's VPN, all of that traffic by design

  • is going through Harvard.edu first, and then it's

  • going out to Facebook or Google or wherever it's destined.

  • Similarly, if I'm traveling in a foreign country that

  • happens to block a lot of internet access, if they do allow VPN access,

  • I can, in my hotel room or wherever, connect to Harvard or to Yale, route

  • all of my internet traffic through Harvard or Yale, and then from Yale

  • or Harvard to wherever I'm going on the internet.

  • And the upside of this is that it's entirely encrypted,

  • which means no one at that company or that country in theory

  • knows what data is going through the tunnel.

  • But it also potentially costs me a good amount of time.

  • We've seen that we're really only talking milliseconds,

  • but hundreds of milliseconds can certainly add up.

  • So if I'm abroad, for instance, trying to connect to some website that's

  • going from that country to Harvard, to the destination, back to Harvard,

  • back to the country I'm in, your internet connectivity might be slower,

  • but at least it's not actually permanently blocked.

  • So if you've ever heard of friends of yours actually accessing services

  • like Netflix or Hulu, that for licensing reasons,

  • do restrict you typically to being in this country--

  • this is why you might have read that Hulu and Netflix and others are

  • cracking down on people using VPNs, whether it's

  • Harvard's or Yale's or a third-party company's,

  • so as to circumvent those licensing restrictions.

  • But technologically, all it's doing is giving you

  • an encrypted tunnel between you and someone

  • you have an affiliation with, like Harvard or Yale,

  • and encrypting all of your traffic in between there,

  • and routing all of your traffic through it.

  • So with that said, we've looked at DNS, and we've looked at DHCP,

  • and we've looked at routers.

  • And there's other hardware still, whether it's

  • in your home or campus or office, there's

  • things like switches, which are fairly simple devices that just have lots

  • of ethernet jacks, so to speak, that you can plug physical cables into,

  • and those cables can then intercommunicate,

  • so that you can wire computers together en masse.

  • There are things called access points or APs.

  • Those are the things around campus that have the little bunny ear

  • antennas that are often blinking.

  • Those are the wireless access points.

  • And access points often have firewalls, often have routing software built in.

  • So the line is increasingly blurry these days as to what these small devices do.

  • So it really is the services that matter.

  • And indeed, while a little dated, I thought

  • it would be fun to take a look now at a longer form version of the 60

  • second trailer of Warriors of the Net that

  • was made a few years ago to paint a more visual picture of how

  • the internet works.

  • It definitely takes some liberties with shall we say accuracy.

  • But it also helps paint a picture of what

  • really is going on underneath the hood.

  • So let's take a look at the internet.

  • [MUSIC PLAYING]

  • [VIDEO PLAYBACK]

  • -For the first time in history, people and machinery

  • are working together, realizing a dream.

  • A uniting force that knows no geographical boundaries, without

  • regard to race, creed, or color.

  • A new era, where communication truly brings people together.

  • This is the Dawn of the Net.

  • Want to know how it works?

  • Click here to begin your journey into the net.

  • Now exactly what happened when you clicked on that link?

  • You started a flow of information.

  • This information travels down into your own personal mail

  • room, where Mr. IP packages it, labels it, and sends it on its way.

  • Each packet is limited in its size.

  • The mailroom must decide how to divide the information, and how to package it.

  • Now the package needs a label, containing important information,

  • such as sender's address, receiver's address, and the type of packet it is.

  • Because this particular packet is going out on to the internet,

  • it also gets an address for the proxy server, which has

  • a special function, as we'll see later.

  • The packet is now launched onto your Local Area Network, or LAN.

  • This network is used to connect all the local computers, routers,

  • printers, et cetera for information exchange

  • within the physical walls of the building.

  • The LAN is a pretty uncontrolled place, and unfortunately, accidents

  • can happen.

  • The highway of the LAN is packed with all types of information.

  • These are IP packets, Novell packets, AppleTalk packets.

  • They're going against traffic, as usual.

  • The local router reads the address, and if necessary,

  • lifts the packet onto another network.

  • Ah, the router.

  • A symbol of control in a seemingly disorganized world.

  • [METHODICAL MUTTERING]

  • There he is, systematic, uncaring, methodical, conservative,

  • and sometimes not quite up to speed.

  • But at least he is exact, for the most part.

  • As the packets leave the router, they make their way

  • into the corporate intranet and head for the router switch.

  • A bit more efficient than the router, the router switch

  • plays fast and loose with IP packets, deftly routing them along the way.

  • A digital pinball wizard, if you will.

  • [ERRATIC MUTTERING]

  • As packets arrive at their destination, they're

  • picked up by the network interface, ready to be sent to the next level.

  • In this case, the proxy.

  • The proxy is used by many companies as sort of a middleman

  • in order to lessen the load on their internet connection,

  • and for security reasons as well.

  • As you can see, the packets are all of various sizes,

  • depending on their content.

  • The proxy opens the packet and looks for the web address or URL.

  • Depending upon whether the address is acceptable,

  • the packet is sent on to the internet.

  • There are, however, some addresses which do not

  • meet with the approval of the proxy.

  • That is to say, corporate or management guidelines.

  • These are summarily dealt with.

  • We'll have none of that.

  • For those who make it, it's on the road again.

  • Next up, the firewall.

  • The corporate firewall serves two purposes.

  • It prevents some rather nasty things from the internet

  • from coming into the intranet, and it can also

  • prevent sensitive corporate information from being sent out onto the internet.

  • Once through the firewall, a router picks up the packet,

  • and places it onto a much narrower road, or bandwidth, as we say.

  • Obviously, the road is not broad enough to take them all.

  • Now, you might wonder what happens to all those packets

  • which don't make it along the way.

  • Well, when Mr. IP doesn't receive an acknowledgement

  • that a packet has been received in due time,

  • he simply sends a replacement packet.

  • We are now ready to enter the world of the internet, a spider

  • web of interconnected networks which span our entire globe.

  • Here, routers and switches establish links between networks.

  • Now, the net is an entirely different environment

  • than you'll find within the protective walls of your LAN.

  • Out here, it's the Wild West.

  • Plenty of space, plenty of opportunities,

  • plenty of things to explore and places to go.

  • Thanks to very little control and regulation,

  • new ideas find fertile soil to push the envelope of their possibilities.

  • But because of this freedom, certain dangers also lurk.

  • You'll never know when you'll meet the dreaded ping of death.

  • A special version of a normal request ping, which some idiot thought up

  • to mess up unsuspecting hosts.

  • The path our packets take may be via satellite, telephone lines, wireless,

  • or even transoceanic cable.

  • They don't always take the fastest or shortest routes possible,

  • but they will get there eventually.

  • Maybe that's why it's sometimes called the world wide wait.

  • But when everything is working smoothly, you

  • can circumvent the globe five times over at the drop of a hat, literally.

  • And all for the cost of a local call or less.

  • Near the end of our destination, we'll find another firewall.

  • Depending upon your perspective as a data packet,

  • the firewall could be a bastion of security or a dreaded adversary.

  • It all depends on which side you're on and what your intentions are.

  • The firewall is designed to let in only those packets that meet its criteria.

  • This firewall is operating on ports 80 and 25.

  • All attempts to enter through other ports are closed for business.

  • Port 25 is used for mail packets, while port 80 is the entrance for packets

  • from the internet to the web server.

  • Inside the firewall, packets are screened more thoroughly.

  • Some packets make it easily through customs,

  • while others look just a bit dubious.

  • The firewall officer is not easily fooled,

  • such as when this ping of death packet tries to disguise itself

  • as a normal ping packet.

  • For those packets lucky enough to make it this far,

  • the journey is almost over.

  • It's just a line up on the interface to be taken up into the web server.

  • Nowadays, a web server can run on many things,

  • from a mainframe to a webcam to the computer on your desk.

  • Why not your refrigerator?

  • With a proper set up, you can find out if you

  • have the makings for chicken cacciatore or if you have to go shopping.

  • Remember, this is the dawn of the net.

  • Almost anything's possible.

  • One by one, the packets are received, opened, and unpacked.

  • The information they contain, that is, your request for information,

  • is sent on to the web server application.

  • The packet itself is recycled, ready to be used again, and filled

  • with your requested information, addressed, and sent out on its way

  • back to you, back past the firewall, routers,

  • and on through to the internet, back through your corporate firewall,

  • and onto your interface, ready to supply your web browser with the information

  • you requested, that is, this film.

  • Pleased with their efforts, and trusting in a better world,

  • our trusty data packets ride off blissfully

  • into the sunset of another day, knowing fully they

  • have served their masters well.

  • Now isn't that a happy ending?

  • [END PLAYBACK]

  • DAVID MALAN: All right, so that is how the internet works.

  • And as has been our tendency over the past few weeks,

  • now that we know how we can get data from point A to point B,

  • we can abstract above that, and just take

  • for granted now that we can move data from point A to point B

  • and start moving the actual data.

  • So that invites the question now of what is inside this envelope.

  • When I get a response back from Google containing a whole bunch of cats,

  • or when I get back my news feed from Facebook, or my inbox from Google.

  • Well, inside of these packets quite often

  • are messages that conform to HTTP, the Hypertext Transfer Protocol.

  • So this is just one of those services that we alluded to earlier.

  • Among them also were SSH, and DNS, and SMTP, and yet others.

  • But HTTP is perhaps by far the most common one insofar

  • as we use the web so much these days.

  • So inside of HTTP, there are certain types

  • of messages, messages that conform to certain patterns by which we

  • get information.

  • Now, what is the P in HTTP?

  • HTTP, Hypertext Transfer Protocol.

  • Well, let me borrow Arthuro over here.

  • And we have this silly human convention of course

  • that when you meet someone for the first time or the first time in a while,

  • you say, oh, hi, my name is David.

  • Nice to meet you, Arthuro.

  • And we exchange hands.

  • And when I put out my hand, Arthuro knows to put out his hand.

  • And then we do this silly handshake.

  • Why is that?

  • Well, it's just a protocol.

  • It's a convention.

  • It's a set of conventions that we humans for better or for worse

  • have adopted by which we greet each other.

  • Similarly do computers have protocols via which they communicate,

  • and sets of conventions that govern how you start to communicate

  • and how you finish communicating.

  • So what do those messages actually look like?

  • The simplest of them is quite literally this verb

  • here, get, whereby inside of this envelope, when

  • I'm requesting information of Google for the first time--

  • and indeed, I put that message before, search

  • for cats-- that actually has a certain message at the top of it, really,

  • that is literally get.

  • There's a little more information, but at the end of the day, it just is get.

  • Specifically, these are the first couple of lines

  • inside of any request that my browser makes of a web server,

  • like in this case, harvard.edu.

  • If I want to get the default home page of Harvard, I literally,

  • inside of my envelope, write this message-- GET / HTTP/1.1,

  • which is the latest version of HTTP that people use.

  • Then below that, I specify the host that I want to talk to, just in case

  • Harvard or Google or whoever has multiple domain names physically

  • running on the same servers, which is possible.

  • So I say Host: www.harvard.edu.

  • And then maybe there's some other text.

  • But this first line or two is really the most important.
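
Those first two lines can be written out exactly as they travel inside the envelope. Here is a minimal Python sketch that only builds the request text and sends nothing; `build_get_request` is a hypothetical helper name. HTTP ends each line with a carriage return and line feed, and a blank line marks the end of the headers.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # the request line: verb, path, version
        f"Host: {host}",          # which site on this server we want
        "",                       # blank line: end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.harvard.edu")
print(request.decode("ascii"))
```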

  • And then what comes back from the server,

  • whether it's being sent to Harvard or being sent to Yale,

  • is a response that hopefully says, literally, OK, inside

  • of which is the cat or inside of which is the inbox for Gmail

  • or inside of which is my news feed from Facebook.

  • All of which typically are in this language here,

  • HTML-- HyperText Markup Language.

  • So whereas HTTP is a protocol, like a sort of handshake agreement

  • that governs that when I want to request information of a server,

  • I should say GET and then a few other words,

  • and then the server should respond with OK and a few other words,

  • HTML is the language in which the actual web

  • pages that are coming back from Google or Facebook or Harvard or Yale

  • are actually written in.

  • It's not a programming language like C or Scratch.

  • It's a markup language, as we'll see, that

  • really controls formatting and layout.

  • There aren't ifs and loops and other such constructs.

  • But that's what's below the dot dot dot when the response comes back

  • from Harvard or Yale or Google is this language HTML.

  • Now, 200 is a status code, so to speak, that we almost never actually see

  • from a server.

  • But odds are, some of you have seen at least one of these status codes before.

  • And perhaps the most obvious or the most familiar

  • is probably this one here, when you've requested some web page,

  • and either it doesn't exist anymore or you have a typo more commonly

  • or the URL is broken for some reason.

  • Odds are you have literally seen the status code 404,

  • because the server is just showing it to you.

  • But at a lower level, these numbers are actually

  • typically sent in these packets of information

  • back and forth from the server to me.

  • But we'll see before long that you can use status codes like 301 and 302

  • to induce redirects, so to speak.

  • If you want to send the user from one URL to another-- maybe

  • the domain name is changed-- you can do that there.

  • For efficiency, a server can say 304, not modified.

  • As in, you already asked me for this page.

  • It hasn't been modified since you asked me for it,

  • I'm not going to send it to you again, thereby

  • saving a bit of time and bandwidth.

  • Unauthorized (401) or forbidden (403) generally means

  • that you don't have access to the file for some reason.

  • And 500's actually pretty bad.

  • So we'll probably induce this ourselves before long when we actually write

  • programs that run on a web server.

  • But 500 means there's generally a problem in your code

  • that's supposed to be serving up web content to browsers.
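
Python's standard library happens to know these status codes, which makes for a quick way to look up the phrase that goes with each number. This is just a lookup sketch and nothing is sent over the network; `phrase_for` is a hypothetical helper name.

```python
from http import HTTPStatus


def phrase_for(code: int) -> str:
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase


# The status codes mentioned in this part of the lecture.
for code in (200, 301, 302, 304, 401, 403, 404, 500):
    print(code, phrase_for(code))
```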

  • So let's actually see these kinds of things too.

  • It turns out that I can pretend to be a browser at my command line here.

  • In fact, I can use a program called Telnet,

  • which is an older program, similar in spirit to something called SSH,

  • which I mentioned earlier, but it's not encrypted.

  • But it allows me to connect to a remote server specifically on a certain port.

  • So I for instance, can connect to harvard.edu and on port 80

  • specifically.

  • I could actually with textual commands send emails to Harvard in this way,

  • or send chat messages if they support that.

  • But for now, we're focusing only on HTTP, the unencrypted version.

  • And if I go ahead and hit enter, you'll see

  • that I'm connected to www.harvard.edu.cdn.cloudflare.net,

  • which is curious.

  • But it turns out-- and we could see this if we poked around with nslookup again.

  • It turns out that Harvard is also outsourcing its home

  • page to a third party CDN-- Content Delivery Network-- called Cloudflare,

  • so Harvard's servers really live elsewhere.

  • And now I talked too long and the connection got automatically closed.

  • So let me go ahead and redo this, and just pretend to be a browser by typing

  • GET / HTTP/1.1, then Host: www.harvard.edu, and then Enter twice.

  • And it flew across the screen, but let me scroll back up to the top.

  • This is-- even though it might look cryptic to you at the moment

  • if you've never made web pages before-- this is this language called HTML.

  • And it's quite a lot of HTML, so let me keep scrolling up and up and up and up.

  • Until hopefully if we go up high enough-- oh, I've exceeded my buffer.

  • So I'm going to do this differently.

  • I'm going to go ahead and-- you might recall

  • from a past problem, where you can actually redirect the output to a file.

  • So I'm going to go ahead and save this in a file called output.txt.

  • GET / HTTP/1.1, Host: www.harvard.edu, Enter, Enter.

  • And now I'm going to go ahead and open this file, which is here.

  • And you can see that what just happened was this.

  • The server responded with 200 OK, which is great.

  • And then the date of the server in Greenwich Mean Time.

  • And then a bunch of information.

  • Cookies, we'll come back to these before long.

  • But those will be germane to when we actually

  • write our own software for the web.

  • Drupal, seems that Harvard's website is using

  • Drupal, a popular content management software for websites.

  • And then there's some other stuff about caching

  • and when the site expires and so forth.

  • This is a little strange.

  • Harvard's website apparently expired in 1978.

  • But more on that another time.

  • And so there's some interesting HTTP headers

  • besides things like the host field that we sent and the GET and the OK

  • that I mentioned earlier as well.
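
  • To make the shape of that response concrete, here is a small sketch that splits the top of an HTTP response into its status line and headers. SAMPLE_RESPONSE_HEAD is invented sample text in the same format, not Harvard's actual output, and parse_response_head is a hypothetical helper name.

```python
# Invented sample in the standard format: status line, headers, blank line.
SAMPLE_RESPONSE_HEAD = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 01 Jan 2018 00:00:00 GMT\r\n"
    "Content-Type: text/html\r\n"
    "Expires: Sun, 19 Nov 1978 05:00:00 GMT\r\n"
    "\r\n"
)


def parse_response_head(head: str):
    """Return (status code, reason phrase, headers dict with lowercased names)."""
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        if not line:  # the blank line marks the end of the headers
            break
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


status, reason, headers = parse_response_head(SAMPLE_RESPONSE_HEAD)
print(status, reason)
print(headers["expires"])
```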

  • Now, Telnet is not a very user-friendly way to do this.

  • I'm going to actually redo this with a different command, Curl,

  • whereby I can do a curl -I, and I'm going to then do

  • the full URL-- www.harvard.edu, Enter.

  • And now what's nice with curl

  • is that I don't actually see the HTML.

  • I only see in this case the HTTP headers, which are still quite a few,

  • but we can now at least see them a little more readily.

  • In fact, let me go and do the same now for yale.edu,

  • and see if we can glean any differences in their servers.

  • There we go here.

  • So the headers that are coming back for Yale are these that I've highlighted.

  • And it looks too that there's some interesting stuff going on.

  • It seems that Yale also uses Drupal.

  • So it seems that both universities are doing something rather familiar.

  • But most of this information is not all that useful.

  • But it is useful if maybe we do this.

  • What if we visit, for instance-- why don't we

  • go to HTTP-- how about we go to reference.cs50.net, which you might

  • use as an alternative to man pages.

  • And this is a little curious.

  • It moved permanently.

  • This is not 200 OK.

  • Moved permanently.

  • Where did it go?

  • Well, wait a minute, let me go ahead and highlight that URL.

  • And let me go ahead in another tab and just go there.

  • OK, it's there.

  • So where did it move to?

  • And in fact, if I look at the domain again,

  • it is indeed there, but notice this.

  • Almost all of CS50's websites actually run not over HTTP per se

  • but HTTPS, where the S means secure, whereby

  • all of our websites for the most part are encrypted.

  • But that's not what I typed.

  • I just went to http://reference.cs50.net.

  • And yet when I do that with this command line interface, which

  • mimics the behavior of a browser, if I visit HTTP, I'm told by CS50's server,

  • moved permanently, status code 301.

  • But notice this one other header that's kind of interesting-- location.

  • This location header-- and a header to be clear

  • is just a word, a colon, and then a value.

  • This header specifies where we move to.

  • So this seems to be a mechanism whereby using HTTP

  • headers-- sort of messages inside the envelope

  • that the human doesn't really see, but that the browser does understand.

  • This seems to be a way that we can forcibly

  • redirect all users from the insecure version of our website

  • to the secure version, so that thereafter, all of the information

  • is secure.
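
  • Here is a minimal sketch of how a client might act on that header. The function name next_url and the exact Location value are assumptions for illustration; the idea is simply that a 3xx status plus a Location header tells the client where to go next.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(status: int, headers: dict, current_url: str) -> str:
    """Return the URL to request next, following a Location header if present."""
    if status in REDIRECT_CODES and "location" in headers:
        # urljoin also copes with a relative Location such as "/new/path".
        return urljoin(current_url, headers["location"])
    return current_url


# Assumed header value; the lecture only shows that the target uses HTTPS.
target = next_url(
    301,
    {"location": "https://reference.cs50.net/"},
    "http://reference.cs50.net/",
)
print(target, urlsplit(target).scheme)
```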

  • And frankly, there's not all that much private information going on there.

  • But if you don't really want the whole world or the NSA

  • or Harvard or Yale knowing what pages, what functions you

  • need to look up on reference.cs50.net, by forcing everything

  • to HTTPS, in theory, everything is perfectly secure now so

  • that only you know what pages you're visiting.

  • And we, since we run the server.

  • But no one in between.

  • And indeed, that's one of the biggest values of using HTTPS-based URLs,

  • so that even if there is some man in the middle,

  • so to speak, a bad guy, an adversary between you and that remote server,

  • whether it's here on campus or in Starbucks or the airport

  • or some random adversary on the internet, he or she in theory

  • should not be able to see anything between points A and B

  • if you are, as before, using a VPN between those points or, too,

  • using a protocol like HTTPS that by design is encrypting information.

  • And suffice it to say the encryption is far fancier than Caesar or Vigenère.

  • But it is indeed similar in spirit, where

  • those zeros and ones going back and forth are scrambled in some way

  • that only you and the point B server can actually decode them or decrypt them.

  • So let's visit an actual website now, Google.

  • But before we do that, let's turn off some of the more modern features

  • by going to Settings, going to Search Settings,

  • and turn off so-called instant results.

  • Because for our purposes today, instant results

  • use a technology or language called JavaScript,

  • which we'll get to in a few weeks' time, but for now it's

  • just going to be a distraction from the underlying HTTP feature.

  • So I'm going to go ahead and indeed never show instant results.

  • So that now when I search for something like cats on google.com and hit Enter,

  • I'm going to find myself at a fairly long URL, indeed this URL here.

  • And I have no idea what most of this URL means, not knowing

  • how Google works underneath the hood.

  • But I'm looking for some familiar patterns.

  • And indeed, if I pretty much a little ignorantly but hopefully cleverly just

  • delete anything I don't understand, I'm going to deliberately leave myself

  • with just the essence of this URL.

  • So notice, I didn't type this URL.

  • I ended up at this URL after I typed in cats to that search box and hit Enter.

  • Now I found myself in a really long URL and then

  • I just started deleting things I didn't understand to distill

  • this URL into quite simply this.

  • https://www.google.com/search?q=cats.

  • Well, it turns out that much like in the world of C,

  • you have functions from CS50 like get_string and get_int,

  • or if you implement them yourself, scanf or other such functions

  • whereby you can get user input.

  • It's less obvious at first glance how a web server can get input from a user.

  • Because there is no-- well, rather, you can see the search

  • box that I typed into, but until I hit Enter,

  • the server doesn't see that information necessarily.

  • And that's a bit of a white lie, because nowadays thanks to JavaScript

  • and thanks to autocomplete, Google's actually

  • seeing every keystroke you type.

  • But in theory, when I hit Enter, only when I hit Enter,

  • do they see the full word cats.

  • And how do they get access to it not having physical access to my keyboard?

  • They see it in the URL here.

  • And so indeed HTTP, beyond supporting status codes and the sort

  • of digital equivalent of my handshake with Arthuro,

  • also supports input, specifically input parameters that in this case

  • is arbitrarily but reasonably called q, because back in the day,

  • Google decided that the default input to its search page would be q for query.

  • And indeed, if I hit Enter now, the results seem no different.

  • So for whatever reason, Google uses by default a lot more parameters,

  • all of which I deleted.

  • But the only necessary one is q, whose value here is cats.

  • And notice even without changing the page, I can go up in here

  • and change my cats to dogs and hit Enter.

  • And now notice I've searched for dogs just as though I had typed this myself.

  • But indeed, the only thing I've been changing up here is the keyword.

  • And if I search for mice now, I'm changing the search result.

  • So it seems that the essence of an HTTP request

  • boils down to what is sent here.
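
  • As a sketch of how a program could read or build that kind of URL, Python's standard urllib.parse can pull the q parameter out of the query string and construct new search URLs. The helper name search_url is made up for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "https://www.google.com/search?q=cats"
parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values


def search_url(term: str) -> str:
    """Build a search URL whose only parameter is q."""
    return "https://www.google.com/search?" + urlencode({"q": term})


print(parts.path, params)
print(search_url("dogs"))
print(search_url("grumpy cat"))  # the space is encoded so the URL stays valid
```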

  • So let's try this as well.

  • Let me go ahead and copy that URL.

  • And just for good measure, I can go ahead and do something like curl

  • and then paste this URL.

  • And let me go ahead and quote it, just because it has a question

  • mark that could break things.

  • And hit Enter.

  • It's pretty overwhelming here, but this is all of the HTML

  • that's coming back from google.com.

  • So when I see these search results in google.com,

  • this web page is written in this language called HTML.

  • And HTML, as we'll see, is a little overwhelming perhaps at first glance,

  • but follows some very simple patterns.

  • And we can see them better in browsers like Chrome as follows.

  • If you Control-Click or right click on your web page, most any web page

  • if you're using Chrome, you can choose Inspect.

  • And there's keyboard shortcuts and other menu options

  • by which you can access this.

  • And notice the Elements tab here that just popped up.

  • And notice now, again a little overwhelming.

  • But what's nice about Chrome-- and Edge can do this and Firefox and Safari

  • and others-- it can pretty print your HTML.

  • Sort of like style50 can sort of see through any messiness,

  • similarly, can the browser kind of look at the mess that

  • just came across the wire from Google and format it as follows.

  • And indeed, it looks like this language HTML follows a certain pattern.

  • There's always this at the top, open bracket, exclamation point, doc type,

  • HTML, close bracket.

  • Then there's open bracket html in lower case, then some other words and quotes

  • and equals signs perhaps.

  • Then a head, then a body.

  • Maybe some divs for divisions of the page.

  • And even though this is quite a lot, let's look at a simpler one

  • just for kicks real fast.

  • Let's go to harvard.edu and hit Enter.

  • And indeed-- well, actually, it looks just about as complicated.

  • Here's the HTML that composes harvard.edu.

  • So let's try to distill this into its essence.

  • I showed a web page earlier.

  • Let's go back to that to point out-- to be clear,

  • these were called query strings.

  • Let's come back to HTML.

  • So HTML is up to version 5 these days.

  • And this governs what syntax you should use when writing HTML.

  • And here per the earlier slide is perhaps

  • the simplest web page we can make.

  • So the key components-- and there's others we can add

  • and others we will soon add-- boil down to this.

  • This first line, this is the so-called document type declaration.

  • This is just a fancy way of saying, you have to type this line first

  • in your file in order to tell the browser that's

  • reading this file top to bottom, left to right this web

  • page is written in version 5 of HTML.

  • Previous versions either didn't have this or had longer versions of this.

  • This is just a globally-understood symbol that means version 5.

  • Then below that is your actual HTML tags.

  • So web pages are composed of HTML tags, or more properly, elements.

  • And most elements have an open tag and a closed tag-- a start tag

  • and an end tag-- that are identical, except for typically the slash.

  • So indeed, notice the symmetry.

  • This tag here, insofar as it's what we'll call an open tag or start tag,

  • means hey browser, here comes a web page written in HTML.

  • Hey browser, here comes the head of the web page.

  • Hey browser, here comes the title of the web page.

  • And there's no technical reason I wrote this all on one

  • line instead of putting hello world on its own line and this other tag

  • on its own line.

  • It just felt short enough to just write in one line, so I went with it.

  • But notice that title is opened here.

  • Then there's literally some hard coded text, hello world.

  • And then there is the opposite so to speak, of the tag.

  • It's the same word for the tag, but this forward

  • slash inside of the tag, which closes or ends the tag

  • and sort of ends the whole title element.

  • Meanwhile, that's it for the head, at least in this example.

  • So hey browser, that's it for the head.

  • Oh hey, browser, here comes the body.

  • Hey browser, here's some actual text.

  • Hey browser, that's it for the body.

  • Hey browser, that's it for the web page.
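
  • The "hey browser" reading above can be sketched with Python's built-in html.parser, which reports each start tag, end tag, and piece of text as it reads top to bottom. The page text follows the slide as described; the Narrator class is a made-up name for this illustration.

```python
from html.parser import HTMLParser

PAGE = """<!DOCTYPE html>
<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        hello, world
    </body>
</html>
"""


class Narrator(HTMLParser):
    """Record one line per declaration, start tag, end tag, or text chunk."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        self.events.append(f"declaration: {decl}")

    def handle_starttag(self, tag, attrs):
        self.events.append(f"here comes {tag}")

    def handle_endtag(self, tag):
        self.events.append(f"that's it for {tag}")

    def handle_data(self, data):
        if data.strip():  # skip the indentation between tags
            self.events.append(f"text: {data.strip()}")


narrator = Narrator()
narrator.feed(PAGE)
narrator.close()
print("\n".join(narrator.events))
```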

  • So I've also by convention-- and for stylistic purposes like in C--

  • indented things to be very pretty printed, very readable to humans.

  • But the browser certainly doesn't care.

  • And indeed, we saw when we looked at the mess that is Google's website,

  • it's just a big mess of tags and markup so to speak.

  • But for Google, that makes sense, because you

  • don't want to have to transmit any characters unnecessarily.

  • Indeed, if you think about it, if Google's website gets

  • visited by a billion people per day, which

  • actually feels kind of reasonable.

  • And suppose that a programmer at Google hits the space bar just one extra time

  • and saves Google's home page.

  • Well what's the implication of Google having just one

  • additional space in their web page?

  • If that web page is downloaded a billion times,

  • that's a billion extra ASCII characters that get downloaded per day.

  • And a billion ASCII characters is a billion bytes, which is one gigabyte.

  • So just by hitting the spacebar can really big players like Google

  • cost themselves a huge amount of space and maybe cost or time.

  • So that's why a lot of big websites minify or compress their information,

  • whereas we will be a little more lax here,

  • because it's more important for now certainly that things

  • be readable and understandable.

  • But the white space does not matter to the browser.
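
  • The arithmetic here, plus a deliberately naive take on minifying, can be sketched as follows. The billion-visits figure is the lecture's own guess, and collapse_whitespace is a hypothetical helper; real minifiers are more careful, for example about whitespace inside text.

```python
import re

visits_per_day = 1_000_000_000  # the lecture's rough estimate
extra_bytes_per_page = 1        # one extra space is one ASCII byte
extra_bytes_per_day = visits_per_day * extra_bytes_per_page
extra_gigabytes_per_day = extra_bytes_per_day / 10**9


def collapse_whitespace(html: str) -> str:
    """Naive minifier: remove whitespace that sits only between two tags."""
    return re.sub(r">\s+<", "><", html.strip())


readable = "<html>\n    <head>\n    </head>\n</html>\n"
compact = collapse_whitespace(readable)
print(extra_gigabytes_per_day, "GB per day")
print(len(readable), "bytes before,", len(compact), "bytes after")
```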

  • So let's actually do something with this.

  • Keeping in mind the following, just as this indentation kind of implies,

  • this really if you think about it is a tree structure.

  • There's some document on the screen, which I will literally call document,

  • because that's what browsers do.

  • The top element of which-- I'll draw with a rectangle,

  • distinguish it from the document itself-- is the HTML element that

  • starts here and ends here.

  • And in so far as it starts here and ends here, everything that's inside of it,

  • you can think of as children in a family tree.

  • And the first child is head, the second child

  • is body, left and right respectively.

  • The head tag meanwhile has the title child,

  • and so that's why we see title here.

  • And then I'll draw it with an ellipse, just different shape

  • because it's raw text.

  • It's not an actual tag.

  • And similarly does body have some text below it.

  • So this is just a tree.

  • It's not a binary tree, although it might be by coincidence here,

  • because there aren't many children.

  • But it's some kind of tree structure, each of whose nodes has zero

  • or more children.
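
  • A rough sketch of that tree, built in code: each element becomes a node with a list of children, and raw text becomes a leaf. Node, TreeBuilder, and outline are names invented here, and the sketch assumes every element has a matching close tag, which holds for this page but not for all HTML.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []  # zero or more children, so not a binary tree


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].children.append(Node(repr(text)))  # text leaf


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed(
    "<!DOCTYPE html><html><head><title>hello, world</title></head>"
    "<body>hello, world</body></html>"
)
builder.close()
print("\n".join(outline(builder.root)))
```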

  • And indeed, underneath the hood what is IE,

  • what is Edge or Firefox or Chrome or Safari actually doing

  • when it downloads a web page like this?

  • Some programmer or programmers have after taking classes like CS50

  • and knowing what these data structures are implemented in code a tree that

  • represents that web page.

  • And indeed, once in a few weeks we get to JavaScript

  • using yet another language will you be able to manipulate

  • that tree in real time to change the contents of a web page

  • and what a user is seeing.

  • Indeed, if you kind of fast forward in your mind,

  • suppose that you do use something like Facebook and Messenger

  • built into it for sending messages to people or Gmail,

  • where you suddenly get new rows of emails and your web page,

  • what's really happening?

  • Every time you get a message in Facebook,

  • it's just as though this tree is getting modified with like another child

  • somewhere in here.

  • Every time you get a new email in Gmail, it's

  • like another node is appearing in this tree.

  • So there really is this equivalence to this markup language HTML and the tree

  • structures that we've just come from in recent weeks.

  • So let's actually now do something with this.

  • I'm going to go over to CS50 IDE, and I'm

  • going to go ahead and make if you will the simplest of web pages as follows.

  • I'm going to go ahead and create a new file, a text file.

  • I'm going to call it hello.html.

  • And I'm going to go ahead and populate this

  • with exactly what we saw a moment ago.

  • Doc type, HTML.

  • Open bracket, HTML.

  • And notice that CS50 IDE is trying to be helpful here,

  • and when it notices you typing something familiar,

  • it's going to try to finish your thought for you.

  • So indeed, it did.

  • I'm going to go ahead and open now the head of the page.

  • It's going to complete that thought.

  • I'm going to open the title of the page, hello world.

  • And now I'm going to move my cursor down here physically

  • to do body, close bracket, hello comma world, save.

  • So I have written code.

  • It's source code, but it's code written in HTML-- HyperText Markup Language.

  • And indeed, you see no loops or conditions or functions.

  • There's no logic.

  • This is just markup.

  • Do this, stop doing this.

  • Do this, stop doing this.

  • It's fairly mundane.

  • But it's going to allow us to actually visit this file in a browser.

  • Indeed, let me go into a browser now and visit this page hello.html.

  • Incredibly underwhelming.

  • Indeed, this is a huge screen.

  • And all I've created is a web page that says hello world up here.

  • And if I scrolled up, I could actually see the tab

  • whose title is also hello world.

  • But that's my first web page.

  • And if I now apply a lesson learned, if I go ahead and right click

  • or Control-Click Chrome's backdrop and choose inspect,

  • now you'll notice finally here's a simple web page,

  • and not all the messiness that was Harvard's or Google's.

  • You can actually see your HTML.

  • You can't permanently change the files here,

  • because you need to do that in CS50 IDE and change the files.

  • And so here's where there's a potential point of confusion.

  • CS50 IDE is of course a cloud based service,

  • and it's where I'm writing and saving my files.

  • And it just so happens that built into CS50 IDE

  • is its own web server just for serving students work.

  • So when I visit this web page here in another tab, I'm visiting not CS50 IDE per se,

  • but the web server running on a certain port on CS50 IDE

  • so I can serve up these web pages.

  • So let's go ahead and do something a little more interesting than that.

  • Let me go ahead now and create another file say as follows.

  • Let me go ahead and copy this just for good measure

  • so I don't have to recreate the whole thing.

  • And let me go ahead and create a new file called image.html.

  • Paste this in here.

  • And instead of hello world, I'm just going to write say image up here.

  • And how do I embed an image?

  • Well, turns out that there is literally an image

  • tag-- img to be succinct.

  • Indeed, you might want to write this out.

  • But nope, back in the day people decided that img is sufficient.

  • I'm going to go ahead and give it a source.

  • What should the source of this be?

  • Well, let me just do a quick search for like a grumpy cat.

  • And there's a good one.

  • So I'm going to go ahead and Control-Click or Right

  • Click for our purposes now just the image address here.

  • We'll assume this is my image and I'm grabbing the address here

  • for the moment.

  • I'm going to paste it in here, in that there is the URL of a JPEG that

  • is of a grumpy cat.

  • Now with an image, there isn't really the same concept

  • of like starting an image and stopping an image like there

  • is start the title stop the title, start the body, stop the body.

  • And so there are so-called empty elements in HTML

  • that you can express either by doing this, which feels a little silly.

  • Like you're opening the image tag and then immediately closing it,

  • which feels a little ridiculous.

  • And so there's shorter hand syntax where you can actually

  • put the slash inside of the open tag like this so

  • that the element is empty so to speak.

  • Open and closed.

  • It's not strictly required, but at least this way

  • we're making clear our intent is to open and close the thing all at once.

  • Now for accessibility purposes, for someone who has trouble with vision,

  • you might want to provide some alternative text like grumpy cat

  • so that if they're using a screen reader or some other device,

  • they can actually have assistive support explaining what it

  • is that you might otherwise be seeing.
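
  • As a sketch, here is one way to generate that empty img element with its src and alt attributes. The URL is a placeholder, not the image from the lecture, and img_tag is a hypothetical helper; html.escape keeps quotes in the text from breaking the attribute.

```python
from html import escape


def img_tag(src: str, alt: str) -> str:
    """Return a self-closing img element with escaped attribute values."""
    return f'<img alt="{escape(alt, quote=True)}" src="{escape(src, quote=True)}"/>'


# Placeholder address standing in for the copied image address.
tag = img_tag("https://example.com/grumpy-cat.jpg", "Grumpy Cat")
print(tag)
```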

  • So let me go ahead now and open this file image.html.

  • And it's pretty darn simple.

  • But there is my own web page with this big white background,

  • and nothing else yet and this grumpy cat.

  • All right, but of course this web page doesn't do anything.

  • It would be nice if I could click on something and go somewhere.

  • So let's do that.

  • Let's do another example whereby-- I'll call this link.html.

  • And in here-- let me get started just by copying and pasting

  • that-- instead of the cat, let me go ahead and do an anchor.

  • So it's a little counterintuitive.

  • It's not link, it's anchor.

  • And then anchor, confusingly, has a hyperreference,

  • which is the link to which it goes.

  • And I'm going to go ahead and do something clever

  • like https://www.google.com/search?q=cats.

  • And then close bracket.

  • And now notice CS50 IDE is trying to be helpful.

  • It closes the tag for me, and I can just write the word cats.

  • But let me finish this thought.

  • Let me say search for cats period.

  • And so now, even though we've seen only some simple tags so far,

  • you can use HTML inline, so to speak, sort of

  • in the middle of another thought.

  • If I want to convey the sentence search for cats,

  • but I want cats to be clickable so that when you click on the word cats

  • it actually goes to Google and searches for cats,

  • I can borrow the idea from earlier-- and I just

  • happen to remember that q is the query that I have to pass in.

  • And notice that I surround cats with the open tag and the close tags.

  • So that now if I open a browser with this file,

  • I see again, a very simple web page.

  • And I can even zoom in to make this more clear.

  • All it says is search for cats period.

  • But notice, it's the link alone that's underlined.

  • And it happens to be purple by default, because we already

  • searched for cats earlier, and browsers typically remember URLs you visited.

  • So that's why it's purple and not say blue, which tends to be the default.

  • But if I click on this, indeed, I get a page full of cats.
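
  • The link just built can be sketched in code like this: an anchor element whose href attribute carries the search URL, wrapped around only the word that should be clickable. The helper name search_link is invented for illustration.

```python
from html import escape
from urllib.parse import urlencode


def search_link(term: str) -> str:
    """Return an anchor element that searches Google for the given term."""
    href = "https://www.google.com/search?" + urlencode({"q": term})
    return f'<a href="{escape(href, quote=True)}">{escape(term)}</a>'


# Only the word inside the open and close tags becomes the link.
sentence = f"search for {search_link('cats')}."
print(sentence)
```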

  • I can combine these ideas.

  • Let me actually go into the IDE, and instead of the word cats,

  • let me go ahead and paste the image tag.

  • So it's a little hard to see all on one line

  • here, but notice I have search for, then a href, close this tag.

  • And then immediately open the image tag with its same value as before.

  • And then close that.

  • And then close the anchor tag.

  • Save that, reload.

  • Now it's a little stupid grammatically.

  • Search for cat picture.

  • But notice if I hover over the cat, my cursor becomes a little pointer.

  • And indeed, if I look in Chrome's bottom left corner, I'll see that if I click,

  • it's going to lead me to a URL.

  • And indeed, if I click on the cat, anywhere on the cat,

  • now I've made a hyperlink.

  • So now the world wide web so to speak is getting more interesting.

  • It's getting pretty ugly, but at least it's getting more interesting.

  • So what are these things?

  • They're not tags, per se.

  • These are what we'll call attributes.

  • So indeed, it seems that based on these simple examples

  • alone certain tags like image can have their behavior modified

  • with these attributes.

  • And the format for those is a keyword like alt for alternative

  • equals and then quote unquote some value,

  • and source-- src-- which is by design.

  • You can't write out source S-O-U-R-C-E. You'd have to do src per

  • the documentation equals quote unquote some URL.

  • And you would only know that these things

  • exist by googling around, reading some online documentation, taking a class.

  • But thankfully, there's not terribly, terribly many of them.

  • And most every one can be looked up on demand when

  • you're curious how to do something.
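
  • To see that attributes really are just name and value pairs attached to a tag, here is a sketch that lists them with Python's html.parser. The markup mirrors the image-inside-a-link example, with cat.jpg as a placeholder file name; AttributeLister is a made-up class name.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Collect (tag name, attributes dict) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.found.append((tag, dict(attrs)))


lister = AttributeLister()
lister.feed(
    '<a href="https://www.google.com/search?q=cats">'
    '<img alt="Grumpy Cat" src="cat.jpg" /></a>'
)
lister.close()
for tag, attributes in lister.found:
    print(tag, attributes)
```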

  • In fact, let's take a look at a few other tags

  • some this time that I've put together in advance.

  • We have a whole bunch of online examples that you're welcome to look for online.

  • Here's one that has a whole bunch of paragraphs.

  • So in this page here, notice that I've done a couple of things.

  • Inside of my body, I have a bunch of Latin paragraphs.

  • Sort of nonsensical Latin, but I've wrapped each of them

  • in an open p tag and a closed p tag, simply because I want these

  • to be three separate blocks of text.

  • And let me go ahead into my browser now and open this file in today's directory

  • as paragraphs.html.

  • And that's it.

  • It's a little more interesting now that it fills the screen.

  • But indeed, there are distinct paragraphs.

  • There's one other tag that I proactively included here, which

  • is a little cryptic at first glance.

  • But this is a meta tag that has to go in the head of the web page.

  • And here too you would know this from some online reference.

  • And it's cryptic only insofar as there's a lot of words here.

  • But the effect of this essentially is that if this same web

  • page is viewed not on my browser but on my phone, which might otherwise

  • be pretty small to look at, and I'd have to squint to see the text,

  • this tag is one technique for actually telling the web

  • page to sort of resize itself and the text for whatever the device width is.

  • So without this tag, these three paragraphs

  • you might have to squint to actually read them pretty well on an Android

  • phone or an iPhone.

  • With that tag, the font size will sort of

  • grow to take into account the fact that this is a smaller device

  • and everything should not just be squeezed in on there.

  • But otherwise, syntactically, everything else there is the same.

  • Let's look at another example.

  • If I go into headings.html, this one doesn't do all that much.

  • But it seems to demonstrate tags called H1 through H6, literally saying

  • one, two, three, four, five, six.

  • And by convention, though this differs ever so slightly by browsers,

  • H1 is big bold text.

  • H2 is not quite as big, but still bold text.

  • H3 is not quite as big.

  • H4 not quite as big.

  • Headings that you might see in a research paper

  • or in the chapters and sections or subsections of a book.

  • It's a way of adding sort of semantic headings to a web page that in our case

  • might look ultimately like this.

  • From bigger to smaller.

  • And so these might just be the section headings in some book

  • or some kind of reference like that.

  • What about lists, which are pretty common?

  • Well, if we go into list.html, it's pretty common on the web

  • or in various applications to have bulleted lists or ordered lists.

  • This is an unordered list of bullets, foo, bar, and baz, which

  • are just silly variable names in computer science.

  • And if we want to see what this one is, if I go into list.html,

  • you'll see quite simply that we just have a little more nesting.

  • Body, UL, and LI.

  • So UL is Unordered List, LI is List Item, and foo, bar, and baz

  • are each of the three list items.

  • If I change this ever so slightly to OL, Ordered List,

  • and then go back to that web page and reload,

  • now it's an automatically numbered list.

  • So there's a lot of features you sort of get for free here,

  • not unlike a typical word processor.
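
  • A small sketch of that nesting: the only difference between the bulleted and the numbered version is the outer tag, ul versus ol. The function html_list is a hypothetical helper, not something from the course's examples.

```python
from html import escape


def html_list(items, ordered=False):
    """Return a ul (bulleted) or ol (numbered) element with one li per item."""
    tag = "ol" if ordered else "ul"
    rows = "\n".join(f"    <li>{escape(item)}</li>" for item in items)
    return f"<{tag}>\n{rows}\n</{tag}>"


bulleted = html_list(["foo", "bar", "baz"])
numbered = html_list(["foo", "bar", "baz"], ordered=True)
print(bulleted)
print(numbered)
```

  • The browser supplies the bullets or the numbers itself; the markup never contains them.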

  • If we want to go really all out and see a lot of nesting,

  • you can see a table here, which might be useful

  • if you want to show a whole bunch of tabular data for research purposes

  • or maybe sports scores and data on an ESPN site or the like.

  • It's a little more involved, but if you just read it top to bottom,

  • it all becomes pretty intuitive.

  • Inside of this page's body there's an HTML table.

  • This table has a TR, Table Row.

  • And that table row has table data, table data, table data.

  • So three columns, left to right.

  • And another row with another three columns,

  • another row with another three columns, another row with another three columns.

  • And I chose these values arbitrarily just

  • to kind of mark up an old school telephone keypad, because indeed,

  • if we go into this with table.html, you see this.

  • You can add borders, and we'll see ways you can actually tweak the aesthetics.

  • But it's just laying things out in a grid here,

  • like you might tabular style data.
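
  • Read top to bottom, the table is rows of cells, which a nested list captures nicely. This sketch assumes the usual keypad layout for the cell values, since the lecture only says it is a telephone keypad; html_table is an invented helper name.

```python
from html import escape

# Assumed values: a standard telephone keypad, four rows of three columns.
KEYPAD = [
    ["1", "2", "3"],
    ["4", "5", "6"],
    ["7", "8", "9"],
    ["*", "0", "#"],
]


def html_table(rows):
    """Return a table element with one tr per row and one td per cell."""
    lines = ["<table>"]
    for row in rows:
        lines.append("    <tr>")
        lines.extend(f"        <td>{escape(cell)}</td>" for cell in row)
        lines.append("    </tr>")
    lines.append("</table>")
    return "\n".join(lines)


table = html_table(KEYPAD)
print(table)
```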

  • But none of these have been all that pretty thus far.

  • Indeed, I'm just using the default fonts and sizes, which apparently are just

  • black text, white background, Times New Roman

  • font, and pretty small text at that.

  • The web of course these days is much prettier than this.

  • So how do you actually start to stylize things?

  • Well, as we often do, let's take a progression of ideas.

  • Let me go into version zero of this file.

  • css0.html.

  • That does something terribly simple.

  • It's more interesting than any of the pages

  • we've seen thus far, if only because we have some slightly differing

  • font sizes and some actual content, but it's still pretty simple.

  • So what am I doing?

  • This is big and bold and centered.

  • This is kind of medium and bold and centered.

  • And this is kind of small, this copyright holder there.

  • So let's solve this in one way, but then iteratively improve upon this

  • as follows.

  • Let me go into css0.html, and we'll see that I've introduced amazingly already

  • another language.

  • CSS-- Cascading Style Sheets-- is another language

  • that is almost always used in conjunction with HTML these days.

  • And whereas HTML is all about formatting-- rather,

  • all about markup and all about layouts and sort

  • of semantically tagging things in a way that makes sense,

  • CSS is used to kind of take things the last mile

  • and stylize things so that they look and appear in exactly the way

  • that you intend.

  • So this is a little messy at the moment, because I

  • seem to be co-mingling my HTML and CSS literally as follows.

  • Turns out that in HTML there's a generic tag

  • called the div for just a division of the page.

  • If you want to think of the page as having rectangular regions,

  • div would be one way of doing that.

  • Or you could use a p tag or paragraph.

  • And I can add a style attribute here that's a style font

  • size colon 36 pixels semi-colon font weight colon bold semi-colon.

  • And not all of the semi-colons, at least on the end there, are necessary.

  • But this is two CSS properties.

  • A property called font size with a value of 36 pixels,

  • and a property of font weight with a value of bold.

  • And then similarly, notice what I've done: in a div tag outside of this,

  • I have wrapped it with text align center.

  • And that's a property called text align.

  • Its value is center, and it's going to center all of its children so to speak.

  • So we can use the same language from our discussion of data structures

  • and trees.

  • Meanwhile, you'll notice that my middle div is slightly smaller at 24 pixels

  • and not bold, and my last one is 12 pixels.
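
The inline-style version described above can be sketched as a short Python script that holds the page as a string and can save it to disk. This is only an approximation of css0.html: the visible text in each div is placeholder content, and the exact layout of the real file is an assumption.

```python
# Sketch of what css0.html might contain. The visible text is placeholder
# content, not taken from the lecture's actual file.
CSS0_HTML = """<!DOCTYPE html>
<html>
    <head>
        <title>css0</title>
    </head>
    <body>
        <div style="text-align: center;">
            <div style="font-size: 36px; font-weight: bold;">
                Big heading (placeholder)
            </div>
            <div style="font-size: 24px;">
                Medium subheading (placeholder)
            </div>
            <div style="font-size: 12px;">
                Small footer (placeholder)
            </div>
        </div>
    </body>
</html>
"""


def write_page(path: str = "css0.html") -> None:
    """Save the page so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(CSS0_HTML)
```

The outer div carries text-align, so all three inner divs (its children) are centered without repeating the property.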

  • But this is a little messy now, because I've

  • co-mingled my HTML markup with my CSS.

  • It would be kind of nice if we could factor out the aesthetics,

  • put them in one central spot to make it easier to edit.

  • And so let me propose this instead.

  • I've now simplified the body of my page to just have three divs, each of which

  • has a unique ID.

  • Turns out there's an attribute in HTML called ID that

  • allows you to have a unique identifier.

  • You can use almost any word you want,

  • though there are some restrictions on the letters you can use,

  • or where you can have numbers, and so forth.

  • But I'm just going to sort of conveniently call

  • the top div top, middle, and bottom.

  • And those are unique.

  • And now that I have the ability to identify those divs uniquely,

  • let's look at another tag up here.

  • Inside of the head of my web page now, notice I have a style tag.

  • Not a style attribute, an actual style tag.

  • And the syntax here is a little different from before,

  • but it's kind of reminiscent of C. But none of this

  • has to do with programming per se, this is just aesthetics now.

  • This syntax here says, hey, browser, apply to the body tag

  • the following CSS properties in between curly braces.

  • Text align center for the entire body.

  • Hey, browser, apply the following properties

  • to whatever HTML tag has a unique ID of top.

  • So the hashtag here means ID.

  • It's just a symbol that the world has adopted.

  • So this means whatever HTML tag has a unique ID of top,

  • apply these two properties to it.

  • Notice the semi-colons on the end, and I've indented everything

  • to keep things nice and pretty.

  • Middle will have this property, bottom will have that property.
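
To make the selector-plus-curly-braces syntax concrete, here is a small sketch that renders the four rules described above as text. The helper name css_rule is hypothetical, and the brace placement is just one common style; the property values follow the sizes mentioned earlier in the lecture.

```python
def css_rule(selector: str, properties: dict[str, str]) -> str:
    """Render one CSS rule: a selector followed by properties in curly braces."""
    lines = [f"    {name}: {value};" for name, value in properties.items()]
    return selector + "\n{\n" + "\n".join(lines) + "\n}"


# "body" selects the body tag; "#top" selects whatever tag has id="top".
STYLE_RULES = [
    css_rule("body", {"text-align": "center"}),
    css_rule("#top", {"font-size": "36px", "font-weight": "bold"}),
    css_rule("#middle", {"font-size": "24px"}),
    css_rule("#bottom", {"font-size": "12px"}),
]

# This text is what would sit between <style> and </style> in the page's head.
STYLESHEET = "\n\n".join(STYLE_RULES)
```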

  • So now it's cleaner in that I've relegated to the top

  • to one central spot all of the aesthetics of my web page.

  • I've left all of the lower level markup down here.

  • So that if on a whim tomorrow I want to change

  • the font size or the color or the layout,

  • I can do that very simply without actually changing the data.

  • So the data is things like these white words here.

  • And I've got some metadata, these red tags and green attributes,

  • here, so that I can uniquely identify things in the page.

  • But the aesthetics are now fundamentally separated.

  • But it's still a little messy, because they're still in the same file.

  • So let me open a third version of this, css2.html,

  • which makes the file even smaller.

  • What do I seem to have done here?

  • So in this case, I seem to have similarly given

  • IDs to these three divs.

  • But I've introduced into the head of the page not a style tag,

  • but a link tag, confusingly named, because it's not an anchor tag,

  • it's link with an href.

  • So even more confusing.

  • But all this means is hey, browser, grab the contents of this file-- css2.css--

  • the relation to this file is that of style sheet.

  • So it's stylisation.

  • And then apply it to this web page.

  • What is in css2.css?

  • It's just those same CSS rules as before, but in their own file.
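
The split into two files might look roughly like the sketch below, which keeps each file's contents in a string and writes both into a directory. The file names css2.html and css2.css come from the lecture; the div text is placeholder content and the helper write_site is hypothetical.

```python
from pathlib import Path

# The stylesheet now lives in its own file.
CSS2_CSS = """body
{
    text-align: center;
}

#top
{
    font-size: 36px;
    font-weight: bold;
}

#middle
{
    font-size: 24px;
}

#bottom
{
    font-size: 12px;
}
"""

# The page only links to the stylesheet; it contains no CSS of its own.
CSS2_HTML = """<!DOCTYPE html>
<html>
    <head>
        <link href="css2.css" rel="stylesheet">
        <title>css2</title>
    </head>
    <body>
        <div id="top">Big heading (placeholder)</div>
        <div id="middle">Medium subheading (placeholder)</div>
        <div id="bottom">Small footer (placeholder)</div>
    </body>
</html>
"""


def write_site(directory: str) -> None:
    """Write both files side by side so the relative href resolves."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    (out / "css2.css").write_text(CSS2_CSS, encoding="utf-8")
    (out / "css2.html").write_text(CSS2_HTML, encoding="utf-8")
```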

  • So what's the purpose of this?

  • At the end of the day, the result in each of these three cases

  • is an identical web page.

  • All three of these things look exactly like this, so they're no prettier.

  • But from a design perspective underneath the hood,

  • these things are fundamentally better designed,

  • because now this CSS file in theory could be shared across multiple pages.

  • Multiple pages of mine could now have this one link tag up top,

  • so that once a browser downloads css2.css or whatever the file is,

  • it can reuse and cache the file for my entire website

  • so that as the user clicks around to my website,

  • they don't have to re-download the CSS file.

  • And indeed, even if the browser tries, it

  • can get that HTTP 304 not modified message so that it doesn't waste time

  • or bandwidth redownloading the file.
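
The 304 exchange can be modeled with a small function that plays the server's role: given when the file last changed and the date the browser says its cached copy is from, it picks a status code. This is a simplified sketch that only looks at the If-Modified-Since header; real servers and browsers often use other mechanisms as well (ETags, for instance), and the function name status_for is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def status_for(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        # First visit: nothing is cached, so send the whole file.
        return 200
    cached_at = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= cached_at:
        return 304  # Not Modified: the browser can reuse its copy
    return 200


# Example: a stylesheet last edited at noon UTC on 1 October 2016.
edited = datetime(2016, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
header_value = format_datetime(edited, usegmt=True)
```

With 304 the response has no body, which is where the time and bandwidth savings come from.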

  • So this also allows me to use, as we'll eventually

  • see in future problems, third party libraries.

  • It turns out that a lot of people in the world who are better than little old me

  • at design certainly have created files ending in .css that have some really

  • beautiful stylizations that you can apply to your own web pages so that you

  • don't have to worry about as much the aesthetics.

  • Bootstrap is one such tool formerly from Twitter,

  • and other such libraries exist that allow you to stylize your site just

  • by using themes or skins, so to speak, that other people have created.

  • The one last piece of syntax here I should draw attention

  • to is this thing here.

  • So this cryptic sequence of characters is what's known as an HTML entity.

  • It turns out there are some symbols that to my knowledge

  • I can't type on my Mac's keyboard, like the copyright symbol.

  • You can maybe do it on iOS these days via special software support.

  • But this is the canonical way of putting certain special characters inside

  • of a web page that you might not be able to express or easily

  • express on your keyboard.

  • And these are standardized, too.

  • So if I actually googled HTML entities, I

  • could actually see whole charts telling me

  • that ampersand hashtag 169 semi-colon will give me the copyright symbol.

  • And just to be clear, when that's actually rendered,

  • you don't see that in the page.

  • You instead see the more familiar copyright symbol there.
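
Python's standard html module can decode entities the way a browser does when rendering, which makes the mapping easy to check. In this sketch the footer text is a placeholder; the number 169 is the character's code point, and &copy; is the named form of the same entity.

```python
import html

# Markup as it would be typed into the HTML file (placeholder wording).
FOOTER_MARKUP = "Copyright &#169; placeholder name"

# What the reader would see once the entity is decoded.
rendered = html.unescape(FOOTER_MARKUP)

# The numeric entity, the named entity, and code point 169 all agree.
same_symbol = html.unescape("&#169;") == html.unescape("&copy;") == chr(169)
```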

  • So let's now finally try to tie some of these things together.

  • I know that Google supports search queries via GET.

  • And this is in contrast just to be clear with one other thing.

  • That is POST.

  • It would be a little worrisome if every time you

  • logged into Facebook or Google or any website,

  • or any time you bought something on Amazon or any website,

  • if your credit card and your password and all your sort

  • of semi-private information appeared in the URL

  • just like these Google search queries.

  • So it turns out that HTTP supports another verb.

  • And there's a few others, but the two we'll focus on are GET and POST.

  • And POST is inside the envelope's initial message,

  • just like my handshake to AJ, almost identically.

  • But instead of GET, it's POST.

  • What do you want to post information to and what protocol do you want to use?

  • This is an example of a snippet of how I might log into Facebook.

  • When I log in to Facebook, I don't want my friends or my siblings or my family

  • members being able to see in my browser's history or the search box

  • what my user name or really what my password is.

  • And that's exactly what HTTP GET does by design.

  • POST is just another way of submitting information to a server,

  • still using the same conventions of HTTP parameter equals some value.

  • And indeed, you can send multiple ones by separating them

  • in this case with an ampersand.

  • No relationship to the ampersand we just saw in an HTML entity.

  • But notice that this email and password are deliberately

  • below the HTTP headers.

  • So they're not in the URL bar; they're instead deeper

  • inside the envelope, if you will.
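
A minimal sketch of how such a POST message is laid out: the request line and headers come first, then a blank line, then the parameters. The host, path, and credentials below are made-up placeholders rather than any real site's login endpoint, and build_post_request is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, params: dict[str, str]) -> str:
    """Assemble the text of an HTTP POST with form parameters in the body."""
    # key=value pairs joined with "&", with special characters escaped.
    body = urlencode(params)
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


request = build_post_request(
    "www.example.com",  # placeholder host
    "/login",           # placeholder path
    {"email": "user@example.com", "password": "12345"},
)
```

Because the parameters are in the body rather than the request line, they never appear in the URL. Note that the @ sign is escaped as %40 in the body.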

  • But I need to know this because when I make my own web pages,

  • this becomes relevant.

  • Let me go ahead and create a super simple web page called search.html

  • that again has the doc type declaration at the top, that then has my HTML tags,

  • my head tags, my title tags-- and I'll call this search.

  • And then over here I will have the body of the page.

  • And then I'm just going to do an H1 for CS50 search, which

  • is just a big bold heading on the page.

  • And now I'm going to have a form.

  • And I'm going to have action equals https://www.google.com/search.

  • The method I want to use is necessarily GET, not POST.

  • Though in different contexts, I might want to use POST.

  • But I'm not doing logons or something like that.

  • I'm using Google's search engine.

  • So now I have the HTML form element, which we've not yet seen.

  • But it turns out there's another tag called input

  • that you can give a name to like q, that can be a type like text,

  • and it's empty.

  • And then we can have another input whose type

  • might be quote unquote submit and close that tag.

  • And then save the page.
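
Put together, the page just dictated might look like the following, kept here in a Python string so it can be written to search.html. The indentation and the lowercase "get" are choices made for this sketch; the action URL, the input named q, and the submit input come from the description above.

```python
# Sketch of search.html as described: a heading and a form that sends
# its one text field, named q, to Google via GET.
SEARCH_HTML = """<!DOCTYPE html>
<html>
    <head>
        <title>search</title>
    </head>
    <body>
        <h1>CS50 Search</h1>
        <form action="https://www.google.com/search" method="get">
            <input name="q" type="text">
            <input type="submit">
        </form>
    </body>
</html>
"""


def write_page(path: str = "search.html") -> None:
    """Save the page so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(SEARCH_HTML)
```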

  • If I now go back into this file and go to search.html, if I zoom in,

  • we see if you will, version one of Google, without any aesthetics.

  • And indeed, the actual version one of Google

  • wasn't all that much more complicated.

  • But if I now type in cats, submit this query, I go to actual Google,

  • typing in effectively cats, because of the URL

  • I was redirected to-- which is to say that using HTML,

  • we can reconstruct exactly what Google's been doing all this time.

  • Because if you distill the essence of Google into just a few lines of code,

  • this is it.
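
What the browser does on submit can be approximated in a couple of lines: it takes the form's action, appends a question mark, and adds each named input as key=value. The helper name form_get_url is hypothetical.

```python
from urllib.parse import urlencode


def form_get_url(action: str, fields: dict[str, str]) -> str:
    """Build the URL a browser would visit when a GET form is submitted."""
    return action + "?" + urlencode(fields)


url = form_get_url("https://www.google.com/search", {"q": "cats"})
```

Here url is https://www.google.com/search?q=cats, which is why the search term is visible in the address bar. Spaces in a query are encoded as plus signs.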

  • And indeed, this is essentially what Google looked like a few years ago.

  • Although, to be fair, they also had this.

  • They had another input whose type was submit,

  • and whose value even early on was I'm Feeling Lucky.

  • And if we save this, it's not going to actually do anything,

  • because we need a little more logic in order to make that work.

  • But if I reload, now we get the second Google button as well.

  • And so all we've implemented for now is the front end of Google, so to speak.

  • We have completely punted to Google's back end, their own databases,

  • their own software, the actual searching of things,

  • because we don't really have a language yet,

  • a way of expressing searches ourselves.

  • Indeed, we could using C and using HTML and using

  • CSS start to build our own server, and we could actually

  • write code in C that receives something like q equals cats, parse the cats,

  • like to read it, extract it from that string,

  • then figure out in our own database where can I find some cats.

  • But it's going to be incredibly, incredibly tedious to do that in C.

  • In fact, if you think back to the problems Vigenere and Caesar

  • and the like, even just manipulating strings in C is really non-trivial

  • and gets quickly tedious.

  • And so we really need a better language.

  • And that language is going to be in the coming weeks Python, which

  • is a higher level language than C. In fact, the Python interpreter

  • so to speak itself is written in C. So the world some years ago

  • used C to write support for really what many would call a better language

  • for solving problems like this.

  • And so not only can you use Python for command line applications

  • and processing and analyzing data like a data scientist might use it for.

  • We can also use Python to actually write the back end of google.com,

  • or the back end of Facebook, or the back end of any web server

  • that has to read the parameters, understand them, maybe look up

  • some data or store some data in a database,

  • and respond to the user with dynamic output.
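
As a rough preview of that back-end work, the sketch below reads the q parameter out of a request path, looks it up in a stand-in "database", and builds an HTML response. Everything here is hypothetical: the DATABASE dictionary, its placeholder entry, and the handle_search function are illustrations, not how any real search engine works.

```python
import html
from urllib.parse import parse_qs, urlsplit

# Stand-in for a real database: search term -> list of result titles.
DATABASE = {
    "cats": ["Placeholder page about cats"],
}


def handle_search(url_path: str) -> str:
    """Parse ?q=... from the path and return an HTML page of results."""
    # parse_qs turns "q=cats" into {"q": ["cats"]}.
    query = parse_qs(urlsplit(url_path).query)
    term = query.get("q", [""])[0]
    results = DATABASE.get(term, [])
    # Escape anything user-supplied before putting it into HTML.
    items = "".join(f"<li>{html.escape(title)}</li>" for title in results)
    return f"<h1>Results for {html.escape(term)}</h1><ul>{items}</ul>"


page = handle_search("/search?q=cats")
```

The string handling that would take many lines of C is a single parse_qs call here.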

  • So all that and more in the weeks ahead.

  • [MUSIC PLAYING]


CS50 2016 - Week 6 - HTTP

Posted by 小克 on January 14, 2021