DAVID J. MALAN: This is CS50.
And today, we transition from the world of C and, with it, pointers
and some of the struggles that you might have felt over the past few weeks
to a more familiar world, that of web programming,
using web browsers and mobile devices and laptops and desktops
and creating more graphical and more interactive experiences
than our traditional command-line terminals have allowed.
And we'll see, though, along the way that a lot of the ideas that we've
been exploring over the past few weeks are still going to remain with us.
And we're going to see them in different ways.
We're going to see them in the form of other languages and other syntax.
But the ideas will remain quite reminiscent of what
we did back in week 0.
So TCP/IP is perhaps the most technical way
and the most low-level way we can quickly make the web uninteresting.
But you've probably, at least, seen this acronym somewhere, maybe
on your Mac, your PC, some setting maybe once upon a time.
And this, actually, just refers to a protocol
or, really, a pair of protocols, languages of sorts
that computers speak in order to transmit information
from one computer to another.
And this is what makes most of the internet today work.
The fact that you can pull up your laptop and desktop
and talk to any computer on the internet is because of these protocols,
conventions that humans decided shall exist some years ago.
And they just dictate how computers intercommunicate.
But let's make it a lot more familiar.
In our human world, you've probably, at some point, sent or received a letter.
These days, it's perhaps more electronic.
But, at least, you've gotten one such letter
from probably a human, maybe a grandparent or the like,
or sent something yourself.
But before you can actually send that message to the recipient
and put it through the US mail or the international mail services,
what needs to go on the envelope?
DAVID J. MALAN: Yeah-- so some kind of address.
And what does an address consist of?
DAVID J. MALAN: Name.
AUDIENCE: Where they are.
DAVID J. MALAN: Where they are.
DAVID J. MALAN: So where they are might include a street address and a city,
a state, a ZIP code in the US, or a postal code, more generally,
and the country, if you really want to be specific.
And so all of that goes on the front of the envelope,
generally in the center of the envelope.
And then what often goes on the top left-hand corner in most countries?
AUDIENCE: The return.
DAVID J. MALAN: Yeah.
So the return address-- so that if something goes wrong,
albeit infrequently, that letter can get-- make its way back to you,
and also the recipient knows just immediately who actually sent them the letter.
So that is enough information to get a letter from point A
to point B because these addresses, these postal addresses
in our human world, uniquely identify houses or buildings or people,
in some sense, in the world.
So right now, we're at 45 Quincy Street, Cambridge, Massachusetts, 02138, USA.
That is probably enough specificity for anyone in the world
to mail us a postcard saying "Hello world" in written form
and get it to this building.
Meanwhile, if we wanted to send something to the Science Center,
1 Oxford Street, Cambridge, Mass, 02138, USA, that's its unique address.
So it stands to reason that computers, including our own Macs and PCs
and Android phones and iPhones and the like,
all have unique addresses, as well, because, after all, they
want to communicate.
And they need to get bits, zeros and ones, from point A to point B.
But they're not quite as verbose as those kinds of addresses.
Computers have what you probably know as IP addresses,
Internet Protocol addresses.
And this just means that humans decided years ago
that every computer in the internet is going
to have a unique number identifying it.
And that number is generally of the form something dot something dot something dot something.
And, as it turns out, each of these somethings between the dots
is a number from 0 to 255.
And now, after all these weeks of CS50, your mind
can probably jump to a quick answer.
How many bits must each of these numbers be taking up
if the range is from 0 to 255?
So eight-- and why is that eight?
So 256 has been a recurring theme.
And if you don't recall, that's fine.
But yes, this is eight bits, eight bits, eight bits, eight bits,
which means the numbers that we humans use to uniquely identify our computers
on the internet are 32 bits in total.
Well, there's probably another number that can roughly come to mind.
If you've got 32 bits, how high can you count, roughly speaking, from 0 to--
I heard a murmur--
AUDIENCE: Four billion.
DAVID J. MALAN: Four billion.
So it's roughly four billion.
And we brought that up in week 0 with a four billion-page phone book.
So four billion is roughly what you can count up to with 32 bits.
So that means there can be four billion computers, devices, or anything
on the internet, uniquely identified-- small white
lie because that's actually not quite enough these days with all the devices
and all the humans in the world.
But we found workarounds for that.
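The arithmetic above can be checked with a short Python sketch. It uses the standard `ipaddress` module; the address `192.0.2.1` is an assumption chosen from the range reserved for documentation, not an address from the lecture.

```python
import ipaddress

# Illustrative address from the documentation range (an assumption, not from the lecture).
address = ipaddress.IPv4Address("192.0.2.1")

# Each of the four dotted numbers is one byte (8 bits), so it ranges from 0 to 255.
octets = list(address.packed)
print(octets)        # [192, 0, 2, 1]

# The four bytes together form a single 32-bit integer.
print(int(address))  # 3221225985

print(2 ** 8)        # 256 possible values per dotted number
print(2 ** 32)       # 4294967296 possible addresses, roughly four billion
```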
DAVID J. MALAN: But only half of them at the time.
So yes, if by 2023 or whatever year humans are projected
to be almost entirely online, and there's
some-- billions and billions of people, eight billion or so,
then that's a problem for this system.
Thankfully, as long ago as 20 years ago, people realized,
mathematically, this was going to be a problem.
And so there's actually a newer version of IP, Internet Protocol.
This is version 4 we're talking about, which is still
pretty omnipresent in the world.
Version 6 actually uses not 32 bits, but 128 bits, which is massive.
And I can't even pronounce how big of a number that is.
So we're thinking about it.
And the biggest companies of the world have already
transitioned to using bigger addresses rather than these 32-bit addresses.
But these are still pretty common in almost any device you might own or see
on campus or elsewhere.
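To get a feel for how much bigger a 128-bit address space is, here is a minimal Python sketch. Both addresses are placeholders from ranges reserved for documentation, assumed here only so the `ipaddress` module has something to inspect.

```python
import ipaddress

# Placeholder addresses from documentation ranges (assumptions for illustration).
ipv4_bits = ipaddress.IPv4Address("192.0.2.1").max_prefixlen    # 32
ipv6_bits = ipaddress.IPv6Address("2001:db8::1").max_prefixlen  # 128

ipv4_total = 2 ** ipv4_bits
ipv6_total = 2 ** ipv6_bits

print(ipv4_total)  # number of possible version 4 addresses
print(ipv6_total)  # number of possible version 6 addresses

# Version 6 has 96 more bits, so the address space is 2**96 times larger.
print(ipv6_total // ipv4_total == 2 ** 96)  # True
```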
So if you have a unique address, that's enough to put
on the front of the envelope.
And it turns out that if you're sending an email or a chat message
or whatever, you, too-- your Mac, PC, or phone-- has an IP address.
So that's enough to put in the top left-hand corner, conceptually.
But you need one more piece of information.
It turns out that on the internet, there are servers, computers,
that are just constantly listening for people to connect
to them, like us, checking our email and visiting Facebook
and Gmail and other such websites.
And those servers, though, can do multiple things.
Google has lots of businesses.
They give you email and web services and video conferencing
and lots of other internet-based services.
And so humans also decided, years ago, to identify
all of these possible internet services with just unique numbers--
names also, but also unique numbers.
And it turns out that humans decided years ago
that when you visit a website, there's one more piece of information that's
got to go on this envelope, not just the server's IP address
that you're trying to connect to, but also the number 80
because 80 equals HTTP, an acronym you're surely familiar with by now.
And that just denotes this is a web request.
If, instead, it said something like 25, that's SMTP, which is email.
So that might mean inside of this virtual envelope
is actually an email message going to Gmail or the like.
And there's bunches more numbers.
But the point is that there are numbers that uniquely identify these services.
So when Google gets a virtual envelope, just a whole bunch of bits, zeros
and ones, that, in some way, has an IP address on it as the destination,
it also knows, oh, is this an email or is this a video conference message
or is this a chat message or something else.
So just to make this more real then, if I'm
going to go ahead and write this down, the IP address to which
I'm sending something might be 220.127.116.11.
Generally, then, I'm going to send it to, say, port 80.
Maybe my IP address is 18.104.22.168.
And so an envelope--
I'll be at [INAUDIBLE]-- and it's really just going to have those pieces
of information-- the destination address, colon,
and then the number of the service you care about, HTTP or whatever,
and then your own IP address, and more information.
But the point is both sender and recipient addresses--
that's enough to get data from one computer in the world to another.
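The envelope metaphor can be sketched as a small Python data structure. This is only an illustration of the idea, not how real packets are laid out; the `Envelope` class, its field names, and the two addresses (taken from documentation ranges) are all assumptions made for this sketch.

```python
from dataclasses import dataclass

# The two port numbers named in the lecture: 80 for HTTP and 25 for SMTP.
WELL_KNOWN_PORTS = {80: "HTTP", 25: "SMTP"}


@dataclass
class Envelope:
    """Hypothetical model of the virtual envelope described above."""
    destination_ip: str    # goes on the front of the envelope
    destination_port: int  # which service on that server is wanted
    source_ip: str         # the return address
    payload: bytes = b""   # the contents inside the envelope

    def label(self) -> str:
        service = WELL_KNOWN_PORTS.get(self.destination_port, "unknown service")
        return (f"to {self.destination_ip}:{self.destination_port} "
                f"({service}), from {self.source_ip}")


# Placeholder addresses, not real hosts.
envelope = Envelope("192.0.2.10", 80, "198.51.100.7")
print(envelope.label())  # to 192.0.2.10:80 (HTTP), from 198.51.100.7
```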
And there's so much more complexity.
This is a whole field in computer science of networking,
if you like this kind of stuff.
But that's how, in a nutshell, the internet gets data from point A
to point B. And this envelope just represents
a whole bunch of zeros and ones.
But what's inside of that envelope?
And that's where we'll focus today and in the weeks to come.
It's actually content.
It's the email you care about or the web page you care about.
And how do we actually decide what server we're connecting to?
Well, typically, you might go to a so-called URL, Uniform Resource Locator.
A URL is just the address of a server.
And that's going to be the-- really, the ultimate recipient of that envelope
that we're trying to send.
But this, of course, is not an IP address.
This does not follow the pattern something dot
something dot something dot something.
So if all of us humans are constantly typing stuff like this
into our browsers, yet the whole story just
told is about numbers and port numbers and low-level stuff,
where's the connection?
Does anyone already know how you get from typing this
to a bunch of zeros and ones that are somehow addressed with numbers?
DNS, I heard.
So it turns out there's a technology in the world-- Domain Name System, or DNS.
And DNS, Domain Name System, is just a type of service on the internet
that Harvard maintains and Yale maintains,
and Comcast and Verizon and a lot of the big players
in the world, whose purpose in life is to run
servers that convert what are called domain names to IP addresses,
and vice versa, so that when we humans type in www.example.com into a browser,
it's our Mac or PC or phone that contacts a local server, a DNS server,
on the local campus or university or apartment or whatever,
asks what is the IP address for www.example.com.
And then what your Mac or PC or phone does
is it writes that address on the envelope.
But it puts a request for a specific web page inside of the envelope.
And when you get back a response from that server,
it's going to be your address that's on the front of the envelope.
And inside of the envelope is going to be
the web page or the email or the chat message
or whatever it is you were trying to actually access.
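The lookup step can be sketched in Python. The standard library's `socket.gethostbyname` asks the system's resolver for an IPv4 address; so that the sketch runs without a network connection, the `resolve` helper (a hypothetical name) also accepts a stand-in lookup function, and the record in `FAKE_RECORDS` is a made-up placeholder rather than the real address of any site.

```python
import socket


def resolve(hostname, lookup=socket.gethostbyname):
    """Return the IPv4 address for hostname using the given lookup function."""
    return lookup(hostname)


# A stand-in "DNS server" so the sketch runs offline.
# The address is a placeholder from a documentation range.
FAKE_RECORDS = {"www.example.com": "192.0.2.10"}

print(resolve("www.example.com", lookup=FAKE_RECORDS.__getitem__))  # 192.0.2.10

# With a network connection, resolve("www.example.com") would ask the
# system's real resolver instead, and the answer would differ.
```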
So let's tease this apart into some of its components.
First of all, this thing here highlighted in yellow
is officially the domain name.
You've probably all used this term before.
It's usually something dot something.
"Com" typically refers to commerce or commercial, although anyone,
for any purpose, can use .com.
Back in the day, very popular were .com, .net, .org, .edu, .gov, .mil.
And these were all very US-centric because it
tended to be the United States that really kicked off
this use of the internet and DNS.
But now it's certainly spread globally.
And so there's hundreds now of what are called TLDs, Top-Level Domains.
They tend to be three or more characters if they denote a word.
And they tend to be two characters if they denote a country,
like US is United States, JP is Japan, UK--
United Kingdom, and so forth.
Those are just country codes that do the same thing.
But what's this at the front?
World Wide Web, or www, here, more generally, is an example of what?
What is this?
What does this mean?
DAVID J. MALAN: It's a subdomain-- is one way of thinking about it.
In fact, all of you, many of you here, probably
have email addresses of the form college.harvard.edu or g.harvard.edu
or the like.
Those are subdomains.
Harvard's such a big place that they actually
put everyone in different categories of domains, otherwise known as subdomains.
And that might be a word or a phrase that comes before the domain name here.
But it can also just mean the name of a server.
So if example.com is the company or business whose website you're trying
to visit, their domain is example.com.
And they bought that domain name some years ago.
And they spent a few dollars every year, probably, renewing the fee for that.
And they have at least one server whose name is www.
And that exists within their domain.
They might have dozens or hundreds or just one server.
Each of them can have a name.
So this is generally called the hostname.
So when it's an email address, it often implies a subdomain,
like a category of addresses.
But when it's in a URL like this, it means probably a specific machine
or a specific set of machines-- conventionally,
the web servers that the company runs--
doesn't have to be called www.
For historical purposes, MIT tends to use web.mit.edu.
But almost everyone else in the world uses www or nothing at all.
It's not required.
You can actually just visit many websites without specifying any hostname.
And it just works, as well, thanks to DNS giving you the IP address.
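The pieces of a URL named so far can be pulled apart with Python's `urllib.parse`. The `describe` helper is a hypothetical name, and treating the last two dot-separated labels as "the domain" is a simplifying assumption that would be wrong for addresses such as those ending in `.co.uk`.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the parts discussed above (naive, for illustration)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "scheme": parts.scheme,                               # http or https
        "hostname": parts.hostname,                           # full name of the server
        "tld": labels[-1],                                    # top-level domain
        "domain": ".".join(labels[-2:]),                      # assumes a one-label TLD
        "host_or_subdomain": ".".join(labels[:-2]) or None,   # e.g. www, if present
        "path": parts.path or "/",                            # default page if omitted
    }


print(describe("https://www.example.com/"))
print(describe("http://harvard.edu"))
```

The second call shows the "nothing at all" case: there is no `www` in front, so `host_or_subdomain` comes back as `None`.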
But what about the file you're actually requesting?
What does it actually mean to visit this URL?
Well, on many servers, this implicitly means, hey, web server,
give me a file, just a text file, called index.html.
That's the name of the file, a text file,
that you could create with CS50 IDE or even Notepad or TextEdit
on your own Mac or PC that contains a language called HTML.
And we'll take a look at that language in just a bit.
And some of you might have seen it before.
But the language in which web pages are written is HTML.
And we'll give you the building blocks, conceptually and practically,
for that today.
You'll use it over the coming weeks in many different contexts.
But we'll use it, ultimately, to create the contents of websites.
But today, we'll focus first on this, HTTP.
Anyone know what that stands for?
DAVID J. MALAN: Yeah.
HyperText Transfer Protocol.
And honestly, in most of technology, it's
not so much what the acronyms represent that's all that important,
but, really, what the technology does.
And in this case, HyperText Transfer Protocol--
we'll see hypertext in a moment.
That's another way of saying HTML.
Transfer Protocol-- P for Protocol-- that's another buzzword.
So protocols are not programming languages, per se.
They are conventions.
And we humans have conventions, too.
For instance, if I were to meet someone for the first time,
I probably wouldn't stand on stage and lean down like this to do it.
But I might say, hi, I'm David.
DAVID J. MALAN: Stephan, nice to meet you.
And we have this weird handshake that was aborted prematurely there--
that we have this weird convention-- us humans, at least in the US,
of greeting someone with a handshake.
And Stephan just knew to do that, however awkwardly.
And then he disengaged because the transaction was complete.
And that's not unlike what a web server does.
When you request a web page, you're sending a request to someone
as though you're extending your hand.
You're expecting something in return.
But in the case of a computer, of course,
it's like the web page itself coming back in an envelope from point B
to point A.
So that's what a protocol is.
We just have been programmed to know what
to do when we want to request a greeting or information
and get something back in return.
It's like a client-server relationship in a restaurant.
A customer requests something off the menu.
The server, the waiter or waitress, brings it to them
and, thus, completes that transaction as well.
And that's what the internet is, too--
clients and servers, browsers and servers, computers
and other computers, ultimately.
So with that relationship in mind, let's take a look
at what's actually inside of this envelope.
In the case of Stephan's and my greeting, it was more visual.
But in the case of a computer, it's going to be more textual, literally.
So inside of the envelope, the virtual envelope,
so to speak, that your browser sends to a server
when trying to request a web page, is actually
a message that looks like this.
Thankfully, it's not terribly cryptic, although the dot, dot, dot
implies there's more contents inside of the envelope.
But the keyword here literally is GET, a verb.
And there's other verbs that the browser can use.
And this one literally means, get me the following home page.
What home page do you want to get?
Well, the default one.
This forward slash, as it's called, just represents the default web page
on a website.
And in many cases, that implicitly means an actual file called index.html.
It can be called other things or not exist at all.
But in many cases, that means, implicitly,
get me a file called index.html.
And we'll see what that looks like in a moment.
HTTP/1.1 just means, hey, Stephan, I speak HTTP version 1.1.
Hopefully, you do as well.
There can be other and newer and older versions of the same thing.
Notice down here, though-- whoops-- notice down here, though,
that the hostname is also in this envelope
because it turns out that web servers can do multiple things at once.
And they can serve multiple domains.
You don't need your own personal unique server to serve a website.
You can have tens, hundreds, thousands of different websites
all on the same server.
And if any of you ever paid for your own domain name or your own personal home
page or the like, you are probably paying someone
for shared space on one server or more servers,
not for your own personal dedicated one.
But again, this might implicitly mean the same thing as this.
Give me index.html.
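Since the slide itself is not reproduced in this transcript, here is a minimal Python sketch of the kind of message being described: a request line with the verb, the path, and the protocol version, followed by a Host line. The `build_request` helper is a hypothetical name, and a real browser would send additional lines, which is what the "dot, dot, dot" stands for.

```python
def build_request(host, path="/"):
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, what to get, and the version spoken
        f"Host: {host}",         # which of the server's websites is wanted
        "",                      # a blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus a newline.
    return "\r\n".join(lines)


print(build_request("www.example.com"))
```

Asking for `/index.html` explicitly would only change the first line, for example `build_request("www.example.com", "/index.html")`.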
So what is it that actually comes back from the server?
The server, hopefully, responds with a message that looks like this.
It responds with confirmation of the version of the protocol it speaks.
That's like Stephan saying, yes, I speak HTTP 1.1 as well.
200 is a numeric code that signifies literally OK.
All is well.
I understood you.
Here is the information you requested.
And Content-Type, below it, is a more technical way of saying,
the type of content I'm handing back to you in my own envelope
from point B to point A, or from Stephan to me,
is in a language called HTML that happens to be text.
Why does it look like this?
Humans, years ago, just decided that this
would be the sequence of characters that computers literally
send to communicate that information.
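The response just described can be taken apart the same way. This sketch assumes a response whose first lines are the ones mentioned above (version, 200, OK, and a Content-Type of text/html); the `parse_response_head` helper is a hypothetical name, and real responses carry more headers than this.

```python
def parse_response_head(text):
    """Split the top of an HTTP response into version, code, reason, headers."""
    head = text.split("\r\n\r\n", 1)[0]            # everything before the body
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
print(parse_response_head(sample))
# ('HTTP/1.1', 200, 'OK', {'content-type': 'text/html'})
```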
So let's actually try this in one case, maybe, for instance, with harvard.edu,
and see what actually happens to see what else we might see.
So let me go ahead and open up Chrome, or any browser,
for that matter, that supports some kind of debugging and diagnostics.
And I'm going to do this.
And you can access this in different places.
I'm going to go up to View, Developer, and Developer Tools.
This is something that comes with Chrome.
You sometimes have to enable it in Safari and other browsers.
But almost every browser these days has this capability.
And you'll notice that this just opened up a whole bunch of tabs
at the bottom of my screen here that I'm going
to be able to use to actually explore what is--
did I kick something else?
It's back-- won't step on there.
So what is this going to allow us to do?
Well, notice there's a lot of features here.
It's overwhelming at first glance.
But there's a tab here called Network.
And it turns out that one of the features Chrome gives to developers,
which you now all are-- is software developers--
is the ability to see what's going on underneath the hood of a browser,
to see what is inside of these virtual envelopes
that your browser has all those years been sending
from itself to servers elsewhere.
So I'm going to go ahead and do this.
I'm going to go ahead and actually visit http://harvard.edu and hit Enter.
And you'll see a whole bunch of stuff happens,
including the web page appearing at the top of the screen.
I'm going to ignore all of this stuff at the bottom
except for the very, very first request.
If I zoom in on this, notice that highlighted in blue
here is the very first request, harvard.edu.
And if I click on that, I'm going to see a little more information at right.
And if I go scroll down to what are called
request headers, the lines of text that were inside the message
that my browser sent, this is literally what
my browser sent inside the envelope, unbeknownst to me,
when I visited harvard.edu.
Thankfully, it confirms my prediction earlier, GET / HTTP/1.1,
because I requested harvard.edu's home page.
Host is harvard.edu.
Then there's the dot, dot, dot, the stuff that we don't particularly
care about today.
But let me go ahead and look at the response.
So this was my request.
This was my hand going out to Stephan.
Let's see what his or the server's response
is by scrolling up to this, which is called response headers.
Harvard's server, fortunately, does speak the same protocol
as me, 1.1 of HTTP.
But apparently, Harvard moved permanently.
What does that mean?
I went to http://harvard.edu, not there.
Where is it?
Well, there's a little more information here.
There's a lot of dot, dot, dot, things we don't care about.
But if we focus on one that-- oh, location--
where is Harvard now, apparently?
DAVID J. MALAN: Yeah.
It looks like Harvard "moved" permanently from http://harvard.edu to,
and let me highlight it, https://www.harvard.edu,
with two notable changes.
One, there's the www.
And two, what else might catch your eye?
S, which most of you probably know these days means secure,
and which implies encryption in the spirit of Caesar and Vigenere,
but much more secure than those simple ciphers.
The information is somehow scrambled now when I'm communicating
between myself and harvard.edu.
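What the browser does with that response can be sketched in a few lines of Python. The `follow_redirect` helper is a hypothetical name, and the headers dictionary is a simplified stand-in for the response headers seen in the developer tools; the two URLs are the ones from the demonstration above.

```python
from urllib.parse import urlsplit


def follow_redirect(status, headers, requested_url):
    """Return the URL a browser would request next, given one response."""
    if status in (301, 302) and "Location" in headers:
        return headers["Location"]
    return requested_url


old = "http://harvard.edu"
new = follow_redirect(301, {"Location": "https://www.harvard.edu"}, old)

before, after = urlsplit(old), urlsplit(new)
print(before.scheme, "->", after.scheme)      # http -> https
print(before.hostname, "->", after.hostname)  # harvard.edu -> www.harvard.edu
```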
So there's two decisions there.
Harvard has decided that they want to allow and, indeed, require
users to visit their website securely so that no one--
no company, no government, no family members--
can necessarily see what is being requested of Harvard's website
because that is scrambled information, much like using something
like Caesar or Vigenere.
And Harvard also, probably for branding reasons,
but also partly for technical reasons, decided,
we want you to think of our website as www.harvard.edu.
And it's a mix of marketing and technical for a few different reasons,
one of which is www we humans just all know means website.
And if you see harvard.edu--
this is less true these days--
it might not necessarily imply as obviously that this is a website's URL.
Frankly, not too many years ago, even advertisements and TV ads and printed
ads and the like would even show http:// to really make clear to viewers that
this is a web address.
But gradually, as more and more people get on the internet
and understand technology and URLs and the like,
we can just start dropping the stuff that is unnecessary clutter because all
of us now know intuitively, oh, harvard.edu-- it's
probably a web address that I can just type into a browser.
And the browser or the server will finish my thought for me
and actually prepend the secure URL or the www or the like.
So we still haven't actually found Harvard, it seems.
So let's do this instead.
Let me go ahead and zoom out and visit a different URL.
Let me go ahead and, again, go to View, Developer, Developer
Tools, Network Tab.
And now let me visit that more verbose URL, more precise URL, and hit Enter.
Again, a whole bunch of stuff gets requested--
more on that some other time.
But now, if I click on the first such request
and look at my response headers, you'll actually
see, albeit in a different format now, that the status of this request is 200,
which, recall, meant--
DAVID J. MALAN: OK.
So now these are two numbers that, honestly, you've
probably not really seen or cared all that much about, 200 and 301.
But odds are you've seen at least one other number when visiting URLs.
For instance, besides actually seeing 200 and 301, you've probably seen 404.
Now, it apparently refers to Not Found.
But more in real terms, what does that mean?
How do you induce that error?
AUDIENCE: The site doesn't exist.
DAVID J. MALAN: The site doesn't exist.
You mistyped a URL.
The web page doesn't exist.
A system administrator just changed the name on something or it's an old URL.
Any number of reasons can mean that the file was not found.
That file might have been index.html or any other URL.
But all this time when you visited a website and you've seen 404,
it's not clear, frankly, why servers have been bothering to tell us 404.
Most people don't need that level of information.
But it derives from that HTTP response, that first line
of text inside the envelope coming back from Stephan or the web server,
more generally, that says 404, Not Found.
And that means the user probably did something wrong
or the data has simply disappeared from the server.
And there's so many more of these things as well.
And in fact, you might get responses, like we just
did from Harvard, supporting not just 1.1, but version 2 of HTTP.
So just realize if you tinker with your own Mac or PC,
the messages might look a little different
based on your browser and the website.
And that's just because things are evolving over time.
And versions are changing.
But there's so many others of these.
And this is just a short, abbreviated list.
200 and 301 we saw.
404 you yourselves have probably seen.
401 and 403 generally refer to you haven't logged in
or you're just not authorized to access information
because it doesn't belong to you, for instance.
500 you're all going to experience before long--
that 500 is Internal Server Error, which is not
so much the server's error as your fault and my fault
when we've written buggy code.
So in the weeks to come, not this week, but when
we start writing Python code and SQL to talk to databases,
we're all going to screw up at some point.
And a browser will often see a 500 error from a server
if, indeed, there's a problem with code.
418 doesn't actually exist.
This was an April Fools' joke, I think, in, like, 1998,
where some people with a lot of free time
wrote up a whole formal specification for an HTTP status code, a 418,
I am a teapot.
And it still kind of exists in lore, internet lore.
So those are just some of the numbers you might see.
But they're not all that technical if you just know where to look for them
and you know, as a developer now, what they signify for you.
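Python's standard library happens to ship a table of these codes, which makes the list easy to explore. The sketch below uses `http.HTTPStatus` for the official phrases; the `category` helper and its wording are assumptions added here to show that the first digit of a code indicates the general kind of response.

```python
from http import HTTPStatus


def category(code):
    """Name the family a status code belongs to, based on its first digit."""
    families = {
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return families.get(code // 100, "other")


for code in (200, 301, 302, 401, 403, 404, 500):
    status = HTTPStatus(code)
    print(code, status.phrase, "-", category(code))
```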
DAVID J. MALAN: Good question.
What's the difference between 200 OK and 302 Found?
So 302, if you read into the documentation,
would actually tell you that this also induces
a redirect, whereby, just like 301, when the browser gets a 301 or a 302,
the browser should be redirected to the new URL that we saw in the header,
so to speak, called location, colon, whatever it was.
The difference is that Moved Permanently means
that the browser should remember that this redirection is happening
and stop bothering the server with the same original request.
Just remember what the new URL is.
302 means found it, but don't rely on this.
Keep asking me again and again.
So it's just a performance optimization so you
don't annoy the server unnecessarily in the case of 301s, which just
costs time and money, in some sense.
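The difference in how a browser treats the two codes can be sketched as a tiny cache. This is a simplified model, not how any particular browser is implemented; the `RedirectMemory` class and the `a.example` and `b.example` URLs are hypothetical names used only for illustration.

```python
class RedirectMemory:
    """Remember permanent (301) redirects; never remember temporary (302) ones."""

    def __init__(self):
        self.permanent = {}

    def record(self, url, status, location):
        # Only a 301 is a promise that the move is permanent.
        if status == 301:
            self.permanent[url] = location

    def next_url(self, url):
        # Skip straight to the new URL if a permanent move was remembered.
        return self.permanent.get(url, url)


memory = RedirectMemory()
memory.record("http://a.example/", 301, "https://www.a.example/")
memory.record("http://b.example/", 302, "https://www.b.example/")

print(memory.next_url("http://a.example/"))  # https://www.a.example/
print(memory.next_url("http://b.example/"))  # http://b.example/
```

The second URL is requested from the original server again next time, which is the "keep asking me again and again" behavior.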
So you might have heard about this before--
can only get away with this in Cambridge, not so much New Haven.
Has anyone ever visited safetyschool.org?
DAVID J. MALAN: You're welcome to on your laptop or your phone.
So some very clever Harvard students, I think, years ago bought this domain.
Frankly, they've probably been paying, like, $10 or more
per year ever since just to keep this joke alive.
But it's wonderfully illustrative because if we go back
to Chrome or any browser--
and let me go ahead and open up a browser tab and go to safetyschool.org.
Where did I get redirected?
DAVID J. MALAN: Hey.
So the more interesting question for us is, how are they doing that?
Well, let me go back into the IDE for a--
or actually, let me go into my browser and open up a new tab--
View, Developer, Developer Tools.
Look at the Network tab.
And now let me go ahead--
whoops-- let me go ahead and visit http://safetyschool.org.
Scroll back up to the top, where I see the first request.
And you can see, more technically, if this doesn't take the fun out
of the joke, all these Harvard students did
years ago was configure this domain name to return a 301,
Moved Permanently to Yale University.
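A server that does nothing but redirect is small enough to sketch with Python's built-in `http.server`. This is one possible way to do it, not a claim about how that site is actually configured; the `TARGET` URL, the `RedirectHandler` class, and the `demo` helper are all assumptions. The demo starts the server on a free local port, makes one request without following the redirect, and shuts the server down again.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "https://www.example.com/"  # hypothetical destination for the redirect


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with 301 Moved Permanently and a Location header.
        self.send_response(301)
        self.send_header("Location", TARGET)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo():
    """Serve one request on a free local port and report what came back."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        result = (response.status, response.reason, response.getheader("Location"))
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```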
Now, it's only fair, especially since the Yale students are watching
this live right now from New Haven--
let's take a look at one other site called harvardsucks.org.
So this domain, too, does exist.
Let me clear that screen and go to http://harvardsucks.org.
And this is an actual website.
So not only did these enterprising Yale students buy the domain name,
they've also been hosting the website for years since.
There's a wonderful YouTube video there that actually speaks to a very fun hack
that they did some years ago at Harvard-Yale, the football game.
But you can see here, oh, that--
so there's a minor one.
So harvardsucks.org actually now lives at www.harvardsucks.org.
But then you actually stay there.
And so I encourage you to go to this site, as well as the other, for all
your Harvard and Yale shopping needs.
So that is HTTP.
HTTP is the protocol, the set of conventions, that browsers
use when talking to web servers.
And it's the protocol that governs how those web
servers respond to the browsers.
We've quantized this in the form of these virtual envelopes, which is just
a physical incarnation of the zeros and ones that are technically going
back and forth across the internet.
But it's embodied in my handshake with Stephan, what's really happening.
And it's like a client-server type relationship.
So how do you actually now do creative work?
How do you make yale.edu?
How do you make harvardsucks.org?
How do you make CS50's own website or Google or Facebook?
Well, what really matters now
is what's deeper inside of that envelope.
In addition to these headers, this textual information,
like 200 OK or 301 Moved Permanently, there's
another language embedded inside of that envelope, deeper down,
called HTML, HyperText Markup Language.
This is the language, which is also text, in which web pages are written.
And so if you've ever visited a website on the internet,
and I just noticed that Erin is doing that on repeat, isn't she, what
you're looking at is a browser's rendering of HTML.
So HTML is just text.
And we're going to see it in a moment.
The browser reads that text top to bottom, left to right, much like Clang
reads your C code top to bottom, left to right.
But rather than convert your text to zeros and ones, what a browser does
is interpret it line by line by line.
And it does what you say.
So if you say, hey, browser, put Erin's photo on the screen,
it is going to do that.
If you say, hey, browser, write the word "staff" in big black text,
the browser's going to do that.
If you tell the browser to lay out a whole menu, it's going to do that.
And we'll see, in just a moment, how you convey those terms.
HTML is not a programming language.
It is, indeed, a markup language, which means it just lays things
out structurally and aesthetically.
So the website here that we're looking at has a bunch of images, all of which
are what are called animated GIFs, which are
very much in vogue these days on Reddit and phones and iMessage and the like.
But those are just images, files, that are actually being transferred
from CS50's server to your browser.
But if I go up to View, Developer, and now View Source, and you can--
could have been doing this all these years--
you can actually see the so-called HTML that drives CS50's website.
So this is all of the HTML, and I'm deliberately
scrolling fast through it, that implements that CS50 staff page.
And if we scroll all the way to the bottom,
you'll see that 1,008 lines later is the web page done.
But it's just text.
And, in fact, let me scroll back up to the top and just point
out a few salient details.
You'll see familiar patterns in the examples
we're about to start looking at.
The very first line probably is that, DOCTYPE HTML, which
is like a little hint to the browser that says,
quite explicitly, hey, browser, the document type you're about to see
is indeed HTML.
But the rest of the web page follows a structural pattern.
And you'll see that it's already nicely indented,
even though some of these lines are a little long and are wrapping.
But you'll see this convention, an open bracket, which is an angle bracket,
like a less than sign, the keyword html, maybe some pattern like this,
lang equals en-us--
this sounds like language-- US English, maybe--
more on that in a bit-- and then this close bracket, or a greater than sign,
that completes the thought.
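As a rough illustration of how a tag like that breaks down into a name plus a key-value pair, here is a small sketch, again assuming Python's standard html.parser as a stand-in for the browser. The class name AttributeReader is made up for this example.

```python
from html.parser import HTMLParser


class AttributeReader(HTMLParser):
    """Collects each start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.seen.append((tag, dict(attrs)))


reader = AttributeReader()
reader.feed('<html lang="en-us">')
reader.close()

print(reader.seen)
```

The tag name is html, and lang is an attribute whose value is en-us; the angle brackets just mark where the tag starts and stops.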
Then inside of that HTML tag, so to speak, indented beneath it,
is this, the head of the web page.
The head of the web page is something that you mostly can't see.
It generally refers to the tab at the top of the page and just
And if I scroll down further, we'll see, really, the guts of the web page,
which are in the so-called body of the web page.
So these things that I've just been highlighting,
albeit in a very big context of a big, 1,000-line web page,
are just called HTML tags.
HTML is a tag-based language, a markup-based language,
where you just say what you want to appear where you want it to appear.
So what does that actually mean?
Well, let's take a look at a simpler example
in the form of this slide, which is perhaps the simplest web page
that you can make, this one here.
This is perhaps the simplest correct, syntactically correct, web
page you can write that's saying, hey, browser, the type of document is HTML.
Hey, browser, here's the start of my HTML page.
Hey, browser, here's the head of my web page.
Hey, browser, here comes the title of my web page.
Hey, browser, the title of this page shall be, for the sake of discussion,
"hello, title."
But you could say literally anything there that you want.
But now things get interesting.
And some of you have certainly seen HTML before, and some of you haven't.
But you can probably just infer, even if you
haven't seen HTML, what this tag is doing because it looks the same,
but yet a little different.
So if this is saying, hey, browser, here comes the title,
what is this probably saying, intuitively?
AUDIENCE: Just ends.
DAVID J. MALAN: Yeah.
That's it for the title.
Hey, browser, that's it for the title.
So you might call this a start tag and this an end tag,
or an open tag and a close tag.
Think about it however you want.
But in HTML, there's generally this nice symmetry.
Once you start something, you eventually finish it.
And you do it in the right order.
So you do-- you start tags in one order.
And then you close them in reverse order so that everything is nicely symmetric.
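The "close in reverse order" rule is the same last-in, first-out discipline as a stack. Here is a minimal sketch of a checker built on that idea, assuming Python's standard html.parser. It is deliberately simplified: tags such as img that never take a close tag would be reported as unclosed, so the examples stick to tags that do.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Pushes each start tag and expects end tags in reverse order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.well_nested = True

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # The tag being closed must be the most recently opened one.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.well_nested = False


def is_well_nested(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Every tag must also have been closed by the end.
    return checker.well_nested and not checker.open_tags


symmetric = "<html><head><title>hello, title</title></head></html>"
crossed = "<html><body>hello, body</html></body>"

print(is_well_nested(symmetric))
print(is_well_nested(crossed))
```

The first string closes title, then head, then html, the reverse of how they were opened. The second closes html while body is still open, which is the asymmetry the rule forbids.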
And indeed, the indentation, just like in C,
technically doesn't matter at all.
You could have a really, really ugly web page with no whitespaces whatsoever.
And it would still work fine for the browser because it doesn't care--
just much harder for us humans to read.
So the convention is to indent, just like in C,
just so it's more clear what the hierarchy or the nesting
is, so to speak.
This line here means, hey, browser, that's it for the head.
It's another close tag.
Hey, browser, here comes the body of the page.
So much like head here, body here, most of the page's content
is, indeed, in the body of the web page.
That's what you, the humans, actually see.
And mostly in the head, we'll just see things like the title
and just a couple of other things in a little bit.
The message inside this web page is apparently, "hello, body,"
then close body, close html.
And that's it.
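The slide itself is not reproduced in this transcript, so the page below is a reconstruction from the walkthrough above and should be read as an assumption about its exact layout. The sketch holds the page in a Python string and uses the standard html.parser module to pull out the text found inside title and inside body.

```python
from html.parser import HTMLParser

# Reconstructed from the description above; the indentation is a guess.
PAGE = """<!DOCTYPE html>
<html>
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class PageReader(HTMLParser):
    """Remembers which tag each piece of visible text sits inside."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text_by_tag = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.text_by_tag[self.stack[-1]] = text


page_reader = PageReader()
page_reader.feed(PAGE)
page_reader.close()

print(page_reader.text_by_tag)
```

The title text ends up in the browser's tab, and the body text is what appears in the window itself.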
So when I said earlier that inside of these envelopes
is just a whole bunch of text, all I meant was this.
This is what's inside of this envelope just below
the protocol information, the HTTP information, that just said 200 OK
or any of those other messages.
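To picture the contents of that envelope, here is a small sketch that splits a raw HTTP response into its status line, its headers, and the HTML underneath. The response text is hand-written for illustration, and the Content-Type header is an assumed, typical example rather than something shown in the lecture.

```python
# A hand-written response: status line, one header, a blank line, then HTML.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<!DOCTYPE html><html><head><title>hello, title</title></head>"
    "<body>hello, body</body></html>"
)

# The first blank line separates the protocol information from the page.
head, _, body = raw_response.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")
version, status_code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(status_code, reason)
print(headers)
print(body)
```

Everything above the blank line is for the browser's bookkeeping; everything below it is the text the browser goes on to interpret.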
So when the browser receives this envelope, it opens it up.
It reads it top to bottom, left to right.
And then it literally interprets that file top to bottom,
doing exactly what you tell it to do.
So how do we go about actually doing this?
You can write HTML in any text program.
You can write it in TextEdit on a Mac or in Notepad on a PC.
You can, technically, use Microsoft Word or Google Docs.
But that's out of context and bad.
Those give you features you don't want.
But you generally want a text editor.
And we, of course, have a text editor in CS50 IDE.
So let me actually go there.
I'm going to go into CS50 IDE.
And I'm going to go up to File, New.
And I'm going to go and preemptively just save the file with the only file
name I remember from earlier, which was index.html.
Just like C programs end in files called something .c,
HTML files often end in .html, sometimes .htm, but often .html.
So let me go ahead and click Save there.
And now I'm going to go ahead and do a-- type exactly that same code--
so open bracket, exclamation point.
And that's the only exclamation point we'll expect.
The first line is, unfortunately, a little different from all the others.
Then I'm going to do open bracket, html, close bracket.
And you'll notice that, just like with C, the IDE tries to be a little helpful
and finish your thought.
So it already closed the tag for me.
Now it's just on me to hit Enter to move it into place.
Now I'm going to-- what came next inside the--
What came next?
The head-- so open bracket, head, close bracket.
Inside of head was the title.
And then I think it just said, "hello, title,"
though I could call that anything I want.
Then below the head, but inside the html tag still, was my body.
So let me type that here.
And I think I said, "hello, body."
So-- bdoy, boday.
OK, body-- save.
So now I have a text file in the IDE.
It seems to match up with what we showed as a canonical page before.
Now we need to load it in a browser.
And this is a little paradoxical because I'm,
obviously, writing this text in a browser,
and yet I need the browser to read it.
So this is just because the IDE, Integrated Development Environment,
that we've been using is, itself, web-based.
That's just an incidental detail.
The fact that I have written this code in a file now is what's important.
It could be in the cloud as it is.
It could be on my Mac.
It could be on my PC.
It could be on any other server on the internet.
The point is I need to access this file somehow.
And so it turns out that we're not going to compile it.
There are no zeros and ones involved anymore.
There is no machine code.
We're going to leave it just like this.
HTML is interpreted, literally, line by line, top to bottom--
no zeros and ones needed.
But I am going to need to run my own web server, not the IDE itself.
I want to run, as the developer, my own web server.
What is a web server?
It's like Stephan.
It's just a program sitting there, waiting and waiting
and waiting for something to happen.
And that something is, presumably, a request from a browser, at which point
it will respond with a handshake or, more specifically, with this file.
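Here is a minimal sketch of that waiting-and-responding loop. It uses Python's built-in http.server module, which is an assumption swapped in for illustration and not the http-server program used in the lecture. The server listens on 127.0.0.1 with port 0, which asks the operating system for any free port, serves a temporary folder containing index.html, and then the same script plays the browser's part by requesting that file.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request

PAGE = (
    "<!DOCTYPE html><html><head><title>hello, title</title></head>"
    "<body>hello, body</body></html>"
)


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Serves files from a folder without printing a log line per request."""

    def log_message(self, format, *args):
        pass


def serve_and_fetch(page_text):
    with tempfile.TemporaryDirectory() as folder:
        pathlib.Path(folder, "index.html").write_text(page_text, encoding="utf-8")

        # Port 0 lets the operating system choose a free port.
        handler = functools.partial(QuietHandler, directory=folder)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        port = server.server_address[1]

        # The server sits in its own thread, waiting for a request.
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            # Play the browser: ask for the file, bypassing any proxy settings.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            url = f"http://127.0.0.1:{port}/index.html"
            with opener.open(url, timeout=5) as response:
                return response.status, response.read().decode("utf-8")
        finally:
            server.shutdown()
            server.server_close()
            thread.join()


status, served_text = serve_and_fetch(PAGE)
print(status)
print(served_text)
```

The server does nothing until a request arrives, and its reply is simply the bytes of the file, unchanged, along with a status code.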
So how do I do this?
Well, in the IDE, we actually include a free program called http-server.
All of the software in CS50 IDE is free and open source.
So we've simply chosen some of the most popular packages, one of which
is called, literally, http-server.
And if I go ahead and hit Enter, you'll see somewhat cryptic information
But let's see.
It's starting up the http-server.
It's serving dot slash.
Well, what does dot mean?
So just serve up the contents of this current folder that I'm in.
Now it's saying it's available on this URL.
And this URL's going to vary by who is running this.
If you're running it, you're going to see a different URL.
But what is interesting is the number--
turns out that, because this is my little own personal web server,
it's not using port 80, which I claimed earlier was the default.
It's using a different convention, 8080.
8080 is just a human convention.
It's not standardized in the same way.
But this way, I can serve files separate from the IDE
because the IDE itself is actually listening on port 80,
or, technically, 443, because it's using HTTPS.
And I don't want to confuse my files with CS50 IDE's own files,
the actual user interface that you're all familiar with.
So, just like Stephan can hear from--
say hello to multiple people and Google servers can handle multiple services,
so can my own IDE listen on multiple ports, as they're called--
80, 25, 443, or, in this case, 8080.
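As a rough sketch of one machine listening on more than one port at once, the following opens two listening sockets side by side using Python's standard socket module. Real services use fixed numbers like 80, 443, or 8080; binding to port 0 here is an assumption made so the example runs without special privileges, and it just lets the operating system hand out two free ports.

```python
import socket


def open_listeners(count):
    """Opens `count` TCP sockets, each listening on its own port."""
    listeners = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        sock.listen()
        listeners.append(sock)
    return listeners


listeners = open_listeners(2)
ports = [sock.getsockname()[1] for sock in listeners]
print(ports)

for sock in listeners:
    sock.close()
```

Both sockets share the same address but have different port numbers, which is how traffic for one service is kept apart from traffic for another.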
So what does this all mean?
I'm going to go ahead and literally click on this URL,
open it in another tab on my browser, and you'll see somewhat cryptic output.
But this is just a succinct way of saying, here is the index, the listing,
of slash, which is now the default area of my website.
I've got two folders, source 5, which is on the course's website--
it's all of today's files in case we want to look them up without writing
them from scratch--
and then the file I just created, index.html.
So if I go ahead now and click on index.html, there we have it-- hello, body.
And we don't see the tab just because I full-screened Chrome.
But if I actually remove that full screening
and zoom up to the top of the tab, you see "hello, title" there.
And if I go back into this file, meanwhile, and I say,
"hello, body, nice to meet you"-- this one got weird--