Placeholder Image

字幕列表 影片播放

  • [MUSIC PLAYING]

  • SPEAKER: All right.

  • This is CS50, and this is week nine.

  • You'll recall that last time, we took a look back at such things as Scratch.

  • Because recall that when we introduced Scratch,

  • we introduced a whole number of programming

  • constructs-- loops and conditions and functions and variables and more.

  • And then just one week later did we transition from that world of Scratch

  • to C, where the syntax was much more cryptic looking, certainly

  • at first glance, and perhaps a little bit still.

  • But the ideas were ultimately the same.

  • And now in week nine as we transition from the world of C to Python,

  • you'll find that that same finding is the case, whereby

  • we are using the same ideas.

  • We're leveraging the same concepts.

  • But we have to translate it now to a slightly different domain

  • and a slightly different syntax.

  • But things really start to get interesting this week

  • and beyond, especially as we build on our ability

  • to write Python code, our ability to serve up web applications.

  • Because now we can begin to leverage so much more functionality

  • than comes with C, than comes with the CS50 library,

  • because there's such a large community of software developers

  • who have created some really amazing things that we can try out.

  • So let's look further today at Python, and let's look further

  • at the applications for a language like this.

  • You'll recall that perhaps the simplest program we could have written last time

  • was actually just one line-- print Hello World.

  • But you'll recall as you began to dive into some of this past week's

  • challenges, you might have needed or wanted

  • to actually start wrapping code like that in a method or function like main,

  • and then calling it by default with this magical incantation here.

  • But this was the building block toward which we were starting

  • to develop more interesting programs.

  • But now we're going to really context switch from a command line environment

  • over to a web-based environment.

  • And the world has been writing web-based applications for quite some time.

  • Even I 20 years ago recall made that Frosh IMs-- freshman intramural sports

  • website.

  • And even since then have the languages changed and the paradigms changed,

  • and we humans have learned a lot about programming for web-based applications.

  • And this, for instance, is one design pattern or one architecture

  • that has arisen, whereby MVC refers again

  • to this pattern, whereby you put your intelligence, your so-called business

  • logic in your controllers.

  • This is one or more files that has all the conditions,

  • a lot of the functionality calls, and actually does something

  • with your program.

  • Then you have the views, which are often templates or files that

  • render information that you might have dynamically generated

  • or input from users.

  • And then model.

  • And today, we start to dwell on the M in MVC, model, because last week,

  • we didn't really leverage much, if any, of a model.

  • But this week we're finally going to demand of ourselves the ability

  • to save data, retrieve data, search data, delete data, and more.

  • And we really haven't had this capability, other than very simple CSV

  • files, for instance back in the day of C,

  • and even last week when we dabbled with those in Python.

  • And you'll recall, last time we introduced this.

  • Flask is what's called a micro framework.

  • So a bunch of files, a bunch of code that a community of people

  • have written that just make it easier to make web-based applications.

  • It's absolutely not required.

  • In fact, one of the sample programs among last week's distribution code

  • if you like to go back and play was a program called serve.py

  • that doesn't use any of this.

  • It just uses built-in Python functionality.

  • But you'll find that it's pretty cryptic.

  • It's pretty heavyweight in order just to do something simple.

  • And so things like Flask have come around that just make it easy

  • and dare say more pleasurable to write web-based applications,

  • because people have realized in writing web app after web app after web app

  • that they're just repeating themselves, or borrowing code

  • they've written in the past.

  • And so in frameworks, you have solutions typically to very common problems.

  • So we'll use this set of solutions to the development

  • of a web-based application.

  • And for instance, the simplest Flask application

  • that's also available from last week's source code

  • might be something like this, whereby the top of your file,

  • you don't import like the CS50 library, you instead import someone else's

  • library, specifically from the Flask framework import-- what's

  • called the class, recall, called Flask-- capital F.

  • And then also a function or method called Render Template.

  • And we used both of those as follows.

  • Last week, we instantiated a Flask-based application

  • by essentially passing in this special reference

  • to the current file, the name of the current file,

  • and then allowing the framework to do its thing

  • and give us back this very special, very powerful object called app--

  • though we could have called it anything-- that gives us

  • access to some useful functionality.

  • For instance, the most useful functionality

  • initially was just this-- @app.

  • And this is just syntax for what's called a decorator, and more

  • and that some other time, or more on that in Flask's own documentation.

  • But essentially, this line here-- @app.route,

  • says that hey server, any time you see an HTTP request for slash, the default

  • web page typically of a website, go ahead and call the following function

  • that's immediately below it.

  • That function I've called Index, mostly by convention.

  • But I could've called it anything I want.

  • And all it did last week in this example was render a template.

  • In this case, index.html.

  • Now, that could be raw HTML recall.

  • But oftentimes, you use something called a templating language.

  • And indeed, we introduced a little bit of Ginga last time, which

  • is a Python-based templating language that we'll see just

  • makes it easier to generate HTML without having

  • to write HTML inside of our actual Python

  • code, which tends to be frowned upon.

  • So let's take a look back at one of those examples, which

  • I've renamed from Frosh IMs last week to Frosh IMs zero.

  • And recall that we had the following files.

  • We had a templates directory, inside of which

  • was failure, index, layout, and success.

  • Kind of a lot of complexity for a pretty simple program,

  • but we'll see what each of those does again.

  • And then application.

  • And this was the so-called controller code that actually did something

  • interesting last week.

  • Now, what was that?

  • Well, it's a pretty small program.

  • As before, I've imported from the Flask framework

  • a whole bunch of symbols here.

  • I'm instantiating my application here.

  • This is copy-paste from our simplest of examples,

  • whereby if the user just visits Slash, I want to show him or her index.html.

  • But then it got interesting.

  • This really was the first time we had the capability

  • in a web-based environment to respond dynamically to user's inputs

  • based on whatever they typed into a web form.

  • In fact, if you think back a few weeks in week six

  • when we first introduced the web and HTTP and TCP/IP

  • and making HTML-based web pages, you recall that all we did

  • was implement Google's front end, and an ugly one at that.

  • But just the HTML form, that if you click the Submit button,

  • it submits to Slash search on google.com,

  • because we pretty much deferred completely to them,

  • lacking at the time a backend and lacking at the time

  • even a language in which we could implement our own backend,

  • a web server that actually responds to those requests.

  • But in Frosh IMs Zero, we have the ability

  • to have our own route, in this case called

  • Slash Register that I've specified isn't even

  • going to respond to HTTP Get requests, but rather Post requests, which

  • typically mean a form submission is sending one or more fields

  • that you don't necessarily want to end up in the browser's URL or history.

  • I'm calling my function Register, and this would be a good convention.

  • Just make sure your function here lines up with what the route is there.

  • And then I'm checking a couple of conditions.

  • If that request's form-- that is, all of the parameters

  • that were submitted via HTTP Post-- has a name field--

  • so a text field, for instance, called Name--

  • and that equals nothing-- quote unquote, the so-called empty string--

  • or, that request's form's dorm key has a value of quote unquote, which

  • is to say that if the user, myself last week, did not give my name or my dorm,

  • then go ahead and return the template called Failure.html.

  • And we'll take a look back at that in a moment.

  • Otherwise, render template Success.html if all in fact goes well.

  • Now, if we take a look at Failure.html, it didn't do all that much.

  • It extended Layout.html.

  • It declared a title of Failure.

  • And then it declared a body of You must provide your name and dorm

  • as sort of an admonishment to the user.

  • If I look at Success, meanwhile, it's pretty similar,

  • but the text is different.

  • But this is the problem we address this week.

  • You are registered, well not really.

  • Because recall-- and you could have seen a moment ago-- we did nothing

  • with the user's name or dorm or any other information,

  • we just pretended to actually register them.

  • And then what did the form itself look like?

  • Well, this page here is mostly HTML.

  • But again, notice that even this page at the top

  • defines a body block and a title block, because it's extending Layout.html.

  • So this is the Ginga stuff that I referred

  • to earlier, the templating language.

  • And if we finally go into Layout.html, now you

  • see the basic framework for every page in this web-based application.

  • It's a pretty small application to be fair, but it does

  • have at least three distinct pages-- Index, Failure, Success-- all of which

  • are identical to the file, except that they each have a title and a custom

  • body dynamically embedded thanks to how we're using templates.

  • So this is a nice way of not having to copy and paste all of that code

  • into every file, thereby making it a pain to update anything,

  • to add CSS or JavaScript files, or generally any

  • of the overall structure of the page.

  • We can factor that all out to Layout.html.

  • Of course we just threw all this information away.

  • that name, that dorm, every student who's been registering by this app,

  • we're forgetting about.

  • And so at the tail end of last week, we introduced this solution recall,

  • whereby in Frosh IMs One now, I'll call it--

  • or rather today let's redress that by borrowing an idea from last week

  • that we didn't incorporate into Frosh IMs as follows.

  • In Application.py for this next version called Frosh IMs One,

  • what do I seem to be doing differently?

  • File's almost identical to before.

  • It's a little longer this route, because what I've grabbed

  • is some of the code from last time, whereby we wrote out to a CSV

  • file-- Comma Separated Value-- those student structures.

  • We did this way back when in C. We then ported the student struct to a student

  • class last week, and now I've borrowed that same code

  • for saving those students to disk, so to speak, by embedding it into this route.

  • So what's going on?

  • So if again the name or the dorm are blank,

  • I go ahead and just return the template called Failure.html and we bail out.

  • Otherwise, I declare a variable called File.

  • I open a la C's fopen function, registrants.csv,

  • which I'm just calling this just because it's

  • going to contain my registrants in a comma separated values file.

  • And then quote unquote a.

  • And you might not recall this or might not have seen this, but quote unquote a

  • is for appending.

  • If I instead and accidentally did w for writing,

  • you might think that's correct.

  • But every time you use quote unquote w, you're going to write out a new file.

  • Which is to say, if 100 students registered

  • for some intramural sport via this web app, and I was using quote unquote w,

  • I would actually keep clobbering or overwriting

  • the file, such that only the most recently registered student

  • would appear in this file.

  • So we want quote unquote a for append.

  • Then I declare a variable called writer, and I'm using this CSV module

  • that we get for free from Python that's going

  • to allow me to create what we'll a writer, passing in that file as input.

  • Then I'm going to call this special method Write Row, whose purpose in life

  • is to take a tuple, which in this case has

  • two elements, the name that was passed in via the form as well as the dorm.

  • And then I'm going to go ahead and close the file.

  • So Write Row is now the function that takes

  • care of all the complexity-- it's not all that much complexity-- of printing

  • out name, comma, dorm, name, comma, dorm.

  • And just in case anything has a quote in it, like a someone O'Leary,

  • or a name that has an apostrophe or some other punctuation symbol that

  • might otherwise confuse a program reading this text file,

  • which itself might contain quotes, this kind of function Write Row

  • will take care of those kind of details, as well as

  • commas that might correctly or incorrectly be in the values like name

  • and dorm that I'm providing.

  • And then we print out the success page.

  • But this isn't all that useful of a file format.

  • A CSV file is nice in that you can download it,

  • you can open it in Excel or Apple Numbers or similar programs.

  • You can import it into Google Spreadsheets.

  • But it's not really a database.

  • You can store data in it, but if you want to read any of that data,

  • or change any of that data, you pretty much

  • have to do what we've been doing in C, which is open it with Open or fopen

  • or whatever, iterate over the lines, maybe parse them or read them

  • into separate variables or into an array or a list or whatever,

  • and then you can use binary search or linear search

  • or whatever you want to actually find data,

  • maybe to then change data, and then save it all back out.

  • But this is just tedious.

  • To have to do all that work simply to save data

  • isn't all that much fun for programming, and it also doesn't scale very well.

  • As soon as you have some good success with some web-based application or even

  • some mobile application, it'd be nice if your code were

  • as efficient and as fast as possible.

  • And wouldn't it be nice if we could stand

  • on the shoulders of others who have had similar problems of storing data

  • efficiently so that we could learn from them as well and leverage

  • some of their work?

  • So thus was born SQL-- Structured Query Language--

  • which can be used in any number of environments.

  • So SQL is just a language that we will now introduce.

  • And it's a programming language, though it's not

  • going to be as-- we're not going to use it as richly as we have C or Python.

  • We're going to use it for a number of fairly basic, but nonetheless very

  • powerful operations.

  • And you can store data using this world of SQL in any number of ways.

  • You can use things like MySQL-- which is a very popular database

  • server that Facebook started with and still uses for some of its purposes--

  • PostgreSQL, Microsoft Access, or Oracle, or any number of third party products

  • that you might have heard of that are super

  • popular for web-based and business applications.

  • And then there's also a format that's even simpler

  • but gives us all of the same capabilities called SQLite.

  • Whereas MySQL and Postgres and Oracle and Microsoft Access and the like

  • typically require that you run some special program, a server, that's

  • listening for requests and responding to requests

  • and often has usernames and passwords, if you want something

  • simpler because you're making a fairly small scale website that might only

  • have hundreds or thousands or tens of thousands of users,

  • and you're OK with all of your data living on the same server

  • that your code lives on, maybe in the same folder,

  • you can use something called SQLite, which allows us to use the language

  • SQL that we're about to see,

  • but it doesn't require the complexity of configuring

  • a whole server or a whole set of tools.

  • You can just store your data right there in a .db or .sqlite or whatever file

  • right in your same directory.

  • But how are we going to store that data?

  • Well, many of you are probably familiar with tools like this-- Google

  • Spreadsheets or Excel or Numbers, which allow you to store data

  • in generally rows and columns.

  • And if you're looking at a spreadsheet like this

  • and you're storing people's names and dorms and email addresses and phone

  • numbers, generally we humans use the top row for headers,

  • and we'll put quote unquote name, quote unquote dorm, quote unquote email

  • or phone or whatever.

  • And then every row below that first row represents in this case a student,

  • a record in our spreadsheet.

  • So if I were to do this in reality-- let me go ahead and do just that.

  • Name and dorm and maybe email and phone, and any number of other fields.

  • And I'm going to go ahead and really be tidy here.

  • And you can do these silly aesthetics in something

  • like Google Spreadsheets and Excel and Numbers.

  • But here's where I would do something like David and Matthews and Rob

  • and Thayer, and dot dot dot, and we can fill in the rest of these rows,

  • and we can just keep growing and growing and growing.

  • And frankly, most of these programs are smart enough

  • that even though I think Google Spreadsheets now

  • goes to a default of 1,000, when you export that file or save it,

  • they're only going to save rows presumably that actually

  • have data up until that point.

  • And then you can have any number of columns.

  • You get A through I think Z probably by default. So 26 columns

  • even if you need fewer, but you can create more as well.

  • So this is nice.

  • This itself is structured data.

  • You have meta data, like name, dorm, email, phone, keys, if you will.

  • And we've used that expression before in various CS contexts.

  • And then you have values, and those values line up with those keys.

  • And in fact, now if you think back to some

  • of the features we've seen already in Python,

  • a dictionary is a very popular and very useful data structure.

  • In Python, a list or an array is another such one.

  • And so really, if you think about what a spreadsheet is, it's kind of really

  • just a list of rows, each of which is a dictionary.

  • And what do I mean by that?

  • Well, within every row, whether it's two or three or 1,000, you have columns.

  • And those columns have keys or names to them.

  • Name, dorm, email, phone.

  • And recall that a dict or dictionary in Python

  • is just a collection of key value pairs.

  • So if this is just a list of values, and each of those values

  • is organized horizontally essentially as dictionaries, key value pairs,

  • where row three's name is Rob, row three's dorm is Thayer, and so forth.

  • Well it seems that we could map this kind of format

  • very cleanly to something like Python.

  • Or C for that matter, but Python would be a lot more convenient.

  • And sometimes, one sheet is not enough.

  • So sometimes you might have gone down here to the bottom corner.

  • You might have created something like sheet two.

  • And maybe here, instead of just storing students, you might store professor.

  • So the professor's name and office and course

  • and any number of other pieces of data that you

  • want to store about him or her.

  • And we could store professors.

  • And I'll even rename this from sheet two to Professors.

  • And in sheet one, I'm going to go here and rename this to Students.

  • And so we can really kind of organize our data cleanly.

  • But that's about it.

  • Even though Google Spreadsheets and Excel

  • do have some programming functionality built

  • in with macros or similar functions, it's

  • not all that easy to query the data.

  • It's not all that easy to integrate this kind of file

  • into a program you're writing.

  • And it's definitely not necessarily efficient.

  • Because you've not really told Google Spreadsheets anything

  • about how much data you're going to be putting in here,

  • what the type of that data is beyond maybe the formatting thereof.

  • And so there are some opportunities for better design.

  • And thus came along SQL databases that give us

  • not only the ability to store things in rows and columns,

  • but also the ability to create data or create structures in memory,

  • and insert data, select data, and update data, and delete data.

  • In fact, a SQL database is really a specific instance

  • of what's generally known as a relational database, a database that

  • allows you to maintain relations among pieces of data,

  • generally spanning sheets-- or as we're going to start calling them,

  • tables-- that adheres to a silly acronym,

  • CRUD, the ability to Create, Read, Update and Delete data.

  • And those verbs map to such keywords or commands as these

  • as we're about to see.

  • So what does this actually mean?

  • Well, let me go ahead and do this.

  • Let me go ahead and within CS50 IDE, I'm going

  • to open up a special web-based program that's

  • simply going to give me the ability to create databases and create fields

  • therein.

  • I'm going to go ahead and create a database called Lecture.db.

  • And I'm going to go ahead and open now a web-based tool via which

  • I can administer this database called Lecture.

  • And you're going to see that I have a tab called Structure,

  • which is where I'm going to be able to define sheets,

  • or tables as we'll call them.

  • I can execute raw SQL commands, with which we'll now start to get familiar.

  • And later on, I can even export or import data as well.

  • But let's focus on these fields down here.

  • So this tool is one of any number of dozens

  • of tools that might exist that allow you to create and administer databases

  • using SQL.

  • I'm going to go ahead here and call my table Registrants.

  • So this is like creating a new sheet in Google Spreadsheets.

  • And the number fields-- for now, I'm just

  • going to go ahead and have two, Name and Dorm.

  • And now notice I get a little HTML form here, whose purpose in life

  • is to make things a little easier for me.

  • So I'm going to type in Name for one field, Dorm for another.

  • And you'll notice that SQL databases now allow

  • me to choose any number of data types.

  • So we're sort of going back and forth here

  • in between the world of strongly typed languages, weakly typed or loosely

  • typed languages.

  • Here it can actually matter for performance,

  • and I'm going to indeed tell my database what kind of data

  • I'm generally trying to store in its columns and each of its rows.

  • So my options with SQLite, which is one incarnation of the SQL language,

  • is Integer-- which means what it says-- Real--

  • which is like a real or floating point number

  • as we've seen that generally can store up to 15 or so digits of precision.

  • But even then, if it can't, it's going to use a text field automatically

  • instead, because text is really just a string, otherwise known

  • in some environments as var char or variable length characters.

  • Blob is just going to be binary data.

  • And this means we can store things that are just zeros and ones that

  • aren't ASCII or Unicode text.

  • Numeric is a little more flexible.

  • If I want to store a dollar amount with dollars and cents.

  • If I want to store maybe an integer, maybe a real number,

  • I can use numeric, and just let the database-- SQLite in this case-- figure

  • out what actual data type or affinity, so to speak, to use.

  • Booleans is just going to be a zero or one.

  • But that's really just a convenience.

  • It's actually still going to use a full byte or 8 bits just to store a zero

  • or to store a one, really just an integer.

  • And then date/time, which underneath the hood can use any number of formats,

  • whether it's text or integers or reals.

  • I'll defer to the documentation.

  • But that's going to allow me to in a standard way

  • store things like dates and times so that we can actually

  • record when stuff is happening, when someone

  • registered for freshman intramural sports,

  • and even do useful arithmetic operations on dates and times.

  • So what should Name be?

  • Given all that, there aren't really too many options.

  • Indeed, my instincts are to go with text.

  • So I'm going to do that for Name and for Dorm.

  • And then I'm being prompted by this web form for a few different questions.

  • So primary key.

  • It turns out that among the features of a SQL database

  • is the ability to specify that this field-- this column

  • shall be my primary key, the field whose values uniquely identify

  • all of the rows in my database.

  • Now, it's not going to be relevant just yet for the following reason.

  • Right now, it's quite possible that two Davids live in Matthews,

  • or two Robs live in Thayer.

  • And so if I said that Name or Dorm were my primary key,

  • that would mean I can have only one David or one Rob or one Matthews

  • or one Thayer in my database.

  • So for now, we're going to leave this alone.

  • That would seem to be a bad thing to do.

  • Auto increment isn't going to be applicable, because you can't auto

  • incremented a string or a text field.

  • That's going to be germane to integers.

  • And not null, which is my problem that this field cannot or shall not be null.

  • So I will go ahead and do this and just promise that every student in Frosh IMs

  • is going to have a name and a dorm.

  • Meanwhile, you can specify a default value.

  • I'm not going to bother doing that here, but you can see from the drop down

  • that you can specify a certain value, like null,

  • or you can specify the current date or time, if that's actually germane.

  • But I don't want to give people generic names.

  • And I don't want to just assume that people live in some default dorm.

  • So I'm going to leave that alone and expect

  • that the user of this database table is always

  • going to give me a name and a dorm.

  • So let me go ahead and click Create.

  • And what you'll see is this.

  • phpLiteAdmin is really just a handy web-based tool

  • for executing SQL statements.

  • So I don't necessarily have to remember all of the syntax up front.

  • I can kind of learn from this actual tool.

  • And what phpLiteAdmin has really done for me is this.

  • It has in that file called Lecture.db executed this statement,

  • Create Table quote unquote registrants, Name is text not null,

  • Dorm is text not null, and that's it.

  • So phpLiteAdmin is just saving me, at least for today's purposes,

  • from having to remember exactly all that syntax.

  • But after practice, it becomes pretty familiar.

  • And certainly with Google, you can fill in any blanks.

  • But what does this now mean?

  • If I click on Return here and go back to not just

  • the lecture where I was, but I click now on Registrants, my actual table,

  • you'll see a bunch more tabs.

  • The top one of which is Browse-- but this table

  • is empty, because I've not inserted me or Rob or anyone else for that matter.

  • But if I click on Structure, you can now see in a web-based environment

  • a little reminder as to what this table looks like underneath the hood.

  • It's more esoteric looking than something like Google Spreadsheets,

  • where you just have columns and rows.

  • Here we're being more precise.

  • I have two fields, Name and Dorm.

  • Each is of type text.

  • Neither can be null.

  • But neither has a default value, and neither is a primary key.

  • But via the Edit link here and Delete, I can actually change those definitions,

  • and I can even add more fields if I think of things later.

  • And so this is allowing me to create really

  • a table of information whereby these are my columns or fields Name and Dorm.

  • So let's go ahead and insert for instance me in here.

  • So I'm going to go ahead and insert David, who is in from Matthews,

  • and click Insert.

  • And notice what phpLiteAdmin did.

  • It simply executed this SQL command.

  • Insert into Registrants, Name,Dorm.

  • So it's specifying what the two fields are that I want to insert into.

  • And what values do I want to put?

  • David,Matthews, not to be confused with Dave Matthews.

  • So now, if I click Return and go back to the Browse tab,

  • notice that we see Davids and Matthews.

  • Meanwhile, if I click Insert Here, and I type something like Rob and Thayer,

  • click Insert, a very similar SQL statement was executed.

  • But this one customized for Rob.

  • And now Return and see Browse, both of us are in there.

  • But now let me start to take those training wheels off.

  • It's not all that interesting just to click around in a web GUI.

  • Let me actually learn something from this.

  • And no, let me go ahead and type my own SQL now.

  • Insert into Registrants.

  • And if I really want to be proper, I can quote it.

  • But so long as you don't have any special characters,

  • quoting symbols is not strictly necessary in this case.

  • What fields do I want to insert into?

  • Name and Dorm.

  • And what values do I want to insert?

  • Raw values.

  • Textual values that are not special keywords or field names.

  • I do need to quote with single quotes here in SQLite.

  • Let's go ahead and put [INAUDIBLE] and Currier for Dorm or House there.

  • I can put a semicolon.

  • It's OK if I omit it, but if I want to execute multiple SQL

  • statements at once, I will need the semicolon again.

  • Let me go ahead and click Go.

  • And nicely enough, notice what happened.

  • One row was affected.

  • It took only 0.01 seconds, so it's pretty darn fast.

  • This is just a reminder of what it is I executed.

  • And if I go back now to Browse, I can actually see three rows in the table.

  • All right, so now things are getting a little more interesting.

  • What if I want to search for things?

  • I'm not even going to use this training wheel here.

  • I could use the Search tab and type in some names or some dorms

  • that I want to search on.

  • But that's not the goal here.

  • The goal here is not to teach phpLiteAdmin or this web-based tool.

  • It's just to use pretty quickly this kind of web-based environment

  • to create the schema for our database, the design of the database,

  • our choices of tables and columns and perhaps some of the initial data,

  • and then use that database in actual code.

  • So that's where we're going.

  • We're going to move away from this web-based environment

  • and use actual Python code, but first let's

  • see some of those other instructions.

  • Turns out, per this little summary here, that we have the ability

  • to create-- which we already did using Create Table-- to instert--

  • which I just did.

  • But what about select, update, and delete?

  • Let's not use the web GUI per se.

  • Let's just go ahead and start typing some raw SQL.

  • I'm going to go ahead and do exactly this.

  • Select let's say star from Registrants.

  • And now as an aside, stylistically it's not strictly

  • necessary to write select or from or all of the SQL key words in all caps.

  • Tends to make code I think a little more readable.

  • But I could still just say select star from Registrants.

  • But I find it nice to distinguish what are built in keywords from like what

  • are my field names and table names.

  • So select star-- star meaning everything.

  • Let's click Go.

  • And now what I see here is this web-based representation

  • of two columns with three rows that have come back.

  • We're going to start calling this a result set.

  • A result set containing three rows have come back.

  • And that's going to map to code pretty nicely.

  • Now, let's go ahead and do this.

  • Suppose that Zamyla moves to a different dormitory.

  • And let's go ahead and type this-- update registrants,

  • set Dorm equal to let's say Grays where Name equals quote unquote Zamyla.

  • And this is what's particularly nice about SQL.

  • Even though the syntax is a bit new and some of the key words

  • are certainly new, you kind of just say what you mean.

  • So update the Registrants table, setting the Dorm field or column

  • equal to Grays.

  • Not for everyone, only where the name field or column equals Zamyla.

  • So let me go ahead and click Go here.

  • All right, one row is affected up here.

  • And notice, a reminder of what I just executed.

  • Now if I click on Browse just to see what the data looks like,

  • indeed Zamyla has relocated to a different dorm or building altogether.

  • If I screwed up, though, notice what could quickly happen.

  • Suppose I did this, Update, Registrants, set Dorm equal to Grays,

  • and I left off the so-called predicate, the

  • Where clause that I did a moment ago and click Go.

  • Now three rows were affected as indicated here.

  • And if I click Browse now, notice that oops, all three of us

  • now live in Grays.

  • So I can fix this manually if I really wanted to

  • without executing another SQL query.

  • But this is just meant to be a user-friendly way

  • of quickly and dirtily editing or creating some data.

  • Really, we're going to start using those SQL commands only.

  • What about Delete?

  • Suppose that Rob has graduated, and he's moving out.

  • Let me go ahead then and do Delete from registrants, where Name equals Rob.

  • Go ahead and click Go.

  • One row effected.

  • And if I go and click Browse now, only Zamyla and I remain.

  • All right, so those are just a few such keywords.

  • But what's nice is that again, it's pretty expressive.

  • So to recap here, how did I create the table?

  • I literally said Create Table Registrants.

  • Though I didn't type this at the time, but this was just the

  • SQL that that web tool generated for me.

  • Spoiler alert, don't look at that just yet.

  • We're going to come back to that in a moment.

  • And I specified that the Name field is going to be of type text

  • and Dorm is going to be of type text as well.

  • Down here I then inserted something like David from Matthews,

  • specifying name and dorm.

  • And notice, these lines wrapped onto two lines,

  • but in general, this could just be a really long query as well.

  • Select Star from Registrants, Update registrants,

  • Delete from registrants where-- I get another little spoiler.

  • And it's a spoiler because there is actually

  • a flaw in my database table's design.

  • My Registrants table was not so well thought through.

  • I indeed only care about storing names and dorms, at least right now.

  • Maybe later, emails and phone numbers.

  • But what could go wrong?

  • When I deleted Rob, something could have gone wrong a moment ago.

  • When I updated Zamyla, something could have gone wrong.

  • What if there are two Zamylas at Harvard?

  • What if there are two Robs at Harvard?

  • Well, each of those queries didn't seem to specify which Rob or which Zamyla.

  • It just said where the name is Rob or the name is Zamyla.

  • So how do we distinguish the two?

  • Well, maybe I should have forced undergrads or students

  • to register with their full names, Zamyla Chan.

  • OK, so that would decrease the probability

  • that we're going to have multiples Zamyla Chans at Harvard.

  • But still there could be that probability.

  • Same with Rob Bowden, or really any name, whether common or uncommon.

  • So that doesn't feel very robust.

  • We could look for a Zamyla that specifically lived in Korea,

  • but maybe on the off chance there's two of them

  • there, that's going to create problems.

  • And we might update or delete more rows than we intend.

  • So this is why we humans in the US for instance

  • have unique identifiers, for better or for worse.

  • Things like Social Security numbers.

  • Or here at Harvard we have Harvard University IDs-- HUID numbers-- or at

  • Yale MIT IDs.

  • Or where you are in the world, odds are you have in your wallet or at home

  • or somewhere in your life, a unique identifier.

  • And you might not even know what those identifiers are.

  • In fact, anytime you've registered for an account

  • on some website like Facebook or Google or any number of others,

  • underneath the hood, even if those companies aren't telling you

  • what your unique ID is, they know you by more than just your name and dorm

  • or email address or phone number.

  • Odds are, they have assigned you some unique but very simple

  • piece of data, like an integer, so that Zamyla might be user number three.

  • Rob might be two.

  • David might be one.

  • Simply because that's the order in reverse in which I inserted them

  • into the table.

  • So let me go ahead actually and fix this.

  • Let me go back into my database.

  • And you know what?

  • I'm going to be pretty bold here.

  • I'm going to go ahead and in my table drop the whole thing.

  • So beware ever executing the drop command,

  • because it will literally drop all the data in your table.

  • But I'm going to confirm that, I'm going to start over as follows.

  • I'm going to create a new table called Registrants,

  • this time with three fields.

  • And in addition this time to Name which will

  • be text and Dorm which will be text, this time I'm going to think ahead.

  • And I'm going to give everyone a unique ID.

  • Convention is to call it literally id in all lower case.

  • And indeed, field name should generally not have spaces, not

  • have funky punctuation.

  • Whether you use so-called camel case with alternating capital letters

  • or snake case with underscores, it's generally up to you.

  • But using underscores tends to be the convention here,

  • although it's not yet to remain.

  • But I'm going to make this id field an integer,

  • and I'm also going to make it my primary key.

  • So that there shall be no two registrants in this Frosh IMs program

  • that have the same primary key, the same number uniquely identifying them.

  • And you know, a better yet-- and this is one

  • of the features you get for free with many relational databases--

  • auto increment.

  • I the programmer don't care what number Zamyla is.

  • I don't care what number David or Rob is.

  • I just want them to have a number, whatever it is.

  • But I don't want them to be the same.

  • So let me just let my database auto increment this ID.

  • Give the first user one, the second user two, the third user three,

  • and so forth.

  • And by nature, this cannot be null.

  • They need to be uniquely identified.

  • And default value here doesn't matter, because the database

  • is going to take care of that.

  • Let me go ahead now and click Create.

  • And now you'll see the create table that I just

  • spoiled a moment ago on the slide.

  • This now creates a table again called Registrants,

  • because I dropped or deleted the old one.

  • But notice how much detail we have specified what id is.

  • It is not only an integer, it is also a primary key that supports auto

  • increment and it cannot be null.

  • Meanwhile, Name, Dorm, I don't particularly care this time.

  • Though I could specify like before, I don't want those to be null either.

  • So again, that's just going to be a design decision.

  • But let me see how this affects things.

  • Now let me go into MySQL tab, where I can just execute some raw SQL.

  • And soon we're going to be executing SQL from within Python code.

  • Let me go ahead and insert into Registrants a name and a dorm.

  • And notice, I don't care about specifying the ID.

  • That's not interesting to me.

  • All I care about is the values like David and Matthews.

  • And if I now click Go, notice that's all that the database executes.

  • It doesn't go in and secretly change my query,

  • but it does notice that I've omitted explicit mention of that id field.

  • And if I click Browse, now notice, David for Matthews has an id of one.

  • Let's go ahead and do two other inserts, again via raw SQL.

  • Let me go ahead here and do insert into Registrants.

  • And you'll notice there's a nice little history in this web tool of all

  • the previous queries I've executed.

  • So in case you want to click and save yourself some typing, you can do that.

  • But for now I'm just going to go ahead and insert Rob as before from Thayer.

  • And for good measure at the same time, Registrants,

  • Name Dorm values Zamyla from Currier.

  • And notice the semicolon, so that hopefully both of these

  • should go through as separate queries.

  • Indeed they did.

  • One row affected.

  • One row affected.

  • And if I click Browse, now Rob is number two, Zamyla is number three.

  • And this is just going to continue on until we maybe run out of integers.

  • But SQLite will support as many as 8 byte integers, or 64-bit values.

  • So that's not going to happen for quite some time.

  • So now notice what I'm protected against.

  • There might be a whole bunch of Robs at Harvard it's fair to say.

  • But now if I do a Delete from Registrants

  • where name equals Rob and id equals two--

  • and I don't have to quote the two, because it is indeed a number--

  • this is actually more information than I need.

  • Indeed, because id is a primary key, I don't need to mention Rob's name.

  • If I already know his id, I can just say Delete from Registrants

  • where id equals two, click Go.

  • One row affected.

  • And goodbye to Rob.

  • And what's nice now about SQLite is suppose I do one other insert.

  • Let me go ahead and Insert into Registrants a name and a dorm

  • with values of Jason and Kirkland.

  • And click Go.

  • Is Jason going to now be the new number two, or is he going to be number four?

  • Let's take a look.

  • If we click Browse here, he's indeed four.

  • So one of the features you get by using the auto increment field

  • is that the database is going to ensure that just in case

  • you have remnants of Rob somewhere else, or really

  • his id number maybe in some other table or in your logs or some piece of data

  • you care about, we're going to make sure that Jason has not only a unique id

  • now, but a unique id in perpetuity.

  • If you don't use unique id-- or rather auto increments,

  • SQLite will actually try to reuse numbers just

  • for efficiency, which can actually break things down the road later.

  • All right, so things are getting interesting.

  • We have the ability now to insert data, delete data, update data,

  • and then select it.

  • What more can we actually do?

  • Well, it turns out that SQL tends to come with not only various data types,

  • but also various functions.

  • It turns out to save ourselves trouble, there are functions like date functions

  • and times and date find times.

  • So for instance, I want to execute a query like select star

  • from registrants.

  • But no, I don't want to select all the registrants.

  • Maybe I want to select all of the registrants this year.

  • I could say something like, select star from Registrants

  • where date of a particular field is greater than or equal to January 1st,

  • 2016 or 2017, or whatever the year happens to be.

  • So these are functions that can do that kind of thinking for me.

  • But SQL also supports a number of even more compelling features.

  • We saw a moment ago primary key, which again specifies

  • that this one and only field shall uniquely identify all of the rows

  • in my table, I guarantee it.

  • There's other types of indexes you can use.

  • So this is the special type of index.

  • And essentially what you're getting is underneath the hood the database,

  • whether it's SQLite or MySQL or Postgres or Oracle or whatever,

  • is generally leveraging some of that week five, week four, week six material

  • from CS50 in the fancier data structures, especially the tree data

  • structures.

  • And if we tell the database in advance, hey, this field's

  • going to be my primary key, or this field

  • is going to be an index more generally, whoever implemented the database

  • software itself probably built into it the ability to store that field using

  • a fancy tree or some other data structure that ultimately

  • is going to expedite selecting and updating

  • and inserting and deleting data.

  • And indeed, we can say exactly that.

  • If you know in advance that you're going to be searching on a certain field all

  • the time-- not just the id field.

  • Maybe your table has an email field or maybe a zip code field or some other

  • field that users might want to search on, rather than just store it

  • in a column by defining a field in your database,

  • and then really just letting the database use linear search to find

  • everyone in 02138 or 90210 or whatever zip code,

  • you can actually tell your database in advance, index this zip code field,

  • index this email field.

  • Because I know I'm going to be searching on that specifically using

  • that special Where keyword in SQL.

  • And let the database give you better performance than something

  • like linear search alone.

  • Meanwhile, you can also specify, I'm not going to just search on this field.

  • I don't care just about it's performance.

  • You know what?

  • I can promise you right now this field shall be unique.

  • It might be an integer like a primary key.

  • But if there's some other field in a database that

  • in advance is going to be unique by design,

  • you can tell the database as much.

  • And the database too will optimize queries involving that field as well.

  • So among the fields we discussed today-- name, dorm, phone, email, zip code,

  • you can probably think of bunches of others--

  • which of those might be candidate to be unique

  • and to be indexed as unique by your database?

  • Name already feels like a bad idea.

  • We want to be able to have multiple Zamylas, multiple Robs, multiple

  • of any name in the world.

  • Probably want multiple people to live in the same dorm,

  • so that there can't be just one person from Currier, Matthews, or Thayer,

  • wherever.

  • What about email?

  • Email's kind of an interesting candidate, right?

  • Assuming you're OK with banning users-- maybe a couple of people in your life

  • who happen to use the same email address for whatever reason.

  • So long as you're comfortable assuming that every human in the world

  • is going to have or must have his or her own email address,

  • you could proactively say to your database,

  • OK, the email field or the email column shall be unique.

  • I don't want a user with the same email address to accidentally register twice,

  • let alone three times or more.

  • And even if two people think they have the same email address maybe because

  • of a typo, I don't want him or her to be able to register

  • if someone with that address has already been registered.

  • Phone number maybe works OK in the world of mobiles,

  • but if people still have landline phones,

  • you probably don't want to enforce uniqueness there.

  • But what else might you want to impose?

  • Not null.

  • If you want to ensure that everyone in your database has a name,

  • there's nothing stopping them inherently from typing like ASDF or whatever

  • into your database.

  • You might have to protect against bogus data.

  • But you can at least ensure that the user

  • has to give you some kind of value.

  • And then foreign key.

  • And we'll come back to this, but there's a way

  • of specifying that the data in one spreadsheet or in our world now,

  • one table, is somehow related to or identical to the data

  • in another spreadsheet or again a table.

  • But let's come back to some of these features in a moment.

  • What else do we get from SQL?

  • We also get the ability to join tables together.

  • Now what does this mean?

  • Well, let's go ahead and let's go back to the spreadsheet for just a moment.

  • And let's not worry about students and professors anymore, but let's go ahead

  • and think of this as, for instance, users.

  • I'm going to rename my quick and dirty spreadsheet here Users.

  • And what do I want every user in my world to have?

  • Well, I want everyone to have some unique ID,

  • and I'll let my database ultimately assign that.

  • I want everyone to have a name.

  • Maybe let's say a mailing address, a phone number, an email address,

  • and there could be bunches of other fields as well.

  • What else might someone have?

  • That's probably enough for a customer database.

  • And let's go ahead now and consider what these values might look like.

  • So my very first customer, who buys my first widget or whatever,

  • might be someone like Alice.

  • And she lives at 1 Oxford Street, Cambridge, Mass, 02138, USA.

  • Let's go ahead and make this column wider just so we can see it.

  • Her phone number shall be 617-495-5000, and her email address

  • shall be alice@example.com.

  • Let me make this field a little wider as well.

  • Then Bob comes along, and he buys something too.

  • He happens to live down the street at 33 Oxford Street, Cambridge, Mass

  • 82138, USA.

  • And his number is 617-495-9000.

  • And he is just bob@example.com.

  • Then there's someone new altogether, Charlie.

  • He lives at 51 Prospect Street in New Haven, Connecticut,

  • and his zip code is 06511, USA.

  • And his number will be-- he doesn't want to cooperate,

  • so he's just going to give us some bogus number there.

  • But he'll be also charlie@example.com.

  • And this table now-- if you think of the spreadsheet

  • really as just a database table of users,

  • could be dozens or hundreds or even thousands of rows long.

  • So let's now consider what data types I should

  • be using if I want to migrate this spreadsheet into an actual SQL

  • database.

  • So the id field, which frankly you usually get sort of for free

  • in a spreadsheet program, because it just numbers all the rows for you.

  • But if you resort your data, those numbers

  • do not followed the original rows.

  • And so giving our data, ultimately we'll see their own numbers as good.

  • Because those should remain with the data.

  • Alice should always now be one, Bob two, and Charlie three.

  • So name.

  • Let me go ahead and consider what should the fields here be?

  • Well, just for the sake of discussion, let me go ahead

  • and annotate right on the screen here.

  • It stands to reason that this should be an integer.

  • Name should probably be text.

  • Address should probably be text.

  • Phone, I mean it could be an integer.

  • It's definitely not a real.

  • Numeric is not right, because it could then become real somehow.

  • But is integer right?

  • I'm a little worried here.

  • Like especially if it's an international customer,

  • I'm kind of biased at the moment toward US users.

  • But suppose someone typed in their actual calling code or country code

  • and they did something like 011 or 001 for the US and then 617.

  • If I called this field an integer, and then let

  • the user type in their actual number like this, if it's an integer,

  • those leading zeros mean nothing mathematically.

  • So my database is probably going to throw them away.

  • And I might not want that to happen.

  • We get lucky in that this still could work as a phone number,

  • but I really shouldn't be throwing away user's data.

  • So you know what, I'm going to go ahead and just call this text.

  • Whatever the user gives me, I shall store from him or her.

  • And then finally email can be text.

  • And now let's consider how we might index these columns.

  • We'll come back to id in a moment, because SQL's

  • going to give us that automatically in our database

  • we'll see by using auto increment again.

  • But it will be an integer.

  • Name, I could-- I don't want to make it unique,

  • because I want to have multiple Alices and multiple Bobs.

  • I could make it an index.

  • If for whatever reason the website or the application I'm making

  • wants to make it easy for users to search on names,

  • I could proactively say, OK, database index the name column,

  • because it will speed up my searches to be something better than linear search.

  • Address, same thing.

  • If I want to use auto complete or some kind of search feature.

  • So I can search over address.

  • Maybe I want to do that.

  • Phone number, maybe that could be unique, if I'm expecting only mobiles.

  • But that could be risky.

  • Email is the only one I might claim a unique constraint is pretty reasonable,

  • especially if your website-- as is commonly

  • the case these days-- is going to expect users to log in,

  • certainly not with their id.

  • And I see no mention of username.

  • If they're going to use their email address as their login,

  • then you better only allow any email address

  • to be registered once and only once.

  • But you know what I'm not liking?

  • And let me make this column even wider.

  • What's kind of dumb about this design at the moment?

  • You know, before we even get ahead of ourselves

  • and start moving this into our SQL database, what's dumb here?

  • I feel like some redundancy, right?

  • Like 1 Oxford Street, that's special and unique.

  • 33 Oxford Street, that's special and unique,

  • even though there could be multiple people living or working there.

  • 51 Prospect Street, same thing there.

  • But like Cambridge, Mass 02138 USA.

  • Cambridge, Mass 02138 USA.

  • Like why for every resident of Cambridge,

  • Massachusetts am I storing Cambridge, Mass 08138 USA.

  • Cambridge, Mass 02138 USA.

  • This would seem redundant.

  • I mean it's not likely that Cambridge is going to change its name.

  • I mean, once upon a time it was Newtown, now it's Cambridge.

  • But that's not likely to happen again anytime soon.

  • So it's not so much anticipating change, but just look at all these darn bytes

  • that I'm storing redundantly.

  • Which of those bytes though should hopefully

  • be sufficient to identify where someone lives or works?

  • At least in the US, we tend to use our postal codes pretty readily.

  • So you know what?

  • Let me go ahead and create a field called zip code.

  • Why don't I just store 02138 here.

  • And oh, stupid Google Spreadsheets, notice what it did.

  • It is assuming, because something looks like an integer, that it is an integer.

  • So this is a stupid artifact of using a spreadsheet program.

  • Let me change that to plaintext and retype it.

  • Now Google will respect my input.

  • But the point is all the more clear, we had better in our SQL database

  • call that text and not an integer.

  • And now you know what?

  • I can make my address really just a street address.

  • And I can get rid of this, and oh, I shouldn't get rid of this.

  • But I should in this column do 06511, and then here, get rid of all of that.

  • Damn Google again.

  • Let's go ahead and change that to plaintext.

  • 05611, and change this to 02138.

  • So it's a little cleaner now.

  • There's a little bit of duplication.

  • I now have 02138 twice in my database-- or in my spreadsheet in this case.

  • But at least it's a lot, lot, lot less redundant.

  • But I need to recover that information.

  • This is just my Users spreadsheet-- my Users table.

  • You know what?

  • I'm going to go ahead and create another sheet with Google Spreadsheets here.

  • I'm going to go ahead and call this Zip Code.

  • I'm going to call this City, State, Country.

  • Although Country makes things a little more complicated,

  • because zip codes aren't going to be the same format everywhere.

  • But for now, 02138 is going to be-- dammit.

  • Let's change this whole column this time to plain text.

  • 02137, Cambridge, Massachusetts, USA.

  • And now for New Haven, let's do the same thing.

  • Paste that, New Haven, Connecticut, USA.

  • And now rather than just call the Sheet Three,

  • let me more intelligently call this like Zip Codes.

  • And you know what?

  • I'm not loving how many bytes I'm using to store Zip Code.

  • Why don't I learn from lessons past and give this spreadsheet

  • or this column or rather this table its own ID,

  • and just arbitrarily assume that someone will number those for me.

  • So now I can whittle this down to just numbers where

  • this should be an integer now.

  • And in my Zip Codes table, let me make room for just these little annotations.

  • This shall be an integer.

  • Zip code we've already learned had better be text.

  • This should be text.

  • This should be text.

  • This should be text as well.

  • All right, so I have taken a very simple idea, spreadsheet

  • with all of my customers, and I seem to have really over-engineered it.

  • I've made something simple more complex.

  • But why?

  • I mean, it's not compelling when I have just three customers.

  • It took me more time to describe what I was

  • doing than just do it the original way, albeit with the redundancy.

  • But if you do start to have tens of rows or hundreds of rows or thousands

  • or tens of thousands, having all of that duplicated

  • data just doesn't make sense.

  • You're throwing away disk space.

  • You're throwing away potential performance and running time.

  • You're throwing away money, if you have to buy more disk space for your server.

  • And so what I've done here is what we would call normalizing my database.

  • Factoring out the commonalities that can be uniquely identified more simply,

  • with this case a zip code, or even more simply,

  • via some arbitrary but some consistent number that I impose.

  • So let's go ahead now and port this over to my database.

  • I'm going to go ahead and use phpLiteAdmin, just because it

  • makes things nice and easy here.

  • I'm going to go ahead and drop my Registrants

  • table, because that story is over.

  • And now let me go ahead and create a couple of tables.

  • The first of which I'm going to go ahead and call my-- what did I call it?

  • Users.

  • And number of fields, I'm going to need one, two, three, four, five,

  • six fields.

  • So let's go ahead and do that.

  • So six fields, and those are going to be id, name, address, and what did we say?

  • Zip code, and phone, and email.

  • And this shall be an integer.

  • This shall be text.

  • This shall be text.

  • This shall be text.

  • This shall be text.

  • This shall be text.

  • But id shall be a primary key.

  • I want it to auto increment.

  • And you know what?

  • I don't want names to be null.

  • I'm OK with addresses being null if they don't

  • want to cooperate or zip code being null if they don't want to cooperate.

  • Phone, email should not be null.

  • So I'm going to go ahead now and click Create.

  • This is the query-- somewhat longer now-- that was executed.

  • And now let me go ahead and create another table

  • by clicking on the name of my database again, creating a new table called

  • Zip Codes.

  • Number of fields here is going to be one, two, three, four, five.

  • So five fields, one of which is id, one of which is the zip code.

  • Then city, then state, then country.

  • Text, text, text, text, integer.

  • This shall be primary key.

  • This also shall be auto incremented.

  • Zip code cannot be null.

  • And everything else I'm OK with being null.

  • So I can also not spell country.

  • Create.

  • And so now, let's toss in some data.

  • Let me go ahead and insert really quickly, I'm going to use the web form,

  • zip code of 02138, Cambridge, Massachusetts, USA.

  • Insert.

  • Let me go ahead and insert New Haven as well.

  • So 06511.

  • 05611, yep.

  • 05611.

  • New Haven, Connecticut, USA.

  • And now let me just give myself a few customers.

  • Let me go in to my Users table, insert David-- or no, who did we have?

  • We'll copy and paste our actual users.

  • Alice, so we had Alice from 1 Oxford Street in zip code now.

  • But this zip code-- oh, interesting.

  • I goofed.

  • I went a little too fast.

  • So let me abort.

  • Let me go to Structure.

  • Zip code I said was text, but you know what?

  • I'm going to change that to integer.

  • All right, and now let me go over to the Insert tab again.

  • Let's put Alice from 1 Oxford Street at location 1,

  • with her phone number, which was for 495-5000.

  • And you can do this.

  • Even though this looks even more like an integer,

  • a common mistake I think in a database is

  • to just blindly throw in whatever the user typed in,

  • even if he or she used parentheses or pluses or dashes

  • or any number of other punctuation symbols.

  • There's no reason for us to store all of that.

  • In fact, if we want to pretty things up and throw away

  • some of the inconsistencies, I could just

  • use code-- Python, ultimately-- throw away all punctuation symbols,

  • and then just store 10 characters in the case of a US number,

  • or whatever pattern I care about.

  • Or I could proactively with Python even pretty this

  • up by just putting the dashes where I want them to be,

  • not where the user necessarily put them.

  • And then Alice was alice@example.com.

  • And now let me really quickly go ahead and insert Bob, who was from 33

  • Oxford Street.

  • He's also in zip code 1 now.

  • His phone number was 617-495-9000.

  • And I'll go ahead and pretend like I'm formatting this in code.

  • And then he was bob@example.com.

  • And then lastly, let's go ahead and insert Charlie

  • from 51 Prospect Street in New Haven, whose id I now know to be 2.

  • 617-000.

  • He was the one that didn't cooperate.

  • And charlie@example.com.

  • All right, so what does this now mean?

  • In my database called Lecture, I now have two tables--

  • Users and Zip Codes, each of which has some number of fields inside of them.

  • But this is not all that useful to me anymore,

  • because now if I go ahead and select my users with a query like this,

  • select star from users, go, I get back this.

  • But what the heck is zip code 1 or zip code 2?

  • Now OK, I can figure this out.

  • If I know that I'm looking for Alice, and I see that she's in zip code 1,

  • I could do another query.

  • All right, let me go back to that SQL tab.

  • Let me go ahead and do select star from Zip Codes where id equals 1.

  • Because I saw that Alice's zip code was uniquely identified by 1.

  • Oh, all right.

  • So Alice lives in zip code 1.

  • Yeah, that's 02138 Cambridge, Mass, USA.

  • OK, got that.

  • But now I've used two SQL queries.

  • The data is still like-- some of it's over here, some of it's over here.

  • Now I've got to somehow combine it in code.

  • And you can do that, but SQL is much more powerful than that.

  • It has other keywords like this, where I can have

  • the database do the thinking for me.

  • Let me do this.

  • Select star from users, join Zip Codes on users.zipCode equals zipCodes.id.

  • Take a moment to think, and ignore the red squigglies.

  • That just my browser thinking I'm being grammatically bad, verbally too.

  • Select star from users, join zip code.

  • So I'm telling the database, go ahead and join these two tables.

  • How?

  • Well if you think of one table as this hand and one table

  • as this hand, what's nice is that each of them

  • has a field that's inside of the other.

  • This table might have the id field.

  • This table has the zip code field.

  • Wouldn't it be nice if I could somehow stitch those together,

  • lining up those zip codes, whereby in this table, Users--

  • I called it zip code-- and then in my Zip Codes table I called it id.

  • Let's see what happens.

  • Let me go ahead and click Go.

  • And amazingly, look at what I get back.

  • My result set this time contains everything.

  • I get Alice, Bob, and Charlie, each of whom lives at these addresses.

  • But look at their zip codes now.

  • They've been filled in with the actual values,

  • plus the cities and the states and the country.

  • So this is where you really now start to scratch

  • the surface of the capabilities of something

  • like SQL, because it can actually combine data in this way.

  • And this was a pretty simple query.

  • But now I can sort of exercise good design.

  • I can keep my data very cleanly structured and normalized, whereby

  • I factored out all of the redundancies.

  • And yet I can programmatically reconstruct that data

  • and ensure that I can get back everything I care about.

  • Moreover, I can further optimize things.

  • If I go into Users for instance and I go under Structure,

  • notice down here there's another field.

  • Create index on one column or more.

  • Let me go ahead and create that index on my Users table.

  • I'm going to call it Arbitrarily Email, and I'm

  • going to say duplicate values are not allowed.

  • I'm not going to bother with a where clause here,

  • but I'm going to put this constraint-- this unique index on my email field,

  • and now click Create.

  • And notice what got executed.

  • Earlier we used it to create a table.

  • Here I'm using it to create a unique index called Emait--

  • though I could've called it whatever-- on the Users table

  • using specifically this field.

  • So this now is my way of ensuring that the following can't happen.

  • Notice that in my Users table I've got Alice with alice@example.com.

  • suppose that someone else named Alice, also with that same email address--

  • or who thinks her email address is the same,

  • because of whatever typographical error-- comes along,

  • and I try to register them for my site as follows.

  • Insert into Users.

  • Let's see, we have name, what are the other fields now?

  • Name, address, zip code, phone, and email.

  • The values as follows Alice number two, so her name's not distinct.

  • Her address is going to be 1 Main Street, so her address is not the same.

  • Zip code, she'll go ahead and live in New Haven, for instance, so 2.

  • Phone we don't need to worry about, so she'll be another fake phone

  • number for today's purposes.

  • But suppose that she also thinks her address is alice@example.com

  • Or heck, maybe this is the same Alice who

  • forgot she has an account on our website-- maybe more likely--

  • and is trying to reregister with the same address.

  • What's going to happen?

  • If I didn't make any typos here, let's click Go.

  • Interesting.

  • The database stopped me from doing this.

  • Error, unique constraint failed, users.email.

  • And this is common syntax in the SQL world.

  • tableName.fieldName.

  • So insert into Users, whatever I just typed

  • failed, because the unique constraint.

  • And indeed, we can see this.

  • Let me go to Browse and look.

  • There is no second Alice.

  • So we could do this in code, as we'll soon

  • see even more in the world of Python.

  • You could check by writing your Python code to see, wait a minute,

  • someone's trying to register as alice@example.com.

  • Let me quick select all my users from my database, look for Alice,

  • and if I already have Alice at example.com,

  • I'll just say, no, you can't register.

  • Much like I said no you can't register for Frash IMs

  • if you don't have a name or a dorm inputed.

  • But my database can do that for me.

  • And this is a nice wall between me and my database,

  • a nice wall between the software developers

  • and even the database administrators so that you

  • don't have to worry about data accidentally

  • getting into your website or your database that shouldn't actually

  • be there.

  • All right, we have all of these building blocks now.

  • We have Select and Insert and Delete and Update and now Join.

  • What more is there?

  • Well, there are some more keywords.

  • But let's first use the ones we have.

  • But this time, not just to play around via the web-based tool.

  • Let me actually show you one other thing.

  • So let me go in to CS50 IDE and take a look at my source 9 directory,

  • and you'll see these four folders, two of which

  • you're about to look at in a moment.

  • And you'll also see lecture.db, a file that I've

  • been using to store all of my SQLite data, including those tables.

  • Well, it turns out that we can actually use a command line client in order

  • to see the same data.

  • Let me go ahead and do that by typing SQLite 3 for version 3.

  • And then let me go ahead and type in the name of this file, lecture.db Enter.

  • And now you'll see a very simple and perhaps a little cryptic command line

  • interface.

  • But thankfully if you type as it says .help,

  • you'll see everything that you can possibly do with this command line

  • program.

  • But we're going to go ahead and keep it simple.

  • I'm going to go ahead and just say .tables,

  • which is SQLite's way of saying show me what tables are inside of this file

  • called lecture.db.

  • Now here I have users.

  • Let me go ahead and see what is the schema of these tables.

  • And you'll see the SQL commands with which the Users table and the Zip Code

  • tables were created.

  • But more interesting than that is this.

  • At the command line, this is just a SQL client.

  • I've been using the web-based GUI, phpLiteAdmin for the past few examples.

  • But we can just do this at the command line too.

  • Let me go ahead and select star from users semicolon,

  • and now you see in purely textual form the exact same data.

  • Here's the ids, my users name addresses, zip codes, phone, emails.

  • I can see the same thing from my zip code fields.

  • Let me go ahead and do select star from Zip Codes.

  • And there we see the same data.

  • So using the command line too, you don't have to even use phpLiteAdmin.

  • It just tends to be a little more user friendly.

  • You can also see the same data in the same file.

  • And now that I've done this purely manually,

  • let's now transition to doing this in Python code.

  • But first, let me make mention of one detail in the tables

  • that we had created.

  • Back here in users, you'll recall that we

  • had the structure of having an id and a name

  • and the address, zip code, phone, and email.

  • What you'll actually find is that the world has generally

  • standardized how you define what are called foreign keys in these tables.

  • So specifically, which of the fields in this table

  • are kind of foreign to this table?

  • That one of zip code.

  • Originally, that zip code was a text field, 02138,

  • and any number of other zip codes as well.

  • But then I changed it to an integer, because it really in this table

  • is a foreign key.

  • Because those same numbers 1, 2, and onward

  • are primary keys in that other table called Zip Codes.

  • And so what would actually be more typical here

  • is that we would often name this not Zip Code, which is a little ambiguous.

  • But just as a human convention, let me propose

  • that we clean up this design a little bit

  • and call this Zip Code ID to make super clear that this is indeed an ID.

  • It's not this table's ID, it's an ID from the Zip Codes table.

  • Plural would be the convention, and singular would be the convention here.

  • And indeed, we can now use the same kind of statements in code.

  • But it turns out with one of our concluding examples today,

  • we'll see while it's advantageous to adhere to certain conventions,

  • and frankly even this might even be typically higher in the table as an id

  • and not just buried there in the middle.

  • But that's a more minor detail.

  • All right, now let's return to CS50 IDE and build upon those previous Frosh IMs

  • example.

  • But now weave in this new feature that we have,

  • which is that ability to store data and retrieve data from a database.

  • So recall that this was Frosh IMs 1, which

  • was an improvement upon Frosh IM 0 in that at least Frosh IMs 1 actually

  • kind of sort of registered students.

  • It used registrants.csv, and it wrote out the data to a simple text file.

  • But with text files, CSV files, even though you could double

  • click them and open them again in Numbers, Excel, and port

  • them into Google Spreadsheets, we don't have the same expressive capabilities

  • as we do with a language like SQL to select, insert, update, delete.

  • They're just text files.

  • With SQLite though, a binary file, and the SQLite 3 program,

  • and we'll soon see Python code, we can actually

  • execute more sophisticated queries than the CSV format allowed.

  • So how are we going to do this?

  • Let me propose now in Frosh IMs 2, we create a new application.py

  • that's going to behave as follows.

  • Let's go ahead and propose that I'm going

  • to import as before some of the Flask functionality.

  • So from Flask import capital Flask, which is the class that we're using

  • is the application itself.

  • Render template, so I can render HTML.

  • Redirect, so that if I want to redirect the user,

  • I can do this by an HTTP location redirect.

  • Request, so that I can actually get at form data.

  • And we'll see URL4, which you might recall from past problems.

  • But you know what?

  • I'm also going to import from CS50 our SQL wrapper.

  • So inside of the CS50 library recall for Python

  • is getInt and getString and getChar and getFloat.

  • But inside there also is a whole library called SQL.

  • So the CS50 SQL library that doesn't do all that much,

  • but it does offer us an execute function that's

  • going to allow us more simply than might otherwise

  • be possible to execute SQL statements inside

  • of our Python code against Lecture.db on FroshIMs.db

  • or whatever the file may be.

  • And I'm going to do that as follows.

  • First I'm going to instantiate my Flask application, as always.

  • And now I'm going to instantiate a database connection essentially

  • as follows.

  • SQL quote unquote SQLite colon, slash, slash, slash.

  • So notice the third slash in this case.

  • FroshIMs2.db.

  • So this will be Frosh IMs version 2.

  • And that's it.

  • So I'm telling the CS50 library to use SQLite and to open up ultimately

  • the Frosh IMs 2 database.

  • Now I'm going to go ahead and have a simple route as before for just slash.

  • I'm going to define a function called index as before,

  • and this one's super simple.

  • Just go ahead and return render template of index.html.

  • And now let's reimplement register as follows.

  • So app, route, slash register.

  • But in this case, you know what?

  • I want to support post.

  • I don't want students' information ending up in the URL

  • and like the computer lab's history or the roommate's browser or whatever.

  • So let's go ahead and specify to Flasks that you know what?

  • Only support this list of methods, specifically just post.

  • Let me go ahead now and define a method called Register.

  • That's going to map to this route.

  • And as before, if request.form quote unquote name equals equals nothing,

  • or request.form quote unquote dorm equals equals nothing,

  • then go ahead and return failure, and return render template of quote unquote

  • failure.html.

  • So the website is almost exactly the same

  • now, except we have preemptively added this database capability to our code

  • that we're now going to use.

  • Otherwise, if we're not in failure, let me

  • go ahead and execute the following SQL.

  • Inserts into registrants a name and a dorm with the values of-- we'll

  • come back to this in a moment.

  • Colon name colon dorm.

  • I could call these anything, but a convention

  • in a lot of SQL libraries-- a lot of code,

  • whether it's Python or PHP or Ruby or other languages--

  • is to have these placeholder values with a colon and then a word that's probably

  • identical to the field name, but much like in C,

  • percent s has been placeholders, much like in Python open curly

  • brace close curly brace has been placeholders.

  • We've got one more placeholder convention, which is colon, and then

  • the name of a symbol.

  • But for the most part, they can almost always

  • be identical to what the field names are.

  • And now, I need to plug these values in.

  • So I'm going to go ahead and plug in for the first name equals request.form

  • name.

  • And dorm is going to equal request.form dorm.

  • So again, this is a Python thing meets SQL thing.

  • In our SQL library, as implemented by the CS50 execute function,

  • we have this SQL statement.

  • Insert into Registrants name, dorm, values, colon name colon dorm.

  • CS50's execute method, like many libraries out there,

  • will recognize any words that you have written in a SQL statement that

  • begin with a colon, and will then proceed

  • to substitute in any of the named parameters you provide thereafter.

  • So I'm saying, hey, library, give plugin a name of request.form name.

  • So whatever came from the user's HTTP request.

  • And for dorm, plug in whatever came in the dorm field as well.

  • So those were from the HTML forms that the user submitted in order

  • to register for this sport.

  • After they've done that, let's go ahead and do this.

  • Return, render, template, success.

  • And this time I'm going to change the success message to actually be,

  • yes, you really did indeed register, or the spirit thereof.

  • So let me do this.

  • Let me go into this Frosh IMs 2 directory.

  • And as you might have done for recent problems,

  • let me go ahead and run Flask run.

  • And we have since tweaked the configuration of the IDE

  • so that you no longer have to specify a host of 0.0.0.0 or port of 8080.

  • Those will just be assumed by default. You can override them if need be.

  • Now, I believe my application is running.

  • So let me go over to my web server.

  • And indeed, this is the form we saw last week.

  • Super simple, super ugly, super HTML.

  • Looks like HTML1, actually HTML5.

  • But let me go ahead now and prepare a database for this.

  • In another terminal window, I'm going to go ahead.

  • And in my source 9 directory as well, open up FroshIMs2.db.

  • Which if I open it up in phpLiteAdmin already looks like this, Frosh IMs 2.

  • But there's no table in there yet.

  • So I care only for now about names and dorms.

  • But I've learned my lesson.

  • I'm going to care about IDs too.

  • I'm going to go ahead and create registrants with three fields,

  • much like we conjectured earlier.

  • An ID, a name, and a dorm, first of which

  • shall be an integer, primary key, auto increment.

  • Next of which will be text, dorm of which will be text.

  • I don't want any of those to be null.

  • They don't need to be auto incremented or primary keys.

  • So let me go ahead and create.

  • If I return now to my Browse Structure, the table is empty.

  • Now let's go to the web-based application.

  • Here we go.

  • Let me go ahead and have a new name altogether.

  • Maria wants to register.

  • And Maria is from Stoughton.

  • Register.

  • You are registered.

  • Really?

  • So we've not invested heavily in the aesthetics of this website just yet,

  • but really I have been registered.

  • If I go back to phpLiteAdmin, click on Browse now, notice Maria from Stoughton

  • is actually now in my database.

  • Wait a minute, maybe I'm kind of cheating.

  • Maybe she was already there somehow.

  • Let's do this again.

  • Let me go ahead and hit back.

  • Let me go ahead here now and say Andy.

  • And let's pretend Andy is visiting someone in Weld.

  • And click Register.

  • You're registered really.

  • Let's go back to phpLiteAdmin.

  • Click Browse.

  • And now Andy is in there as well.

  • So now we're not just storing things on disk so to speak with CSV files.

  • Now we're actually interacting with the SQL database.

  • And we did so by using CS50's execute method inside of its SQL database

  • library.

  • But notice how relatively simple it was.

  • We simply used the same syntax with which

  • we've been playing around, either the command line or phpLiteAdmin,

  • but now doing it an actual Python code.

  • So now we literally have the ability programmatically to create data,

  • to update data, delete data, or select data, ultimately, from a database.

  • So let's do exactly that.

  • Let's take things up a notch and do even more than we did last time.

  • You know what?

  • Let me go ahead now and create another route for slash registrants.

  • Suppose I want to make available a web page that shows

  • me everyone who has in fact registered.

  • So this is a feature we haven't even had yet, though we could have.

  • At least with CSVs, we would have had to open up the CSV and iterate over it.

  • Now we'll do that even more simply with the registrants method here

  • that has rows gets db.execute, select star from registrants.

  • So I know from our earlier experimentation,

  • select star means get everything.

  • From registrants means from that table.

  • db.execute is just the CS50 method that's going to execute the SQL

  • and return to you-- what?

  • I didn't care about a return value earlier, although technically, I

  • could have gotten back a value, as you'll see in the documentation.

  • But selecting star from registrants.

  • What do I want back?

  • Well earlier, I proposed that a database table, like a spreadsheet,

  • really is just a list of dictionaries.

  • And indeed, that's exactly what the CS50 execute method gives you back.

  • It will return to you if you've used a select a Python list each

  • of whose elements is a Python dict object, which

  • means you have access, which each of those

  • rows to the field name or column name, the so-called key in a dictionary,

  • and the value, the cell in that table.

  • So now that I have these rows, what can I actually do with them?

  • Well, I'm going to go ahead and render a new template, registrants.html,

  • and I'm going to pass in as you might have

  • for a past problem all of my registrants by passing in all of these rows.

  • So it turns out, templates can be parametrized such

  • that I don't just have to spit out some hard-coded registrants.html file.

  • I can pass in a key of registrants or call whatever I want,

  • and the value that I just got back from the database.

  • So that I'm now handing to my templating language, Ginga all of these rows.

  • So that suggests that in my template, my so-called view more generally in MVC,

  • my view can iterate over those rows and spit out every one of my registrants.

  • Let's go ahead and do this.

  • Let me go ahead into registrants.html, which will be a new file.

  • So let me create a new file here called registrants.html.

  • And let me go ahead and make sure that as are others

  • extends layout.html so that it looks just like everything else.

  • And then let me go ahead and just give it a block title up here,

  • so that its 2 is consistent.

  • This will be called registrants.

  • And now I do nblock.

  • So again, this is the Ginga templating language.

  • This is not a Python per se, not SQL per se.

  • It's just really for rendering a viewer or the aesthetics of my site's.

  • Block body.

  • And now in here is going to go nblock.

  • So the question is, how in this new template file

  • do I spit out like a list of registered students?

  • Well, list.

  • I know from some HTML back from week six,

  • I can give myself an unordered list.

  • So just a bulleted list like that.

  • And now I know I can spit out list items to put values

  • next to those bulleted lists.

  • But I don't know how many list items to output yet.

  • But wait a minute, passed into this template

  • was a key called registrants whose value is rows,

  • the list I got back from my database.

  • So you know what?

  • It turns out that in Ginga, you can execute

  • essentially Python code that looks like this for registrant and registrants.

  • And then down here, let me do N 4.

  • So slightly new syntax here.

  • So no colon here we're doing 4, and the opposite of it I N4.

  • I can now do list item.

  • And next to that list item, you know what?

  • I'm going to do it and we've seen this syntax before.

  • Registrant.name from registrant.dorm.

  • OK, this looks very strange so what's going on?

  • First line just means we're inheriting from layout.html.

  • So the whole page is going to just plug in values into that template, namely

  • a title and a body.

  • Title's uninteresting.

  • It's just going to be registrants.

  • Body gets interesting, but notice it's not hard-coded HTML anymore.

  • We've got an open UL tag and a close UL.

  • So this means hey browser, here comes the start of unordered,

  • the end of an unordered list.

  • But notice that my template or more generally the view in my software

  • is now going to use a Python-like loop here.

  • But again, no colon.

  • You have these special tags with the curly braces and the percent signs.

  • And four is the opposite and now notice what

  • I want to do inside of this loop I want to output literally

  • open bracket LI close bracket.

  • And then eventually, open bracket slash LI closed bracket.

  • And then dynamically, on each iteration of this loop,

  • I want to output a registrant's name and a registrant's dorm

  • and just grammatically say from in-between, so it's David from Matthews

  • and some Zamyla from Currior and Rob from Thayer and so.

  • And the only thing that's passed in is this thing here.

  • Remember the named arguments, the named parameters that I

  • passed into my render template method.

  • It was registrants equals rows.

  • This is really equivalent to iterating over the rows that

  • came back from my database.

  • In fact, if you want to be even more l I don't

  • have to call registrants I could just say rows equals rows.

  • So I'm literally passing in my database rows, and then in my template here,

  • I could do for row in rows row.name, registrant, row.dorm.

  • Just semantically, I thought it would be a little more intuitive if I actually

  • say what these things are and not just generically refer to them

  • as rows from my table.

  • All l so let's see this now.

  • Here that web form.

  • Let me now go to slash registrants, enter.

  • And oh my god, a bulleted list.

  • This may be-- I want to prove that this is working.

  • Let me go back here.

  • Let me go ahead and register Stelios now,

  • who is currently living in Canaday, register.

  • OK, Let me go to slash registrants again.

  • Reload.

  • Stelios from Canaday.

  • Very interesting.

  • But this isn't where we need to stop.

  • Let me add one other feature.

  • Lets get that Stelios out of there.

  • I suppose that it wasn't quite making it on the team

  • here so you know what, we want to have one other route.

  • Let's have an unregisterred function if Stelios wants to bow out after all

  • and focus on something else.

  • So apt out route slash unregister.

  • You know what?

  • This is going to support multiple methods.

  • So by default, it's just get.

  • But I want it to be not just get but get and post.

  • And then in here I'm going to have def unregister as the name of my method.

  • Again, it could be anything.

  • But you should be consistent, I would say.

  • And now I'm going to have two conditions.

  • So this slash unregister page is visited via gets.

  • You just go to this URL l.

  • Then I just want to see a same list of registrants

  • but with some kind of form by which I can delete them from my database,

  • by which they or I can unregister them.

  • Meanwhile, I want to go ahead and if post,

  • I want to acually do the deletion.

  • So let me try this.

  • If request.method equals equals gets, then Rose get's d.b.

  • Execute, select star from registrants.

  • And then I'm going to go ahead and return the templates, render template,

  • unregister.hteml.

  • And pass in those registrants as those rows.

  • Actually, c with.

  • Lith requests that method equals equals post.

  • What I want to do now, I want to do if request.form, quote unquote "ID."

  • So if there is, in fact, an ID passed in-- more on that in a moment--

  • I want to go ahead and execute-- hmm, where is this going?

  • Let me go ahead and see the template first.

  • I think we need to see this.

  • Let me go into froshims2.

  • And I whipped this one up in advance.

  • unregister.html-- let's go ahead and do this.

  • So on registrants.html, I just spit out an unordered list

  • of all the registrants.

  • In unregister.html, I'm going to add a little bit more.

  • Inside of each of those list items, I'm going

  • to have a forms input field-- input element-- whose name is going to be ID.

  • I've just hard-coded that, because this is

  • going to represent the ID of the student I want to unregister.

  • The type of this input is going to be radio.

  • So these are mutually exclusive circles that you can essentially toggle.

  • Value of this is going to be whatever this current registrant's ID is.

  • And then I'm going to go ahead and put this registrant's name

  • and dorm just as before.

  • So we'll see what this looks like in a moment.

  • But notice I've also added two other lines up here.

  • I've specified I'm inside of a form, the URL for which is unregister.

  • So we've seen this before. url_for() is just a function that comes with Flask

  • that helps you dynamically figure out what the actual URL should be

  • for the method called unregister().

  • So that's why, if you adopt some nice naming conventions,

  • you can use tricks like this, and Flask will just figure out to what URL

  • this form should be submitted.

  • And the method I'm going to use is going to be POST.

  • And then down here, notice I have a Submit button.

  • Type equals submit.

  • Value equals unregister.

  • So what does this look like?

  • Well, let's see that.

  • If I go ahead and visit /unregister, I see an ordered list, just like before.

  • And if I played with CSS, I could even get rid of these bullets

  • altogether, each of which has a radio button next to it, one of which

  • I can click in order to delete this user from my database.

  • But how does this work?

  • Let me go ahead in Chrome now, and inspect, and look

  • under elements at one such row.

  • And notice the HTML that Python, and in turn, my Jinja template

  • has dynamically output it for me.

  • Here's my page's HTML.

  • If I expand this list item, I see input name equals ID, and I hard-coded that.

  • Type equals radio, I hard-coded that.

  • Value equals-- this was registrant.id.

  • So in this Jinja for loop where I'm spitting out one registrant at a time,

  • this is where I was dynamically spinning out

  • Maria, and Andi, and now Stelios' ID.

  • So if I expand, the LIs above will see that, ahh, Andi has a value of 2,

  • because her ID is 2.

  • And Maria has an ID of 1, because her ID was 1.

  • Meanwhile, Stelios from Canada is just text.

  • So if I select him and then click this button,

  • we need to be able to handle that.

  • So what do I want to do?

  • Let me go back into CS50 IDE, into application.py.

  • And here we go.

  • If the form was submitted via POST, as by clicking that Submit button,

  • and if there is indeed an ID in the form--

  • so the user actually did select one of those radio buttons,

  • and therefore, there's some actual work to do-- DELETE FROM registrants

  • WHERE ID equals colon ID, where, again, colon ID is just my placeholder.

  • The value I want to plug in for ID is request.form quote unquote "ID."

  • Where did that come from?

  • Again, if we go back to my HTML, notice that these radio buttons are all

  • called "ID."

  • And because they are mutually exclusive by definition of a radio button,

  • only one ID will be submitted if one of these boxes is checked,

  • one of these circles is checked.

  • And that value submitted will be 1, 2, or 3.

  • So to confirm, here's our page with unregister,

  • and all three students are still in there.

  • Here is /registrants.

  • All three students are there.

  • Although this is uneditable, let's go ahead

  • and unregister Stelios by clicking on Register.

  • And it would seem that he's indeed gone from the bulleted list.

  • Let's go ahead and check phpLiteAdmin.

  • Let's go ahead and click Browse.

  • And Stelios is now gone.

  • So this is where everything's finally starting to come together, right?

  • In week 6, we talked about HTTP, and parameters,

  • and how HTML and CSS worked.

  • But it was largely static and hard-coded,

  • and we were just playing around with fake Google

  • and implementing our own front end.

  • But now I'm using forms again, and using them not only

  • to create an interactive UI-- user interface-- for users,

  • I'm also now implementing the back end, the server,

  • the routes that are capable of getting those HTTP parameters'

  • values as with request.form-- or other mechanisms if they come in via GET,

  • a different syntax altogether.

  • I can get those values and then do something with them

  • by combining those values with SQL code.

  • And so the last line here that I actually

  • should have included for good measure-- I simply

  • ran the code that I had pre-written in advance so that I didn't mess up.

  • Let me go ahead and do this.

  • This is the one line that was technically there

  • when I just ran that code even though you didn't see it until just now.

  • After all this, I have decided, just to keep the UI pretty simple,

  • after trying to delete someone, just go ahead and redirect,

  • not to the unregister page again, but to the registrants.

  • And this is why, after deleting Stelios, I immediately

  • saw a new bulleted list with just Marie and Andi,

  • because I redirected the user to the route for registrants, a.k.a.

  • /registrants.

  • So again, this is where everything's coming together.

  • And it's a lot to absorb all at once.

  • Because, my god, we had HTML and CSS.

  • Then we introduced Python.

  • Now we introduced SQL.

  • Then, we have Jinja and the templating language.

  • And now all of this comes together.

  • But again, if you go back in diving into all

  • of this, first principles and the definitions of each of these, HTML,

  • it's just the markup language that lets us lay out and format a web page.

  • HTTP is just the language-- or the protocol,

  • technically-- via which web browsers and servers intercommunicate.

  • Python, of course, is a higher-level language.

  • It's an alternative to C.

  • And it seems to come with a whole bunch of useful functionality

  • that, thanks to frameworks, or micro-frameworks like Flask,

  • make it relatively easy to get real work done with relatively few lines of code.

  • There's a learning curve, to be sure.

  • But you know, this is kind of impressive, that with just a dozen

  • or two lines of code, I've implemented the beginnings

  • of a web-based application by which students

  • can register for sports, unregister for sports,

  • see who's registered for sports.

  • You know, I might just need to add some logins, and a few other features,

  • and definitely, some prettier aesthetics.

  • But that's a lot of functionality packed into just a few lines.

  • And now that we have SQL today and we have a function like db.execute() that

  • allows me to execute SQL inside of my Python code,

  • now we have the ability to store data long-term, to access it, search it.

  • And we're just using small data sets.

  • I could store thousands, tens of thousands of rows

  • and use these same principles, especially

  • for data science applications, analytics, analyzing corpuses of text.

  • So many possible applications now.

  • And you know what, if we will go just one level deeper,

  • it turns out that while having programming chops with SQL,

  • and knowing SELECT, and INSERT, and DELETE, and JOIN, and CREATE,

  • and all of the various keywords we've started to play with today

  • and scratched the surface of-- super useful and super powerful

  • when you want to analyze your own data, or your own company's

  • data, or your own thesis' data, or the like--

  • it turns out that eventually, you even outgrow that level of interest

  • typically.

  • And an additional layer of abstraction is often helpful.

  • And so another feature you get with frameworks

  • like Flask is what are called models-- literally, models.

  • So you recall that we've been talking about,

  • in general, MVC, whereby we have models, and views, and controllers.

  • Well, in this model, we have been interacting

  • with our data via low-level SQL.

  • It's new, for sure, today.

  • But it turns out that once you get comfortable with SQL,

  • and so long as you adopt certain conventions of giving

  • all of your tables an ID field-- and if you have foreign keys,

  • it's something, underscore, ID-- where you generally

  • adhere to certain conventions and adhere to a framework's requirements,

  • it turns out you can do things like this.

  • In froshims3, we have essentially the same files,

  • except that we've changed the application.py

  • to be a little bit different.

  • We're not using any of CS50's package or module now,

  • so we're not using db.execute().

  • And in fact, you don't technically need to use that to access SQL.

  • Our single-function execute just makes it much easier

  • to get back lists of dictionaries as opposed

  • to executing multiple lines of code as you could do with a library called

  • SQLAlchemy or Postgres' own-- or rather, SQLite's own-- driver

  • in the world of Python.

  • But I'm going to add a few lines here using another library called

  • Flask-SQLAlchemy, which was pre-installed, or will

  • be pre-installed, for you in CS50 IDE.

  • There's a few lines that I had to copy and paste earlier,

  • from the documentation, to get it to work right.

  • But notice, this is, perhaps, the familiar line.

  • Instead of froshims2, it's now forshims3.db.

  • But notice, at the end, I get another db object.

  • It's not CS50's.

  • It now belongs to the author-- or it was created

  • by the author-- of this library.

  • But it turns out you can do some pretty cool things as follows.

  • Recall, from last week, that Python supports classes much like C

  • supports structures.

  • And inside of classes can go properties, or pieces of data,

  • as well as methods, or functions.

  • So it turns out that we can define, using Flask

  • and using libraries or frameworks like it, what's called an Object Relational

  • Mapper, or ORM.

  • And this is just a fancy way of saying, if you

  • don't want to think about your data as rows and columns,

  • which we have been all of today, ultimately,

  • whether in spreadsheet form or database form,

  • you'd really like to think of a registrant for a freshman intramural

  • sport as an entity, as an object of some class, well, you can do that.

  • You can declare a class called registrant

  • and have it extend the db.Model.

  • So this is another class that comes with Flask that we are now inheriting from,

  • so to speak.

  • So this is truly now object oriented programming.

  • We are specifying, via __tablename, that the SQL table to which this class

  • should map is going to be called registrants.

  • And ultimately, this class, registrants, when it is instantiated

  • is going to give me an object that represents a row in that table.

  • And this object is going to have an ID, a name, and a dorm,

  • as we've been doing.

  • And notice here, using SQLAlchemy-- so using this library from Python--

  • I am declaring a column called ID that's going to be of type integer.

  • And yes, it's true that it's the primary key, whereas name and dorm are just

  • going to be text.

  • Because what you can do with SQLAlchemy, and with ORMs,

  • more generally, is you can specify, in Python code, or whatever language,

  • what your database looks like and what your data therein

  • looks like without actually writing raw SQL queries so to speak.

  • So even though we've just introduced SQL syntax,

  • you can eventually take off that layer altogether and build

  • on top of it using objects like this.

  • And now, as an aside, this is a constructor or an initialization method

  • that you might recall from past problems.

  • But let's focus, a little later in the code,

  • on why this is actually compelling.

  • If we scroll down later in the code, like, to my register route,

  • there is no SQL in this implementation of froshims3.

  • 3

  • The first few lines are the same.

  • Indeed, only once we get down to this line here

  • do we have registrant as a variable, registrant as the class name.

  • And we're passing in, apparently, the name and the dorm

  • that came from the HTTP request.

  • And we're passing those into the registrant class.

  • If you recall how classes worked, if they have an init() method,

  • you can pass in some default values, name and dorm in this case.

  • And that's how we are creating a registrant object of type registrant.

  • And now notice what we can do here, db.sessios.add().

  • So this adds to my database sessions, so to speak,

  • another feature you get from this particular library.

  • We're going to add that registrant.

  • And then wonderfully, we're going to commit that registrant.

  • In other words, we have created a variable in memory-- specifically,

  • called registrant-- who's type, who's data type, is Registrant, capital R.

  • And that's simply a class, inside of which is name and dorm.

  • There's no ID yet.

  • But what's really cool about this ORM is that when you call add and then commit,

  • that's like putting it in a database and then hitting Save.

  • And the database, because of the code we find up here,

  • is going to automatically insert that ID for us.

  • And meanwhile, this object now, registrant, is actually going to have,

  • inside of it, the ID that was stored in the database.

  • So I don't have to worry about insertions.

  • I don't have to worry about any updates or deletes.

  • I can interact with my data now, completely at a higher level, in Python

  • alone and leave it to the ORM-- SQLAlchemy

  • here-- to actually do the creation of the SQL statements.

  • And if you look later in here, if you'd like to play around,

  • you'll see that we've rewritten registrants and unregister as well

  • to use SQLAlchemy as opposed to raw SQL queries.

  • And for instance, here is how you can get all the registrants

  • in your database if using an ORM.

  • Instead of doing SELECT* FROM registrants, you can just say,

  • hey registrant class, give me a query for all of my data.

  • And what you get back is a whole bunch of rows, which, as before,

  • we can pass into our template.

  • Down here, meanwhile, we can request all of those rows

  • again and pass them into that template.

  • Or here, notice what we can do.

  • If we've been passed the ID of a registrant

  • like Stelios', we can say, hey registrant class, give me a query,

  • but filter that query so that the registrant ID I care about equals

  • the one I was passed via HTTP.

  • Oh, and by the way, once you find that in my database,

  • call the dot delete method to just get rid of Stelios from the database.

  • So again, we might not necessarily be solving a problem

  • or scratching an itch just yet, especially since SQL itself

  • is, odds are, quite new to you.

  • And so we've already solved the problem in one way.

  • Here's another way to solve it.

  • But realize that, eventually, it's fair to say that you find

  • writing SQL queries sometimes tedious.

  • Though frankly, you'll get a lot more control, and potentially,

  • more performance out of them if writing them yourself.

  • And so having the best of both worlds is perhaps the best takeaway

  • here-- actually understanding what's going on underneath the hood,

  • as was the entire point of our spending so much time in C,

  • and understanding how you can execute SQL queries,

  • but realizing, down the road, especially if you find that, wow, it's really

  • getting a little slow to write all these low level SQL queries,

  • wouldn't it be nice to just create my database schema

  • as I did earlier with phpLiteAdmin or at the command line

  • and then let my library code figure out how

  • to get data in and out for me, albeit perhaps at a performance

  • penalty, that's a nice place to get to.

  • So again, even now that we're in week 9, and we've

  • abstracted so far away from week 0s and 1s,

  • is there still this progression and this onward march of abstraction

  • as the world gets more and more familiar with solving problems and starts

  • to realize best designs for doing so?

  • But it's not all fine, and good, and safe.

  • In fact, let's make note of perhaps one of the most tragically common mistakes

  • people make when using SQL.

  • And indeed, one of the reasons to use things

  • like libraries like CS50's library or even higher-level, fancier

  • libraries like SQLAlchemy, is to avoid these kinds of threats.

  • And yet many people and many sites still suffer

  • from what are called SQL injection attacks, for instance.

  • So what does this mean?

  • Well, probably, a few times a week, you log in with your Yale NetID

  • or with your HarvardKey, which gives you forms like this or like this.

  • And ultimately, you're providing.

  • simply, a username and a password.

  • But suppose that, for the sake of discussion,

  • the HarvardKey system were implemented on the back end with Python using SQL.

  • And how, then, do we implement logins?

  • Well, when I give Harvard or Yale my login name or my NetID

  • and then my password, well, what are they doing?

  • They probably have, each of them, a users

  • table if they're using SQL, whether it's a SQLite, or Oracle, or MySQL,

  • or Postgres, or whatever.

  • And they probably have written code somewhere in that login site that says

  • SELECT* FROM users WHERE username equals whatever they typed in AND password

  • equals whatever they typed in.

  • And if that gives you back a row representing David, for instance,

  • then you know that he or she has logged in correctly,

  • because you wouldn't have found the row if the username and password weren't

  • in the database.

  • Now, it turns out fancier things are done with the password.

  • You probably don't want to store users' passwords in plain text,

  • so to speak, borrowing language from week 2.

  • Rather, you want to store ciphertext or some kind of hashed value

  • so that even if, in the worst case, your database is compromised or stolen,

  • no one in the real world actually sees your users' passwords but only

  • some cryptic-looking hashes thereof, which at least raises

  • the bar to exploiting your account.

  • But if they're using SQL-- this is what worries me-- they are, at some point,

  • taking what I, a human, typed in-- hopefully a good guy,

  • but could be a bad guy, typed in-- to their website

  • and plugging it into a SQL query.

  • Because they're not just going to necessarily do SELECT* FROM users.

  • They might actually say, SELECT* FROM users WHERE username equals such

  • and such AND password equals such and such,

  • or at least one of those predicates.

  • So what if you've done things poorly in code?

  • And suppose that Harvard had implemented it in such a way

  • that this simple-looking, stupid-looking query is actually a really big threat?

  • So I've temporarily turned off the bullets

  • that you would normally see in a password form.

  • This is not hard.

  • It's just an HTML thing.

  • And suppose that my email addresses is me@examplemailprovider.com.

  • And suppose, for my password, I don't type in 12345,

  • or whatever my actual password is.

  • I type in, cryptically, ' OR ''1='1.

  • Why this?

  • And there's an infinite number of things I

  • could type if I am aggressively trying to hack into Harvard or Yale's website.

  • But notice this feels like it's part of a logical query.

  • It turns out SQL has OR.

  • We haven't seen that before, but like Python, it literally

  • has the keyword "or."

  • And I am assuming, in this case, per some of the examples,

  • that maybe the programmer at Harvard or Yale

  • has single quotes in his or her code.

  • And maybe they are just foolishly, and very riskily, plugging in what I type

  • in between those quotes.

  • So notice, this is kind of the end of a quote.

  • This is the beginning of a quote, but I've not finished the thought there.

  • So let's see what happens.

  • Suppose that the code on the back end at Harvard or Yale--

  • we're in Python-- this.

  • So somewhere in their files, username variable gets request.form username.

  • Password gets request.form password.

  • So just two variables called username and password,

  • just so I have them handy.

  • Maybe they're using CS50's library function.

  • db.execute( SELECT* FROM users-- and this happens to wrap on two lines,

  • but ignore the extra space-- WHERE username equals {} AND password equals

  • {}).

  • So just like we did last week when printing things in a formatted fashion,

  • this was like Python's equivalent of print diff

  • using the format method of the string class passing in username and password.

  • But this worries me.

  • Because if the programmer at Harvard or Yale has literally given me these

  • placeholders of '{}', whatever I typed in as my username and my password is

  • literally going to go there and there.

  • But what if-- oop, oop, oop, oop, typo.

  • The whole story breaks without this.

  • And there-- but what if the user types in that cryptic-looking string?

  • Well, then, what ends up happening is this.

  • And it's red because red is bad.

  • SELECT* star FROM users WHERE username name equals quote, unquote,

  • me@examplemailprovider.com-- no big deal-- AND password equals--

  • this is interesting.

  • Previously, there was a single quote and a single quote,

  • and then the curly braces in between.

  • But underlined here is what I, the adversary, typed in.

  • Cryptically, 'OR'1'=1', that's it.

  • Notice where the underlining starts and ends.

  • But I seem to have maliciously finished that programmer's

  • thought at Harvard or Yale.

  • I've not finished it in the way they intended.

  • But this is syntactically and semantically correct, if a bit strange.

  • I am essentially saying, if the password equals nothing or 1 equals 1,

  • because I've plugged in those characters in such a way

  • that they still make a WHERE clause syntactically accurate.

  • 1 equals 1 always.

  • So this is like saying, SELECT* FROM users WHERE username equals

  • me@examplemailprovider and whatever else is the case.

  • Like, that is always true, that 1 equals 1.

  • So it doesn't even matter what the user's password is

  • or that I didn't even type it in.

  • I just did quote, unquote.

  • All right, well, what if maybe-- OK, so format may be bad.

  • Curly braces, like we've been doing in Python for string formatting,

  • maybe that's bad.

  • Python also has, like Java and JavaScript,

  • the ability to concatenate things together, in this case,

  • using the plus operator.

  • So what if I just go old school and concatenate my strings together

  • like this?

  • Maybe that's better, noticing single quote here, single quote here.

  • And the double quotes are just stopping the string

  • while we do this concatenation there and there.

  • It's the exact same problem.

  • So those instincts don't serve us well at all.

  • In the end, we still get, username equals

  • whatever I typed in AND password equals this WHERE 1, indeed, again, equals 1.

  • So no matter what, this query is going to return a row if that email

  • address is in there, probably resulting in the Harvard or Yale code logging

  • this malicious user in.

  • So there's got to be a better way.

  • And indeed, the reason to use libraries in general,

  • whether it's CS50's for the next couple of weeks,

  • or in the real world, any of dozens of SQL libraries,

  • is that other people before you have thought through these threats,

  • have written the requisite code, which isn't all that much,

  • to notice when there's dangerous queries being injected

  • and escape them properly.

  • So here's how we would do this with the CS50 db.execute() method.

  • SELECT* FROM users WHERE username equals colon username AND password equals

  • colon password-- sorry for the type there.

  • I'll fix that.

  • Bad with spacing today-- AND password equals

  • colon password, unquote, passing in username, passing in password.

  • And again, you don't repeat the colons here.

  • But the key and the key in those named parameters lines up with the key

  • and the key that do have the colons.

  • And the way the CS50 library works-- and there's not much

  • going on in the CS50 library.

  • There's not much of a training wheel there--

  • we essentially are simply wrapping that other library I referred to,

  • SQLAlchemy, which gives you not only the ORM feature, the Object Relational

  • Mapper feature, where you can create your own classes like registrant,

  • they also let you, in that library, execute raw SQL.

  • And they also take care of all of this escaping.

  • So we, the CS50 library, are really just taking your queries,

  • passing them to the SQLAlchemy library via raw SQL, getting back the results,

  • and just neatening them up, and returning to you only

  • a list with dictionaries inside without expecting

  • you to execute multiple lines of code.

  • And now, in green, is the expected solution here.

  • Because of the way the CS50 library works, and in turn, the SQLAlchemy

  • library works, even if an adversary types in funky syntax

  • like those single quotes trying to trick your database or your Python code

  • into executing something malicious, notice what the library has done.

  • These backslashes were not in the user's adversarial input.

  • They were not in the previous red problematic examples.

  • What the CS50 execute() method does for you,

  • what the SQLAlchemy library does for you is looks at user input that's been

  • plugged in for named parameters and says, whoa-ho, wait a minute.

  • If there's a single quote in there, let me escape it with a backslash

  • so that the only actual single quotes are the outermost ones.

  • And only if the user's password is that thing there that's underlined will

  • they actually get in.

  • And most likely, it's not going to be something as crazy as that.

  • But in light of all this, do you perhaps now appreciate

  • what this particular person, perhaps with a little too much free time,

  • was trying to do?

  • So parked in a parking lot here is this fellow's plate here.

  • And this person had taken the time to print out something a little curious.

  • Let's enhance.

  • What the heck is going on there?

  • Well, it turns out that while we've been focusing on the logic of 1 equaling 1,

  • the real takeaway of SQL injection attacks

  • is that if you can somehow trick a database into executing a line of code

  • that you have written-- previously, the only line was, quote, unquote,

  • 1 equals 1 or what not.

  • But suppose that that adversary-- suppose

  • I had included a semi-colon in my username

  • or a semi-colon in my password.

  • And you, the programmer, naively trusted what I was typing in.

  • And you simply executed whatever I typed into my username or my password field,

  • allowing me to have a semi-colon in there.

  • And that semi-colon, as you know, ends one SQL statement and begins a new one.

  • Suppose, god forbid, like this person here tried

  • to finish his or her license plate in a SQL-like way,

  • with a quote here, and commas, whatever those mean, and a semi-colon,

  • but then also included a valid SQL command like DROP DATABASE TABLE, which

  • is the only thing I cautioned about, earlier,

  • being especially bad-- this person was apparently trying to use those traffic

  • cameras, which, he or she surmised, might

  • have been using SQL as the back end and storing people's license

  • plates in that back end and, correctly or incorrectly,

  • was hoping that, upon a camera seeing this,

  • using Optical Character Recognition, OCR, converting this into text,

  • passing that text into a database that might not be scrubbing

  • or sanitizing its inputs as we've proposed

  • with CS50's library or SQLAlchemy-- was hoping-- that maybe that back end

  • database was poorly implemented enough to trust

  • what was passed in so that after logging this person for speeding

  • or whatever, actually dropped the entire database, covering

  • his or her tracks entirely.

  • So all this and more is ahead of us as we continue to build, and build,

  • and build on top of lessons past.

  • And next week, when we introduce one final language, JavaScript, a language

  • that you can not only use on the server side

  • but also, and especially, on the client side

  • to create all the more of an interactive experience

  • and all the more of a compelling user experience

  • for users using a bit of Python, and SQL, and HTML, and CSS,

  • and soon, now, JavaScript, we'll see you then.

  • [MUSIC PLAYING]

  • SPEAKER 1: Rosebud-- yeah, actually, now that you mention it, one time, I went

  • into his office to look for some forms.

  • [KNOCKING ON DOOR]

  • SPEAKER 2: Bud?

  • David always said "pal."

[MUSIC PLAYING]

字幕與單字

單字即點即查 點擊單字可以查詢單字解釋

B1 中級 美國腔

2016年CS50--第9周--SQL (CS50 2016 - Week 9 - SQL)

  • 14 6
    小克 發佈於 2021 年 01 月 14 日
影片單字