字幕列表 影片播放 列印英文字幕 [MUSIC PLAYING] SPEAKER: All right. This is CS50, and this is week nine. You'll recall that last time, we took a look back at such things as Scratch. Because recall that when we introduced Scratch, we introduced a whole number of programming constructs-- loops and conditions and functions and variables and more. And then just one week later did we transition from that world of Scratch to C, where the syntax was much more cryptic looking, certainly at first glance, and perhaps a little bit still. But the ideas were ultimately the same. And now in week nine as we transition from the world of C to Python, you'll find that that same finding is the case, whereby we are using the same ideas. We're leveraging the same concepts. But we have to translate it now to a slightly different domain and a slightly different syntax. But things really start to get interesting this week and beyond, especially as we build on our ability to write Python code, our ability to serve up web applications. Because now we can begin to leverage so much more functionality than comes with C, than comes with the CS50 library, because there's such a large community of software developers who have created some really amazing things that we can try out. So let's look further today at Python, and let's look further at the applications for a language like this. You'll recall that perhaps the simplest program we could have written last time was actually just one line-- print Hello World. But you'll recall as you began to dive into some of this past week's challenges, you might have needed or wanted to actually start wrapping code like that in a method or function like main, and then calling it by default with this magical incantation here. But this was the building block toward which we were starting to develop more interesting programs. But now we're going to really context switch from a command line environment over to a web-based environment. And the world has been writing web-based applications for quite some time. Even I 20 years ago recall made that Frosh IMs-- freshman intramural sports website. And even since then have the languages changed and the paradigms changed, and we humans have learned a lot about programming for web-based applications. And this, for instance, is one design pattern or one architecture that has arisen, whereby MVC refers again to this pattern, whereby you put your intelligence, your so-called business logic in your controllers. This is one or more files that has all the conditions, a lot of the functionality calls, and actually does something with your program. Then you have the views, which are often templates or files that render information that you might have dynamically generated or input from users. And then model. And today, we start to dwell on the M in MVC, model, because last week, we didn't really leverage much, if any, of a model. But this week we're finally going to demand of ourselves the ability to save data, retrieve data, search data, delete data, and more. And we really haven't had this capability, other than very simple CSV files, for instance back in the day of C, and even last week when we dabbled with those in Python. And you'll recall, last time we introduced this. Flask is what's called a micro framework. So a bunch of files, a bunch of code that a community of people have written that just make it easier to make web-based applications. It's absolutely not required. In fact, one of the sample programs among last week's distribution code if you like to go back and play was a program called serve.py that doesn't use any of this. It just uses built-in Python functionality. But you'll find that it's pretty cryptic. It's pretty heavyweight in order just to do something simple. And so things like Flask have come around that just make it easy and dare say more pleasurable to write web-based applications, because people have realized in writing web app after web app after web app that they're just repeating themselves, or borrowing code they've written in the past. And so in frameworks, you have solutions typically to very common problems. So we'll use this set of solutions to the development of a web-based application. And for instance, the simplest Flask application that's also available from last week's source code might be something like this, whereby the top of your file, you don't import like the CS50 library, you instead import someone else's library, specifically from the Flask framework import-- what's called the class, recall, called Flask-- capital F. And then also a function or method called Render Template. And we used both of those as follows. Last week, we instantiated a Flask-based application by essentially passing in this special reference to the current file, the name of the current file, and then allowing the framework to do its thing and give us back this very special, very powerful object called app-- though we could have called it anything-- that gives us access to some useful functionality. For instance, the most useful functionality initially was just this-- @app. And this is just syntax for what's called a decorator, and more and that some other time, or more on that in Flask's own documentation. But essentially, this line here-- @app.route, says that hey server, any time you see an HTTP request for slash, the default web page typically of a website, go ahead and call the following function that's immediately below it. That function I've called Index, mostly by convention. But I could've called it anything I want. And all it did last week in this example was render a template. In this case, index.html. Now, that could be raw HTML recall. But oftentimes, you use something called a templating language. And indeed, we introduced a little bit of Ginga last time, which is a Python-based templating language that we'll see just makes it easier to generate HTML without having to write HTML inside of our actual Python code, which tends to be frowned upon. So let's take a look back at one of those examples, which I've renamed from Frosh IMs last week to Frosh IMs zero. And recall that we had the following files. We had a templates directory, inside of which was failure, index, layout, and success. Kind of a lot of complexity for a pretty simple program, but we'll see what each of those does again. And then application. And this was the so-called controller code that actually did something interesting last week. Now, what was that? Well, it's a pretty small program. As before, I've imported from the Flask framework a whole bunch of symbols here. I'm instantiating my application here. This is copy-paste from our simplest of examples, whereby if the user just visits Slash, I want to show him or her index.html. But then it got interesting. This really was the first time we had the capability in a web-based environment to respond dynamically to user's inputs based on whatever they typed into a web form. In fact, if you think back a few weeks in week six when we first introduced the web and HTTP and TCP/IP and making HTML-based web pages, you recall that all we did was implement Google's front end, and an ugly one at that. But just the HTML form, that if you click the Submit button, it submits to Slash search on google.com, because we pretty much deferred completely to them, lacking at the time a backend and lacking at the time even a language in which we could implement our own backend, a web server that actually responds to those requests. But in Frosh IMs Zero, we have the ability to have our own route, in this case called Slash Register that I've specified isn't even going to respond to HTTP Get requests, but rather Post requests, which typically mean a form submission is sending one or more fields that you don't necessarily want to end up in the browser's URL or history. I'm calling my function Register, and this would be a good convention. Just make sure your function here lines up with what the route is there. And then I'm checking a couple of conditions. If that request's form-- that is, all of the parameters that were submitted via HTTP Post-- has a name field-- so a text field, for instance, called Name-- and that equals nothing-- quote unquote, the so-called empty string-- or, that request's form's dorm key has a value of quote unquote, which is to say that if the user, myself last week, did not give my name or my dorm, then go ahead and return the template called Failure.html. And we'll take a look back at that in a moment. Otherwise, render template Success.html if all in fact goes well. Now, if we take a look at Failure.html, it didn't do all that much. It extended Layout.html. It declared a title of Failure. And then it declared a body of You must provide your name and dorm as sort of an admonishment to the user. If I look at Success, meanwhile, it's pretty similar, but the text is different. But this is the problem we address this week. You are registered, well not really. Because recall-- and you could have seen a moment ago-- we did nothing with the user's name or dorm or any other information, we just pretended to actually register them. And then what did the form itself look like? Well, this page here is mostly HTML. But again, notice that even this page at the top defines a body block and a title block, because it's extending Layout.html. So this is the Ginga stuff that I referred to earlier, the templating language. And if we finally go into Layout.html, now you see the basic framework for every page in this web-based application. It's a pretty small application to be fair, but it does have at least three distinct pages-- Index, Failure, Success-- all of which are identical to the file, except that they each have a title and a custom body dynamically embedded thanks to how we're using templates. So this is a nice way of not having to copy and paste all of that code into every file, thereby making it a pain to update anything, to add CSS or JavaScript files, or generally any of the overall structure of the page. We can factor that all out to Layout.html. Of course we just threw all this information away. that name, that dorm, every student who's been registering by this app, we're forgetting about. And so at the tail end of last week, we introduced this solution recall, whereby in Frosh IMs One now, I'll call it-- or rather today let's redress that by borrowing an idea from last week that we didn't incorporate into Frosh IMs as follows. In Application.py for this next version called Frosh IMs One, what do I seem to be doing differently? File's almost identical to before. It's a little longer this route, because what I've grabbed is some of the code from last time, whereby we wrote out to a CSV file-- Comma Separated Value-- those student structures. We did this way back when in C. We then ported the student struct to a student class last week, and now I've borrowed that same code for saving those students to disk, so to speak, by embedding it into this route. So what's going on? So if again the name or the dorm are blank, I go ahead and just return the template called Failure.html and we bail out. Otherwise, I declare a variable called File. I open a la C's fopen function, registrants.csv, which I'm just calling this just because it's going to contain my registrants in a comma separated values file. And then quote unquote a. And you might not recall this or might not have seen this, but quote unquote a is for appending. If I instead and accidentally did w for writing, you might think that's correct. But every time you use quote unquote w, you're going to write out a new file. Which is to say, if 100 students registered for some intramural sport via this web app, and I was using quote unquote w, I would actually keep clobbering or overwriting the file, such that only the most recently registered student would appear in this file. So we want quote unquote a for append. Then I declare a variable called writer, and I'm using this CSV module that we get for free from Python that's going to allow me to create what we'll a writer, passing in that file as input. Then I'm going to call this special method Write Row, whose purpose in life is to take a tuple, which in this case has two elements, the name that was passed in via the form as well as the dorm. And then I'm going to go ahead and close the file. So Write Row is now the function that takes care of all the complexity-- it's not all that much complexity-- of printing out name, comma, dorm, name, comma, dorm. And just in case anything has a quote in it, like a someone O'Leary, or a name that has an apostrophe or some other punctuation symbol that might otherwise confuse a program reading this text file, which itself might contain quotes, this kind of function Write Row will take care of those kind of details, as well as commas that might correctly or incorrectly be in the values like name and dorm that I'm providing. And then we print out the success page. But this isn't all that useful of a file format. A CSV file is nice in that you can download it, you can open it in Excel or Apple Numbers or similar programs. You can import it into Google Spreadsheets. But it's not really a database. You can store data in it, but if you want to read any of that data, or change any of that data, you pretty much have to do what we've been doing in C, which is open it with Open or fopen or whatever, iterate over the lines, maybe parse them or read them into separate variables or into an array or a list or whatever, and then you can use binary search or linear search or whatever you want to actually find data, maybe to then change data, and then save it all back out. But this is just tedious. To have to do all that work simply to save data isn't all that much fun for programming, and it also doesn't scale very well. As soon as you have some good success with some web-based application or even some mobile application, it'd be nice if your code were as efficient and as fast as possible. And wouldn't it be nice if we could stand on the shoulders of others who have had similar problems of storing data efficiently so that we could learn from them as well and leverage some of their work? So thus was born SQL-- Structured Query Language-- which can be used in any number of environments. So SQL is just a language that we will now introduce. And it's a programming language, though it's not going to be as-- we're not going to use it as richly as we have C or Python. We're going to use it for a number of fairly basic, but nonetheless very powerful operations. And you can store data using this world of SQL in any number of ways. You can use things like MySQL-- which is a very popular database server that Facebook started with and still uses for some of its purposes-- PostgreSQL, Microsoft Access, or Oracle, or any number of third party products that you might have heard of that are super popular for web-based and business applications. And then there's also a format that's even simpler but gives us all of the same capabilities called SQLite. Whereas MySQL and Postgres and Oracle and Microsoft Access and the like typically require that you run some special program, a server, that's listening for requests and responding to requests and often has usernames and passwords, if you want something simpler because you're making a fairly small scale website that might only have hundreds or thousands or tens of thousands of users, and you're OK with all of your data living on the same server that your code lives on, maybe in the same folder, you can use something called SQLite, which allows us to use the language SQL that we're about to see, but it doesn't require the complexity of configuring a whole server or a whole set of tools. You can just store your data right there in a .db or .sqlite or whatever file right in your same directory. But how are we going to store that data? Well, many of you are probably familiar with tools like this-- Google Spreadsheets or Excel or Numbers, which allow you to store data in generally rows and columns. And if you're looking at a spreadsheet like this and you're storing people's names and dorms and email addresses and phone numbers, generally we humans use the top row for headers, and we'll put quote unquote name, quote unquote dorm, quote unquote email or phone or whatever. And then every row below that first row represents in this case a student, a record in our spreadsheet. So if I were to do this in reality-- let me go ahead and do just that. Name and dorm and maybe email and phone, and any number of other fields. And I'm going to go ahead and really be tidy here. And you can do these silly aesthetics in something like Google Spreadsheets and Excel and Numbers. But here's where I would do something like David and Matthews and Rob and Thayer, and dot dot dot, and we can fill in the rest of these rows, and we can just keep growing and growing and growing. And frankly, most of these programs are smart enough that even though I think Google Spreadsheets now goes to a default of 1,000, when you export that file or save it, they're only going to save rows presumably that actually have data up until that point. And then you can have any number of columns. You get A through I think Z probably by default. So 26 columns even if you need fewer, but you can create more as well. So this is nice. This itself is structured data. You have meta data, like name, dorm, email, phone, keys, if you will. And we've used that expression before in various CS contexts. And then you have values, and those values line up with those keys. And in fact, now if you think back to some of the features we've seen already in Python, a dictionary is a very popular and very useful data structure. In Python, a list or an array is another such one. And so really, if you think about what a spreadsheet is, it's kind of really just a list of rows, each of which is a dictionary. And what do I mean by that? Well, within every row, whether it's two or three or 1,000, you have columns. And those columns have keys or names to them. Name, dorm, email, phone. And recall that a dict or dictionary in Python is just a collection of key value pairs. So if this is just a list of values, and each of those values is organized horizontally essentially as dictionaries, key value pairs, where row three's name is Rob, row three's dorm is Thayer, and so forth. Well it seems that we could map this kind of format very cleanly to something like Python. Or C for that matter, but Python would be a lot more convenient. And sometimes, one sheet is not enough. So sometimes you might have gone down here to the bottom corner. You might have created something like sheet two. And maybe here, instead of just storing students, you might store professor. So the professor's name and office and course and any number of other pieces of data that you want to store about him or her. And we could store professors. And I'll even rename this from sheet two to Professors. And in sheet one, I'm going to go here and rename this to Students. And so we can really kind of organize our data cleanly. But that's about it. Even though Google Spreadsheets and Excel do have some programming functionality built in with macros or similar functions, it's not all that easy to query the data. It's not all that easy to integrate this kind of file into a program you're writing. And it's definitely not necessarily efficient. Because you've not really told Google Spreadsheets anything about how much data you're going to be putting in here, what the type of that data is beyond maybe the formatting thereof. And so there are some opportunities for better design. And thus came along SQL databases that give us not only the ability to store things in rows and columns, but also the ability to create data or create structures in memory, and insert data, select data, and update data, and delete data. In fact, a SQL database is really a specific instance of what's generally known as a relational database, a database that allows you to maintain relations among pieces of data, generally spanning sheets-- or as we're going to start calling them, tables-- that adheres to a silly acronym, CRUD, the ability to Create, Read, Update and Delete data. And those verbs map to such keywords or commands as these as we're about to see. So what does this actually mean? Well, let me go ahead and do this. Let me go ahead and within CS50 IDE, I'm going to open up a special web-based program that's simply going to give me the ability to create databases and create fields therein. I'm going to go ahead and create a database called Lecture.db. And I'm going to go ahead and open now a web-based tool via which I can administer this database called Lecture. And you're going to see that I have a tab called Structure, which is where I'm going to be able to define sheets, or tables as we'll call them. I can execute raw SQL commands, with which we'll now start to get familiar. And later on, I can even export or import data as well. But let's focus on these fields down here. So this tool is one of any number of dozens of tools that might exist that allow you to create and administer databases using SQL. I'm going to go ahead here and call my table Registrants. So this is like creating a new sheet in Google Spreadsheets. And the number fields-- for now, I'm just going to go ahead and have two, Name and Dorm. And now notice I get a little HTML form here, whose purpose in life is to make things a little easier for me. So I'm going to type in Name for one field, Dorm for another. And you'll notice that SQL databases now allow me to choose any number of data types. So we're sort of going back and forth here in between the world of strongly typed languages, weakly typed or loosely typed languages. Here it can actually matter for performance, and I'm going to indeed tell my database what kind of data I'm generally trying to store in its columns and each of its rows. So my options with SQLite, which is one incarnation of the SQL language, is Integer-- which means what it says-- Real-- which is like a real or floating point number as we've seen that generally can store up to 15 or so digits of precision. But even then, if it can't, it's going to use a text field automatically instead, because text is really just a string, otherwise known in some environments as var char or variable length characters. Blob is just going to be binary data. And this means we can store things that are just zeros and ones that aren't ASCII or Unicode text. Numeric is a little more flexible. If I want to store a dollar amount with dollars and cents. If I want to store maybe an integer, maybe a real number, I can use numeric, and just let the database-- SQLite in this case-- figure out what actual data type or affinity, so to speak, to use. Booleans is just going to be a zero or one. But that's really just a convenience. It's actually still going to use a full byte or 8 bits just to store a zero or to store a one, really just an integer. And then date/time, which underneath the hood can use any number of formats, whether it's text or integers or reals. I'll defer to the documentation. But that's going to allow me to in a standard way store things like dates and times so that we can actually record when stuff is happening, when someone registered for freshman intramural sports, and even do useful arithmetic operations on dates and times. So what should Name be? Given all that, there aren't really too many options. Indeed, my instincts are to go with text. So I'm going to do that for Name and for Dorm. And then I'm being prompted by this web form for a few different questions. So primary key. It turns out that among the features of a SQL database is the ability to specify that this field-- this column shall be my primary key, the field whose values uniquely identify all of the rows in my database. Now, it's not going to be relevant just yet for the following reason. Right now, it's quite possible that two Davids live in Matthews, or two Robs live in Thayer. And so if I said that Name or Dorm were my primary key, that would mean I can have only one David or one Rob or one Matthews or one Thayer in my database. So for now, we're going to leave this alone. That would seem to be a bad thing to do. Auto increment isn't going to be applicable, because you can't auto incremented a string or a text field. That's going to be germane to integers. And not null, which is my problem that this field cannot or shall not be null. So I will go ahead and do this and just promise that every student in Frosh IMs is going to have a name and a dorm. Meanwhile, you can specify a default value. I'm not going to bother doing that here, but you can see from the drop down that you can specify a certain value, like null, or you can specify the current date or time, if that's actually germane. But I don't want to give people generic names. And I don't want to just assume that people live in some default dorm. So I'm going to leave that alone and expect that the user of this database table is always going to give me a name and a dorm. So let me go ahead and click Create. And what you'll see is this. phpLiteAdmin is really just a handy web-based tool for executing SQL statements. So I don't necessarily have to remember all of the syntax up front. I can kind of learn from this actual tool. And what phpLiteAdmin has really done for me is this. It has in that file called Lecture.db executed this statement, Create Table quote unquote registrants, Name is text not null, Dorm is text not null, and that's it. So phpLiteAdmin is just saving me, at least for today's purposes, from having to remember exactly all that syntax. But after practice, it becomes pretty familiar. And certainly with Google, you can fill in any blanks. But what does this now mean? If I click on Return here and go back to not just the lecture where I was, but I click now on Registrants, my actual table, you'll see a bunch more tabs. The top one of which is Browse-- but this table is empty, because I've not inserted me or Rob or anyone else for that matter. But if I click on Structure, you can now see in a web-based environment a little reminder as to what this table looks like underneath the hood. It's more esoteric looking than something like Google Spreadsheets, where you just have columns and rows. Here we're being more precise. I have two fields, Name and Dorm. Each is of type text. Neither can be null. But neither has a default value, and neither is a primary key. But via the Edit link here and Delete, I can actually change those definitions, and I can even add more fields if I think of things later. And so this is allowing me to create really a table of information whereby these are my columns or fields Name and Dorm. So let's go ahead and insert for instance me in here. So I'm going to go ahead and insert David, who is in from Matthews, and click Insert. And notice what phpLiteAdmin did. It simply executed this SQL command. Insert into Registrants, Name,Dorm. So it's specifying what the two fields are that I want to insert into. And what values do I want to put? David,Matthews, not to be confused with Dave Matthews. So now, if I click Return and go back to the Browse tab, notice that we see Davids and Matthews. Meanwhile, if I click Insert Here, and I type something like Rob and Thayer, click Insert, a very similar SQL statement was executed. But this one customized for Rob. And now Return and see Browse, both of us are in there. But now let me start to take those training wheels off. It's not all that interesting just to click around in a web GUI. Let me actually learn something from this. And no, let me go ahead and type my own SQL now. Insert into Registrants. And if I really want to be proper, I can quote it. But so long as you don't have any special characters, quoting symbols is not strictly necessary in this case. What fields do I want to insert into? Name and Dorm. And what values do I want to insert? Raw values. Textual values that are not special keywords or field names. I do need to quote with single quotes here in SQLite. Let's go ahead and put [INAUDIBLE] and Currier for Dorm or House there. I can put a semicolon. It's OK if I omit it, but if I want to execute multiple SQL statements at once, I will need the semicolon again. Let me go ahead and click Go. And nicely enough, notice what happened. One row was affected. It took only 0.01 seconds, so it's pretty darn fast. This is just a reminder of what it is I executed. And if I go back now to Browse, I can actually see three rows in the table. All right, so now things are getting a little more interesting. What if I want to search for things? I'm not even going to use this training wheel here. I could use the Search tab and type in some names or some dorms that I want to search on. But that's not the goal here. The goal here is not to teach phpLiteAdmin or this web-based tool. It's just to use pretty quickly this kind of web-based environment to create the schema for our database, the design of the database, our choices of tables and columns and perhaps some of the initial data, and then use that database in actual code. So that's where we're going. We're going to move away from this web-based environment and use actual Python code, but first let's see some of those other instructions. Turns out, per this little summary here, that we have the ability to create-- which we already did using Create Table-- to instert-- which I just did. But what about select, update, and delete? Let's not use the web GUI per se. Let's just go ahead and start typing some raw SQL. I'm going to go ahead and do exactly this. Select let's say star from Registrants. And now as an aside, stylistically it's not strictly necessary to write select or from or all of the SQL key words in all caps. Tends to make code I think a little more readable. But I could still just say select star from Registrants. But I find it nice to distinguish what are built in keywords from like what are my field names and table names. So select star-- star meaning everything. Let's click Go. And now what I see here is this web-based representation of two columns with three rows that have come back. We're going to start calling this a result set. A result set containing three rows have come back. And that's going to map to code pretty nicely. Now, let's go ahead and do this. Suppose that Zamyla moves to a different dormitory. And let's go ahead and type this-- update registrants, set Dorm equal to let's say Grays where Name equals quote unquote Zamyla. And this is what's particularly nice about SQL. Even though the syntax is a bit new and some of the key words are certainly new, you kind of just say what you mean. So update the Registrants table, setting the Dorm field or column equal to Grays. Not for everyone, only where the name field or column equals Zamyla. So let me go ahead and click Go here. All right, one row is affected up here. And notice, a reminder of what I just executed. Now if I click on Browse just to see what the data looks like, indeed Zamyla has relocated to a different dorm or building altogether. If I screwed up, though, notice what could quickly happen. Suppose I did this, Update, Registrants, set Dorm equal to Grays, and I left off the so-called predicate, the Where clause that I did a moment ago and click Go. Now three rows were affected as indicated here. And if I click Browse now, notice that oops, all three of us now live in Grays. So I can fix this manually if I really wanted to without executing another SQL query. But this is just meant to be a user-friendly way of quickly and dirtily editing or creating some data. Really, we're going to start using those SQL commands only. What about Delete? Suppose that Rob has graduated, and he's moving out. Let me go ahead then and do Delete from registrants, where Name equals Rob. Go ahead and click Go. One row effected. And if I go and click Browse now, only Zamyla and I remain. All right, so those are just a few such keywords. But what's nice is that again, it's pretty expressive. So to recap here, how did I create the table? I literally said Create Table Registrants. Though I didn't type this at the time, but this was just the SQL that that web tool generated for me. Spoiler alert, don't look at that just yet. We're going to come back to that in a moment. And I specified that the Name field is going to be of type text and Dorm is going to be of type text as well. Down here I then inserted something like David from Matthews, specifying name and dorm. And notice, these lines wrapped onto two lines, but in general, this could just be a really long query as well. Select Star from Registrants, Update registrants, Delete from registrants where-- I get another little spoiler. And it's a spoiler because there is actually a flaw in my database table's design. My Registrants table was not so well thought through. I indeed only care about storing names and dorms, at least right now. Maybe later, emails and phone numbers. But what could go wrong? When I deleted Rob, something could have gone wrong a moment ago. When I updated Zamyla, something could have gone wrong. What if there are two Zamylas at Harvard? What if there are two Robs at Harvard? Well, each of those queries didn't seem to specify which Rob or which Zamyla. It just said where the name is Rob or the name is Zamyla. So how do we distinguish the two? Well, maybe I should have forced undergrads or students to register with their full names, Zamyla Chan. OK, so that would decrease the probability that we're going to have multiples Zamyla Chans at Harvard. But still there could be that probability. Same with Rob Bowden, or really any name, whether common or uncommon. So that doesn't feel very robust. We could look for a Zamyla that specifically lived in Korea, but maybe on the off chance there's two of them there, that's going to create problems. And we might update or delete more rows than we intend. So this is why we humans in the US for instance have unique identifiers, for better or for worse. Things like Social Security numbers. Or here at Harvard we have Harvard University IDs-- HUID numbers-- or at Yale MIT IDs. Or where you are in the world, odds are you have in your wallet or at home or somewhere in your life, a unique identifier. And you might not even know what those identifiers are. In fact, anytime you've registered for an account on some website like Facebook or Google or any number of others, underneath the hood, even if those companies aren't telling you what your unique ID is, they know you by more than just your name and dorm or email address or phone number. Odds are, they have assigned you some unique but very simple piece of data, like an integer, so that Zamyla might be user number three. Rob might be two. David might be one. Simply because that's the order in reverse in which I inserted them into the table. So let me go ahead actually and fix this. Let me go back into my database. And you know what? I'm going to be pretty bold here. I'm going to go ahead and in my table drop the whole thing. So beware ever executing the drop command, because it will literally drop all the data in your table. But I'm going to confirm that, I'm going to start over as follows. I'm going to create a new table called Registrants, this time with three fields. And in addition this time to Name which will be text and Dorm which will be text, this time I'm going to think ahead. And I'm going to give everyone a unique ID. Convention is to call it literally id in all lower case. And indeed, field name should generally not have spaces, not have funky punctuation. Whether you use so-called camel case with alternating capital letters or snake case with underscores, it's generally up to you. But using underscores tends to be the convention here, although it's not yet to remain. But I'm going to make this id field an integer, and I'm also going to make it my primary key. So that there shall be no two registrants in this Frosh IMs program that have the same primary key, the same number uniquely identifying them. And you know, a better yet-- and this is one of the features you get for free with many relational databases-- auto increment. I the programmer don't care what number Zamyla is. I don't care what number David or Rob is. I just want them to have a number, whatever it is. But I don't want them to be the same. So let me just let my database auto increment this ID. Give the first user one, the second user two, the third user three, and so forth. And by nature, this cannot be null. They need to be uniquely identified. And default value here doesn't matter, because the database is going to take care of that. Let me go ahead now and click Create. And now you'll see the create table that I just spoiled a moment ago on the slide. This now creates a table again called Registrants, because I dropped or deleted the old one. But notice how much detail we have specified what id is. It is not only an integer, it is also a primary key that supports auto increment and it cannot be null. Meanwhile, Name, Dorm, I don't particularly care this time. Though I could specify like before, I don't want those to be null either. So again, that's just going to be a design decision. But let me see how this affects things. Now let me go into MySQL tab, where I can just execute some raw SQL. And soon we're going to be executing SQL from within Python code. Let me go ahead and insert into Registrants a name and a dorm. And notice, I don't care about specifying the ID. That's not interesting to me. All I care about is the values like David and Matthews. And if I now click Go, notice that's all that the database executes. It doesn't go in and secretly change my query, but it does notice that I've omitted explicit mention of that id field. And if I click Browse, now notice, David for Matthews has an id of one. Let's go ahead and do two other inserts, again via raw SQL. Let me go ahead here and do insert into Registrants. And you'll notice there's a nice little history in this web tool of all the previous queries I've executed. So in case you want to click and save yourself some typing, you can do that. But for now I'm just going to go ahead and insert Rob as before from Thayer. And for good measure at the same time, Registrants, Name Dorm values Zamyla from Currier. And notice the semicolon, so that hopefully both of these should go through as separate queries. Indeed they did. One row affected. One row affected. And if I click Browse, now Rob is number two, Zamyla is number three. And this is just going to continue on until we maybe run out of integers. But SQLite will support as many as 8 byte integers, or 64-bit values. So that's not going to happen for quite some time. So now notice what I'm protected against. There might be a whole bunch of Robs at Harvard it's fair to say. But now if I do a Delete from Registrants where name equals Rob and id equals two-- and I don't have to quote the two, because it is indeed a number-- this is actually more information than I need. Indeed, because id is a primary key, I don't need to mention Rob's name. If I already know his id, I can just say Delete from Registrants where id equals two, click Go. One row affected. And goodbye to Rob. And what's nice now about SQLite is suppose I do one other insert. Let me go ahead and Insert into Registrants a name and a dorm with values of Jason and Kirkland. And click Go. Is Jason going to now be the new number two, or is he going to be number four? Let's take a look. If we click Browse here, he's indeed four. So one of the features you get by using the auto increment field is that the database is going to ensure that just in case you have remnants of Rob somewhere else, or really his id number maybe in some other table or in your logs or some piece of data you care about, we're going to make sure that Jason has not only a unique id now, but a unique id in perpetuity. If you don't use unique id-- or rather auto increments, SQLite will actually try to reuse numbers just for efficiency, which can actually break things down the road later. All right, so things are getting interesting. We have the ability now to insert data, delete data, update data, and then select it. What more can we actually do? Well, it turns out that SQL tends to come with not only various data types, but also various functions. It turns out to save ourselves trouble, there are functions like date functions and times and date find times. So for instance, I want to execute a query like select star from registrants. But no, I don't want to select all the registrants. Maybe I want to select all of the registrants this year. I could say something like, select star from Registrants where date of a particular field is greater than or equal to January 1st, 2016 or 2017, or whatever the year happens to be. So these are functions that can do that kind of thinking for me. But SQL also supports a number of even more compelling features. We saw a moment ago primary key, which again specifies that this one and only field shall uniquely identify all of the rows in my table, I guarantee it. There's other types of indexes you can use. So this is the special type of index. And essentially what you're getting is underneath the hood the database, whether it's SQLite or MySQL or Postgres or Oracle or whatever, is generally leveraging some of that week five, week four, week six material from CS50 in the fancier data structures, especially the tree data structures. And if we tell the database in advance, hey, this field's going to be my primary key, or this field is going to be an index more generally, whoever implemented the database software itself probably built into it the ability to store that field using a fancy tree or some other data structure that ultimately is going to expedite selecting and updating and inserting and deleting data. And indeed, we can say exactly that. If you know in advance that you're going to be searching on a certain field all the time-- not just the id field. Maybe your table has an email field or maybe a zip code field or some other field that users might want to search on, rather than just store it in a column by defining a field in your database, and then really just letting the database use linear search to find everyone in 02138 or 90210 or whatever zip code, you can actually tell your database in advance, index this zip code field, index this email field. Because I know I'm going to be searching on that specifically using that special Where keyword in SQL. And let the database give you better performance than something like linear search alone. Meanwhile, you can also specify, I'm not going to just search on this field. I don't care just about it's performance. You know what? I can promise you right now this field shall be unique. It might be an integer like a primary key. But if there's some other field in a database that in advance is going to be unique by design, you can tell the database as much. And the database too will optimize queries involving that field as well. So among the fields we discussed today-- name, dorm, phone, email, zip code, you can probably think of bunches of others-- which of those might be candidate to be unique and to be indexed as unique by your database? Name already feels like a bad idea. We want to be able to have multiple Zamylas, multiple Robs, multiple of any name in the world. Probably want multiple people to live in the same dorm, so that there can't be just one person from Currier, Matthews, or Thayer, wherever. What about email? Email's kind of an interesting candidate, right? Assuming you're OK with banning users-- maybe a couple of people in your life who happen to use the same email address for whatever reason. So long as you're comfortable assuming that every human in the world is going to have or must have his or her own email address, you could proactively say to your database, OK, the email field or the email column shall be unique. I don't want a user with the same email address to accidentally register twice, let alone three times or more. And even if two people think they have the same email address maybe because of a typo, I don't want him or her to be able to register if someone with that address has already been registered. Phone number maybe works OK in the world of mobiles, but if people still have landline phones, you probably don't want to enforce uniqueness there. But what else might you want to impose? Not null. If you want to ensure that everyone in your database has a name, there's nothing stopping them inherently from typing like ASDF or whatever into your database. You might have to protect against bogus data. But you can at least ensure that the user has to give you some kind of value. And then foreign key. And we'll come back to this, but there's a way of specifying that the data in one spreadsheet or in our world now, one table, is somehow related to or identical to the data in another spreadsheet or again a table. But let's come back to some of these features in a moment. What else do we get from SQL? We also get the ability to join tables together. Now what does this mean? Well, let's go ahead and let's go back to the spreadsheet for just a moment. And let's not worry about students and professors anymore, but let's go ahead and think of this as, for instance, users. I'm going to rename my quick and dirty spreadsheet here Users. And what do I want every user in my world to have? Well, I want everyone to have some unique ID, and I'll let my database ultimately assign that. I want everyone to have a name. Maybe let's say a mailing address, a phone number, an email address, and there could be bunches of other fields as well. What else might someone have? That's probably enough for a customer database. And let's go ahead now and consider what these values might look like. So my very first customer, who buys my first widget or whatever, might be someone like Alice. And she lives at 1 Oxford Street, Cambridge, Mass, 02138, USA. Let's go ahead and make this column wider just so we can see it. Her phone number shall be 617-495-5000, and her email address shall be alice@example.com. Let me make this field a little wider as well. Then Bob comes along, and he buys something too. He happens to live down the street at 33 Oxford Street, Cambridge, Mass 82138, USA. And his number is 617-495-9000. And he is just bob@example.com. Then there's someone new altogether, Charlie. He lives at 51 Prospect Street in New Haven, Connecticut, and his zip code is 06511, USA. And his number will be-- he doesn't want to cooperate, so he's just going to give us some bogus number there. But he'll be also charlie@example.com. And this table now-- if you think of the spreadsheet really as just a database table of users, could be dozens or hundreds or even thousands of rows long. So let's now consider what data types I should be using if I want to migrate this spreadsheet into an actual SQL database. So the id field, which frankly you usually get sort of for free in a spreadsheet program, because it just numbers all the rows for you. But if you resort your data, those numbers do not followed the original rows. And so giving our data, ultimately we'll see their own numbers as good. Because those should remain with the data. Alice should always now be one, Bob two, and Charlie three. So name. Let me go ahead and consider what should the fields here be? Well, just for the sake of discussion, let me go ahead and annotate right on the screen here. It stands to reason that this should be an integer. Name should probably be text. Address should probably be text. Phone, I mean it could be an integer. It's definitely not a real. Numeric is not right, because it could then become real somehow. But is integer right? I'm a little worried here. Like especially if it's an international customer, I'm kind of biased at the moment toward US users. But suppose someone typed in their actual calling code or country code and they did something like 011 or 001 for the US and then 617. If I called this field an integer, and then let the user type in their actual number like this, if it's an integer, those leading zeros mean nothing mathematically. So my database is probably going to throw them away. And I might not want that to happen. We get lucky in that this still could work as a phone number, but I really shouldn't be throwing away user's data. So you know what, I'm going to go ahead and just call this text. Whatever the user gives me, I shall store from him or her. And then finally email can be text. And now let's consider how we might index these columns. We'll come back to id in a moment, because SQL's going to give us that automatically in our database we'll see by using auto increment again. But it will be an integer. Name, I could-- I don't want to make it unique, because I want to have multiple Alices and multiple Bobs. I could make it an index. If for whatever reason the website or the application I'm making wants to make it easy for users to search on names, I could proactively say, OK, database index the name column, because it will speed up my searches to be something better than linear search. Address, same thing. If I want to use auto complete or some kind of search feature. So I can search over address. Maybe I want to do that. Phone number, maybe that could be unique, if I'm expecting only mobiles. But that could be risky. Email is the only one I might claim a unique constraint is pretty reasonable, especially if your website-- as is commonly the case these days-- is going to expect users to log in, certainly not with their id. And I see no mention of username. If they're going to use their email address as their login, then you better only allow any email address to be registered once and only once. But you know what I'm not liking? And let me make this column even wider. What's kind of dumb about this design at the moment? You know, before we even get ahead of ourselves and start moving this into our SQL database, what's dumb here? I feel like some redundancy, right? Like 1 Oxford Street, that's special and unique. 33 Oxford Street, that's special and unique, even though there could be multiple people living or working there. 51 Prospect Street, same thing there. But like Cambridge, Mass 02138 USA. Cambridge, Mass 02138 USA. Like why for every resident of Cambridge, Massachusetts am I storing Cambridge, Mass 08138 USA. Cambridge, Mass 02138 USA. This would seem redundant. I mean it's not likely that Cambridge is going to change its name. I mean, once upon a time it was Newtown, now it's Cambridge. But that's not likely to happen again anytime soon. So it's not so much anticipating change, but just look at all these darn bytes that I'm storing redundantly. Which of those bytes though should hopefully be sufficient to identify where someone lives or works? At least in the US, we tend to use our postal codes pretty readily. So you know what? Let me go ahead and create a field called zip code. Why don't I just store 02138 here. And oh, stupid Google Spreadsheets, notice what it did. It is assuming, because something looks like an integer, that it is an integer. So this is a stupid artifact of using a spreadsheet program. Let me change that to plaintext and retype it. Now Google will respect my input. But the point is all the more clear, we had better in our SQL database call that text and not an integer. And now you know what? I can make my address really just a street address. And I can get rid of this, and oh, I shouldn't get rid of this. But I should in this column do 06511, and then here, get rid of all of that. Damn Google again. Let's go ahead and change that to plaintext. 05611, and change this to 02138. So it's a little cleaner now. There's a little bit of duplication. I now have 02138 twice in my database-- or in my spreadsheet in this case. But at least it's a lot, lot, lot less redundant. But I need to recover that information. This is just my Users spreadsheet-- my Users table. You know what? I'm going to go ahead and create another sheet with Google Spreadsheets here. I'm going to go ahead and call this Zip Code. I'm going to call this City, State, Country. Although Country makes things a little more complicated, because zip codes aren't going to be the same format everywhere. But for now, 02138 is going to be-- dammit. Let's change this whole column this time to plain text. 02137, Cambridge, Massachusetts, USA. And now for New Haven, let's do the same thing. Paste that, New Haven, Connecticut, USA. And now rather than just call the Sheet Three, let me more intelligently call this like Zip Codes. And you know what? I'm not loving how many bytes I'm using to store Zip Code. Why don't I learn from lessons past and give this spreadsheet or this column or rather this table its own ID, and just arbitrarily assume that someone will number those for me. So now I can whittle this down to just numbers where this should be an integer now. And in my Zip Codes table, let me make room for just these little annotations. This shall be an integer. Zip code we've already learned had better be text. This should be text. This should be text. This should be text as well. All right, so I have taken a very simple idea, spreadsheet with all of my customers, and I seem to have really over-engineered it. I've made something simple more complex. But why? I mean, it's not compelling when I have just three customers. It took me more time to describe what I was doing than just do it the original way, albeit with the redundancy. But if you do start to have tens of rows or hundreds of rows or thousands or tens of thousands, having all of that duplicated data just doesn't make sense. You're throwing away disk space. You're throwing away potential performance and running time. You're throwing away money, if you have to buy more disk space for your server. And so what I've done here is what we would call normalizing my database. Factoring out the commonalities that can be uniquely identified more simply, with this case a zip code, or even more simply, via some arbitrary but some consistent number that I impose. So let's go ahead now and port this over to my database. I'm going to go ahead and use phpLiteAdmin, just because it makes things nice and easy here. I'm going to go ahead and drop my Registrants table, because that story is over. And now let me go ahead and create a couple of tables. The first of which I'm going to go ahead and call my-- what did I call it? Users. And number of fields, I'm going to need one, two, three, four, five, six fields. So let's go ahead and do that. So six fields, and those are going to be id, name, address, and what did we say? Zip code, and phone, and email. And this shall be an integer. This shall be text. This shall be text. This shall be text. This shall be text. This shall be text. But id shall be a primary key. I want it to auto increment. And you know what? I don't want names to be null. I'm OK with addresses being null if they don't want to cooperate or zip code being null if they don't want to cooperate. Phone, email should not be null. So I'm going to go ahead now and click Create. This is the query-- somewhat longer now-- that was executed. And now let me go ahead and create another table by clicking on the name of my database again, creating a new table called Zip Codes. Number of fields here is going to be one, two, three, four, five. So five fields, one of which is id, one of which is the zip code. Then city, then state, then country. Text, text, text, text, integer. This shall be primary key. This also shall be auto incremented. Zip code cannot be null. And everything else I'm OK with being null. So I can also not spell country. Create. And so now, let's toss in some data. Let me go ahead and insert really quickly, I'm going to use the web form, zip code of 02138, Cambridge, Massachusetts, USA. Insert. Let me go ahead and insert New Haven as well. So 06511. 05611, yep. 05611. New Haven, Connecticut, USA. And now let me just give myself a few customers. Let me go in to my Users table, insert David-- or no, who did we have? We'll copy and paste our actual users. Alice, so we had Alice from 1 Oxford Street in zip code now. But this zip code-- oh, interesting. I goofed. I went a little too fast. So let me abort. Let me go to Structure. Zip code I said was text, but you know what? I'm going to change that to integer. All right, and now let me go over to the Insert tab again. Let's put Alice from 1 Oxford Street at location 1, with her phone number, which was for 495-5000. And you can do this. Even though this looks even more like an integer, a common mistake I think in a database is to just blindly throw in whatever the user typed in, even if he or she used parentheses or pluses or dashes or any number of other punctuation symbols. There's no reason for us to store all of that. In fact, if we want to pretty things up and throw away some of the inconsistencies, I could just use code-- Python, ultimately-- throw away all punctuation symbols, and then just store 10 characters in the case of a US number, or whatever pattern I care about. Or I could proactively with Python even pretty this up by just putting the dashes where I want them to be, not where the user necessarily put them. And then Alice was alice@example.com. And now let me really quickly go ahead and insert Bob, who was from 33 Oxford Street. He's also in zip code 1 now. His phone number was 617-495-9000. And I'll go ahead and pretend like I'm formatting this in code. And then he was bob@example.com. And then lastly, let's go ahead and insert Charlie from 51 Prospect Street in New Haven, whose id I now know to be 2. 617-000. He was the one that didn't cooperate. And charlie@example.com. All right, so what does this now mean? In my database called Lecture, I now have two tables-- Users and Zip Codes, each of which has some number of fields inside of them. But this is not all that useful to me anymore, because now if I go ahead and select my users with a query like this, select star from users, go, I get back this. But what the heck is zip code 1 or zip code 2? Now OK, I can figure this out. If I know that I'm looking for Alice, and I see that she's in zip code 1, I could do another query. All right, let me go back to that SQL tab. Let me go ahead and do select star from Zip Codes where id equals 1. Because I saw that Alice's zip code was uniquely identified by 1. Oh, all right. So Alice lives in zip code 1. Yeah, that's 02138 Cambridge, Mass, USA. OK, got that. But now I've used two SQL queries. The data is still like-- some of it's over here, some of it's over here. Now I've got to somehow combine it in code. And you can do that, but SQL is much more powerful than that. It has other keywords like this, where I can have the database do the thinking for me. Let me do this. Select star from users, join Zip Codes on users.zipCode equals zipCodes.id. Take a moment to think, and ignore the red squigglies. That just my browser thinking I'm being grammatically bad, verbally too. Select star from users, join zip code. So I'm telling the database, go ahead and join these two tables. How? Well if you think of one table as this hand and one table as this hand, what's nice is that each of them has a field that's inside of the other. This table might have the id field. This table has the zip code field. Wouldn't it be nice if I could somehow stitch those together, lining up those zip codes, whereby in this table, Users-- I called it zip code-- and then in my Zip Codes table I called it id. Let's see what happens. Let me go ahead and click Go. And amazingly, look at what I get back. My result set this time contains everything. I get Alice, Bob, and Charlie, each of whom lives at these addresses. But look at their zip codes now. They've been filled in with the actual values, plus the cities and the states and the country. So this is where you really now start to scratch the surface of the capabilities of something like SQL, because it can actually combine data in this way. And this was a pretty simple query. But now I can sort of exercise good design. I can keep my data very cleanly structured and normalized, whereby I factored out all of the redundancies. And yet I can programmatically reconstruct that data and ensure that I can get back everything I care about. Moreover, I can further optimize things. If I go into Users for instance and I go under Structure, notice down here there's another field. Create index on one column or more. Let me go ahead and create that index on my Users table. I'm going to call it Arbitrarily Email, and I'm going to say duplicate values are not allowed. I'm not going to bother with a where clause here, but I'm going to put this constraint-- this unique index on my email field, and now click Create. And notice what got executed. Earlier we used it to create a table. Here I'm using it to create a unique index called Emait-- though I could've called it whatever-- on the Users table using specifically this field. So this now is my way of ensuring that the following can't happen. Notice that in my Users table I've got Alice with alice@example.com. suppose that someone else named Alice, also with that same email address-- or who thinks her email address is the same, because of whatever typographical error-- comes along, and I try to register them for my site as follows. Insert into Users. Let's see, we have name, what are the other fields now? Name, address, zip code, phone, and email. The values as follows Alice number two, so her name's not distinct. Her address is going to be 1 Main Street, so her address is not the same. Zip code, she'll go ahead and live in New Haven, for instance, so 2. Phone we don't need to worry about, so she'll be another fake phone number for today's purposes. But suppose that she also thinks her address is alice@example.com Or heck, maybe this is the same Alice who forgot she has an account on our website-- maybe more likely-- and is trying to reregister with the same address. What's going to happen? If I didn't make any typos here, let's click Go. Interesting. The database stopped me from doing this. Error, unique constraint failed, users.email. And this is common syntax in the SQL world. tableName.fieldName. So insert into Users, whatever I just typed failed, because the unique constraint. And indeed, we can see this. Let me go to Browse and look. There is no second Alice. So we could do this in code, as we'll soon see even more in the world of Python. You could check by writing your Python code to see, wait a minute, someone's trying to register as alice@example.com. Let me quick select all my users from my database, look for Alice, and if I already have Alice at example.com, I'll just say, no, you can't register. Much like I said no you can't register for Frash IMs if you don't have a name or a dorm inputed. But my database can do that for me. And this is a nice wall between me and my database, a nice wall between the software developers and even the database administrators so that you don't have to worry about data accidentally getting into your website or your database that shouldn't actually be there. All right, we have all of these building blocks now. We have Select and Insert and Delete and Update and now Join. What more is there? Well, there are some more keywords. But let's first use the ones we have. But this time, not just to play around via the web-based tool. Let me actually show you one other thing. So let me go in to CS50 IDE and take a look at my source 9 directory, and you'll see these four folders, two of which you're about to look at in a moment. And you'll also see lecture.db, a file that I've been using to store all of my SQLite data, including those tables. Well, it turns out that we can actually use a command line client in order to see the same data. Let me go ahead and do that by typing SQLite 3 for version 3. And then let me go ahead and type in the name of this file, lecture.db Enter. And now you'll see a very simple and perhaps a little cryptic command line interface. But thankfully if you type as it says .help, you'll see everything that you can possibly do with this command line program. But we're going to go ahead and keep it simple. I'm going to go ahead and just say .tables, which is SQLite's way of saying show me what tables are inside of this file called lecture.db. Now here I have users. Let me go ahead and see what is the schema of these tables. And you'll see the SQL commands with which the Users table and the Zip Code tables were created. But more interesting than that is this. At the command line, this is just a SQL client. I've been using the web-based GUI, phpLiteAdmin for the past few examples. But we can just do this at the command line too. Let me go ahead and select star from users semicolon, and now you see in purely textual form the exact same data. Here's the ids, my users name addresses, zip codes, phone, emails. I can see the same thing from my zip code fields. Let me go ahead and do select star from Zip Codes. And there we see the same data. So using the command line too, you don't have to even use phpLiteAdmin. It just tends to be a little more user friendly. You can also see the same data in the same file. And now that I've done this purely manually, let's now transition to doing this in Python code. But first, let me make mention of one detail in the tables that we had created. Back here in users, you'll recall that we had the structure of having an id and a name and the address, zip code, phone, and email. What you'll actually find is that the world has generally standardized how you define what are called foreign keys in these tables. So specifically, which of the fields in this table are kind of foreign to this table? That one of zip code. Originally, that zip code was a text field, 02138, and any number of other zip codes as well. But then I changed it to an integer, because it really in this table is a foreign key. Because those same numbers 1, 2, and onward are primary keys in that other table called Zip Codes. And so what would actually be more typical here is that we would often name this not Zip Code, which is a little ambiguous. But just as a human convention, let me propose that we clean up this design a little bit and call this Zip Code ID to make super clear that this is indeed an ID. It's not this table's ID, it's an ID from the Zip Codes table. Plural would be the convention, and singular would be the convention here. And indeed, we can now use the same kind of statements in code. But it turns out with one of our concluding examples today, we'll see while it's advantageous to adhere to certain conventions, and frankly even this might even be typically higher in the table as an id and not just buried there in the middle. But that's a more minor detail. All right, now let's return to CS50 IDE and build upon those previous Frosh IMs example. But now weave in this new feature that we have, which is that ability to store data and retrieve data from a database. So recall that this was Frosh IMs 1, which was an improvement upon Frosh IM 0 in that at least Frosh IMs 1 actually kind of sort of registered students. It used registrants.csv, and it wrote out the data to a simple text file. But with text files, CSV files, even though you could double click them and open them again in Numbers, Excel, and port them into Google Spreadsheets, we don't have the same expressive capabilities as we do with a language like SQL to select, insert, update, delete. They're just text files. With SQLite though, a binary file, and the SQLite 3 program, and we'll soon see Python code, we can actually execute more sophisticated queries than the CSV format allowed. So how are we going to do this? Let me propose now in Frosh IMs 2, we create a new application.py that's going to behave as follows. Let's go ahead and propose that I'm going to import as before some of the Flask functionality. So from Flask import capital Flask, which is the class that we're using is the application itself. Render template, so I can render HTML. Redirect, so that if I want to redirect the user, I can do this by an HTTP location redirect. Request, so that I can actually get at form data. And we'll see URL4, which you might recall from past problems. But you know what? I'm also going to import from CS50 our SQL wrapper. So inside of the CS50 library recall for Python is getInt and getString and getChar and getFloat. But inside there also is a whole library called SQL. So the CS50 SQL library that doesn't do all that much, but it does offer us an execute function that's going to allow us more simply than might otherwise be possible to execute SQL statements inside of our Python code against Lecture.db on FroshIMs.db or whatever the file may be. And I'm going to do that as follows. First I'm going to instantiate my Flask application, as always. And now I'm going to instantiate a database connection essentially as follows. SQL quote unquote SQLite colon, slash, slash, slash. So notice the third slash in this case. FroshIMs2.db. So this will be Frosh IMs version 2. And that's it. So I'm telling the CS50 library to use SQLite and to open up ultimately the Frosh IMs 2 database. Now I'm going to go ahead and have a simple route as before for just slash. I'm going to define a function called index as before, and this one's super simple. Just go ahead and return render template of index.html. And now let's reimplement register as follows. So app, route, slash register. But in this case, you know what? I want to support post. I don't want students' information ending up in the URL and like the computer lab's history or the roommate's browser or whatever. So let's go ahead and specify to Flasks that you know what? Only support this list of methods, specifically just post. Let me go ahead now and define a method called Register. That's going to map to this route. And as before, if request.form quote unquote name equals equals nothing, or request.form quote unquote dorm equals equals nothing, then go ahead and return failure, and return render template of quote unquote failure.html. So the website is almost exactly the same now, except we have preemptively added this database capability to our code that we're now going to use. Otherwise, if we're not in failure, let me go ahead and execute the following SQL. Inserts into registrants a name and a dorm with the values of-- we'll come back to this in a moment. Colon name colon dorm. I could call these anything, but a convention in a lot of SQL libraries-- a lot of code, whether it's Python or PHP or Ruby or other languages-- is to have these placeholder values with a colon and then a word that's probably identical to the field name, but much like in C, percent s has been placeholders, much like in Python open curly brace close curly brace has been placeholders. We've got one more placeholder convention, which is colon, and then the name of a symbol. But for the most part, they can almost always be identical to what the field names are. And now, I need to plug these values in. So I'm going to go ahead and plug in for the first name equals request.form name. And dorm is going to equal request.form dorm. So again, this is a Python thing meets SQL thing. In our SQL library, as implemented by the CS50 execute function, we have this SQL statement. Insert into Registrants name, dorm, values, colon name colon dorm. CS50's execute method, like many libraries out there, will recognize any words that you have written in a SQL statement that begin with a colon, and will then proceed to substitute in any of the named parameters you provide thereafter. So I'm saying, hey, library, give plugin a name of request.form name. So whatever came from the user's HTTP request. And for dorm, plug in whatever came in the dorm field as well. So those were from the HTML forms that the user submitted in order to register for this sport. After they've done that, let's go ahead and do this. Return, render, template, success. And this time I'm going to change the success message to actually be, yes, you really did indeed register, or the spirit thereof. So let me do this. Let me go into this Frosh IMs 2 directory. And as you might have done for recent problems, let me go ahead and run Flask run. And we have since tweaked the configuration of the IDE so that you no longer have to specify a host of 0.0.0.0 or port of 8080. Those will just be assumed by default. You can override them if need be. Now, I believe my application is running. So let me go over to my web server. And indeed, this is the form we saw last week. Super simple, super ugly, super HTML. Looks like HTML1, actually HTML5. But let me go ahead now and prepare a database for this. In another terminal window, I'm going to go ahead. And in my source 9 directory as well, open up FroshIMs2.db. Which if I open it up in phpLiteAdmin already looks like this, Frosh IMs 2. But there's no table in there yet. So I care only for now about names and dorms. But I've learned my lesson. I'm going to care about IDs too. I'm going to go ahead and create registrants with three fields, much like we conjectured earlier. An ID, a name, and a dorm, first of which shall be an integer, primary key, auto increment. Next of which will be text, dorm of which will be text. I don't want any of those to be null. They don't need to be auto incremented or primary keys. So let me go ahead and create. If I return now to my Browse Structure, the table is empty. Now let's go to the web-based application. Here we go. Let me go ahead and have a new name altogether. Maria wants to register. And Maria is from Stoughton. Register. You are registered. Really? So we've not invested heavily in the aesthetics of this website just yet, but really I have been registered. If I go back to phpLiteAdmin, click on Browse now, notice Maria from Stoughton is actually now in my database. Wait a minute, maybe I'm kind of cheating. Maybe she was already there somehow. Let's do this again. Let me go ahead and hit back. Let me go ahead here now and say Andy. And let's pretend Andy is visiting someone in Weld. And click Register. You're registered really. Let's go back to phpLiteAdmin. Click Browse. And now Andy is in there as well. So now we're not just storing things on disk so to speak with CSV files. Now we're actually interacting with the SQL database. And we did so by using CS50's execute method inside of its SQL database library. But notice how relatively simple it was. We simply used the same syntax with which we've been playing around, either the command line or phpLiteAdmin, but now doing it an actual Python code. So now we literally have the ability programmatically to create data, to update data, delete data, or select data, ultimately, from a database. So let's do exactly that. Let's take things up a notch and do even more than we did last time. You know what? Let me go ahead now and create another route for slash registrants. Suppose I want to make available a web page that shows me everyone who has in fact registered. So this is a feature we haven't even had yet, though we could have. At least with CSVs, we would have had to open up the CSV and iterate over it. Now we'll do that even more simply with the registrants method here that has rows gets db.execute, select star from registrants. So I know from our earlier experimentation, select star means get everything. From registrants means from that table. db.execute is just the CS50 method that's going to execute the SQL and return to you-- what? I didn't care about a return value earlier, although technically, I could have gotten back a value, as you'll see in the documentation. But selecting star from registrants. What do I want back? Well earlier, I proposed that a database table, like a spreadsheet, really is just a list of dictionaries. And indeed, that's exactly what the CS50 execute method gives you back. It will return to you if you've used a select a Python list each of whose elements is a Python dict object, which means you have access, which each of those rows to the field name or column name, the so-called key in a dictionary, and the value, the cell in that table. So now that I have these rows, what can I actually do with them? Well, I'm going to go ahead and render a new template, registrants.html, and I'm going to pass in as you might have for a past problem all of my registrants by passing in all of these rows. So it turns out, templates can be parametrized such that I don't just have to spit out some hard-coded registrants.html file. I can pass in a key of registrants or call whatever I want, and the value that I just got back from the database. So that I'm now handing to my templating language, Ginga all of these rows. So that suggests that in my template, my so-called view more generally in MVC, my view can iterate over those rows and spit out every one of my registrants. Let's go ahead and do this. Let me go ahead into registrants.html, which will be a new file. So let me create a new file here called registrants.html. And let me go ahead and make sure that as are others extends layout.html so that it looks just like everything else. And then let me go ahead and just give it a block title up here, so that its 2 is consistent. This will be called registrants. And now I do nblock. So again, this is the Ginga templating language. This is not a Python per se, not SQL per se. It's just really for rendering a viewer or the aesthetics of my site's. Block body. And now in here is going to go nblock. So the question is, how in this new template file do I spit out like a list of registered students? Well, list. I know from some HTML back from week six, I can give myself an unordered list. So just a bulleted list like that. And now I know I can spit out list items to put values next to those bulleted lists. But I don't know how many list items to output yet. But wait a minute, passed into this template was a key called registrants whose value is rows, the list I got back from my database. So you know what? It turns out that in Ginga, you can execute essentially Python code that looks like this for registrant and registrants. And then down here, let me do N 4. So slightly new syntax here. So no colon here we're doing 4, and the opposite of it I N4. I can now do list item. And next to that list item, you know what? I'm going to do it and we've seen this syntax before. Registrant.name from registrant.dorm. OK, this looks very strange so what's going on? First line just means we're inheriting from layout.html. So the whole page is going to just plug in values into that template, namely a title and a body. Title's uninteresting. It's just going to be registrants. Body gets interesting, but notice it's not hard-coded HTML anymore. We've got an open UL tag and a close UL. So this means hey browser, here comes the start of unordered, the end of an unordered list. But notice that my template or more generally the view in my software is now going to use a Python-like loop here. But again, no colon. You have these special tags with the curly braces and the percent signs. And four is the opposite and now notice what I want to do inside of this loop I want to output literally open bracket LI close bracket. And then eventually, open bracket slash LI closed bracket. And then dynamically, on each iteration of this loop, I want to output a registrant's name and a registrant's dorm and just grammatically say from in-between, so it's David from Matthews and some Zamyla from Currior and Rob from Thayer and so. And the only thing that's passed in is this thing here. Remember the named arguments, the named parameters that I passed into my render template method. It was registrants equals rows. This is really equivalent to iterating over the rows that came back from my database. In fact, if you want to be even more l I don't have to call registrants I could just say rows equals rows. So I'm literally passing in my database rows, and then in my template here, I could do for row in rows row.name, registrant, row.dorm. Just semantically, I thought it would be a little more intuitive if I actually say what these things are and not just generically refer to them as rows from my table. All l so let's see this now. Here that web form. Let me now go to slash registrants, enter. And oh my god, a bulleted list. This may be-- I want to prove that this is working. Let me go back here. Let me go ahead and register Stelios now, who is currently living in Canaday, register. OK, Let me go to slash registrants again. Reload. Stelios from Canaday. Very interesting. But this isn't where we need to stop. Let me add one other feature. Lets get that Stelios out of there. I suppose that it wasn't quite making it on the team here so you know what, we want to have one other route. Let's have an unregisterred function if Stelios wants to bow out after all and focus on something else. So apt out route slash unregister. You know what? This is going to support multiple methods. So by default, it's just get. But I want it to be not just get but get and post. And then in here I'm going to have def unregister as the name of my method. Again, it could be anything. But you should be consistent, I would say. And now I'm going to have two conditions. So this slash unregister page is visited via gets. You just go to this URL l. Then I just want to see a same list of registrants but with some kind of form by which I can delete them from my database, by which they or I can unregister them. Meanwhile, I want to go ahead and if post, I want to acually do the deletion. So let me try this. If request.method equals equals gets, then Rose get's d.b. Execute, select star from registrants. And then I'm going to go ahead and return the templates, render template, unregister.hteml. And pass in those registrants as those rows. Actually, c with. Lith requests that method equals equals post. What I want to do now, I want to do if request.form, quote unquote "ID." So if there is, in fact, an ID passed in-- more on that in a moment-- I want to go ahead and execute-- hmm, where is this going? Let me go ahead and see the template first. I think we need to see this. Let me go into froshims2. And I whipped this one up in advance. unregister.html-- let's go ahead and do this. So on registrants.html, I just spit out an unordered list of all the registrants. In unregister.html, I'm going to add a little bit more. Inside of each of those list items, I'm going to have a forms input field-- input element-- whose name is going to be ID. I've just hard-coded that, because this is going to represent the ID of the student I want to unregister. The type of this input is going to be radio. So these are mutually exclusive circles that you can essentially toggle. Value of this is going to be whatever this current registrant's ID is. And then I'm going to go ahead and put this registrant's name and dorm just as before. So we'll see what this looks like in a moment. But notice I've also added two other lines up here. I've specified I'm inside of a form, the URL for which is unregister. So we've seen this before. url_for() is just a function that comes with Flask that helps you dynamically figure out what the actual URL should be for the method called unregister(). So that's why, if you adopt some nice naming conventions, you can use tricks like this, and Flask will just figure out to what URL this form should be submitted. And the method I'm going to use is going to be POST. And then down here, notice I have a Submit button. Type equals submit. Value equals unregister. So what does this look like? Well, let's see that. If I go ahead and visit /unregister, I see an ordered list, just like before. And if I played with CSS, I could even get rid of these bullets altogether, each of which has a radio button next to it, one of which I can click in order to delete this user from my database. But how does this work? Let me go ahead in Chrome now, and inspect, and look under elements at one such row. And notice the HTML that Python, and in turn, my Jinja template has dynamically output it for me. Here's my page's HTML. If I expand this list item, I see input name equals ID, and I hard-coded that. Type equals radio, I hard-coded that. Value equals-- this was registrant.id. So in this Jinja for loop where I'm spitting out one registrant at a time, this is where I was dynamically spinning out Maria, and Andi, and now Stelios' ID. So if I expand, the LIs above will see that, ahh, Andi has a value of 2, because her ID is 2. And Maria has an ID of 1, because her ID was 1. Meanwhile, Stelios from Canada is just text. So if I select him and then click this button, we need to be able to handle that. So what do I want to do? Let me go back into CS50 IDE, into application.py. And here we go. If the form was submitted via POST, as by clicking that Submit button, and if there is indeed an ID in the form-- so the user actually did select one of those radio buttons, and therefore, there's some actual work to do-- DELETE FROM registrants WHERE ID equals colon ID, where, again, colon ID is just my placeholder. The value I want to plug in for ID is request.form quote unquote "ID." Where did that come from? Again, if we go back to my HTML, notice that these radio buttons are all called "ID." And because they are mutually exclusive by definition of a radio button, only one ID will be submitted if one of these boxes is checked, one of these circles is checked. And that value submitted will be 1, 2, or 3. So to confirm, here's our page with unregister, and all three students are still in there. Here is /registrants. All three students are there. Although this is uneditable, let's go ahead and unregister Stelios by clicking on Register. And it would seem that he's indeed gone from the bulleted list. Let's go ahead and check phpLiteAdmin. Let's go ahead and click Browse. And Stelios is now gone. So this is where everything's finally starting to come together, right? In week 6, we talked about HTTP, and parameters, and how HTML and CSS worked. But it was largely static and hard-coded, and we were just playing around with fake Google and implementing our own front end. But now I'm using forms again, and using them not only to create an interactive UI-- user interface-- for users, I'm also now implementing the back end, the server, the routes that are capable of getting those HTTP parameters' values as with request.form-- or other mechanisms if they come in via GET, a different syntax altogether. I can get those values and then do something with them by combining those values with SQL code. And so the last line here that I actually should have included for good measure-- I simply ran the code that I had pre-written in advance so that I didn't mess up. Let me go ahead and do this. This is the one line that was technically there when I just ran that code even though you didn't see it until just now. After all this, I have decided, just to keep the UI pretty simple, after trying to delete someone, just go ahead and redirect, not to the unregister page again, but to the registrants. And this is why, after deleting Stelios, I immediately saw a new bulleted list with just Marie and Andi, because I redirected the user to the route for registrants, a.k.a. /registrants. So again, this is where everything's coming together. And it's a lot to absorb all at once. Because, my god, we had HTML and CSS. Then we introduced Python. Now we introduced SQL. Then, we have Jinja and the templating language. And now all of this comes together. But again, if you go back in diving into all of this, first principles and the definitions of each of these, HTML, it's just the markup language that lets us lay out and format a web page. HTTP is just the language-- or the protocol, technically-- via which web browsers and servers intercommunicate. Python, of course, is a higher-level language. It's an alternative to C. And it seems to come with a whole bunch of useful functionality that, thanks to frameworks, or micro-frameworks like Flask, make it relatively easy to get real work done with relatively few lines of code. There's a learning curve, to be sure. But you know, this is kind of impressive, that with just a dozen or two lines of code, I've implemented the beginnings of a web-based application by which students can register for sports, unregister for sports, see who's registered for sports. You know, I might just need to add some logins, and a few other features, and definitely, some prettier aesthetics. But that's a lot of functionality packed into just a few lines. And now that we have SQL today and we have a function like db.execute() that allows me to execute SQL inside of my Python code, now we have the ability to store data long-term, to access it, search it. And we're just using small data sets. I could store thousands, tens of thousands of rows and use these same principles, especially for data science applications, analytics, analyzing corpuses of text. So many possible applications now. And you know what, if we will go just one level deeper, it turns out that while having programming chops with SQL, and knowing SELECT, and INSERT, and DELETE, and JOIN, and CREATE, and all of the various keywords we've started to play with today and scratched the surface of-- super useful and super powerful when you want to analyze your own data, or your own company's data, or your own thesis' data, or the like-- it turns out that eventually, you even outgrow that level of interest typically. And an additional layer of abstraction is often helpful. And so another feature you get with frameworks like Flask is what are called models-- literally, models. So you recall that we've been talking about, in general, MVC, whereby we have models, and views, and controllers. Well, in this model, we have been interacting with our data via low-level SQL. It's new, for sure, today. But it turns out that once you get comfortable with SQL, and so long as you adopt certain conventions of giving all of your tables an ID field-- and if you have foreign keys, it's something, underscore, ID-- where you generally adhere to certain conventions and adhere to a framework's requirements, it turns out you can do things like this. In froshims3, we have essentially the same files, except that we've changed the application.py to be a little bit different. We're not using any of CS50's package or module now, so we're not using db.execute(). And in fact, you don't technically need to use that to access SQL. Our single-function execute just makes it much easier to get back lists of dictionaries as opposed to executing multiple lines of code as you could do with a library called SQLAlchemy or Postgres' own-- or rather, SQLite's own-- driver in the world of Python. But I'm going to add a few lines here using another library called Flask-SQLAlchemy, which was pre-installed, or will be pre-installed, for you in CS50 IDE. There's a few lines that I had to copy and paste earlier, from the documentation, to get it to work right. But notice, this is, perhaps, the familiar line. Instead of froshims2, it's now forshims3.db. But notice, at the end, I get another db object. It's not CS50's. It now belongs to the author-- or it was created by the author-- of this library. But it turns out you can do some pretty cool things as follows. Recall, from last week, that Python supports classes much like C supports structures. And inside of classes can go properties, or pieces of data, as well as methods, or functions. So it turns out that we can define, using Flask and using libraries or frameworks like it, what's called an Object Relational Mapper, or ORM. And this is just a fancy way of saying, if you don't want to think about your data as rows and columns, which we have been all of today, ultimately, whether in spreadsheet form or database form, you'd really like to think of a registrant for a freshman intramural sport as an entity, as an object of some class, well, you can do that. You can declare a class called registrant and have it extend the db.Model. So this is another class that comes with Flask that we are now inheriting from, so to speak. So this is truly now object oriented programming. We are specifying, via __tablename, that the SQL table to which this class should map is going to be called registrants. And ultimately, this class, registrants, when it is instantiated is going to give me an object that represents a row in that table. And this object is going to have an ID, a name, and a dorm, as we've been doing. And notice here, using SQLAlchemy-- so using this library from Python-- I am declaring a column called ID that's going to be of type integer. And yes, it's true that it's the primary key, whereas name and dorm are just going to be text. Because what you can do with SQLAlchemy, and with ORMs, more generally, is you can specify, in Python code, or whatever language, what your database looks like and what your data therein looks like without actually writing raw SQL queries so to speak. So even though we've just introduced SQL syntax, you can eventually take off that layer altogether and build on top of it using objects like this. And now, as an aside, this is a constructor or an initialization method that you might recall from past problems. But let's focus, a little later in the code, on why this is actually compelling. If we scroll down later in the code, like, to my register route, there is no SQL in this implementation of froshims3. 3 The first few lines are the same. Indeed, only once we get down to this line here do we have registrant as a variable, registrant as the class name. And we're passing in, apparently, the name and the dorm that came from the HTTP request. And we're passing those into the registrant class. If you recall how classes worked, if they have an init() method, you can pass in some default values, name and dorm in this case. And that's how we are creating a registrant object of type registrant. And now notice what we can do here, db.sessios.add(). So this adds to my database sessions, so to speak, another feature you get from this particular library. We're going to add that registrant. And then wonderfully, we're going to commit that registrant. In other words, we have created a variable in memory-- specifically, called registrant-- who's type, who's data type, is Registrant, capital R. And that's simply a class, inside of which is name and dorm. There's no ID yet. But what's really cool about this ORM is that when you call add and then commit, that's like putting it in a database and then hitting Save. And the database, because of the code we find up here, is going to automatically insert that ID for us. And meanwhile, this object now, registrant, is actually going to have, inside of it, the ID that was stored in the database. So I don't have to worry about insertions. I don't have to worry about any updates or deletes. I can interact with my data now, completely at a higher level, in Python alone and leave it to the ORM-- SQLAlchemy here-- to actually do the creation of the SQL statements. And if you look later in here, if you'd like to play around, you'll see that we've rewritten registrants and unregister as well to use SQLAlchemy as opposed to raw SQL queries. And for instance, here is how you can get all the registrants in your database if using an ORM. Instead of doing SELECT* FROM registrants, you can just say, hey registrant class, give me a query for all of my data. And what you get back is a whole bunch of rows, which, as before, we can pass into our template. Down here, meanwhile, we can request all of those rows again and pass them into that template. Or here, notice what we can do. If we've been passed the ID of a registrant like Stelios', we can say, hey registrant class, give me a query, but filter that query so that the registrant ID I care about equals the one I was passed via HTTP. Oh, and by the way, once you find that in my database, call the dot delete method to just get rid of Stelios from the database. So again, we might not necessarily be solving a problem or scratching an itch just yet, especially since SQL itself is, odds are, quite new to you. And so we've already solved the problem in one way. Here's another way to solve it. But realize that, eventually, it's fair to say that you find writing SQL queries sometimes tedious. Though frankly, you'll get a lot more control, and potentially, more performance out of them if writing them yourself. And so having the best of both worlds is perhaps the best takeaway here-- actually understanding what's going on underneath the hood, as was the entire point of our spending so much time in C, and understanding how you can execute SQL queries, but realizing, down the road, especially if you find that, wow, it's really getting a little slow to write all these low level SQL queries, wouldn't it be nice to just create my database schema as I did earlier with phpLiteAdmin or at the command line and then let my library code figure out how to get data in and out for me, albeit perhaps at a performance penalty, that's a nice place to get to. So again, even now that we're in week 9, and we've abstracted so far away from week 0s and 1s, is there still this progression and this onward march of abstraction as the world gets more and more familiar with solving problems and starts to realize best designs for doing so? But it's not all fine, and good, and safe. In fact, let's make note of perhaps one of the most tragically common mistakes people make when using SQL. And indeed, one of the reasons to use things like libraries like CS50's library or even higher-level, fancier libraries like SQLAlchemy, is to avoid these kinds of threats. And yet many people and many sites still suffer from what are called SQL injection attacks, for instance. So what does this mean? Well, probably, a few times a week, you log in with your Yale NetID or with your HarvardKey, which gives you forms like this or like this. And ultimately, you're providing. simply, a username and a password. But suppose that, for the sake of discussion, the HarvardKey system were implemented on the back end with Python using SQL. And how, then, do we implement logins? Well, when I give Harvard or Yale my login name or my NetID and then my password, well, what are they doing? They probably have, each of them, a users table if they're using SQL, whether it's a SQLite, or Oracle, or MySQL, or Postgres, or whatever. And they probably have written code somewhere in that login site that says SELECT* FROM users WHERE username equals whatever they typed in AND password equals whatever they typed in. And if that gives you back a row representing David, for instance, then you know that he or she has logged in correctly, because you wouldn't have found the row if the username and password weren't in the database. Now, it turns out fancier things are done with the password. You probably don't want to store users' passwords in plain text, so to speak, borrowing language from week 2. Rather, you want to store ciphertext or some kind of hashed value so that even if, in the worst case, your database is compromised or stolen, no one in the real world actually sees your users' passwords but only some cryptic-looking hashes thereof, which at least raises the bar to exploiting your account. But if they're using SQL-- this is what worries me-- they are, at some point, taking what I, a human, typed in-- hopefully a good guy, but could be a bad guy, typed in-- to their website and plugging it into a SQL query. Because they're not just going to necessarily do SELECT* FROM users. They might actually say, SELECT* FROM users WHERE username equals such and such AND password equals such and such, or at least one of those predicates. So what if you've done things poorly in code? And suppose that Harvard had implemented it in such a way that this simple-looking, stupid-looking query is actually a really big threat? So I've temporarily turned off the bullets that you would normally see in a password form. This is not hard. It's just an HTML thing. And suppose that my email addresses is me@examplemailprovider.com. And suppose, for my password, I don't type in 12345, or whatever my actual password is. I type in, cryptically, ' OR ''1='1. Why this? And there's an infinite number of things I could type if I am aggressively trying to hack into Harvard or Yale's website. But notice this feels like it's part of a logical query. It turns out SQL has OR. We haven't seen that before, but like Python, it literally has the keyword "or." And I am assuming, in this case, per some of the examples, that maybe the programmer at Harvard or Yale has single quotes in his or her code. And maybe they are just foolishly, and very riskily, plugging in what I type in between those quotes. So notice, this is kind of the end of a quote. This is the beginning of a quote, but I've not finished the thought there. So let's see what happens. Suppose that the code on the back end at Harvard or Yale-- we're in Python-- this. So somewhere in their files, username variable gets request.form username. Password gets request.form password. So just two variables called username and password, just so I have them handy. Maybe they're using CS50's library function. db.execute( SELECT* FROM users-- and this happens to wrap on two lines, but ignore the extra space-- WHERE username equals {} AND password equals {}). So just like we did last week when printing things in a formatted fashion, this was like Python's equivalent of print diff using the format method of the string class passing in username and password. But this worries me. Because if the programmer at Harvard or Yale has literally given me these placeholders of '{}', whatever I typed in as my username and my password is literally going to go there and there. But what if-- oop, oop, oop, oop, typo. The whole story breaks without this. And there-- but what if the user types in that cryptic-looking string? Well, then, what ends up happening is this. And it's red because red is bad. SELECT* star FROM users WHERE username name equals quote, unquote, me@examplemailprovider.com-- no big deal-- AND password equals-- this is interesting. Previously, there was a single quote and a single quote, and then the curly braces in between. But underlined here is what I, the adversary, typed in. Cryptically, 'OR'1'=1', that's it. Notice where the underlining starts and ends. But I seem to have maliciously finished that programmer's thought at Harvard or Yale. I've not finished it in the way they intended. But this is syntactically and semantically correct, if a bit strange. I am essentially saying, if the password equals nothing or 1 equals 1, because I've plugged in those characters in such a way that they still make a WHERE clause syntactically accurate. 1 equals 1 always. So this is like saying, SELECT* FROM users WHERE username equals me@examplemailprovider and whatever else is the case. Like, that is always true, that 1 equals 1. So it doesn't even matter what the user's password is or that I didn't even type it in. I just did quote, unquote. All right, well, what if maybe-- OK, so format may be bad. Curly braces, like we've been doing in Python for string formatting, maybe that's bad. Python also has, like Java and JavaScript, the ability to concatenate things together, in this case, using the plus operator. So what if I just go old school and concatenate my strings together like this? Maybe that's better, noticing single quote here, single quote here. And the double quotes are just stopping the string while we do this concatenation there and there. It's the exact same problem. So those instincts don't serve us well at all. In the end, we still get, username equals whatever I typed in AND password equals this WHERE 1, indeed, again, equals 1. So no matter what, this query is going to return a row if that email address is in there, probably resulting in the Harvard or Yale code logging this malicious user in. So there's got to be a better way. And indeed, the reason to use libraries in general, whether it's CS50's for the next couple of weeks, or in the real world, any of dozens of SQL libraries, is that other people before you have thought through these threats, have written the requisite code, which isn't all that much, to notice when there's dangerous queries being injected and escape them properly. So here's how we would do this with the CS50 db.execute() method. SELECT* FROM users WHERE username equals colon username AND password equals colon password-- sorry for the type there. I'll fix that. Bad with spacing today-- AND password equals colon password, unquote, passing in username, passing in password. And again, you don't repeat the colons here. But the key and the key in those named parameters lines up with the key and the key that do have the colons. And the way the CS50 library works-- and there's not much going on in the CS50 library. There's not much of a training wheel there-- we essentially are simply wrapping that other library I referred to, SQLAlchemy, which gives you not only the ORM feature, the Object Relational Mapper feature, where you can create your own classes like registrant, they also let you, in that library, execute raw SQL. And they also take care of all of this escaping. So we, the CS50 library, are really just taking your queries, passing them to the SQLAlchemy library via raw SQL, getting back the results, and just neatening them up, and returning to you only a list with dictionaries inside without expecting you to execute multiple lines of code. And now, in green, is the expected solution here. Because of the way the CS50 library works, and in turn, the SQLAlchemy library works, even if an adversary types in funky syntax like those single quotes trying to trick your database or your Python code into executing something malicious, notice what the library has done. These backslashes were not in the user's adversarial input. They were not in the previous red problematic examples. What the CS50 execute() method does for you, what the SQLAlchemy library does for you is looks at user input that's been plugged in for named parameters and says, whoa-ho, wait a minute. If there's a single quote in there, let me escape it with a backslash so that the only actual single quotes are the outermost ones. And only if the user's password is that thing there that's underlined will they actually get in. And most likely, it's not going to be something as crazy as that. But in light of all this, do you perhaps now appreciate what this particular person, perhaps with a little too much free time, was trying to do? So parked in a parking lot here is this fellow's plate here. And this person had taken the time to print out something a little curious. Let's enhance. What the heck is going on there? Well, it turns out that while we've been focusing on the logic of 1 equaling 1, the real takeaway of SQL injection attacks is that if you can somehow trick a database into executing a line of code that you have written-- previously, the only line was, quote, unquote, 1 equals 1 or what not. But suppose that that adversary-- suppose I had included a semi-colon in my username or a semi-colon in my password. And you, the programmer, naively trusted what I was typing in. And you simply executed whatever I typed into my username or my password field, allowing me to have a semi-colon in there. And that semi-colon, as you know, ends one SQL statement and begins a new one. Suppose, god forbid, like this person here tried to finish his or her license plate in a SQL-like way, with a quote here, and commas, whatever those mean, and a semi-colon, but then also included a valid SQL command like DROP DATABASE TABLE, which is the only thing I cautioned about, earlier, being especially bad-- this person was apparently trying to use those traffic cameras, which, he or she surmised, might have been using SQL as the back end and storing people's license plates in that back end and, correctly or incorrectly, was hoping that, upon a camera seeing this, using Optical Character Recognition, OCR, converting this into text, passing that text into a database that might not be scrubbing or sanitizing its inputs as we've proposed with CS50's library or SQLAlchemy-- was hoping-- that maybe that back end database was poorly implemented enough to trust what was passed in so that after logging this person for speeding or whatever, actually dropped the entire database, covering his or her tracks entirely. So all this and more is ahead of us as we continue to build, and build, and build on top of lessons past. And next week, when we introduce one final language, JavaScript, a language that you can not only use on the server side but also, and especially, on the client side to create all the more of an interactive experience and all the more of a compelling user experience for users using a bit of Python, and SQL, and HTML, and CSS, and soon, now, JavaScript, we'll see you then. [MUSIC PLAYING] SPEAKER 1: Rosebud-- yeah, actually, now that you mention it, one time, I went into his office to look for some forms. [KNOCKING ON DOOR] SPEAKER 2: Bud? David always said "pal."
B1 中級 美國腔 2016年CS50--第9周--SQL (CS50 2016 - Week 9 - SQL) 15 6 小克 發佈於 2021 年 01 月 14 日 更多分享 分享 收藏 回報 影片單字