字幕列表 影片播放 列印英文字幕 DAVID MALAN: Hi, everyone. This is CS50's own David Malan and-- COLTON OGDEN: Colton Ogden. DAVID MALAN: So this is a new approach for us. We, of course, have lectures once a week here in Cambridge. And of course, we've done a number of live streams over the past few years. But we thought we'd try something new with our Twitch channel here. Colton is a big fan of Twitch. He's quite the gamer himself, and a programmer of games. And we thought we'd use this as a more interactive time to actually chat with some folks in the Twitch chat room, and actually walk through some code in real time-- live coding-- to get into more depth on some of the things that we might otherwise talk about in class, and then, ultimately, take people's questions and steer the conversation in any direction folks out there would like to go. COLTON OGDEN: Yeah. It's a very flexible format. Everybody who's in, feel free, definitely, to chime today and chat. If you aren't following the CS50 TV account, I believe there's a 10 minute waiting period. So definitely follow us so you can talk to us. But I see we do have one person, Srini Vasank says hello. So hello, Srini Vasank. Nice to see you here. DAVID MALAN: Hello. Nice to meet you as well. COLTON OGDEN: So yeah. I guess we can take a little transition here to your laptop where we have your code there. DAVID MALAN: Magic. Yep, so we're actually sitting here in Cambridge, Massachusetts on Harvard's campus with a magical green screen behind us. And so we're superimposing ourselves, the two humans, which is why you see this nice glow on both of us today. COLTON OGDEN: I wish I had a shot of just the green screen. All I have is this. DAVID MALAN: That's green, but you see it as gray, perhaps. And this allows us to superimpose ourselves over the code, like you've seen in some other channels, so that we can actually talk in greater proximity. COLTON OGDEN: We do it for lecture for the code it itself. DAVID MALAN: We do. We do this in Sanders Theater. But the green screen's way behind. And we use some camera magic to actually capture it across the room as well. COLTON OGDEN: Yeah. DAVID MALAN: All right, anymore hellos to tend to? COLTON OGDEN: Not just yet, no. Just Srini Vasank. We have 10 viewers currently active. DAVID MALAN: All right, wonderful. Well, so glad to have you here in class. It's like a CS50 section here on campus. What we thought we'd do today is-- our first trial of this is-- take a closer look at one of the topics we look at in CS50. Those of you have been following along for some time and are pretty much through half of the course or so might have implemented your own spell checker at some points. And for those unfamiliar, roughly halfway during CS50's semester here on campus and online, we challenged students to build their own spell checker, which is a piece of software that looks over all the words you've typed out in a file, and tells you, often with read underlines, which words you've misspelled, that aren't in a dictionary. And that's where the CS comes in. That dictionary is just a really big list of words, hopefully alphabetically sorted. And it's a non-trivial problem to actually look up every one of the words in your file against every word in the dictionary. If you recall from our discussion of asymptotic notation, that's like an n squared problem, if n is the number of words in each, just to simplify a bit. And that can actually take quite a bit of CPU cycles and memory. And so one of the challenges in CS50 is, do that efficiently. Minimize your use of memory. Minimize your use of CPU time, so to speak. And just write the fastest spell checker possible. Now, what language, of course, do we tend to write this in though? COLTON OGDEN: Well we start it in C, generally, for the course. Yeah. DAVID MALAN: And that's great. Because C is super low level. It's super fast. And really takes advantage of the hardware. But what are some of the prices you'd say you pay by writing stuff in C? COLTON OGDEN: Programmer time would be number one. Obviously looking for memory leaks, and seg faults, and all those sorts of things that trip up a lot of programmers. DAVID MALAN: Yeah, those of you programmed in C, maybe C++, have dealt with pointers, memory addresses, segmentation faults, buffer overflow exploits, and the like. All of that comes into play. And so why don't we start there? Why don't we actually take a look at what it is you have had to do, or if you're following along with CS50 for the first time, might have to do if you tackle this particular problem down the road. And then let's actually do it in a much more friendly environment of Python. But consider what prices we actually pay for doing that. So we have up here behind us dictionary.h. This is the so-called header file that we give to CS50 students. And in this header file, we define four functions. Do you want to summarize what functions they have to implement? COLTON OGDEN: Actually, offhand, I don't remember. I have to look at the source code. DAVID MALAN: Oh, that's cool. COLTON OGDEN: Sorry. DAVID MALAN: We have it right here if you'd like. OK. Well, we clearly prepped here. COLTON OGDEN: It looks like check. DAVID MALAN: Yeah, check is one of them, isn't it? COLTON OGDEN: Load size and unload. DAVID MALAN: Yeah, now fortunately, the file's completely commented. So we can follow along here. So these four functions characterize, if you will, the API for spellchecker. API is just application programming interface. And while this might sometimes mean some web based service or third party you're using, it can also refer to local software that you yourself are writing, or that someone else has started writing and you now need to finish. And indeed, the challenge for this problem is to flesh out that API and write the actual implementation thereof. All right, so in order of operation, not alphabetically, load is the first function that students in this class have to write. This is a function that takes as input in C-- it looks like "const char* dictionary". And how should folks think about what that means? COLTON OGDEN: Well, const meaning that it can't be mutated. So whatever data that they pass into this function should be not tampered with by the function. A char* is just a sequence of chars or a pointer to a char, which can be any number of chars after that. But it's a string, effectively. DAVID MALAN: Yep. I agree. COLTON OGDEN: Text data that's going to become their dictionary. DAVID MALAN: And then the name of this parameter, of course, is just dictionary. And so the idea is that this is the name or the path to the file containing all of those words that I mentioned earlier that represent the actual dictionary. And load's purpose in life, per the comment atop it, is to load dictionary into memory. And it's supposed to return a bool-- true if successful, else false. For those unfamiliar, we are using standard bool.h, which is a header file that you can include in C. That actually gives you access to two definitions of true and false as 1 and 0, respectively, effectively. So after load is called, once implemented, then it's the function called "check" that's supposed to get it called again, and again, and again for every word in the file you're actually spell checking. So it's that one that's going to be especially performant. Super fast, super minimal memory usage, hopefully, if you're really trying to optimize those two things. What does it appear that this takes as input to, if you want to tackle that one? COLTON OGDEN: The check function. So it looks like the same thing. So char*, meaning a pointer to a char. So a variable length string, or a sequence of chars. And then words. So it's basically going to be any size word. And we're going to, basically, check for the string header representing the signature, whether the word is actually in the dictionary. So you check our dictionary for this word and see whether it's actually inside that dictionary, that data structure. DAVID MALAN: Yeah. And so strictly speaking, you don't have to use const in either of these definitions. This is just a nice mechanism for really protecting yourself from yourself, lest you accidentally try changing word or try changing dictionary. The fact that you defensively-- or whoever wrote this file-- defensively put const there means, you just won't accidentally screw up, change the value, and introduce some bug that could be very easily avoidable by making it read only, as const effectively does. So lastly, there's two other functions in this file. Per the comments, those are size and unload. Unload is meant to do the opposite of load. So whatever work we end up doing in load is supposed to be undone, freed up in unload. And then size is supposed to just return the number of words in the dictionary. Now, maybe that's a linear operation. You just iterate over whatever your data structure is, counting up every one of the words and return that value. Or if you're smart about it, you could probably just keep like an int or a long around, and just, as you're loading words, keep track of how many words there are. And then just spit out in constant time that same value. So in C, this is a pain in the neck to actually implement this thing. You can implement it as a big array, a linked list to give you some dynamism and growth. You can implement it as a hash table, which is often implemented as an array with multiple linked lists. Or you can implement it using a try, an even fancier data structure, that just consumes lots of memory, but theoretically, is constant time in its operation, at least asymptotically. So we're not going to do that though in C. Indeed, that's one of the challenges in CS50 itself. And one of the great moments in a course like this is when you actually change context, and you switch from C-- a very low level language-- to something like Python, which is higher level, so to speak, that has lots more abstractions, lots more features, lots more capabilities built into the language itself. You can whittle down a 10, 20, 30, 40 hour project into truly just minutes if not seconds, once you're actually super learned in the language. Shall we dive into some actual live code? COLTON OGDEN: Sure. Before we do, Srini Vasank has a question for you. DAVID MALAN: Yeah. So pass by value can be used here instead of const. Short answer, no. You are already technically passing in the dictionary by its value. But the catch is that the value you're passing in is its address. And in so far as passing in an address allows you to dereference that address and go to that address in memory, means that it's potentially vulnerable to being changed. And so we're declaring const here for the very reason that we're technically passing by reference here, or by value. But that value is an address. And so this is why we have this defense mechanism in place. If we were instead passing in a specific char, or an int, or a float, or any other primitive type that doesn't involve memory addresses, then yes, pass by value avoids this. Because the worst you can do is change a copy of the argument. But in this case, by definition, we are passing in a string, which is indeed a char*, or address of a character, or a sequence of character. So we can't-- it's not as easily said as passing by value. COLTON OGDEN: Correct. DAVID MALAN: The values here are what are [INAUDIBLE].. COLTON OGDEN: I guess the equivalent would be maybe duplicating the data structure in the function. But then that would be like horribly inefficient for something massive, like a big dictionary file. DAVID MALAN: Yeah, you can totally copy the whole dictionary. But even then, you'd have this window of a few lines of code where you could still screw up and accidentally change things just by nature of making a mistake. COLTON OGDEN: Yeah, that's true. DAVID MALAN: Really good question. So keep the questions coming if you have any as we move ahead. But let me go ahead and open up a file. Let's call it dictionary.py, and actually implement this API, but using Python, and therefore Python syntax and the equivalent. And then if we have time, we can take a look at a new speller.py, which is a porting or translation of speller.c, which we haven't looked at yet, but CS50 students do in the class. That actually implements the spell checker itself. But the real intellectual work and the data structure stuff really takes place in dictionary.c, or today, dictionary.py. So let me go ahead and proactively save this as dictionary.py so we get our syntax highlighting. And let me propose that we just have place holders initially for those four functions. So again, we can't use dictionary.py with speller.c. And so this is just a porting of the whole project to another language. So in Python, you declare functions using slightly different syntax. In Python, you don't specify a return type. And you don't specify the types of your arguments to functions. But you do specify their names. And recall that one of those names was "check". It does take an argument, in this case, which we'll call "word", just as we did in C. But you have to define the function, which in Python uses the def keyword. And then of course, instead of currently braces in Python, if you're familiar, you instead indent-- instead of using those curly braces-- often with colons, implying that here comes some indentation. And I'm not ready to do this yet. So I'm just going to say "pass", literally passing on implementing the function. But we'll come back to that. But that will get the IDE here. We're using CS50 IDE, or cloud nine, to just stop yelling at us with some red marks. And quite welcome for the pass by value question there. So let's go ahead and do two others. Let's go ahead and define load. And load takes in a dictionary. Here, too, I'm just going to pass for now on actually implementing it. I'm going to go ahead and implement size, which takes no arguments. So you just specify open paren and closed paren. You don't need to say void as you do in C. Let's pass on implementing that for now. And then unload, which frankly, we're probably not going to need at all in Python, because memory is managed for you. You might have to allocate it, albeit implicitly. But you're not going to have to worry about managing it like in C. So there's the API ported to Python. And again, you'll notice no return types and no parameter types. Python's loosely typed. It still has type strings, and ints, and floats, and other values. But it doesn't actually require that the programmer, like us, actually specify those things. All right, so what do you think we should do to think about this? We have to implement a dictionary for a spellchecker. A dictionary, by definition, is a whole bunch of words that are passed in in a big text file. How would you go about thinking about how to begin solving this in Python? COLTON OGDEN: I think the first thing I would want to do is understand the dictionary file so that we can parse it and we can load it into the dictionary. DAVID MALAN: OK. All right, well so if we want to parse the dictionary file, it's nice that we're actually in Python because it's even easier in Python than in C, where you have to really think about how long your lines are and how many bytes of memory you might need. So let me go ahead and go up there. Let's implement load first, or at least the beginning of it. And if I want to open a file in Python, it's actually as simple as saying Open the file name-- dictionary being the file name-- and then you can specify, do you want to read the file, or maybe read and write the file? And so just as in C, with the F open function, if you recall, you can say quote unquote, "read", which just says, hey Python, open the file called "dictionary", or whatever that value actually is, and then read it for me. But this is going to return, essentially, a file handle to the file, so to speak-- kind of like a reference there, too. So I want to keep that around. And I'm just going to declare a variable on the left called "file". But I could call it anything I want-- F or whatever-- and assign that the return value of "open". Notice, I don't have to specify the type of file. It's going to return-- it's going to store some kind of file object. But I don't need to worry about that. And now in C, you would probably proceed to iterate using F read, or F scan F, F get as, or F get C, or any number of other file I/O reading functions. But in Python, there's these higher level abstractions, where if you know that text file just contains word after word after word, you can just say as much in Python by literally saying, you know what, for each line in the file, go ahead and do this. And so just as I kind of rattled that off verbally, so can you type it in Python. And it's a lot more English like, if you will, in that sense. Now, I don't really know what we want to do with this word yet. So let's just say for now, store word in some data structure. Because that's kind of an interesting design question to come back to. But once we're done doing that, we're just going to go ahead and call close on the file. Or rather-- I'm using C syntax already-- we're going to call file close. And here's where Python is also object oriented. The file reference that came back itself has functions or methods built into it. And if you look at the documentation, one of those is close. So file.close actually closes that specific file. And then after that, let's assume for the sake of discussion that everything went well. So I'm just going to go ahead and return true, here in this case. So returning true just signifies everything is good. I do seem to have an error here, but that's only because I haven't actually done something in this for loop. So let me also just say pass on implementing that for now. And that error should go away as well. COLTON OGDEN: There's an even cleaner way to do what you've done with the file, right? DAVID MALAN: Oh yeah? COLTON OGDEN: Using what's called a context manager. DAVID MALAN: All right, how do we do that? COLTON OGDEN: With the "with" keyword. So if you have the-- instead of needing to open a file handler with file-- well, you still need a file handler. But instead of having, basically, the dock close, you can enclose all of these operations that require this file within its own indented block using the "with" keyword. And this is essentially just what's called a contact manager. It manages the context of this file and the operations on it. So basically, it saves you a line of code. DAVID MALAN: Yeah. So great point to make. This is more Pythonic, if you will. Here on line 7 what I've done per Colton's suggestion is actually say, with the opening of this file in read only mode, call the return value "file". And then notice how I've indented by tabbing everything below that-- line 8, 9, and 10 at the moment, inside of that indentation-- thereby implying that this file variable is useful within every indented line underneath. And as Colton remarks, this notion of context is important. Because what's going to happen ultimately is that Python is going to close this file for me by the time we hit line 11. So I just don't need to think about it. So this is a very common approach just to clean up your thinking, focus on the work you actually care about, which is opening the file and getting at it. It's less interesting, intellectually, to close the file and, indeed, much better if the language itself deals with that overhead for you. COLTON OGDEN: Plenty of people forget to close their files in C as well. DAVID MALAN: Indeed. In fact, one of most common sources of memory leaks, where you allocate RAM and forget to give it back is, indeed, because you forgot to close some data structure, like a file. COLTON OGDEN: We have a few people in the chat. We have Trebinator who says, greetings from Germany. DAVID MALAN: Greetings from America. COLTON OGDEN: Amit [INAUDIBLE],, says, greetings from India. DAVID MALAN: Nice to meet you, Amit, as well from Cambridge. COLTON OGDEN: And then Srini Vasank has another that says, comment in line 9 is stores w. Comment in line 9. Store word in some data structure. DAVID MALAN: Oh yeah. So entire line, instead of one-- yes, it is. Well, it depends what I do with this. So I can save myself here. I haven's actually written this line of code, so I can claim it's going to do whatever I want. But yes, if we do no other action here in the commented line on line 9, yeah, we'd be storing the whole line. But if we assume, as we're allowed to in CS50 C version of the PSet, that the dictionary contains words, one word per line. Then, with a bit of fanciness, can I actually extract from that line the one, and only one, word that's on it, get rid of maybe any leading or trailing whitespace or messiness, and just load that word. And indeed, that's the to do on line 9 that remains. COLTON OGDEN: For context, do you happen to have the dictionary file available for us to look at? DAVID MALAN: Yeah, absolutely. If you want to take a look at the dictionary, let's go into another folder. And in that folder, we have a dictionaries folder. And I'm going to go ahead and open up a large dictionary here, which has got 140,000 words. Even the IDE here was struggling for a moment to open it. And here we go. So here are the lines in the file. "A" is the shortest English word I can think of that starts with A. Apparently "aaa" and "aaas" have meaning too. Aardvark-- COLTON OGDEN: So we've gone ahead-- we've lower cased all the words, in this case, in advance. DAVID MALAN: We have. And so that's kind of an assumption we're making. If we didn't want to make that assumption, that's fine. We can deal with it in code. But yes, that would have been one of the setups in the C version of the problem, which is that, yeah, it's all lowercase. COLTON OGDEN: We got [INAUDIBLE],, greetings from Italy. Hello, [INAUDIBLE]. DAVID MALAN: Buongiorno, [INAUDIBLE]. COLTON OGDEN: We have a SB Venner, greetings from London. Nice. DAVID MALAN: Hello. COLTON OGDEN: Good to see you. DAVID MALAN: Nice from the Cambridge here. COLTON OGDEN: Srini Vasank. Sorry if I am jumping. No, that's totally fine. I think it was a good thing to catch in the comment. DAVID MALAN: Yeah. No, that's fine. Call me on anything that we're missing here. So if you want to see all 140,000 words, we can scroll here for quite a while. You can see there's a lot of A words in English. And this is why it's actually important at the end of the day to think about how much memory you're using, how many CPU cycles you're using. Because at the end of the day, this is going to add up. And indeed, I got bored with scrolling. We haven't even gotten through all of the A words. So there's a lot of words there. COLTON OGDEN: And that's why we have a small and a large, I'm guessing, so that we have a dichotomy. We can performance measure between the small-- our algorithm operating on the small database versus the large database. DAVID MALAN: Yes. You do not want to try to debug a problem in your code with 140,000 inputs coming at you. Can you even imagine setting breakpoint in a debugger and walking through that? COLTON OGDEN: That would be terrible. DAVID MALAN: So we have small, which just has two words-- cat and caterpillar. So this is actually-- fun fact. So this is kind of curious that we have these two words. Why might you, in writing some test code, use two words like this, as opposed to cat and dog? COLTON OGDEN: I'm thinking from the perspective of the try, because they're both going to operate on the same nodes in the tree. Cat and then caterpillar-- caterpillar is kind of a superset of cat in that sense. That's my first inclination. DAVID MALAN: Yeah. No, and that was the motivation. Because you want to try to think, when testing your code, of potential corner cases. And it's definitely potentially worrisome if two of your words are so similar but different. So whether you're using a try or anything else, just choosing those kinds of corner cases-- one letter words, two letter words, substrings of other words-- it's a really good defensive mindset to go into debugging with. Fun fact two-- it was just a few years ago that I learned how to spell caterpillar. Because it occurred to me, since childhood, I could pronounce this word. I knew what a caterpillar was. And then the one time, off the cuff, I said, oh, why don't we spell check cater-- pill-- ar. I had to Google the darn word in the front of class. COLTON OGDEN: How did you think it was spelled? DAVID MALAN: A cata-- a cat-a-pillar? I don't know. COLTON OGDEN: Oh, I see. DAVID MALAN: You just take for granted for 30 years-- COLTON OGDEN: It's a weird word. I think there's other words like that too. DAVID MALAN: I was embarrassed. COLTON OGDEN: Srini Vasank says, sorry, I didn't realize that dictionary falcon needs only one word. DAVID MALAN: No need to apologize. We didn't tell you, so you wouldn't have known. No big deal. All right, so let's go back to the Python API that we've been in the midst of implementing here, and see if we can't flesh this out a little more. So I feel like the big to do is really this. We've kind of cut some corners, and we've only thought about-- we have a special guest here today-- but we've only really thought about what to do logically. But where are we going to tuck this data away? So how do you think about-- how should anyone think about actually storing these words, would you say? COLTON OGDEN: Well in CS50, I think normally, we use what's called a hash table. So some way for us to essentially map a word to some storage bucket somewhere, using a function that's optimized for even distribution of our words, so we don't have all of the same words in, basically, one long linked list. But I think we should show people maybe how to implement a hash table in Python. DAVID MALAN: Oh yeah. You want to do a hash table in Python? COLTON OGDEN: Yeah. Let's do a hash table in Python. DAVID MALAN: All rigth. So if you want to declare a hash table-- for instance, a variable called "hash table in Python"-- you could technically just do that. COLTON OGDEN: Oh, wow. DAVID MALAN: And then done. What would you like me to do next? COLTON OGDEN: OK, so now I think-- done. I think we're all set. DAVID MALAN: Kind of. But this actually does invite some interesting design questions. So this is indeed the definition of a hash table-- or in Python, it's called a dictionary or dict-- where you map keys to values. And that's the fundamental definition of a hash table. It's a mapping of keys to values. COLTON OGDEN: So cat equals 1, dog equals 2, for example. DAVID MALAN: It could be. Yeah, but you don't have to use numeric indices like that. It really suffices in a dictionary to say, yes or no, the word is in it. So you could probably just say, cat true, dog true, caterpillar true. But even that feels a little redundant. And so when it comes to designing a data structure like this, hash table might not necessarily need be what you want. You can actually use bunches of others. So you could actually get away with just using a list in Python, which is a dynamically realizable array or vector, if you're coming from the C++ world. That's nice too, because we'll automatically grow to fill the data. And you're also not unnecessarily storing true, true, true, true, or 1, 2, 3, 4. You're just storing the words themselves-- the keys. COLTON OGDEN: More efficient terms of memoy. DAVID MALAN: Yeah, I mean, you're saving some number of bytes. However Python implements true and false-- it's got to be non-zero, presumably. But would you caution against using just a list, would you say? COLTON OGDEN: I would, for access purposes. Because if you want to find out whether something's in a list, it's going to be an O of n operation. DAVID MALAN: Yeah. So a big O of n is a linear time operation. Because if that word is "zoo", and it's all the way at the end of your list, if it's alphabetized, then you've got to search all 140,000 words just looking for the darn word that you care about. COLTON OGDEN: We've got a couple of questions also. Trebinator-- are you doing PSet5 in Python? Yes. DAVID MALAN: Yes, essentially. And fun fact-- PSet5 just became PSet4 this year. So stay tuned for CS50x 2019. COLTON OGDEN: And do you want to read off SB Venner as the last question or two? DAVID MALAN: Yeah, sure. SB Venner asks, "what does the passed reserve keyword do in Python? I've not really seen that before. Steve." So it's actually just literally a pass. When you want to just have a placeholder line that does nothing, as I'm doing here sort of pedagogically, you can just say pass. That's a bit of a contrived use case. It really means, when you need a line of code, but you don't want to take any action there, you can use pass. So just to pull up a separate file here-- let me just create foo.py for the sake of discussion. This would not be the best design, but it's one possible use of this. If you wanted to say something like, if x is greater than y-- but in that case, you don't want to do anything about it, you could just say pass. Now, I say this is bad design because, frankly, if you just want to check a condition and then not do something, and instead do something only if the opposite is true-- like do this-- you can just flip the logic, frankly. And you can, instead, say, if x is less than or equal to y, then do this. And you can avoid the pass altogether. But I've been in the habit on some applications of actually using it in the context of exceptions. So we don't tend to talk about these in CS50, but you might in a higher level class, or in your own professional work. Often, it's the case in Python that you want to try to do something inside of which you might have some lines of code. Except, something bad might happen. And so you might say, except-- exception as e, for exception. But if you want to do something with the exception, you might not want, sometimes, to do something with the exception. But you do want to catch it and then ignore it. Because you know more than the computer. And you know that sometimes an exception might happen. But if you decide, not a big deal, you can literally just say pass, effectively ignoring the exception, but still handling it so the whole program doesn't abort. So I would say that's a more compelling use of pass that's not pedagogical, but it is functional. COLTON OGDEN: Essentially, because in Python, if you try to do a function definition without a body, it will just give you a compiler error. DAVID MALAN: Yeah. Yeah, indeed. So if you actually want to have the equivalent of abstract methods inside of a class that you then override in a child's class, you might just say pass, because you don't want the function to have to do anything by default. But it does need to be well-defined grammatically in the language. That's a really good question. Keep them coming. Hope that helps, Steve. All right, so if we turn our attention back to dictionary.py, I would propose we could take this one step further. Python has a whole buffet, if you will, of data structure. It's not just lists, not just hash tables, but it also has one that we introduced in class, albeit briefly. It actually has what are called sets. You might not have thought about these recently. But mathematically, a set is a collection of numbers, typically. But a key detail is that the set does not contain duplicates, which is great because your dictionary probably shouldn't contain duplicates. The word is either a word or it's not. And so a set is kind of nice because it maintains that property. But more importantly, if you read the documentation for Python, set is more performant than list. It will give you an approximation of constant time access. So it's probably implemented with a hash table underneath the hood, much like a dictionary. But it does have this property where you can just think about membership therein, not an association of keys with values. COLTON OGDEN: It's kind of like the hash table, but rather than needing to store that true every time, which like you said, probably amounts to some non-zero amount of bytes, we just get all of that sorted for free with the same functionality of the membership check. DAVID MALAN: Yeah, indeed. And so a set is what you would often call an abstract data type. But you don't necessarily know or have to care how it's implemented underneath the hood. All you care about is its properties-- the fact that you can add to it, remove from it, and get an approximately constant time access to it. That's what you care about. But underneath the hood, it probably is implemented with some kind of tree, or a hash table, or anything. That's not for us to worry about in a language like Python. So let's call this actually what it is. This is our dictionary of words. So let's just say-- we've got a variable called "words", and it's initialized to be just an empty set, containing no words initially, until we're ready to put them into the file. COLTON OGDEN: We've got a question from Andre. Andre-- we see Andre on the Facebook group all the time, actually. DAVID MALAN: Indeed. Hello, Andre. Nice to see you here on Twitch. COLTON OGDEN: "Would checking access aligned bounding box collisions be a good use case for pass, since you are typically testing whether for cases are not true?" In that case, you still would check all four conditions. So I don't think you would, because you want to make sure all of them aren't true. Because that's just part of how that algorithm works. So yeah, I don't think pass, in that particular instance, would be appropriate. DAVID MALAN: Yeah, you can still flip the logic even if syntactically it starts to get a little messy or sloppy. Generally, in any language, having just empty code blocks where you take no action or explicitly say pass, it usually suggests that better design is possible. COLTON OGDEN: Yeah, I usually see it as a non-implemented sort of thing like a to do. DAVID MALAN: It's like in the real world, at least in the US and in English, you sometimes have silly business or legal documents that say "this page intentionally left blank". I mean, everyone kind of rolls their eyes at that, because well, this is kind of silly. And I think the same idea there. You don't want to just have place holders for code. You want to get the logic right from the get go. COLTON OGDEN: Good question. DAVID MALAN: Yeah, nice to see you here on another venue. All right, so this line 9 still kind of remains. Store line in some data structure. So honestly, this is pretty straight forward in Python. If I have this data structure called "words", and I just want to go ahead and add to it, I can literally just add the line to the set. Now, this isn't quite right. You notice this earlier-- I think one of our-- COLTON OGDEN: I think Srini Vasank. DAVID MALAN: Yes-- noted earlier that, doesn't it contain a full line? And it does. But if we know that the dictionary contains one word per line, but every line probably ends with a backslash n, or CRLF, or the like, we're going to want to strip that off. And if you read the CS50 specification, you'll know that, indeed, it ends with backslash n, or a new line, or a line feed. So what's nice in Python is, you can actually strip this off pretty nicely. You can say, reverse strip-- rstrip a specific character or characters from the end of the string, hence a write strip from the end. And that's actually going to strip off one or more new lines from the end of the string, thereby giving us, effectively, the whole word from the file. COLTON OGDEN: Nice. Clean. DAVID MALAN: And that's it. Now, even though Colton and I have been talking for quite a few minutes here, at the end of the day, we wrote what, six or so lines of code to get done much of a CS50 problem, namely the load function here. So let's take a bit of a breather. And let's go ahead and just implement size. This function, according to the problem statement, is return the number of words in the dictionary file, which is now loaded into memory, presumably. How would you go about thinking about this? COLTON OGDEN: Well, one approach might be to use some sort of loop to basically go over our data structure and say, for-- in Python, for example-- for something in our collection-- for word in words. And then just increment the counter that you've declared-- size. So size plus equals 1. And then yeah. Just returned the size. DAVID MALAN: OK. All right, so that's pretty straightforward. It feels like, if words is a data structure, it probably supports the notion of length. Could we whittle this down into something more Pythonic? COLTON OGDEN: Yeah, I would say most abstract data types probably do have some sort of length function, or length field on them. So I believe you can just call len on set-- the length function, which is a function that works on pretty much any core Python data structure. And that should give you the number of words. It should be stored internally. DAVID MALAN: Indeed. This is a nice generalization. It's a little inconsistent across Python that you don't call words.lang, but this is essentially an API in Python. leng, or length, is a function that works across various different data structures and data types and can even work across your own customized ones that you yourself invent, so long as your data structure implements itself a magical method called "leng" itself, which is built into the object. This len function will call that length method, thereby making sure that you can get the length of any data structure in a standard way, in this case. So actually, just letting the data structure, the set itself, tell us what's your length, it could probably now be done in constant time, instead of linear. And again, that's the goal-- speed everything up today. Well, let's do a spoiler here with unload in Python. Even though in C, you would have to allocate all of this memory-- maybe it's on the stack, or the heap, or wherever-- once you've decided that, you have to actually free that memory. In Python, there's really no work to be done here. So I'm just going to go ahead and boldly say "done". COLTON OGDEN: Return true. DAVID MALAN: Yeah. COLTON OGDEN: What's the mechanism called that does that for us? DAVID MALAN: So it would be called "garbage collection". And phew, I wasn't sure if what you were fishing for there. It'd be called "garbage collection", which is kind of a cool way of describing a process that happens behind the scenes. Whenever that program isn't really doing much else, or has a breather, or just needs to collect garbage, it will free up any memory that you are known not to be using anymore, so that it can be used for other purposes in your program or somewhere else. Yeah. COLTON OGDEN: Languages like Java, and Ruby, and such also include-- typically interpreted languages, and some compiled languages have this feature. DAVID MALAN: Gotcha. COLTON OGDEN: C definitely does not have this feature. DAVID MALAN: No. No, and that's what makes it all the more painful sometimes to use. And Srini, you got it. Thank you for the summary there to everyone. A few people have some long usernames here today. So we're kind of left with just one function that we passed on earlier, literally. That is the check. So check, recall, is going to be the function that we're going to call again, and again, and again to spell check any of the words that are actually passed in. So let's consider how we can go implement this up top. We know that the data structure in question is called "words". It's a set. And it turns out, it's pretty easy to just ask the question, is this string in this set? You can literally just say something like, if word in set, then go ahead and return true. Or if that's not the case, with your colons here, you can go ahead and return false. But in Python, anytime you find yourself in the habit of very verbosely saying, if this, then return true, else if that, return false, you can probably whittle it down even more cleanly. And indeed, you can actually do this in C. Though we tend not to do it in the earliest weeks of CS50. I could actually just be a little more elegant here, and just say, what-- return word in words. And that in itself is a Boolean expression that will return true or false. I might want to do one enhancement here. And you would only know this from reading the problem statement. Even though the dictionary that we were given, with 140,000 words is sorted in all lowercase, is in the problem statement, the file your spellchecking might not be. Because that could be any file that some human typed out. So we should probably, if we're looking for words in a lowercase dictionary, we should probably force every word that we're being passed to lowercase, just to make sure that we're comparing apples and apples, so to speak. COLTON OGDEN: Good point. DAVID MALAN: So any questions, then, on how we've tackled this? Believe it or not, in 17 lines or fewer, can you implement a spellchecker in Python that conforms to CS50's own API, which takes many more lines and, undoubtedly, many, many, many more hours, if not tears in C to implement. COLTON OGDEN: Sure. Maybe we have a couple of minutes for questions. In the meantime, when was the first year that we had this PSet? Do you remember? DAVID MALAN: I do remember. In CS50 here at Harvard, 2007. And AI has yet to replace a spellchecker. So we haven't had to replace it yet. COLTON OGDEN: Did you come up with this from scratch? Or was this an inherited PSet? DAVID MALAN: I did come up with this from scratch, I believe. It's a while ago now. And in fact, I think I first wrote this PSet, or problem set, even years prior. Let's roll back to 2002 or 2003. i was teaching at Tufts University, another school up the road, in their computer science department. And I was making all new homework assignments for that class. And I think, unless I'm rewriting history here, I think I did write that from scratch way back when. And it's evolved a little bit over time. And in fact, it started off as a C++ problem set, if this was true. That, or I'm completely making all of this up, and in 2007 did I first write this PSet. COLTON OGDEN: Was it because you were having issues with Word's spellchecker? DAVID MALAN: [INAUDIBLE] No, I mean, we had Microsoft Word in my day. [INTERPOSING VOICES] You can actually just ask the software to do this. COLTON OGDEN: ZAlien's got a couple of kappas for us. Hey, ZAlien, good to see you. DAVID MALAN: Thank you very much. COLTON OGDEN: Trebinator, "love your course and self teaching. [INAUDIBLE] DAVID MALAN: Oh, thank you. Oh, PSet5 and C-- I'm sorry we showed you the answer in Python. Otherwise-- [INTERPOSING VOICES] dictionary.h, the file we started with, is actually given to students. So that's not even a spoiler there. So fun fact-- when I was a kid, Microsoft Word came on four floppy disks. COLTON OGDEN: Oh man. DAVID MALAN: And it had all the features I actually needed or cared about back in, what, 1995 give or take. Microsoft Word 5.1A. COLTON OGDEN: How much smaller was the dictionary back then? DAVID MALAN: We had as many words, I think, in English. Although, you hear about more and more stupid words coming out every year that are apparently immortalized in the dictionary. COLTON OGDEN: I can't remember any from this year. DAVID MALAN: I don't know. You always roll your eyes. COLTON OGDEN: You have the floppy Word in your office at 125, don't you? DAVID MALAN: Oh, maybe. COLTON OGDEN: Or Excel. No, you have the Excel floppy disk. DAVID MALAN: Oh, possibly. Yeah. It was cool. Actually, a couple of years ago now, CS50's video team went on the road to visit our friends at Microsoft. So I probably shouldn't have said ill things about Microsoft Word just now. And they have a wonderful archive. And we got to explore some of the boxes and shelves of old stuff. And they just had all of the oldest software right there on floppy disks. We have some fun photos of picking up entire boxes of really, really heavy products. But the stupid thing is, the heavy products were not like the actual software. It was the damn instruction manuals. They used to come with dictionary sized user manuals that we held up above our head. I don't think we can find the photo on such quick demand. Maybe? Can we do that? All right. CS50's own Dan Coffey here is suggesting that we go to our own photo website here. So let's see if we can't dig up a little memory from yesteryear. If I go here, I think we can probably search. COLTON OGDEN: Srini is asking, is that a 3.5 inch floppy disk? Oh, did you already answer that question? DAVID MALAN: Hey, look at this. Hey, look at that search. That's a good dictionary we've got here. COLTON OGDEN: Wow. DAVID MALAN: So this is me looking very surprised at how heavy Microsoft Office was a long time ago. COLTON OGDEN: That's really how big the manuals were? Are those the manuals or the full software? DAVID MALAN: This is mostly the manual. So let's enhance here. And yeah, there were a few floppy disks or maybe CDs by then in there. But those are mostly manuals. That's not even the biggest one. Here we go. This is Visual C++. What the hell? That's the compiler. Wow, that's me a long time ago. That's a compiler and the manuals for it. Slightly creepily, too, is that photo. But also, Microsoft apparently-- what are those called? Teletubbies? COLTON OGDEN: Teletubbies, yeah. DAVID MALAN: Teletubbies used to be powered by Microsoft. COLTON OGDEN: Microsoft did-- I had no idea. DAVID MALAN: Yeah. COLTON OGDEN: Windows 3.1 was distributed on 8 3.5 inch floppies, says Andre. I believe it, 100%. DAVID MALAN: Indeed. Oh, and it's apropos here. Off camera is our friend Dan Coffey, who is right there on camera with that. And then of course, it wouldn't be CS50 without our friend Natasha Chornesky, Nacho, who's at Microsoft with other colleagues of hers as well. And there she is, CS50 proud, having taken CS50 herself. COLTON OGDEN: Do you remember per Srini's comment whether it was 3.5 or 5.5 inch floppies? DAVID MALAN: Let's see, for Office? In my day, the version we had but on Macs was 3.5. But I don't think Macs ever had 5.25 disks. But Windows might have, certainly. If there was Windows. I mean, DOS and Windows 3.1 and onward might have. COLTON OGDEN: Yeah. I remember having a Windows 95 machine with floppies on it. But it's been a long time. DAVID MALAN: Yeah. Hey, we're all sounding really old right now. So if there's any you kids tuning in wondering what the hell this channel just became-- COLTON OGDEN: I think we should have prefaced it with a picture of what a floppy disk is. DAVID MALAN: That is true. So here. Let's do that. What's a floppy disk? Floppy disk. They came in different shapes and forms. They're mostly died out now. COLTON OGDEN: You used to have a really good presentation in CS50 where you would have people rip open a floppy disk. You still have those in storage, by the way. DAVID MALAN: We do still have some floppy disks in stock. Yeah. Fun fact-- it actually got really expensive to do. Because the price of storage and media just went down, and down, and down, and down. So for a few pennies, we could give 1,000 students in the room their own floppy disk, and then tear it apart in class. But then once nobody started using them, you could only get them on eBay, not even Staples. And so then it's $1 each, and that makes for a really expensive demo. COLTON OGDEN: I remember when we bulk ordered those. There were thousands of them. DAVID MALAN: Yeah. So this is a somewhat hard floppy disk. Inside is the actual floppy material. And then as Srini was referring, we had 5 and 1/4 inch disks, and even bigger ones back in the day-- small, medium, and large, if you will. And I started off life, I think, with the middle disks there. But fun fact two-- if we enhance this, these 5 and 1/4 inch floppy disks, which were indeed flexible, came with a little divot out of the corner, which was to help align them so you would know which is on the left and which is on the right, so you don't put the disk in upside down. But the catch is that the materials inside the disk are actually perfectly symmetric. And so, if you actually take a scissors or a special $5 device, you could snip a hole in the floppy disk exactly symmetrically on the other side, thereby tricking the floppy disk drive to write bits onto the other side of the actual floppy disk inside. Now I'm sure if I had any notion of electrical engineering at the time, I would know that this just creates interference and bad things. But to a kid trying to double his storage space, it doubled my storage space. So all of us used to do this and make your floppy disks double sided. COLTON OGDEN: That's a TIO for me. I didn't know you could do that. DAVID MALAN: Yeah, well it probably wasn't recommended. The better practice would be just buy more floppy disks. But that's what we did back in the day. COLTON OGDEN: Yeah. That's cool. That's a good piece of history. DAVID MALAN: Yeah. So any questions from the audience there? Anything we can pluck off, either on C, Python, or anything in between? COLTON OGDEN: Yeah, if you guys have any sort of small exercises or examples you want us to implement in Python, let us know. You said potentially resize as well? Is the one that you have? I can print that for you. DAVID MALAN: We do have resize. But I think we also have the speller, which we can go from C to Python. COLTON OGDEN: Oh, that's true. Yeah, we have that other spelling. Yeah, we could do that if folks are interested. We should let people decide what they want. My desk still has a few of those [INAUDIBLE].. Do you folks prefer us to do resize or to take a closer look at speller in C maybe? A good chance for some feedback. DAVID MALAN: I do think speller in C is nice. Because we could literally do it side by side. COLTON OGDEN: That's true. That's a good point. DAVID MALAN: Let's see how the lines line up. COLTON OGDEN: That's a good point. DAVID MALAN: All right. So let's do that. COLTON OGDEN: Let's start on that. DAVID MALAN: All right. So let's go ahead and let me unenhance-- close a couple of our distractions there. And let me go into my same folder as before. Let me go ahead and open up speller.c, which is indeed the version of this program that students are given. So I'm afraid, too, this won't help you out, Trebinator, with PSet5, since you two are given this distribution code. But what's nice about the IDE here is we can split our windows and actually do some side by side coding, and essentially comport or translate C from Python from left to right here. COLTON OGDEN: We could also do this magic. And now we're out of the scene. Look at that. DAVID MALAN: Oh, nice. We can even get rid of you and me. Nice. All right. So here we go. Let me go ahead and create a new file. We'll call this speller.py. And essentially, we're going to try to recreate this file over at the right hand side. So we'll do some simple, some more sophisticated things for a moment. Keep watching, Trebinator. We won't show you dictionary.c. That's where the real magic is. COLTON OGDEN: Actually, do you want me to maybe make us smaller and put us in the middle so we can still see-- DAVID MALAN: Yeah, can we unenhance us? COLTON OGDEN: Yeah, let's do that. DAVID MALAN: Goodbye. Oh, there we go. COLTON OGDEN: There we go. DAVID MALAN: Oh, we could do a little Mac versus PC, like C versus Python-- [INTERPOSING VOICES] COLTON OGDEN: I think I should get on the other side then. DAVID MALAN: Yeah, this is kind of backwards. But the keyboards-- that's OK. I'll look this way, you look this way. And we'll focus on our own code. All right, so one thing to note-- first of all, in Python, there's different ways to write comments. And so for multi-line comments like this, how would you go about writing them in your file? COLTON OGDEN: In C or-- oh, in Python. Python has a cool syntax-- I mean, one way you could do it is just preface every single line with the hash mark. DAVID MALAN: So just do every single line like this. COLTON OGDEN: Which would get very tedious. DAVID MALAN: Yeah, for sure. COLTON OGDEN: Python has another way you can do it using either a triple full quotes or single quotes, which will allow you to do the same thing, essentially, as this, except on the left. But you don't need to have the star on each line. You can just write strings, literally, as you want. DAVID MALAN: Yeah, So problem set 5 implements a spellchecker. And then down here, we can close that thought using double or single quotes, so long as we're actually consistent across the two. And now those are our comments. And you can see it's nicely highlighted. Its color coded a little differently, just because the IDE treats comments in C and Python a little differently. But same exact idea. COLTON OGDEN: I think Srini had the right idea. He had the triple hash mark. But just off on just the quote being the key there. DAVID MALAN: Indeed. You definitely want to use the single or the double quotes there and then be consistent throughout. So on the left in C, we had a whole bunch of header files that we included-- one of which was our own, dictionary.h-- and then a bunch of which were some lower level things really used for instrumentation. No problem at all. And the comparables that we'll need are a little different. So there's not necessarily a one-to-one mapping left to right. And in fact, rather than do these preemptively, we'll come back to some of our imports. And let's just trip over the libraries we need once we actually get to them. Otherwise, it's magical just by knowing where we're going with this. But I will say this. We do know that there's a file called dictionary.py from earlier. So I can immediately say, you know what, from dictionary, go ahead and import those functions-- check, load, size and unload. Those are the four functions we did implement earlier. And this way, we can just keep the logic of our spellchecker separate from the implementation details of the dictionary. We could even swap in and out different dictionaries if we want, and therefore implement the API in different ways. COLTON OGDEN: Basically how you would port a header file-- the idea of a header file from C into Python. DAVID MALAN: Yeah, exactly. Now, I've seen in some code-- and frankly, some of my own code-- this approach, what would this do? Do you know? COLTON OGDEN: So from dictionary import * means-- * being sort of the glob or a catch all operator in Unix and other systems-- basically means everything. So from dictionary-- from the dictionary.py file, just import everything and pretend like I wrote this in my own file. So I'll have access to check. I'll have access to unload. I'll have access to size. All those different ones. DAVID MALAN: Yeah. And that otherwise seems pretty handy. Because then I don't have to enumerate all four of those functions. But the catch is, you can potentially scoop in more than you intend. So for instance, we had a global variable in that file called "words", which was that set. And by scooping it up with *, we're accidentally introducing to this namespace, so to speak, speller.py, that same variable, which means I could somehow accidentally screw up and mutate or change that variable, when it really is meant to be used only by the dictionary. Now, we can defend against that in a bunch of ways. We can introduce classes in dictionary.py. We can introduce an actual module and kind of separate that out. But in general, just because * is convenient, does not mean it's good design. And in fact, it would be more conventional and proper to enumerate, even tediously, the functions that you actually care about so that you're importing only what it is you care about, and nothing extra-- especially if someone else goes and changes the implementation of the dictionary, and all of a sudden your variables are colliding. COLTON OGDEN: Exactly. DAVID MALAN: All right. So I've imported the dictionary. Let's go ahead now and start translating some of the code. And for the most part, it will line up, but not necessarily one for one. So we can point out some of the differences as we go. And we might even disagree how we might implement things. But we'll try to do it line for line here. So over here on line 20, in our C version, was the definition of a constant, as it would be called. It's, by convention, in all caps. And then this just hardcodes the path to that big file with 140,000 words. What would you propose be the equivalent in Python here? COLTON OGDEN: Well, Python doesn't really have the idea, or at least the hardcoded idea of constant variables, unfortunately. So what I would probably just do is something like capital dictionary, just like you have right there, gets-- then the string, "dictionaries/large". So almost the same thing, just without the define being necessary. DAVID MALAN: So if you can capitalize the word, though, is this not now just a constant? COLTON OGDEN: Not in functionality, but-- not ad hoc-- but implicitly a constant. Other programmers reading this are going to see the fact that dictionary is all capitalized, and just by convention across multiple programming languages, all capitalized variables with underscores tend to be constants, even if that idea is not enforced by the compiler interpreter. DAVID MALAN: Yeah, and this is just kind of a language decision. The reality is that different communities and different languages-- authors disagree. And the mentality with Python is, well yeah, you could change that value, but just don't, and be a little more mature about it. Now, it goes both ways. I mean, that's fine to be said. And OK, I promise not to touch that variable. But when programs get big and complicated, you might forget which is yours, which is someone else's, even though the capitalization is supposed to help. So a lot of languages-- C among them, Java, C++-- really try to protect you from yourself, albeit at a price of tedium, when you have to write even more code or more thinking just to get the same kind of work done. COLTON OGDEN: Exactly. DAVID MALAN: All right, so that's a pretty straightforward translation of that line. Below that is a prototype, which, in C, is a proactive declaration of a function's name, return value, and arguments-- at least the types and order thereof. There isn't really a comparable in Python. So we'll skip over that. And in Python, there isn't necessarily a main function. You can absolutely have one. But we won't bother because there's actually going to be no other functions in this file. So we'll keep it simple. So let's focus on argc here. In C, argc is the argument count-- the number of arguments that have been passed in. But in Python, you have to do this a little differently. We need a module, right? COLTON OGDEN: We do, yeah. Because Python-- C gives you argc and argv for free as part of its implementation. But it's not something that's necessarily part of Python's implementation. Real quick, though, Srini has a question. "All caps mean generally object or class?" Sorry, I'm not sure. DAVID MALAN: Yeah, good question. COLTON OGDEN: [INAUDIBLE] DAVID MALAN: Oh, you're testing me again. All caps generally means, either you're yelling in an email, or a text message or whatever-- sorry for those of you wearing headphones, actually-- or it means a constant, or a social agreement that this should be a constant value. When only the first letter is capitalized, or you have a multi-word variable name or a symbol that has capitalization throughout, but not the whole thing, that typically signifies, yes, a class. And then, if you use more traditional camel case, like lowercase letters to begin, or more conventionally, underscore variable names, otherwise known as snake case, then that's just a variable, typically. COLTON OGDEN: Snake case because Python? DAVID MALAN: No, I think name comes from back in the day. Because it looks like a snake. It's just your long string of words with the underscore. COLTON OGDEN: Oh, interesting. DAVID MALAN: Hey, you learned something new here on CS50 live. COLTON OGDEN: It's interesting how coincidental that is, though. DAVID MALAN: Snake case. Why? COLTON OGDEN: Because Python. DAVID MALAN: Oh, interesting. No, I'm pretty sure it predates it. But I'm sure Wikipedia could answer this too. COLTON OGDEN: That's fascinating. DAVID MALAN: We'll check, and next week we'll let you know. So to gain access to our command line arguments in Python, we need to import the system module, which actually gives us access to that and a bunch more features. Or frankly, if we want to be a little more precise and make it look a little more like C, we can just say, from the sys library, import argv, or argument vector, which is a subset of the functionality therein. And so down here, if I actually want to check did the user run this program with the right number of command line arguments, I can actually say something like, well, if the length of argv does not equal 2, and length of argv does not equal 3, then I can go ahead and exit and complain. So I can actually do this in a couple different ways. In Python, you generally use print instead of print def. So you would say "usage: speller" and then, if I mimic exactly what we have in C, I would do something like this. This is going to be the name of the program. If I actually add a shebang and put it in a file called "speller", or install it via package, dictionary is in square brackets because it's meant to be optional. That's the file that you want to load optionally. And then text is the file name that you actually want to spellcheck. And then I can go ahead just like in C, exit with a certain value. But for this, I need to do exit 1, which if I want to return a 1 status code, I would do that. But honestly, you can actually do this a little more simply. You can get rid of all that. And you can just exit with a string. And if you read the documentation for Python, when you do exit with a string, it's going to return 1 for you. But it's also going to print out that value. Now Python-- or the IDE here isn't liking this. Because I've done something wrong. Oh, I'm taking C too literally. You have to literally say "and"-- professionals here-- and change it back to some human English. COLTON OGDEN: And so, because dictionary is optional, that means that we're going to essentially use that hardcoded as a default, I'm guessing? DAVID MALAN: Exactly. That's why we have the line 14, a constant that we'll use as a default. Why? Just because we decided that we don't want to force the student to have to specify a dictionary every time they run their program. Instead, we want them to have some default built in. COLTON OGDEN: And we saw len before-- the len function-- when we used a set. And now we're using it on argv. So what is argv? What is the data structure that argv is, if it's following the same model of data structures being able to use the length function? DAVID MALAN: Yeah, in this case, it happens to be a list. And this is a list of the command line arguments that the human is presumed to have typed in from index location 0 on up. And so, leng just tells us the length of that list, just like the leng in the context of sets gives us the whole size of the set itself. COLTON OGDEN: OK, awesome. DAVID MALAN: All right. So we're already trimming away some lines. We're on line 17 instead of 32. So it's a little more compact. And frankly, a lot of our lines were comments anyway. So let's proceed here. Now, in C, we had to do something a little different using certain structs, or data structures called rusage structs, before and after. We don't strictly need those in Python. Because we can actually time our program a little more cleanly. But more on that in just a moment. But you'll notice that in C, we did define a few variables-- time_load, time_check, time_size, and time_unload-- all of which seem to be floats, because they have decimal points in them. And they're all initialized to 0. I did this on one line on the left hand side, just because you could. You don't strictly have to do that. You could have four separate lines or variables. But I can do roughly the same here. I can say time_load, time_check, time_size, and time_unload equals 0.0, 0.0, 0.0, 0.0, effectively inducing what's called a tuple on the left hand side and the right hand side. A tuple is a collection of one or more values, often written in parentheses, but not necessarily so. But if I want to be really explicit as to what I just did, it's kind of like doing this, which is a nice one liner of doing four things on the right to four things on the left. And in this case, the assignment is what you're actually doing here. COLTON OGDEN: And we can call this tuple unpacking, correct? DAVID MALAN: Yeah, indeed. Because you're going from a tuple with four elements in it to a tuple with four elements in it. But you're really doing a one-to-one mapping between all of the tuple elements. And so the parentheses are not strictly required. The commas imply the presence thereof. All right, so what else? Let's go ahead and actually initialize. Let me bring us back to the middle here. Here we go. Let's put this wall between us again. There we go. All right, so now if we want to go ahead and initialize this variable, dictionary, to be either argv1, which is the name of the dictionary the human maybe typed in, or the default. We can do this with the ternary operator here in Python as well. Let me go ahead here and first, let's see what Python's-- oh, accidental breakpoint. No complaints at all. Let me go ahead and just initialize a dictionary variable, no type needed, to either argv1, if the length of argv equals 3, else initialize it to dictionary, just like that there. So you'll see that this is a little different from C syntax in that I'm using if and else there. But this is a nice one liner, a ternary operator effectively, that's assigning dictionary to argv1 if, literally, the length argv is 3, else it's assigning it to dictionary. COLTON OGDEN: So dictionary, lowercase now, is our actual 100% dictionary. And then capital dictionary is our ostensible dictionary when we start off, if we haven't actually input a dictionary into the command line when they ran the program. DAVID MALAN: Exactly. Yep. And so we have here another example of Python code that kind of reads like English. On line 21 here on the right, dictionary equals argv1, if the length of argv is 3, else dictionary is what it equals. So it reads a little more naturally than the more arcane C syntax for ternary operator. All right, let's just do a few initializations. Again, this one doesn't map perfectly to Python, because get rusage is very much a C thing. But the closest I could come up with was to declare a few variables like this-- the time involved before we're going to run some code. Then we're going to go ahead here and let's get a variable called "loaded" and call the load function in our actual file. Then let's go ahead and get the time after that by using process_time again, which is the same function built into the time module like this. And then, just as a sanity check here, let's ask the same question. Well, wait a minute. If the dictionary was not loaded per its return value there, let's go ahead and bail out and say something like, exit could not load dictionary. You'll notice, though, we might want to substitute in a full sentence there. So I can actually paste in the variable's value, so long as I make this a format string, otherwise known as an F string. Now the IDE is yelling at us here. Do you have a sense of why this red X appeared? COLTON OGDEN: OK, so before time.process_time-- so before-- I'm a little confused as to what you're doing there. DAVID MALAN: All right, well let me explain to you a la my rubber duck here. So what I'm trying to do is just figure out what is the current time of day, if you will, right now. Then I want to call my load function. Then I want to ask another question-- what is the time of day now? Because you can infer from that, presumably, if you subtract after minus before, how many seconds, milliseconds, whatever, transpires. COLTON OGDEN: It's basically like having a stopwatch. Time functions like a stopwatch-- start and then stop. DAVID MALAN: Exactly. COLTON OGDEN: But I notice you're still getting the error though, even though you put the equals sign there. So the IDE is helpful. It's giving us some information-- undefined variable time. And so time, presumably, is a Python module just like sys. So it looks like we just haven't included that-- [INTERPOSING VOICES] DAVID MALAN: We have to import time, which is kind of a magical concept. But yeah. And that's it. Import time. COLTON OGDEN: What was the-- there's a fun little thing you can do in Python. It's like import something, and it redirects you to a web page or something, right? What is that? DAVID MALAN: Oh, I don't know. But you can import functions from the future, which is cool. COLTON OGDEN: Python-- hold on. Python funny import. What is it? What is it? What is it? What is it? Import this, I think. DAVID MALAN: Import this? What is that? What do you mean? COLTON OGDEN: I think it does-- let me look it up before we do it on screen. Python import this-- foo.py. [INAUDIBLE] wants me to do what? COLTON OGDEN: Import this. DAVID MALAN: Import this. COLTON OGDEN: You can do it in interpreter too. And I think that will be-- if you can open interpreter. DAVID MALAN: All right, slight digression here. Let's open a terminal window here. Let's run Python 3, the latest version. Let's run a line of code with dash-C and just say import this. COLTON OGDEN: And this is "The Zen of Python" by Tim Peters. DAVID MALAN: That's a wonderful Easter egg. Mind you, on the second screen, off-camera, we are googling funny Python imports. COLTON OGDEN: I remembered that-- I didn't remember what it was. Oh, we missed a few comments or things in the chat here. So argv[1] gets the second argument-- DAVID MALAN: Hey, wait, wait. I want to import antigravity first. That's do important things first, if we could. COLTON OGDEN: I missed that one. I missed that one. DAVID MALAN: This is trying to make a connection somewhere. COLTON OGDEN: This is the one that does the web page. Here, we can-- do we have a computer on [INAUDIBLE] DAVID MALAN: I can do it on my Mac here, probably. All right. Here, Andre, here we go. Here we go-- Python 3-3, import anti-gravity. COLTON OGDEN: Nice. DAVID MALAN: That's amazing that this is built into an actual language in 2018. I had never actually seen that. That's pretty beautiful. COLTON OGDEN: I think I saw it a couple of years ago. But I forgot what it was. DAVID MALAN: OK, now Serenity is being a very good student here and is following along. Yes, time was missing. So, Serenity, you are absolutely correct. Apologies for the digression here. We got a little carried away with xkcd. But, yes, import time was missing. So replacing that does actually fix that problem. And then, let's see-- we had another question from SB Venner. COLTON OGDEN: SB Venner-- on line 16, would it not be a little better to say something like argv underscore line equals line argv? And then on the next line, say, if line underscore argv is not equal to 2 and line underscore argv is not equal to 3, it would save calling the length function method twice? DAVID MALAN: It's a good question. I don't know if we'll agree on this. Short answer-- yes, theoretically, by tucking away that answer in a variable, you don't have to ask the same question again. I would generally argue, though, that it's a premature optimization, if you will. There are so many other things I could be doing poorly in this program. And just counting the number of command line arguments efficiently probably isn't one of them. The reality is, too, the human is the one that's going to be providing these command line arguments most likely. And he or she is not going to type all that many words. So argv is probably going to be size one, two, three, maybe 10, but most likely not. And so even if we are, in linear time, checking the length of those things, it's still going to be super fast. And, frankly, if Python's interpretation of the implementation of a list is efficient, Python is probably using its own variable to keep track of the length of a list, so that it can return its length in constant time. So good instincts. But those are the kinds of details that you should generally relegate to the library and trust or check the documentation to see if it's doing things as smartly as you might be inclined. COLTON OGDEN: If we were incurring this function call six million times instead of a nested for loop, I would say that would be a good point and to cash that value. DAVID MALAN: OK. That's fair. COLTON OGDEN: For this app's, I don't think that's necessary at all. Two function calls versus six million function calls-- I think that's a different kind of optimization, for sure. DAVID MALAN: I agree. And I would say it doesn't have to be six million. If we're talking about a dictionary with 140,000 lines, I would worry about anything you're doing inefficiently with that input, not the human's two or three command line argument. But totally valid, but probably not something you should worry about. Better to spend those mental cycles on harder problems. COLTON OGDEN: Exactly. Trinity up above also asked-- argv[1] gets the second argument or the first one? DAVID MALAN: So in argv[0], as in C, is the name of the program itself or the name of the Python file, which would be speller.pi in this case. So argv[1] gets the first command line argument, but the second word that the human provided at the command line, if that distinction makes sense. COLTON OGDEN: And the word Python itself doesn't even get included. If I'm remembering correctly, it doesn't get included in argv at all. DAVID MALAN: Correct. Just the name of the file is in argv[0]. COLTON OGDEN: In C, though, you do get the dot slash, whatever your script is, as the first-- DAVID MALAN: Whatever the name of the file. But same in Python 2. If you did Python space dot slash foo.py, you'd see that as well in argv[0]. DAVID MALAN: Trebinator-- COLTON OGDEN: Do you want to go ahead and read his question? DAVID MALAN: Yeah. When having a try with a struct try letters-- an array-- what is a letters initialized when using malloc? So the initialization of that is totally up to you. If you're just calling malloc, it's not getting initialized with anything whatsoever. You're just getting back a chunk of memory-- if enough is available-- equal to the number of bytes that you've requested. If you want to clear that memory, setting it equal to all zeros-- so not technically null-- null is a pointer value-- but all zeros, you can use calloc, or clear alloc, which is just another C function. Usage is identical, but it will do the clearing. Now, historically, people would use calloc very sparingly, or not at all, because why bother clearing your memory if you're just going to overwrite it yourself. But in the case of a try, and in a CS50 spell checker, if you're planning to clear that memory anyway, as with a for loop or a while loop, then, yeah, let calloc do it for you. Because it probably is using a for loop or a while loop itself to do that. COLTON OGDEN: I was just reading an article recently about calloc and how it was not really used historically too. DAVID MALAN: No, because-- I mean, performance. Like, literally, by calling calloc you are asking the computer to do, let's say, twice as much work. Give me memory. Oh, and by the way, clear it for me. I mean, clear your own memory was kind of the mindset. And only initialize it when you absolutely have to. COLTON OGDEN: And that would've been O of N, too, right? Because it doesn't have to go over every single byte in that set and [INAUDIBLE] DAVID MALAN: Most likely, unless there's some hardware optimizations where you can actually clear a whole page of memory efficiently. COLTON OGDEN: But I guess that'd be up to the [INAUDIBLE] DAVID MALAN: Beyond my bi familiarity. All right. So where did we leave off? We had just benchmarked, if you will, the running time of the load function by computing after and before. So if we actually want to compute the total time involved in loading, let's go ahead and call that time underscore load, and go ahead and calculate that value-- which is actually pretty easily done-- just do after minus before. In C, we have to do some annoying calculations because of how those structures work. Python, you just don't need to worry about that. You can just compute it with arithmetic. COLTON OGDEN: Did we write those functions ourselves that calculate functions? DAVID MALAN: We wrote calculate ourselves. If you go to the bottom of this file, you'll see an annoying function that actually dives into this data structure being used. You just don't need to do that in Python at all. COLTON OGDEN: Import time. DAVID MALAN: Import time, or anti-gravity if you prefer. So this is going to be a heck of an implementation before long. So let's see what we're actually going to code live versus just kind of whip out a sample solution from the oven here. But let's see how this actually maps to the code in question. Because I think we'll probably just want to pull up a spoiler at some point here. COLTON OGDEN: It's getting close. It's 4:06. We have around 24 minutes-ish before the stream officially ends. DAVID MALAN: So let's see how far along we get here. Let's go ahead and open the actual text. So on the left-hand side here, we get some pointers involved. Thankfully, those are not going to be involved here in Python. But let me go ahead and convert this as closely as I can here. Give me a variable called text on the left. Let me go ahead and go into argv[2]. If the length of argv equals 3, else argv[1]. So this is just giving me the last command line argument, which is either position 2 or position 1, depending on how many total words were passed in. Then let me go ahead and open the file called Text in read-only mode. And just for good measure, I can actually specify an encoding, though this might not strictly be necessary given what dictionary you're using. But I'm going to specify Latin 1, which is essentially Ascii, as it is in our C problem set here. Notice I misspelled argv, so the IDE is yelling at me with that little red circle. And then I'm just going to do some error checking. So if there's not actually a file, we might have screwed up, or the user might have messed up. So I'm just going to go ahead and say something like print could not open the name of that text. Let's go ahead and make this an F string, or format string. Then let's go ahead and call on load-- whatever it does-- even though we secretly know it doesn't need to do anything in Python. And then just go ahead and exit with 1. So some of these lines aren't strictly necessary. We could introduce a context manager, like you proposed before. But I wanted to handle the errors exactly like we are in C, just so that you can kind of see C to Python left to right. COLTON OGDEN: There's one thing on line 33 that I would suggest for folks who might not know some of the shortcuts you can use with lists in Python. For example, all we're essentially trying to do here on line 33 is just take the last-- no matter what it is, we're always going to take the last thing, right? So instead of using argv[2] or argv[1], we can always take, without any condition, argv[-1]. And that will just give us the last element in that list. DAVID MALAN: Indeed. Now this does assume that earlier in the file, we have validated that this has the right number of command line arguments. Because if it's got like 10 arguments, we don't want to take the 10th, or the 9th, or whatever. We want to make sure it has exactly the right number. But we did that, indeed, earlier already. I like that a lot. Let's keep that. All right. So after that, notice that we want to prepare to report the misspelled words. That's actually kind of an easy one-liner. So let's give ourselves some room and scroll up here so we have a place to put this. All right. So right up here, let's go ahead and just print out exactly as we did in C, using some new lines to format it. Nothing really interesting there. Misspelled words, couple new lines. Notice-- I only need one new line, though, because we get one for free with Python. Because the end of a line in Python is by default that. But if we know that, we can just leverage that assumption. Now let's go ahead and prepare to spell-check the word. And I'm going to go ahead and give myself an empty word, which is the equivalent of this line over here. In C, you have to know how many characters you're going to fit. It turns out from the problem statement-- if you read into CS50's homework-- you'd see that the maximum length of any word is going to be "length," which is a constant, which is like 45 or so characters, defined elsewhere. But we don't care in Python, because the words can grow and shrink. Then I'm going to go ahead and initialize some variables on the Python side to be like an index of misspellings and words, all equaling 0, 0, and 0, using this fancy [INAUDIBLE] approach, as before. All right. Very welcome, Trebinator. All right. So now let's go ahead and spell-check every word in the file. This is painful in C because you have to take baby steps in reading the input. The problem in C is that you can't really safely read in from an arbitrary text file an arbitrary number of bytes. You have to decide in advance how many bytes maximally you're going to read. The problem, though, is if we're reading words from a file, I don't know how long or how short my lines are going to be. It could be infinitely long, or however much memory I actually have on disk. And so here the best way in C is to take baby steps-- read one little character at a time and just accumulate those characters, rather than blindly read as many as you can. Python-- I don't have to worry about any of that. I'm going to go ahead and do the following in an infinite loop, while true as follows. If I want to read a character from the file, let me go ahead and just say-- oops, that's not how you spell true in Python. Let me just go ahead and say file.read one character here. If there's not a character, then I'm going to go ahead and break out of this. That means I'm at the end of the file. But if I actually want to now assemble words from the file-- and this is what's tricky here. Unlike the dictionary, where I can just read one word at a time from a line, the file we're spell-checking might have sentences and paragraphs. They might have apostrophes and punctuation and all this nastiness. So we need to actually distinguish. So I'll do this one a little quickly, essentially porting our C from the left to our Python code here at the right. But among the approaches I could take here is this-- if re.match this regular expression-- A-Za-z, c-- or it is the case that c equals an apostrophe and I have already read in part of a word, implying that this variable called index from our C version is greater than 0, then I can go ahead and append this current character to the word. And I'm going to go ahead and increment my index counter advancing in the word. So what am I doing here? I'm tip-toeing through the file that we're spell-checking, taking one character at a time. So if the word in the file is "Colton," I'm getting a C-O-L-T-O-N-- yay, ra ra. So we're reading in Colton and one character at a time and then just appending it to a variable called "word," incrementally building up your name. COLTON OGDEN: And index you're using to keep track of the fact that you're inside a word, right? DAVID MALAN: Exactly. Now if, meanwhile, I've read too many lines, I might actually say this-- so if the index is actually now longer than the maximum length of a word that's allowed, I'm actually going to go ahead and consume the rest of the string and just throw this away, because it's not supported. Something is wrong. So while true, I'm going to go ahead and read in one more character. And if it is not a character or it is not the case that that character matches the following regular expression, or pattern, as it really is, A-Z or a-z, then what I'm going to go ahead and do is just break out of this all together. Essentially, something has gone wrong. If nothing has gone wrong, I'm going to go ahead and reset index and word to 0 in the empty string, with a little one-liner there. But I could write that syntactically a little different and actually proceed as follows. So I don't like how the IDE is yelling at me in so many different places here. Let's see-- undefined variable re. COLTON OGDEN: Well, we solved that problem before. DAVID MALAN: We did. So that's re-- regular expression-- refers to pattern matching capabilities of a library. So let's try that trick before-- import re, scroll back down, come on-- magic, nice-- we've whittled this down further. How about here? Undefined variable length. So I cut some corners here and I didn't actually define the length of the longest word. Should've done that way early on. Looks like I skipped a step. Length equals 45. I think I just said that one verbally. All right. So let's go back down here. Voila. All my code looks correct, of course. And now let's go ahead and check a couple other things. So this line up here was saying if each of the characters we're reading one at a time out of the file is alphabetical-- A-Z-- uppercase or lowercase-- or it is an apostrophe, like Colton's-- the possessive-- then we're going to go ahead and accumulate it. But if we run into other things, we want to handle those differently. So else if it is the case that C is a digit-- this is a method built into strings-- then I'm going to go ahead and just consume these. Let's just get rid of them, because they're not part of the file I want to read. So while true, let's go ahead and read one more character with file-read(1). Then let me go ahead and say, well, if that's not actually a character, or it's not an alphabetical character, and it's not a digit, then let me go ahead and do this with it-- all on one line here-- let's go ahead and just break out. And then let's go ahead and reset ourselves-- index word gets 0 here. So we're going kind of quickly through the Python version. But, again, the goal here is to convert the C code on the left to the Python code on the right. So just to fast forward to where we are in the story, we've just implemented on the left this C code with isdigit with the equivalent Python code on the right. COLTON OGDEN: Is this all distrib code that they usually get, or this is the solution that we're looking at? DAVID MALAN: This is all distribution code. So everything on the left students were given. They didn't have to come up with this all on their own. COLTON OGDEN: Because they just had to implement the dictionary files, or dictionary functions, rather. DAVID MALAN: Indeed. So I think, why don't we go ahead and-- COLTON OGDEN: Oh, sorry, we got a question from Andre. DAVID MALAN: We got a decent amount left. Let's take this question from Andre. Of course. COLTON OGDEN: Can I just ask-- line 76 in C-- let's take a look at line 76-- is a for loop that sometimes causes some confusion. Is this something you would typically use? Or is it there just for pedagogical purposes to show they aren't strictly limited to simple incrementation within a for loop? DAVID MALAN: It's a good question. And, Andre, I do think you make a valid point. In CS50 we tend not to show for loops involving non-numeric initialization and updates in for loops. We pretty much always use I and J and K and so forth. I would say we're using this because it is the clearest design. The for loop does exactly this-- initialize it to some value, check a condition, and then iteratively do something again and again and again. And as such, that is the right looping construct. You could use a while loop and build it that way. But this is what a for loop is good at. So if anything, it's my fault or the course's fault that we don't show students different paradigms of for loops. But all the better if they get to see it here in the distribution code. COLTON OGDEN: Yeah, I think if you have a discreet way of knowing what the end result of your loop is, without having some non-deterministic thing happening in the loop-- which is normally infinite-- I think that's a great reason to use a for loop at all costs. I think it's easier to read. DAVID MALAN: But if you want us to sound smarter then, yeah, absolutely, this is a pedagogical decision. It's informed by years of experience. And we really just wanted students to glean some additional insight from our distribution code. COLTON OGDEN: I think it is illustrative of that paradigm, though. I think it's a good point. DAVID MALAN: I know. That's fair. But I think we should just 'fess up, that, well, we just did it that way because. COLTON OGDEN: It just happened to be what we felt was the best. DAVID MALAN: Indeed. And, amazingly, we've been talking so long, that Trebinator has implemented load in C. So congratulations. That's some really good progress. I suppose it helps that we're showing the logic in Python. But that's fine. Zamyla has walked through, similarly walks students through the process of thinking about how you load. But nicely done, Trebinator. COLTON OGDEN: I think somebody taking some Python and implementing in C is much more impressive than the opposite. DAVID MALAN: So for all those of you who have just done the opposite-- C to Python-- Colton is not impressed. Just kidding, just kidding. So why don't we go ahead and finish up this last for loop. And then I think I'll probably pull out some of the actual solution code, because it's going to get tedious to type out some of the last stuff, the printing stuff. So here we just claim if, at this point in our code, we have found a whole word, then we can go ahead and do the following-- elif index is greater than 0, thereby implying that we've accumulated at least one character, let's go ahead and increase our counter of words, which is a variable we initialized way up above to 0. Then let's do some timing. So the time before we're about to do this benchmark we'll call before again. And let's just call time process time with no arguments. Then let's go ahead and get the return value of check word, thereby checking if it's misspelled or not. And then let's get the time after-- time.process.time. And this is actually cool. Notice here that we did this one-liner. So we want a variable called misspelled to be true if the word is misspelled and false otherwise. The catch is that check returns true if a word has checked out and it's not misspelled. But that's OK. In C you can invert true and false, or false and true with a bang, or an exclamation point. In Python, you don't do that. You actually literally say not. And that reverses the logical effect from true to false or false to true. COLTON OGDEN: And a lot of people will read the line on the left as not check word either way. DAVID MALAN: Yeah, they'll still say not, even though in Python you literally have to write "not." COLTON OGDEN: We got a few-- it looks like Trebinator says-- DAVID MALAN: Hey, we rock. Nice. Or maybe that's for Trebinator. Good job. COLTON OGDEN: I missed initializing my letter array with-- [INTERPOSING VOICES] DAVID MALAN: I don't think it's Trebinator then. That's OK. That's OK, you caught it pretty fast. We'll be posting on Facebook. We'll take a look. This is our first one. We're going to do some post-processing. And just ping us on email or on Facebook to ask after this if you don't mind. COLTON OGDEN: I think it'll at least be on YouTube. DAVID MALAN: Yeah. All right. So let's finish this up. So I've just done a benchmark here with checking for misspelled. Let me do some total time checking here. So time check is going to now equal the difference between after minus before. That's the total amount of time. And we're doing this in our loop, which is why I'm now accumulating. We want to do this for every word, see how much time we've totally spent. And now here, if a word was misspelled, let's go ahead and print that word, which is required by the problem specification in the homework assignment itself. And then let's just increment misspellings by 1 to keep track of how many words are misspelled. And then let's get started to do another loop. Index and word should be reset to 0 and quote-unquote. Could do this in different ways, but I just adopted that little tuple approach. And then assuming all of this went super well, let's close that file. And as before, we could use context manager and let that happen for us. But we're trying to do a one-to-one mapping with the C code. And indeed, if I scroll down here, you'll see a called fclose, which is also checking. There's some f error code which isn't necessary in Python, but checking if there was an error, what went wrong. We would ideally do these in Python using exceptions anyway. You know what? We're almost at the bottom of the file. Let's take it home. COLTON OGDEN: I think we should. We should do it. DAVID MALAN: Let's make ourselves a spell checker here. COLTON OGDEN: Scotty says, "You guys are awesome." Thanks, Scotty. DAVID MALAN: All right. Here we go. Here we go. Don't say that until we get to the end here. Now let's go ahead and count how long it takes to compute the dictionary size, even though it's probably pretty fast since we're just checking a variable's value. N shall be the size, the result of calling our size function. And after, of course, becomes time.process.time. Then here let's store in our variable time underscore size, which we initialized earlier to 0, which strictly speaking wasn't necessary. We were just doing that for parity with the C version. Now let's go ahead and time the unload function, which should really be fast. Process.time. COLTON OGDEN: As an aside, for the P set, is there a maximum time that the students are given for their solution to work? DAVID MALAN: Not really. I mean, they're given a week to implement it. But the code itself can run as long as they'd like. Realistically, if they run it through check 50, our auto-grading infrastructure, probably gets stopped automatically after 30 seconds. But even with the slowest of data structures, like an array, that should be plenty of time, even to spell-check some of those larger files. We're really talking milliseconds, not so much seconds or minutes. After we've done this, let's do an after benchmark, calling time.process.time again. And then let's go ahead and check-- if not unloaded, let's go ahead and yell as much to the user-- could not load our dictionary, just as in C. COLTON OGDEN: 104's got a very subtle bug. DAVID MALAN: If not unload-- oh, you are right. I was accidentally checking a function value. That was not going to work. Nice catch. Let's go ahead and exit with this one there. Let's scroll up. And I honestly don't really want to type out all these prints in Python. This is very ugly looking. So just like Julia Child, we're going to go ahead and go in to our Python version here, which is in our speller.pi file. And voila, let's move it over to the right-hand side. Voila-- COLTON OGDEN: Magic. DAVID MALAN: Look at that. I suppose it's mostly copy-paste, although the format strings change and so forth. So if you're not familiar, let's point out a couple of things here. One, we're using print instead of print def. Two, we are using f strings or format strings to plug those things in there. And, three, we're using these format codes with f strings, where you specify the variable name that you want to print, colon, and then dot some number, which is the number of decimal points you want to place. Print-- 2, in this case-- and then f for a floating point value. And then the nicest line of all is this exit success. We don't need to do sys, since I didn't import sys in that way. You can just use exit in this case. But sys would be more proper version, I'd say. COLTON OGDEN: For folks who might have familiarity with older Python, but not 3.6 and over, the f string is a new feature. DAVID MALAN: It is. You need this in Python 3.6. COLTON OGDEN: Previously, you used percentage signs and the string.format function. And it was a lot uglier. But now you can just preface a string with f and then include the sort of bracket variables in there with format specifiers and it'll just work. DAVID MALAN: Indeed. And Scotty529 asks, "Are there any more in-person visits to Harvard available?" Indeed. Let me see if I can remember the URL off-hand. I think it's CS50 lectures 2018-- no. CS50 lectures-- there we go. Is this the right year? Let's just check. 2018. Yeah. So if you go to-- let me enhance-- picatic.com/cs50-lectures, you will see a landing page where you can sign up for free tickets to any of CS50's lectures that remain this fall. We only have a few left, if this is of interest to you, and you can travel to Cambridge or you'll be in the Boston area and want to drop by. They're all on Friday mornings. Thereafter, you can go on a free tour of the campus with some of our undergraduate friends, who are kindly offering those right after lecture itself, and then see a bit of the university too. Any other questions from folks here tuned in? COLTON OGDEN: We didn't run it. We got to run it, right? Are we going to run this? DAVID MALAN: Oh, you want to see that my code actually works? Oh, I missed-- did we miss that question? COLTON OGDEN: No. DAVID MALAN: OK. Now you're really putting me on the spot. Now we have to trust that the code actually works. So let's go ahead and open up a terminal window here. Let me go into, if you will, just the pre-baked version, just in case I did anything stupid here, and go into speller. In this directory, notice, I've got speller.py and dictionary.py from before. In dictionaries, recall that we have large and small. We use the big one. And if I do ls on text, we see a whole bunch of sample texts. Let's say, Shakespeare, which is a pretty big file. In fact, if you want to look at Shakespeare in the text directory, this will actually be a pretty big file, as indicated by the progress bar there. Come on, come on, come on. And you'll see the complete works of Shakespeare. So, mind you, there's going to be a lot of misspelled words that are really just old English. But that's fine, because we are spell-checking against something. So let's go ahead and run this. Let me go ahead and run Python, or Python 3, to be specific on, let's say, the large dictionaries. So we'll go into dictionaries-large, and then we'll run this on-- let's do this on texts and Shakespeare. Something went wrong. Wait. What am I doing? Oh, I'm not running-- I'm trying to interpret-- COLTON OGDEN: Can't interpret the dictionary file, right? DAVID MALAN: Yes, key step, key step. OK, so let's actually run speller.py. OK, that was going to be the worst ending to any stream here. And now let's do this on large. Aardvarks is not a Python command. Here we go. All right, here we go. We are spell-checking all these words coming out, recall, are the result of that print statement. Because I have a pretty tall screen right now and because this is a web-based IDE, my spell-checker seems to be god awful. But that's just an internet latency thing. It's just taking some time for all the print statements to get spit out. And, indeed, if you actually compute the total CPU time running that spell checker on the complete works of Shakespeare, it only took 1.7 seconds. The number of words that were misspelled was 45,000 or more. The number of words in that dictionary is indeed just over 140,000. The number of words in Shakespeare is a very talkative 904,000-plus words. And you can see a breakdown of the number of milliseconds we spent in each of those functions checking taking the most time. COLTON OGDEN: And those print statements-- those are all about misspelled words, correct? DAVID MALAN: All the misspelled words were what got spit out. And so if we actually hid that print statement, we would actually see these results much more quickly. It's just because of the internet speed and the printing thereof that creates the illusion that the spell-checker is slower than it is. But those running times are the result of time.process.time, which is indeed an accurate count. COLTON OGDEN: Based on the words that I saw coming up on the terminal, it looked like it was all pretty correct to me. DAVID MALAN: I think that's proof by correctness. COLTON OGDEN: Scotty says, "Do you guys play video games? What's your favorite game right now? Do any help you be a better programmer?" I'll let you answer first and then I'll follow up. DAVID MALAN: I don't really anymore. I think the last video game I played was with Colton. We played the newish Zelda-- COLTON OGDEN: Breath of the Wild-- DAVID MALAN: Breath of the Wild on Nintendo Switch. But I realized that if I only allowed mys-- we played until I essentially needed to fall asleep, at which point I decided, OK, that's it for Switch, Wii, Nintendo for the year. COLTON OGDEN: I'm pretty sure you fell asleep too. DAVID MALAN: Did I fall-- OK. So, yes. But it was fun. I think it was after last year's CS50 fair. We decided, no, we're going to get through the school year. Then we're going to play a few hours of Switch. COLTON OGDEN: Was it after the hackathon? I think-- DAVID MALAN: No, it wouldn't have been after the hackathon. COLTON OGDEN: Maybe it was after the fair. DAVID MALAN: I think it was after CS50 was all done. This is like a year ago now. COLTON OGDEN: Two years ago. It was 2016. DAVID MALAN: We didn't even play anything after last year's CS50 fair. COLTON OGDEN: No, you're right. It was like during J term of 2017. DAVID MALAN: All right, so the point is no, is really-- when I was a kid, I used to play Commodore 64, Macintosh games. We weren't allowed to have a Nintendo, but I used to play on my best friend's Nintendo when he let me borrow it. So-- COLTON OGDEN: That's rough. DAVID MALAN: So did any of that help me be a better programmer? Yes. You can-- yes. CS50 at Harvard says that playing video games makes you a better programmer. Honestly, no, not in my case. I don't think I played enough or thought about it to any kind of sophisticated level where that's an issue. But, Colton, maybe? COLTON OGDEN: Certainly, games back then were less flexible in that area. Scottie mentions, I heard you can program in Minecraft. That's absolutely true. There are certain mods in Minecraft-- one of them is called Computer Craft, which actually lets you program in Lua, which is pretty cool. You can actually program the game blocks in the world in Lua in real time. It was called Computer Craft. I'm not sure if it's been redone recently. This was several years ago. But there's games like SHENZHEN I-O, which is a hacking game. And TIS-800, I believe, are also another programming game. Very sort of-- well, not realistic, but kind of realistic. It requires the same logic. And I would say if you're looking for games that could help you program, definitely check those out. As for what I'm playing, right now, I'm playing Fallout New Vegas, which is not helpful for programming at all. DAVID MALAN: And Colton's being modest here. He's not plugged a course he teaches here at Harvard's Extension School, that's now freely available on edX. If you go to CS50.edx.org/games, that will whisk you away to this website here, where you can register for free. They want to know our location for some reason. But you can enroll now for free by clicking the audit option. And this will actually introduce you to Lua the programming language, Love 2D, which is a gaming library framework that you can use to make your own games, and, actually then, yes, claim it is homework to be playing games. COLTON OGDEN: True. Yeah, and I think, going forward too, I would definitely like to start streaming games made using these same technologies. So if any of those appeal to you, definitely check the course out. DAVID MALAN: And a question here-- "Guys, I'm planning to enroll in a computer science degree, but it will be after a year from now. I finished CS50. What course do you recommend me to study during my preparing to the CS degree?" Totally depends. A typical undergraduate here at Harvard, for instance, would generally take another software class, specifically a class in functional programming or object-oriented programming. And I think that's-- for those folks interested in software especially and introductory computer science-- it's a nice way to kind of counterbalance the very procedural programming that we do in a class like CS50, and a lot of courses also do, albeit some courses do focus on object-oriented stuff. But I also recommend of course by some of our friends at Princeton. There's a free introduction to algorithms class that's on Coursera. If I do a quick Google here-- Coursera with Kevin Wayne and his colleague on algorithms at Princeton-- this should be freely available, part 1 and part 2. You should be able to sign up with Bob and with Kevin here on Coursera.org, also for free. And this is more theoretically oriented class. But that also gives you another perspective as to what CS is and should definitely challenge you in a different way. COLTON OGDEN: Do you know Robert, by chance? Have you met him? DAVID MALAN: A little bit. We've met once or twice. But I know Kevin much better from past conferences. COLTON OGDEN: Interesting. I happen to know his-- I've seen his books, Robert's book. DAVID MALAN: Yeah, the two of them now actually writes quite a bit I think, together as well. COLTON OGDEN: Nice. And I think Trebinator mentioned-- you can program mods for it. I think you have to use Java. That is correct, yes, for Minecraft. DAVID MALAN: So please feel free to join us here again online. We're going to figure out what the next day and time is. We'll publicize it on the Facebook channel, Twitter, subreddit and the like. We'll figure out how to get some of these assets online too, if you want to follow along later today or thereafter. Certainly tell your friends if you'd like to join as well. COLTON OGDEN: Yeah, and definitely let us know what sorts of topics you all are interested in us going over and teaching. I know I'm going to probably cater towards game programming, but I'm more than happy to do like Python and web stuff, even though web's not necessarily as much of a strong suit. And I think we're also talking to Brian about him setting up-- I think Brian is going to maybe set up a microblog-- do a live microblog implementation. DAVID MALAN: Does Brian know this yet? COLTON OGDEN: Yeah, he does. DAVID MALAN: Oh, he does now. OK. He didn't this morning. COLTON OGDEN: Oh, no, I talked to him yesterday. DAVID MALAN: Oh, you did. Oh, OK, he knew as of yesterday. COLTON OGDEN: But, yeah, we have a bunch of ideas. But let us know what's obviously most valuable to you, so we can have a better idea of where to go. DAVID MALAN: Yeah, I think we're pretty much at the end point here. Special thanks to Dan, who's been off-camera here, helping us run the show. And all the credit for this idea goes to Colton, who spearheaded this idea with Twitch. So we're really glad you all tuned in today. COLTON OGDEN: Yeah, thanks, everybody for tuning in. It was a lot of fun. DAVID MALAN: Yeah. See you online soon.
B1 中級 SPELLER IS EASY (IN PYTHON) - CS50 on Twitch, EP.1 (SPELLER IS EASY (IN PYTHON) - CS50 on Twitch, EP. 1) 2 0 林宜悉 發佈於 2021 年 01 月 14 日 更多分享 分享 收藏 回報 影片單字