Placeholder Image

字幕列表 影片播放

  • And in fact, we can visualize this even a little metaphorically.

  • So for instance, here is, for instance, a mailbox.

  • And suppose that this is address 123.

  • What is in address 123?

  • Well it's a variable of type int, called n,

  • looks like it's storing the number 50.

  • Right?

  • We saw these letters--

  • these numbers last week.

  • So here's the number 50, which is an integer inside of this variable, today,

  • represented as a mailbox instead of as a locker.

  • Well suppose that this mailbox over here is not n but suppose this is p.

  • And it happens to be an address 456.

  • But who really cares?

  • If this variable p is a pointer to an integer, namely that one over there,

  • when I open this door, what am I going to find?

  • Well I'm hoping I find the equivalent of-- we

  • picked these up at the Coop earlier --the equivalent

  • of a conceptual pointer saying the number n is over there.

  • But what specifically, at a lower level, is actually inside this mailbox

  • if that variable n is at location 0x123?

  • What's probably inside this mailbox?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: Yeah, the address, indeed, 123.

  • So it's sort of like a treasure map if you will.

  • Oh, I have to go to 123 to get this value.

  • Oh, the integer in question is indeed 50.

  • And that's the fundamental difference.

  • This is the int that happens to be inside of this variable of type int.

  • This is the address that's a pointer that's in this other variable, p,

  • but that is conceptually, simply pointing from one variable

  • to another, thereby giving any sort of conceptual breadcrumbs.

  • And we'll see-- frankly, in one week --how amazingly powerful it is.

  • When you can have one piece of memory pointing at another,

  • pointing at another, pointing at another,

  • you can start to construct very sophisticated data structures,

  • as they're called, things like family trees,

  • and lists, and other data structures that you might have heard of.

  • Or even if you haven't, these will be the underpinnings next week

  • of all of today's fanciest algorithms used by,

  • certainly the Googles, and the Facebooks,

  • and the Microsofts of the world to manage large data sets.

  • That's where we're going next week, in terms of application.

  • So questions about that representation?

  • Yeah, in the middle.

  • AUDIENCE: Does that mean that your memory has to be twice as big?

  • DAVID J. MALAN: Sorry can you say it once more?

  • AUDIENCE: Is that to say your memory has to be twice as big to store pointers?

  • DAVID J. MALAN: Ah, really good question.

  • Is it the case that your pointers need to be twice as big?

  • Not necessarily, just, this is the way life is these days.

  • On most modern Macs and PCs, pointers use 64 bits-- the equivalent of a long,

  • if you recall that brief discussion in Week 1.

  • So I deliberately drew my pointer on the screen

  • here as taking up 8 bytes or 64 bits.

  • I've deliberately drawn my integer n as taking up 4 bytes or 32 bits.

  • That is convention these days on modern hardware.

  • But it's not necessarily the case.

  • Frankly, I could not find a bigger mailbox at Home Depot,

  • so we went with two identical different colored ones.

  • So metaphor is imperfect.

  • All right.

  • So moving from this to something more familiar now, if you will.

  • Recall that we've been talking about strings for quite some time.

  • And in fact, most of the interesting programs we've written thus far

  • involve maybe input from the human and some form of text

  • that you are then manipulating.

  • But string we said in Week 1 is a bit of a white lie.

  • I mean, it is the training wheels that I promised

  • we would start taking off today.

  • So let's consider what a string actually is now in this new context.

  • So if we have a string like EMMA here, declared in a variable

  • called s, and quote unquote, EMMA in all caps, as we've done a couple of times

  • now.

  • What does this actually look like inside of the computer?

  • Well somewhere in my computer's memory there are four, nay, five bytes,

  • storing E-M-M-A, and then additionally, that null terminating character that

  • demarcates where the end of the string is.

  • This is just eight individual 0 bits.

  • So that's where EMMA might be represented in the computer's memory.

  • But recall that the variable in question was s.

  • That was my string.

  • And so that's why over the past few weeks

  • any time you want to manipulate a string, you use its name, like s.

  • And you can access bracket 0, bracket 1, bracket 2, bracket 3,

  • to get at the individual characters in that string like EMMA, E-M-M-A,

  • respectively.

  • But of course it's the case, especially per today's revelation, that really,

  • all of those bytes have their own addresses.

  • Right?

  • We're not going to care after this week what those addresses are

  • but they certainly exist.

  • For instance, E might be at 0x123.

  • M might be at 0x124--

  • 1 byte away --0x125, 0x126, 0x127.

  • They're deliberately 1 byte away because remember a string is defined

  • by characters back-to-back-to-back.

  • So let's say for the sake of discussion that EMMA name in memory

  • happens to start at 0x123.

  • Well, what then really is that variable s?

  • Well, I dare say that s is really just a pointer.

  • Right?

  • It can be a variable, depicted here just as before, called s.

  • And it stores the value 0x123.

  • Why?

  • That's where Emma's name begins.

  • But of course, we don't really have to care about this level of precision,

  • the actual numbers.

  • Let's just draw it as a picture.

  • s is, if you will, a pointer to Emma's actual name in memory,

  • which might be down over here.

  • It might be over here.

  • It might be over here, depending on where in the computer's memory

  • it ended up by chance.

  • But this arrow just suggests that s is pointing to Emma, specifically

  • at the first letter in her name.

  • But that's sufficient though, right?

  • Because how-- if s stores the beginning of Emma's name, 0x123.

  • And that's indeed where the E is but we just

  • draw this pictorially with an arrow.

  • How does the computer know where Emma's name

  • ends if all it's technically remembering is the beginning?

  • AUDIENCE: The null terminating character.

  • DAVID J. MALAN: The null terminating character.

  • And we stipulated a couple of weeks ago that that is important.

  • But now it's all the more important because it turns out

  • that s, this thing we've been calling a string,

  • has no familiarity with MMA or the null terminator.

  • All s is pointing at technically, as of today,

  • is the first letter in her name, which happens to be in this story at 0x123.

  • But the computer is smart enough to know that if you just point it

  • at the first letter in a string, it can figure out

  • where the string ends by just looking--

  • as with a loop --for that null terminating character.

  • So this is to say ultimately, that there is no such thing as string.

  • And we'll see if this strikes a chord.

  • There is no such thing as a string.

  • This was a little white lie we began telling in Week 1

  • just so that we could get interesting, real work done, manipulating text.

  • But what is string most likely implemented as would you say?

  • AUDIENCE: An array of characters.

  • DAVID J. MALAN: An array of characters, yes.

  • But that was Week 1's definition.

  • What technically now, as of today, must a string be?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: Sorry, over here.

  • AUDIENCE: A pointer.

  • DAVID J. MALAN: A pointer.

  • Right? s, the variable in which I was storing

  • Emma's name would seem to manifest a pattern just

  • like we saw with the numbers a moment ago, the number 50.

  • s seems to be storing the address of the first character

  • in that sequence of characters.

  • And so indeed, it would seem to be a string.

  • Well, how do we actually connect these dots?

  • Well suppose that we have this line of code

  • again where we had int n equals 50.

  • And then we had this other line of code where we said,

  • go ahead and create a variable called p and store in it the address of n.

  • That's where we left off earlier.

  • But it turns out that this thing here is our data type from Week 1.

  • This thing here, int star, is a new data type as of today.

  • The variable stores, not an int, but the address of an int.

  • It turns out that something like this line of code, with Emma's name,

  • is synonymous with char star.

  • Right?

  • If a star represents an address and char represents the type of address being

  • pointed at, just as int star can let you point at a value like n--

  • which stored 50 --so could a char star--

  • by that same logic --allow you to store the address of and therefore

  • point at a character.

  • And of course, as you said, from Week 1, a string

  • is just a sequence of characters.

  • So a string would seem to be just the address of the first byte

  • in the sequence of characters.

  • And the last byte happens to be all 0s by convention, to help us find the end.

  • So what then more technically is a string

  • and what is the CS50 library that we're now going

  • to start taking off as training wheels?

  • Well last week we introduced you to the notion of typedef,

  • where you can create your own customized data type that does not exist in C

  • but does exist in your own program.

  • And we introduced this keyword, typedef.

  • We proposed last week that this was useful because you could actually

  • declare a fancy structure that encapsulates

  • multiple variables, like name and number,

  • and then we called this data structure, last week, a person.

  • That was the new data type we invented.

  • Well it turns out you can use typedef in exactly the same way

  • even more simply than we did last week by saying this.

  • If you say typedef char star string--

  • typedef means give me a new data type, just for my own use.

  • Char star means the type of value is going to be the address of a character.

  • And the name I want to give to that data type is going to be string.

  • And so literally, this line of code here, this

  • is one of the lines of code in CS50 dot h-- the header

  • file you've been including for several weeks,

  • where we are creating a data type called string

  • to make it a synonym for char star.

  • So that if you will, it's an abstraction,

  • a simplification on top of the idea of a sequence of characters

  • being pointed at by an address.

  • Any questions?

  • And honestly, this is why-- and maybe those sort

  • of blank stares --this is why we introduced strings in Week 1

  • as being an actual type as opposed to not existing at all.

  • Because who really cares about addresses and pointers

  • and all of that when all you want to do is like,

  • print, hello world, or hello, so and so's name?

  • Yeah, question.

  • AUDIENCE: What other-- what other functions are created--

  • major functions are created by CS50 are not intrinsic to--

  • DAVID J. MALAN: Really good question.

  • We'll come back to this later today.

  • But other functions that are defined in the CS50 library that

  • are training wheels that come off today are getString,

  • getInt, getFloat, and the other get functions as well.

  • But that's about it that we do for you.

  • Other questions?

  • Yeah.

  • AUDIENCE: Can you define all of these words again?

  • Like, it's-- so string is like a character pointer which points--

  • I was confused about that.

  • Can you repeat that?

  • DAVID J. MALAN: Sure.

  • A string, per this definition, is a char star, as a programmer would say.

  • What does that mean?

  • A string is quite simply a variable that contains the address of a character.

  • By our human convention, that character might be the beginning

  • of a multi character sequence.

  • But that's what we called strings in Week 1.

  • So a string is just the address of a single character.

  • And we leave it to human convention to know that the end of the string

  • will just be demarcated by eight 0 bits, a.k.a.

  • the null terminator.

  • And this is the sense in which-- especially

  • if you have some prior programming experience

  • --that C is much more low level.

  • In Python, as you'll soon see in a few weeks,

  • everything just works so splendidly easily.

  • If you want a string, you can have a string.

  • You don't have to worry about any of these low level details.

  • But that's because Python is built here, conceptually,

  • where C is built down here-- so to speak --closer to the computer's memory.

  • But there's no magic.

  • If you want to string, fine.

  • Just remember where it starts, remember where it ends.

  • And boom, you're done.

  • The star in the syntax today is just a way of expressing those ideas in code.

  • So let's go ahead then and experiment with this string,

  • just as we did a moment ago using Emma's name now instead of an int.

  • So let me go ahead and erase those lines earlier.

  • And let me go back to Week 1 style stuff, where I just say string s

  • equals quote unquote, Emma.

  • And then of course, if I to print this, I can simply say this as before.

  • So just as a quick safety check, let me go ahead and make address again.

  • Whoops.

  • What did I do wrong?

  • Let me scroll up to the first--

  • of many it seems --errors.

  • Yeah.

  • AUDIENCE: You're using string, [INAUDIBLE]

  • DAVID J. MALAN: Yeah, I kind of shouldn't have taken

  • off all the training wheels just yet.

  • I'm still using string.

  • So let me go ahead and put that back just for now.

  • That will give me access to that typedef for string.

  • Let me recompile it as make address.

  • That worked.

  • So that was the solution, thank you.

  • And then address again.

  • We just see Emma.

  • So what can we now do that's a little bit different here?

  • Well, one, you know what I can actually do?

  • I can get rid of this-- the solution a moment ago --and say,

  • I don't need string anymore.

  • I don't need those training wheels.

  • If s is going to represent a string, technically, s

  • is just going to store the address of the first character.

  • And it suffices actually, just to write this.

  • So literally instead of string, you write char star.

  • Technically, you don't need--

  • you can have extra space to the left or right.

  • But most programmers write it just as I have here, char star variable name.

  • That looks scarier now but it's no different from what

  • we've been doing for weeks.

  • If I now do make address without the CS50 library,

  • still works, because C knows what I'm talking about.

  • And if I run address now, I still see Emma.

  • But now I can start to play around.

  • Right?

  • If s is the address of a character, what was the format code

  • I can use to print an address?

  • Not percent i, but--

  • AUDIENCE: Percent p.

  • DAVID J. MALAN: Percent p, a pointer.

  • So let me go ahead and recompile this now.

  • Make address, that compiles too.

  • And when I run dot slash address, I'm not going to see Emma now.

  • What should I see instead?

  • Some address, right?

  • I have no idea what it is.

  • It looks like Emma's name is stored at 0x42A9F2,

  • whatever that number translates to decimal, somewhere

  • in the computer's memory.

  • But it turns out then too, what about this?

  • Let me go ahead and add another line of code and say,

  • you know what, I'm really curious now.

  • What is the address of the first letter in Emma's name?

  • How do I express in C, the first letter only of Emma's name

  • if Emma is stored in s.

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: s bracket zero, right?

  • That would seem to be that.

  • But that is what?

  • That's a char. s bracket 0 is a char.

  • How do I get the address of s bracket 0?

  • AUDIENCE: Ampersand.

  • DAVID J. MALAN: Yeah, I can just say ampersand.

  • Right?

  • So it's ugly looking but that's fine for now.

  • Make address, enter.

  • Whoops.

  • It's uglier because I forgot my semicolon.

  • Let me go ahead and make address again, enter.

  • Seems to compile.

  • And when I run dot slash address now, notice I get the same thing.

  • And this is because C is taking me literally.

  • When you print out s, a string, it's technically just the address

  • of the first character.

  • And indeed, I can corroborate as much by running s

  • bracket zero then get the address of the first character.

  • And they are indeed one in the same.

  • So a string is this sort of abstraction on top of a bunch of characters.

  • But again, s is just an address.

  • And that's all we're emphasizing now.

  • And if I get really curious-- not that you would necessarily

  • do this in a real program --what if I print

  • out a few more characters in Emma's name, like s bracket 1, 2, and 3?

  • Let me go ahead, just out of curiosity and make this program and dot slash

  • address.

  • Now notice what I see, is again, s's address is at 42AB52.

  • The first character in s is at the same thing,

  • by definition of what a string is.

  • And then notice what's kind of neat-- if this is--

  • if-- for some definition of neat --53, 54, 55 is noteworthy.

  • Why?

  • They're one byte apart.

  • So this whole time, whenever you implemented Caesar, or substitution,

  • or some other cipher in problem set two, anytime

  • you were manipulating individual characters-- you didn't know it

  • --but you were just visiting different mailboxes.

  • You were just visiting different addresses in the computer's memory

  • in order to manipulate them somehow.

  • All right.

  • Can I do one last demo that's a little arcane and then

  • we'll make things more-- more real?

  • All right.

  • So it turns out if all that's going on underneath the hood

  • is just addresses, watch what I can do here.

  • If I want to go ahead and print out what is at the address s,

  • what will I find in memory if I go to the address in s?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: Sorry, a little louder.

  • AUDIENCE: The first letter.

  • DAVID J. MALAN: The first letter in Emma's name, right?

  • If we can all agree-- even if it's a little unfamiliar still

  • --that s is just the address of a character, and I say, go to s,

  • what should I see specifically?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: Probably E in Emma Right?

  • If s is the address of the first character of her name,

  • star s would mean go to that character.

  • So let me go ahead and print that as a char.

  • So let me go ahead now and make address dot slash address, enter.

  • There is the E because I can say, go to that address and print what's there.

  • And I can actually do this for all of her letters in her name.

  • Let me go ahead and print out another one here.

  • So how do I get at the second letter in Emma's name?

  • Previous-- normally, like last week, we would have done this.

  • And that just magically gets you to the second letter in her name.

  • But I can do it a little differently.

  • What if I go to s and then, from where do I want

  • to go from s to get the second letter?

  • AUDIENCE: Plus one.

  • DAVID J. MALAN: Plus one, right?

  • I mean, maybe we can literally just do arithmetic here.

  • If s is the address of her first letter, it stands to reason that s plus 1

  • is the address of her second letter.

  • So make address now dot slash address.

  • And I should see EM.

  • And I can do this twice more maybe and go ahead and do this and then this.

  • But this time add 2 and this time add 3, just doing some simple arithmetic.

  • Make address dot slash address, there is Emma but in a much lower level detail.

  • So what is this bracket symbol?

  • In computer science, this is what's called syntactic sugar.

  • It's kind of a silly name.

  • But it just refers to a handy feature so that you, the programmer, can say,

  • s bracket 0 or bracket 1.

  • But what the computer is actually doing underneath the hood-- the compiler,

  • Clang --it's actually converting all of your uses of square brackets since Week

  • 1 to this format here.

  • It's just doing arithmetic underneath the hood.

  • Now you don't have to do this moving forward.

  • But I point out this low level detail just to give you a sense of,

  • there really is no magic.

  • When you say, go print an address or go do this,

  • the computer is taking you literally.

  • Whew.

  • OK, that was a lot.

  • Yes, question.

  • AUDIENCE: So [INAUDIBLE]

  • DAVID J. MALAN: Star s would mean go to the address in s.

  • AUDIENCE: So why for instance, if you [INAUDIBLE] character [INAUDIBLE]

  • DAVID J. MALAN: Really good question.

  • Why, when you print out s, does it print out the whole string and not

  • just the character?

  • That's what the printf format code is doing for you.

  • When you tell printf to use percent s, that has special meaning to printf.

  • And it knows to go to the first address and not just print

  • the second-- the first char, but print every character thereafter

  • until it sees what?

  • AUDIENCE: The null terminator.

  • DAVID J. MALAN: The null terminating character.

  • So printf and percent s are special and have been special since the Week 1.

  • They just know to do exactly what you've described.

  • So pointer arithmetic, to be clear, is just taking addresses and like,

  • doing arithmetic with them, adding 1, adding 2, adding

  • 3, or any other manipulation like that.

  • All right.

  • So [CHUCKLE] let's take another stab at a meme here.

  • [CHUCKLE]

  • OK, a few of us.

  • All right.

  • All right, it's trying too hard.

  • All right.

  • So what then do we have when it comes to strings?

  • Well, let's now try to learn from these primitives

  • and actually trip over some mistakes that we might otherwise make.

  • I'm going to go ahead and open up a new file.

  • I'm going to go ahead and call this one, compare.

  • So we'll save this as compare dot c.

  • And this will be reminiscent of something we started doing last week.

  • And you've done this past week, particularly for implementing

  • voting and comparing strings.

  • I'm going to go ahead and make a quick program that

  • just compares two integers.

  • I'm going to put the training wheels back on temporarily,

  • just so that we can get some numbers from the user

  • pretty easily, including CS50 dot h and standard I/O dot h.

  • I'm going to do int main void as my program.

  • I'm going to get an integer called i and ask the human for that.

  • I'm going to get another integer called j, ask the human for that.

  • And then I'm going to go ahead and say if i equals equals j,

  • then go ahead and print with printf that they're the same.

  • Else, if i does not equal j, I'm going to go ahead quite simply and print out

  • different backslash n.

  • So if i equals equals j, it should say, same.

  • Else, if it's different, it should say different.

  • So let me go ahead and make compare dot slash compare.

  • And I should see, hopefully, if I type in say, 1, 2, they're different.

  • And if I instead do 1, 1, they're the same.

  • All right.

  • So it stands to reason that logically this is pretty straightforward when

  • you want to compare things.

  • So instead of using numbers, let me go ahead and change this.

  • Let me go ahead and do, say, string s gets getString, just as before

  • but using getString instead and ask the human for s.

  • Then give me another string, t, just because it's alphabetically next.

  • And I'll ask the human for t.

  • And then I'm going to go ahead and ask this question, if s equals

  • equals t, print same, else, print different.

  • So now let me go ahead and make compare again.

  • I'm going to go ahead and type in dot slash compare.

  • We'll type in Emma.

  • We'll then type in Rodrigo.

  • And of course, it's going to say different.

  • But if I instead run it again and type in Emma and all right,

  • I'll type Emma again--

  • hmm, different.

  • Maybe it's a capitalization thing?

  • No.

  • But why as of today, are they indeed different?

  • Last week we kind of waved our hands and said, ah, they're arrays,

  • you have to do some stuff.

  • But why are they different?

  • AUDIENCE: They're stored in different locations.

  • DAVID J. MALAN: Exactly, they're stored in different locations.

  • So when you get a string with getString and call it s, and then you

  • get another string with t and call it t, you're

  • getting two different chunks of memory.

  • And yes, maybe the human has typed the same thing into the keyboard,

  • but that doesn't necessarily mean that they're going

  • to be stored in the exact same place.

  • In fact, what we really have here is a picture not unlike this.

  • If I have a variable called s-- and I'm just going to draw it as a box there

  • --and if I have a variable called t--

  • I'll draw it as another box here --and I typed in Emma--

  • E-M-M-A --that's going to give me somewhere in memory,

  • E-M-M-A backslash 0.

  • And I'll try it as an actual array, albeit a little messily.

  • And then here, if I type EMMA again in all caps,

  • it's going to end up-- thanks to getString,

  • at a different location in memory.

  • By nature of how getString works, it's going to store anything you type in it.

  • And what's going to get stored in s and t?

  • Well, for the sake of discussion, let's suppose that

  • this chunk of memory with the first input--

  • sorry --happens to be at 0x123.

  • And the second chunk of memory happens to be at 0x456, just by chance.

  • Well, what am I technically storing in s?

  • 0x123.

  • And what am I storing in t?

  • 0x456.

  • So when you say, is s equal equal t.

  • Is it?

  • Well, no.

  • You're literally comparing 123 versus 456.

  • The computer is not going to presumptuously go

  • to that address for you unless you somehow tell it to.

  • Put another way, if I instead draw these boxes,

  • not as actual numbers, what we really have-- sorry --what we really have is

  • what we'll draw as an arrow more generally,

  • just a pointer to that value.

  • Who really cares where the address is?

  • So this is why last week we kind of waved our hand and said,

  • eh, you can't just compare two strings because you probably

  • have to compare every character.

  • And that was true.

  • But what you're technically comparing is indeed

  • the addresses of those two variables.

  • Any questions then on this here?

  • Yeah.

  • Sure, yes.

  • AUDIENCE: So you said earlier that the, I

  • guess, the pointer, and the actual thing it's

  • pointing are like kind of somewhere in the memory not in a specific--

  • they're just somewhere, right?

  • DAVID J. MALAN: OK.

  • AUDIENCE: So do you need something that points to the point--

  • how does the computer know where the pointer is?

  • DAVID J. MALAN: Oh, how does the computer know where these pointers are?

  • So that's a really good question.

  • And let's answer it right here.

  • All this time when you've been calling getString to get a string,

  • you've probably been assigning it to a variable like I have here on line six,

  • with string s.

  • But we know as of today that if we get rid of the CS50 library, technically,

  • string is just synonymous with char star.

  • And so both here and with t, do you technically have char star, right?

  • It's just a find and replace if we get rid of that training wheel.

  • Char star just means s is storing the address of a character.

  • And char star t means t is storing the address of a character.

  • Ergo, all this time since the Week 1 of CS50, what type of value has getString

  • been returning, even though we never described it as such?

  • What must getString be returning?

  • Yeah.

  • AUDIENCE: The index of the first letter.

  • DAVID J. MALAN: Not even the index per se, but rather the--

  • AUDIENCE: It houses the memory of that.

  • DAVID J. MALAN: The address of the first character.

  • So anytime you called getString, getString code we wrote

  • is finding in your computer's memory some free space,

  • enough bytes to fit whatever the word was that got typed in.

  • getString then, if we looked at its code,

  • is designed to return the address of the first byte of that chunk of memory.

  • So getString, this whole time, has been returning, if you will,

  • what's called a pointer.

  • But again, nuances that we didn't want to get into in the very first week

  • certainly, of C programming.

  • All right.

  • Well, let's go ahead and make this a little more concrete.

  • If I pull up this code, I don't have to just check

  • if they're same or different, let me just go ahead and print them out.

  • If I do percent p backslash n, I can literally print out s.

  • And if I go ahead and print out the same thing for t using percent p,

  • I can print out the value of t.

  • So let me go ahead and make compare.

  • Seems to compile OK.

  • And I don't know what the addresses are in advance.

  • But let me go ahead and type in, for instance, Emma and Emma.

  • So even though those strings look the same notice,

  • it's a little subtle this time, the first Emma's at 0xED76A0.

  • The second Emma's at 0xED76E0, which is a few numbers away from the first Emma.

  • So that just corroborates the instincts last week

  • that we can't just compare them like that.

  • So what are the implications then?

  • Let's do one other example here.

  • Let me go ahead and save this as copy dot C.

  • And let's try a very reasonable goal.

  • If I want to go ahead and get the user's input and actually copy a string

  • and capitalize the string from the user, let's see this.

  • So let me go ahead and give myself the temporary training wheels again, just

  • so I can get a string from the human.

  • Let me go ahead and include standard I/O dot h and then an int main void.

  • Let me do a simple example, the goal of which

  • now, is to get a string from the user and capitalize a copy thereof.

  • So I'm going to go ahead and do string s gets getString and call it s,

  • as before.

  • I'm going to go ahead and then do string t equals

  • s to make a copy of the variable.

  • And then I'm going to go ahead and say what?

  • Let me go ahead and capitalize the copy.

  • And to capitalize the copy, I can just change the first character

  • in t, so t bracket 0, to what?

  • I think we had toupper a while back.

  • Does this seem familiar?

  • You can call the toupper function.

  • And the toupper function, if you don't recall,

  • you technically have to use C type dot h.

  • This might be reminiscent of the second c problem

  • set, where you might have used this in Caesar, or substitution, or the like.

  • All right.

  • And now, let me go ahead and print out these two strings.

  • Let me go ahead and print out s.

  • And let me go ahead and print out t.

  • So again, all I've done in this program is get a string from the user,

  • copy that string, capitalize the copy called t.

  • And let's just print out the end results.

  • So let me go ahead and save the file.

  • Let me go ahead and make copy.

  • Seems to compile OK.

  • Let me go ahead and run copy.

  • And let me go ahead and type in emma, in all lowercase,

  • deliberately, because I want to see that t is capitalized but not s.

  • Hmm.

  • But somehow they're both capitalized.

  • Notice, that emma in all lowercase ended up being both capitalized in s

  • and capitalized in t per the two lines of output.

  • That's a bug?

  • Right?

  • I only capitalized t, how did I accidentally

  • also capitalize s do you think?

  • Any thoughts?

  • Doesn't matter if I avert the lights, I still can't see any hands.

  • OK, how about here in front?

  • Yeah.

  • AUDIENCE: So when you say t equal s you have to [INAUDIBLE]

  • DAVID J. MALAN: Exactly.

  • When I say t equals s on this line, I am getting a second variable called t.

  • And I am copying s.

  • But I'm copying s literally.

  • s as of today, is an address.

  • After all, string is the same thing as char star for both s and t.

  • And so technically, all I'm doing is copying an address.

  • So if I go back to my picture from before, this time,

  • if I've gone ahead and typed in an array of emma, with all lowercase--

  • e-m-m-a --and then a backslash 0, somewhere in memory using getString,

  • and I've gone ahead initially and stored that in a variable called s--

  • and I don't care about the addresses anymore.

  • I'm just going to use arrows now to depict it graphically.

  • When I created a second variable called t and I set t equal to s,

  • that's like literally copying the arrow that's in s

  • and storing it in t, which means t is also pointing at the same thing.

  • Because again, if I didn't do this hand wavy arrow notation,

  • I literally wrote out 0x123.

  • I would have just written out 0x123 in both s and t.

  • So when, in my code, I go ahead and say, you

  • know what, go to the first character in t and then go ahead and uppercase it.

  • Guess what the first character in t is?

  • Well, it's this e.

  • But guess what the first character in s is, literally that same e.

  • So this does not suffice to copy a string

  • by just saying t equals s, as it has up until now with every other variable.

  • Any time you've needed a temporary variable or a copy of something

  • this worked.

  • Intuitively, what do we have to do probably instead

  • to truly copy Emma into two different places in memory?

  • Yeah.

  • AUDIENCE: Probably create a char or create a variable exactly the same size

  • and copy each character individually.

  • DAVID J. MALAN: Nice.

  • So maybe we should give ourselves a variable that

  • has more memory, the same amount of memory being stored

  • for the original Emma, and then copy the characters from s

  • into the space we've allocated for t.

  • And so we can actually do this.

  • Let me go ahead and get rid of all but that first line, where

  • I've gotten s as before.

  • And I'm going to go ahead and do this, I'm to say that t is a string--

  • but you know, we don't need that training wheel anymore.

  • String, char star, even though it looks uglier.

  • Let me go ahead and allocate more memory for myself.

  • How do I do that?

  • Well, it turns out-- we've not used this before --there's a C function called

  • malloc, for memory alloca.

  • And all it asks as input is how many bytes you want.

  • So how many bytes do I want for Emma to store her name?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: I heard 4, 5.

  • Why, 5?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: So we need the null terminating character,

  • e-m-m-a and then backslash 0.

  • So that's 5.

  • So I could literally hard code this here.

  • Of course, this feels a little fragile because I'm

  • asking for any string via getString.

  • I don't know it's going to be Emma.

  • So you know what, let me go ahead and ask a question?

  • Whatever the length is of the human's input in s,

  • go ahead and add 1 to it for the null character

  • and then allocate that many bytes.

  • So now my program's more dynamic.

  • And once I have this, well, how can I go ahead and copy this?

  • Well, let me just do old school loop.

  • So for int I get 0, i is less than the string length of s, i plus plus--

  • so this is just a standard for loop iterating over a string

  • --and I think I can just do t bracket i equals s bracket i in order

  • to copy the two strings.

  • There's a subtle bug and a subtle inefficiency though.

  • Anyone want to critique how I've gone about copying s into t?

  • Yeah.

  • AUDIENCE: [INAUDIBLE] getString [INAUDIBLE]..

  • DAVID J. MALAN: Yeah.

  • This was inefficient.

  • We said a couple of weeks ago this is bad design

  • to just keep asking the question, what's the length the s?

  • What's the length of s?

  • So remember that we had a little optimization a couple of weeks ago.

  • Let's just declare n to equal the string length of s

  • and then do a condition of i is less than n.

  • So we've improved the design there.

  • It's a little more efficient.

  • We're wasting less time.

  • There's still a subtle bug here.

  • How many byte-- yeah.

  • AUDIENCE: Aren't you not copying the null terminator

  • DAVID J. MALAN: I'm not copying the null terminator.

  • So every other time we've iterated over a string, this has been correct.

  • Iterate up to the length but not through the length of that string.

  • But I technically do want to go one more step

  • this time, or equivalently, one more step.

  • Because I also want to copy not just e-m-m-a, which is str length 4--

  • e-m-m-a is 4 --I also want to do it a fifth time for the null character.

  • So in this case, I'm deliberately going one step

  • past where I usually want to go to make sure I copy 5 bytes for Emma,

  • not just 4.

  • All right.

  • Let's go ahead now and capitalize Emma.

  • So t bracket 0 gets toupper of Emma's first character in the copy.

  • And now let's go ahead and print out both strings s

  • and t, just as before, with percent s of t.

  • And let me make one change, I use strlen now.

  • So I know I'm going to get an error if I don't do this.

  • I need to use string dot h-- recall --anytime you use string length.

  • So I'm going to go proactively add that.

  • So what's different?

  • This line is the same as before.

  • I'm getting a string from the user.

  • This line is the same as before.

  • I'm capitalizing the first letter.

  • And these two lines are the same.

  • I'm just printing out s and t.

  • So the new idea here is, with my malloc, am I allocating as many bytes

  • as I need to store a copy of Emma, and then with this for loop

  • am I actually doing the actual copy?

  • Let me go ahead and do make copy again.

  • Seems to run OK.

  • Run dot slash copy.

  • Type e-m-m-a in all lowercase.

  • And voila, now I've capitalized t but not s.

  • Yeah?

  • AUDIENCE: When you use malloc, it's just allocating number of bytes,

  • it doesn't matter where?

  • DAVID J. MALAN: It is just allocating that many bytes for you.

  • It does not matter where.

  • You indeed should not care where it is because you're just

  • being handed the address and using C code,

  • can you just go there as you want.

  • All right.

  • Let's clean this up too.

  • Surely, people copy strings for years.

  • And in fact, we don't need to do this for loop ourself.

  • It turns out we can simplify this code a little bit by enhancing this

  • as follows.

  • It turns out, if you look in the manual page for strings,

  • you can actually use something called strcopy--

  • no-- without any vowels.

  • And you can copy into t, the contents of s.

  • strcpy is a function written a long time ago by some other human.

  • And they went ahead and implemented, probably, that loop for us.

  • And it tightens up our code here a little bit more.

  • AUDIENCE: Professor?

  • DAVID J. MALAN: Yeah.

  • AUDIENCE: What if I forgot to copy in the null character at the end?

  • DAVID J. MALAN: Really good question.

  • What if you forgot to copy in the null character at the end?

  • It is unclear what would happen.

  • If there just happened to be some bits in that location in memory

  • from earlier-- from some other part of your program

  • --and you try printing out s and printing out t,

  • you might print out many more characters than you actually

  • intended-- if there's no backslash 0 actually there.

  • We'll see this more and more.

  • Anytime you don't initialize the value of a variable,

  • it's what's called a garbage value, which means

  • who knows what 0s and 1s are there.

  • You might get lucky and it's all 0s.

  • But most likely it's going to print some garbage value instead.

  • All right.

  • Any questions on this?

  • Yeah.

  • AUDIENCE: Is the string length function only in the CS50 library?

  • DAVID J. MALAN: Is the--

  • which function?

  • AUDIENCE: String length.

  • DAVID J. MALAN: Oh, strlen, no, that's in string dot h.

  • That is a standard C thing.

  • AUDIENCE: OK.

  • If string length is a standard function but strings are not--

  • DAVID J. MALAN: So what's the dichotomy here then?

  • If strings don't exist--

  • as I've noted multiple times.

  • And yet, there's functions like strcpy and strlen --what's going on?

  • C calls them char stars.

  • It is c that does not call them strings.

  • We, CS50, and the world in general, calls addresses

  • of sequences of characters, strings.

  • So the only training wheel here, really is the semantics.

  • We gave you a data type called string so that in the first week of C and CS50,

  • you don't have to see or type char star, which would arguably

  • be a lot more cryptic so early on.

  • It's arguably a bit cryptic today too.

  • Other questions?

  • All right, yeah.

  • AUDIENCE: So is char star ID type [INAUDIBLE]

  • DAVID J. MALAN: Is--

  • say that once more.

  • AUDIENCE: Char star ID type [INAUDIBLE].

  • DAVID J. MALAN: Not all of them, but any of them that take a string, yes.

  • In fact, any time you have seen us or TF in CS50 say string,

  • you can literally, starting today, change that expression to char star

  • and it will be one and the same.

  • Phew.

  • OK.

  • That was a lot.

  • Let's take our five minute break here with cookies outside.

  • All right.

  • So we are back.

  • That was a lot.

  • Let me draw our attention to what the newest feature was just

  • a moment ago, this notion of malloc, memory allocation.

  • So recall that getString I claim as of today, all this time,

  • it's just returning to you the address of the string

  • that was gotten from the human.

  • malloc, similarly, has a return value.

  • And when you ask malloc for this many bytes-- maybe it's five, for emma,

  • plus the null terminator, malloc's purpose in life

  • is to return to you the address of the first byte of that memory as well.

  • So memory alloc means, go get me a chunk of memory somewhere,

  • hand me back a pointer there too.

  • And the onus is on me to remember that address,

  • as I'm doing here, by storing it in t.

  • But it turns out, now that we're taking the training wheels off,

  • unfortunately, we have to kind of do a bit more work ourselves.

  • And there's actually a latent bug in this program.

  • It turns out that I am mal-allocating memory with this

  • but I'm never actually freeing it.

  • The opposite of malloc is a function called free, whose purpose in life

  • is to hand back the memory that you asked for so that you

  • have plenty of memory available for other parts of your program

  • and so forth.

  • And long story short, if you've ever-- on your Mac or PC

  • --been running a program that maybe is a little bit

  • buggy --you might notice your computer is getting slower, and slower,

  • or maybe it even runs out of memory explicitly,

  • per some error message --that might be quite

  • simply, that the programmer of that program kept using mallc,

  • and malloc, and malloc to grow, and grow, and grow their use of memory,

  • but they never got around to freeing any of that memory.

  • So programs can run out of memory.

  • Your computer can run out of memory.

  • So it's good practice, therefore, to free any memory you're not using.

  • However, how do you find this mistake?

  • So we've got one final debugging tool for you.

  • This one's not CS50 specific like debug50.

  • This one is called Valgrind.

  • Unfortunately, it's not the easiest thing to understand at first glance.

  • So I'm going to go ahead and do this.

  • I'm going to run Valgrind on this program, dot slash copy, and hit Enter.

  • And unfort--

  • AUDIENCE: [INAUDIBLE]

  • [CHUCKLE]

  • [COUGH]

  • DAVID J. MALAN: Gotcha.

  • OK.

  • I'm going to go ahead and--

  • there we go.

  • AUDIENCE: [INAUDIBLE]

  • So what you missed was a very scary message.

  • So I'm going to go ahead and run Valgrind on dot slash copy.

  • We see this esoteric output up top and then my prompt for s--

  • because it's the same program.

  • It's prompting me for a string --so I'm going to give it emma, all lowercase,

  • and enter.

  • And you'll notice now, that there's some summary going on here

  • but also some mention of error.

  • So heap summary-- we'll come back to that in a bit --5 bytes in 1 blocks

  • are definitely lost in loss record 1 of 2.

  • Leak summary, I've got 5 bytes leaking in 1 blocks.

  • I mean, this is one of these programs in Linux-- the operating

  • system that we use, that's quite common in industry too --I mean, my god.

  • There's so-- there's so many more characters on the screen that

  • are actually enlightening for me.

  • Let's see if we can focus our attention on what matters.

  • Memory leaking, bad.

  • So how do we go about chasing down where memory is leaking?

  • Well, as before, we can use help50.

  • And in fact, help50 will analyze the output of Valgrind-- it's still

  • going to prompt me first string.

  • So I'm going to again, type in emma --it's going to look at that.

  • It's to ask for help.

  • And voila, highlighted in yellow, is a message that we, help50, recognize.

  • And notice our advices, looks like your program leaked 5 bytes of memory.

  • Did you forget to free memory that you allocated via malloc.

  • Take a closer look at line 10 of copy dot C.

  • Now once you've done this a couple of times and made the same mistake,

  • you can probably scroll up and glean for yourself where the error is.

  • We're not revealing any more information than is right in front of you.

  • And in fact, you can see here, ah, in main on copy dot C, line 10,

  • there's some kind of 5 bytes in 1 blocks are definitely lost.

  • So there's a lot of words there but it does draw attention to the right place.

  • So let me go ahead and scroll down, focus on line 10.

  • And indeed, line 10 is where I allocated the memory.

  • So it turns out the solution for this is quite simple.

  • Down here, I'm just going to go ahead and free

  • t, the address of the chunk of memory that malloc returned to me.

  • So I'm undoing the effects of allocating memory by de-allocating memory.

  • So now let me go ahead and run copy.

  • And if I run copy, it's not going to seem to run any differently.

  • It's still going to work correctly.

  • But now if I analyze it for mistakes with Valgrind, so Valgrind of dot

  • slash copy--

  • I'm going to again type in emma in all lowercase

  • and I cross my fingers --that indeed now, leaked

  • summary, 0 bytes in 0 blocks.

  • So unfortunately, even when all is well, it still spits out a mouthful.

  • But now I see no mention of blocks that are actually

  • leaked, at least in the top part here.

  • And we'll see more of this over the next couple of weeks

  • as we use it to chase down more complicated bugs.

  • But it's just another tool in the toolkit that allows

  • us to detect these kinds of errors.

  • Let me try one other thing actually.

  • This is a program that I wrote in advance.

  • This one is called memory dot C. And as always, these are all on the course's

  • website if you'd like to tinker after.

  • And it's a little pointless.

  • It's just meant for demonstration purposes.

  • So here is a program.

  • And it's copied from this online manual for Valgrind, the tool I just used.

  • So let's see what's going on.

  • Here I have main, at the bottom of my code.

  • I copied it.

  • I didn't use a prototype.

  • I just copied what they did.

  • And see here, it calls a function called f and then returns 0.

  • Well what does f do? f is this random function

  • up here that takes no inputs per the void.

  • And in English, how would you describe what's happening in line 7 now--

  • that we've introduced malloc and stars--

  • or pointers?

  • What's this doing?

  • Yeah.

  • AUDIENCE: It's allocating enough memory in [INAUDIBLE] for [INAUDIBLE]..

  • DAVID J. MALAN: Good.

  • Allocate enough memory for 10 integers-- and then

  • let me add-- elaborate on your words --and then

  • store the address of that chunk of memory

  • in a pointer called x, if you will.

  • So sizeof is new.

  • But it literally does what it says.

  • If you say sizeof open paren, close paren,

  • and then the name of a data type, it will tell you that an int is 4 bytes.

  • It will tell you that a long is 8 bytes.

  • It will tell you that a char is one byte.

  • It's just a dynamic way of avoiding having

  • to memorize those kinds of things.

  • So this just means give me 10 times the size

  • of an int, which happens to be 4 bytes.

  • So that means give me 10 times 4, or 40 bytes of memory.

  • That's effectively an array of memory that I can store integers in.

  • And malloc, per its definition, is going to return to me the address

  • of the first byte of that memory.

  • What is now scary about line 8, relatively speaking?

  • What might worry you with line 8, which is buggy, unfortunately?

  • Yeah.

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: Exactly.

  • I'm doing x bracket 10 and just arbitrarily storing the number 0.

  • Why?

  • Just because.

  • But 10 does not exist.

  • Right?

  • If I have 10 int, it's bracket 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, not bracket 10.

  • So this is an example of overflowing a buffer, so to speak.

  • Anytime you're talking about memory, any time

  • you're talking about an array of memory-- which this effectively

  • is, 10 integers, room for 10 integers back to back

  • to back --if you go one step too far, that's what's called a buffer overflow,

  • whereby the buffer is the array.

  • And in fact, this would make it even more clear.

  • Suppose I tried to go there, bracket 10,000.

  • That is definitely not among the bytes of memory I allocated.

  • That's definitely going beyond the boundaries of my array.

  • But so is it true that bracket 10 is one step too far.

  • So what's nice about Valgrind is this.

  • Let me go ahead and rerun Valgrind after compiling this memory program--

  • whoops --in my source directory.

  • Let me go ahead and make memory.

  • All right.

  • It compiled OK.

  • Valgrind dot slash memory--

  • and unfortunately, we're going to see some crazy arcane error

  • messages for a moment.

  • But let's see what it says.

  • Notice here, invalid write of size 4--

  • that sounds bad --and 40 bytes in one blocks are--

  • OK, they didn't really add an if condition in Valgrind.

  • --40 bytes in 1 blocks--

  • plural --are definitely lost.

  • So let's fix the second of those first.

  • Why am I leaking 40 bytes exactly?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: I'm never freeing it.

  • So I think I can get away with just doing this here, just free the memory

  • after I'm done using it-- even though I'm not really

  • using it for anything purposeful here.

  • So let me try this again.

  • Make memory, now let me do Valgrind dot slash memory.

  • And-- OK, better.

  • I don't see 40 bytes lost anymore.

  • So that's good.

  • But I do still have this issue.

  • But here's where it's sometimes useful to understand the various data

  • types and their sizes.

  • Invalid write of size 4.

  • Writing in a program just means changing a value.

  • And it mentioned line 8 here.

  • In what sense is this an invalid write of size 4?

  • Well, how big is an int?

  • Four bytes.

  • You're trying to change it arbitrarily to 0.

  • But I could have made that 50 or any other number.

  • But I'm trying to touch an int that should not

  • be within the memory I have allocated for myself.

  • I asked for 40 bytes, or 10 ints, but again, because arrays are zero indexed,

  • this is like going one beyond the boundary.

  • So let me fix this and just arbitrarily say, let's go touch that part of it.

  • Let me go here and do make memory.

  • Let me go ahead and do Valgrind dot slash memory.

  • And now, arcane output aside, notice that that error message went away too.

  • So this will be helpful over the coming couple of weeks

  • as we continue to use C to implement a number of programs

  • that now start to manipulate memory.

  • It's just a tool that helps you spot errors

  • that certainly, your TF might otherwise, or that

  • might be causing your program to crash, or to freeze, or to segfault--

  • if you've seen that yourselves before.

  • All right.

  • So that's just a tool.

  • Let's go ahead and transition now to some actual use cases here.

  • Recall from last week that it was pretty useful to be able to swap values.

  • Right?

  • With bubble sort, with selection sort, we needed to be able to exchange values

  • so that we could put things into the right place.

  • Turns out this is pretty straightforward.

  • Right?

  • And we can actually mimic this in the real world.

  • We just have opportunity for one volunteer today, one volunteer.

  • Can we get a little--

  • OK, over here.

  • Yeah.

  • What's your name?

  • FARRAH: Farrah.

  • DAVID J. MALAN: Sorry.

  • FARRAH: Farrah.

  • DAVID J. MALAN: Vera.

  • FARRAH: Farrah.

  • DAVID J. MALAN: Oh, here, come on up.

  • Then I can hear you up here.

  • OK, what's your name?

  • FARRAH: Farrah.

  • DAVID J. MALAN: Vera.

  • FARRAH: With an F.

  • DAVID J. MALAN: Fera.

  • FARRAH: Farrah.

  • DAVID J. MALAN: Farrah.

  • FARRAH: Yes.

  • DAVID J. MALAN: Farrah.

  • Yes, OK.

  • Good.

  • Come on up.

  • Still come up.

  • [CHUCKLE] Thank you.

  • Thank you.

  • [APPLAUSE]

  • [CHEERS]

  • OK, nice to meet you.

  • FARRAH: Hi, nice to meet you.

  • DAVID J. MALAN: Farrah.

  • FARRAH: Yes.

  • DAVID J. MALAN: OK.

  • So let's go ahead here.

  • Let me give you this so that you can be mic'd as well.

  • OK, so the goal at hand is here, I have two glasses of colored water.

  • So we have some purple here.

  • [WATER POURING]

  • OK.

  • And we've got some green here.

  • [WATER POURING]

  • And the only goal at hand is to do a very simple operation

  • like we needed to do quite a bit last week,

  • which is to swap two variables just like we swap two numbers.

  • So if you could go ahead and get the purple liquid in here

  • and the green liquid in here, go.

  • [CHUCKLE]

  • FARRAH: Is it OK if they overlap?

  • DAVID J. MALAN: Ideally, no.

  • We want to put only purple here, and only green here,

  • and no temporary store.

  • [LAUGHTER]

  • FARRAH: Oh.

  • DAVID J. MALAN: OK, but you're hesitating.

  • Why?

  • FARRAH: Because you told me they couldn't touch [CHUCKLE] and--

  • DAVID J. MALAN: Well, you can touch the glasses.

  • But you're hesitating to swap them, why?

  • [CLINK]

  • OK, that's just cheating.

  • [LAUGHTER]

  • [APPLAUSE]

  • [CHEERS]

  • OK, very clever.

  • Supposing you can't just move things around

  • in memory, what if I gave you a temporary variable.

  • FARRAH: OK.

  • DAVID J. MALAN: Does this help?

  • FARRAH: Yes.

  • DAVID J. MALAN: So how can we now get purple in there and green in there?

  • [CHUCKLE]

  • FARRAH: Can I put purple in here first?

  • DAVID J. MALAN: Sure.

  • FARRAH: I'm going to spill it.

  • DAVID J. MALAN: It's OK.

  • [WATER POURING]

  • OK.

  • So purple goes into the temporary, very nice.

  • [APPLAUSE]

  • FARRAH: Thank you.

  • DAVID J. MALAN: Green goes into what was purple.

  • FARRAH: Yes

  • DAVID J. MALAN: OK, good.

  • And then purple goes in-- from the temporary variable

  • into the original green glass.

  • Now, a proper round of applause if we could.

  • OK.

  • [APPLAUSE]

And in fact, we can visualize this even a little metaphorically.

字幕與單字

單字即點即查 點擊單字可以查詢單字解釋

B1 中級

CS50 2019--閱讀4--指點江山 (CS50 2019 - Lecture 4 - Pointers)

  • 4 0
    林宜悉 發佈於 2021 年 01 月 14 日
影片單字