DAVID J. MALAN: All right, this is CS50 and this is lecture four.
So we're here in beautiful Lowell Lecture Hall
and Sanders is in use today.
And we're joined by some friends that will soon
be clear and present in just a moment.
But before then, recall that last time we took a look at CS50 IDE.
This was a new web-based programming environment similar in spirit
to CS50 Sandbox and CS50 Lab, but added a few features.
For instance, what features did it add to you--
to your capabilities?
AUDIENCE: Debugger.
DAVID J. MALAN: What's that?
AUDIENCE: The debugger.
DAVID J. MALAN: The debugger.
So debug50, which opens that side panel that
allows you to step through your code, step by step, and see variables.
AUDIENCE: Check50.
DAVID J. MALAN: Sorry, say again?
AUDIENCE: Check50.
DAVID J. MALAN: Check50 as well, which is a CS50 specific tool that
allows you to check the correctness of your code
much like the teaching fellows would when providing feedback on it.
Running a series of tests that pretty much are
the same tests that a lot of the homeworks
will encourage you yourself to run manually,
but it just automates the process.
And anything else?
DAVID J. MALAN: So that is true too.
There's a little hidden Easter egg that we don't use this semester,
but yes indeed.
If you look for a small puzzle piece, you
can actually convert your C code back to Scratch like puzzle pieces
and back and forth, thanks to Kareem and some of the team.
So that is there, but by now, it's probably better
to get comfortable with text as well.
So there's a couple of our other tools that we've
used over time of course besides check50 and debug50.
We've of course used printf and when is printf useful?
Like when might you want to use it beyond needing to just print something
because the problem set tells you to.
AUDIENCE: To find where your bug is.
DAVID J. MALAN: Yeah, so to find where your bug is.
If you just, kind of, want to print out a variable's value or some kind of text
so you know what's going on and you don't necessarily
want to deploy debug50, you can do that.
When else?
AUDIENCE: If you have a long formula for something [INAUDIBLE]
and you want to see [INAUDIBLE].
AUDIENCE: How running-- like going through debug50 50 times.
DAVID J. MALAN: Well, in real life-- so you might want to use printf
when you have maybe a nested loop, and you want to put a printf inside the loop
so as to see when it kicks in.
Of course, you could use debug50, but you
might end up running debug50 or clicking next, next, next, next, next, next,
next so many times that it gets a little tedious.
But do keep in mind, you can just put a breakpoint deeper into your code
as well and perhaps remove an earlier breakpoint as well.
And honestly, all the time, whether it's in C or other languages,
do I find myself occasionally using printf just to type out printf in here
just so that I can literally see if my code got to a certain point in here
to see if something's printed.
But the debugger you're going to find now
and henceforth so much more powerful, so much more versatile.
So if you haven't already gotten into the habit of using debug50 by all
means start and use those breakpoints to actually walk through your code
where you care to see what's going on.
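A minimal sketch of the printf technique just described, assuming a hypothetical function count_pairs that is not from the lecture. The temporary printf inside the inner loop prints a line each time that loop body runs, so you can see when it kicks in.

```c
#include <stdio.h>

// Hypothetical example: count the pairs (i, j) with i < j < n.
int count_pairs(int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
    {
        for (int j = i + 1; j < n; j++)
        {
            // Temporary debugging output: one line per inner iteration
            printf("i is %i, j is %i\n", i, j);
            count++;
        }
    }
    return count;
}
```

Once the bug is found, the printf line would be deleted again; a breakpoint in debug50 on that same line gives the same view without editing the code.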
So style50, of course, checks the style of your code much like the teaching
fellows might, and it shows you in red or green
what spaces you might want to delete, what spaces you might
want to add just to pretty things up.
So it's more readable for you and others.
And then what about help50?
When should you instinctively reach for help50?
AUDIENCE: When you don't understand an error message.
DAVID J. MALAN: Exactly.
Yeah, when you don't understand an error message.
So you're compiling something.
You're running a command.
It doesn't really quite work and you're seeing a cryptic error message.
Eventually, you'll get the muscle memory and the sort of exposure
to just know, oh, I remember what that means.
But until then, run help50 at the beginning of that same command,
and it's going to try to detect what your error is
and provide TF-like feedback on how to actually work around that.
You'll see, too, on the course's website a wonderful handout made
by Emily Hong, one of our own teaching fellows,
that introduces all of these tools, and a few more,
and gets you into the habit of thinking about things.
It's kind of a flow chart.
If I have this problem, then do this or else
if I have this problem do this other thing.
So do check that out as well.
But today, let's introduce really the last, certainly for C,
of our command line tools that's going to help
you chase down problems in your code.
Last week, recall that we had talked about memory a lot.
We talked about malloc, allocating memory,
and we talked about freeing memory and the like.
But it turns out, you can do a lot of damage
when you start playing with memory.
In fact, probably by now, almost everyone-- segmentation fault?
Yeah, so that's just one of the errors that you might run into,
and frankly, you might have errors in your code now
and henceforth that have bugs but you don't even realize it
because you're just getting lucky.
And the program is just not crashing or it's not freezing,
but this can still happen.
And so Valgrind is a command line program that probably
looks the scariest of the tools we've used,
but you can also use it with help50, that
just tries to find what are called memory leaks in your program.
Recall that last week we introduced malloc,
and malloc lets you allocate memory.
But if you don't free that memory, by literally calling the free function,
you're going to constantly ask your operating system, MacOS, Linux,
Windows, whatever, can I have more memory?
Can I have more memory?
Can I have more memory?
And if you never, literally, hand it back by calling free your computer
may very well slow down or freeze or crash.
And frankly, if you've ever had that happen on your Mac or PC, very likely
that's what some human accidentally did.
He or she just allocated more and more memory
but never really got around to freeing that memory.
So Valgrind can help you find those mistakes before you or your users do.
So let's do a quick example, let me go to CS50 IDE, and let me go ahead
and make one new program here.
We'll call it memory.c because we'll see later today how
I might chase down those memory leaks.
But for now, let's start with something even simpler, which all of you
may have done by now, which is to accidentally touch memory
that you shouldn't, changing it, reading it and let's see what this might mean.
So let me do the familiar at the top here.
Include standard IO.
Well, let's not even do that yet.
Let's just do this first.
Let's do int, main(void), just to start a simple program
and in here let me go ahead and just call a function called f.
I don't really care what its name is for today.
I just want to call a function f, and then that's it.
Now this function f, let me go ahead and define it as follows, void f(void).
It's not going to do much of anything at all.
But let's suppose, just for the sake of discussion, that f's purpose in life
is just to allocate memory for whatever useful purpose,
but for now it's just for demonstration's sake.
So what's the function with which you can allocate memory?
So suppose I want malloc space for, I don't know,
something simple like just one integer.
We're just doing this for demonstration purposes,
or actually let's do more, 10 integers, 10 integers.
I could, of course, do-- well, give me 10, but how many bytes do I want?
How many bytes do I need for 10 integers?
AUDIENCE: sizeof(int).
DAVID J. MALAN: Yeah, so I can do literally sizeof(int)
and most likely the size of an int is going to be?
DAVID J. MALAN: Four, probably.
On many systems today, it's just 4 bytes or 32 bits,
but you don't want to hard code that lest someone else's computer not use
those same values.
So the size of an int.
So 10 times the size of an int.
Malloc returns what type of data?
What does that hand me back?
DAVID J. MALAN: Yeah, returns an address or a pointer.
Specifically, the address, 100, 900, whatever, of the chunk of memory
it just allocated for you.
So if I want to keep that around, I need to declare a pointer.
Let's just call it x for today that stores that address.
Could call it x, y, z, whatever, but it's not an int that it's returning.
It's the address of an int.
And remember, that's what the star operator now means.
The address of some data type.
It's just a number.
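A minimal sketch of the allocation just described. The helper name allocate_ints is an assumption for illustration; the lecture writes the malloc call directly inside f.

```c
#include <stdlib.h>

// Hypothetical helper: ask malloc for enough contiguous bytes to hold n ints.
// Returns the address of the first int, or NULL if malloc fails.
// The caller is responsible for calling free on the result.
int *allocate_ints(size_t n)
{
    // sizeof(int) is often 4, but asking for it avoids hard-coding that
    int *x = malloc(n * sizeof(int));
    return x;
}
```

Calling allocate_ints(10) corresponds to the 10 times sizeof(int) request in the lecture, with x declared as int * because malloc hands back an address, not an int.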
All right, so now if I were to--
first, let's clean this up.
Turns out that to use malloc, I need to include stdlib.h.
We saw that last week, albeit briefly, and then of course
if I'm going to call f, what do I have to do to fix this code?
AUDIENCE: You need to declare.
DAVID J. MALAN: Yeah, I need to declare it up here,
or I could just move f's implementation up top.
So I think this works, even though this program at the moment
is completely stupid.
It doesn't do anything useful, but it will allocate memory.
And I'll do something with it as follows.
If I want to change the first value in this chunk of memory,
well how might I do that?
Well, I've asked the computer for 10 integers or rather space
for 10 integers.
What's interesting about malloc is that when
it returns a chunk of memory for you it's contiguous, back-to-back.
And when you hear contiguous or back-to-back,
what kind of data structure does that recall to mind?
AUDIENCE: An array.
DAVID J. MALAN: An array.
So it turns out we can treat this just random chunk of memory
like it's an array.
So if we want to go to the first location in that array of memory,
I can just do this and put in the number say 50.
Or if I want to go to the next location, I can do this.
Or if I want to do the next location, I can do this.
Or if I want to go to the last location, I might do this,
but is that good or bad?
DAVID J. MALAN: Why bad?
AUDIENCE: It's-- it's out of bounds
DAVID J. MALAN: Yeah, so it's out of bounds.
This is sort of week one style mistakes when it came to loops.
Recall, with for loops or while loops, you might go a little too far,
and that's fine.
But now we actually will see we have a tool that
can help us notice these things.
So hopefully, just visually, it's apparent that what I have going on here
is just-- on line 12, I have a variable x
that's storing the address of that chunk of memory.
And then on line 13, I'm just trying to access location 10
and set the value 50 there.
But as you note, there is no location 10.
There's location 0, 1, 2, 3, all the way through 9, of course.
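A short sketch of staying inside those bounds, assuming a hypothetical helper fill_with_fifty that is not part of the lecture's memory.c. It writes to every valid location and deliberately leaves the out-of-bounds write as a comment.

```c
#include <stdlib.h>

// Hypothetical helper: stores 50 in every valid slot of a malloc'd block
// of n ints and returns the value in the last slot.
// Returns -1 if n is 0 or if malloc fails.
int fill_with_fifty(size_t n)
{
    if (n == 0)
    {
        return -1;
    }

    int *x = malloc(n * sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    // Valid indices are 0 through n - 1, so the test is i < n, not i <= n
    for (size_t i = 0; i < n; i++)
    {
        x[i] = 50;
    }

    // x[n] = 50; would touch memory just past the block (x[10] when n is 10)
    int last = x[n - 1];

    free(x);
    return last;
}
```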
So how might we detect this with a program?
Well, let me go ahead and increase my terminal window just a bit
here, save my file, and let me go ahead and compile make memory.
OK, all is well.
It compiled without any error messages, and now
let me go ahead and run memory, enter.
All right, so that worked pretty well.
Let's actually be a little more explicit here just for good measure.
Let me go ahead and print something out.
So printf, %i for an integer, and let's make it just more explicit.
You inputted %i and then comma x bracket 10.
And what do I have to include to use printf?
AUDIENCE: stdio.h.
DAVID J. MALAN: Yeah, so stdio.
So let's just quickly add that, stdio.h, save.
All right, let me recompile this, make memory, enter.
And now let me go ahead and do ./memory.
Feels like it's a correct program.
And yet, for a couple of weeks now we've been claiming that mm-hmm,
don't do that.
Don't go beyond the boundaries of your array.
So how do we reconcile this?
Feels like buggy code or at least we've told you it's buggy code,
and yet it's working.
DAVID J. MALAN: That's a good way of putting it.
AUDIENCE: It's still very similar.
We want that.
AUDIENCE: So we can theoretically--
it just created a program.
DAVID J. MALAN: Yeah, and I think if I heard you correctly,
you said C doesn't scream if you go too far?
So that's a good way of putting it.
Like, you can get lucky in C. And you can
do something that is objectively, pedagogically, like technically wrong,
but the computer's not going to crash.
It's not going to freeze because you just get lucky.
Because often, for performance reasons, when
you allocate space for 10 integers, you're
actually going to get a chunk of memory back
that's a little bigger than you need.
It's just not safe to assume that it's bigger than you need,
but you might just get lucky.
And you might end up having more memory that you can technically get away
with touching or accessing or changing, and the computer's not going to notice.
But that's not safe because on someone else's Mac or PC,
their computer might just be operating a little bit differently than yours,
and bam, that bug is going to bite them and not you.
And those are the hardest, most annoying bugs to chase down as some of you
might have experienced.
It works on your computer but not a friend's or vice versa.
These are the kinds of explanations for that.
So Valgrind can help us track down even these most subtle errors.
The program seems to be working.
Check50 or tools like it might even assume
that it's working because it is printing the right thing,
but let's take a look at what this program Valgrind thinks.
Let me increase the size of the terminal window here,
and go ahead and type in valgrind ./memory.
So same program name ./memory but I'm prefixing it with the name Valgrind.
All right?
Unfortunately, Valgrind is really quite ugly,
and it prints out a whole bunch of stuff here.
So let's take a look.
At the very top, you'll see all these numbers on the left,
and that's just an unfortunate aesthetic.
But we do see some useful information.
Invalid read of size 4 and then it has these cryptic
looking letters and numbers.
What are those?
They're just addresses in hexadecimal.
It doesn't really matter what they are, but Valgrind
can tell us where the memory is that's acting up suspiciously.
You can then see next to that, that Valgrind is pointing
to function f on memory.c, line 15.
So that's perhaps helpful, and then main on line 8
because that's the function that was called.
So Valgrind is actually kind of nice in that it's showing us all the functions
that you called from bottom up, much like the stack from last week.
And so something's going wrong line 15, and if we go back to that,
let's see line 15 was--
well, sure enough.
I'm actually trying to access that memory location
and frankly I did it on line 14 as well.
So hopefully fixing one or both of those will address this issue.
And notice here, this frankly just gets overwhelming pretty quickly.
And then, oh, 40 bytes in one block are definitely lost in loss record.
I mean, this is the problem with Valgrind, honestly.
It was written some years ago, not particularly user friendly,
but that's fine we have a tool to address this.
Let me go ahead and rerun Valgrind with help50,
enter, and see if we can't just assist with this.
All right, so still the same amount of black and white output but down here now
help50 is noticing, oh, I can help you with an invalid write of size 4.
So it's still at the same location, but this time--
or rather same file, memory.c but line 14.
And we propose, looks like you're trying to modify 4 bytes of memory that
isn't yours, question mark.
Did you try to store something beyond the bounds of an array?
Take a closer look at line 14 of memory.c.
So hopefully, even though Valgrind's output is crazy esoteric,
at least that yellow output will point you toward, ah, line 14.
I'm indeed touching 4 bytes, an integer, that shouldn't be.
And so let's go ahead and fix this.
If I go into my program, and I don't do this.
Let's change it to location 9, and location 9 here and save.
Then let me go ahead and rerun Valgrind without help50.
All right, progress except--
Nope, no progress.
I skipped the step.
Yeah, I didn't recompile it.
A little puzzled why I saw the same thing.
So now let's rerun Valgrind and here it seems to be better.
So I don't see that same error message up
at the very top like we did before, but notice here, 40 bytes in one blocks.
OK, that was bad grammar in the program, but are definitely
lost in loss record 1 of 1.
So I still don't quite understand that.
No big deal.
Let's go ahead and run help50 and see what the second of two errors
apparently is here.
So here it's highlighting those lines.
40 bytes in one blocks are definitely lost, and looks like your program
leaked 40 bytes of memory.
Did you forget to free memory that you allocated with malloc?
Take a closer look at line 13 of memory.c.
So in this case line 13 indeed has a call to malloc.
So what's the fix for this problem?
DAVID J. MALAN: Per help50 or your own intuition?
What do I have to add to this program?
Yeah, free, and where does that go?
Right here.
So we can free the memory.
Why would this be bad?
DAVID J. MALAN: Exactly.
We're freeing the memory, which is like saying to the operating system,
I don't need this anymore.
And yet, two lines later we're using it again and again.
So bad.
We didn't do that mistake last week, but you should only
be freeing memory when, literally, you're
ready to free it up and give it back, which should probably
be at the end of the program.
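A sketch of where the call to free ends up, adapted from the lecture's f. As an assumption for illustration, this version returns the stored value (the lecture's f returns void) and checks malloc's result for NULL.

```c
#include <stdio.h>
#include <stdlib.h>

// Adapted from the lecture's f; the int return value is an addition
// so that the result can be checked by a caller.
int f(void)
{
    int *x = malloc(10 * sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    x[9] = 50;                          // location 9 is the last valid one
    printf("You inputted %i\n", x[9]);
    int value = x[9];

    // Freed only after the last use of x. Calling free(x) right after
    // malloc and then writing to x[9] would be using memory already given back.
    free(x);
    return value;
}
```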
So let me go ahead and re-save this, open up my terminal window,
recompile it this time, and now, let me run Valgrind one last time
without help50.
And still a little verbose, but zero errors, from zero contexts.
That sounds pretty good.
And moreover, it also explicitly says, all heap blocks were freed.
And recall that the heap, is that chunk of memory
that we drew visually up here, which is where malloc takes memory from.
So, done.
So this is kind of the mentality with which
to have when approaching the correctness of your code.
Like, it's one thing to run sample inputs, or run the program like I did.
All looked well.
It's one thing to run tools like check50, which we humans wrote.
But we too are fallible, certainly, and we might not think of everything.
And thankfully, smart humans have made tools, that at first glance,
might be a little hard to use.
Like debug50, as is Valgrind now.
But they ultimately help you get your code 100% correct
without you having to struggle visually over just staring at the screen.
And we see this a lot in office hours, honestly.
A lot of students, to their credit, sort of reasoning through, staring
at the screen, just trying to understand what's going wrong,
but they're not taking any additional input other than the characters
on the screen.
You have so many tools that can feed you more and more hints along the way.
So do acquire those instincts.
Any questions on this?
AUDIENCE: Sir, if you had a main function that took arguments.
Would you run Valgrind with those arguments as well?
DAVID J. MALAN: Yes, indeed.
So Valgrind works just like debug50, just like help50.
If you have command line arguments, just run them as usual,
but prefix your command with Valgrind, or maybe even help50 Valgrind,
to help one with the other.
Good question.
Other thoughts?
AUDIENCE: Where does the data go [INAUDIBLE]?
DAVID J. MALAN: Good question.
So at the end of the day, think about what's
inside the computer, which is just something like this.
So physically, it's obviously still there.
It's just being treated by the operating system--
macOS, Windows, Linux, whatever, as like a pool of memory.
We keep drawing it as a grid that looks a little something like this.
So the operating system's job is to just keep track of which of those squares
is in use, thanks to malloc.
And which has been freed.
And so you can think of it as having little check
marks next to them saying, this is in use, this is in use,
these others are not in use.
So they just go back on the so-called free list into that pool of memory.
Good question.
If you take a higher level course on operating systems in fact,
or CS61 or 161 at Harvard, you'll actually build these kinds of things
and implement tools like malloc yourself.
AUDIENCE: So why did we have to allocate memory in this case, and what happens
DAVID J. MALAN: Good question.
Why did we have to allocate memory in this case?
We did not.
This was purely, as mentioned, for demonstration purposes.
If we had some program in which we wanted
to allocate some amount of memory, then this is how we might do it.
However, a cleaner way to do all of this,
would have been to say, hey, computer, give me 10 integers like this,
and not have to worry about memory management.
And that's where we began in week one, just using arrays on the stack,
so to speak.
Not using malloc at all.
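A sketch of that cleaner alternative, assuming a hypothetical function g that is not in the lecture: the 10 integers live in an array on the stack, so there is no malloc and nothing to free.

```c
#include <stdio.h>

// Hypothetical stack-based counterpart to the malloc example.
int g(void)
{
    int x[10];                          // space for 10 ints on the stack
    x[9] = 50;                          // valid indices are still 0 through 9
    printf("You inputted %i\n", x[9]);
    return x[9];                        // the value is copied out; x itself
                                        // goes away when g returns
}
```

Going out of bounds is still a bug here; the only responsibility removed is handing the memory back.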
So the point is only, that once you start using malloc, and free,
and memory more generally, you take on more responsibilities
than we did in week one.
Good question.
And the others?
All right.
So, turns out, there's one more tool, in all seriousness.
This is the thing.
[? DDB50. ?] So debug50 is an allusion to a very popular tool called GDB,
the GNU Debugger.
It's an older tool that you won't use at the command line,
but it's what makes debug50 work.
Turns out, there's a thing.
And there's an actual Wikipedia article that you
might have clicked on in my email last night, called rubber duck debugging.
And frankly, you don't have to go as all out, as excessive, as we did here,
but the purpose of this technique, of rubber duck debugging,
is to keep, literally, like a rubber duck on your shelf, or on your desk.
And when you have a bug and you don't have the luxury of a teaching fellow,
or a roommate who took CS50, or a more technical friend who can help walk you
through your code, literally, start walking through your code
verbally, talking to the duck saying, well, on line 2, I'm declaring main,
and on line 3, I'm allocating space for an array.
And then, on line 4, I'm calling-- ah!
That's what I'm doing wrong.
So if any of you have ever had that kind of moment, whether in office hours,
or alone, where you're either talking in your head,
or you're talking through your code to someone else.
And he or she doesn't even have to respond.
You just hear yourself saying the wrong thing, or having that aha moment.
You can approximate that by just keeping one of these little guys on your desk,
and have that conversation.
And it's actually not as crazy sounding as it actually is.
It's that process of just talking through your code logically,
step by step, in a way that you can't necessarily do in your own mind.
At least I can't.
When you hear yourself say something wrong,
or that didn't quite follow logically, bam, you
can actually have that aha moment.
So on the way out today, by all means, take any one of these ducks.
That took quite a long time for [? Colten ?] to lay out today.
And we'll have more at office hours in the weeks to come, if you would like.
So some of you might recall such a duck from [? Currier ?] House
last year too, which was a cousin of his as well.
All right.
So that is rubber duck debugging.
Now, last week, recall that we began to take off training wheels.
We'd used, for a few weeks, the CS50 library.
And that's kind of in the past now.
That was just a technique, a tool, via which
we could get user input a little more pleasantly, than if we actually
started dealing with memory early on.
And we revealed last week that a "string", quote, unquote,
is just what, underneath the hood in C?
Say again.
An array of characters.
And even more specifically, it's a synonym S-T-R-I-N-G for what actual
data type?
char star, as we've called it.
So a char star is just the computer scientist's
way of describing a pointer to a character,
or rather the address of a character, which
is functionally equivalent to saying an array of memory, or sequence of memory.
But it's kind of the more precise, more technical way of describing it.
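A small sketch of what that synonym amounts to. The typedef mirrors the synonym described above; the function char_at is a hypothetical helper added for illustration.

```c
// string is just another name for char *, the address of a character
typedef char *string;

// Hypothetical helper: return the character i positions past the address
// stored in s. The expressions *(s + i) and s[i] mean the same thing.
char char_at(string s, int i)
{
    return *(s + i);
}
```

Given char word[] = "HI!", char_at(word, 0) is 'H' and char_at(word, 2) is the same character as word[2].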
And so now that we know that we have char stars underneath the hood, well,
where is all of that coming from?
Well, indeed, it maps directly to that memory.
We keep pointing out that something like this is inside of your computer.
And we can think of the memory as just being chunks of memory,
all of whose bytes are numbered.
0 on up to 2 gigabytes, or 2 billion, whatever the value might be.
But of course last week, we pointed out that you think about this memory
not as being hardware per se, but as just being this pool of memory that's
divided into different regions.
The very top of your computer's memory, so to speak,
is what we call the text segment.
And what goes in the text segment of your computer's memory
when you're running a program?
Text is like, poor choice of words, frankly, but what is it?
Say again.
AUDIENCE: File Headers?
DAVID J. MALAN: Not the file headers, in this case.
This is in the context of running a program, not necessarily saving a file.
AUDIENCE: String literals.
DAVID J. MALAN: Not string literals here,
but they're nearby, actually, in memory.
AUDIENCE: Functions.
DAVID J. MALAN: Functions, closer.
The text segment of your computer's memory
is where, when you double click a program to run it,
or in Linux, when you do dot slash something, to run it.
That's where the zeros and ones of your actual program, the machine code,
that we talked about in week zero, is just loaded into RAM.
So recall from last week, that, you know, anything physical in this world--
hard drives, solid state drives, is slow.
So those devices are slow, but RAM, the stuff we keep pulling up on the screen,
is relatively fast.
If only because it has no moving parts.
It's purely electronic.
So when you double click a program on your Mac or PC,
or do dot slash something in Linux, that is
loading from a slow device, your hard drive,
where the data is stored long term, into RAM or memory,
where it can run much more quickly and pleasurably in terms of performance.
And so, what does this actually mean for us?
Well, it's got to go somewhere.
We just decided, humans, years ago that it's
going to go at the top, so to speak, of this chunk of memory.
Below that though, are the more dynamic regions of memory--
the stack and the heap.
And we said this a moment ago, and last week as well, what goes on the heap?
Or who uses the heap?
AUDIENCE: Dynamic memory.
DAVID J. MALAN: Dynamic memory.
Any time you call malloc, you're asking the operating system
for memory from the so-called heap.
Anytime you call free, you're sort of conceptually putting it back.
Like, it's not actually going anywhere.
You're just marking it as available for other functions and variables to use.
The stack, meanwhile, is used for what?
AUDIENCE: Local variables.
DAVID J. MALAN: Local variables and any of your functions.
So main, typically takes a sliver of memory at the bottom.
If main calls another function, it gets a sliver of memory above that.
If that function calls one, it gets a sliver of memory above that.
So they each have their own different regions of memory.
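A small sketch of the idea that each function call gets its own sliver of memory. Both function names are hypothetical; the check compares the address of a caller's local variable with the address of the callee's own local.

```c
#include <stdbool.h>

// Hypothetical helper: is my own local variable stored somewhere other
// than the caller's local variable?
static bool callee_has_own_local(int *callers_local)
{
    int mine = 2;
    return &mine != callers_local;
}

// Both variables named mine exist at the same time during the call,
// so they must occupy different memory.
bool frames_are_distinct(void)
{
    int mine = 1;
    return callee_has_own_local(&mine);
}
```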
But of course, these arrows, both pointing at each other,
doesn't seem like such a good design.
But the reality, is bad things can happen.
You can allocate so much memory that, bam, the stack overflows the heap.
Or the heap overflows the stack.
Thus was born websites like Stack Overflow, and the like.
But that's just a reality.
If you have a finite amount of memory, at some point,
something's going to break.
Or the computer's going to have to say, mm-mm, no more memory.
You're going to have to quit some programs, or close some files,
or whatnot.
So that was only to say that that's how the memory is laid out.
And we started to explore this by way of a few programs.
This one here-- it's a little dark here.
This one here, was a swap function.
Now it's even darker.
It was a swap function that actually did swap two values, A and B.
But it didn't actually work in the way we intended.
What was broken about this swap function last week?
Like, I'm pretty sure it worked.
And when our brave volunteer came up and swapped the orange juice and the milk,
that worked.
So like, the logic was correct, but the program itself did not work.
AUDIENCE: It changed the values of the copy variables.
DAVID J. MALAN: Exactly.
It changed values in the copies of the variable.
So recall, that when main was the function
we called, and it had two values, x and y, that chunk of memory was here.
That chunk of memory was here.
And it had like the numbers 1 and 2.
But when it called the swap function, that got its own chunk of memory.
So main was at the bottom, swap was above that.
It had its own chunks of memory called, a and b, which
initially, got the values 1 and 2.
1 and 2 were indeed successfully swapped,
but that had no effect on x and y.
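A minimal sketch of that broken version, assuming the name swap_copies for illustration (the lecture's function is simply called swap). The logic is right, but it only ever touches its own copies.

```c
// Swaps a and b, which are copies of whatever the caller passed in.
// The caller's own variables are never touched.
void swap_copies(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}
```

After int x = 1; int y = 2; swap_copies(x, y); the values of x and y are still 1 and 2.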
So we fixed that.
With the newer version of this program, of course,
it looked a lot more cryptic at first glance, but in English,
could someone just describe what it is that happens
in this example that was more correct?
Like, what does this program do line by line?
AUDIENCE: Instead of passing copies of the variables,
you pass pointers to their addresses.
DAVID J. MALAN: Exactly.
Instead of passing the values of the variables, thereby copying them,
it passes the addresses of those variables.
So that's like saying, I don't technically care where it is in memory,
but I do need to know that it is somewhere in memory.
So instead of passing an x in the number 1,
let's suppose that x is at location 100--
my go to example.
It's actually the number 100 that's going to go there.
And if y is at the location like, 104, well, it's
104 that's going to go there, which are not the values we want to swap,
but those are sort of like little maps, or breadcrumbs if you will,
that lead us to the right location.
So that when we execute this code, what we're ultimately
swapping in those three lines, is this and this, and all along the way,
recall, we're using a temporary variable there
that can be just thrown away after.
So that's what pointers allowed us to do.
And that's what allowed us to actually change values on the so-called stack,
even by calling another function.
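A sketch of the pointer-based swap being described: the three lines inside go to the addresses they were handed and exchange the values found there, using a temporary variable.

```c
// a and b hold addresses (for example 100 and 104), not the values to swap.
// *a and *b mean "go to that address".
void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Called as swap(&x, &y) with x equal to 1 and y equal to 2, it leaves x equal to 2 and y equal to 1.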
All right.
Any questions then, on where we left off last time with the stack and with swap?
All right.
So recall we introduced Binky as well, who lost his head at one point,
but why?
What went horribly, horribly awry with this scene from last week's film
from Stanford?
Binky was doing everything correctly, right?
Like, moving values.
42 was successful.
And then, yeah?
AUDIENCE: He tried to dereference something that
wasn't pointing to any actual address.
DAVID J. MALAN: Exactly.
He tried to dereference a pointer, an address, that wasn't actually pointing
to a valid address.
Recall that this was the line in code in question that was unlucky and bad.
Star y, means, go to the address in y, and do something to it.
Set it equal to the number 13.
But the problem was, that in the code we looked at last week,
all we did at the start was say, hey, computer give me a pointer to an int,
and call it x.
Do the same, and call it y.
Allocate space and point x at it.
But we never did the same for y.
So whereas x contained, last week, the address of an actual chunk of memory,
thanks to malloc, what did y contain at that point in the story?
The yellow line there.
What did y contain?
What value?
But it's not obvious because there's no mention of null in the program.
We might get lucky.
Null is just 0.
And sometimes we've seen that 0 are the default values in a program.
So maybe.
But I say maybe, and I'm hedging. Why?
And it doesn't allocate-- well, allocate, is not quite the right word.
That suggests you are allocating actual memory.
It's a garbage value.
There's something there.
My Mac has been running for a few hours.
And your Macs, and PCs, and phones, are probably running all day long.
Or certainly when the lid is up.
And so, your memory is getting used, and unused, and used.
Like, lots of stuff is going on.
So your computer is not filled with all zeros or all ones.
If you look at it at some random point in the day,
it's filled with like bunches and bunches of zeros and ones
from previous programs that you quit long ago.
Windows you have in the background and the like.
So, the short of it is, when you're running
a program for the first time on a computer that's been running now for some time,
it's going to get messy.
That big rectangle of memory is going to have some ones over here
some zeros over here, and vice versa.
So they're garbage values, because those bytes have some values in them.
You just don't necessarily know what they are.
So the point is, you should never ever dereference a pointer
that you have not set yourself.
Maybe you will crash.
Maybe it won't crash.
Valgrind can help you find these things, sometimes.
But it's just not a safe operation.
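As a rough sketch of the fix for the Binky code recalled above: give y its own memory before writing through it. The function name binky_fixed, the NULL initialization, and the return value are assumptions made for illustration; they are not from the lecture's code.

```c
#include <stdlib.h>

// Hypothetical helper (not from the lecture): the Binky example,
// but with y pointed at real memory before it is dereferenced.
// Returns the sum of the two stored values, or -1 if malloc fails.
int binky_fixed(void)
{
    // Start both pointers at NULL instead of leaving garbage values in them
    int *x = NULL;
    int *y = NULL;

    // Allocate space and point x at it
    x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }
    *x = 42; // fine: x holds the address of memory we own

    // The buggy version did *y = 13 here, while y was still unset.
    // Do the same for y as for x first.
    y = malloc(sizeof(int));
    if (y == NULL)
    {
        free(x);
        return -1;
    }
    *y = 13; // now safe: y holds the address of memory we own

    int sum = *x + *y;
    free(x);
    free(y);
    return sum;
}
```

Calling binky_fixed() from main should give back 55 (42 plus 13) when both allocations succeed.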
And lastly, the last thing we introduced last week,
which will be the stepping stone for what problems we'll solve this week,
was struct.
So struct is kind of cool, in that you can design your own custom data types.
C is pretty limited out of the box, so to speak.
You only have chars and bools, and floats, and ints, and doubles,
and longs, and str--
well, we don't even have strings, per se.
So it doesn't really come with many features, like a lot of languages do.
Like Python, which we'll see in a few weeks.
So with struct in C, you have the ability
to solve some problems of your own.
For instance, with the struct, we can actually
start to implement our own features.
Or our own data types.
For instance, let me go up here.
And let me go ahead and create a file called say,
student, or rather, struct dot h.
So recall that dot h is a header file.
Thus far, you have used header files that other people made.
Like, CS50 dot h, and standard IO dot h, and standard lib dot h,
but you can make your own.
Header files are just files that typically contain code that you
want to share across multiple programs.
And we'll see more of this in time.
So let me go ahead and just save this file.
And suppose that I want to represent a student in memory.
A student of course, is probably going to have what?
For instance, how about a string for their name,
a string for their dorm-- but string is kind of two weeks ago.
Let's call this char star.
And let's call name, char star.
And so you might want to associate like, multiple pieces of data with students.
And you don't want to have multiple variables, per se.
It would be nice to kind of encapsulate these together.
And recall at the very end of last week, we
saw this feature where you can define your own type,
with typedef, that is a structure itself.
And you can give it a name.
So in short, simply by writing these lines of code,
you have just created your own custom data type.
It's now called student.
And every student in the world shall have, per this code, a name
and a dorm associated with them.
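The header file being described might look roughly like this: a sketch of struct.h based on what is said above (a name and a dorm, both char star), not a copy of the file on screen.

```c
// struct.h (sketch): defines a custom data type called student

// Every student has a name and a dorm, each stored as a char * (a string)
typedef struct
{
    char *name;
    char *dorm;
}
student;
```

Any program that includes this header can then declare a variable of type student, just as it would an int or a char.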
Now, why is this useful?
Well, the program we looked at at the very end of last time looked
a little something like this.
In struct zero dot c, we had the following.
I first allocated some amount of space for student.
I asked the user what's the enrollment in the class or whatnot?
That gives us an int.
And then, we allocated an array of type student, called students, plural.
This was an alternative, recall, to doing something
like this, string names enrollment, and string dorms enrollment.
Which would work.
You could have two separate arrays, and you'd just
have to remember that name zero and dorm zero is the same human.
But why do that if you can keep things together.
So with structs, we were able to do this.
Give me this many student structures, and call the whole array, students.
And the only new syntax we introduced to satisfy this goal, was what operator?
AUDIENCE: The dot.
DAVID J. MALAN: The dot.
So in the past, recall from like week two, we introduced arrays.
And arrays allow you to do square bracket notation.
So that is no different from a couple of weeks back.
But if your array is not storing just integers, or chars, or floats,
or whatever, it's actually storing a structure, like a student,
you can get at that student's name by literally just saying dot name.
And you can get at their dorm by doing dot dorm.
And then everything else is the same.
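To make the square-bracket-plus-dot syntax concrete, here is a small sketch. The helper fill_students is hypothetical (the lecture's program asks the user for each name and dorm instead); it copies two parallel arrays into one array of students so that each person's data stays together.

```c
// Same student type as described above
typedef struct
{
    char *name;
    char *dorm;
}
student;

// Hypothetical helper (not from the lecture): fill an array of students
// from two parallel arrays, names and dorms, each of length enrollment.
void fill_students(student students[], int enrollment,
                   char *names[], char *dorms[])
{
    for (int i = 0; i < enrollment; i++)
    {
        // Square brackets pick the i-th student; the dot picks a field in it
        students[i].name = names[i];
        students[i].dorm = dorms[i];
    }
}
```

A caller would declare the array first, for example student students[enrollment], and then read back students[0].name and students[0].dorm as one unit rather than remembering that names[0] and dorms[0] belong to the same human.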
This is what's called, encapsulation.
And it's kind of like a fundamental principle of programming
where, if you have some real world entity, like a student,
and you want to represent students with code, yeah,
you can have a bunch of arrays called names, dorms, emails, phone
numbers, but that just gets messy.
You can instead encapsulate all of that related information about a student
into one data structure so that now you have, per week zero, an abstraction.
Like, a student is an abstraction.
And if we break that abstraction, what is a student actually?
Not in the real world, but in our code world here?
Student is an abstraction.
It's a useful word, all of us can kind of agree means something,
but technically, what does it apparently mean?
A student is actually a name and a dorm, which really kind of is
diminutive to everyone in this room, but we've distilled it in code
to just those two values.
So there we have encapsulation.
You're kind of encapsulating together multiple values.
And you're abstracting away the details just to have a more useful term,
because no one is going to want to talk in terms of lines of code
to describe anything.
So, same topic as in the past.
So, now we have the ability to come up with our own custom data structures
it seems.
That we can store anything inside of them that we want.
So let's now see how poorly we've been designing
some things for the past few weeks.
So it turns out that much of the code we've been writing
in recent weeks has, hopefully, been correct,
but we've been not necessarily designing solutions in the best way.
Recall that when we have this chunk of memory,
we've typically treated it as at most, an array.
So just a contiguous chunk of memory.
And thanks to this very simple mental model, we get strings,
we get arrays of students now.
But arrays aren't necessarily the best data structure in the world.
Like, what is a downside of an array, if you've encountered one thus far?
In C, what's a downside of an array?
DAVID J. MALAN: Can or cannot?
DAVID J. MALAN: You cannot.
That is true.
So in C, you cannot mix data types inside of an array.
They must all be ints, they must all be chars, they must all be students.
It's a bit of a white lie because technically, you
can have something called a void star, and you can actually map-- but yes.
That is true though, strictly speaking-- cannot mix data types.
Though frankly, even though other languages let you do that,
it's not necessarily the best design decision.
But sure, a limitation.
Other thoughts.
AUDIENCE: The size cannot change.
DAVID J. MALAN: The size cannot change.
Let's focus on that one.
Because that's sort of even more constraining it would seem.
So if you want an array for, say, two values, what do you do?
Well, you can do something like int, x, bracket, 2, semi-colon.
And what does that actually give you inside of your computer's memory?
It gives you some chunk that we'll draw as a rectangle.
This is location 0.
This is location 1.
Suppose that, oh, a few minutes later, you change your mind.
Oh, darn, I just took a--
I want to type in a third value, or I want
to add another student to the array.
Where do you put that?
Well, you don't.
If you want to add a third value to an array of size 2,
what's your only option in C?
AUDIENCE: You make a new array.
DAVID J. MALAN: You make a new array.
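One way to picture "make a new array" in code is the sketch below. The helper name copy_and_append is an assumption for illustration, not something from the lecture: it allocates a bigger array, copies the old values over, and puts the new value at the end.

```c
#include <stdlib.h>

// Hypothetical helper (not from the lecture): returns a newly allocated
// array holding the old_size values of old followed by value.
// The old array is left untouched. Returns NULL if malloc fails.
int *copy_and_append(const int *old, int old_size, int value)
{
    // Ask for room for one more int than before
    int *bigger = malloc((old_size + 1) * sizeof(int));
    if (bigger == NULL)
    {
        return NULL;
    }

    // Copy every existing value into the new chunk of memory
    for (int i = 0; i < old_size; i++)
    {
        bigger[i] = old[i];
    }

    // The new value goes in the extra slot at the end
    bigger[old_size] = value;
    return bigger;
}
```

Starting from int x[2], a call like copy_and_append(x, 2, 3) would hand back an array of three ints. The caller is responsible for freeing the result, and the copying itself takes time proportional to the size of the old array.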