[MUSIC PLAYING]
DAVID MALAN: So odds are you know someone who is or perhaps are
yourself a computer person.
But what does that mean?
Well I propose that a computer person is simply
someone who's really good at computational thinking.
Thinking methodically, or more formally, thinking algorithmically,
whereby you take inputs to some problem, you solve that problem
and produce output, but you do so step by step by step, along the way
defining any of the terms or ideas that you
need to use so that anyone else following along
can understand exactly what it is you're doing.
Computer science itself is really about computational thinking, then.
It's not about programming, with which it's
often conflated, although programming is a wonderfully useful tool with which
you can solve problems, but computer science itself
is about solving problems.
And it can be in any number of domains, whether it's
software or hardware or artificial intelligence or machine learning
or data science or beyond.
That's the wonderful thing about computer science--
it's just so wonderfully applicable to other fields.
It's all about equipping you with a mental model,
so to speak-- a set of concepts, as well as
some practical skills-- via which you can bring solutions
to bear on problems in other domains entirely.
Now what, though, does it mean to solve a problem?
I propose that we think about problem-solving quite simply as this.
If you've got some problem to be solved, you have an input.
And the goal is to solve that problem and come up with some solution, which
we'll call simply our output.
And in between those inputs and outputs hopefully
are the step-by-step instructions via which we can solve that problem.
Now how we're going to solve that problem we'll have to come back to.
For now we'll consider it to be the proverbial black box.
So we don't quite know or need to know right now just how it works,
but that it can work, and we'll come back to this soon.
But for now, we do need a way to represent these inputs,
and we do need a way to represent our outputs.
Right now I happen to be speaking English
and you happen to be hearing English, and so we've sort of
tacitly agreed that we'll represent our words using this particular language.
Now computers tend not to use English; they have their own system,
if you will, with which you might be familiar even if you don't quite
speak it yourself, and indeed, there are different ways
you can represent information.
For instance, let's consider the simplest of problems-- for instance,
counting the number of people in a room.
I could do this old-school.
I don't need computers or English, I can simply use my physical hand and say,
I'm starting with zero people and now I'll
count one, and then two, and then three, and then four, and then five,
and then--
I'm out of fingers, but I can at least employ my other hand, maybe
a couple of feet and get as high as 10, maybe even
20 using this physical system.
This is pretty equivalent, actually, to just keeping
score the old-school way with hash marks.
Right now we've not counted anything, but as soon
as I start drawing some lines, I might represent the number 1,
or 2 people in the room, or 3, or 4.
Or by convention I can draw a slash just to make super clear that this is now
representing 5, and then I can do another five of those
to count up the 10 and beyond.
Now this system of hash marks, as well as this system of counting
on my fingers, actually has a name:
that of unary notation.
Uno implying one, simply signifying that I only have one digit--
pun not intended-- via which I can count things.
This takes the form of my fingers on my hand or these hash marks on the screen.
But it doesn't get me very far, because of course
on my one hand with five fingers, I can only count up to five total.
And even on the board it feels pretty inefficient, because for every number
I want to count higher, I have to draw yet
another line on the screen, which just continue to accumulate.
But suppose we were a little more clever about this
and we thought about this problem from a different angle.
Maybe I could take into account not just the number of fingers
that I've raised, but the order in which I've raised them
or the pattern via which I've permuted them rather than just taking
into account how many of them are up.
If I take that approach, maybe I can actually count even higher than five
on just one hand before resorting to another hand or feet.
Now how could I do that?
Well instead of counting up 0 to 1 to 2 to 3 to 4
to 5, why don't I instead start with some patterns?
So I'll still start with 0, I'll still start with 1, but now for 2,
I'm not just going to raise the second finger,
I'm instead going to go from 0 to 1 to 2,
putting down that first finger or thumb, simply representing 2
with this one finger.
And when I'm ready to represent the number 3,
I'll bring that finger back up.
And so using just two fingers now have I represented four possible values--
0, 1, 2, and 3.
From 0 to 3, of course, is four total values,
and so I've counted as high as 3 now using not three fingers, but only two.
How, though, can I count higher than 3?
Well, I just need to raise an additional finger.
And so let's start at 0, to 1, to 2, to 3, to 4-- whoops--
to 5, to 6, and now to 7.
And if it didn't hurt so much, I could continue counting as high as 8 and 9
and 10 and beyond by simply permuting my fingers in different ways.
So how high can I now count on this one hand?
Using unary notation with fingers just up or down, I can count as high as 5--
from 0 to 5.
But with these same five fingers, if I instead use not unary,
but let's call it binary notation where we're actually
taking into account whether the fingers are up or down and the pattern thereof,
well now it would seem that each of these fingers
can be in one of two possible states or configurations.
They can either be up or they can be down, or just one of them can be up
or can be down, or just one other can be up or down.
So if there's two possible states or configurations for each of these five
fingers, that's like 2 times 2 times 2 times 2 times 2,
or 2 to the power of 5 for five fingers, which is actually 32 possible patterns.
Which is to say with my one human hand, now can I
count using not unary, but binary from 0 to 31, which is 32 possible values.
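Here's a minimal illustrative sketch of that idea in Python-- a language
this transcript mentions later-- printing each of the 32 numbers alongside
the pattern of five fingers, each up (1) or down (0), that represents it.

    # Count from 0 to 31 on five "fingers," each of which is up (1) or down (0).
    for n in range(32):
        fingers = format(n, "05b")  # five binary digits, e.g. 7 -> "00111"
        print(n, fingers)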
Now what's the goal at hand?
It's quite simply to represent information.
Because if we have inputs and outputs, we
need to decide a priori how we're going to represent those values.
Now it turns out in computers, binary is wonderfully well-suited.
Because after all, what's the one physical input to any computer
you have, whether it's your laptop or desktop or these days,
even a phone in your pocket?
Well odds are, at the end of the day-- or even perhaps earlier in the day--
you need to physically plug that device into the wall and actually recharge it.
And somehow or other there's electricity or electrons
flowing from the wall that are stored in the battery in your device
if it has a battery, and that is the input that drives your entire machine's
operation.
And so if that's the only possible input,
it's pretty straightforward to imagine that you might have your computer
plugged in or not, power is flowing or not,
you've got a charge in your battery or you don't.
And so this binary world where something can be in two possible states--
on or off, plugged in or not--
is wonderfully conducive to now using binary in the context of computers
to represent information.
After all, if I want to represent a number,
I might simply turn on the lights.
And so my phone here has a light built in, and right now that light is off,
but if I consider this then to be off, we'll call it a 0.
And if I turn this flashlight now on, I can be said to be representing a 1.
And so with a simple light bulb can I represent two possible values,
just like my finger can be up or down.
But computers don't necessarily use lots of little light bulbs,
they use something else that's called a transistor.
A transistor is a tiny little switch, and that switch, of course,
can be either on or off-- letting electricity in or not.
And so when you have a computer, inside of which
is a motherboard and CPU and bunches of other pieces of hardware,
one of the underlying components ultimately
are these things called transistors.
And computers today have millions of them inside, each of which
can be on or off-- so far more than my five fingers,
and that's ultimately how a computer can go about representing information.
So long as it has access to some physical supply of electricity can
it use that electricity to turn these switches
on or off in distinct patterns, thereby representing 0, 1, 2, 3, 4,
all the way up to 31 or even millions or billions or beyond.
And so computers use binary.
Computers speak binary-- only 0's and 1's, or off and on--
because it's so wonderfully conducive to the physical world
on which they're ultimately based.
We humans, meanwhile, tend to communicate
not only in English and other spoken languages,
but in a numeric system called decimal.
So decimal, dec meaning 10, has 10 digits at its disposal--
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, whereas binary has just two--
0 and 1-- and unary, you might say, has just one--
1, the hash mark that I drew on the board.
But if I only have 0's and 1's at my disposal, how can
I possibly count to 2 or 3 or beyond?
Well, we need some way of mapping these 0's and 1's to the more familiar
numbers that we ourselves know.
So how can we go about doing this?
It's one thing to describe it in terms of patterns on your fingers,
but again, a device just has switches that it can turn on and off.
Well it turns out, even if you don't yet speak binary
or have never really thought about doing so,
that it's not all that dissimilar to the system of numbers
with which we grew up.
In fact, in grade school, you and I probably
grew up learning how to represent numbers in rather the same way.
For instance, consider this--
clearly the number 123.
But why is that?
It's not inherently the number 123.
Really, this is just three symbols on the screen.
We of course immediately recognize them as 1 and 2 and 3,
and we infer from that, oh, this is the number 123, but why is that exactly?
Well if you're like me, you probably grew up
representing numbers in the following way,
even if it's been quite some time since you thought about it like this.
With the symbols here, 1, 2, and 3, if you're like me,
you probably grew up thinking of this rightmost digit
as being in the ones place, and this middle digit
as being in the tens place, and this leftmost digit
as being in the hundreds place.
And if it were a longer number, you'd have
the thousands place and the ten-thousands place and beyond.
Now how do we get from 1, 2, 3 to 123?
Well, the arithmetic we were taught says to do 100 times 1, and then 10 times
2, and then 1 times 3, and then add all three of those products
together, giving us, of course, 100 plus 20 plus 3, which of course is 123.
Now that actually doesn't feel like much progress, because we started at 123
and we got to 123, but that's not quite so.
We actually started with the pattern of symbols--
1, 2, 3, like a pattern of fingers, and ultimately ascribe mathematical meaning
to those symbols as 123.
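That grade-school arithmetic can be spelled out in a short, illustrative
Python sketch: weight each digit by its place's power of 10 and add.

    # 1, 2, 3 in the hundreds, tens, and ones places.
    digits = [1, 2, 3]
    places = [100, 10, 1]
    value = sum(d * p for d, p in zip(digits, places))
    print(value)  # 100 + 20 + 3 = 123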
And it turns out computers, even though they speak binary and therefore only
have 0's and 1's at their disposal-- no 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9,
they count in exactly the same way, and they represent inputs and outputs
in fundamentally the same way.
Consider this.
Suppose that I have 3 bits at my disposal, 3 switches or light bulbs,
if you will.
And those light bulbs or switches are all initially off.
I claim that I'll represent those states as 0, 0, 0.
And together, those three 0's represent the decimal number we ourselves know
as 0.
Now why is that?
Well, here we'll consider this to be the ones place as before,
but the middle digit we won't consider as being in the tens place--
we're going to consider it as being in the twos place--
and this leftmost digit as being in the fours place.
Now why?
Well in our decimal system, there's a reason
that we have the ones place, the tens place, the hundreds place,
and beyond--
those are actually all powers of 10.
10 to the 0, 10 to the 1, 10 to the 2, and beyond.
Now in binary, bi meaning two, you only have two digits at your disposal, 0
and 1, and so instead of using powers of 10, we're going to use powers of 2.
And indeed we have.
2 to the 0 is 1; 2 to the 1 is 2, 2 to the 2 is 4, and if we kept going,
we'd get 8, 16, 32, 64, and beyond.
And so this pattern of symbols or this pattern of light bulbs
or this pattern of switches, all of which I'm proposing are off,
simply represents now the number we humans know as 0.
Why?
Well we have 4 times 0 plus 2 times 0 plus 1 times 0,
which of course gives me 0 plus 0 plus 0, for a total numeric value of 0.
So in binary, 0, 0, 0 or all three switches off represents 0,
and indeed, that's what we would expect.
But what if we instead did this?
Suppose that using binary, and therefore these same three places--
ones place, twos place, and fours place and beyond,
we had a different pattern of symbols.
How, for instance, might we represent 1?
Well, if we had a pattern that's 0, 0, and 1 in binary, that's
going to translate, of course, to the decimal number we all know and agree
is the number 1.
How do I represent the number 2?
Well, if I start from scratch again, I'm going
to represent the decimal number we know as 2 as a pattern that's 0, 1, and 0.
A switch that's off, another that's on, another that's off.
Why?
4 times 0 is 0, 2 times 1 is 2, 1 times 0 is 0,
and so we're left with a total of 2.
And if we want to represent 3, I don't even
need to change the value or state of those switches,
I can simply turn this one from off to on,
and now I'm representing the number 3.
If I want to count up to 4, I can simply start fresh in the fours place,
in the twos place, in the ones place here--
1, 0, and 0.
And then lastly, suppose that I want to count up even higher.
7 can simply be 1, 1, and 1, because that gives me a 4, a 2, and a 1.
4 plus 2 is 6, plus 1 is 7.
But what if I want to represent the number 8?
To represent the number 8, it would seem that I need a little more space.
And so I might need to introduce the eights place so that I
can have a 1, a 0, a 0, and a 0, the eights place being 2 to the 3, but to do this,
I do indeed need more hardware.
I need another switch, another light bulb, and another physical device
inside of my computer.
Now fortunately, our computers today have millions of these tiny transistors
and switches, so it's not all that problematic to count as high as 8,
but the implication is that in order to represent larger and larger values,
we do need more physical storage.
And indeed, a recurring theme in computer science is exactly that constraint.
You can only do so much computationally, perhaps,
if you have a finite amount of memory or a finite amount of hardware
or switches.
And we'll see, then, that there are non-trivial implications for how
much data you can represent, and how many problems, therefore,
you can solve.
So that's it for numbers.
Using just 0's and 1's can we count as high as we want and represent
any number of values so long as we have enough bits at our disposal.
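The same place-value arithmetic, now with powers of 2, can be sketched
in Python like so:

    # The pattern 1, 1, 1 in the fours, twos, and ones places.
    bits = [1, 1, 1]
    places = [4, 2, 1]
    value = sum(b * p for b, p in zip(bits, places))
    print(value)          # 4 + 2 + 1 = 7
    print(int("111", 2))  # Python's built-in conversion agrees: 7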
But numbers only get us so far in computing.
After all, it's not just for Excel and calculators
that we use computers.
It's of course to send information textually,
and to use letters of the alphabet, and compose text messages and emails
and documents and more, but how, then, do you represent letters
if all you have at the end of the day are these 0's and 1's
underneath the hood, so to speak?
Well to do that, we all simply need to agree,
just like we've agreed to speak here in English for now,
on a particular system by which to represent letters of the alphabet.
Now we don't have all that many choices at hand because if we only have
0's and 1's-- switches that can be turned on and off--
that doesn't give us terribly many options.
But what we could do as humans is just collectively agree that you know what?
We are going to use a certain pattern of 0's and 1's to represent capital A.
And we're going to use a different pattern of 0's and 1's to represent
capital B. A different pattern of 0's and 1's
to represent C and D and all the way through Z. And you know what?
If we care about lowercase letters, we'll
use slightly different patterns of 0's and 1's to represent those as well.
And this is exactly what humans did decades ago.
We standardized on something called ASCII--
the American Standard Code for Information Interchange,
which was the result of people agreeing that we are going to use--
you know what?
The number 65 in decimal to represent capital A,
and the number 66 to represent capital B,
and the number 97 to represent lowercase a,
and 98 to represent lowercase b, and everything before and beyond.
And so this system, this code, ASCII, is simply a collective agreement
that whenever you're in the context of a text-based program and not
a numeric-based program, any patterns of bits,
such as those that might represent 65 in a calculator,
should instead be interpreted in the context of Microsoft Word
or an SMS message or iMessage as actually representing a letter.
So how bits are interpreted is entirely context-dependent.
Depending on the program you have opened, the same pattern of bits
might be interpreted in different ways--
as numbers or letters or even something else.
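In Python, for instance, the built-in functions ord and chr expose exactly
this mapping between numbers and characters:

    # The ASCII/Unicode agreement, queried directly.
    print(chr(65), chr(66))   # A B
    print(chr(97), chr(98))   # a b
    print(ord("A"))           # 65, the agreed-upon code for capital A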
So how, then, do we represent so many other symbols
that aren't just A through Z?
Well it turns out that the American Standard
Code for Information Interchange was indeed
fairly skewed toward American representation of letters,
particularly English.
But there are so many more characters with accents and other symbology,
let alone punctuation marks; and in foreign languages, too,
there are hundreds if not thousands of distinct characters
that you ideally need to represent in order to send that text
and write that document.
But ASCII alone couldn't handle it.
Because ASCII tended to use 7 or maybe in some contexts 8 bits total,
and with 8 bits--
or binary digits, if you will--
can you represent how many possible values?
2 times 2 times 2 times 2 times 2 times 2 times 2 times
2 for 8 bits, each of which can be a 0 or 1.
That only gives you 256 possible values.
Now that's more than enough for 26 letters of the English alphabet, A
through Z, both uppercase and lowercase, but once you
start adding in punctuation and accents and other characters,
you very quickly run out.
And so the world produced Unicode instead,
which doesn't just use 8 bits or 1 byte, if you will--
1 byte quite simply meaning 8 bits--
but instead introduced a variable-length encoding of letters.
And so some letters might use 8 bits or 1 byte.
Other letters might use 2 bytes or 16 bits.
Other letters might use 3 bytes or even 4.
And via 4 bytes maximally--
32 bits, or 2 to the 32 possible patterns-- can you actually
represent roughly 4 billion possible values,
which is a huge number of values.
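That variable-length scheme is visible in Python if you encode a few
characters as UTF-8-- one common Unicode encoding-- and count the bytes
each takes (the sample characters here are merely illustrative):

    # Some characters take 1 byte in UTF-8; others take 2, 3, or 4.
    for ch in ["A", "é", "世", "😂"]:
        print(ch, len(ch.encode("utf-8")), "byte(s)")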
But how does the system of encoding actually work?
Let's consider by way of an example.
In both ASCII and Unicode, ASCII now being
a subset of Unicode insofar as it only uses 1 byte per letter,
you might represent capital A with 65, capital B with 66, and so forth.
And dot-dot-dot implies we've handled the rest as well and they all happen
to be back-to-back-to-back.
So suppose in the context of a text message
you happen to receive a pattern of 0's and 1's that if you actually
did the math in the ones place and the twos place
and so forth, actually worked out to be the decimal numbers 72, 73, and 33.
Well what text message have you just received?
Of course at the end of the day, computers only understand binary,
and that information is sent from phone to phone
over the internet-- more on that another time.
But those 0's and 1's interpreted in the context of a text messaging program
will be interpreted not as binary, not as decimal,
but probably as ASCII or more generally Unicode.
And so if you've received a text message that says, 72, 73, 33, well what might
that spell?
Well if we consider our excerpt of a chart here,
the 72 and 73, of course, translate to an H and an I,
and you wouldn't know it from that preceding chart, but 33, it turns out,
just represents a very excited exclamation point,
because that, too, has a representation in ASCII and Unicode.
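A short Python sketch of that interpretation: the same three numbers,
shown first as bits, then read as text.

    # The received values, shown as bits, then interpreted as characters.
    message = [72, 73, 33]
    for code in message:
        print(format(code, "08b"))           # e.g. 72 -> 01001000
    print("".join(chr(c) for c in message))  # HI!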
Now underneath the hood are patterns of 0's and 1's, but we don't need
to think about things at that level.
Indeed, what we've done here by introducing ASCII,
and in turn, Unicode, is introduce what's
called an abstraction, which is a technique in computer science via which
we can think about a problem more usefully at a higher level as opposed
to the lowest level at which it's actually implemented.
After all, it'd be much easier to receive a message that you view
as H-I-!
than actually as a pattern of 0's and 1's that you then
have to think about and translate in your head
to the actual message that was intended.
So an abstraction is just a way of taking a fairly low-level, if not very
complicated, representation and thinking about it
at a higher, more useful level that lends itself
more readily to communication, or more generally to problem-solving.
Now what about all of those other characters on the screen?
After all, on a typical keyboard might you have
letters and numbers and punctuation, but also these accented characters, too.
Well thanks to Unicode, you indeed have as many as 4 billion possibilities,
and it's no surprise, perhaps, that proliferating on the internet
these days are a little friendly faces called emojis
that you might have received yourself in text messages, emails,
or some other form.
Now these emojis here, though they might look like smiley faces,
are actually characters that companies like Google and Microsoft and Apple
have simply decided to represent pictorially.
Underneath the hood, anytime you receive one of these emojis,
you're actually receiving a pattern of bits--
0's and 1's that in the context of an email or text
message or some other program, are being interpreted as
and therefore displayed as a corresponding picture
that Google and Microsoft and Apple and others
have decided how to present visually.
And indeed, different platforms present these same patterns of 0's and 1's
differently depending on how the artists at those companies
have decided to draw these emoji.
Consider this one, for instance-- face with tears of joy is its formal name,
but more technically, this is encoded by a very specific decimal number.
Capital A is a 65, capital B is 66, what is this face with tears of joy?
Well it is the number 128,514.
After all, if you've got four billion possibilities,
surely we could use something within that range
to represent this face and any number of others.
But underneath the hood, if you actually looked
at a text message in which you received a face with tears of joy,
what you've actually received is this pattern of 0's and 1's somehow encoded
on your phone or your computer.
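In Python, you can see both layers at once-- the code point and the bytes
that encode it (the exact bytes shown assume UTF-8):

    # Face with tears of joy: decimal 128,514.
    print(chr(128514))                   # 😂
    print(format(128514, "b"))           # the code point's bits
    print(chr(128514).encode("utf-8"))   # the 4 bytes: b'\xf0\x9f\x98\x82'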
And so in the context of a calculator or Microsoft Excel
might we interpret some pattern of bits as numbers.
And in the context of a text messaging program
or Google Docs or Microsoft Word might we
interpret that same pattern of bits as instead characters or even emoji.
But what if we want to represent different types of media altogether?
After all, the world of computing is not composed of just numbers and letters.
There's also colors and images and videos and audio files and more,
but at the end of the day, if we only have at our disposal
0's and 1's, how do we go about representing all of those richer media?
Well as before, we humans just need to agree collectively
to represent those media using 0's and 1's in some standard way.
Perhaps one of the most familiar acronyms, even if you've never
really thought about what it is that you might see on a computer, is this one--
RGB.
It stands for red, green, and blue.
And this refers to the process by which many computers represent
colors on the screen.
In fact, what is a color?
Well, you might represent the colors black and white simply by a single bit.
1 might represent black and 0 might represent white, or vice versa.
We in that case only have what's called 1-bit color, whereby
the bit is either on or off implying black or white or white or black.
But with RGB, you actually use more bits than just one.
Conventionally you would use 8 bits to represent
red, 8 bits to represent green, and 8 bits to represent blue.
So 24 bits total, the first 8 of which or first byte is red, second is green,
third is blue.
Now what does that mean?
Well in order to represent a color, you simply
decide, how much red do you want to add to the mix?
How much green and how much blue?
Combining essentially those wavelengths of light
in order to see a disparate color on the screen.
And if you think about red, green, and blue now as being single bytes each,
that means you have a possible range of values of 0 to 255.
After all, a single byte is 8 bits, and with 8 bits, each of which
can be a 0 or 1, you have 2 times 2 times 2 times 2 times 2 times 2 times 2 times
2, which gives you 256 values.
So if you start from 0, you can count as high as 255.
Suppose that you had 72 as your amount of red for a color, and 73 for green,
and 33 for blue.
That is to say in the context of a text messaging
program, this same pattern of bits--
72, 73, 33 presented as decimal represented a textual message of HI!.
But suppose you were to see that same pattern of bits
instead in a photo-editing program like Photoshop, or in the context of opening
an image on the screen.
Well that same pattern of bits, or in turn, decimal numbers, just
represents some amount of red, some amount of green,
and some amount of blue.
So 72 is a decent amount-- a medium amount
of red out of 255 total values, 73 is a medium amount of green,
and 33 is a little bit of blue.
If you combine now all of these three values in your mind's eye
by overlapping them, what you get is this shade of yellow.
Which is to say that if you wanted to represent
this particular shade of yellow would you
use three bytes whose values are respectively 72, 73, and 33.
And in the context of Photoshop or some other graphical program
would that pattern be interpreted as and displayed as this color on the screen.
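Those three bytes are often written in hexadecimal; a one-line Python sketch
of that same color:

    # The pixel 72, 73, 33 as a familiar hex color code.
    r, g, b = 72, 73, 33
    print(f"#{r:02x}{g:02x}{b:02x}")  # #484921, a dark shade of yellow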
Now this color is deliberately presented here as just 1 square--
1 dot, if you will, or more properly, 1 pixel.
Because what is an image?
Well if you've ever looked really closely at your computer screen
or your phone or even your TV, you might actually
see all of the thousands of dots that compose it.
This is getting harder to do on today's modern hardware,
because if you have something like a retina display,
that means these dots or pixels are ever so tiny and ever so close
together that it's actually hard now for the human eye to see them,
but they are in fact there.
Any image on the screen, any photo on the screen
is really just a pattern of dots or pixels--
a grid of dots from left to right, top to bottom.
And every one of those dots or pixels on the screen
probably has 24 bits representing its particular color.
Indeed, a picture is essentially just a pattern of colored dots so small
that you don't really see all of those dots,
but you see the net effect of a beautiful photo or image or something
else altogether.
So consider how we might see these.
When Apple or Google or Microsoft or any other company that supports emojis
presents those emojis on the screen, we of course
see them as images because Apple and Microsoft and Google
have decided what images shall be displayed when
they receive a certain pattern of bits.
But how are they storing those images and how
is your Mac or PC or phone displaying that image to you?
Well it's hard to see where those bits are let alone the pixels in it
at this particular size, but if I go ahead and zoom in and zoom in and zoom
in a little more still, you can begin to see
the individual dots or squares or pixels that compose even this one emoji.
And so just to display this smiling face, this face with tears of joy,
do you need 24 bits for this pixel, 24 bits for that pixel,
24 bits for the next pixel, and so on and so forth.
So if you've ever noticed that when you download an image or a photo
or receive one in the mail, odds are its size is not on the order of mere bytes.
It's probably kilobytes, meaning thousands of bytes,
or megabytes, meaning millions of bytes.
How in the world does a simple photo have so many bytes or in turn bits?
It's because every one of the dots in that image
takes up some amount of space.
So to represent yellow might we use a certain pattern of 0's and 1's, or 3
bytes.
And to represent black or gray or any number of other
colors on the screen might we use different patterns still.
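The arithmetic behind those megabytes is straightforward; here's an
illustrative sketch assuming a 1920-by-1080 image with 24-bit color
and no compression:

    # Every pixel carries 3 bytes (24 bits) of color.
    width, height = 1920, 1080
    size = width * height * 3
    print(f"{size / 1_000_000:.1f} MB")  # about 6.2 MB, before compression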
Now that's it for images, but what about videos?
A video is really just a sequence of images
being presented to you so quickly that you and your human eye
don't notice that you're really just watching image after image after image.
In fact, it's conventional in Hollywood and elsewhere
to display as many as 24 images or frames per second,
or perhaps even 30 frames or images per second.
And so even though they are in fact distinct images,
you are seeing them so quickly, and the characters on the screen
are moving within each frame or image ever
so slightly, that it creates ultimately the illusion of movement.
And if you think back to childhood, you might
have had one of those old-school flip books
on which was drawn some kind of cartoon one frame or image at a time.
And if you flipped through that flip book one page
ever so quickly again and again and again and again,
letting the physics of it all take its toll,
you might see a character or the picture in the book actually appearing to move,
but it's not.
All of those pages are just images and all of those images are not moving,
but when you see them so fast and so quickly in succession
does it appear to create that same movement.
So these things here, Animojis in Apple's world,
are just videos ultimately that track your facial movements and such,
but all they really are are a pattern of bits, each set of which
represents an image.
And each of those images is displayed to you on your phone
so quickly that you see the illusion of movement.
And so here again is an abstraction.
What is a video?
Well, a video is a collection of images.
Well what's an image?
An image is a collection of pixels.
Well what's a pixel?
A pixel is simply some number of bits representing some amount of red
and green and blue.
Well what is a bit?
It's simply a digital representation of something being present or not--
like electricity.
And so it's much more powerful now to be operating
at this level of abstraction-- talking about videos for what they are, and not
getting into the weeds of the lowest level, where at the end of the day
it's really just some form of electricity
that we call bits, that we in turn call pixels,
that we in turn call images, that we in turn call videos.
And we can operate in a number of these layers of abstraction
in order to represent inputs to some problem.
To create a film, to convert a film, to display a film
might be just one of the problems to solve.
All right, so we now have a way to represent information,
whether it's binary, decimal, or some other approach altogether.
So it's time to solve problems.
But how do we go about solving problems?
What is inside of this black box?
Well that's where our so-called algorithms
are-- step-by-step instructions for solving some problem.
And to solve these problems, we simply need
to decide first what problem to solve.
Well let's consider a familiar one, like that of looking up someone's name
and number in a phone book.
Now these days, the phone book, of course, takes a more digital form,
but at the end of the day, these two formats are pretty equivalent.
After all, here in my hand is a phone book with maybe 1,000 pages,
on which are bunches of names and numbers sorted alphabetically
from A to Z.
On my phone, meanwhile, while I might have to click an icon instead,
do I have contacts that are similarly sorted
from A to Z by first name or last name, and touching any one of them
pulls up its number.
This one, though, is a little more physical, making it easier for us
to see the solution to a problem, like finding, say, Mike Smith.
Mike Smith may or may not be in this phone book, but if he is,
my goal quite simply is to find him.
So let me try this.
Let me try opening this book, and then one step
at a time looking down for Mike Smith.
And if I don't see him, move on to the next page.
If I still don't see him, move on to the next page, and next page,
and next page, and so forth, one page at a time.
I propose that we consider first-- is this algorithm correct?
Opening the phone book, looking down, turning the page, and repeating.
Well, at some point, if Mike is in this phone book,
I am going to reach him, at which point I can call him.
And if Mike Smith is not in this phone book,
I'm eventually not going to reach him, but I
am going to hit the end of the phone book, at which point
I'll presumably stop.
But it doesn't feel terribly efficient, however correct it may be.
Why is it not efficient?
Well I'm starting at the beginning of the phone book.
I know it's alphabetical, and yet I'm turning one page
at a time through the A's, through the B's, through the C's and beyond,
just waiting and waiting until I finally reach Mike Smith.
Well how might I do this better?
Well, I learned in grade school not only how
to count by ones, but also how to count by twos.
So instead of 1 page, 2 pages, 3 pages, 4, why don't I instead do 2,
4, 6, 8, 10, 12-- flying through the phone book.
It certainly sounds faster, and it is, but is that approach correct?
Well, I propose that there could be a problem.
I might just get unlucky, and once I do reach the S section of the phone book,
what if by bad luck Mike Smith is sandwiched between two of those pages?
And therefore I hit the T section and Z section and run out of pages?
I might conclude incorrectly that Mike Smith is not,
in fact, in this phone book.
But do I have to throw out the algorithm altogether?
Probably not, right?
I could be a little clever here and improve this algorithm,
maintaining its efficiency, but fixing its correctness.
After all, if I do see a page on which there are last names starting with T,
well, I can stop and say we know Mike is no farther than this
because Smith starts with S, so I can double-back hopefully
just one page or a few, and therefore fix what's
otherwise an incorrect approach in this algorithm.
In fact, I can be even smarter than that.
As soon as I see someone's name that starts with S-N instead of S-M,
I can similarly flip back one page and therefore ensure
that I can go twice as fast through most of the phone book,
and only once I hit that section do I have to double-back one or terribly few
pages.
So I get the overall performance gains, and I also solve the problem.
But honestly, if you even use this technology anymore,
odds are you don't start at the beginning turning one or two
pages at a time.
If you're like me, you probably more instinctively
open up roughly to say the middle.
You'll look down and you realize, oh, I haven't found Mike yet
because I'm actually here in the M section.
But what do you know about where you've just jumped?
Well Mike Smith is indeed to the right here,
because he's in the name starting with S. But if I'm in the M section,
I do know something else.
I know that Mike is not in the left-hand section of this book--
he is not among the A through M's.
And so I can both literally and figuratively tear this problem in half,
throw half of it away, thereby leaving myself with just,
say, 500 pages out of 1,000.
So whereas with that first algorithm I took one bite at a time out of the
problem-- one page at a time-- with the second algorithm I took two bites
at a time-- two pages at a time.
Well with this algorithm, I just took a 500-page bite out of the problem
all at once, thereby getting closer to the output to this problem
much more quickly.
What do I then do?
Well, I simply am probably going to apply that same algorithm again
and again and again-- dividing and conquering this phone book,
if you will.
Jumping roughly to the middle now, looking down-- dah, darn it!
I ended up in the T section now, so I've gone too far,
but I know Mike is not in the right-hand side of this book--
he's not among the T's through Z's.
So I can again tear the problem in half, throw that half away,
thereby leaving me with just a quarter of the original problem,
say 250 pages down, down from 1,000, finding Mike ever
so much more quickly than before.
And if I repeat and I repeat and I repeat, each time actually looking down
looking for Mike, I should hopefully find myself ultimately left
with just a single page on which Mike either is or is not, at which point
I can call him or quit.
So just how much more quickly did I find Mike Smith via this algorithm
than the first two?
Well in a 1,000-page phone book, I might get unlucky and Mike's not even there,
and so in the worst case, I might look at as many as 1,000 pages.
With the second algorithm, I might also get unlucky and never find Mike
because he's just not there, and therefore look at as many as 500 pages.
But in this third algorithm where I'm iteratively
dividing and conquering again and again, splitting the problem in half
so I go from a very large input to a much smaller to an even smaller problem
still again and again, how many times might I split that phone book in half?
Well if I start off with roughly 1,000 pages and I split it in half,
that gives me 500, 250, 125, and so forth.
Turns out that I can split 1,000 pages roughly 10 times in total
before I'm left with just that final page.
And so consider the implications of that.
First algorithm-- 1,000 steps, maybe.
Second algorithm-- 500 steps, maybe.
Third algorithm-- 10 steps, maybe, to find Mike Smith before I can call
or decide he's not there.
And that's a pretty powerful technique.
And if we extrapolate from this relatively small example or phone book
to maybe a much larger phone book, say a phone book
that a bit nonsensically has 4 billion pages in it, well,
how many steps might those three algorithms take?
The first algorithm might take 4 billion.
The second algorithm might take 2 billion.
The third algorithm might take 4 billion divided
by 2 divided by 2 divided by 2--
you can divide 4 billion in half roughly 32 times only, at which point
you're left with just that one page.
So 4 billion steps for the first algorithm, 2 billion
steps for the second algorithm, but 32 steps for the third algorithm.
And simply by harnessing this intuition that we all probably already have
and formalizing it in more methodical form-- in algorithmic form,
if you will, can we solve problems using some of the building
blocks that we already have at our disposal.
And indeed, problem-solving in computer science
is all about implementing these algorithms
and applying them to problems of interest to you.
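For instance, here is a minimal sketch of that divide-and-conquer search
in Python, assuming the phone book is simply a sorted list of names:

    def search(book, name):
        """Binary search: repeatedly tear away half of the remaining book."""
        low, high = 0, len(book) - 1
        while low <= high:
            mid = (low + high) // 2  # open to roughly the middle
            if book[mid] == name:
                return mid           # found: call Mike
            elif name < book[mid]:
                high = mid - 1       # throw away the right half
            else:
                low = mid + 1        # throw away the left half
        return None                  # not in the book: quit

    book = ["Alice", "Bob", "Charlie", "Mike Smith", "Zoe"]
    print(search(book, "Mike Smith"))  # 3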
But how do we think about just how good this algorithm is?
It sort of stands to reason that it has to be minimally correct.
But how efficient is it?
I've been talking in terms of absolute numbers.
1,000, 500, or 10, or 4 billion, 2 billion, and 32.
But it would be nice to compare algorithms in a more general way
so that I can use the same searching algorithm, if you will,
to find not Mike Smith, but someone else.
Or to search not a phone book, but Google or a search engine more generally.
And so computer scientists also have a more formal technique
via which to describe the efficiency of algorithms.
And we might consider these not even with numbers or formulas, per se,
but with simple visual representations thereof.
Here, for instance, is a simple plot.
On the x or horizontal axis is going to be the size of the problem.
How many pages are in the phone book, which we'll generally call n,
n being a computer scientist's go-to variable.
Much like a mathematician might go with x or y
or z, computer scientists tend to start counting with n.
On the vertical or y-axis here, we'll say
this is going to be the time to solve a problem,
be it a phone book or something else.
And that might be seconds or minutes or page turns
or some other quantifiable unit of measure.
So how do I go about describing that first algorithm where
I turn one page at a time?
Well I propose that it manifests a linear relationship, or n, or a line,
a straight line on the chart.
Why is it a straight line?
Well, if Verizon or the local phone company, for instance,
next year adds just one more page to that phone book, that means for me,
it might take me one more unit of measure of time
to find someone like Mike Smith, because we've gone from 1,000 to 1,001 pages.
So this line indeed is straight, or linear.
That second algorithm now where I'm flying through the phone book
twice as fast is really fundamentally the same algorithm or same shape.
It just so happens that for a given size of the problem,
it's not going to take this many seconds n,
it's going to take me n over 2 seconds because I'm
doing twice as much work at a time.
And so this line, too, is linear or straight,
but we might describe it in terms of n over 2, where n is the number of pages
but I only have to look at half of them except for maybe that very last one
if I double-back, and so its shape is fundamentally the same.
So better, but not fundamentally better, it would seem.
But that third and final algorithm has a fundamentally disparate shape,
depicted here with our green curved line which has a logarithmic slope to it.
So if n is the number of pages, and the algorithm in use
is dividing and conquering, it turns out the resulting curve is logarithmic.
Now what does that actually mean?
Well it turns out that if Verizon adds one more page to the phone book
next year, that is a drop in the bucket for that final algorithm where
I'm just tearing the problem in half.
But more powerfully, if Verizon doesn't just add one page--
perhaps two neighboring towns merge together and form one massive phone
book next year, going from 1,000 pages to 2,000 pages
all in one big, fat book, well how many more
steps does it take that third algorithm to search for anyone in it?
Just one.
Because with 2,000 pages, you can take a 1,000-page bite out of that problem all
in one fell swoop even though it's twice as big as before.
And so even as the size of the problem gets bigger and bigger and bigger
and even farther away, the amount of time
it takes to solve a problem using that algorithm only
increases ever so slightly.
That's not a one-to-one or a one-to-two ratio,
it's instead said to be logarithmic.
And this, then, is a fundamentally different curve--
it's a fundamentally better algorithm in that the problem can grow exponentially
large, and the amount of time it takes to solve it itself
grows far less quickly than that.
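You can check those worst-case counts with a few lines of Python, where
the third algorithm's cost is the base-2 logarithm of n, rounded up:

    import math

    # Worst-case page looks: one at a time, two at a time, divide and conquer.
    for n in [1_000, 2_000, 4_000_000_000]:
        print(n, n, n // 2, math.ceil(math.log2(n)))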
Now it's one thing to talk about all these algorithms.
At some point we need to put them to paper,
or more technically, program them into a computer.
So how do we go about formalizing these step-by-step instructions in such a way
that all of us can agree that they are correct, all of us
can discuss their efficiency, and most importantly, all of us
can program a device to execute these steps for us?
Well let me propose that we first implement
an algorithm like that of searching a phone book in what's called pseudocode.
Pseudocode is not a formal programming language, it has no one definition.
It generally means using short, succinct phrases, often in English,
to describe your algorithm step-by-step-by-step in such a way that
you yourself understand it, and anyone reading it can also understand it.
Now along the way do you have to make certain assumptions,
because you need to decide at what layer of abstraction to actually operate.
And we'll take a look at where those opportunities
are, but let me propose that we implement
this algorithm for searching for Mike Smith,
for instance, in the following way.
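The slide itself isn't reproduced here, but pseudocode along the lines
described might read as follows-- a sketch, since the exact on-screen
wording and numbering may differ:

    1   Pick up phone book and open to the middle
    2   Look at page
    3   If Smith is among names
    4       Call Mike
    5   Else if Smith is earlier in book
    6       Open to middle of left half of book
    7       Go back to step 2
    8   Else if Smith is later in book
    9       Open to middle of right half of book
    10      Go back to step 2
    11  Else
    12      Quit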
Now this is an algorithm, or really, a program, albeit written in pseudocode.
And even though written in pseudocode or English, it
manifests certain constructs that we're actually
going to see in any number of today's popular programming languages,
a programming language being an English-like language
that humans have decided can be used in a certain way
to tell a computer what to do.
Now this pseudocode is just for us humans,
but what are some of the fundamental constructs
in it that we'll see in actual programming languages?
Well first highlighted in yellow here are
all of the verbs or actions that more technically we'll call functions.
Statements that tell the computer, or in this case, the human what to do.
Pick up, open to, look at, call, open, open, or quit-- all of them
calls to action.
In the context of a computer, these same types of verbs
we more traditionally call functions.
What else is lying inside of this program?
Well these, pictured here in yellow.
If, else if, else if, else, well these represent
what are called conditions or branches.
Forks in the road, so to speak, via which
you make a decision to go this way or this way or this way or that.
But how do you decide down which road to go?
You need what are called Boolean expressions.
Named after the mathematician George Boole, these Boolean expressions
are short phrases that either have yes or no answers,
or true or false answers, or if you will, 1 or 0-- on or off answers.
Smith is among names, Smith is earlier in book.
Smith is later in book-- each of them could be asked effectively
with a question mark to which the answer is
yes or no, and based on that answer can you
go down any one of those four roads.
And then lastly is this--
go back to step 2 or go back to step 2.
Highlighted in yellow here's an example of a loop, a deliberate
cycle that gets you to do something again and again
and again until some condition is no longer true
and you eventually quit or call Mike instead.
And so looping in a program allows you to write less code,
but do something again and again and again just
by referring back to some work that you've already done.
Now these constructs, though we've seen them in the context of pseudocode
and searching a phone book, can be found in any number of languages
and programs, be it Python or C++ or Java.
And so when it comes to programming a computer, or more
generally solving a problem with a computer,
we'll ultimately express ourselves in fundamentally
the same way, with functions, conditions, and Boolean expressions,
and loops, and many other constructs still,
but we'll do it in a way that's methodical, we'll do it in a way
wherein we've all agreed how to represent
the inputs to that process and the outputs thereof,
and ultimately use that representation and these layers of abstraction
in order to solve our problems.
But too much abstraction is not always a good thing.
Sometimes it's both necessary and quite helpful
to get into the weeds of the lower level implementation details, so to speak.
And let's see if we can't see this together.
Take out if you could a pen or pencil, as well as a sheet of paper,
and in just a moment allow me to program you, if you will.
I'll use a bit of verbal pseudocode to program
you to draw a particular picture on that sheet of paper.
The onus is on me to choose an appropriate level of abstraction
so that you're on the same page, if you will,
as I, so that ultimately what I program is what you implement correctly.
Let's try.
All right.
First, go ahead if you would and draw a square.
All right.
And next to that square to the right, go ahead if you could and draw a circle.
To the right of that circle, go ahead and draw a diagonal line
from bottom left to top right.
All right, and lastly, go ahead if you would
and draw a triangle to the right of that line.
All right.
Now hopefully you're quite proud of what you just drew,
and it's perfectly correct as you followed my instructions step-by-step.
So surely what you drew looks like this?
Well, that's a square, then a circle to its right, then a straight line
from bottom left to top right, to the right of which is a triangle.
Now odds are, you didn't draw quite this.
Fortunately there's no one there to see, but where might you have gone wrong
or maybe where might I have gone wrong?
Well I didn't exactly specify just how big that square should be, let alone
where it should be on the page.
And you might have quite reasonably drawn it right in the center--
maybe the top left or the top right.
But if you drew in a place that I didn't intend,
you might not have had enough room for that actual square
unless perhaps you flipped the paper around.
As for the circle, I said draw it to the right,
but I didn't technically say that it should be just as wide or just as high
as that square, so there might have been a disparity there.
As for that straight line, I said from bottom left to top right.
I knew that I meant from the bottom of the circle
to the top right of what would then be the triangle,
but maybe you started from here and all the way up to here.
I was making an assumption, and that, perhaps,
was not necessarily precise enough.
And then lastly, the triangle.
Pretty sure that I learned growing up that there are
all sorts of different triangles.
Some have right angles, some have acute angles or obtuse angles,
and some triangles are even isosceles.
Now I happen to intend this one and you might have drawn just that,
but if you did, you probably got a bit lucky.
And unfortunately when it comes to computing,
you don't want to just get lucky when programming your computer,
you want to actually ensure that what you write is what it does.
And in fact, if you've ever seen on your Mac or PC
a spinning beach ball or hourglass indicating
that something is taking quite long, or the computer freezes or reboots,
odds are that's because some programmer, like myself just now,
might not have been specific enough to the computer
as to what to do in all circumstances, and so sometimes unforeseen behavior
happens.
The computer starts thinking forever.
The computer reboots or crashes or freezes,
which is simply the result of one or more lines of code--
not in pseudocode, but in Java or C++ or something else--
not having anticipated some user's input or some particular direction.
So maybe it would help to operate not quite
at so high a level of abstraction.
And indeed, all of these shapes are abstractions.
A square, a circle, a line, and a triangle are abstractions in the sense
that we've ascribed words to them that have some semantic meaning to us,
but what really is a square?
Well again, if we go back to grade school,
a square is a rectangle, all of whose sides are of equal length
and whose angles are all right angles.
But very quickly, my own eyes start to glaze over
when you get into the weeds of that implementation, and so all of us
have agreed since to just call it a square.
And same for circle and line and triangle,
but even then, there are variations.
So let's get a bit more into the weeds this time
and let me take an approach with the second of two pictures,
programming you this time at a much lower level of detail.
Go ahead and take out another sheet of paper or flip over the one
that you already have.
Toward the top middle of that sheet of paper, roughly one inch from the top,
go ahead and put down the tip of your pen or pencil.
Keeping your pen or pencil applied to the paper,
start to draw from the top middle of the paper
down toward the left at roughly a 45 degree angle for two or so inches
and then stop.
Keeping the tip of your pen or pencil on the paper,
go ahead and draw another line perhaps three inches straight
down toward the bottom of the paper, stopping two or so inches
shy of the bottom of the paper.
Then, if you would, hook a right, this time
moving at a 45-degree angle toward the bottom-middle of the paper,
stopping ultimately one or so inches from the bottom of the paper.
Then bear right yet again, this time going up at a 45-degree angle
upward toward the middle of the paper, this time extending a couple of inches,
and then stopping at the same height your previous line started at.
Now draw a vertical line about three inches high,
stopping exactly where the line you drew two steps ago started.
And if you're on the same page, no pun intended, go ahead
and hook a left this time, going back a 45-degree angle toward the top-middle
of the page, hopefully closing the loop, if you will,
and connecting your very first dot to the last place you stopped.
At this point, you hopefully have a shape on your page
that has six total sides.
What comes next?
Go ahead and from the top-middle of the page follow the line you already drew
down to the left at a 45-degree angle and stop where you made a corner
before--
before you went straight down on the page.
And instead of following that vertical line,
go ahead and go down 45 degrees to the right a couple of inches,
stopping once you're directly below the dot you initially drew
at the top-middle.
Then go ahead and do two things from the point you're now at.
Go ahead and draw the opposite line up and to the right at a 45-degree angle,
connecting that line to one of the corners you previously made.
And then from that same initial point, draw a vertical line
down to the bottom-middle of the page where you had another angle still.
Lift up your pen or pencil and take pride in what you've just drawn,
because it is most surely exactly this.
Was it?
I don't know, because that was quite painful for me to say in this program,
operating as I was at such a low level,
really with no abstractions other than line and point and angle.
I was instead directing you much like a paintbrush on a page to go up and down
and left and right, collectively creating
what will be an abstraction that I would dare say we call a cube-- but a cube
that I did not name by name.
Had I said draw a cube, frankly we learned from last time
that that could have opened up multiple possibilities.
What should the angles be?
What should the size be?
What should the rotation thereof be?
And so an abstraction alone might not have been enough
if I wanted you to draw precisely this cube.
But if I do want you to be ever so correct and consistent with what
I had in mind on the screen here, I really
did need to go down to such a low level, so to speak,
that I was talking in terms of edge lengths and angles therein.
And even then, I might not have been as clear
as I might have been-- more precise measurements and angles
and such might have been ideal so that you ultimately
end up precisely with the same thing.
So if you veered off-track, not a problem, my fault more so than yours.
But what's the takeaway here?
Well at some point it would be nice not to have
to write programs at such a low level again and again and again.
Wouldn't it be nice if just one of us, the best artist among us
could implement code--
write code-- that implements a cube, that
implements a square, a circle, a line, and a triangle?
But when using those drawing functions, if you will,
that someone else has implemented, they should probably
get into the habit of asking you additional questions.
You want a square.
Well, how long should each edge be, and where should it
be positioned on the page or the canvas?
Same goes for circle or line or triangle--
what should the angles and lengths and positioning be?
And if you can parametrize your functions in that way--
in other words, write your functions in such a way that they themselves
take input so that what a function does is at the end of the day produce output
subject to those inputs, we have then solved a problem.
Not necessarily the one problem at hand, but we've
solved other people's problems.
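A minimal Python sketch of such a parametrized function, assuming some
hypothetical canvas object that can draw line segments between points:

    def draw_square(canvas, x, y, size):
        """Draw a square whose top-left corner is at (x, y)."""
        canvas.line(x, y, x + size, y)                # top edge
        canvas.line(x + size, y, x + size, y + size)  # right edge
        canvas.line(x + size, y + size, x, y + size)  # bottom edge
        canvas.line(x, y + size, x, y)                # left edge

Anyone can then call draw_square with whatever position and size they have
in mind, without re-deriving the four edges themselves.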
And indeed, that's what the world of programming ultimately is--
writing code to solve problems, and ideally
solving those problems once and only once.
And whether it's internally within your firm or your company,
writing code in such a way that other people-- maybe yourself
and your colleagues can then use it-- or in the case of open source software,
anyone in the world can use it.
So that in an ideal world, only one of us humans
ever need write code in some language to draw a circle or cube or square or line
or triangle, and everyone else in the world
can stand on his or her shoulders, use that code
to build their tool or product on top of it.
This, then, was computational thinking.
You have inputs, you want outputs, and in between, you have algorithms.
You just first need to decide how to represent those inputs and outputs,
and you have to decide for yourself at what level of abstraction to work.