
  • Okay, so welcome to lecture two of CS231n.

  • On Tuesday, just to recall, we, sort of, gave you

  • the big picture view of what is computer vision,

  • what is the history,

  • and a little bit of the overview of the class.

  • And today, we're really going to dive in, for the first time,

  • into the details.

  • And we'll start to see, in much more depth,

  • exactly how some of these learning algorithms

  • actually work in practice.

  • So, the first lecture of the class

  • is probably, sort of, the largest big picture vision.

  • And the majority of the lectures in this class

  • will be much more detail oriented,

  • much more focused on the specific mechanics,

  • of these different algorithms.

  • So, today we'll see our first learning algorithm

  • and that'll be really exciting, I think.

  • But, before we get to that,

  • I wanted to talk about a couple of administrative issues.

  • One, is Piazza.

  • So, I saw it when I checked yesterday,

  • it seemed like we had maybe 500 students

  • signed up on Piazza.

  • Which means that there are several hundred of you

  • who are not yet there.

  • So, we really want Piazza to be the main source

  • of communication between the students and the course staff.

  • So, we've gotten a lot of questions to the staff list

  • about project ideas or questions about midterm attendance

  • or poster session attendance.

  • And, any, sort of, questions like that

  • should really go to Piazza.

  • You'll probably get answers to your questions faster

  • on Piazza, because all the TAs know to check that.

  • And it's, sort of, easy for emails to get lost

  • in the shuffle if you just send to the course list.

  • It's also come to my attention that some SCPD students

  • are having a bit of a hard time signing up for Piazza.

  • SCPD students are supposed to receive a

  • @stanford.edu email address.

  • So, once you get that email address,

  • then you can use the Stanford email to sign into Piazza.

  • Probably that doesn't affect those of you who are

  • sitting in the room right now,

  • but it does apply to those students listening on SCPD.

  • The next administrative issue is about assignment one.

  • Assignment one will be up later today,

  • probably sometime this afternoon,

  • but I promise, before I go to sleep tonight,

  • it'll be up.

  • But, if you're getting a little bit antsy

  • and really want to start working on it right now,

  • then you can look at last year's version

  • of assignment one.

  • It'll be pretty much the same content.

  • We're just reshuffling it a little bit,

  • for example, upgrading it to work with Python 3

  • rather than Python 2.7,

  • and making some other minor cosmetic changes,

  • but the content of the assignment will still be the same

  • as last year.

  • So, in this assignment you'll be implementing your own

  • k-nearest neighbor classifier,

  • which we're going to talk about in this lecture.

  • You'll also implement several different linear classifiers,

  • including the SVM and Softmax,

  • as well as a simple two-layer neural network.

  • And we'll cover all this content

  • over the next couple of lectures.

  • So, all of our assignments are using Python and NumPy.

  • If you aren't familiar with Python or NumPy,

  • then we have written a tutorial that you can find

  • on the course website to try and get you up to speed.

  • But, this is, actually, pretty important.

  • NumPy lets you write these very efficient vectorized

  • operations that let you do quite a lot of computation

  • in just a couple lines of code.

  • So this is super important for pretty much

  • all aspects of numerical computing and machine learning

  • and everything like that:

  • efficiently implementing these vectorized operations.

  • And you'll get a lot of practice with this

  • on the first assignment.
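
As a quick illustration (not from the lecture; the arrays here are arbitrary), the same computation written as a Python loop and as a single vectorized NumPy expression:

```python
import numpy as np

# Both versions compute the same sum of absolute differences;
# the vectorized one avoids a million slow Python-level iterations.
a = np.random.randn(1_000_000)
b = np.random.randn(1_000_000)

total = 0.0
for x, y in zip(a, b):             # slow: element-by-element Python loop
    total += abs(x - y)

total_vec = np.sum(np.abs(a - b))  # fast: one vectorized expression
assert np.isclose(total, total_vec)
```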

  • So, for those of you who don't have a lot of experience

  • with Matlab or NumPy or other types of vectorized

  • tensor computation, I recommend that you start looking

  • at this assignment pretty early

  • and also, read carefully through the tutorial.

  • The other thing I wanted to talk about

  • is that we're happy to announce that

  • we're officially supported through Google Cloud

  • for this class.

  • So, Google Cloud is somewhat similar to Amazon AWS.

  • You can go and start virtual machines up in the cloud.

  • These virtual machines can have GPUs.

  • We're working on the tutorial for exactly how to use

  • Google Cloud and get it to work for the assignments.

  • But our intention is that you'll be able to just download

  • some image, and it'll be very seamless

  • for you to work on the assignments

  • on one of these instances on the cloud.

  • And because Google has, very generously,

  • supported this course,

  • we'll be able to distribute to each of you

  • coupons that let you use Google Cloud credits for free

  • for the class.

  • So you can feel free to use these for the assignments

  • and also for the course projects

  • when you want to start using GPUs and larger machines

  • and whatnot.

  • So, we'll post more details about that,

  • probably, on Piazza later today.

  • But, I just wanted to mention,

  • because I know there had been a couple of questions

  • about, can I use my laptop?

  • Do I have to run on corn?

  • Do I have to, whatever?

  • And the answer is that, you'll be able to run on

  • Google Cloud and we'll provide you some coupons for that.

  • Yeah, so,

  • those are, kind of, the major administrative issues

  • I wanted to talk about today.

  • And then, let's dive into the content.

  • So, the last lecture we talked a little bit

  • about this task of image classification,

  • which is really a core task in computer vision.

  • And this is something that we'll really focus on

  • throughout the course of the class.

  • That is, exactly,

  • how do we work on this image classification task?

  • So, a little bit more concretely,

  • when you're doing image classification,

  • your system receives some input image,

  • which is this cute cat in this example,

  • and the system is aware of some predetermined set

  • of categories or labels.

  • So, these might be, like, a dog or a cat or a truck

  • or a plane, and there's some fixed set of category labels,

  • and the job of the computer is to look at the picture

  • and assign it one of these fixed category labels.

  • This seems like a really easy problem,

  • because so much of your own visual system in your brain

  • is hardwired to do these, sort of,

  • visual recognition tasks.

  • But this is actually a really, really hard problem

  • for a machine.

  • So, if you dig in and think about, actually,

  • what does a computer see when it looks at this image,

  • it definitely doesn't get this holistic idea of a cat

  • that you see when you look at it.

  • And the computer really is representing the image

  • as this gigantic grid of numbers.

  • So, the image might be something like 800 by 600 pixels.

  • And each pixel is represented by three numbers,

  • giving the red, green, and blue values for that pixel.

  • So, to the computer,

  • this is just a gigantic grid of numbers.

  • And it's very difficult to distill the cat-ness

  • out of this, like, giant array of thousands, or whatever,

  • very many different numbers.

  • So, we refer to this problem as the semantic gap.

  • This idea of a cat, or this label of a cat,

  • is a semantic label that we're assigning to this image,

  • and there's this huge gap between the semantic idea of a cat

  • and these pixel values that the computer is actually seeing.

  • And this is a really hard problem because

  • you can change the picture in very small, subtle ways

  • that will cause this pixel grid to change entirely.

  • So, for example, if we took this same cat,

  • and if the cat happened to sit still

  • and not even twitch, not move a muscle,

  • which is never going to happen,

  • but we moved the camera to the other side,

  • then every single grid, every single pixel,

  • in this giant grid of numbers

  • would be completely different.

  • But, somehow, it's still representing the same cat.

  • And our algorithms need to be robust to this.

  • But viewpoint is only one problem;

  • another is illumination.

  • There can be different lighting conditions going on

  • in the scene.

  • Whether the cat is appearing in this very dark, moody scene,

  • or in this very bright, sunlit scene, it's still a cat,

  • and our algorithms need to be robust to that.

  • Objects can also deform.

  • I think cats are, maybe, among the more deformable

  • of animals that you might see out there.

  • And cats can really assume a lot of different, varied poses

  • and positions.

  • And our algorithms should be robust to these different

  • kinds of transforms.

  • There can also be problems of occlusion,

  • where you might only see part of a cat, like, just the face,

  • or in this extreme example, just a tail peeking out

  • from under the couch cushion.

  • But, in these cases, it's pretty easy for you, as a person,

  • to realize that this is probably a cat,

  • and you still recognize these images as cats.

  • And this is something that our algorithms

  • also must be robust to,

  • which is quite difficult, I think.

  • There can also be problems of background clutter,

  • where maybe the foreground object of the cat,

  • could actually look quite similar in appearance

  • to the background.

  • And this is another thing that we need to handle.

  • There's also this problem of intraclass variation,

  • that this one notion of cat-ness, actually spans a lot of

  • different visual appearances.

  • And cats can come in different shapes and sizes

  • and colors and ages.

  • And our algorithm, again, needs to work

  • and handle all these different variations.

  • So, this is actually a really, really challenging problem.

  • And it's sort of easy to forget how hard this is

  • because so much of your brain is specifically tuned

  • for dealing with these things.

  • But now if we want our computer programs

  • to deal with all of these problems, all simultaneously,

  • and not just for cats, by the way,

  • but for just about any object category you can imagine,

  • this is a fantastically challenging problem.

  • And it's, actually, somewhat miraculous

  • that this works at all, in my opinion.

  • But, actually, not only does it work,

  • but these things work very close to human accuracy

  • in some limited situations.

  • And take only hundreds of milliseconds to do so.

  • So, this is some pretty amazing, incredible technology,

  • in my opinion, and over the course of the rest of the class

  • we will really see what kinds of advancements

  • have made this possible.

  • So now, if you, kind of, think about

  • what is the API for writing an image classifier,

  • you might sit down and try to write a method in Python

  • like this.

  • Where you want to take in an image

  • and then do some crazy magic

  • and then, eventually, spit out this class label

  • to say cat or dog or whatnot.

  • And there's really no obvious way to do this, right?

  • If you're taking an algorithms class

  • and your task is to sort numbers

  • or compute a convex hull

  • or, even, do something like RSA encryption,

  • you, sort of, can write down an algorithm

  • and enumerate all the steps that need to happen

  • in order for these things to work.

  • But, when we're trying to recognize objects,

  • or recognize cats or images,

  • there's no really clear, explicit algorithm

  • that makes intuitive sense,

  • for how you might go about recognizing these objects.

  • So, this is, again, quite challenging,

  • if you think about,

  • if it was your first day programming

  • and you had to sit down and write this function,

  • I think most people would be in trouble.

  • That being said,

  • people have definitely made explicit attempts

  • to try to write, sort of, hand-coded rules

  • for recognizing different animals.

  • So, we touched on this a little bit in the last lecture,

  • but maybe one idea for cats is that,

  • we know that cats have ears and eyes and mouths and noses.

  • And we know that edges, from Hubel and Wiesel,

  • we know that edges are pretty important

  • when it comes to visual recognition.

  • So one thing we might try to do is

  • compute the edges of this image

  • and then go in and try to categorize all the different

  • corners and boundaries, and say that, if we have maybe

  • three lines meeting this way, then it might be a corner,

  • and an ear has one corner here and one corner there

  • and one corner there,

  • and then, kind of, write down this explicit set of rules

  • for recognizing cats.

  • But this turns out not to work very well.

  • One, it's super brittle.

  • And, two, say, if you want to start over for another

  • object category, and maybe not worry about cats,

  • but talk about trucks or dogs or fishes or something else,

  • then you need to start all over again.

  • So, this is really not a very scalable approach.

  • We want to come up with some algorithm, or some method,

  • for these recognition tasks

  • which scales much more naturally to all the variety

  • of objects in the world.

  • So, the insight that, sort of, makes this all work

  • is this idea of the data-driven approach.

  • Rather than sitting down and writing these hand-specified

  • rules to try to craft exactly what is a cat or a fish

  • or what have you,

  • instead, we'll go out onto the internet

  • and collect a large dataset of many, many cats

  • and many, many airplanes and many, many deer

  • and different things like this.

  • And we can actually use tools like Google Image Search,

  • or something like that,

  • to go out and collect a very large number of examples

  • of these different categories.

  • By the way, this actually takes quite a lot of effort

  • to go out and actually collect these datasets

  • but, luckily, there's a lot of really good, high quality

  • datasets out there already for you to use.

  • Then once we get this dataset,

  • we train this machine learning classifier

  • that is going to ingest all of the data,

  • summarize it in some way,

  • and then spit out a model

  • that summarizes the knowledge of how to recognize

  • these different object categories.

  • Then finally, we'll use this trained model

  • and apply it on new images

  • that will then be able to recognize

  • cats and dogs and whatnot.

  • So here our API has changed a little bit.

  • Rather than a single function

  • that just inputs an image and recognizes a cat,

  • we have these two functions.

  • One, called, train, that's going to input images and labels

  • and then output a model,

  • and then, separately, another function called, predict,

  • which will input the model and then make predictions

  • for images.

  • And this is, kind of, the key insight

  • that allowed all these things to start working really well

  • over the last 10, 20 years or so.

  • So, this class is primarily about neural networks

  • and convolutional neural networks

  • and deep learning and all that,

  • but this idea of a data-driven approach is much more general

  • than just deep learning.

  • And I think it's useful to, sort of,

  • step through this process

  • for a very simple classifier first,

  • before we get to these big, complex ones.

  • So, probably, the simplest classifier you can imagine

  • is something we call nearest neighbor.

  • The algorithm is pretty dumb, honestly.

  • So, during the training step we won't do anything,

  • we'll just memorize all of the training data.

  • So this is very simple.

  • And now, during the prediction step,

  • we're going to take some new image

  • and go and try to find the most similar image

  • in the training data to that new image,

  • and now predict the label of that most similar image.

  • A very simple algorithm.

  • But it, sort of, has a lot of these nice properties

  • with respect to data-drivenness and whatnot.

  • So, to be a little bit more concrete,

  • you might imagine working on this dataset called CIFAR-10,

  • which is very commonly used in machine learning,

  • as kind of a small test case.

  • And you'll be working with this dataset on your homework.

  • So, the CIFAR-10 dataset gives you 10 different classes,

  • airplanes and automobiles and birds and cats and different

  • things like that.

  • And across those 10 categories

  • it provides 50,000 training images,

  • roughly evenly distributed across the categories.

  • And then 10,000 additional testing images

  • that you're supposed to test your algorithm on.

  • So here's an example of applying this simple

  • nearest neighbor classifier to some of these test images

  • on CIFAR-10.

  • So, in this grid on the right,

  • the leftmost column

  • gives a test image from the CIFAR-10 dataset.

  • And now on the right, we've sorted the training images

  • and shown the most similar training images

  • to each of these test examples.

  • And you can see that they look kind of visually similar

  • to the training images,

  • although they are not always correct, right?

  • So, maybe on the second row, we see that the testing,

  • this is kind of hard to see,

  • because these images are 32 by 32 pixels,

  • you need to really dive in there

  • and try to make your best guess.

  • But, this image is a dog and its nearest neighbor is also

  • a dog, but this next one, I think is actually a deer

  • or a horse or something else.

  • But, you can see that it looks quite visually similar,

  • because there's kind of a white blob in the middle

  • and whatnot.

  • So, if we're applying the nearest neighbor algorithm

  • to this image,

  • we'll find the closest example in the training set.

  • And now, the closest example, we know its label,

  • because it comes from the training set.

  • And now, we'll simply say that this testing image is also

  • a dog.

  • You can see from these examples that this is probably not

  • going to work very well,

  • but it's still kind of a nice example to work through.

  • But then, one detail that we need to know is,

  • given a pair of images,

  • how can we actually compare them?

  • Because, if we're going to take our test image and compare it

  • to all the training images,

  • we actually have many different choices

  • for exactly what that comparison function should look like.

  • So, in the example in the previous slide,

  • we've used what's called the L1 distance,

  • also sometimes called the Manhattan distance.

  • So, this is a really sort of simple, easy idea

  • for comparing images.

  • And that's that we're going to just compare individual pixels

  • in these images.

  • So, supposing that our test image is maybe just a tiny

  • four by four image of pixel values,

  • then we'll take this upper-left hand pixel

  • of the test image,

  • subtract off the value in the training image,

  • take the absolute value,

  • and get the difference in that pixel between the two images.

  • And then, sum all these up across all the pixels

  • in the image.

  • So, this is kind of a stupid way to compare images,

  • but it does some reasonable things sometimes.

  • But, this gives us a very concrete way

  • to measure the difference between two images.

  • And in this case, we have this difference of 456

  • between these two images.
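
As a sketch of that computation in NumPy (the 4x4 pixel values below are illustrative, chosen so the distance comes out to the 456 mentioned above):

```python
import numpy as np

test_image  = np.array([[56, 32,  10,  18],
                        [90, 23, 128, 133],
                        [24, 26, 178, 200],
                        [ 2,  0, 255, 220]])
train_image = np.array([[10, 20,  24,  17],
                        [ 8, 10,  89, 100],
                        [12, 16, 178, 170],
                        [ 4, 32, 233, 112]])

# Per-pixel absolute differences, summed over the whole image.
l1 = np.sum(np.abs(test_image - train_image))
print(l1)  # 456
```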

  • So, here's some full Python code

  • for implementing this nearest neighbor classifier

  • and you can see it's pretty short and pretty concise

  • because we've made use of many of these vectorized

  • operations offered by NumPy.

  • So, here we can see that this training function,

  • that we talked about earlier,

  • is, again, very simple, in the case of nearest neighbor,

  • you just memorize the training data,

  • there's not really much to do here.

  • And now, at test time, we're going to take in our image

  • and then go in and compare using this L1 distance function,

  • our test image to each of these training examples

  • and find the most similar example in the training set.

  • And you can see that, we're actually able to do this

  • in just one or two lines of Python code

  • by utilizing these vectorized operations in NumPy.

  • So, this is something that you'll get practice with

  • on the first assignment.
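
The code on the slide isn't reproduced in this transcript, but a sketch along those lines might look like this (the array shapes are assumptions: one flattened image per row of X):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # Training is pure memorization: X is N x D (one flattened image
        # per row), y is a length-N array of labels.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # Vectorized L1 distances from test image i to every training image.
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            Ypred[i] = self.ytr[np.argmin(distances)]  # label of the closest
        return Ypred
```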

  • So now, a couple questions about this simple classifier.

  • First, if we have N examples in our training set,

  • then how fast can we expect training and testing to be?

  • Well, training is probably constant

  • because we don't really need to do anything,

  • we just need to memorize the data.

  • And if you're just copying a pointer,

  • that's going to be constant time

  • no matter how big your dataset is.

  • But now, at test time we need to do this comparison step

  • and compare our test image

  • to each of the N training examples in the dataset.

  • And this is actually quite slow.

  • So, this is actually somewhat backwards,

  • if you think about it.

  • Because, in practice,

  • we want our classifiers to be slow at training time

  • and then fast at testing time.

  • Because, you might imagine, that a classifier might go

  • and be trained in a data center somewhere

  • and you can afford to spend a lot of computation

  • at training time to make the classifier really good.

  • But then,

  • when you go and deploy the classifier at test time,

  • you want it to run on your mobile phone

  • or in a browser or some other low power device,

  • and you really want the testing time performance

  • of your classifier to be quite fast.

  • So, from this perspective, this nearest neighbor algorithm,

  • is, actually, a little bit backwards.

  • And we'll see that once we move to

  • convolutional neural networks,

  • and other types of parametric models,

  • they'll be the reverse of this.

  • Where you'll spend a lot of compute at training time,

  • but then they'll be quite fast at testing time.

  • So then, the question is,

  • what exactly does this nearest neighbor algorithm

  • look like when you apply it in practice?

  • So, here we've drawn, what we call the decision regions

  • of a nearest neighbor classifier.

  • So, here our training set consists of these points

  • in the two dimensional plane,

  • where the color of the point represents the category,

  • or the class label, of that point.

  • So, here we see we have five classes

  • and some blue ones up in the corner here,

  • some purple ones in the upper-right hand corner.

  • And now for each pixel in this entire plane,

  • we've gone and computed what is the nearest example

  • in the training data,

  • and then colored that point of the background

  • according to that class label.

  • So, you can see that this nearest neighbor classifier

  • is just sort of carving up the space

  • and coloring the space according to the nearby points.

  • But this classifier is maybe not so great.

  • And by looking at this picture

  • we can start to see some of the problems that might come out

  • with a nearest neighbor classifier.

  • For one, this central region actually contains

  • mostly green points,

  • but one little yellow point in the middle.

  • But because we're just looking at the nearest neighbor,

  • this causes a little yellow island to appear

  • in the middle of this green cluster.

  • And that's, maybe, not so great.

  • Maybe those points actually should have been green.

  • And then, similarly we also see these, sort of, fingers,

  • like the green region pushing into the blue region,

  • again, due to the presence of one point,

  • which may have been noisy or spurious.

  • So, this kind of motivates a slight generalization

  • of this algorithm called k-nearest neighbors.

  • So rather than just looking for the single nearest neighbor,

  • instead we'll do something a little bit fancier

  • and find K of our nearest neighbors,

  • according to our distance metric,

  • and then take a vote among those neighbors.

  • And then predict the majority vote

  • among our neighbors.

  • You can imagine slightly more complex ways of doing this.

  • Maybe you'd vote weighted on the distance,

  • or something like that,

  • but the simplest thing that tends to work pretty well

  • is just taking a majority vote.
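
A minimal sketch of that vote, assuming distances holds the distance from one test point to every training example, and train_labels is an integer NumPy array of the same length:

```python
import numpy as np

def knn_vote(distances, train_labels, k=5):
    nearest = np.argsort(distances)[:k]         # indices of the k closest points
    votes = np.bincount(train_labels[nearest])  # count each label among them
    return np.argmax(votes)                     # majority label (ties: smallest)
```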

  • So here we've shown the exact same set of points

  • using this K=1 nearest neighbor classifier,

  • as well as K=3 and K=5 in the middle and on the right.

  • And once we move to K=3, you can see that that spurious

  • yellow point in the middle of the green cluster

  • is no longer causing the points near that region

  • to be classified as yellow.

  • Now this entire green portion in the middle

  • is all being classified as green.

  • You can also see that these fingers

  • of the red and blue regions

  • are starting to get smoothed out

  • due to this majority voting.

  • And then, once we move to the K=5 case,

  • then these decision boundaries

  • between the blue and red regions

  • have become quite smooth and quite nice.

  • So, generally when you're using nearest neighbors

  • classifiers,

  • you almost always want to use some value of K,

  • which is larger than one

  • because this tends to smooth out your decision

  • boundaries and lead to better results.

  • Question?

  • [student asking a question]

  • Yes, so the question is,

  • what is the deal with these white regions?

  • The white regions are where there was no majority

  • among the k-nearest neighbors.

  • You could imagine maybe doing something slightly fancier

  • and maybe taking a guess or randomly selecting among

  • the majority winners,

  • but for this simple example we're just coloring it white

  • to indicate there was no majority

  • at those points.

  • Whenever we're thinking about computer vision

  • I think it's really useful to kind of flip

  • back and forth between several different viewpoints.

  • One is this idea of images as points in a high-dimensional space,

  • and then the other is actually looking at concrete images.

  • Because the pixels of the image actually

  • allow us to think of these images as high dimensional

  • vectors.

  • And it's sort of useful to ping pong back and forth

  • between these two different viewpoints.

  • So then, sort of taking this k-nearest neighbor

  • and going back to the images

  • you can see that it's actually not very good.

  • Here I've colored in red and green

  • which images would actually be classified correctly

  • or incorrectly according to their nearest neighbor.

  • And you can see that it's really not very good.

  • But maybe if we used a larger value of K

  • then this would involve actually voting among

  • maybe the top three or the top five

  • or maybe even the whole row.

  • And you could imagine that that would end up being

  • a lot more robust to some of this noise that we see

  • when retrieving neighbors in this way.

  • So another choice we have when we're working

  • with the k-nearest neighbor algorithm

  • is determining exactly how we should be comparing

  • our different points.

  • For the examples we've shown so far,

  • we've talked about this L1 distance

  • which takes the sum of the absolute values

  • between the pixels.

  • But another common choice is the L2 or Euclidean distance

  • where you take the square root of the sum of the squares

  • and take this as your distance.
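
Written out, with p indexing the pixels of images I_1 and I_2, the two metrics are:

```latex
d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|
\qquad
d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}
```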

  • Choosing different distance metrics actually

  • is a pretty interesting topic

  • because different distance metrics

  • make different assumptions about the underlying

  • geometry or topology that you'd expect in the space.

  • So, for this L1 distance, what's drawn underneath is actually

  • a circle according to the L1 distance,

  • and it forms this square-shaped thing

  • around the origin.

  • Where each of the points on this, on the square,

  • is equidistant from the origin according to L1,

  • whereas with the L2 or Euclidean distance

  • then this circle is a familiar circle,

  • it looks like what you'd expect.

  • So one interesting thing to point out between these two

  • metrics in particular,

  • is that the L1 distance depends on your choice

  • of coordinate system.

  • So if you were to rotate the coordinate frame

  • that would actually change the L1 distance

  • between the points.

  • Whereas changing the coordinate frame in the L2 distance

  • doesn't matter, it's the same thing no matter what

  • your coordinate frame is.

  • Maybe if your input features, if the individual entries

  • in your vector have some important meaning

  • for your task,

  • then maybe somehow L1 might be a more natural fit.

  • But if it's just a generic vector in some space

  • and you don't know which of the different elements,

  • you don't know what they actually mean,

  • then maybe L2 is slightly more natural.

  • And another point here is that

  • by using different distance metrics

  • we can actually generalize the k-nearest neighbor

  • classifier to many, many different types of data,

  • not just vectors, not just images.

  • So, for example, imagine you wanted to classify pieces

  • of text, then the only thing you need to do

  • to use k-nearest neighbors

  • is to specify some distance function

  • that can measure distances between maybe two paragraphs

  • or two sentences or something like that.

  • So, simply by specifying different distance metrics

  • we can actually apply this algorithm very generally

  • to basically any type of data.

  • Even though it's a kind of simple algorithm,

  • in general, it's a very good thing to try first

  • when you're looking at a new problem.

  • So then, it's also kind of interesting to think about

  • what is actually happening geometrically

  • if we choose different distance metrics.

  • So here we see the same set of points on the left

  • using the L1, or Manhattan distance,

  • and then, on the right, using the familiar L2,

  • or Euclidean distance.

  • And you can see that the shapes of these decision

  • boundaries actually change quite a bit

  • between the two metrics.

  • So when you're looking at L1 these decision boundaries

  • tend to follow the coordinate axes.

  • And this is again because the L1 depends on our choice

  • of coordinate system.

  • Where the L2 sort of doesn't really care about the

  • coordinate axis, it just puts the boundaries

  • where they should fall naturally.

  • My confession is that each of these examples

  • that I've shown you is actually from this interactive

  • web demo that I built,

  • where you can go and play with this k-nearest neighbor

  • classifier on your own.

  • And this is really hard to get working on a projector screen.

  • So maybe we'll do that on your own time.

  • So, let's just go back to here.

  • Man, this is kind of embarrassing.

  • Okay, that was way more trouble than it was worth.

  • So, let's skip this, but I encourage you

  • to go play with this in your browser.

  • It's actually pretty fun

  • and kind of nice to build intuition about

  • how the decision boundary changes

  • as you change the K

  • and change your distance metric

  • and all those sorts of things.

  • Okay, so then the question is

  • once you're actually trying to use this algorithm

  • in practice, there's several choices

  • you need to make.

  • We talked about choosing different values of K.

  • We talked about choosing different distance metrics.

  • And the question becomes

  • how do you actually make these choices for your problem

  • and for your data?

  • So, these choices, of things like K and the distance metric,

  • we call hyperparameters,

  • because they are not necessarily learned from the training

  • data,

  • instead these are choices about your algorithm that you make

  • ahead of time

  • and there's no way to learn them directly from the data.

  • So, the question is how do you set these things

  • in practice?

  • And they turn out to be very problem-dependent.

  • And the simple thing that most people do is simply

  • try different values of hyperparameters for your data

  • and for your problem, and figure out which one works best.

  • There's a question?

  • [student asking a question]

  • So, the question is, when might L1 distance be preferable

  • to using L2 distance?

  • I think it's mainly problem-dependent,

  • it's sort of difficult to say

  • in which cases you think one might be better

  • than the other.

  • but I think that because L1 has this sort of coordinate

  • dependency, it actually depends on the coordinate system

  • of your data,

  • if you know that you have a vector,

  • and maybe the individual elements of the vector

  • have meaning.

  • Like maybe you're classifying employees for some reason

  • and then the different elements of that vector correspond

  • to different features or aspects of an employee.

  • Like their salary or the number of years they've been

  • working at the company or something like that.

  • So I think when your individual elements actually

  • have some meaning,

  • is where I think maybe using L1 might make a little bit

  • more sense.

  • But in general, again, this is a hyperparameter

  • and it really depends on your problem and your data

  • so the best answer is just to try them both

  • and see what works better.

  • Even this idea of trying out different values

  • of hyperparameters and seeing what works best,

  • there are many different choices here.

  • What exactly does it mean to try hyperparameters

  • and see what works best?

  • Well, the first idea you might think of

  • is simply choosing the hyperparameters that give you

  • the best accuracy or best performance

  • on your training data.

  • This is actually a really terrible idea.

  • You should never do this.

  • In the concrete case of the nearest neighbor

  • classifier, for example,

  • if we set K=1, we will always classify the training data

  • perfectly.

  • So if we use this strategy we'll always pick K=1,

  • but, as we saw from the examples earlier,

  • in practice it seems that setting K equal to larger values

  • might cause us to misclassify some of the training data,

  • but, in fact, lead to better performance

  • on points that were not in the training data.

  • And ultimately in machine learning

  • we don't care about fitting the training data,

  • we really care about how our classifier,

  • or how our method,

  • will perform on unseen data after training.

  • So, this is a terrible idea, don't do this.

  • So, another idea that you might think of,

  • is maybe we'll take our full dataset

  • and we'll split it into some training data

  • and some test data.

  • And now I'll try training my algorithm with different

  • choices of hyperparameters on the training data

  • and then I'll go and apply that trained classifier

  • on the test data and now I will pick

  • the set of hyperparameters that cause me to perform best

  • on the test data.

  • This seems like maybe a more reasonable strategy,

  • but, in fact, this is also a terrible idea

  • and you should never do this.

  • Because, again, the point of machine learning systems

  • is that we want to know how our algorithm will perform.

  • So, the point of the test set

  • is to give us some estimate of how our method will do

  • on unseen data that's coming in from the wild.

  • And if we use this strategy of training many different

  • algorithms with different hyperparameters,

  • and then, selecting the one which does the best

  • on the test data,

  • then, it's possible, that we may have just picked

  • the right set of hyperparameters

  • that caused our algorithm to work quite well

  • on this testing set,

  • but now our performance on this test set

  • will no longer be representative

  • of our performance of new, unseen data.

  • So, again, you should not do this, this is a bad idea,

  • you'll get in trouble if you do this.

  • What is much more common, is to actually split your data

  • into three different sets.

  • You'll partition most of your data into a training set

  • and then you'll create a validation set

  • and a test set.

  • And now what we typically do is go and train our algorithm

  • with many different choices of hyperparameters

  • on the training set,

  • evaluate on the validation set,

  • and now pick the set of hyperparameters

  • which performs best on the validation set.

  • And now, after you've done all your development,

  • you've done all your debugging,

  • after you've done everything,

  • then you'd take that best performing classifier

  • on the validation set

  • and run it once on the test set.

  • And now that's the number that goes into your paper,

  • that's the number that goes into your report,

  • that's the number that actually is telling you how

  • your algorithm is doing on unseen data.
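
As a concrete sketch of this protocol for k-nearest neighbors (the helper and the array names here are hypothetical, and the splits are assumed to already exist):

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_eval, y_eval, k):
    """Fraction of X_eval classified correctly by k-NN with L1 distance."""
    preds = []
    for x in X_eval:
        dists = np.sum(np.abs(X_train - x), axis=1)  # distance to every train point
        nearest = y_train[np.argsort(dists)[:k]]     # labels of the k closest
        preds.append(np.bincount(nearest).argmax())  # majority vote
    return np.mean(np.array(preds) == y_eval)

def choose_and_evaluate(X_train, y_train, X_val, y_val, X_test, y_test):
    # Pick K on the validation set...
    best_k = max([1, 3, 5, 7, 9],
                 key=lambda k: knn_accuracy(X_train, y_train, X_val, y_val, k))
    # ...then touch the test set exactly once, at the very end.
    return best_k, knn_accuracy(X_train, y_train, X_test, y_test, best_k)
```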

  • And this is actually really, really important

  • that you keep a very strict separation between

  • the validation data and the test data.

  • So, for example, when we're working on research papers,

  • we typically only touch the test set

  • at the very last minute.

  • So, when I'm writing papers,

  • I tend to only touch the test set for my problem

  • in maybe the week before the deadline or so

  • to really ensure that we're not

  • being dishonest here and we're not reporting a number

  • which is unfair.

  • So, this is actually super important

  • and you want to make sure to keep your test data

  • quite under control.

  • So another strategy for setting hyperparameters

  • is called cross validation.

  • And this is used a little bit more commonly

  • for small data sets, not used so much in deep learning.

  • So here the idea is we're going to take our test data,

  • or we're going to take our dataset,

  • as usual, hold out some test set to use at the very end,

  • and now, for the rest of the data,

  • rather than splitting it into a single training

  • and validation partition,

  • instead, we can split our training data

  • into many different folds.

  • And now, in this way, we cycle through choosing which

  • fold is going to be the validation set.

  • So now, in this example,

  • we're using five fold cross validation,

  • so you would train your algorithm with one set of

  • hyperparameters on the first four folds,

  • evaluate the performance on fold five,

  • and then go and retrain your algorithm on folds

  • one, two, three, and five,

  • evaluate on fold four,

  • and cycle through all the different folds.

  • And, when you do it this way,

  • you get much higher confidence about

  • which hyperparameters are going to perform

  • more robustly.

  • So this is kind of the gold standard to use,

  • but, in practice in deep learning

  • when we're training large models

  • and training is very computationally expensive,

  • this doesn't get used too much in practice.
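
A sketch of the five-fold procedure, reusing the hypothetical knn_accuracy helper from the earlier sketch:

```python
import numpy as np

def cross_validate(X, y, k, num_folds=5):
    """Mean and spread of k-NN validation accuracy across the folds."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accs = []
    for i in range(num_folds):
        # Fold i is held out for validation; the rest are used for training.
        X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        accs.append(knn_accuracy(X_tr, y_tr, X_folds[i], y_folds[i], k))
    return np.mean(accs), np.std(accs)
```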

  • Question?

  • [student asking a question]

  • Yeah, so the question is,

  • a little bit more concretely,

  • what's the difference between the training and the

  • validation set?

  • So, if you think about the k-nearest neighbor classifier

  • then the training set is this set of images with labels

  • where we memorize the labels.

  • And now, to classify an image,

  • we're going to take the image and compare it to each element

  • in the training data,

  • and then transfer the label from the nearest training point.

  • So now our algorithm will memorize everything

  • in the training set,

  • and now we'll take each element of the validation set

  • and compare it to each element in the training data

  • and then use this to determine what is the accuracy

  • of our classifier when it's applied on the validation set.

  • So this is the distinction between training

  • and validation.

  • Where your algorithm is able to see the labels

  • of the training set,

  • but for the validation set,

  • your algorithm doesn't have direct access to the labels.

  • We only use the labels of the validation set

  • to check how well our algorithm is doing.

  • A question?

  • [student asking a question]

  • The question is, whether the test set,

  • is it possible that the test set might not be

  • representative of data out there in the wild?

  • This definitely can be a problem in practice,

  • the underlying statistical assumption here is that

  • your data are all independently and identically distributed,

  • so that all of your data points should be

  • drawn from the same underlying probability distribution.

  • Of course, in practice, this might not always be the case,

  • and you definitely can run into cases

  • where the test set might not be super representative

  • of what you see in the wild.

  • So this is kind of a problem that dataset creators and

  • dataset curators need to think about.

  • But when I'm creating datasets, for example,

  • one thing I do,

  • is I'll go and collect a whole bunch of data all at once,

  • using the exact same methodology for collecting the data,

  • and then afterwards you go and partition it randomly

  • between train and test.

  • One thing that can screw you up here is

  • maybe if you're collecting data over time

  • and you make the earlier data, that you collect first,

  • be the training data,

  • and the later data that you collect be the test data,

  • then you actually might run into this shift

  • that could cause problems.

  • But as long as this partition is random

  • among your entire set of data points,

  • then that's how we try to alleviate this problem

  • in practice.

  • So then, once you've gone through this

  • cross validation procedure,

  • then you end up with graphs that look something like this.

  • So here, on the X axis, we are showing the value of K

  • for a k-nearest neighbor classifier on some problem,

  • and now on the Y axis, we are showing what is the accuracy

  • of our classifier on some dataset

  • for different values of K.

  • And you can see that, in this case,

  • we've done five fold cross validation over the data,

  • so, for each value of K we have five different examples

  • of how well this algorithm is doing.

  • And, actually, going back to the question about

  • having some test sets that are better or worse

  • for your algorithm,

  • using K fold cross validation

  • is maybe one way to help quantify that a little bit.

  • And, in that, we can see the variance of how this algorithm

  • performs across the different validation folds.

  • And that gives you some sense of,

  • not just what is the best,

  • but, also, what is the distribution of that performance.

  • So, whenever you're training machine learning models

  • you end up making plots like this,

  • where they show you what is your accuracy,

  • or your performance as a function of your hyperparameters,

  • and then you want to go and pick the model,

  • or the set of hyperparameters,

  • at the end of the day,

  • that performs the best on the validation set.

  • So, here we see that maybe about K=7 probably works

  • about best for this problem.

  • So, k-nearest neighbor classifiers on images

  • are actually almost never used in practice.

  • That's because of all of these problems that we've talked about.

  • So, one problem is that it's very slow at test time,

  • which is the reverse of what we want,

  • which we talked about earlier.

  • Another problem is that

  • these things like Euclidean distance, or L1 distance,

  • are really not a very good way

  • to measure distances between images.

  • These, sort of, vectorial distance functions

  • do not correspond very well to perceptual similarity

  • between images.

  • How you perceive differences between images.

  • So, in this example, we've constructed,

  • there's this image on the left of a girl,

  • and then three different distorted images on the right

  • where we've blocked out her mouth,

  • we've actually shifted it down by a couple pixels,

  • or tinted the entire image blue.

  • And, actually, if you compute the Euclidean distance

  • between the original and the boxed,

  • the original and the shifted,

  • and the original and the tinted,

  • they all have the same L2 distance.

  • Which is, maybe, not so good

  • because it sort of gives you the sense that

  • the L2 distance is really not doing a very good job

  • at capturing these perceptual distances between images.

  • Another, sort of, problem with the k-nearest neighbor

  • classifier has to do with something we call the curse

  • of dimensionality.

  • So, if you recall back this viewpoint we had of the

  • k-nearest neighbor classifier,

  • it's sort of dropping paint around each of the training

  • data points and using that to sort of partition the space.

  • So that means that if we expect the k-nearest neighbor

  • classifier to work well,

  • we kind of need our training examples to cover the space

  • quite densely.

  • Otherwise our nearest neighbors could actually be quite far

  • away and might not actually be very similar to our testing

  • points.

  • And the problem is,

  • that actually densely covering the space,

  • means that we need a number of training examples,

  • which is exponential in the dimension of the problem.

  • So this is very bad, exponential growth is always bad,

  • basically, you're never going to get enough images

  • to densely cover this space of pixels

  • in this high dimensional space.

  • So that's maybe another thing to keep in mind

  • when you're thinking about using k-nearest neighbor.

  • So, kind of the summary is that we're using

  • k-nearest neighbor to introduce this idea

  • of image classification.

  • We have a training set of images and labels

  • and then we use that

  • to predict these labels on the test set.

  • Question?

  • [student asking a question]

  • Oh, sorry, the question is,

  • what was going on with this picture?

  • What are the green and the blue dots?

  • So here, we have some training samples

  • which are represented by points,

  • and the color of the dot maybe represents the category

  • of the point, of this training sample.

  • So, if we're in one dimension,

  • then you maybe only need four training samples

  • to densely cover the space,

  • but if we move to two dimensions,

  • then we now need four times four, or 16, training examples

  • to densely cover this space.

  • And if we move to three, four, five, many more dimensions,

  • the number of training examples that we need

  • to densely cover the space,

  • grows exponentially with the dimension.
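
The counting argument in one line (four samples per axis is an arbitrary illustrative density):

```python
samples_per_axis = 4
for dim in [1, 2, 3, 8]:
    # Examples needed for a dense grid grow exponentially with dimension.
    print(dim, samples_per_axis ** dim)  # 4, 16, 64, 65536
```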

  • So, this is kind of giving you the sense,

  • that maybe in two dimensions

  • we might have this kind of funny curved shape,

  • or you might have sort of arbitrary manifolds of labels

  • in different dimensional spaces.

  • Because the k-nearest neighbor algorithm

  • doesn't really make any assumptions about these

  • underlying manifolds,

  • the only way it can perform properly

  • is if it has quite a dense sample of training points

  • to work with.

  • So, this is kind of the overview of k-nearest neighbors

  • and you'll get a chance to actually implement this

  • and try it out on images in the first assignment.

  • So, unless there are any last-minute questions about k-NN,

  • I'm going to move on to the next topic.

  • Question?

  • [student is asking a question]

  • Sorry, say that again.

  • [student is asking a question]

  • Yeah, so the question is,

  • why do these images have the same L2 distance?

  • And the answer is that, I carefully constructed them

  • to have the same L2 distance.

  • [laughing]

  • But it's just giving you the sense that the L2 distance

  • is not a very good measure of similarity between images.

  • And these images are actually all different from

  • each other in quite disparate ways.

  • If you're using k-NN,

  • then the only thing you have to measure distance

  • between images,

  • is this single distance metric.

  • And this kind of gives you an example where

  • that distance metric is actually not capturing

  • the full description of distance or difference

  • between images.

  • So, in this case, I just sort of carefully constructed these

  • translations and these offsets to match exactly.

  • Question?

  • [student asking a question]

  • So, the question is,

  • maybe this is actually good,

  • because all of these things

  • are actually having the same distance to the image.

  • That's maybe true for this example,

  • but I think you could also construct examples where

  • maybe we have two original images

  • and then by putting the boxes in the right places

  • or tinting them,

  • we could cause it to be nearer to pretty much

  • anything that you want, right?

  • Because in this example, we can kind of like do arbitrary

  • shifting and tinting

  • to kind of change these distances nearly arbitrarily

  • without changing the perceptual nature of these images.

  • So, I think that this can actually screw you up

  • if you have many different original images.

  • Question?

  • [student is asking a question]

  • The question is,

  • whether or not it's common in real-world cases

  • to go back and retrain on the entire dataset

  • once you've found those best hyperparameters?

  • So, people do sometimes do this in practice,

  • but it's somewhat a matter of taste.

  • If you're really rushing for that deadline

  • and you've really got to get this model out the door

  • then, if it takes a long time to retrain the model

  • on the whole dataset,

  • then maybe you won't do it.

  • But if you have a little bit more time to spare

  • and a little bit more compute to spare,

  • and you want to squeeze out that maybe that extra 1%

  • of performance, then that is a trick you can use.

  • So we kind of saw that the k-nearest neighbor

  • has a lot of the nice properties

  • of machine learning algorithms,

  • but in practice it's not so great,

  • and really not used very much in images.

  • So the next thing I'd like to talk about is

  • linear classification.

  • And linear classification is, again, quite a simple learning

  • algorithm, but this will become super important

  • and help us build up to whole neural networks

  • and whole convolutional networks.

  • So, one analogy people often talk about

  • when working with neural networks

  • is we think of them as being kind of like Lego blocks.

  • That you can have different kinds of components

  • of neural networks and you can stick these components

  • together to build these large different towers of

  • convolutional networks.

  • One of the most basic building blocks that we'll see

  • in different types of deep learning applications

  • is this linear classifier.

  • So, I think it's actually really important to

  • have a good understanding of what's happening

  • with linear classification.

  • Because these will end up generalizing quite nicely

  • to whole neural networks.

  • So another example of kind of this modular nature

  • of neural networks

  • comes from some research in our own lab on image captioning,

  • just as a little bit of a preview.

  • So here the setup is that we want to input an image

  • and then output a descriptive sentence

  • describing the image.

  • And the way this kind of works is that

  • we have one convolutional neural network that's looking

  • at the image,

  • and a recurrent neural network that knows

  • about language.

  • And we can kind of just stick these two pieces together

  • like Lego blocks and train the whole thing together

  • and end up with a pretty cool system

  • that can do some non-trivial things.

  • And we'll work through the details of this model as we go

  • forward in the class,

  • but this just gives you the sense that,

  • these deep neural networks are kind of like Legos

  • and this linear classifier

  • is kind of like the most basic building blocks

  • of these giant networks.

  • But that's a little bit too exciting for lecture two,

  • so we have to go back to CIFAR-10 for the moment.

  • [laughing]

  • So, recall that CIFAR-10 has these 50,000 training examples,

  • each image is 32 by 32 pixels and three color channels.

  • In linear classification, we're going to take a bit

  • of a different approach from k-nearest neighbor.

  • So, the linear classifier is one of the simplest examples

  • of what we call a parametric model.

  • So now, our parametric model actually has two different

  • components.

  • It's going to take in this image, maybe of a cat, on the left,

  • which we usually write as X, for our input data,

  • and also a set of parameters, or weights,

  • which is usually called W, also sometimes theta,

  • depending on the literature.

  • And now we're going to write down some function

  • which takes in both the data, X, and the parameters, W,

  • and this'll spit out now 10 numbers describing

  • what are the scores corresponding to each of those 10

  • categories in CIFAR-10.

  • With the interpretation that a larger score for cat

  • indicates a larger probability of that input X being a cat.

  • And now, a question?

  • [student asking a question]

  • Sorry, can you repeat that?

  • [student asking a question]

  • Oh, so the question is what is the three?

  • The three, in this example, corresponds to the three color

  • channels, red, green, and blue.

  • Because we typically work on color images,

  • that's nice information that you don't want to throw away.

  • So, in the k-nearest neighbor setup

  • there was no parameters, instead,

  • we just kind of keep around the whole training data,

  • the whole training set,

  • and use that at test time.

  • But now, in a parametric approach,

  • we're going to summarize our knowledge of the training data

  • and stick all that knowledge into these parameters, W.

  • And now, at test time, we no longer need the actual

  • training data, we can throw it away.

  • We only need these parameters, W, at test time.

  • So this allows our models to now be more efficient

  • and actually run on maybe small devices like phones.

  • So, kind of, the whole story in deep learning

  • is coming up with the right structure for this

  • function, F.

  • You can imagine writing down different functional forms

  • for how to combine weights and data in different

  • complex ways, and these could correspond to different

  • network architectures.

  • But the simplest possible example

  • of combining these two things

  • is just, maybe, to multiply them.

  • And this is a linear classifier.

  • So here our F of X, W is just equal to W times X.

  • Probably the simplest equation you can imagine.

  • So here,

  • if you kind of unpack the dimensions of these things,

  • we recall that our image was maybe 32 by 32 by 3 values.

  • So then, we're going to take those values and then stretch

  • them out into a long column vector

  • that has 3,072 entries, so it's 3,072 by one.

  • And now we want to end up with 10 class scores.

  • We want to end up with 10 numbers for this image

  • giving us the scores for each of the 10 categories.

  • Which means that now our matrix, W,

  • needs to be 10 by 3,072.

  • So that once we multiply these two things out

  • then we'll end up with a single column vector

  • 10 by one, giving us our 10 class scores.

  • Also, you'll typically see this:

  • we'll often add a bias term

  • which will be a constant vector of 10 elements

  • that does not interact with the training data,

  • and instead just gives us some sort of data independent

  • preferences for some classes over others.

  • So you might imagine that if your dataset was

  • unbalanced and had many more cats than dogs,

  • for example, then the bias elements corresponding

  • to cat would be higher than the other ones.
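  • As a minimal NumPy sketch of the shapes involved, using the lecture's x, W, and bias names, with random stand-in values in place of a learned classifier:

```python
import numpy as np

# Stand-in values just to show the CIFAR-10 shapes; a real W and b
# would be learned from the training data.
x = np.random.randn(3072)       # one 32x32x3 image, stretched into a column
W = np.random.randn(10, 3072)   # one row (one template) per CIFAR-10 class
b = np.random.randn(10)         # data-independent preference per class

scores = W.dot(x) + b           # f(x, W) = Wx + b
print(scores.shape)             # (10,) -- one score per category
```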

  • So if you kind of think about pictorially

  • what this function is doing,

  • in this figure we have, on the left,

  • a simple two by two image,

  • so it has four pixels total.

  • So the way that the linear classifier works

  • is that we take this two by two image,

  • we stretch it out into a column vector

  • with four elements,

  • and now, in this example, we are just restricting to

  • three classes, cat, dog, and ship,

  • because you can't fit 10 on a slide,

  • and now our weight matrix is going to be three by four,

  • since we have three classes and four pixels.

  • And now, again, we have a three element bias vector

  • that gives us data independent bias terms

  • for each category.

  • Now we see that the cat score is going to be the inner

  • product between the pixels of our image

  • and this row in the weight matrix

  • added together with this bias term.
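  • Spelled out in NumPy, the toy example looks something like this; the pixel and weight values are illustrative stand-ins, not necessarily the slide's exact numbers:

```python
import numpy as np

x = np.array([56., 231., 24., 2.])         # the 2x2 image, stretched into four pixels
W = np.array([[ 0.2, -0.5,  0.1,  2.0 ],   # cat template (one row per class)
              [ 1.5,  1.3,  2.1,  0.0 ],   # dog template
              [ 0.0,  0.25, 0.2, -0.3 ]])  # ship template
b = np.array([1.1, 3.2, -1.2])             # per-class bias

scores = W @ x + b                         # all three class scores at once
cat_score = W[0] @ x + b[0]                # cat row dotted with the pixels, plus the cat bias
print(scores)                              # cat_score == scores[0]
```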

  • So, when you look at it this way

  • you can kind of understand linear classification

  • as almost a template matching approach.

  • Where each of the rows in this matrix

  • corresponds to some template for a class.

  • And now the inner product, or dot product,

  • between the row of the matrix and the column

  • giving the pixels of the image,

  • computing this dot product kind of gives us

  • a similarity between this template for the class

  • and the pixels of our image.

  • And then the bias, again, gives you this data

  • independent scaling offset for each of the classes.

  • If we think about linear classification

  • from this viewpoint of template matching

  • we can actually take the rows of that weight matrix

  • and unravel them back into images

  • and actually visualize those templates as images.
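  • A minimal sketch of that unraveling, assuming a learned W of shape 10 by 3,072 as before; the rescaling is only there so the weights display as a valid image:

```python
import numpy as np

def row_to_template(W, k):
    """Unravel row k of a (10, 3072) weight matrix back into a 32x32x3 image."""
    t = W[k].reshape(32, 32, 3)
    # Weights can be negative, so rescale to [0, 1] before displaying,
    # e.g. with matplotlib's plt.imshow(row_to_template(W, k)).
    return (t - t.min()) / (t.max() - t.min())
```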

  • And this gives us some sense of what a linear

  • classifier might actually be doing

  • to try to understand our data.

  • So, in this example, we've gone ahead and trained

  • a linear classifier on our images.

  • And now on the bottom we're visualizing

  • what are those rows in that learned weight matrix

  • corresponding to each of the 10 categories

  • in CIFAR-10.

  • And in this way we kind of get a sense for what's

  • going on in these images.

  • So, for example, in the left, on the bottom left,

  • we see the template for the plane class,

  • kind of consists of this like blue blob,

  • this kind of blobby thing in the middle

  • and maybe blue in the background,

  • which gives you the sense that this linear classifier

  • for plane is maybe looking for blue stuff

  • and blobby stuff, and those features are going to cause

  • the classifier to like planes more.

  • Or if we look at this car example,

  • we kind of see that there's a red blobby thing

  • through the middle and a blue blobby thing at the top

  • that maybe is kind of a blurry windshield.

  • But this is a little bit weird,

  • this doesn't really look like a car.

  • No individual car actually looks like this.

  • So the problem is that the linear classifier

  • is only learning one template for each class.

  • So if there's sort of variations in how that class

  • might appear,

  • it's trying to average out all those different variations,

  • all those different appearances,

  • and use just one single template

  • to recognize each of those categories.

  • We can also see this pretty explicitly in the horse

  • classifier.

  • So in the horse classifier we see green stuff on the bottom

  • because horses are usually on grass.

  • And then, if you look carefully, the horse actually

  • seems to have maybe two heads, one head on each side.

  • And I've never seen a horse with two heads.

  • But the linear classifier is just doing the best

  • that it can, because it's only allowed to learn

  • one template per category.

  • And as we move forward into neural networks

  • and more complex models,

  • we'll be able to achieve much better accuracy

  • because they no longer have this restriction

  • of just learning a single template per category.

  • Another viewpoint of the linear classifier

  • is to go back to this idea of images

  • as points in high dimensional space.

  • And you can imagine that each of our images

  • is something like a point in this high dimensional space.

  • And now the linear classifier is putting in these

  • linear decision boundaries to try to draw linear

  • separation between one category

  • and the rest of the categories.

  • So maybe up on the upper-left hand side

  • we see these training examples of airplanes

  • and throughout the process of training

  • the linear classifier will go and try to draw this

  • blue line to separate out with a single line

  • the airplane class from all the rest of the classes.

  • And it's actually kind of fun: if you watch during

  • the training process, these lines will start out randomly

  • and then go and snap into place to try to separate

  • the data properly.

  • But when you think about linear classification

  • in this way, from this high dimensional point of view,

  • you can start to see again what are some of the problems

  • that might come up with linear classification.

  • And it's not too hard to construct examples

  • of datasets where a linear classifier will totally fail.

  • So, one example, on the left here,

  • is that, suppose we have a dataset of two categories,

  • and these are all maybe somewhat artificial,

  • but maybe our dataset has two categories,

  • blue and red.

  • And the blue category is any image where the number of pixels

  • greater than zero is odd.

  • And anything where the number of pixels greater

  • than zero is even, we want to classify as the red category.

  • So if you actually go and draw what these different

  • decisions regions look like in the plane,

  • you can see that our blue class, with an odd number of positive pixels,

  • is going to be these two quadrants in the plane,

  • and even will be the opposite two quadrants.

  • So now, there's no way that we can draw a single line

  • to separate the blue from the red.

  • So this would be an example where a linear classifier

  • would really struggle.
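  • You can check this numerically with a tiny experiment; this sketch stands in logistic regression for a generic linear classifier, and its accuracy stays near chance because no single line separates the parity classes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-pixel "images": points in the plane. Label 1 (blue) when the
# number of positive pixels is odd, label 0 (red) when it's even.
X = rng.uniform(-1, 1, size=(1000, 2))
y = (X > 0).sum(axis=1) % 2

# Train a tiny logistic-regression classifier by gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(blue)
    grad = p - y                            # gradient of logistic loss w.r.t. scores
    w -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = (((X @ w + b) > 0) == y).mean()
print(f"training accuracy: {acc:.2f}")      # stays near 0.5: no separating line exists
```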

  • And this is maybe not such an artificial thing after all.

  • Instead of counting pixels,

  • maybe we're actually trying to count whether the number

  • of animals or people in an image is odd or even.

  • So this kind of parity problem

  • of separating odds from evens

  • is something that linear classification

  • really struggles with traditionally.

  • Other situations where a linear classifier really struggles

  • are multimodal situations.

  • So here on the right,

  • maybe our blue category has these three different islands

  • of where the blue category lives,

  • and then everything else is some other category.

  • So, for something like the horses

  • we saw in the previous example,

  • this actually might be happening

  • in practice.

  • Where there's maybe one island in the pixel space of

  • horses looking to the left,

  • and another island of horses looking to the right.

  • And now there's no good way to draw a single linear

  • boundary between these two isolated islands of data.

  • So anytime you have multimodal data,

  • like one class

  • that can appear in different regions of space,

  • that's another place where linear classifiers might struggle.

  • So there's kind of a lot of problems with

  • linear classifiers, but it is a super simple algorithm,

  • super nice and easy to interpret and easy to understand.

  • So you'll actually be implementing these things

  • on your first homework assignment.

  • At this point,

  • we kind of talked about

  • what is the functional form corresponding to a

  • linear classifier.

  • And we've seen that this functional form

  • of matrix vector multiply

  • corresponds to this idea of template matching

  • and learning a single template for each category

  • in your data.

  • And then once we have this trained matrix

  • you can use it to actually go and get your scores

  • for any new test example.
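  • At test time that scoring is just a matrix multiply; a minimal sketch, reusing the W and b shapes from before:

```python
import numpy as np

def predict(X_test, W, b):
    """X_test: (N, 3072) stretched test images -> (N,) predicted class labels."""
    scores = X_test @ W.T + b        # (N, 10) class scores, one row per image
    return scores.argmax(axis=1)     # the highest-scoring class wins
```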

  • But what we have not told you is

  • how do you actually go about choosing the right W

  • for your dataset.

  • We've just talked about what is the functional form

  • and what is going on with this thing.

  • So that's something we'll really focus on next time.

  • And next lecture we'll talk about

  • what are the strategies and algorithms

  • for choosing the right W.

  • And this will lead us to questions

  • of loss functions and optimization

  • and eventually ConvNets.

  • So, that's a bit of the preview for next week.

  • And that's all we have for today.
