♪ (music) ♪
Hi, and welcome to episode three of Zero to Hero with TensorFlow.
In the previous episode, you saw how to do basic computer vision
using a deep neural network
that matched the pixels of an image to a label.
So an image like this
was matched to a numeric label that represented it like this.
But there was a limitation to that.
The image you were looking at had to have the subject centered in it
and it had to be the only thing in the image.
So the code you wrote would work for that shoe,
but what about these?
It wouldn't be able to identify all of them
because it's not trained to do so.
For that we have to use something called a convolutional neural network,
which works a little differently than what you've just seen.
The idea behind a convolutional neural network
is that you filter the images before training the deep neural network.
After filtering the images,
features within them can come to the forefront,
and you can then use those features to identify something.
A filter is simply a set of multipliers.
So, for example, in this case, if you're looking at a particular pixel
that has the value 192,
and the filter is the values in the red box,
then you multiply 192 by 4.5,
and each of its neighbors by the respective filter value.
So its neighbor above and to the left is zero,
so you multiply that by -1.
Its upper neighbor is 64, so you multiply that by zero and so on.
Sum up the result, and you get the new value for the pixel.
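To make that arithmetic concrete, here's a minimal sketch in NumPy. Only the 192 center pixel with its 4.5 multiplier, the upper-left neighbor of 0 multiplied by -1, and the upper neighbor of 64 multiplied by 0 come from the example above; every other value is hypothetical.

```python
import numpy as np

# 3x3 neighborhood around the pixel being filtered.
# Center is the 192 from the example; the upper-left neighbor
# is 0 and the upper neighbor is 64. The rest are made up.
neighborhood = np.array([[  0,  64, 128],
                         [ 48, 192,  96],
                         [144, 226, 168]])

# The filter: one multiplier per neighbor. The center multiplier
# is 4.5, upper-left is -1, upper is 0; the rest are made up.
conv_filter = np.array([[-1.0, 0.0, -2.0],
                        [ 0.5, 4.5,  1.0],
                        [ 1.5, 2.0, -3.0]])

# New pixel value: multiply each neighbor by its filter value
# and sum the results.
new_value = (neighborhood * conv_filter).sum()
print(new_value)
```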
Now this might seem a little odd,
but check out the results for some filters, like this one,
which, when multiplied over the contents of the image,
removes almost everything except the vertical lines.
And this one, which removes almost everything except the horizontal lines.
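The video doesn't list the exact filter values it uses, but Sobel-style kernels are a standard choice with the same effect. Here's a small, self-contained sketch on a synthetic image:

```python
import numpy as np
from scipy.signal import convolve2d

# A tiny synthetic grayscale image: one vertical and one
# horizontal bright line on a dark background.
image = np.zeros((8, 8))
image[:, 3] = 255  # vertical line
image[5, :] = 255  # horizontal line

# Sobel-style kernels (assumed here, not taken from the video).
# One responds to vertical lines, its transpose to horizontal ones.
vertical_filter = np.array([[-1, 0, 1],
                            [-2, 0, 2],
                            [-1, 0, 1]])
horizontal_filter = vertical_filter.T

# Each convolution highlights one orientation and suppresses the other.
print(convolve2d(image, vertical_filter, mode='same'))
print(convolve2d(image, horizontal_filter, mode='same'))
```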
This can then be combined with something called pooling,
which groups up the pixels in the image and filters them down to a subset.
So, for example, max pooling two by two
will group the image into sets of 2x2 pixels
and simply pick the largest value in each.
The image will be reduced to a quarter of its original size,
but the features can still be maintained.
So the previous image after being filtered and then max pooled could look like this.
The image on the right is one quarter the size of the one on the left,
but the vertical line features were maintained
and indeed they were enhanced.
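Here's a minimal NumPy sketch of 2x2 max pooling, with made-up pixel values:

```python
import numpy as np

# A 4x4 "image" with hypothetical pixel values.
image = np.array([[  0,  64, 128,  32],
                  [ 48, 192,  96,  16],
                  [  8,  80, 224,  40],
                  [120,  24,  56, 200]])

# Group into non-overlapping 2x2 blocks and keep the largest
# value in each: the result is a quarter of the original size.
pooled = image.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[192 128]
#  [120 224]]
```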
So where did these filters come from?
That's the magic of a convolutional neural network.
They're actually learned.
They are just parameters like those in the neurons
of a neural network that we saw in the last video.
So as our image is fed into the convolutional layer,
a number of randomly initialized filters will pass over the image.
The results of these are fed into the next layer
and matching is performed by the neural network.
And over time, the filters that produce the outputs
giving the best matches will be learned,
and this process is called feature extraction.
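A quick way to see that the filters really are just learnable parameters, assuming the TensorFlow 2.x Keras API:

```python
import tensorflow as tf

# A Conv2D layer's filters are trainable weights, just like those
# of a dense layer. Building the layer initializes them randomly.
conv = tf.keras.layers.Conv2D(64, (3, 3))
conv.build(input_shape=(None, 28, 28, 1))

kernel, bias = conv.get_weights()
print(kernel.shape)  # (3, 3, 1, 64): sixty-four 3x3 filters, ready to be learned
```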
Here is an example of how a convolutional filter layer
can help a computer visualize things.
You can see across the top row here that you actually have a shoe,
but it has been filtered down to the sole and the silhouette of a shoe
by filters that learned what a shoe looks like.
You'll run this code for yourself in just a few minutes.
Now, let's take a look at the code
to build a convolutional neural network like this.
So this code is very similar to what you used earlier.
We have a flattened input that's fed into a dense layer
that in turn is fed into the final dense layer that is our output.
The only difference here is that I haven't specified the input shape.
That's because I'll put a convolutional layer on top of it like this.
This layer takes the input, so we specify the input shape,
and we're telling it to generate 64 filters with this parameter.
That is, it will generate 64 filters
and multiply each of them across the image.
Then, each epoch, it will figure out which filters gave the best signals
to help match the images to their labels,
in much the same way it learned which parameters worked best
in the dense layer.
The max pooling to compress the image and enhance the features looks like this,
and we can stack convolutional layers on top of each other
to really break down the image
and try to learn from very abstract features like this.
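Putting those pieces together, a sketch of the full model might look like this. The exact layer sizes, such as the 128-unit dense layer, follow the companion notebook's typical Fashion MNIST setup rather than anything spelled out on screen here:

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # 64 3x3 filters passed over the 28x28 grayscale input
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    # 2x2 max pooling to compress the image and enhance the features
    tf.keras.layers.MaxPooling2D(2, 2),
    # A second convolution/pooling pair stacked on top
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The dense layers from before, now fed extracted features
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.summary()
```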
With this methodology, your network starts to learn
based on the features of the image
instead of just the raw patterns of pixels.
Two sleeves, it's a shirt. Two short sleeves, it's a t-shirt.
Sole and laces, it's a shoe; that type of thing.
Now, we're still looking at just simple images
of fashion items at the moment,
but the principles will extend to more complex images,
and you'll see that in the next video.
But before going there,
try out the notebook to see convolutions for yourself.
I've made a link to it in the description below.
Before we get to the next video, don't forget to hit that subscribe button.
Thank you.
♪ (music) ♪