This is a three. It's sloppily written and rendered at an extremely low resolution of 28 by 28 pixels. But your brain has no trouble recognizing it as a three, and I want you to take a moment to appreciate how crazy it is that brains can do this so effortlessly. I mean, this, this, and this are also recognizable as threes, even though the specific values of each pixel are very different from one image to the next. The particular light-sensitive cells in your eye that are firing when you see this three are very different from the ones firing when you see this three. But something in that crazy-smart visual cortex of yours resolves these as representing the same idea, while at the same time recognizing other images as their own distinct ideas.

But if I told you, hey, sit down and write for me a program that takes in a grid of 28 by 28 pixels like this and outputs a single number between 0 and 9 telling you what it thinks the digit is, well, the task goes from comically trivial to dauntingly difficult.

Unless you've been living under a rock, I think I hardly need to motivate the relevance and importance of machine learning and neural networks to the present and to the future.
But what I want to do here is show you what a neural network actually is, assuming no background, and to help visualize what it's doing, not as a buzzword but as a piece of math. My hope is just that you come away feeling like the structure itself is motivated, and to feel like you know what it means when you read or hear about a neural network quote-unquote "learning."

This video is just going to be devoted to the structure component of that, and the following one is going to tackle learning. What we're going to do is put together a neural network that can learn to recognize handwritten digits. This is a somewhat classic example for introducing the topic, and I'm happy to stick with the status quo here, because at the end of the two videos I want to point you to a couple of good resources where you can learn more, and where you can download the code that does this and play with it on your own computer.
There are many, many variants of neural networks, and in recent years there's been sort of a boom in research toward these variants. But in these two introductory videos, you and I are just going to look at the simplest, plain-vanilla form with no added frills. This is kind of a necessary prerequisite for understanding any of the more powerful modern variants, and, trust me, it still has plenty of complexity for us to wrap our minds around. But even in this simplest form it can learn to recognize handwritten digits, which is a pretty cool thing for a computer to be able to do. And at the same time, you'll see how it does fall short of a couple of hopes that we might have for it.
As the name suggests, neural networks are inspired by the brain. But let's break that down: what are the neurons, and in what sense are they linked together?

Right now, when I say neuron, all I want you to think about is a thing that holds a number, specifically a number between 0 and 1. It's really not more than that. For example, the network starts with a bunch of neurons corresponding to each of the 28 times 28 pixels of the input image, which is 784 neurons in total. Each one of these holds a number that represents the grayscale value of the corresponding pixel, ranging from 0 for black pixels up to 1 for white pixels. This number inside the neuron is called its activation, and the image you might have in mind here is that each neuron is lit up when its activation is a high number. So all of these 784 neurons make up the first layer of our network.
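To make that input layer concrete, here is a minimal sketch (not from the video) of how a 28 by 28 grayscale image could be turned into those 784 activations. The `image` array and the 0–255 pixel scale are assumptions for illustration.

```python
import numpy as np

# Hypothetical 28x28 grayscale image with pixel brightness values in 0..255.
image = np.random.randint(0, 256, size=(28, 28))

# Flatten to a vector of 784 numbers and rescale so that
# 0 means a black pixel and 1 means a white pixel.
input_activations = image.reshape(784) / 255.0

assert input_activations.shape == (784,)
assert input_activations.min() >= 0.0 and input_activations.max() <= 1.0
```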
Now, jumping over to the last layer, this has ten neurons, each representing one of the digits. The activation in these neurons, again some number that's between 0 and 1, represents how much the system thinks that a given image corresponds with a given digit.

There are also a couple of layers in between called the hidden layers, which for the time being should just be a giant question mark for how on earth this process of recognizing digits is going to be handled. In this network I chose two hidden layers, each one with 16 neurons, and admittedly that's kind of an arbitrary choice. To be honest, I chose two layers based on how I want to motivate the structure in just a moment, and 16, well, that was just a nice number to fit on the screen. In practice there is a lot of room to experiment with the specific structure here.
The way the network operates, activations in one layer determine the activations of the next layer. And of course the heart of the network as an information-processing mechanism comes down to exactly how those activations from one layer bring about activations in the next layer. It's meant to be loosely analogous to how, in biological networks of neurons, some groups of neurons firing cause certain others to fire.

Now, the network I'm showing here has already been trained to recognize digits, and let me show you what I mean by that. It means if you feed in an image, lighting up all 784 neurons of the input layer according to the brightness of each pixel in the image, that pattern of activations causes some very specific pattern in the next layer, which causes some pattern in the one after it, which finally gives some pattern in the output layer. And the brightest neuron of that output layer is the network's choice, so to speak, for what digit this image represents.
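As a rough sketch of that whole pass, assuming the per-layer rule (weighted sum, bias, and the sigmoid squishing function) described later in this transcript, and with the hypothetical name `trained_layers` standing in for weights and biases that training would have produced:

```python
import numpy as np

def sigmoid(x):
    # Squishes any real number into the range between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(a, layers):
    # `layers` is a list of (weights, biases) pairs, one per step:
    # shapes (16, 784) & (16,), then (16, 16) & (16,), then (10, 16) & (10,).
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# `input_activations` is the 784-vector from the earlier sketch.
# output = feed_forward(input_activations, trained_layers)
# predicted_digit = np.argmax(output)  # index of the brightest output neuron
```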
And before jumping into the math for how one layer influences the next, or how training works, let's just talk about why it's even reasonable to expect a layered structure like this to behave intelligently. What are we expecting here? What is the best hope for what those middle layers might be doing?

Well, when you or I recognize digits, we piece together various components. A nine has a loop up top and a line on the right. An 8 also has a loop up top, but it's paired with another loop down low. A 4 basically breaks down into three specific lines, and things like that. Now, in a perfect world, we might hope that each neuron in the second-to-last layer corresponds with one of these subcomponents, that anytime you feed in an image with, say, a loop up top, like a 9 or an 8, there's some specific neuron whose activation is going to be close to 1. And I don't mean this specific loop of pixels; the hope would be that any generally loopy pattern towards the top sets off this neuron. That way, going from the third layer to the last one just requires learning which combination of subcomponents corresponds to which digits.
Of course, that just kicks the problem down the road, because how would you recognize these subcomponents, or even learn what the right subcomponents should be? And I still haven't even talked about how one layer influences the next, but run with me on this one for a moment. Recognizing a loop can also break down into subproblems. One reasonable way to do this would be to first recognize the various little edges that make it up. Similarly, a long line, like the kind you might see in the digits 1 or 4 or 7, is really just a long edge, or maybe you think of it as a certain pattern of several smaller edges. So maybe our hope is that each neuron in the second layer of the network corresponds with the various relevant little edges. Maybe when an image like this one comes in, it lights up all of the neurons associated with around eight to ten specific little edges, which in turn lights up the neurons associated with the upper loop and a long vertical line, and those light up the neuron associated with a nine.

Whether or not this is what our final network actually does is another question, one that I'll come back to once we see how to train the network. But this is a hope that we might have, a sort of goal with the layered structure like this. Moreover, you can imagine how being able to detect edges and patterns like this would be really useful for other image-recognition tasks. And even beyond image recognition, there are all sorts of intelligent things you might want to do that break down into layers of abstraction. Parsing speech, for example, involves taking raw audio and picking out distinct sounds, which combine to make certain syllables, which combine to form words, which combine to make up phrases and more abstract thoughts, et cetera.
But getting back to how any of this actually works, picture yourself right now designing how exactly the activations in one layer might determine the activations in the next. The goal is to have some mechanism that could conceivably combine pixels into edges, or edges into patterns, or patterns into digits. And to zoom in on one very specific example, let's say the hope is for one particular neuron in the second layer to pick up on whether or not the image has an edge in this region here. The question at hand is: what parameters should the network have? What dials and knobs should you be able to tweak so that it's expressive enough to potentially capture this pattern, or any other pixel pattern, or the pattern that several edges can make a loop, and other such things?
Well, what we'll do is assign a weight to each one of the connections between our neuron and the neurons from the first layer. These weights are just numbers. Then take all those activations from the first layer and compute their weighted sum according to these weights. I find it helpful to think of these weights as being organized into a little grid of their own, and I'm going to use green pixels to indicate positive weights and red pixels to indicate negative weights, where the brightness of that pixel is some loose depiction of the weight's value.

Now, if we made the weights associated with almost all of the pixels zero, except for some positive weights in this region that we care about, then taking the weighted sum of all the pixel values really just amounts to adding up the values of the pixels just in the region that we care about. And if you really wanted it to pick up on whether there's an edge here, what you might do is have some negative weights associated with the surrounding pixels. Then the sum is largest when those middle pixels are bright but the surrounding pixels are darker.
When you compute a weighted sum like this, you might come out with any number, but for this network what we want is for activations to be some value between 0 and 1. So a common thing to do is to pump this weighted sum into some function that squishes the real number line into the range between 0 and 1. A common function that does this is called the sigmoid function, also known as a logistic curve. Basically, very negative inputs end up close to 0, very positive inputs end up close to 1, and it just steadily increases around the input 0.
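For reference, the sigmoid function being described is commonly written as sigma(x) = 1 / (1 + e^(-x)): a large negative x drives the value toward 0, a large positive x drives it toward 1, and sigma(0) = 0.5 sits in the middle.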
So the activation of the neuron here is basically a measure of how positive the relevant weighted sum is. But maybe it's not that you want the neuron to light up when the weighted sum is bigger than 0. Maybe you only want it to be active when the sum is bigger than, say, 10. That is, you want some bias for it to be inactive. What we'll do then is just add in some other number, like negative 10, to this weighted sum before plugging it through the sigmoid squishification function. That additional number is called the bias.

So the weights tell you what pixel pattern this neuron in the second layer is picking up on, and the bias tells you how high the weighted sum needs to be before the neuron starts getting meaningfully active.
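Putting the weighted sum, bias, and sigmoid together for a single neuron, here is a minimal sketch; the flattened `weights` grid and the `input_activations` vector are the hypothetical ones from the earlier sketches.

```python
import numpy as np

def sigmoid(x):
    # Squishes the real number line into the range between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def neuron_activation(input_activations, weights, bias):
    # Weighted sum of all 784 input activations, shifted by the bias,
    # then squished by the sigmoid.
    return sigmoid(np.dot(weights, input_activations) + bias)

# A bias of -10 means the weighted sum has to exceed roughly 10 before
# this neuron's activation becomes meaningfully large, e.g.:
# a = neuron_activation(input_activations, weights.reshape(784), bias=-10.0)
```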
And that is just one neuron. Every other neuron in this layer is going to be connected to all 784 pixel neurons from the first layer, and each one of those 784 connections has its own weight associated with it. Also, each one has some bias, some other number that you add on to the weighted sum before squishing it with the sigmoid. And that's a lot to think about: with this hidden layer of 16 neurons, that's a total of 784 times 16 weights, along with 16 biases. And all of that is just the connections from the first layer to the second. The connections between the other layers also have a bunch of weights and biases associated with them. All said and done, this network has almost exactly 13,000 total weights and biases; 13,000 knobs and dials that can be tweaked and turned to make this network behave in different ways.
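As a quick check of that count for this particular 784-16-16-10 layout, a short sketch of the arithmetic:

```python
# Parameter count for the 784 -> 16 -> 16 -> 10 network described above.
layer_sizes = [784, 16, 16, 10]

weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
biases = sum(layer_sizes[1:])

print(weights, biases, weights + biases)
# 12960 42 13002  -- "almost exactly 13,000" weights and biases
```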