Let me show you something.
容我為各位呈現一些照片
-
(Video) Girl: Okay, that's a cat sitting in a bed.
(影片)女孩:嗯,這是一隻貓,坐在床上。
-
The boy is petting the elephant.
這男孩在拍撫一隻象。
-
Those are people that are going on an airplane.
這些人要去搭飛機。
-
That's a big airplane.
好大的飛機。
-
Fei-Fei Li: This is a three-year-old child
主講人:這是由一位三歲的小孩
-
describing what she sees in a series of photos.
所描述她看到的一系列照片
-
She might still have a lot to learn about this world,
雖然對於這世界她還有更多要學習的地方,
-
but she's already an expert at one very important task:
但是她已經是其中一項重要技能的專家--
-
to make sense of what she sees.
為所見之聞賦予意義。
-
Our society is more technologically advanced than ever.
科技在我們的社會已進展到前所未有的程度:
-
We send people to the moon, we make phones that talk to us
我們把人送上月球、發明可以與人交談的電話,
-
or customize radio stations that can play only music we like.
或是客製一個電台,只播放個人喜歡的音樂。
-
Yet, our most advanced machines and computers
然而我們最先進的機器和電腦
-
still struggle at this task.
仍然無法發展這項技能,
-
So I'm here today to give you a progress report
因此今天我來到這裡向各位報告
-
on the latest advances in our research in computer vision,
我們在電腦視覺的最新研究進展,
-
one of the most frontier and potentially revolutionary
這是現階段在資訊業領域中,
-
technologies in computer science.
最先進、最具潛力的革命性技術。
-
Yes, we have prototyped cars that can drive by themselves,
是的,目前我們已經有自動駕駛的原型車,
-
but without smart vision, they cannot really tell the difference
但若不具備視覺辨識技術, 它將無法分辨同樣出現在馬路中,
-
between a crumpled paper bag on the road, which can be run over,
一團它其實輾過也無妨的破紙袋,
-
and a rock that size, which should be avoided.
以及一個大到它必須閃避的石塊, 兩者有何不同。
-
We have made fabulous megapixel cameras,
我們製造出畫素極高的相機,
-
but we have not delivered sight to the blind.
但我們卻無法賦予盲人視覺;
-
Drones can fly over massive land,
無人機可以翻山越嶺,
-
but don't have enough vision technology
卻沒有足夠的視覺技術可以
-
to help us to track the changes of the rainforests.
讓我們追蹤雨林的變化;
-
Security cameras are everywhere,
監視器滿佈在各個角落,
-
but they do not alert us when a child is drowning in a swimming pool.
卻無法在看到一個孩子將溺斃在泳池之際, 對我們發出警訊。
-
Photos and videos are becoming an integral part of global life.
靜態及動態影像已逐漸與全世界的生活密不可分,
-
They're being generated at a pace that's far beyond what any human,
它們產生的速度,已經遠遠超過任何人
-
or teams of humans, could hope to view,
或任何團隊所能看完的程度,
-
and you and I are contributing to that at this TED.
在座各位以及我自己 都是TED這個活動裡頭的推手。
-
Yet our most advanced software is still struggling at understanding
然而,目前最先進的軟體卻仍在其中苦苦掙扎,
-
and managing this enormous content.
無法理解與應用這龐大的資料體。
-
So in other words, collectively as a society,
換而言之,在這整個社會裡,
-
we're very much blind,
大家都有如盲人在運作,
-
because our smartest machines are still blind.
因為連我們最聰明的機器都還看不見。
-
"Why is this so hard?" you may ask.
或許有人會問:這到底有什麼困難?
-
Cameras can take pictures like this one
任何相機都可以產生像這樣的照片,
-
by converting light into a two-dimensional array of numbers
它是藉由將有色光轉換成2D的數字陣列,
-
known as pixels,
也就是大家熟知的像素。
-
but these are just lifeless numbers.
但這些數字是死的,
-
They do not carry meaning in themselves.
並沒有被賦予意義。
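[Editor's illustration] To make the "two-dimensional array of numbers" concrete, here is a minimal sketch in plain Python. The 4x4 grayscale values are invented for illustration and do not come from the talk; the point is only that a program can do arithmetic on pixels without knowing what they depict.

```python
# A tiny 4x4 grayscale "image": each number is one pixel's brightness
# (0 = black, 255 = white). These values are made up for illustration.
image = [
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0,   0,   0, 0],
]

height = len(image)      # number of rows
width = len(image[0])    # number of columns


def mean_brightness(img):
    """Average of all pixel values: pure arithmetic, no understanding."""
    total = sum(sum(row) for row in img)
    return total / (len(img) * len(img[0]))


print(height, width)            # 4 4
print(mean_brightness(image))   # 63.75
```

The computer can report that the average brightness is 63.75, but nothing in the array says "a bright square on a dark background". That gap is what the speaker means by lifeless numbers.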
-
Just like to hear is not the same as to listen,
就好像有「聽」,不代表有「到」。
-
to take pictures is not the same as to see,
同樣地,攝取到影像不等於看見,
-
and by seeing, we really mean understanding.
我們所認知的看到,應包含著了解其中的意義。
-
In fact, it took Mother Nature 540 million years of hard work
事實上,這樣的成果, 是大自然花了五億四千萬年的光陰
-
to do this task,
才得到的。
-
and much of that effort
這其中的努力,
-
went into developing the visual processing apparatus of our brains,
泰半是耗費在發展腦部的視覺處理這個區塊,
-
not the eyes themselves.
而不是眼睛的部分。
-
So vision begins with the eyes,
也就是說,視覺始於眼睛,
-
but it truly takes place in the brain.
但真正使它有用的,卻是大腦。
-
So for 15 years now, starting from my Ph.D. at Caltech
十五年來,從在加州理工學院攻讀博士開始,
-
and then leading Stanford's Vision Lab,
到領導史丹佛的視覺實驗室,
-
I've been working with my mentors, collaborators and students
我和指導教授、同事及學生們,
-
to teach computers to see.
試圖讓電腦擁有智能之眼,
-
Our research field is called computer vision and machine learning.
我們研究的領域稱之為電腦視覺與機器學習,
-
It's part of the general field of artificial intelligence.
這是人工智慧其中一環。
-
So ultimately, we want to teach the machines to see just like we do:
我們的終極目標就是教導機器能夠像人一樣理解所見之物,
-
naming objects, identifying people, inferring 3D geometry of things,
像是識別物品、辨認人臉、 推論物體的幾何形態,
-
understanding relations, emotions, actions and intentions.
進而理解其中的關聯、情緒、動作及意圖。
-
You and I weave together entire stories of people, places and things
在座每一位和我,都可以在匆匆一瞥的瞬間,
-
the moment we lay our gaze on them.
理解到人事、地、物所交織而成的網絡,
-
The first step towards this goal is to teach a computer to see objects,
要電腦達成這個目標的第一步,就是教導它辨別物品,
-
the building block of the visual world.
這是視覺的基石。
-
In its simplest terms, imagine this teaching process
簡單來說,我們教導的方法就是
-
as showing the computers some training images
給電腦看一些特定物體的影像,
-
of a particular object, let's say cats,
例如貓咪。
-
and designing a model that learns from these training images.
我們設計了一個程式讓電腦利用這些影像來學習
-
How hard can this be?
這有啥困難?
-
After all, a cat is just a collection of shapes and colors,
貓咪不就是由一些幾何圖形和顏色所組成的嘛,
-
and this is what we did in the early days of object modeling.
這就是我們初期所做的物體模型。
-
We'd tell the computer algorithm in a mathematical language
我們用數學語言來告知電腦演繹方法,
-
that a cat has a round face, a chubby body,
貓就是有圓圓的臉、胖胖的身體,
-
two pointy ears, and a long tail,
兩個尖尖的耳朵和一條長尾巴。
-
and that looked all fine.
看起來很好啊,
-
But what about this cat?
但如果貓咪長這樣呢?
-
(Laughter)
(觀眾笑)
-
It's all curled up.
全身都捲起來了。
-
Now you have to add another shape and viewpoint to the object model.
這下子我們又得在原來的模型 加上新的形狀和不同的視野角度。
-
But what if cats are hidden?
又,如果貓咪是躲著的呢?
-
What about these silly cats?
像這群傻貓?
-
Now you get my point.
這樣各位了解我的意思嗎?
-
Even something as simple as a household pet
即使簡單如貓這樣的家庭寵物,
-
can present an infinite number of variations to the object model,
也會有相對於原型以外,無數的其他形態表徵,
-
and that's just one object.
而這只是其中一樣。
-
So about eight years ago,
因此八年前,
-
a very simple and profound observation changed my thinking.
一項極其簡單和深刻的觀察,改變了我的想法,
-
No one tells a child how to see,
沒有人教導孩子如何去「看」,
-
especially in the early years.
特別是在早期發育階段,
-
They learn this through real-world experiences and examples.
他們是從真實世界的經驗中學習。
-
If you consider a child's eyes
如果你把孩童的眼睛
-
as a pair of biological cameras,
當成生物相機的概念,
-
they take one picture about every 200 milliseconds,
就如同每200毫秒就拍一張照片一樣,
-
the average time an eye movement is made.
這是眼球移動的平均時間。
-
So by age three, a child would have seen hundreds of millions of pictures
年紀到了三歲時, 孩子們已經看過了真實世界中
-
of the real world.
數以億計的照片,
-
That's a lot of training examples.
這樣的訓練範例是很大量的。
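[Editor's illustration] The "hundreds of millions" figure can be checked with back-of-the-envelope arithmetic. The 200-millisecond interval and the age of three come from the talk; the 12 waking hours per day is an assumption added here, so treat the result as a rough estimate only.

```python
PICTURES_PER_SECOND = 5      # one picture every 200 ms, as stated in the talk
YEARS = 3                    # "by age three"
WAKING_HOURS_PER_DAY = 12    # assumption, not from the talk


def pictures_seen(years, waking_hours_per_day, pictures_per_second):
    """Rough count of 'pictures' taken by the eyes while awake."""
    waking_seconds = years * 365 * waking_hours_per_day * 3600
    return waking_seconds * pictures_per_second


estimate = pictures_seen(YEARS, WAKING_HOURS_PER_DAY, PICTURES_PER_SECOND)
print(f"{estimate:,}")  # 236,520,000
```

Under these assumptions the estimate is about 237 million, which is consistent with "hundreds of millions".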
-
So instead of focusing solely on better and better algorithms,
因此,我的直覺告訴我 應該以孩童的學習經驗法則,
-
my insight was to give the algorithms the kind of training data
並兼以質與量,
-
that a child was given through experiences
提供訓練的資料給電腦,
-
in both quantity and quality.
而非一味追求更好的程式演算。
-
Once we knew this,
有了上述的洞見,
-
we knew we needed to collect a data set
我們接下來必須要收集
-
that has far more images than we have ever had before,
前所未有的大量資料群,
-
perhaps thousands of times more,
甚至於是千倍以上的。
-
and together with Professor Kai Li at Princeton University,
於是我與普林斯頓大學的李凱教授
-
we launched the ImageNet project in 2007.
共同於2007年開始了 我們稱之為 ImageNet 的專案。
-
Luckily, we didn't have to mount a camera on our head
很幸運地,我們不必在頭上綁一個相機,
-
and wait for many years.
然後花費數年收集影像,
-
We went to the Internet,
而是轉而由網際網路,
-
the biggest treasure trove of pictures that humans have ever created.
這個由人類所創造出來 龐大的影像寶窟,
-
We downloaded nearly a billion images
我們下載了將近十億幅影像,
-
and used crowdsourcing technology like the Amazon Mechanical Turk platform
並且使用如Amazon Mechanical Turk 這樣的群眾外包平台,
-
to help us to label these images.
來協助我們處理及分類這些照片。
-
At its peak, ImageNet was one of the biggest employers
在高峰期,ImageNet 甚至是整個亞馬遜平台
-
of the Amazon Mechanical Turk workers:
最大的雇主之一,
-
together, almost 50,000 workers
我們一共聘請了來自167個國家,
-
from 167 countries around the world
約5萬個工作者,
-
helped us to clean, sort and label
來協助我們分類處理並標示
-
nearly a billion candidate images.
將近10億幅影像,
-
That was how much effort it took
花費了這麼多的資源,
-
to capture even a fraction of the imagery
就是為了捕捉那一絲絲
-
a child's mind takes in in the early developmental years.
孩童在早期心智發展的浮光掠影。
-
In hindsight, this idea of using big data
用現在眼光看來,使用大量的資料
-
to train computer algorithms may seem obvious now,
來訓練電腦演算是明顯合理的,
-
but back in 2007, it was not so obvious.
然而在2007年的世界卻非如此。
-
We were fairly alone on this journey for quite a while.
有好長一段時間, 我們在這個旅途中孤獨地踽踽而行,
-
Some very friendly colleagues advised me to do something more useful for my tenure,
有些同事好心地建議我, 與其苦苦掙扎於研究經費的募集,
-
and we were constantly struggling for research funding.
還不如轉而先做些比較好拿到終身聘的研究,
-
Once, I even joked to my graduate students
我還曾跟我的研究生開玩笑說
-
that I would just reopen my dry cleaner's shop to fund ImageNet.
我乾脆再開一間乾洗店來資助ImageNet 好了,
-
After all, that's how I funded my college years.
畢竟那就是我用以支付大學學費的方法。
-
So we carried on.
就這樣我們還是繼續往前走,
-
In 2009, the ImageNet project delivered
2009年起,ImageNet 已經是個擁有
-
a database of 15 million images
涵蓋了兩萬兩千種不同類別,
-
across 22,000 classes of objects and things
多達1500萬幅圖像的資料庫,
-
organized by everyday English words.
並組織以英語日常生活用字為主,
-
In both quantity and quality,
這樣的規模,不論是「質」或「量」
-
this was an unprecedented scale.
都是史無前例的。
-
As an example, in the case of cats,
用貓來舉個例子說明,
-
we have more than 62,000 cats
我們有超過六萬兩千隻
-
of all kinds of looks and poses
不同外觀和姿勢的貓咪,
-
and across all species of domestic and wild cats.
橫跨不同的種類,有家貓,也有野貓。
-
We were thrilled to have put together ImageNet,
ImageNet 的成果讓我們非常激動,
-
and we wanted the whole research world to benefit from it,
我們希望它有助於全世界的研究,
-
so in the TED fashion, we opened up the entire data set
就如同 TED 的貢獻,我們免費提供整個資料庫
-
to the worldwide research community for free.
給全世界的研究單位。
-
(Applause)
(觀眾鼓掌)
-
Now that we have the data to nourish our computer brain,
有了這些資料,我們可以教育我們的電腦,
-
we're ready to come back to the algorithms themselves.
下一步就是回到程式演算的部分了。
-
As it turned out, the wealth of information provided by ImageNet
結果我們發現,ImageNet 所提供的豐富資訊
-
was a perfect match to a particular class of machine learning algorithms
恰巧與機器學習演算的其中一門特定領域 不謀而合,
-
called convolutional neural networks,
我們稱它為「卷積神經網絡」,
-
pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun
在七零及八零年代,福島邦彥、Geoff Hinton
-
back in the 1970s and '80s.
和 Yann LeCun 等學者為該領域的先驅。
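[Editor's illustration] The talk names convolutional neural networks without describing the convolution itself. As a minimal sketch, the operation slides a small grid of weights (a kernel) over the image and sums the element-wise products at each position. The function name, the toy image, and the kernel below are hypothetical choices for illustration, not the speaker's model. Like most deep-learning code, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return
    the grid of weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Toy image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A 1x2 kernel that responds where brightness increases left to right.
edge_kernel = [[-1, 1]]

print(convolve2d(image, edge_kernel))
# [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The output is non-zero only in the column where dark meets bright, so this kernel acts as a simple vertical-edge detector. In a trained network the kernel weights are learned from data rather than written by hand.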
-
Just like the brain consists of billions of highly connected neurons,
正如同大腦是由無數個緊密連結的神經元所組成,
-
a basic operating unit in a neural network
神經網絡的基本運作單位
-
is a neuron-like node.
也是一個類神經元的節點。
-
It takes input from other nodes
它的運作方式是從別的節點得到資料,
-
and sends output to others.
然後再傳給其他的節點。
-
Moreover, these hundreds of thousands or even millions of nodes
而且這些數不清的節點
-
are organized in hierarchical layers,
擁有層層的組織架構,
-
also similar to the brain.
就好像我們的大腦一樣。
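[Editor's illustration] The description of a neuron-like node that takes input from other nodes, sends output onward, and is stacked into layers can be sketched as follows. The sigmoid activation, the weights, and the two-layer shape are assumptions chosen for brevity; the talk does not specify them, and the model it describes is vastly larger.

```python
import math


def node(inputs, weights, bias):
    """One neuron-like node: weighted sum of its inputs, squashed to (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation (assumed)


def layer(inputs, weight_rows, biases):
    """A layer is a list of nodes that all read the same inputs."""
    return [node(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical weights, for illustration only.
inputs = [1.0, 0.0]
hidden = layer(inputs, [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])  # layer 1
output = layer(hidden, [[1.0, 1.0]], [0.0])                     # layer 2

print(len(hidden), len(output))  # 2 1
```

The outputs of the first layer become the inputs of the second, which is the hierarchical arrangement the speaker refers to.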
-
In a typical neural network we use to train our object recognition model,
在一般的神經網絡中, 我們用作訓練的物品辨識模型
-
it has 24 million nodes,
就有兩千四百萬個節點、
-
140 million parameters,
一億四千萬個參數,
-
and 15 billion connections.
以及一百五十億個連結。
-
That's an enormous model.
這是一個大的不得了的模型。
-
Powered by the massive data from ImageNet
由ImageNet 提供巨大的資料群、
-
and the modern CPUs and GPUs to train such a humongous model,
並使用先進的中央處理器及圖形處理器來訓練這個龐然大物,
-
the convolutional neural network
卷積神經網絡就在眾人的意料外
-
blossomed in a way that no one expected.
開花結果了。
-
It became the winning architecture
在物品辨識領域中,這樣的架構
-
to generate exciting new results in object recognition.
以令人興奮的嶄新成果,傲視群雄。
-
This is a computer telling us
電腦告訴我們
-
this picture contains a cat
這張圖中有隻貓,
-
and where the cat is.
還告訴我們貓在哪裡。
-
Of course there are more things than cats,
當然,這世界不會只有貓,
-
so here's a computer algorithm telling us
電腦的演算告訴我們
-
the picture contains a boy and a teddy bear;
這張圖中有一個男孩和一隻泰迪熊;
-
a dog, a person, and a small kite in the background;
有狗,一個人,以及背景中的一支小風箏;
-
or a picture of very busy things
或這一張令人眼花撩亂的圖,
-
like a man, a skateboard, railings, a lamppost, and so on.
有人、滑板、欄杆、路燈,等等。
-
Sometimes, when the computer is not so confident about what it sees,
有時候,如果電腦不確定自己所見到的東西時,
-
we have taught it to be smart enough
我們已經將它教到可以聰明地
-
to give us a safe answer instead of committing too much,
給一個安全的答案,而非莽撞地回答,
-
just like we would do,
就像一般人會做的。
-
but other times our computer algorithm is remarkable at telling us
更有些時候,電腦的運算竟能夠
-
what exactly the objects are,
精準地辨別物體品項
-
like the make, model, year of the cars.
例如製造商、型號、車子的年份。
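[Editor's illustration] The talk does not say how the "safe answer" is produced. One plausible sketch, under that caveat, is to score a chain of labels from most specific to most general and return the first one whose confidence clears a threshold. The labels, confidence values, threshold, and function name below are all hypothetical.

```python
def safe_answer(scored_labels, threshold=0.8, fallback="object"):
    """Return the most specific label whose confidence meets the threshold.

    `scored_labels` is ordered from most specific to most general.
    """
    for label, confidence in scored_labels:
        if confidence >= threshold:
            return label
    return fallback


# Invented scores for one image, most specific label first.
uncertain = [("golden retriever", 0.55), ("dog", 0.92), ("animal", 0.99)]
confident = [("golden retriever", 0.95), ("dog", 0.98), ("animal", 0.99)]

print(safe_answer(uncertain))  # dog
print(safe_answer(confident))  # golden retriever
```

When the fine-grained guess is shaky the function backs off to a broader category, and when it is strong it commits to the specific answer, mirroring the two behaviours described above.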
-
We applied this algorithm to millions of Google Street View images
我們將這個演算程式運用在數百萬張 Google 街景影像上,
-
across hundreds of American cities,
涵蓋數百個美國城市,
-
and we have learned something really interesting:
也因此我們從中得到了一些有趣的概念。
-
first, it confirmed our common wisdom
首先,它證實了一項廣為人知的說法,
-
that car prices correlate very well
也就是汽車價格和家庭收入
-
with household incomes.
是息息相關的。
-
But surprisingly, car prices also correlate well
然而令人驚訝的是,汽車價格也和
-
with crime rates in cities,
城市中的犯罪率
-
or voting patterns by zip codes.
以及區域選舉模式,有相當的關係。
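[Editor's illustration] The talk says only that these quantities "correlate well" and gives no figures or method. As a reminder of what such a claim usually measures, here is a minimal sketch of the Pearson correlation coefficient. Choosing Pearson is an assumption, and the input numbers are abstract placeholders, not car prices or incomes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Abstract placeholder values: ys is exactly twice xs.
xs = [1, 2, 3]
ys = [2, 4, 6]
print(pearson(xs, ys))  # 1.0
```

A value near 1 means the two quantities rise together, near -1 means one rises as the other falls, and near 0 means no linear relationship. A strong correlation by itself does not show that one quantity causes the other.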
-
So wait a minute. Is that it?
等等,難道說我今天
-
Has the computer already matched or even surpassed human capabilities?
就是來告訴各位電腦已經趕上 甚至超越人類了嗎?
-
Not so fast.
還早得很呢。
-
So far, we have just taught the computer to see objects.
到目前為止,我們只是教導電腦識別物品,
-
This is like a small child learning to utter a few nouns.
就像小孩子牙牙學語一樣,
-
It's an incredible accomplishment,
雖然這是個傲人的進展,
-
but it's only the first step.
但它不過是第一步而已,
-
Soon, another developmental milestone will be hit,
很快地,下一波具指標性的後浪就會打上來了,
-
and children begin to communicate in sentences.
小孩子開始進展到用句子來溝通。
-
So instead of saying this is a cat in the picture,
因此,他已經不會用「這是貓」 來描述圖片,
-
you already heard the little girl telling us this is a cat lying on a bed.
而是會聽到這個小女孩說「這是躺在床上的貓」。
-
So to teach a computer to see a picture and generate sentences,
因此,要教導電腦看到圖並說出句子,
-
the marriage between big data and machine learning algorithm
必須進一步地仰賴龐大資料群
-
has to take another step.
以及機器的學習演算。
-
Now, the computer has to learn from both pictures
現在,電腦不僅要學習圖片識別,
-
as well as natural language sentences
還要學習人類自然的
-
generated by humans.
說話方式。
-
Just like the brain integrates vision and language,
就如同大腦要結合視覺和語言一樣,
-
we developed a model that connects parts of visual things
我們做出了一個模型, 它可以連結不同的可視物體,
-
like visual snippets
就像視覺片段一樣,
-
with words and phrases in sentences.
並附上句子用的字詞和片語。
-
About four months ago,
約四個月前,
-
we finally tied all this together
我們終於把所有的元素全部兜起來了,
-
and produced one of the first computer vision models
做出了最早的電腦視覺模型之一,
-
that is capable of generating a human-like sentence
它有辦法在初次看到照片時
-
when it sees a picture for the first time.
說出像人類般自然的句子,
-
Now, I'm ready to show you what the computer says
好,現在我要給各位看看電腦
-
when it sees the picture
對於演講一開頭
-
that the little girl saw at the beginning of this talk.
那位小女孩所看到的影像, 它又是如何理解的。
-
(Video) Computer: A man is standing next to an elephant.
(電腦) 有個人站在大象旁邊。
-
A large airplane sitting on top of an airport runway.
一架大飛機停在機場跑道上。
-
FFL: Of course, we're still working hard to improve our algorithms,
(主講人) 當然,我們仍戮力於改善這電腦程式,
-
and it still has a lot to learn.
它還有很多要學。
-
(Applause)
(觀眾鼓掌)
-
And the computer still makes mistakes.
電腦還是會犯錯。