  • Let me show you something.

    容我為各位呈現一些照片

  • (Video) Girl: Okay, that's a cat sitting in a bed.

    (影片)女孩:嗯,這是一隻貓,坐在床上。

  • The boy is petting the elephant.

    這男孩在拍撫一隻象。

  • Those are people that are going on an airplane.

    這些人要去搭飛機。

  • That's a big airplane.

    好大的飛機。

  • Fei-Fei Li: This is a three-year-old child

    主講人:這是由一位三歲的小孩

  • describing what she sees in a series of photos.

    所描述她看到的一系列照片

  • She might still have a lot to learn about this world,

    雖然對於這世界她還有更多要學習的地方,

  • but she's already an expert at one very important task:

    但是她已經是其中一項重要技能的專家--

  • to make sense of what she sees.

    為所見之聞賦予意義。

  • Our society is more technologically advanced than ever.

    科技在我們的社會已進展到前所未有的程度:

  • We send people to the moon, we make phones that talk to us

    我們把人送上月球、發明可以與人交談的電話,

  • or customize radio stations that can play only music we like.

    或是客製一個電台,只播放個人喜歡的音樂。

  • Yet, our most advanced machines and computers

    然而我們最先進的機器和電腦

  • still struggle at this task.

    仍然無法發展這項技能,

  • So I'm here today to give you a progress report

    因此今天我來到這裡向各位報告

  • on the latest advances in our research in computer vision,

    我們在電腦視覺的最新研究進展,

  • one of the most frontier and potentially revolutionary

    這是現階段在資訊業領域中,

  • technologies in computer science.

    最先進、最具潛力的革命性技術。

  • Yes, we have prototyped cars that can drive by themselves,

    是的,目前我們已經有自動駕駛的原型車,

  • but without smart vision, they cannot really tell the difference

    但若不具備視覺辨識技術, 它將無法分辨同樣出現在馬路中,

  • between a crumpled paper bag on the road, which can be run over,

    一團它其實輾過也無妨的破紙袋,

  • and a rock that size, which should be avoided.

    以及一個大到它必須閃避的石塊, 兩者有何不同。

  • We have made fabulous megapixel cameras,

    我們製造出畫素極高的相機,

  • but we have not delivered sight to the blind.

    但我們卻無法賦予盲人視覺;

  • Drones can fly over massive land,

    無人機可以翻山越嶺,

  • but don't have enough vision technology

    卻沒有足夠的視覺技術可以

  • to help us to track the changes of the rainforests.

    讓我們追蹤雨林的變化;

  • Security cameras are everywhere,

    監視器滿佈在各個角落,

  • but they do not alert us when a child is drowning in a swimming pool.

    卻無法在看到一個孩子將溺斃在泳池之際, 對我們發出警訊。

  • Photos and videos are becoming an integral part of global life.

    靜態及動態影像已逐漸與全世界的生活密不可分,

  • They're being generated at a pace that's far beyond what any human,

    它們產生的速度已經遠遠超過任何人

  • or teams of humans, could hope to view,

    或任何團隊所能看完的程度,

  • and you and I are contributing to that at this TED.

    在座各位以及我自己 都是TED這個活動裡頭的推手。

  • Yet our most advanced software is still struggling at understanding

    然而,目前最先進的軟體卻仍在其中苦苦掙扎,

  • and managing this enormous content.

    無法理解與應用這龐大的資料體。

  • So in other words, collectively as a society,

    換而言之,在這整個社會裡,

  • we're very much blind,

    大家都有如盲人在運作,

  • because our smartest machines are still blind.

    因為連我們最聰明的機器都還看不見。

  • "Why is this so hard?" you may ask.

    或許有人會問:這到底有什麼困難?

  • Cameras can take pictures like this one

    任何相機都可以產生像這樣的照片,

  • by converting lights into a two-dimensional array of numbers

    它是藉由將有色光轉換成2D的數字陣列,

  • known as pixels,

    也就是大家熟知的像素。

  • but these are just lifeless numbers.

    但這些數字是死的,

  • They do not carry meaning in themselves.

    並沒有被賦予意義。
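
As a rough illustration of the point above, a picture really is just a grid of numbers to the computer. The sketch below assumes a hypothetical 5x5 grayscale image with values from 0 (black) to 255 (white); the image and the function name are invented for illustration and do not come from the talk.

```python
# A hypothetical 5x5 grayscale "photo": 0 is black, 255 is white.
# To the computer this is only a two-dimensional array of numbers.
image = [
    [0, 0, 255, 0, 0],
    [0, 255, 255, 255, 0],
    [255, 255, 255, 255, 255],
    [0, 255, 255, 255, 0],
    [0, 0, 255, 0, 0],
]

height = len(image)    # number of rows
width = len(image[0])  # number of columns


def mean_brightness(img):
    """Average pixel value: easy to compute, yet it says nothing about what is shown."""
    total = sum(sum(row) for row in img)
    return total / (len(img) * len(img[0]))


print(f"{width}x{height} pixels, mean brightness {mean_brightness(image):.1f}")
```

Statistics like this are trivial to compute, but nothing in the numbers themselves says what the picture shows, which is the gap between taking pictures and seeing.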

  • Just like to hear is not the same as to listen,

    就好像有「聽」,不代表有「到」。

  • to take pictures is not the same as to see,

    同樣地,攝取到影像不等於看見,

  • and by seeing, we really mean understanding.

    我們所認知的看到,應包含著了解其中的意義。

  • In fact, it took Mother Nature 540 million years of hard work

    事實上,這樣的成果, 是大自然花了五億四千萬年的光陰

  • to do this task,

    才得到的。

  • and much of that effort

    這其中的努力,

  • went into developing the visual processing apparatus of our brains,

    泰半是耗費在發展腦部的視覺處理這個區塊,

  • not the eyes themselves.

    而不是眼睛的部分。

  • So vision begins with the eyes,

    也就是說,視覺始於眼睛,

  • but it truly takes place in the brain.

    但真正使它有用的,卻是大腦。

  • So for 15 years now, starting from my Ph.D. at Caltech

    十五年來,從在加州理工學院攻讀博士開始,

  • and then leading Stanford's Vision Lab,

    到領導史丹佛的視覺實驗室,

  • I've been working with my mentors, collaborators and students

    我和指導教授、同事及學生們,

  • to teach computers to see.

    試圖讓電腦擁有智能之眼,

  • Our research field is called computer vision and machine learning.

    我們研究的領域稱之為電腦視覺與機器學習,

  • It's part of the general field of artificial intelligence.

    這是人工智慧其中一環。

  • So ultimately, we want to teach the machines to see just like we do:

    我們的終極目標就是教導機器能夠像人一樣理解所見之物,

  • naming objects, identifying people, inferring 3D geometry of things,

    像是識別物品、辨認人臉、 推論物體的幾何形態,

  • understanding relations, emotions, actions and intentions.

    進而理解其中的關聯、情緒、動作及意圖。

  • You and I weave together entire stories of people, places and things

    在座每一位和我,都可以在匆匆一瞥的瞬間,

  • the moment we lay our gaze on them.

    理解到人事、地、物所交織而成的網絡,

  • The first step towards this goal is to teach a computer to see objects,

    要電腦達成這個目標的第一步,就是教導它辨別物品,

  • the building block of the visual world.

    這是視覺的基石。

  • In its simplest terms, imagine this teaching process

    簡單來說,我們教導的方法就是

  • as showing the computers some training images

    給電腦看一些特定物體的影像,

  • of a particular object, let's say cats,

    例如貓咪。

  • and designing a model that learns from these training images.

    我們設計了一個程式讓電腦利用這些影像來學習

  • How hard can this be?

    這有啥困難?

  • After all, a cat is just a collection of shapes and colors,

    貓咪不就是由一些幾何圖形和顏色所組成的嘛,

  • and this is what we did in the early days of object modeling.

    這就是我們初期所做的物體模型。

  • We'd tell the computer algorithm in a mathematical language

    我們用數學語言來告知電腦演繹方法,

  • that a cat has a round face, a chubby body,

    貓就是有圓圓的臉、胖胖的身體,

  • two pointy ears, and a long tail,

    兩個尖尖的耳朵和一條長尾巴。

  • and that looked all fine.

    看起來很好啊,

  • But what about this cat?

    但如果貓咪長這樣呢?

  • (Laughter)

    (觀眾笑)

  • It's all curled up.

    全身都捲起來了。

  • Now you have to add another shape and viewpoint to the object model.

    這下子我們又得在原來的模型 加上新的形狀和不同的視野角度。

  • But what if cats are hidden?

    又,如果貓咪是躲著的呢?

  • What about these silly cats?

    像這群傻貓?

  • Now you get my point.

    這樣各位了解我的意思嗎?

  • Even something as simple as a household pet

    即使簡單如貓這樣的家庭寵物,

  • can present an infinite number of variations to the object model,

    也會有相對於原型以外,無數的其他形態表徵,

  • and that's just one object.

    而這只是其中一樣。

  • So about eight years ago,

    因此八年前,

  • a very simple and profound observation changed my thinking.

    一項極其簡單和深刻的觀察,改變了我的想法,

  • No one tells a child how to see,

    沒有人教導孩子如何去「看」,

  • especially in the early years.

    特別是在早期發育階段,

  • They learn this through real-world experiences and examples.

    他們是從真實世界的經驗中學習。

  • If you consider a child's eyes

    如果你把孩童的眼睛

  • as a pair of biological cameras,

    當成生物相機的概念,

  • they take one picture about every 200 milliseconds,

    就如同每200毫秒就拍一張照片一樣,

  • the average time an eye movement is made.

    這是眼球移動的平均時間。

  • So by age three, a child would have seen hundreds of millions of pictures

    年紀到了三歲時, 孩子們已經看過了真實世界中

  • of the real world.

    數以億計的照片,

  • That's a lot of training examples.

    這樣的訓練範例是很大量的。
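
The "hundreds of millions" figure can be checked with back-of-the-envelope arithmetic. The talk gives only the 200-millisecond interval and the age of three; the 12 waking hours per day used below is an assumption added for this sketch, as are the function and constant names.

```python
MS_PER_PICTURE = 200        # from the talk: one "picture" per eye movement
YEARS = 3                   # from the talk: by age three
WAKING_HOURS_PER_DAY = 12   # assumption, not stated in the talk
DAYS_PER_YEAR = 365


def pictures_seen(years, waking_hours_per_day, ms_per_picture):
    """Estimate how many 'pictures' the eyes take over the given number of years."""
    waking_ms = years * DAYS_PER_YEAR * waking_hours_per_day * 3600 * 1000
    return waking_ms // ms_per_picture


total = pictures_seen(YEARS, WAKING_HOURS_PER_DAY, MS_PER_PICTURE)
print(f"{total:,} pictures")  # about 236 million under these assumptions
```

Under these assumptions the estimate is roughly 236 million, which is consistent with the order of magnitude quoted in the talk.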

  • So instead of focusing solely on better and better algorithms,

    因此,我的直覺告訴我 應該以孩童的學習經驗法則,

  • my insight was to give the algorithms the kind of training data

    並兼以質與量,

  • that a child was given through experiences

    提供訓練的資料給電腦,

  • in both quantity and quality.

    而非一味追求更好的程式演算。

  • Once we know this,

    有了上述的洞見,

  • we knew we needed to collect a data set

    我們接下來必須要收集

  • that has far more images than we have ever had before,

    前所未有的大量資料群,

  • perhaps thousands of times more,

    甚至於是千倍以上的。

  • and together with Professor Kai Li at Princeton University,

    於是我與普林斯頓大學的李凱教授

  • we launched the ImageNet project in 2007.

    共同於2007年開始了 我們稱之為 ImageNet 的專案。

  • Luckily, we didn't have to mount a camera on our head

    很幸運地,我們不必在頭上綁一個相機,

  • and wait for many years.

    然後花費數年收集影像,

  • We went to the Internet,

    而是轉而由網際網路,

  • the biggest treasure trove of pictures that humans have ever created.

    這個由人類所創造出來 龐大的影像寶窟,

  • We downloaded nearly a billion images

    我們下載了將近十億幅影像,

  • and used crowdsourcing technology like the Amazon Mechanical Turk platform

    並且使用如Amazon Mechanical Turk 這樣的群眾外包平台,

  • to help us to label these images.

    來協助我們處理及分類這些照片。

  • At its peak, ImageNet was one of the biggest employers

    在高峰期,ImageNet 甚至是整個亞馬遜平台

  • of the Amazon Mechanical Turk workers:

    最大的雇主之一,

  • together, almost 50,000 workers

    我們一共聘請了來自167個國家,

  • from 167 countries around the world

    約5萬個工作者,

  • helped us to clean, sort and label

    來協助我們分類處理並標示

  • nearly a billion candidate images.

    將近10億幅影像,

  • That was how much effort it took

    花費了這麼多的資源,

  • to capture even a fraction of the imagery

    就是為了捕捉那一絲絲

  • a child's mind takes in in the early developmental years.

    孩童在早期心智發展的浮光掠影。

  • In hindsight, this idea of using big data

    用現在眼光看來,使用大量的資料

  • to train computer algorithms may seem obvious now,

    來訓練電腦演算是明顯合理的,

  • but back in 2007, it was not so obvious.

    然而在2007年的世界卻非如此。

  • We were fairly alone on this journey for quite a while.

    有好長一段時間, 我們在這個旅途中孤獨地踽踽而行,

  • Some very friendly colleagues advised me to do something more useful for my tenure,

    有些同事好心地建議我, 與其苦苦掙扎於研究經費的募集,

  • and we were constantly struggling for research funding.

    還不如轉而先做些比較好拿到終身聘的研究,

  • Once, I even joked to my graduate students

    我還曾跟我的研究生開玩笑說

  • that I would just reopen my dry cleaner's shop to fund ImageNet.

    我乾脆再開一間乾洗店來資助ImageNet 好了,

  • After all, that's how I funded my college years.

    畢竟那就是我用以支付大學學費的方法。

  • So we carried on.

    就這樣我們還是繼續往前走,

  • In 2009, the ImageNet project delivered

    2009年起,ImageNet 已經是個擁有

  • a database of 15 million images

    涵蓋了兩萬兩千種不同類別,

  • across 22,000 classes of objects and things

    多達1500萬幅圖像的資料庫,

  • organized by everyday English words.

    並組織以英語日常生活用字為主,

  • In both quantity and quality,

    這樣的規模,不論是「質」或「量」

  • this was an unprecedented scale.

    都是史無前例的。

  • As an example, in the case of cats,

    用貓來舉個例子說明,

  • we have more than 62,000 cats

    我們有超過六萬兩千隻

  • of all kinds of looks and poses

    不同外觀和姿勢的貓咪,

  • and across all species of domestic and wild cats.

    橫跨不同的種類,有家貓,也有野貓。

  • We were thrilled to have put together ImageNet,

    ImageNet 的成果讓我們非常激動,

  • and we wanted the whole research world to benefit from it,

    我們希望它有助於全世界的研究,

  • so in the TED fashion, we opened up the entire data set

    就如同 TED 的貢獻,我們免費提供整個資料庫

  • to the worldwide research community for free.

    給全世界的研究單位。

  • (Applause)

    (觀眾鼓掌)

  • Now that we have the data to nourish our computer brain,

    有了這些資料,我們可以教育我們的電腦,

  • we're ready to come back to the algorithms themselves.

    下一步就是回到程式演算的部分了。

  • As it turned out, the wealth of information provided by ImageNet

    結果我們發現,ImageNet 所提供的豐富資訊

  • was a perfect match to a particular class of machine learning algorithms

    恰巧與機器學習演算的其中一門特定領域 不謀而合,

  • called convolutional neural network,

    我們稱它為「卷積神經網絡」,

  • pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun

    在七零及八零年代,福島邦彥、Geoff Hinton

  • back in the 1970s and '80s.

    和 Yann LeCun 等學者為該領域的先驅。

  • Just like the brain consists of billions of highly connected neurons,

    正如同大腦是由無數個緊密連結的神經元所組成,

  • a basic operating unit in a neural network

    神經網絡的基本運作單位

  • is a neuron-like node.

    也是一個類神經元的節點。

  • It takes input from other nodes

    它的運作方式是從別的節點得到資料,

  • and sends output to others.

    然後再傳給其他的節點。

  • Moreover, these hundreds of thousands or even millions of nodes

    而且這些數不清的節點

  • are organized in hierarchical layers,

    擁有層層的組織架構,

  • also similar to the brain.

    就好像我們的大腦一樣。
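
The node-and-layer description above can be sketched in a few lines. The talk says only that a node takes input from other nodes and sends output to others; the weighted sum, the sigmoid squashing function, and all the numbers below are common textbook choices assumed here for illustration, not details of the model described in the talk.

```python
import math


def node(inputs, weights, bias):
    """One neuron-like node: combine the inputs, then squash into the range (0, 1).

    The weighted sum and sigmoid are assumptions made for this sketch.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is a group of nodes that all read the same inputs."""
    return [node(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical values: three input numbers feed two hidden nodes,
# whose outputs feed a single output node (layers stacked in a hierarchy).
pixels = [0.0, 0.5, 1.0]
hidden = layer(pixels, [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]], [0.0, 0.1])
output = layer(hidden, [[1.0, -1.0]], [0.0])
print(hidden, output)
```

Training, which this sketch leaves out, consists of adjusting the weights and biases so that the final output matches the labels on the training images.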

  • In a typical neural network we use to train our object recognition model,

    在一般的神經網絡中, 我們用作訓練的物品辨識模型

  • it has 24 million nodes,

    就有兩千四百萬個節點、

  • 140 million parameters,

    一億四千萬個參數,

  • and 15 billion connections.

    以及一百五十億個連結。

  • That's an enormous model.

    這是一個大的不得了的模型。

  • Powered by the massive data from ImageNet

    由ImageNet 提供巨大的資料群、

  • and the modern CPUs and GPUs to train such a humongous model,

    並使用先進的中央處理器(CPU)及圖形處理器(GPU)來訓練這個龐然大物,

  • the convolutional neural network

    卷積神經網絡就在眾人的意料外

  • blossomed in a way that no one expected.

    開花結果了。

  • It became the winning architecture

    在物品辨識領域中,這樣的架構

  • to generate exciting new results in object recognition.

    以令人興奮的嶄新成果,傲視群雄。

  • This is a computer telling us

    電腦告訴我們

  • this picture contains a cat

    這張圖中有隻貓,

  • and where the cat is.

    還告訴我們貓在哪裡。

  • Of course there are more things than cats,

    當然,這世界不會只有貓,

  • so here's a computer algorithm telling us

    電腦的演算告訴我們

  • the picture contains a boy and a teddy bear;

    這張圖中有一個男孩和一隻泰迪熊;

  • a dog, a person, and a small kite in the background;

    有狗,一個人,以及背景中的一支小風箏;

  • or a picture of very busy things

    或這一張令人眼花撩亂的圖,

  • like a man, a skateboard, railings, a lamppost, and so on.

    有人、滑板、欄杆、路燈,等等。

  • Sometimes, when the computer is not so confident about what it sees,

    有時候,如果電腦不確定自己所見到的東西時,

  • we have taught it to be smart enough

    我們已經將它教到可以聰明地

  • to give us a safe answer instead of committing too much,

    給一個安全的答案,而非莽撞地回答,

  • just like we would do,

    就像一般人會做的。

  • but other times our computer algorithm is remarkable at telling us

    更有些時候,電腦的運算竟能夠

  • what exactly the objects are,

    精準地辨別物體品項

  • like the make, model, year of the cars.

    例如製造商、型號、車子的年份。

  • We applied this algorithm to millions of Google Street View images

    我們將這個演算法運用在數百萬張 Google 街景影像上,

  • across hundreds of American cities,

    涵蓋數百個美國城市,

  • and we have learned something really interesting:

    也因此我們從中得到了一些有趣的概念。

  • first, it confirmed our common wisdom

    首先,它證實了一項廣為人知的說法,

  • that car prices correlate very well

    也就是汽車價格和家庭收入

  • with household incomes.

    是息息相關的。

  • But surprisingly, car prices also correlate well

    然而令人驚訝的是,汽車價格也和

  • with crime rates in cities,

    城市中的犯罪率

  • or voting patterns by zip codes.

    以及區域選舉模式,有相當的關係。

  • So wait a minute. Is that it?

    等等,難道說我今天

  • Has the computer already matched or even surpassed human capabilities?

    就是來告訴各位電腦已經趕上 甚至超越人類了嗎?

  • Not so fast.

    還早得很呢。

  • So far, we have just taught the computer to see objects.

    到目前為止,我們只是教導電腦識別物品,

  • This is like a small child learning to utter a few nouns.

    就像小孩子牙牙學語一樣,

  • It's an incredible accomplishment,

    雖然這是個傲人的進展,

  • but it's only the first step.

    但它不過是第一步而已,

  • Soon, another developmental milestone will be hit,

    很快地,下一波具指標性的後浪就會打上來了,

  • and children begin to communicate in sentences.

    小孩子開始進展到用句子來溝通。

  • So instead of saying this is a cat in the picture,

    因此,他已經不會用「這是貓」 來描述圖片,

  • you already heard the little girl telling us this is a cat lying on a bed.

    而是會聽到這個小女孩說「這是躺在床上的貓」。

  • So to teach a computer to see a picture and generate sentences,

    因此,要教導電腦看到圖並說出句子,

  • the marriage between big data and machine learning algorithm

    必須進一步地仰賴龐大資料群

  • has to take another step.

    以及機器的學習演算。

  • Now, the computer has to learn from both pictures

    現在,電腦不僅要學習圖片識別,

  • as well as natural language sentences

    還要學習人類自然的

  • generated by humans.

    說話方式。

  • Just like the brain integrates vision and language,

    就如同大腦要結合視覺和語言一樣,

  • we developed a model that connects parts of visual things

    我們做出了一個模型, 它可以連結不同的可視物體,

  • like visual snippets

    就像視覺片段一樣,

  • with words and phrases in sentences.

    並附上句子用的字詞和片語。

  • About four months ago,

    約四個月前,

  • we finally tied all this together

    我們終於把所有的元素全部兜起來了,

  • and produced one of the first computer vision models

    做出了最早的電腦視覺模型之一,

  • that is capable of generating a human-like sentence

    它有辦法在初次看到照片時

  • when it sees a picture for the first time.

    說出像人類般自然的句子,

  • Now, I'm ready to show you what the computer says

    好,現在我要給各位看看電腦

  • when it sees the picture

    對於演講一開頭

  • that the little girl saw at the beginning of this talk.

    那位小女孩所看到的影像, 它又是如何理解的。

  • (Video) Computer: A man is standing next to an elephant.

    (電腦) 有個人站在大象旁邊。

  • A large airplane sitting on top of an airport runway.

    一架大飛機停在機場跑道上。

  • FFL: Of course, we're still working hard to improve our algorithms,

    (主講人) 當然,我們仍戮力於改善這電腦程式,

  • and it still has a lot to learn.

    它還有很多要學。

  • (Applause)

    (觀眾鼓掌)