America's favorite pie is?
-
Audience: Apple.
-
Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened?
-
Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice.
-
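The arithmetic behind the pie puzzle can be reproduced with a toy tally. The rankings below are invented for illustration (they are not the supermarket data), and the household rule, "buy the pie with the lowest total rank across family members", is an assumed stand-in for how a family compromises. By construction, apple is every person's second choice and nobody's first.

```python
from collections import Counter

# Hypothetical households. Each member ranks every flavor from favorite
# (index 0) to least favorite. Apple is everyone's second choice.
FAMILIES = [
    [
        ["cherry", "apple", "pecan", "pumpkin"],
        ["pecan", "apple", "pumpkin", "cherry"],
        ["pumpkin", "apple", "cherry", "pecan"],
    ],
    [
        ["cherry", "apple", "pumpkin", "pecan"],
        ["pecan", "apple", "pumpkin", "cherry"],
    ],
    [
        ["cherry", "apple", "pecan", "pumpkin"],
        ["pumpkin", "apple", "pecan", "cherry"],
    ],
]


def family_choice(members):
    """Pick the flavor with the lowest total rank across the household
    (an assumed compromise rule for buying one large pie)."""
    flavors = members[0]
    return min(flavors, key=lambda f: sum(m.index(f) for m in members))


def large_pie_sales(families):
    """One shared 30 cm pie per household."""
    return Counter(family_choice(members) for members in families)


def small_pie_sales(families):
    """One 11 cm pie per person, so everyone buys their first choice."""
    return Counter(m[0] for members in families for m in members)


if __name__ == "__main__":
    print("30 cm pies:", large_pie_sales(FAMILIES).most_common())
    print("11 cm pies:", small_pie_sales(FAMILIES).most_common())
```

With these made-up rankings, apple is the only large pie sold, yet it sells no small pies at all: nobody's taste changed, only the level of detail in the purchase data.
-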
You have more data. You can see something that you couldn't see when you only had smaller amounts of it. Now, the point here is that more data doesn't just let us see more, more of the same thing we were looking at. More data allows us to see new. It allows us to see better. It allows us to see different. In this case, it allows us to see what America's favorite pie is: not apple.
-
Now, you have probably all heard the term big data. In fact, you're probably sick of hearing the term big data. It is true that there is a lot of hype around the term, and that is very unfortunate, because big data is an extremely important tool by which society is going to advance.
-
In the past, we used to look at small data and think about what it would mean to try to understand the world, and now we have a lot more of it, more than we ever could before. What we find is that when we have a large body of data, we can fundamentally do things that we couldn't do when we only had smaller amounts. Big data is important, and big data is new, and when you think about it, the only way this planet is going to deal with its global challenges (to feed people, supply them with medical care, supply them with energy and electricity, and make sure they're not burnt to a crisp because of global warming) is through the effective use of data.
-
So what is new about big data? What is the big deal? Well, to answer that question, let's think about what information looked like, physically looked like, in the past.
-
In 1908, on the island of Crete, archaeologists discovered a clay disc. They dated it to 2000 B.C., so it's 4,000 years old. Now, there are inscriptions on this disc, but we actually don't know what they mean. It's a complete mystery, but the point is that this is what information used to look like 4,000 years ago. This is how society stored and transmitted information.
-
Now, society hasn't advanced all that much. We still store information on discs, but now we can store a lot more information, more than ever before. Searching it is easier. Copying it is easier. Sharing it is easier. Processing it is easier. And we can reuse this information for uses that we never even imagined when we first collected the data. In this respect, the data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic. There is, if you will, a liquidity to information.
-
The 4,000-year-old disc that was discovered on Crete is heavy, it doesn't store a lot of information, and that information is unchangeable. By contrast, all of the files that Edward Snowden took from the National Security Agency in the United States fit on a memory stick the size of a fingernail, and they can be shared at the speed of light. More data. More.
-
Now, one reason why we have so much data in the world today is that we are collecting things that we've always collected information on, but another reason is that we're taking things that have always been informational but have never been rendered into a data format, and we are putting them into data.
-
Think, for example, about the question of location. Take Martin Luther. If we wanted to know in the 1500s where Martin Luther was, we would have to follow him at all times, maybe with a feathery quill and an inkwell, and record it. But now think about what it looks like today. You know that somewhere, probably in a telecommunications carrier's database, there is a spreadsheet or at least a database entry that records where you've been at all times. If you have a cell phone, it can record that information, whether or not the phone has GPS. In this respect, location has been datafied.
-
Now think, for example, of the issue of posture: the way that you are all sitting right now, the way that you sit, the way that you sit, the way that you sit. It's all different, and it's a function of your leg length and your back and the contours of your back. If I were to put sensors, maybe 100 sensors, into all of your chairs right now, I could create an index that's fairly unique to you, sort of like a fingerprint, but it's not your finger.
-
So what could we do with this? Researchers in Tokyo are using it as a potential anti-theft device in cars. The idea is that the carjacker sits behind the wheel and tries to speed off, but the car recognizes that a non-approved driver is behind the wheel, and maybe the engine just stops, unless you type a password into the dashboard to say, "Hey, I have authorization to drive." Great.
-
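The anti-theft idea boils down to comparing a fresh seat-sensor reading against stored profiles of approved drivers. The sketch below is a minimal illustration, not the Tokyo researchers' method (the talk does not describe it): it assumes each reading is a fixed-length vector of sensor values, uses plain nearest-neighbor matching with Euclidean distance, and treats the driver names, the readings and the `threshold` value as hypothetical.

```python
import math

# Hypothetical enrolled drivers: name -> average seat-sensor reading.
# Four sensors keep the example readable; the talk mentions about 100.
PROFILES = {
    "alice": [10.0, 12.0, 8.0, 5.0],
    "bob": [14.0, 9.0, 11.0, 7.0],
}


def identify_driver(reading, profiles, threshold):
    """Return the approved driver whose profile is nearest to `reading`
    (Euclidean distance), or None if no profile is within `threshold`."""
    best_name, best_dist = None, math.inf
    for name, profile in profiles.items():
        d = math.dist(reading, profile)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


def engine_allowed(reading, profiles, threshold, password_ok=False):
    """Let the engine run for a recognized driver, or for anyone who has
    typed the correct password into the dashboard."""
    if password_ok:
        return True
    return identify_driver(reading, profiles, threshold) is not None


if __name__ == "__main__":
    owner_like = [10.5, 11.5, 8.2, 5.1]
    stranger = [20.0, 3.0, 15.0, 1.0]
    print(identify_driver(owner_like, PROFILES, threshold=2.0))  # alice
    print(engine_allowed(stranger, PROFILES, threshold=2.0))  # False
```

The threshold is the key design choice: too tight and the owner is locked out after shifting in the seat, too loose and a similarly built carjacker gets through. The `password_ok` path mirrors the dashboard override described in the talk.
-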
What if every single car in Europe had this technology in it? What could we do then? Maybe, if we aggregated the data, we could identify telltale signs that best predict that a car accident is going to take place in the next five seconds. And then what we will have datafied is driver fatigue, and the service would be that when the car senses that the person slumps into that position, it automatically knows to set an internal alarm that would vibrate the steering wheel and honk inside to say, "Hey, wake up, pay more attention to the road." These are the sorts of things we can do when we datafy more aspects of our lives.
-
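The fatigue service can be sketched the same way, by watching for a sustained drift away from the driver's normal posture. This is an illustrative sketch only: the baseline, the readings, the distance `threshold` and the `patience` count of consecutive readings are all hypothetical, and a real system would presumably learn what a slump looks like from aggregated data rather than rely on a hand-set threshold.

```python
import math

# Hypothetical posture vectors for one driver (three sensors for brevity).
BASELINE = [10.0, 12.0, 8.0]
UPRIGHT = [10.2, 11.9, 8.1]
SLUMPED = [14.0, 7.0, 12.0]
READINGS = [UPRIGHT, SLUMPED, UPRIGHT, SLUMPED, SLUMPED, SLUMPED, UPRIGHT]


def fatigue_alerts(readings, baseline, threshold, patience):
    """Yield the index of each reading at which an alarm should fire.

    A reading counts as slumped when its Euclidean distance from the
    driver's baseline exceeds `threshold`. The alarm fires after
    `patience` slumped readings in a row, then the count starts over,
    so a single shift in the seat does not trigger it.
    """
    streak = 0
    for i, reading in enumerate(readings):
        if math.dist(reading, baseline) > threshold:
            streak += 1
            if streak == patience:
                yield i
                streak = 0
        else:
            streak = 0


def sound_alarm():
    # Placeholder for the in-car response described in the talk.
    print("Vibrate steering wheel and chime: 'Hey, wake up!'")


if __name__ == "__main__":
    for index in fatigue_alerts(READINGS, BASELINE, threshold=3.0, patience=3):
        print(f"Alert at reading {index}")
        sound_alarm()
```

Requiring several consecutive slumped readings trades a slightly later warning for far fewer false alarms; with `patience=1` every momentary lean would set it off.
-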
So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn't do before. One of the most impressive areas where this concept is taking place is machine learning. Machine learning is a branch of artificial intelligence, which itself is a branch of computer science. The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself.
-
It will help you understand machine learning if we look at its origins. In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers, so he wrote a computer program that he could play against. He played. He won. He played. He won. He played. He won, because the computer only knew what a legal move was. Arthur Samuel knew something else. Arthur Samuel knew strategy.
-
So he wrote a small sub-program alongside it, operating in the background, and all it did was score the probability that a given board configuration would lead to a winning board versus a losing board after every move. He plays the computer. He wins. He plays the computer. He wins. He plays the computer. He wins.
-
And then Arthur Samuel leaves the computer to play itself. It plays itself. It collects more data. It collects more data. It increases the accuracy of its prediction. And then Arthur Samuel goes back to the computer and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses, and Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.
-
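Samuel's background sub-program can be caricatured as a table that remembers, for each position it has seen, how often it led to a win. To keep the sketch short it swaps checkers for a much smaller game, a Nim variant in which players take 1 to 3 stones and whoever takes the last stone wins, and it uses simple win/loss tallies with occasional random moves during self-play. These are assumptions made for illustration; Samuel's actual program was considerably more elaborate.

```python
import random
from collections import defaultdict

MOVES = (1, 2, 3)  # Nim variant: take 1-3 stones; taking the last stone wins


class PositionScorer:
    """Tallies, for each position (stones left, seen by the player about
    to move), how often that player went on to win."""

    def __init__(self):
        self.wins = defaultdict(int)
        self.games = defaultdict(int)

    def record(self, positions, won):
        for p in positions:
            self.games[p] += 1
            if won:
                self.wins[p] += 1

    def score(self, position):
        """Estimated win probability for the player to move."""
        if position == 0:
            return 0.0  # no stones left: the opponent took the last one
        n = self.games.get(position, 0)
        return self.wins.get(position, 0) / n if n else 0.5  # unseen: coin flip


def choose_move(stones, scorer, rng, explore):
    legal = [m for m in MOVES if m <= stones]
    if rng.random() < explore:
        return rng.choice(legal)  # occasionally try something new
    # Leave the opponent in the position with the lowest estimated win chance.
    return min(legal, key=lambda m: scorer.score(stones - m))


def self_play(scorer, start, rng, explore):
    """Play one game against itself and record both sides' positions."""
    history = {0: [], 1: []}
    stones, player = start, 0
    while stones > 0:
        history[player].append(stones)
        stones -= choose_move(stones, scorer, rng, explore)
        player = 1 - player
    winner = 1 - player  # the player who just moved took the last stone
    for p in (0, 1):
        scorer.record(history[p], won=(p == winner))


def train(games, start=10, seed=0, explore=0.1):
    rng = random.Random(seed)
    scorer = PositionScorer()
    for _ in range(games):
        self_play(scorer, start, rng, explore)
    return scorer


if __name__ == "__main__":
    scorer = train(games=5000)
    for n in range(1, 11):
        print(n, round(scorer.score(n), 2))
```

The structure follows the story in the talk: the scorer starts out knowing only the legal moves (every unseen position scores 0.5), and self-play supplies the data that sharpens its estimates. For this game, positions that are multiples of 4 are known losses for the player to move under perfect play, so printing the scores after training is a quick way to see whether the tallies have started to reflect that.
-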
And this idea of machine learning is going everywhere. How do you think we have self-driving cars? Is it because we've gotten better at enshrining all the rules of the road into software? No. Because memory is cheaper? No. Because algorithms are faster? No. Because processors are better? No. All of those things matter, but that's not why. It's because we changed the nature of the problem. We changed it from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say, "Here's a lot of data around the vehicle. You figure it out. You figure out that that is a traffic light, that that traffic light is red and not green, and that that means you need to stop and not go forward."
-
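The change of approach described above, from spelling out rules to supplying examples, can be illustrated with a deliberately tiny classifier. The sketch assumes a handful of hypothetical labeled pixel colors from lit traffic signals and uses a nearest-centroid rule, one of the simplest learning methods. It is not how any real self-driving system is built, only a picture of "here's the data, you figure it out."

```python
import math
from statistics import fmean

# Hypothetical labeled examples: average RGB color of a lit signal -> label.
EXAMPLES = [
    ((220, 40, 35), "red"),
    ((240, 60, 50), "red"),
    ((200, 30, 60), "red"),
    ((40, 200, 90), "green"),
    ((60, 230, 120), "green"),
    ((30, 180, 100), "green"),
]


def fit_centroids(examples):
    """Learn one average color per label; no hand-written color rules."""
    by_label = {}
    for rgb, label in examples:
        by_label.setdefault(label, []).append(rgb)
    return {
        label: tuple(fmean(channel) for channel in zip(*colors))
        for label, colors in by_label.items()
    }


def predict(centroids, rgb):
    """Label a new color by its nearest learned centroid."""
    return min(centroids, key=lambda label: math.dist(rgb, centroids[label]))


def action(label):
    return "stop" if label == "red" else "go"


if __name__ == "__main__":
    centroids = fit_centroids(EXAMPLES)
    for seen in [(230, 50, 40), (50, 210, 110)]:
        label = predict(centroids, seen)
        print(seen, label, action(label))
```

Adding more labeled examples moves the centroids without anyone rewriting a rule, which is the point the talk makes; the final mapping from label to action is the only hand-coded step.
-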
Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems.
-
Researchers have recently looked at the question of biopsies, cancerous biopsies, and they've asked the computer, by looking at the data and survival rates, to determine whether cells are actually cancerous or not. And sure enough, when you throw the data at it through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that this biopsy of breast cancer cells is indeed cancerous. The problem: the medical literature only knew nine of them. Three of the traits were ones that people didn't need to look for, but that the machine spotted.
-
Now, big data has a dark side as well. It will improve our lives, but there are also problems that we need to be aware of.