America's favorite pie is?
Audience: Apple.
Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened? Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn't see when you only had smaller amounts of it.
Now, the point here is that more data doesn't just let us see more, more of the same thing we were looking at. More data allows us to see new. It allows us to see better. It allows us to see different. In this case, it allows us to see what America's favorite pie is: not apple.
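The pie example turns on how a group purchase hides individual preferences. Below is a minimal sketch with hypothetical data: a few invented families rank four flavors, and the rule a family uses to agree on one large pie is an assumption (the talk does not say how families decide). Here a family picks the pie whose worst rank among its members is best, so the pie nobody objects to wins.

```python
from collections import Counter

# Hypothetical rankings for three families, favorite flavor first.
# Apple is every member's second choice, as in the talk; the flavors are made up.
FAMILIES = [
    [["cherry", "apple", "pecan", "pumpkin"],
     ["pecan", "apple", "cherry", "pumpkin"],
     ["pumpkin", "apple", "pecan", "cherry"]],
    [["cherry", "apple", "pumpkin", "pecan"],
     ["pumpkin", "apple", "cherry", "pecan"]],
    [["pecan", "apple", "pumpkin", "cherry"],
     ["pumpkin", "apple", "cherry", "pecan"],
     ["pecan", "apple", "cherry", "pumpkin"]],
]


def family_choice(members):
    """Assumed rule: pick the pie whose worst rank in the family is best,
    breaking ties by the total of all ranks (0 = favorite)."""
    return min(members[0], key=lambda pie: (max(m.index(pie) for m in members),
                                            sum(m.index(pie) for m in members)))


def large_pie_sales(families):
    # One shared 30 cm pie per family: everyone has to agree.
    return Counter(family_choice(members) for members in families)


def small_pie_sales(families):
    # One 11 cm pie per person: everyone buys their first choice.
    return Counter(ranking[0] for members in families for ranking in members)


print(large_pie_sales(FAMILIES))           # Counter({'apple': 3})
print(small_pie_sales(FAMILIES)["apple"])  # 0
```

With shared pies every family settles on apple, while the per-person counts contain no apple at all: the coarser data simply could not show that apple was nobody's first choice.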
Now, you probably all have heard the term big data. In fact, you're probably sick of hearing the term big data. It is true that there is a lot of hype around the term, and that is very unfortunate, because big data is an extremely important tool by which society is going to advance.
In the past, we used to look at small data and think about what it would mean to try to understand the world, and now we have a lot more of it, more than we ever could before. What we find is that when we have a large body of data, we can fundamentally do things that we couldn't do when we only had smaller amounts.
Big data is important, and big data is new, and when you think about it, the only way this planet is going to deal with its global challenges (to feed people, supply them with medical care, supply them with energy, electricity, and to make sure they're not burnt to a crisp because of global warming) is through the effective use of data.
So what is new about big data? What is the big deal? Well, to answer that question, let's think about what information looked like, physically looked like, in the past.
In 1908, on the island of Crete, archaeologists discovered a clay disc. They dated it to 2000 B.C., so it's 4,000 years old. Now, there are inscriptions on this disc, but we actually don't know what they mean. It's a complete mystery, but the point is that this is what information used to look like 4,000 years ago. This is how society stored and transmitted information.
Now, society hasn't advanced all that much. We still store information on discs, but now we can store a lot more information, more than ever before. Searching it is easier. Copying it is easier. Sharing it is easier. Processing it is easier. And what we can do is reuse this information for uses that we never even imagined when we first collected the data. In this respect, the data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic. There is, if you will, a liquidity to information.
The 4,000-year-old disc discovered on Crete is heavy, it doesn't store a lot of information, and that information is unchangeable. By contrast, all of the files that Edward Snowden took from the National Security Agency in the United States fit on a memory stick the size of a fingernail, and they can be shared at the speed of light. More data. More.
Now, one reason why we have so much data in the world today is that we are collecting things that we've always collected information on, but another reason is that we're taking things that have always been informational but have never been rendered into a data format, and we are putting them into data.
Think, for example, about the question of location. Take, for example, Martin Luther. If we wanted to know in the 1500s where Martin Luther was, we would have to follow him at all times, maybe with a feathery quill and an inkwell, and record it, but now think about what it looks like today. You know that somewhere, probably in a telecommunications carrier's database, there is a spreadsheet or at least a database entry that records where you've been at all times. If you have a cell phone, and that cell phone has GPS, or even if it doesn't have GPS, it can record your location. In this respect, location has been datafied.
Now think, for example, of the issue of posture, the way that you are all sitting right now, the way that you sit, the way that you sit, the way that you sit. It's all different, and it's a function of your leg length and your back and the contours of your back, and if I were to put sensors, maybe 100 sensors, into all of your chairs right now, I could create an index that's fairly unique to you, sort of like a fingerprint, but it's not your finger.
So what could we do with this? Researchers in Tokyo are using it as a potential anti-theft device in cars. The idea is that the carjacker sits behind the wheel and tries to speed off, but the car recognizes that a non-approved driver is behind the wheel, and maybe the engine just stops, unless you type a password into the dashboard to say, "Hey, I have authorization to drive." Great.
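The seat-sensor idea amounts to matching a new reading against stored profiles. Below is a minimal sketch, assuming each approved driver has been enrolled as one vector of seat-pressure readings and that plain Euclidean distance with a fixed cut-off is good enough. The sensor count, the profiles, the threshold, and the function names are all hypothetical; this is not the Tokyo researchers' actual method.

```python
import math

# Hypothetical enrolled profiles: normalized pressure readings from seat sensors.
# The talk imagines about 100 sensors; six keep the example short.
APPROVED_DRIVERS = {
    "alice": [0.82, 0.40, 0.35, 0.91, 0.12, 0.55],
    "bob": [0.60, 0.75, 0.20, 0.48, 0.66, 0.30],
}

# Hypothetical cut-off: readings farther than this from every profile are rejected.
MATCH_THRESHOLD = 0.15


def identify_driver(reading, profiles=APPROVED_DRIVERS, threshold=MATCH_THRESHOLD):
    """Return the name of the closest enrolled driver, or None if nobody is close enough."""
    name, profile = min(profiles.items(), key=lambda item: math.dist(reading, item[1]))
    return name if math.dist(reading, profile) <= threshold else None


def allow_engine(reading, password_ok=False):
    """Keep the engine off for an unrecognized posture unless the dashboard password was entered."""
    return identify_driver(reading) is not None or password_ok


print(identify_driver([0.80, 0.42, 0.36, 0.90, 0.13, 0.54]))  # alice
```

Nearest-neighbor matching with a distance cut-off is one of the simplest ways to turn a "posture fingerprint" into a yes/no decision; a real system would presumably need many enrollment readings per driver and a threshold tuned on data. `math.dist` requires Python 3.8 or later.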
What if every single car in Europe had this technology in it? What could we do then? Maybe, if we aggregated the data, we could identify telltale signs that best predict that a car accident is going to take place in the next five seconds. And then what we will have datafied is driver fatigue, and the service would be that when the car senses that the person slumps into that position, it automatically knows to set an internal alarm that would vibrate the steering wheel and honk inside to say, "Hey, wake up, pay more attention to the road." These are the sorts of things we can do when we datafy more aspects of our lives.
So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn't do before. One of the most impressive areas where this concept is taking place is in the area of machine learning. Machine learning is a branch of artificial intelligence, which itself is a branch of computer science. The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself. And it will help you understand it by seeing its origins.
In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers, so he wrote a computer program so he could play against the computer. He played. He won. He played. He won. He played. He won, because the computer only knew what a legal move was.
Arthur Samuel knew something else. Arthur Samuel knew strategy. So he wrote a small sub-program alongside it, operating in the background, and all it did was score the probability that a given board configuration would likely lead to a winning board versus a losing board after every move.
He plays the computer. He wins. He plays the computer. He wins. He plays the computer. He wins. And then Arthur Samuel leaves the computer to play itself. It plays itself. It collects more data. It collects more data. It increases the accuracy of its prediction. And then Arthur Samuel goes back to the computer and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses, and Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.
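The core of Samuel's idea, a scorer that estimates how likely a position is to lead to a win and that gets more accurate as the program plays itself, can be shown on a much smaller game. This sketch is an assumption-laden stand-in: it swaps checkers for a toy take-away game (remove 1 to 3 stones per turn; whoever takes the last stone wins) and swaps Samuel's actual, considerably more elaborate scoring method for plain win counting over random self-play games. The game, function names, and parameters are hypothetical.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # toy game: remove 1 to 3 stones per turn; taking the last stone wins


def legal_moves(pile):
    """The only thing the untrained program 'knows': which moves are legal."""
    return range(1, min(MAX_TAKE, pile) + 1)


def self_play(games, start_pile=10, seed=0):
    """Let the program play itself with random legal moves and record, for each
    pile size left behind after a move, how often the player who left it won."""
    rng = random.Random(seed)
    wins = defaultdict(int)
    visits = defaultdict(int)
    for _ in range(games):
        pile, player, history = start_pile, 0, []
        while pile > 0:
            pile -= rng.choice(legal_moves(pile))
            history.append((pile, player))
            player = 1 - player
        winner = history[-1][1]  # whoever made the final move took the last stone
        for left, mover in history:
            visits[left] += 1
            wins[left] += int(mover == winner)
    # Learned score: estimated probability that leaving this pile leads to a win.
    return {left: wins[left] / visits[left] for left in visits}


def best_move(pile, scores):
    """Choose the move whose resulting position has the highest learned score
    (unseen positions get a neutral 0.5)."""
    return max(legal_moves(pile), key=lambda take: scores.get(pile - take, 0.5))


scores = self_play(games=5000)
print([best_move(p, scores) for p in (5, 6, 7)])  # [1, 2, 3]: leaves 4 stones each time
```

After a few thousand self-play games the scorer prefers moves that leave the opponent a multiple of four stones, the known winning strategy for this game, even though that strategy was never written into the program; it emerged from the collected data.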
And this idea of machine learning is going everywhere. How do you think we have self-driving cars? Is it because, as a society, we have enshrined all the rules of the road in software? No. Because memory is cheaper? No. Because algorithms are faster? No. Because processors are better? No. All of those things matter, but that's not why. It's because we changed the nature of the problem. We changed the nature of the problem from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say, "Here's a lot of data around the vehicle. You figure it out. You figure out that that is a traffic light, that the traffic light is red and not green, and that that means you need to stop and not go forward."
Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems.
Researchers recently have looked at the question of biopsies, cancerous biopsies, and they've asked the computer to look at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that a biopsy of breast cancer cells is indeed cancerous. The problem: The medical literature only knew nine of them. Three of the traits were ones that people didn't need to look for, but that the machine spotted.
Now, there are dark sides to big data as well. It will improve our lives, but there are problems that we need to be conscious of, and the first one is the idea that we may be punished for predictions, that the police may use big data for their purposes, a little bit like "Minority Report." It's called predictive policing, or algorithmic criminology, and the idea is that if we take a lot of data, for example where past crimes have been, we know where to send the patrols.
That makes sense, but the problem, of course, is that it's not simply going to stop at location data; it's going to go down to the level of the individual. Why don't we use data about the person's high school transcript? Maybe we should use the fact that they're unemployed or not, their credit score, their web-surfing behavior, whether they're up late at night. Their Fitbit, when it's able to identify biochemistries, will show that they have aggressive thoughts. We may have algorithms that are likely to predict what we are about to do, and we may be held accountable before we've actually acted.
Privacy was the central challenge in a small data era. In the big data age, the challenge will be safeguarding free will, moral choice, human volition, human agency.
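The location-only form of predictive policing described above, counting where past crimes happened and sending patrols there, can be sketched as a grid count. This is a minimal illustration under stated assumptions (hypothetical incident coordinates, a hypothetical 0.5 km grid, and a hypothetical number of patrols `k`), not any police department's actual system. It deliberately stops at location data, the part the talk says makes sense.

```python
from collections import Counter

CELL_SIZE = 0.5  # hypothetical grid cell width in km

# Hypothetical past incidents as (x, y) map positions in km.
PAST_INCIDENTS = [
    (0.2, 0.3), (0.4, 0.1), (0.3, 0.4),  # three incidents in one cell
    (2.1, 2.2), (2.4, 2.3),              # two in another
    (4.8, 0.2),                          # one isolated incident
]


def cell_of(point, size=CELL_SIZE):
    """Map a position to the integer grid cell that contains it."""
    x, y = point
    return (int(x // size), int(y // size))


def patrol_cells(incidents, k=2, size=CELL_SIZE):
    """Return the k grid cells with the most past incidents, busiest first."""
    counts = Counter(cell_of(p, size) for p in incidents)
    return [cell for cell, _ in counts.most_common(k)]


print(patrol_cells(PAST_INCIDENTS))  # [(0, 0), (4, 4)]
```

The sketch ranks places, not people; the concern raised above begins when per-person data is fed into the same kind of ranking.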
There is another problem: Big data is going to steal our jobs. Big data and algorithms are going to challenge white-collar, professional knowledge work in the 21st century in the same way that factory automation and the assembly line challenged blue-collar labor in the 20th century. Think about a lab technician who is looking through a microscope at a cancer biopsy and determining whether it's cancerous or not. The person went to university. The person buys property. He or she votes. He or she is a stakeholder in society. And that person, as well as an entire fleet of professionals like that person, is going to find that their jobs are radically changed or actually completely eliminated.
Now, we like to think that technology creates jobs over a period of time after a short, temporary period of dislocation, and that is true for the frame of reference with which we all live, the Industrial Revolution, because that's precisely what happened. But we forget something in that analysis: There are some categories of jobs that simply get eliminated and never come back. The Industrial Revolution wasn't very good if you were a horse. So we're going to need to be careful and take big data and adjust it for our needs, our very human needs. We have to be the master of this technology, not its servant.
We are just at the outset of the big data era, and honestly, we are not very good at handling all the data that we can now collect. It's not just a problem for the National Security Agency. Businesses collect lots of data, and they misuse it too, and we need to get better at this, and this will take time. It's a little bit like the challenge that was faced by primitive man and fire. This is a tool, but this is a tool that, unless we're careful, will burn us.
Big data is going to transform how we live, how we work and how we think. It is going to help us manage our careers and lead lives of satisfaction and hope and happiness and health, but in the past, we've often looked at information technology and our eyes have only seen the T, the technology, the hardware, because that's what was physical. We now need to recast our gaze at the I, the information, which is less apparent, but in some ways a lot more important. Humanity can finally learn from the information that it can collect, as part of our timeless quest to understand the world and our place in it, and that's why big data is a big deal.
(Applause)