America's favorite pie is?

Audience: Apple.

Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened?

Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn't see when you only had smaller amounts of it.

Now, the point here is that more data doesn't just let us see more, more of the same thing we were looking at. More data allows us to see new. It allows us to see better. It allows us to see different. In this case, it allows us to see what America's favorite pie is: not apple.
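
The pie story is a preference-aggregation effect, and a toy model makes the mechanism concrete. This is a minimal sketch under invented assumptions. Four hypothetical family members rank four flavors. A shared 30-centimeter pie goes to the flavor with the best total rank, a Borda-style consensus used here as one plausible stand-in for "the whole family has to agree". Individual 11-centimeter pies simply record each person's first choice. The rankings are made up for illustration and are not sales data from the talk.

```python
from collections import Counter

# Hypothetical flavor rankings for one family (best first); invented for illustration.
FAMILY = {
    "mom":  ["cherry", "apple", "pecan", "pumpkin"],
    "dad":  ["pecan", "apple", "pumpkin", "cherry"],
    "kid1": ["pumpkin", "apple", "pecan", "cherry"],
    "kid2": ["cherry", "apple", "pumpkin", "pecan"],
}

def shared_pie_choice(rankings):
    """One big pie: pick the flavor with the lowest total rank position (Borda-style)."""
    totals = Counter()
    for ranking in rankings.values():
        for position, flavor in enumerate(ranking):
            totals[flavor] += position
    # Iterate flavors alphabetically so any tie resolves the same way every run.
    return min(sorted(totals), key=lambda flavor: totals[flavor])

def individual_pie_sales(rankings):
    """Single-serving pies: each person simply buys their first choice."""
    return Counter(ranking[0] for ranking in rankings.values())

print(shared_pie_choice(FAMILY))                    # apple
print(individual_pie_sales(FAMILY).most_common(1))  # [('cherry', 2)]
```

Apple is nobody's favorite here, but it is everyone's second choice. It therefore wins the shared pie while getting zero first-choice sales, which is the pattern the talk describes.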

Now, you probably all have heard the term big data. In fact, you're probably sick of hearing the term big data. It is true that there is a lot of hype around the term, and that is very unfortunate, because big data is an extremely important tool by which society is going to advance. In the past, we used to look at small data and think about what it would mean to try to understand the world, and now we have a lot more of it, more than we ever could before. What we find is that when we have a large body of data, we can fundamentally do things that we couldn't do when we only had smaller amounts.

Big data is important, and big data is new, and when you think about it, the only way this planet is going to deal with its global challenges (to feed people, supply them with medical care, supply them with energy and electricity, and make sure they're not burnt to a crisp because of global warming) is through the effective use of data.

So what is new about big data? What is the big deal? Well, to answer that question, let's think about what information looked like, physically looked like, in the past.

In 1908, on the island of Crete, archaeologists discovered a clay disc. They dated it from 2000 B.C., so it's 4,000 years old. Now, there are inscriptions on this disc, but we actually don't know what they mean. It's a complete mystery, but the point is that this is what information used to look like 4,000 years ago. This is how society stored and transmitted information.

Now, society hasn't advanced all that much. We still store information on discs, but now we can store a lot more information, more than ever before. Searching it is easier. Copying it is easier. Sharing it is easier. Processing it is easier. And what we can do is reuse this information for uses that we never even imagined when we first collected the data. In this respect, the data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic. There is, if you will, a liquidity to information.

The disc that was discovered off of Crete, which is 4,000 years old, is heavy, it doesn't store a lot of information, and that information is unchangeable. By contrast, all of the files that Edward Snowden took from the National Security Agency in the United States fit on a memory stick the size of a fingernail, and they can be shared at the speed of light. More data. More.

Now, one reason why we have so much data in the world today is that we are collecting things that we've always collected information on. But another reason is that we're taking things that have always been informational but have never been rendered into a data format, and we are putting them into data. Think, for example, about the question of location. Take, for example, Martin Luther. If we wanted to know in the 1500s where Martin Luther was, we would have to follow him at all times, maybe with a feathery quill and an inkwell, and record it. But now think about what it looks like today. You know that somewhere, probably in a telecommunications carrier's database, there is a spreadsheet or at least a database entry that records where you've been at all times. If you have a cell phone, and that cell phone has GPS (but even if it doesn't have GPS), it can record your information. In this respect, location has been datafied.
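
"Datafying" location means that the question "where was this person?" becomes rows that can be queried long after the fact. Here is a minimal sketch of that idea. It assumes a hypothetical log of timestamped cell-tower pings. The record layout, field names, and tower IDs are invented, since the talk does not describe any real carrier schema.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class LocationPing:
    """One hypothetical carrier record: when a phone was seen, and by which cell tower."""
    timestamp: int  # seconds since an arbitrary epoch (assumed unit)
    cell_id: str    # hypothetical tower identifier

def where_was(pings, t):
    """Return the tower of the latest ping at or before time t (pings sorted by time)."""
    times = [p.timestamp for p in pings]
    i = bisect_right(times, t)
    return pings[i - 1].cell_id if i > 0 else None

log = [
    LocationPing(1000, "tower-A"),
    LocationPing(1600, "tower-B"),
    LocationPing(2500, "tower-C"),
]

print(where_was(log, 1700))  # tower-B
print(where_was(log, 500))   # None (no record yet)
```

Following Martin Luther with a quill was a one-off act of recording. By contrast, this kind of log answers any "where at time t" question for anyone in it, including questions nobody planned to ask when the data was collected.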

Now think, for example, of the issue of posture, the way that you are all sitting right now, the way that you sit, the way that you sit, the way that you sit. It's all different, and it's a function of your leg length and your back and the contours of your back, and if I were to put sensors, maybe 100 sensors, into all of your chairs right now, I could create an index that's fairly unique to you, sort of like a fingerprint, but it's not your finger.

So what could we do with this? Researchers in Tokyo are using it as a potential anti-theft device in cars. The idea is that the carjacker sits behind the wheel, tries to stream off, but the car recognizes that a non-approved driver is behind the wheel, and maybe the engine just stops, unless you type a password into the dashboard to say, "Hey, I have authorization to drive." Great.
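
The posture "fingerprint" and the anti-theft check can be sketched as a nearest-profile comparison. This is a minimal illustration and not the Tokyo researchers' method. It assumes several things that are not in the talk:

- Each seat reports a fixed-length vector of pressure readings. The sketch uses 4 sensors instead of the roughly 100 mentioned.
- An approved driver is enrolled as the average of a few readings.
- A sitter counts as approved if their reading lies within a hypothetical Euclidean distance threshold of some enrolled profile.
- The dashboard password is reduced to a boolean `password_ok`.

```python
import math

def enroll(readings):
    """Average several posture readings (one value per seat sensor) into a driver profile."""
    n = len(readings)
    return [sum(values) / n for values in zip(*readings)]

def is_approved_driver(reading, profiles, threshold):
    """True if the reading is within `threshold` (Euclidean distance) of an enrolled profile."""
    return any(math.dist(reading, profile) <= threshold for profile in profiles)

def engine_allowed(reading, profiles, threshold, password_ok=False):
    """Run the engine for a recognized sitter, or for anyone who passed the password check."""
    return is_approved_driver(reading, profiles, threshold) or password_ok

# Four hypothetical sensors instead of ~100; values are arbitrary pressure units.
OWNER = enroll([[10.0, 12.0, 8.0, 5.0], [10.4, 11.6, 8.2, 5.2]])
THRESHOLD = 1.0  # assumed tolerance; a real system would have to tune this

print(engine_allowed([10.1, 11.9, 8.0, 5.0], [OWNER], THRESHOLD))  # True: the owner
print(engine_allowed([14.0, 9.0, 6.0, 7.5], [OWNER], THRESHOLD))   # False: unknown sitter
```

The threshold trades off convenience against security. If it is too tight, the owner gets locked out after shifting in the seat. If it is too loose, a thief with a similar build passes.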

What if every single car in Europe had this technology in it? What could we do then? Maybe, if we aggregated the data, we could identify telltale signs that best predict that a car accident is going to take place in the next five seconds. And then what we will have datafied is driver fatigue. The service would be that when the car senses that the person slumps into that position, it automatically knows to set off an internal alarm that would vibrate the steering wheel or honk inside to say, "Hey, wake up, pay more attention to the road." These are the sorts of things we can do when we datafy more aspects of our lives.
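
The fatigue service described above boils down to detecting a sustained change in posture. This is a minimal sketch, assuming a few things that are not in the talk:

- A per-driver upright baseline.
- A hypothetical distance threshold that counts as "slumped".
- A rule that the slump must persist for several consecutive readings before the alarm fires.

```python
import math
from collections import deque

class FatigueMonitor:
    """Flags a sustained slump: `window` consecutive readings far from the upright baseline."""

    def __init__(self, baseline, threshold, window):
        self.baseline = baseline    # the driver's upright posture (hypothetical calibration)
        self.threshold = threshold  # distance treated as "slumped" (assumed value)
        self.recent = deque(maxlen=window)

    def update(self, reading):
        """Add one seat-sensor reading; return True when the alarm should fire."""
        is_slumped = math.dist(reading, self.baseline) > self.threshold
        self.recent.append(is_slumped)
        # Require a full window of slumped readings so a single shift in the seat is ignored.
        return len(self.recent) == self.recent.maxlen and all(self.recent)

monitor = FatigueMonitor(baseline=[10.0, 12.0, 8.0, 5.0], threshold=2.0, window=3)
upright_reading = [10.1, 11.9, 8.0, 5.1]
slumped_reading = [13.0, 9.5, 9.0, 6.0]
alerts = [monitor.update(r) for r in
          [upright_reading, slumped_reading, slumped_reading, slumped_reading]]
print(alerts)  # [False, False, False, True]
```

What happens on `True` (vibrating the wheel, sounding a chime) is left out. The window length is the main design choice: longer windows mean fewer false alarms but a later warning.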

So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn't do before. One of the most impressive areas where this concept is taking place is in the area of machine learning. Machine learning is a branch of artificial intelligence, which itself is a branch of computer science. The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself.

And it will help you understand it by seeing its origins. In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers, so he wrote a computer program so he could play against the computer. He played. He won. He played. He won. He played. He won, because the computer only knew what a legal move was. Arthur Samuel knew something else. Arthur Samuel knew strategy.

So he wrote a small sub-program alongside it, operating in the background, and all it did was score the probability that a given board configuration would likely lead to a winning board versus a losing board after every move. He plays the computer. He wins. He plays the computer. He wins. He plays the computer. He wins.

And then Arthur Samuel leaves the computer to play itself. It plays itself. It collects more data. It collects more data. It increases the accuracy of its prediction. And then Arthur Samuel goes back to the computer, and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses, and Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.
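
Samuel's key step was a scoring routine that estimates how likely a position is to lead to a win, sharpened by letting the program play itself. The sketch below is a toy stand-in for that idea, not Samuel's actual method; his program used checkers-specific board features and lookahead. The simplifications are:

- The game is a hypothetical pile-of-stones game.
- The "probability" is the observed win rate of the player to move, for each pile size.
- For simplicity, the self-play games use random moves to collect data.

```python
import random
from collections import defaultdict

# Toy game standing in for checkers (an assumption): players alternately take
# 1 or 2 stones from a pile, and whoever takes the last stone wins.
MOVES = (1, 2)

def legal_moves(stones):
    return [m for m in MOVES if m <= stones]

class WinRateTable:
    """Observed chance that the player to move, facing `stones`, goes on to win."""

    def __init__(self):
        self.wins = defaultdict(int)
        self.visits = defaultdict(int)

    def value(self, stones):
        if self.visits[stones] == 0:
            return 0.5  # no data yet: treat it as a coin flip
        return self.wins[stones] / self.visits[stones]

    def record_game(self, states):
        """states[i] is the pile faced by the i-th mover; the last mover won."""
        last = len(states) - 1
        for i, stones in enumerate(states):
            self.visits[stones] += 1
            if (last - i) % 2 == 0:  # same player as the winner
                self.wins[stones] += 1

def self_play(table, start, rng):
    """Play one game of the program against itself (random moves here), record the outcome."""
    states, stones = [], start
    while stones > 0:
        states.append(stones)
        stones -= rng.choice(legal_moves(stones))
    table.record_game(states)

def best_move(table, stones):
    """Greedy play: win outright if possible, else leave the opponent the worst-scoring pile."""
    def score(m):
        left = stones - m
        return 1.0 if left == 0 else 1.0 - table.value(left)
    return max(legal_moves(stones), key=score)

rng = random.Random(0)
table = WinRateTable()
for _ in range(20_000):
    self_play(table, start=10, rng=rng)

print([best_move(table, s) for s in (4, 5, 7, 8)])  # [1, 2, 1, 2]
```

After the self-play games, `best_move` leaves the opponent a multiple of three stones from piles of 4, 5, 7 and 8. That is the winning strategy for this game, even though no line of the code states it; it emerges from the collected outcomes.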

And this idea of machine learning is going everywhere. How do you think we have self-driving cars? Are we any better off as a society enshrining all the rules of the road into software? No. Is it because memory is cheaper? No. Because algorithms are faster? No. Because processors are better? No. All of those things matter, but that's not why. It's because we changed the nature of the problem. We changed the nature of the problem from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say, "Here's a lot of data around the vehicle. You figure it out. You figure out that that is a traffic light, that that traffic light is red and not green, that that means that you need to stop and not go forward."
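
The shift described above, from spelling out rules to letting the computer infer them from examples, can be shown at toy scale. This is a minimal sketch under stated assumptions. The data is hypothetical: the average RGB color of a detected light, paired with whether the driver stopped. The learner is a nearest-centroid classifier, one of the simplest learning methods, chosen here for illustration only. No line of the code says "red means stop"; that association comes from the examples.

```python
import math

# Hypothetical labeled examples: average RGB of a detected light -> what the driver did.
EXAMPLES = [
    ((220, 30, 25), "stop"), ((200, 45, 40), "stop"), ((240, 20, 35), "stop"),
    ((30, 210, 90), "go"), ((45, 190, 110), "go"), ((25, 230, 80), "go"),
]

def train_centroids(examples):
    """Learn one average color per label from the data instead of hand-coding a rule."""
    sums, counts = {}, {}
    for color, label in examples:
        acc = sums.setdefault(label, [0.0, 0.0, 0.0])
        for i, channel in enumerate(color):
            acc[i] += channel
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, color):
    """Label a new observation by its nearest learned centroid."""
    return min(centroids, key=lambda label: math.dist(color, centroids[label]))

centroids = train_centroids(EXAMPLES)
print(predict(centroids, (210, 40, 30)))   # stop
print(predict(centroids, (35, 200, 100)))  # go
```

Adding more labeled examples changes the learned centroids without anyone editing the decision logic. That is the "throw data at the problem" approach in miniature.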

Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems.

Researchers recently have looked at the question of biopsies, cancerous biopsies, and they've asked the computer to determine, by looking at the data and survival rates, whether cells are actually cancerous or not. And sure enough, when you throw the data at it through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that this biopsy of breast cancer cells is indeed cancerous. The problem: the medical literature only knew nine of them. Three of the traits were ones that people didn't need to look for, but that the machine spotted.
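
The talk describes the biopsy study only at a high level. As a hedged illustration of the general idea, here is a sketch that ranks candidate features by how well they separate cancerous from non-cancerous samples. It uses a simple separation score: the difference in class means divided by the overall standard deviation. Everything specific here is hypothetical, including the feature names, the measurements, and the split between "known" and "overlooked" signs. The actual study's method and features are not described in the talk.

```python
import statistics

# Hypothetical feature names and measurements (0-1 scale), invented for illustration.
NAMES = ["textbook_sign_a", "textbook_sign_b", "overlooked_sign_c", "noise_d"]
KNOWN = {"textbook_sign_a", "textbook_sign_b"}  # what "the literature" already lists
SAMPLES = [
    [0.90, 0.70, 0.80, 0.5], [0.80, 0.60, 0.90, 0.4], [0.85, 0.75, 0.85, 0.6],
    [0.20, 0.30, 0.10, 0.5], [0.30, 0.20, 0.20, 0.6], [0.25, 0.35, 0.15, 0.4],
]
LABELS = [True, True, True, False, False, False]  # True = cancerous

def rank_features(samples, labels, names):
    """Rank features by |mean(cancerous) - mean(benign)| / overall standard deviation."""
    scores = {}
    for j, name in enumerate(names):
        pos = [s[j] for s, y in zip(samples, labels) if y]
        neg = [s[j] for s, y in zip(samples, labels) if not y]
        spread = statistics.pstdev(pos + neg) or 1.0  # avoid dividing by zero
        scores[name] = abs(statistics.mean(pos) - statistics.mean(neg)) / spread
    return sorted(names, key=scores.get, reverse=True)

ranking = rank_features(SAMPLES, LABELS, NAMES)
new_signs = [name for name in ranking[:3] if name not in KNOWN]
print(new_signs)  # ['overlooked_sign_c']
```

The score knows nothing about which features experts already check. It surfaces whatever separates the classes in the data, which is how a feature nobody was looking for can rise to the top.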

Now, there are dark sides to big data as well. It will improve our lives, but there are problems that we need to be conscious of, and the first one is the idea that we may be punished for predictions, that the police may use big data for their purposes, a little bit like "Minority Report." Now, it's a term