  • America's favorite pie is?

  • Audience: Apple.

  • Kenneth Cukier: Apple. Of course it is.

  • How do we know it? Because of data.

  • You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest.

  • The majority of the sales are apple.

  • But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place.

  • Why? What happened? Okay, think about it.

  • When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter)

  • But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice.

  • You have more data. You can see something that you couldn't see when you only had smaller amounts of it.

  • Now, the point here is that more data doesn't just let us see more, more of the same thing we were looking at.

  • More data allows us to see new. It allows us to see better. It allows us to see different.

  • In this case, it allows us to see what America's favorite pie is: not apple.
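The pie story above can be reproduced with a toy model. This is only a sketch under stated assumptions: the households, flavors and rankings are invented for illustration (they are not sales data), and the "whole family has to agree" step is modeled with an assumed rule that picks the flavor whose worst rank within the family is best.

```python
from collections import Counter

FLAVORS = ["apple", "cherry", "pecan", "peach"]

# Hypothetical households: one preference list per person, best flavor first.
FAMILIES = [
    [["cherry", "apple", "pecan"], ["pecan", "apple", "cherry"], ["peach", "apple", "cherry"]],
    [["pecan", "apple", "peach"], ["cherry", "apple", "pecan"]],
    [["peach", "apple", "pecan"], ["cherry", "apple", "peach"], ["pecan", "apple", "cherry"]],
]

def rank(prefs, flavor):
    """Position of a flavor in one person's list; unlisted flavors rank last."""
    return prefs.index(flavor) if flavor in prefs else len(FLAVORS)

def family_pie(family):
    """One large pie: choose the flavor nobody in the family ranks too low."""
    return min(FLAVORS, key=lambda f: max(rank(p, f) for p in family))

def individual_pies(family):
    """Small pies: every person simply buys their own first choice."""
    return [prefs[0] for prefs in family]

large_sales = Counter(family_pie(fam) for fam in FAMILIES)
small_sales = Counter(pie for fam in FAMILIES for pie in individual_pies(fam))

print("large pies:", large_sales.most_common())
print("small pies:", small_sales.most_common())
```

In this invented data apple is nobody's first choice, yet it is the only flavor every household can agree on, so it dominates large-pie sales and disappears once each person buys separately.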

  • Now, you probably all have heard the term big data. In fact, you're probably sick of hearing the term big data.

  • It is true that there is a lot of hype around the term, and that is very unfortunate, because big data is an extremely important tool by which society is going to advance.

  • In the past, we used to look at small data and think about what it would mean to try to understand the world, and now we have a lot more of it, more than we ever could before.

  • What we find is that when we have a large body of data, we can fundamentally do things that we couldn't do when we only had smaller amounts.

  • Big data is important, and big data is new, and when you think about it, the only way this planet is going to deal with its global challenges, to feed people, supply them with medical care, supply them with energy, electricity, and to make sure they're not burnt to a crisp because of global warming, is because of the effective use of data.

  • So what is new about big data? What is the big deal?

  • Well, to answer that question, let's think about what information looked like, physically looked like in the past.

  • In 1908, on the island of Crete, archaeologists discovered a clay disc. They dated it from 2000 B.C., so it's 4,000 years old.

  • Now, there's inscriptions on this disc, but we actually don't know what it means. It's a complete mystery, but the point is that this is what information used to look like 4,000 years ago. This is how society stored and transmitted information.

  • Now, society hasn't advanced all that much. We still store information on discs, but now we can store a lot more information, more than ever before.

  • Searching it is easier. Copying it is easier. Sharing it is easier. Processing it is easier.

  • And what we can do is we can reuse this information for uses that we never even imagined when we first collected the data.

  • In this respect, the data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic. There is, if you will, a liquidity to information.

  • The disc that was discovered off of Crete, that's 4,000 years old, is heavy, it doesn't store a lot of information, and that information is unchangeable.

  • By contrast, all of the files that Edward Snowden took from the National Security Agency in the United States fit on a memory stick the size of a fingernail, and it can be shared at the speed of light.

  • More data. More.

  • Now, one reason why we have so much data in the world today is we are collecting things that we've always collected information on, but another reason why is we're taking things that have always been informational but have never been rendered into a data format and we are putting it into data.

  • Think, for example, of the question of location. Take, for example, Martin Luther.

  • If we wanted to know in the 1500s where Martin Luther was, we would have to follow him at all times, maybe with a feathery quill and an inkwell, and record it, but now think about what it looks like today.

  • You know that somewhere, probably in a telecommunications carrier's database, there is a spreadsheet or at least a database entry that records your information of where you've been at all times.

  • If you have a cell phone, and that cell phone has GPS, but even if it doesn't have GPS, it can record your information. In this respect, location has been datafied.

  • Now think, for example, of the issue of posture, the way that you are all sitting right now, the way that you sit, the way that you sit, the way that you sit.

  • It's all different, and it's a function of your leg length and your back and the contours of your back, and if I were to put sensors, maybe 100 sensors into all of your chairs right now, I could create an index that's fairly unique to you, sort of like a fingerprint, but it's not your finger.

  • So what could we do with this? Researchers in Tokyo are using it as a potential anti-theft device in cars.

  • The idea is that the carjacker sits behind the wheel, tries to stream off, but the car recognizes that a non-approved driver is behind the wheel, and maybe the engine just stops, unless you type in a password into the dashboard to say, "Hey, I have authorization to drive." Great.

  • What if every single car in Europe had this technology in it? What could we do then?

  • Maybe, if we aggregated the data, maybe we could identify telltale signs that best predict that a car accident is going to take place in the next five seconds.

  • And then what we will have datafied is driver fatigue, and the service would be when the car senses that the person slumps into that position, automatically knows, hey, set an internal alarm that would vibrate the steering wheel, honk inside to say, "Hey, wake up, pay more attention to the road."

  • These are the sorts of things we can do when we datafy more aspects of our lives.
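The seat-sensor idea described above can be sketched as a nearest-profile check. Everything concrete here is an assumption made for illustration: five pressure readings stand in for the roughly 100 sensors mentioned in the talk, the enrolled profile and the threshold are invented, and the Tokyo researchers' actual method is not described in the source.

```python
import math

# Hypothetical enrolled seat-pressure profiles (values between 0 and 1).
ENROLLED = {
    "owner": [0.82, 0.40, 0.65, 0.30, 0.77],
}

# Assumed tolerance: how far a reading may drift from a profile and still match.
THRESHOLD = 0.15

def is_approved(reading):
    """Return True if the reading is close enough to any enrolled profile."""
    return any(
        math.dist(reading, profile) <= THRESHOLD
        for profile in ENROLLED.values()
    )

print(is_approved([0.80, 0.42, 0.66, 0.31, 0.75]))  # reading close to the owner
print(is_approved([0.55, 0.60, 0.40, 0.52, 0.50]))  # reading from someone else
```

A real system would need many more sensors, repeated readings per driver and a tolerance tuned on data. The sketch only shows how a sitting posture becomes a vector that can be compared like a fingerprint.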

  • So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn't do before.

  • One of the most impressive areas where this concept is taking place is in the area of machine learning.

  • Machine learning is a branch of artificial intelligence, which itself is a branch of computer science.

  • The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself.

  • And it will help you understand it by seeing its origins.

  • In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers, so he wrote a computer program so he could play against the computer.

  • He played. He won. He played. He won. He played. He won, because the computer only knew what a legal move was.

  • Arthur Samuel knew something else. Arthur Samuel knew strategy.

  • So he wrote a small sub-program alongside it operating in the background, and all it did was score the probability that a given board configuration would likely lead to a winning board versus a losing board after every move.

  • He plays the computer. He wins. He plays the computer. He wins. He plays the computer. He wins.

  • And then Arthur Samuel leaves the computer to play itself. It plays itself. It collects more data. It collects more data. It increases the accuracy of its prediction.

  • And then Arthur Samuel goes back to the computer and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses, and Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.
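The self-play loop in the Samuel story can be illustrated on a much smaller game. This is a hedged sketch, not Samuel's program: checkers is replaced by an assumed stone-taking game (10 stones, take 1 to 3 per turn, whoever takes the last stone wins), and the "score" is simply the observed share of wins after leaving a given position. The names `score`, `best_move` and `self_play` are made up for this example.

```python
import random

PILE = 10      # stones at the start of each game
MAX_TAKE = 3   # a move removes 1 to 3 stones; taking the last stone wins

wins = [0] * PILE    # wins[s]: games won by the player who left s stones
visits = [0] * PILE  # visits[s]: how often any player left s stones

def score(s):
    """Estimated probability that leaving s stones leads to a win."""
    return wins[s] / visits[s] if visits[s] else 0.5

def legal_moves(pile):
    return range(1, min(MAX_TAKE, pile) + 1)

def best_move(pile):
    """Choose the move whose resulting position has the highest score."""
    return max(legal_moves(pile), key=lambda take: score(pile - take))

def self_play(rng, explore=0.2):
    """Play one game against itself, then update the scores from the result."""
    pile, player, history = PILE, 0, []
    while pile > 0:
        if rng.random() < explore:
            take = rng.choice(list(legal_moves(pile)))  # occasional random move
        else:
            take = best_move(pile)
        pile -= take
        history.append((player, pile))
        player = 1 - player
    winner = history[-1][0]  # whoever took the last stone
    for mover, left in history:
        visits[left] += 1
        wins[left] += (mover == winner)

rng = random.Random(0)
for _ in range(20000):
    self_play(rng)

print([best_move(p) for p in (5, 6, 7)])
```

The program is told only the legal moves. The preference for leaving the opponent four stones is never written down; it emerges from the win counts collected during self-play, which is the point of the story.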

  • And this idea of machine learning is going everywhere.

  • How do you think we have self-driving cars? Are we any better off as a society enshrining all the rules of the road into software? No. Memory is cheaper. No. Algorithms are faster. No. Processors are better. No.

  • All of those things matter, but that's not why. It's because we changed the nature of the problem.

  • We changed the nature of the problem from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say, "Here's a lot of data around the vehicle. You figure it out. You figure it out that that is a traffic light, that that traffic light is red and not green, that that means that you need to stop and not go forward."

  • Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems.

  • Researchers recently have looked at the question of biopsies, cancerous biopsies, and they've asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous.

  • The problem: The medical literature only knew nine of them. Three of the traits were ones that people didn't need to look for, but that the machine spotted.

  • Now, there are dark sides to big data as well. It will improve our lives, but there are problems that we need to be conscious of, and the first one is the idea that we may be punished for predictions, that the police may use big data for their purposes, a little bit like "Minority Report."

  • Now, it's a term called predictive policing, or algorithmic criminology, and the idea is that if we take a lot of data, for example where past crimes have been, we know where to send the patrols.

  • That makes sense, but the problem, of course, is that it's not simply going to stop on location data, it's going to go down to the level of the individual.

  • Why don't we use data about the person's high school transcript? Maybe we should use the fact that they're unemployed or not, their credit score, their web-surfing behavior, whether they're up late at night. Their Fitbit, when it's able to identify biochemistries, will show that they have aggressive thoughts.

  • We may have algorithms that are likely to predict what we are about to do, and we may be held accountable before we've actually acted.

  • Privacy was the central challenge in a small data era. In the big data age, the challenge will be safeguarding free will, moral choice, human volition, human agency.

  • There is another problem: Big data is going to steal our jobs.

  • Big data and algorithms are going to challenge white collar, professional knowledge work in the 21st century in the same way that factory automation and the assembly line challenged blue collar labor in the 20th century.

  • Think about a lab technician who is looking through a microscope at a cancer biopsy and determining whether it's cancerous or not. The person went to university. The person buys property. He or she votes. He or she is a stakeholder in society.

  • And that person's job, as well as an entire fleet of professionals like that person, is going to find that their jobs are radically changed or actually completely eliminated.

  • Now, we like to think that technology creates jobs over a period of time after a short, temporary period of dislocation, and that is true for the frame of reference with which we all live, the Industrial Revolution, because that's precisely what happened.

  • But we forget something in that analysis: There are some categories of jobs that simply get eliminated and never come back. The Industrial Revolution wasn't very good if you were a horse.

  • So we're going to need to be careful and take big data and adjust it for our needs, our very human needs. We have to be the master of this technology, not its servant.

  • We are just at the outset of the big data era, and honestly, we are not very good at handling all the data that we can now collect. It's not just a problem for the National Security Agency. Businesses collect lots of data, and they misuse it too, and we need to get better at this, and this will take time.

  • It's a little bit like the challenge that was faced by primitive man and fire. This is a tool, but this is a tool that, unless we're careful, will burn us.

  • Big data is going to transform how we live, how we work and how we think. It is going to help us manage our careers and lead lives of satisfaction and hope and happiness and health, but in the past, we've often looked at information technology and our eyes have only seen the T, the technology, the hardware, because that's what was physical.

  • We now need to recast our gaze at the I, the information, which is less apparent, but in some ways a lot more important.

  • Humanity can finally learn from the information that it can collect, as part of our timeless quest to understand the world and our place in it, and that's why big data is a big deal.

  • (Applause)
