Translator: 易帆 余; Reviewer: Wilde Luo

I'm going to be talking about statistics today. If that makes you immediately feel a little bit wary, that's OK; that doesn't make you some kind of crazy conspiracy theorist, it makes you skeptical. And when it comes to numbers, especially now, you should be skeptical. But you should also be able to tell which numbers are reliable and which ones aren't. So today I want to try to give you some tools to be able to do that.

But before I do, I just want to clarify which numbers I'm talking about here. I'm not talking about claims like, "9 out of 10 women recommend this anti-aging cream." I think a lot of us always roll our eyes at numbers like that. What's different now is people are questioning statistics like, "The US unemployment rate is five percent." What makes this claim different is it doesn't come from a private company, it comes from the government.

About 4 out of 10 Americans distrust the economic data that gets reported by government. Among supporters of President Trump it's even higher; it's about 7 out of 10.

I don't need to tell anyone here that there are a lot of dividing lines in our society right now, and a lot of them start to make sense once you understand people's relationships with these government numbers.

On the one hand, there are those who say these statistics are crucial, that we need them to make sense of society as a whole in order to move beyond emotional anecdotes and measure progress in an objective way.

And then there are the others, who say that these statistics are elitist, maybe even rigged; they don't make sense and they don't really reflect what's happening in people's everyday lives.

It kind of feels like that second group is winning the argument right now. We're living in a world of alternative facts, where people don't see statistics as this kind of common ground, this starting point for debate. This is a problem.

There are actually moves in the US right now to get rid of some government statistics altogether. Right now there's a bill in Congress about measuring racial inequality. The draft law says that government money should not be used to collect data on racial segregation. This is a total disaster. If we don't have this data, how can we observe discrimination, let alone fix it?

In other words, how can a government create fair policies if they can't measure current levels of unfairness? This isn't just about discrimination; it's everything. Think about it. How can we legislate on health care if we don't have good data on health or poverty? How can we have public debate about immigration if we can't at least agree on how many people are entering and leaving the country?

Statistics come from the state; that's where they got their name. The point was to better measure the population in order to better serve it. So we need these government numbers, but we also have to move beyond either blindly accepting or blindly rejecting them. We need to learn the skills to be able to spot bad statistics.

I started to learn some of these when I was working in a statistical department that's part of the United Nations. Our job was to find out how many Iraqis had been forced from their homes as a result of the war, and what they needed. It was really important work, but it was also incredibly difficult. Every single day, we were making decisions that affected the accuracy of our numbers: decisions like which parts of the country we should go to, who we should speak to, which questions we should ask.

And I started to feel really disillusioned with our work, because we thought we were doing a really good job, but the one group of people who could really tell us were the Iraqis, and they rarely got the chance to find our analysis, let alone question it. So I started to feel really determined that the one way to make numbers more accurate is to have as many people as possible be able to question them.

So I became a data journalist. My job is finding these data sets and sharing them with the public. Anyone can do this; you don't have to be a geek or a nerd. You can ignore those words; they're used by people trying to say they're smart while pretending they're humble. Absolutely anyone can do this.

I want to give you guys three questions that will help you be able to spot some bad statistics.

So, question number one is: Can you see uncertainty?

One of the things that's really changed people's relationship with numbers, and even their trust in the media, has been the use of political polls. I personally have a lot of issues with political polls because I think the role of journalists is actually to report the facts and not attempt to predict them, especially when those predictions can actually damage democracy by signaling to people: don't bother to vote for that guy, he doesn't have a chance. Let's set that aside for now and talk about the accuracy of this endeavor.

Based on national elections in the UK, Italy, Israel and, of course, the most recent US presidential election, using polls to predict electoral outcomes is about as accurate as using the moon to predict hospital admissions. No, seriously, I used actual data from an academic study to draw this.

There are a lot of reasons why polling has become so inaccurate. Our societies have become really diverse, which makes it difficult for pollsters to get a really nice representative sample of the population for their polls. People are really reluctant to answer their phones to pollsters, and also, shockingly enough, people might lie.

But you wouldn't necessarily know that to look at the media. For one thing, the probability of a Hillary Clinton win was communicated with decimal places. We don't use decimal places to describe the temperature. How on earth can predicting the behavior of 230 million voters in this country be that precise?

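This minimal Python sketch shows why decimal places overstate what a poll can know. It computes the textbook 95% margin of error for a single proportion. It assumes a simple random sample, which real polls often fail to achieve for the reasons above, so treat the result as a best case. The 1,000-person poll and the 50% share are hypothetical round numbers, not figures from any specific poll.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample (a best case for real polls);
    z = 1.96 is the normal critical value for 95% confidence.
    Returns the half-width of the interval as a proportion.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, a candidate at 50%.
moe = margin_of_error(0.5, 1000)
print(f"+/-{moe * 100:.1f} percentage points")  # prints "+/-3.1 percentage points"
```

Even in that best case, one poll of this size is uncertain by about three percentage points either way. Against that, a figure quoted to a tenth of a point looks far more precise than its source.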
And then there were those sleek charts. See, a lot of data visualizations will overstate certainty, and it works: these charts can numb our brains to criticism. When you hear a statistic, you might feel skeptical. As soon as it's buried in a chart, it feels like some kind of objective science, and it's not.

So I was trying to find ways to better communicate this to people, to show people the uncertainty in our numbers. What I did was I started taking real data sets and turning them into hand-drawn visualizations, so that people can see how imprecise the data is; so people can see that a human did this, a human found the data and visualized it.

For example, instead of finding out the probability of getting the flu in any given month, you can see the rough distribution of flu season. This is...

(Laughter)

...a bad shot to show in February. But it's also more responsible data visualization, because if you were to show the exact probabilities, maybe that would encourage people to get their flu jabs at the wrong time.

The point of these shaky lines is so that people remember these imprecisions, but also so they don't necessarily walk away with a specific number; instead, they can remember important facts.

Facts like injustice and inequality leave a huge mark on our lives. Facts like Black Americans and Native Americans have shorter life expectancies than those of other races, and that isn't changing anytime soon. Facts like prisoners in the US can be kept in solitary confinement cells that are smaller than the size of an average parking space.

The point of these visualizations is also to remind people of some really important statistical concepts, concepts like averages. So let's say you hear a claim like, "The average swimming pool in the US contains 6.23 fecal accidents." That doesn't mean every single swimming pool in the country contains exactly 6.23 turds. So in order to show that, I went back to the original data, which comes from the CDC, who surveyed 47 swimming facilities. And I just spent one evening redistributing poop. So you can kind of see how misleading averages can be.

(Laughter)

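This small sketch shows how one average can mislead. The counts below are invented for illustration and are not the CDC's figures. They show how a couple of extreme values can pull the mean well above what a typical pool looks like, while the median stays low.

```python
from statistics import mean, median

# Hypothetical incident counts for ten pools (NOT the CDC survey data).
counts = [0, 0, 0, 1, 1, 2, 3, 5, 20, 30]

avg = mean(counts)    # pulled upward by the two extreme pools
mid = median(counts)  # the middle of the sorted counts, i.e. a "typical" pool
above = sum(1 for c in counts if c > avg)

print(f"mean={avg:.1f}, median={mid}, pools above the mean: {above} of {len(counts)}")
# prints "mean=6.2, median=1.5, pools above the mean: 2 of 10"
```

Only two of the ten hypothetical pools are anywhere near the "average". That is why a single mean, reported alone, says little about any particular pool.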
OK, so the second question that you guys should be asking yourselves to spot bad numbers is: Can I see myself in the data? This question is also about averages in a way, because part of the reason why people are so frustrated with these national statistics is they don't really tell the story of who's winning and who's losing from national policy. It's easy to understand why people are frustrated with global averages when they don't match up with their personal experiences.

I wanted to show people the way data relates to their everyday lives. I started this advice column called "Dear Mona," where people would write to me with questions and concerns and I'd try to answer them with data. People asked me anything: questions like, "Is it normal to sleep in a separate bed to my wife?" "Do people regret their tattoos?" "What does it mean to die of natural causes?" All of these questions are great, because they make you think about ways to find and communicate these numbers. If someone asks you, "How much pee is a lot of pee?", which is a question that I got asked, you really want to make sure that the visualization makes sense to as many people as possible.

These numbers aren't unavailable. Sometimes they're just buried in the appendix of an academic study. And they're certainly not inscrutable; if you really wanted to test these numbers on urination volume, you could grab a bottle and try it for yourself.

(Laughter)

The point of this isn't necessarily that every single data set has to relate specifically to you. I'm interested in how many women were issued fines in France for wearing the face veil, or the niqab, even if I don't live in France or wear the face veil. The point of asking where you fit in is to get as much context as possible.

So it's about zooming out from one data point, like "the unemployment rate is five percent," and seeing how it changes over time, or seeing how it changes by educational status (this is why your parents always wanted you to go to college), or seeing how it varies by gender. Nowadays, the male unemployment rate is higher than the female unemployment rate. Up until the early '80s, it was the other way around. This is a story of one of the biggest changes that's happened in American society, and it's all there in that chart, once you look beyond the averages.

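This sketch shows why one headline rate can hide this kind of story. An overall rate is a weighted average of group rates, so very different group patterns can produce the same total. The labor-force sizes and rates below are hypothetical round numbers, not Bureau of Labor Statistics figures.

```python
def overall_rate(groups: dict[str, tuple[int, float]]) -> float:
    """Weighted average of group unemployment rates.

    `groups` maps a label to (labor_force_size, unemployment_rate);
    each group's rate is weighted by its share of the total labor force.
    """
    total = sum(size for size, _ in groups.values())
    return sum(size * rate for size, rate in groups.values()) / total

# Two hypothetical years: same headline rate, reversed gender gap.
year_a = {"men": (50_000, 0.04), "women": (50_000, 0.06)}
year_b = {"men": (50_000, 0.06), "women": (50_000, 0.04)}

for name, groups in [("year A", year_a), ("year B", year_b)]:
    by_group = {label: f"{rate:.0%}" for label, (_, rate) in groups.items()}
    print(name, f"overall={overall_rate(groups):.1%}", by_group)
```

Both hypothetical years report a 5.0% overall rate, but the gap between men and women has flipped. That kind of change only becomes visible once the average is broken down.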
The axes are everything; once you change the scale, you can change the story.

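One rough way to put a number on "change the scale, change the story" is to compare how much taller one bar looks than another. The comparison is made twice: once with the y-axis starting at zero, and once starting just below the smaller value. The values 50 and 52 are hypothetical.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """How many times taller bar b looks than bar a when the y-axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)

a, b = 50, 52  # hypothetical values: a 4% real difference
print(f"axis from 0:  {apparent_ratio(a, b):.2f}x")               # prints "axis from 0:  1.04x"
print(f"axis from 49: {apparent_ratio(a, b, baseline=49):.2f}x")  # prints "axis from 49: 3.00x"
```

The data are identical in both cases. Only the starting point of the axis changes, and it turns a 4% difference into a bar that looks three times taller.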
OK, so the third and final question that I want you guys to think about when you're looking at statistics is: How was the data collected? So far, I've only talked about the way data is communicated, but the way it's collected matters just as much. I know this is tough, because methodologies can be opaque and actually kind of boring, but there are some simple steps you can take to check this. I'll use one last example here.

One poll found that 41 percent of Muslims in this country support jihad, which is obviously pretty scary, and it was reported everywhere in 2015. When I want to check a number like that, I'll start off by finding the original questionnaire. It turns out that journalists who reported on that statistic ignored a question lower down on the survey that asked respondents how they defined "jihad." And most of them defined it as, "Muslims' personal, peaceful struggle to be more religious." Only 16 percent defined it as, "violent holy war against unbelievers."

This is the really important point: based on those numbers, it's totally possible that no one in the survey who defined it as violent holy war also said they support it. Those two groups might not overlap at all.

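The reasoning above is an instance of the Fréchet bounds. If all you know is the size of two groups, the share of respondents who fall into both can lie anywhere between max(0, p + q - 1) and min(p, q). The sketch below plugs in the two percentages quoted in the talk. It assumes both are shares of the same set of respondents, and it uses no other information about the survey.

```python
def overlap_bounds(p: float, q: float) -> tuple[float, float]:
    """Fréchet bounds on P(A and B) when only P(A) = p and P(B) = q are known."""
    lower = max(0.0, p + q - 1.0)
    upper = min(p, q)
    return lower, upper

support = 0.41             # said they support "jihad" (figure quoted in the talk)
violent_definition = 0.16  # defined it as violent holy war (figure quoted in the talk)

lo, hi = overlap_bounds(support, violent_definition)
print(f"share who both support it and define it as violent: {lo:.0%} to {hi:.0%}")
# prints "share who both support it and define it as violent: 0% to 16%"
```

The lower bound of 0% is exactly the point made above: the published percentages alone cannot rule out that the two groups never overlap.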
It's also worth asking how the survey was carried out. This was something called an opt-in poll, which means anyone could have found it on the internet and completed it. There's no way of knowing if those people even identified as Muslim.

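The toy model below, which is entirely hypothetical, illustrates one way an opt-in poll can drift away from the population it claims to describe. Suppose people who hold some view are r times as likely to find and complete the survey as everyone else. The share of that view among respondents is then p*r / (p*r + (1 - p)), where p is the true share in the population. Neither r nor the 10% true share comes from the talk.

```python
def opt_in_share(p: float, r: float) -> float:
    """Share of respondents holding a view under a toy self-selection model.

    p: true share of the view in the population.
    r: how many times more likely holders are to opt in than non-holders
       (a hypothetical parameter; real response behavior is unknown).
    """
    return p * r / (p * r + (1 - p))

true_share = 0.10  # hypothetical
for r in (1, 2, 4):
    print(f"r={r}: respondents show {opt_in_share(true_share, r):.0%}")
# prints "r=1: respondents show 10%", then 18%, then 31%
```

With r = 1 (no self-selection) the poll recovers the true 10%. A fourfold eagerness among one group is enough to triple it. The model ignores the separate problem raised above, that an open web poll cannot verify who its respondents are.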
And finally, there were 600 respondents in that poll. There are roughly three million Muslims in this country, according to Pew Research Center. That means the poll spoke to roughly one in every 5,000 Muslims in this country.

This is one of the reasons why government statistics are often better than private statistics. A poll might speak to a couple hundred people, maybe a thousand, or if you're L'Oreal, trying to sell skin care products in 2005, then you spoke to 48 women to claim that they work.

(Laughter)

Private companies don't have a huge interest in getting the numbers right; they just need the right numbers. Government statisticians aren't like that. In theory, at least, they're totally impartial, not least because most of them do their jobs regardless of who's in power. They're civil servants. And to do their jobs properly, they don't just speak to a couple hundred people. Those unemployment numbers I keep on referencing come from the Bureau of Labor Statistics, and to make their estimates, they speak to over 140,000 businesses in this country.

I get it; it's frustrating. If you want to test a statistic that comes from a private company, you can buy the face cream for you and a bunch of friends and test it out, and if it doesn't work, you can say the numbers were wrong. But how do you question government statistics? You just keep checking everything. Find out how they collected the numbers. Find out if you're seeing everything on the chart you need to see. But don't give up on the numbers altogether, because if you do, we'll be making public policy decisions in the dark, using nothing but private interests to guide us.

Thank you.

(Applause)