• Six thousand miles of road,

• 600 miles of subway track,

• 400 miles of bike lanes,

• and half a mile of tram track,

• if you've ever been to Roosevelt Island.

• These are the numbers that make up the infrastructure of New York City.

• These are the statistics of our infrastructure.

• They're the kind of numbers you can find released in reports by city agencies.

• For example, the Department of Transportation will probably tell you

• how many miles of road they maintain.

• The MTA will boast how many miles of subway track there are.

• Most city agencies give us statistics.

• This is from a report this year

• from the Taxi and Limousine Commission,

• where we learn that there are about 13,500 taxis here in New York City.

• Pretty interesting, right?

• But did you ever think about where these numbers came from?

• Because for these numbers to exist, someone at the city agency

• had to stop and say, hmm, here's a number that somebody might want to know.

• Here's a number that our citizens want to know.

• So they go back to their raw data,

• they count, they add, they calculate,

• and then they put out reports,

• and those reports will have numbers like this.

• The problem is, how do they know all of our questions?

• We have lots of questions.

• In fact, in some ways there's literally an infinite number of questions about our city.

• The agencies can never keep up.

• So the paradigm isn't exactly working, and I think our policymakers realize that,

• because in 2012, Mayor Bloomberg signed into law what he called

• the most ambitious and comprehensive open data legislation in the country.

• In a lot of ways, he's right.

• In the last two years, the city has released 1,000 datasets

• on our open data portal,

• and it's pretty awesome.

• So you go and look at data like this,

• and instead of just counting the number of cabs,

• we can start to ask different questions.

• So I had a question.

• When's rush hour in New York City?

• It can be pretty bothersome. When is rush hour exactly?

• And I thought to myself, these cabs aren't just numbers,

• these are GPS recorders driving around in our city streets

• recording each and every ride they take.

• There's data there, and I looked at that data,

• and I made a plot of the average speed of taxis in New York City throughout the day.

• You can see that from about midnight to around 5:18 in the morning,

• speed increases, and at that point, things turn around,

• and they get slower and slower and slower until about 8:35 in the morning,

• when they end up at around 11 and a half miles per hour.

• The average taxi is going 11 and a half miles per hour on our city streets,

• and it turns out it stays that way

• for the entire day.

• So I said to myself, I guess there's no rush hour in New York City.

• There's just a rush day.
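A minimal sketch of how a plot like this could be computed. It assumes each trip record carries a pickup time, a drop-off time and a trip distance in miles. The field names (`pickup`, `dropoff`, `miles`), the 15-minute bucket size and the toy records are assumptions for illustration, not the TLC's schema or the speaker's actual method. Trips are grouped by the time of day they started, and the per-trip speeds in each group are averaged.

```python
from collections import defaultdict
from datetime import datetime


def average_speed_by_time_of_day(trips, bucket_minutes=15):
    """Return {bucket start in minutes after midnight: mean speed in mph}.

    Each trip is a dict with the hypothetical keys 'pickup' and 'dropoff'
    (datetime objects) and 'miles' (float).
    """
    speed_sums = defaultdict(float)  # sum of per-trip speeds in each bucket
    trip_counts = defaultdict(int)   # number of trips in each bucket
    for trip in trips:
        hours = (trip["dropoff"] - trip["pickup"]).total_seconds() / 3600
        if hours <= 0:
            continue  # skip zero-length or corrupt records
        minute_of_day = trip["pickup"].hour * 60 + trip["pickup"].minute
        bucket = minute_of_day - minute_of_day % bucket_minutes
        speed_sums[bucket] += trip["miles"] / hours
        trip_counts[bucket] += 1
    return {b: speed_sums[b] / trip_counts[b] for b in sorted(speed_sums)}


# Toy records for illustration only, not real TLC data.
trips = [
    {"pickup": datetime(2013, 5, 1, 5, 10), "dropoff": datetime(2013, 5, 1, 5, 20), "miles": 4.0},
    {"pickup": datetime(2013, 5, 1, 8, 35), "dropoff": datetime(2013, 5, 1, 8, 55), "miles": 3.0},
    {"pickup": datetime(2013, 5, 1, 8, 40), "dropoff": datetime(2013, 5, 1, 9, 0), "miles": 5.0},
]
for start, mph in average_speed_by_time_of_day(trips).items():
    print(f"{start // 60:02d}:{start % 60:02d}  {mph:.1f} mph")
# 05:00  24.0 mph
# 08:30  12.0 mph
```

Averaging per-trip speeds gives every ride equal weight. Dividing each bucket's total miles by its total hours would weight long rides more heavily, and that is an equally defensible choice.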

• Makes sense. And this is important for a couple of reasons.

• If you're a transportation planner, this might be pretty interesting to know.

• But if you want to get somewhere quickly,

• you now know to set your alarm for 4:45 in the morning and you're all set.

• New York, right?

• But there's a story behind this data.

• This data wasn't just available, it turns out.

• It actually came from something called a Freedom of Information Law Request,

• or a FOIL Request.

• This is a form you can find on the Taxi and Limousine Commission website.

• In order to access this data, you need to go get this form,

• fill it out, and they will notify you,

• and a guy named Chris Whong did exactly that.

• Chris went down, and they told him,

• "Just bring a brand new hard drive down to our office,

• leave it here for five hours, we'll copy the data and you take it back."

• And that's where this data came from.

• Now, Chris is the kind of guy who wants to make the data public,

• and so it ended up online for all to use, and that's where this graph came from.

• And the fact that it exists is amazing. These GPS recorders -- really cool.

• But the fact that we have citizens walking around with hard drives

• picking up data from city agencies to make it public --

• it was already kind of public, you could get to it,

• but it was "public," it wasn't public.

• And we can do better than that as a city.

• We don't need our citizens walking around with hard drives.

• Now, not every dataset is behind a FOIL Request.

• Here is a map I made with the most dangerous intersections in New York City

• based on cyclist accidents.

• So the red areas are more dangerous.

• And what it shows, first, is that the East Side of Manhattan,

• especially the lower part of Manhattan, has more cyclist accidents.

• That might make sense

• because there are more cyclists coming off the bridges there.

• But there are other hotspots worth studying.

• There's Williamsburg. There's Roosevelt Avenue in Queens.

• And this is exactly the kind of data we need for Vision Zero.

• This is exactly what we're looking for.

• But there's a story behind this data as well.

• This data didn't just appear.

• How many of you guys know this logo?

• Yeah, I see some shakes.

• Have you ever tried to copy and paste data out of a PDF

• and make sense of it?

• I see more shakes.

• More of you tried copying and pasting than knew the logo. I like that.

• So what happened is, the data that you just saw was actually in a PDF.

• In fact, hundreds and hundreds and hundreds of pages of PDF

• put out by our very own NYPD,

• and in order to access it, you would either have to copy and paste

• for hundreds and hundreds of hours,

• or you could be John Krauss.

• John Krauss was like,

• "I'm not going to copy and paste this data. I'm going to write a program."

• It's called the NYPD Crash Data Band-Aid,

• and it would go to the NYPD's website and download PDFs.

• Every day it would search; if it found a PDF, it would download it

• and then it would run some PDF-scraping program,

• and out would come the text,

• and it would go on the Internet, and then people could make maps like that.
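The talk doesn't show John Krauss's code, so what follows is only a sketch of the daily "look for new PDFs" step, with placeholder URLs. It scans an index page's HTML for links ending in `.pdf` and returns the ones not fetched on an earlier run. Downloading each file (for example with `urllib.request.urlopen`) and pulling text out of it (which needs a third-party PDF library) are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point at .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(url).path.lower().endswith(".pdf"):
            self.links.append(url)


def new_pdf_links(page_html, base_url, already_seen):
    """Return PDF links on the page that are not in `already_seen`, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(page_html)
    parser.close()
    fresh = (url for url in parser.links if url not in already_seen)
    return list(dict.fromkeys(fresh))  # drop duplicates, keep order


# Placeholder page and URLs; a daily job would fetch the real index page,
# download each returned URL, then add it to `already_seen`.
page = (
    '<a href="/reports/week1.pdf">Week 1</a>'
    '<a href="/reports/week2.pdf">Week 2</a>'
    '<a href="/about">About</a>'
)
seen = {"https://example.org/reports/week1.pdf"}
print(new_pdf_links(page, "https://example.org/stats/", seen))
# ['https://example.org/reports/week2.pdf']
```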

• And the fact that the data's here, the fact that we have access to it --

• every accident, by the way, is a row in this table.

• You can imagine how many PDFs that is.

• The fact that we have access to that is great,

• but let's not release it in PDF form,

• because then we're having our citizens write PDF scrapers.

• It's not the best use of our citizens' time,

• and we as a city can do better than that.

• Now, the good news is that the de Blasio administration

• actually released this data a few months ago,

• so now we can access it directly,

• but there's a lot of data still entombed in PDF.

• For example, our crime data is still only available in PDF.

• And not just our crime data, our own city budget.

• Our city budget is only readable right now in PDF form.

• And it's not just us that can't analyze it --

• our own legislators who vote for the budget

• also only get it in PDF.

• So our legislators cannot analyze the budget that they are voting for.

• And I think as a city we can do a little better than that as well.

• Now, there's a lot of data that's not hidden in PDFs.

• This is an example of a map I made,

• and these are the dirtiest waterways in New York City.

• Now, how do I measure dirty?

• Well, it's kind of a little weird,

• but I looked at the level of fecal coliform,

• which is a measurement of fecal matter in each of our waterways.

• The larger the circle, the dirtier the water,

• so the large circles are dirty water, the small circles are cleaner.

• What you see is inland waterways.

• This is all data that was sampled by the city over the last five years.

• And inland waterways are, in general, dirtier.

• That makes sense, right?

• And the bigger circles are dirty. And I learned a few things from this.

• Number one: Never swim in anything that ends in "creek" or "canal."

• But number two: I also found the dirtiest waterway in New York City,

• by this measure, one measure.

• It's Coney Island Creek, which is not the Coney Island you swim in, luckily.

• It's on the other side.

• But in Coney Island Creek, 94 percent of samples taken over the last five years

• have had fecal levels so high

• that it would be against state law to swim in the water.
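The "94 percent of samples" figure is a share-above-threshold calculation. This is a minimal sketch that assumes each sample is a (site, fecal coliform count) pair. The `limit` value and the sample numbers below are placeholders, not the New York State standard. The real standard may be defined differently from a single-sample cutoff, and this sketch makes no attempt to reproduce it.

```python
from collections import defaultdict


def percent_above_limit(samples, limit):
    """Return {site: percentage of that site's samples with a count above `limit`}.

    `samples` is an iterable of (site, coliform_count) pairs.
    """
    totals = defaultdict(int)  # samples taken at each site
    above = defaultdict(int)   # samples exceeding the limit at each site
    for site, count in samples:
        totals[site] += 1
        if count > limit:
            above[site] += 1
    return {site: 100 * above[site] / totals[site] for site in totals}


# Toy values only; 2000 is a placeholder, not the legal limit.
samples = [
    ("Creek A", 5000), ("Creek A", 2400), ("Creek A", 150), ("Creek A", 9000),
    ("Bay B", 20), ("Bay B", 3000),
]
print(percent_above_limit(samples, limit=2000))
# {'Creek A': 75.0, 'Bay B': 50.0}
```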

• And this is not the kind of fact that you're going to see

• boasted in a city report, right?

• It's not going to be the front page on nyc.gov.

• You're not going to see it there,

• but the fact that we can get to that data is awesome.

• But once again, it wasn't super easy,

• because this data was not on the open data portal.

• If you were to go to the open data portal,

• you'd see just a snippet of it, a year or a few months.

• It was actually on the Department of Environmental Protection's website.

• And each one of these links is an Excel sheet, and each Excel sheet is different.

• Every heading is different: you copy, paste, reorganize.

• When you do, you can make maps and that's great, but once again,

• we can do better than that as a city. We can normalize things.

• And we're getting there, because there's this website that Socrata makes

• called the Open Data Portal NYC.

• This is where 1,100 datasets live that don't suffer

• from the things I just told you about,

• and that number is growing, and that's great.

• You can download data in any format, be it CSV or PDF or Excel document,

• whatever you want.

• The problem is, once you do,

• you will find that each agency codes their addresses differently.

• So one is street name, intersection street,

• borough, address, building, building address.

• So once again, you're spending time, even when we have this portal,

• you're spending time normalizing our address fields.

• And that's not the best use of our citizens' time.

• We can do better than that as a city.
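One way to picture the normalization step is to map each agency's column names onto a shared schema and then canonicalize the address text. The agency names, column names and suffix table below are hypothetical. Real address standardization, with its abbreviations, intersections, building names and misspellings, is far messier than this sketch.

```python
# Hypothetical column names used by two agencies, mapped to one shared schema.
COLUMN_MAPS = {
    "agency_a": {"HOUSE NUMBER": "number", "STREET NAME": "street", "BOROUGH": "borough"},
    "agency_b": {"Address": "address", "Boro": "borough"},
}

# A tiny sample of street-suffix abbreviations; a real list is much longer.
SUFFIXES = {"ST": "STREET", "AVE": "AVENUE", "AV": "AVENUE", "BLVD": "BOULEVARD", "PL": "PLACE"}


def normalize_address(text):
    """Uppercase, drop periods and commas, collapse spaces, expand a trailing suffix."""
    words = text.replace(".", "").replace(",", "").upper().split()
    if words and words[-1] in SUFFIXES:
        # Only the last word is expanded, so "ST MARKS PL" keeps its leading "ST".
        words[-1] = SUFFIXES[words[-1]]
    return " ".join(words)


def normalize_record(agency, record):
    """Rename one agency's fields to the shared schema and build a single address."""
    out = {}
    for column, value in record.items():
        field = COLUMN_MAPS[agency].get(column)
        if field is not None:  # ignore columns outside the shared schema
            out[field] = " ".join(str(value).upper().split())
    if "address" not in out and "number" in out and "street" in out:
        out["address"] = f"{out.pop('number')} {out.pop('street')}"
    if "address" in out:
        out["address"] = normalize_address(out["address"])
    return out


a = normalize_record("agency_a", {"HOUSE NUMBER": "125", "STREET NAME": "w 4th st", "BOROUGH": "Manhattan"})
b = normalize_record("agency_b", {"Address": "125 W. 4th Street", "Boro": "MANHATTAN"})
print(a == b)  # True
```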

• We can standardize our addresses,

• and if we do, we can get more maps like this.

• This is a map of fire hydrants in New York City,

• but not just any fire hydrants.

• These are the 250 top-grossing fire hydrants in terms of parking tickets.

• So I learned a few things from this map, and I really like this map.

• Number one, just don't park on the Upper East Side.

• Just don't. It doesn't matter where you park, you will get a hydrant ticket.

• Number two, I found the two highest-grossing hydrants in all of New York City,

• and they're on the Lower East Side,

• and they were bringing in over $55,000 a year in parking tickets.
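Ranking hydrants by ticket revenue comes down to three steps: filter for hydrant violations, sum the fines per location, and keep the largest totals. This minimal sketch assumes each ticket has already been matched to a hydrant location, which is the hard part and is not shown. It uses the hypothetical keys `violation`, `location` and `fine`. The violation label and the toy tickets are made up.

```python
from collections import Counter


def top_grossing_locations(tickets, violation="FIRE HYDRANT", n=250):
    """Return the n (location, total fines) pairs with the highest totals for one violation type."""
    revenue = Counter()
    for ticket in tickets:
        if ticket["violation"] == violation:
            revenue[ticket["location"]] += ticket["fine"]
    return revenue.most_common(n)


# Toy tickets with made-up locations and fine amounts.
tickets = [
    {"violation": "FIRE HYDRANT", "location": "hydrant-A", "fine": 100},
    {"violation": "FIRE HYDRANT", "location": "hydrant-A", "fine": 100},
    {"violation": "FIRE HYDRANT", "location": "hydrant-B", "fine": 100},
    {"violation": "DOUBLE PARKING", "location": "hydrant-B", "fine": 100},
]
print(top_grossing_locations(tickets, n=2))
# [('hydrant-A', 200), ('hydrant-B', 100)]
```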

• And that seemed a little strange to me when I noticed it,

• so I did a little digging and it turns out what you had is a hydrant

• and then something called a curb extension,

• which is like a seven-foot space to walk on,

• and then a parking spot.

• And so these cars came along, and the hydrant --

• "It's all the way over there, I'm fine,"

• and there was actually a parking spot painted there beautifully for them.

• They would park there, and the NYPD disagreed with this designation

• and would ticket them.

• And it wasn't just me who found a parking ticket.

• This is the Google Street View car driving by

• finding the same parking ticket.

• So I wrote about this on my blog, on I Quant NY, and the DOT responded,

• and they said,

• "While the DOT has not received any complaints about this location,

• we will review the roadway markings and make any appropriate alterations."

• And I thought to myself, typical government response,

• all right, moved on with my life.

• But then, a few weeks later, something incredible happened.

• They repainted the spot,

• and for a second I thought I saw the future of open data,

• because think about what happened here.

• For five years, this spot was being ticketed, and it was confusing,

• and then a citizen found something, they told the city, and within a few weeks

• the problem was fixed.

• It's amazing. And a lot of people see open data as being a watchdog.

• It's not; it's about being a partner.

• We can empower our citizens to be better partners for government,

• and it's not that hard.

• All we need are a few changes.

• If you're FOILing data,

• if you're seeing your data being FOILed over and over again,

• let's release it to the public; that's a sign that it should be made public.

• And if you're a government agency releasing a PDF,

• let's pass legislation that requires you to post it with the underlying data,

• because that data is coming from somewhere.