Name: 【科技】機器如何學習？為什麼網站都知道你想看什麼？(How Machines Learn)
Uploaded: 2019-05-26T21:01:20.000Z
Duration: 8 min 55 s

On the internet, the algorithms are all around you.

You are watching this video because an algorithm brought it to you (among others) to click, which you did, and the algorithm took note.

你正在看這支影片是因為演算法讓它出現在你的頁面，當你點擊它，演算法就會記錄。

When you open the TweetBook, the algorithm decides what you see.

當你打開推特或臉書，演算法會決定你看到的。

When you search through your photos, the algorithm does the finding.

當你搜索圖片，演算法給你結果。

When you buy something, the algorithm sets the price, and the algorithm is at your bank watching transactions for fraud.

當你買東西，演算法設定價格，演算法也在你的銀行裡監看交易，抓出詐欺。

The stock market is full of algorithms trading with algorithms.

股市充滿了演算法，與其他各式各樣的演算法互相交易。

Given this, you might want to know how these little algorithmic bots shaping your world work, especially when they don't.

有鑑於此，你可能想知道這些塑造你的生活的小機器人怎麼運作的，尤其是當它們異常時。

In Ye Olden Days, humans built algorithmic bots by giving them instructions the humans could explain.

在過去，人們打造演算法機器人的方式，是給它們下達人類自己說得清楚的指令。

"If the condition is met, then perform the action."

「若滿足條件，則執行動作。」
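
To make the "if this, then that" style concrete, here is a minimal sketch of a hand-written rule bot. The function name `check_transaction`, the thresholds, and the fields are all hypothetical, invented only for illustration. The point is that every rule is one a human can write down and explain.

```python
# Hypothetical hand-written rules in the "if this, then do that" style.
# The thresholds and field names are invented for illustration only.

def check_transaction(amount: float, country: str, home_country: str) -> str:
    """Decide what to do with one transaction using rules a human can explain."""
    if amount > 10_000:
        return "hold for review"   # if the amount is huge, then hold it
    if country != home_country and amount > 1_000:
        return "hold for review"   # if it's a big purchase abroad, then hold it
    return "approve"               # otherwise, let it through


print(check_transaction(50.0, "TW", "TW"))      # approve
print(check_transaction(2_500.0, "US", "TW"))   # hold for review
```

Rules like these are easy to read, but nobody can write enough of them by hand to cover a gazillion transactions a second.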

But many problems are just too big and hard for a human to write simple instructions for.

但人無法用簡單的程式打造能處理巨大而複雜問題的演算法。

There are a gazillion financial transactions a second; which ones are fraudulent?

每秒鐘都有數不清的金融交易，哪些是詐欺？

Say a video site has a zillion videos.

假設有個影片網站上有一千兆億支影片。

Which eight should the user see as recommendations? Which shouldn't be allowed on the site at all?

哪些應該放在建議清單裡，哪些不應該被放到網站上？

For this airline seat, what is the maximum price this user will pay right now?

這個機位，這位用戶現在願意付的最高價格是多少？

Algorithmic bots give answers to these questions.

演算法機器人能回答這些問題。

Not perfect answers, but much better than a human could do.

不盡完美，但遠比人類做得到的好太多了。

But how these bots work exactly, more and more, no one knows.

但是，演算法機器人實際上如何運作，漸漸地，沒有人知道。

Not even the humans who built them, or "built them", as we will see...

甚至連打造它們的人都不知道，或者該說是「打造」它們的人，我們稍後會看到。

Now companies that use these bots don't want to talk about how they work because the bots are valuable employees.

使用演算法的公司不想談論它們怎麼運作的，因為演算法是有價值的員工。

And how their brains are built is a fiercely-guarded trade secret.

它們的大腦如何打造，是嚴加保護的商業機密。

Right now the cutting edge is most likely very "I hope you like linear algebra."

目前最尖端的技術，多半非常「希望你喜歡線性代數」。

But what the current hotness is on any particular site, and how its bots work, is a bit "I dunno," and always will be.

但任何一個網站目前用的是哪種熱門技術、那些機器人又是怎麼運作的，都有點「我也不知道」，而且永遠都會是這樣。

So let's talk about one of the more quaint but understandable ways bots CAN be "built" without understanding how their brains work.

那麼讓我們來談談一種比較老派但容易理解的方法：不必了解機器人的大腦如何運作，也能「打造」出演算法機器人。

Say you want a bot that can recognize what is in a picture.

假設你想要一個可以識別照片有什麼的演算法。

It's easy for humans (even little humans).

對人類 (甚至是寶寶) 都很容易。

But it's impossible to just tell a bot in bot language how to do it, because really we just know that's a bee and that's a three.

但若只是用程式語言告訴演算法怎麼做是不可能的，因為我們也只是知道「這是蜜蜂，那是 3」。

We can say in words what makes them different, but bots don't understand words.

我們可以用語言說出兩者的差別，但演算法不懂人類語言。

And it's the wiring in our brains that makes it happen anyway.

況且真正讓我們做到這件事的，是大腦裡的神經迴路。

While an individual neuron may be understood, and clusters of neurons' general purpose vaguely grasped, the whole is beyond.

一個神經元還能理解，但無數個神經迴路的作用就只能略懂幾分，整個大腦更是不用說了。

So to get a bot that can do this sorting, you don't build it yourself.

所以要得到能做這種分類的演算法，你不是自己動手打造它。

You build a bot that builds bots, and a bot that teaches bots.

你只要造個能打造演算法的演算法機器人，和訓練演算法的演算法機器人。

These bots' brains are simpler, something a smart human programmer can make.

這些演算法機器人的腦袋比較簡單，聰明的程式設計師做得出來。

The builder bot builds bots, though it's not very good at it.

演算法打造者負責打造演算法，雖然它並不太擅長這件事。

At first it connects the wires and modules in the bot brains almost at random.

一開始是用隨機的方式組合線路與模組。

This leads to some very... "special" student bots sent to teacher bot to teach.

這造出了一些非常……「特別」的演算法學生，它們被送去讓演算法機器人老師教。
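
A toy sketch of builder bot's first generation. It assumes, purely for illustration, that each photo has already been reduced to a few numbers and that a student bot's "brain" is one weight per number. `NUM_FEATURES`, `build_random_student`, and `answer` are hypothetical names. Real image-recognition bots are vastly larger.

```python
import random

# Toy assumption: each photo has already been boiled down to NUM_FEATURES numbers,
# and a student bot's "brain" is one weight per number. Real bots are far bigger.
NUM_FEATURES = 4


def build_random_student(rng: random.Random) -> list[float]:
    """Builder bot's first attempt: connect the wiring almost at random."""
    return [rng.uniform(-1.0, 1.0) for _ in range(NUM_FEATURES)]


def answer(student: list[float], photo: list[float]) -> str:
    """A student's answer: add up the weighted features and compare with zero."""
    total = sum(w * x for w, x in zip(student, photo))
    return "bee" if total > 0 else "three"


rng = random.Random(42)  # fixed seed so the run is repeatable
warehouse = [build_random_student(rng) for _ in range(1_000)]
print(answer(warehouse[0], [0.9, 0.8, -0.5, 0.1]))  # "bee" or "three", by luck
```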

Of course, teacher bot can't tell a bee from a three either.

當然，教師演算法也不能區別蜜蜂和 3。

If the human could build teacher bot to do that, well, then, problem solved.

如果人類可以讓演算法機器人老師做到這點，那麼，問題就解決了。

Instead the human gives teacher bot a bunch of "bee" photos, and "three" photos, and an answer key to which is what.

相反地，人類給演算法機器人老師一大堆「蜜蜂」和「3」的照片，以及標明哪張是哪個的標準答案。

Teacher bot can't teach, but teacher bot can TEST.

演算法機器人老師不能教，但它可以考試。

The adorkable student bots stick out their tongues, try very hard, but they are bad at what they do.

起初那些可笑的機器人學生努力嘗試，但他們做得不好。

And it's not their fault, really, they were built that way.

這不是它們的錯，真的，它們就是被這麼打造出來的。

Grades in hand, the student bots take a march of shame back to builder bot.

拿著糟糕的成績單，這些演算法學生踏上恥辱之路，走回演算法打造者那裡。

Those that did best are put to one side, the others recycled.

考得最好的演算法放在一邊，其他的回收再利用。

Builder bot still isn't good at building bots, but now it takes those left and makes copies with changes in new combinations.

演算法打造者仍然不擅長打造，但現在它會拿留下來的演算法製作副本，並在新的組合中加入變化。

Teacher bot teaches - er, tests again, and builder bot builds again.

演算法機器人老師再教……呃，再考一次，演算法打造者再打造一次。
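
The keep-the-best, recycle-the-rest, copy-with-changes cycle described above is essentially what is usually called a genetic (or evolutionary) algorithm. Below is a minimal sketch of one round under toy assumptions. Students are lists of weights. `keep_fraction` and `mutation_size` are hypothetical knobs. "Changes in new combinations" is modeled as mixing two survivors' weights and then adding small random noise.

```python
import random


def next_generation(students, scores, rng, keep_fraction=0.1, mutation_size=0.1):
    """One pass of builder bot: keep the top scorers, recycle the rest,
    then refill the warehouse with changed copies of the survivors."""
    ranked = sorted(zip(scores, students), key=lambda pair: pair[0], reverse=True)
    n_keep = max(2, int(len(students) * keep_fraction))
    survivors = [student for _, student in ranked[:n_keep]]

    new_students = [list(s) for s in survivors]  # the best go back unchanged
    while len(new_students) < len(students):
        parent_a, parent_b = rng.sample(survivors, 2)
        # New combination: each weight comes from one parent or the other.
        child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
        # Random change: nudge every weight a little.
        child = [w + rng.gauss(0.0, mutation_size) for w in child]
        new_students.append(child)
    return new_students


rng = random.Random(0)
students = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5], [-0.3, 0.4]]
scores = [0.40, 0.95, 0.70, 0.10]
print(next_generation(students, scores, rng, keep_fraction=0.5))
```

Copying the survivors unchanged into the next round is a design choice (often called elitism). It means the best score on a fixed test can never go down from one round to the next.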

Now a builder that builds at random,  and a teacher that doesn't teach, just tests, and students who can't learn, they just are what they are, in theory shouldn't work, but in practice, it does.

有個負責隨機打造的演算法、一個不會教只會考試的教師，還有不會學習的學生。它們只是這個樣子，照理行不通，但在實際運用中，這種方法真的有用。

Partly because in every iteration, builder bot's slaughterhouse keeps the best and discards the rest.

一部分原因在於，在每一次迭代，打造者演算法保留最好的，回收其餘的。

And partly because teacher bot isn't overseeing an old-timey, one-room schoolhouse with a dozen students, but an infinite warehouse with thousands of students.

另一部分是由於，演算法機器人老師並不是在一個老舊教室裡監督一打學生考試，而是在一個擁有數千名學生的無限教室。

The test isn't ten questions, but a million questions.

考試不只有十個問題，而是一百萬個問題。

And how many times does the test, build, test loop repeat?

測試，回收重組，要循環重複多少次？

At first students that survive are just lucky, but by combining enough lucky bots, and keeping only what works, and randomly messing around with new copies of that, eventually a student bot emerges that isn't lucky, that can perhaps barely tell bees from threes.

起初倖存的演算法學生只是幸運，但透過組合足夠多幸運的演算法、只保留有用的部分，並在新的副本上隨機修改，最終會出現一個不是只靠運氣的演算法學生，或許只是勉強能區分蜜蜂和 3。

As this bot is copied and changed, slowly the average test score rises.

每當這一個演算法學生被複製和重組，平均測試成績就會慢慢上升。

And thus the grade needed to survive the next round gets higher and higher.

因此下一代生存所需的分數越來越高。

Keep this up, and eventually a student bot will emerge from the infinite warehouse (slaughterhouse) that can tell a bee from a three pretty well, even in a photo it's never seen before.

持續這麼做，最終會從無限教室 (其實是屠宰場) 中出現一個演算法學生，即使是從未見過的照片，它也能相當準確地區別蜜蜂和 3。

But how the student bot does this, neither the teacher bot nor the builder bot, nor the human overseer, can understand.

但是，該演算法如何做到這一點，演算法機器人老師或演算法打造者都不知道，也不是人類所能理解的。

After keeping so many useful random changes, the wiring in its head is incredibly complicated.

保持這麼多有用的隨機變化後，其構造變得非常複雜。

And while an individual line of code may be understood, and clusters of code's general purpose vaguely grasped, the whole is beyond, nonetheless, it works.

一行程式碼還能理解，但無數行程式碼的作用就只能略懂幾分，一整組演算法就不用說了，儘管如此，它就是有用。

But this is frustrating, especially as the student bot is very good at exactly only the kinds of questions it's been taught.

但這令人沮喪，尤其是演算法學生只擅長它被教過的那類問題。

It's great with photos but useless with videos, baffled if the photos are upside down, and confident that things which are obviously not bees are bees.

它擅長處理照片，但影片或倒過來的照片就沒轍了，或者顯然不是蜜蜂的東西，卻會歸類到蜜蜂。

Since teacher bot can't teach, all the human overseer can do is give it more questions, to make the test even longer, to include the kinds of questions the best bots get wrong.

因為演算法機器人老師不會教，所有的人類能做的就是給予更多的問題，使測試更長更完善，而且包含連最好的演算法學生都會答錯的問題。

It's a reason why companies are obsessed with collecting data.

這就是為什麼大公司會著迷於蒐集數據。

More data equals longer tests equals better bots.

更多數據等於更長的測試等於更好的演算法機器人。
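
This is a sketch of "making the test longer". It assumes the human overseer has a pool of freshly labeled photos, for example from the "Are you human?" checks. This version makes one simplifying choice: it appends only the new questions the current best bot gets wrong. `extend_test` and the data are hypothetical.

```python
def answer(student, photo):
    """Same toy student as before: weighted sum of features, thresholded at zero."""
    return "bee" if sum(w * x for w, x in zip(student, photo)) > 0 else "three"


def extend_test(answer_key, best_student, newly_labeled):
    """Grow the test with fresh human-labeled questions, keeping the ones
    the current best bot gets wrong; returns (longer key, number added)."""
    missed = [(photo, label) for photo, label in newly_labeled
              if answer(best_student, photo) != label]
    return answer_key + missed, len(missed)


answer_key = [([0.9, 0.7], "bee"), ([-0.6, -0.4], "three")]
best_student = [1.0, 1.0]
# Hypothetical new labels, e.g. collected from "Are you human?" checks.
newly_labeled = [([0.5, 0.5], "bee"), ([0.2, -0.9], "bee"), ([-0.8, 0.9], "three")]

longer_key, n_added = extend_test(answer_key, best_student, newly_labeled)
print(len(longer_key), n_added)  # 4 2
```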

So when you get the "Are you human?" test on a website, you are not only proving that you are human (hopefully), but you are also helping to build the test to make bots that can read, or count, or tell lakes from mountains, or horses from humans.

所以當你在網站上遇到「你是人類嗎？」的認證測試，你不僅在證明你是人類 (但願如此)，也在幫忙建立測試，讓演算法學會閱讀、計數，或者分辨湖泊與高山、馬與人類。

Seeing lots of questions about driving lately?

最近常看到跟開車有關的題目嗎？

Hmm...! What could that be building a test for?

嗯 ... 可能是針對什麼製作的測試？

Now figuring out what's in a photo, or on a sign, or filtering videos, requires humans to make correct enough tests.

搞清楚圖片或路標裡有什麼，或過濾影片內容，這些要求人類做出足夠正確的測試。

But there is another kind of test that makes itself.

For example, say the entirely hypothetical NetMeTube wanted users to keep watching as long as possible.

舉例來說，假設有個影片網站希望用戶觀看影片愈長時間愈好。

Well, how long a user stays on the site is easy to measure.

測量用戶在網站上逗留多久時間是非常容易的。

So teacher bot gives each student bot a bunch of NetMeTube users to oversee. The student bots watch what their users watch, look at their files, and do their best to pick the videos that keep the users on the site.

所以，演算法機器人老師要每個演算法學生各自觀察一些影片網站的用戶，演算法學生觀察他們的用戶，看他們的資料，並盡力選擇影片，讓用戶在網站上待更久。

The longer the average, the higher their test score.

平均時間越長，測試成績越高。
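
The watch-time test "makes itself" because the score comes straight from user behavior. The sketch below assumes a made-up model in which `minutes_on_site` is computed from a user's interest in a category. A real site would measure the time rather than calculate it. The two student bots, `always_cats` and `favorite_category`, are hypothetical.

```python
# Hypothetical model: minutes a user stays after a recommendation in a given
# category. A real site would measure this from actual behavior, not compute it.
def minutes_on_site(user, category):
    return user["interest"].get(category, 0.0) * 10


def engagement_score(student, users):
    """Self-made test: the average time the student's users stay on the site."""
    return sum(minutes_on_site(u, student(u)) for u in users) / len(users)


def always_cats(user):
    """A lazy student bot: recommends cat videos to everyone."""
    return "cats"


def favorite_category(user):
    """A better student bot: recommends whatever the user seems to like most."""
    return max(user["interest"], key=user["interest"].get)


users = [
    {"interest": {"cats": 0.9, "cooking": 0.2}},
    {"interest": {"cats": 0.1, "cooking": 0.8}},
]
print(engagement_score(always_cats, users))         # 5.0
print(engagement_score(favorite_category, users))   # 8.5
```

No human had to write an answer key here. The "correct answer" is simply whatever kept users around longer.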

A million cycles later, there's a student bot who's pretty good at keeping the users watching, at least compared to what a human could build.

一百萬週期之後，就可能有個演算法，非常擅長於挑選影片讓用戶待很久，至少與人類直接打造的演算法相比。

But when people ask: "How does the NetMeTube algorithm select videos?"

但是當人們問：「該網站的演算法如何選擇影片？」

Once again, there isn't a great answer other than pointing to the bot, and the user data it had access to.

除了演算法機器人本身和它擁有的用戶資料之外，就沒有更好的答案了。

And most vitally, how the human overseers direct teacher bot to score the test.

以及最重要的：人類監督者如何指示演算法機器人老師為考試評分。

That's what the bot is trying to be good at to survive.

那才是演算法為了生存而努力要做好的事。

But what the bot is thinking, or how it thinks it, is not really knowable.

但是演算法在想什麼，或者它是怎麼思考，實在是不得而知。

All that's knowable is this student bot gets to be the algorithm, because it's point one percent better than the previous bot at the test the humans designed.

我們只知道，這個演算法學生之所以成為真正的演算法，是因為在人類設計的測試中，它比之前的演算法好了 0.1%。

So everywhere on the internet, behind the scenes, there are tests to increase user interaction, or set prices just right to maximize revenue, or pick the posts from all your friends you'll like the most, or articles people will share the most, or whatever.

總的來說，在網路的幕後到處都是測試：用來增加用戶互動、把價格訂得剛好以達到最大收益、從你所有好友的 po 文中挑出你最會喜歡的、挑出大家最會分享的文章，諸如此類。

If it's testable, it's teachable. Well, "teachable," and a student bot will graduate from the warehouse to be the algorithm of its domain.

只要能測試，就能教。嗯，是「教」，然後會有一個演算法學生從無限教室畢業，成為其領域的演算法。

We're used to the idea that the tools we use, even if we don't understand them, someone does.

我們已經習慣這樣的想法：我們使用的工具，即使我們自己不了解，總有人了解。

But with our machines that learn, we are increasingly in a position where we use tools, or are used by tools, that no one, not even their creators, understand.

但有了會學習的機器，我們愈來愈常處於這樣的境地：我們使用的工具 (或使用我們的工具)，沒有人理解，連它們的創造者也不理解。

We can only hope to guide them with the tests we make.

我們只能希望透過我們設計的測試來引導它們。

And we need to get comfortable with that, as our algorithmic bot buddies are all around, and not going anywhere.

我們需要適應這點，因為我們的演算法小夥伴到處都是，而且怎樣也不會離去。

This is where I need to ask you... to like... comment ...and subscribe.

是時候我該麻煩你 ... 按下喜歡 ... 留言 ... 和訂閱。