-
Hello, my name is Christian Rudder,
大家好,我的名字叫 Christian Rudder
-
and I was one of the founders of OK Cupid.
我是 OK Cupid 的創辦者之一
-
It's now one of the biggest dating sites in the United States.
現在它是美國最大的交友網站之一
-
Like almost everyone at the site,
跟這個網站幾乎所有的人一樣
-
I was a math major, and, as you might expect,
我主修數學,而就如你所預期的
-
we're known for the analytic approach
我們較為人知的是
-
we have taken to love.
用分析方式研究戀愛行為
-
We call it our matching algorithm.
我們把它叫做「速配演算法」
-
Basically OK Cupid's matching algorithm helps us decide whether two people should go on a date.
基本上,OK Cupid 的速配演算法幫助我們決定某兩個人該不該去約會
-
We built our entire business around it.
這是我們事業的技術核心
-
Now, algorithm is a fancy word,
演算法聽起來很花俏
-
and people like to drop it like it's this big thing,
而人們喜歡把它掛在嘴邊,好像它是什麼了不起的東西
-
but, really, an algorithm is just a systematic,
但說真的,演算法只是一個有系統的
-
step-by-step way to solve a problem.
一步一步解決問題的方法
-
It doesn't have to be fancy at all.
完全不必要是花俏的
-
Here, in this lesson, I'm going to explain
這個課程裡,我將會解釋
-
how we arrived at our particular algorithm
我們是怎麼設計我們的演算法
-
so you can see how it's done.
來讓你理解它是如何運作的
-
Now, why are algorithms even important?
為什麼演算法如此重要
-
Why does this lesson even exist?
又為什麼要有這個課程
-
Well, notice one very significant phrase I used above:
這個,請注意我剛剛用的那個非常具暗示性的詞彙:
-
they are a "step-by-step" way to solve a problem,
演算法是「一步一步」解決問題的方法
-
and, as you probably know,
而就像你可能知道的
-
computers excel at step-by-step processes.
電腦很擅長做這種一步一步規劃好的程序
-
A computer without an algorithm
一臺沒有演算法的電腦
-
is basically an expensive paperweight.
基本上只是一個很貴的紙鎮而已
-
And since computers are such a pervasive part of everyday life, algorithms are everywhere.
由於電腦在日常生活中已經非常普及,所以演算法也是無所不在
-
The math behind OK Cupid's matching algorithm
而 OK Cupid 演算法背後的數學
-
is surprisingly simple.
其實異常地簡單
-
It's just some addition,
只是一些加法
-
multiplication,
乘法
-
a little bit of square roots.
還有一些些開根號
-
The tricky part in designing it, though,
然而要設計它比較麻煩的部份
-
was figuring out how to take something mysterious,
反而是要想辦法把一些神秘的東西
-
human attraction,
例如人類的吸引力
-
and break it into components that a computer can work with.
把它變成電腦可以運算的東西
-
Well, the first thing we needed to match people up was data,
那麼,要將人們配對,我們首先需要的是數據
-
something for the algorithm to work with.
也就是要讓演算法能夠計算的東西
-
The best way to get data quickly from people
要快速取得人們資料的最好方法
-
is to just ask for it.
就是直接問他
-
So, we decided that OK Cupid should ask users questions,
所以,我們決定 OK Cupid 應該要問使用者一些問題
-
stuff like, "Do you want to have kids one day?"
像是:「你未來希望有小孩嗎?」
-
and "How often do you brush your teeth?",
還有「你多常刷牙?」
-
"Do you like scary movies?"
「你喜歡恐怖片嗎?」
-
and big stuff like "Do you believe in God?"
以及較大的問題,像是「你相信神嗎?」
-
Now, a lot of the questions are good
而很多問題都有助於
-
for matching like with like,
將有相同喜好的人配在一起
-
that is when both people answer the same way.
這是當雙方都回答了相同答案的情況
-
For example, two people who are both into scary movies are probably a better match than one person who is and one person who isn't.
舉例來說,兩個都喜歡恐怖片的人也許就是不錯的配對,這比起將喜歡和不喜歡的人配在一起好
-
But what about a question like,
但如果是像這樣的問題:
-
"Do you like to be the center of attention?"
「你喜歡成為眾人的焦點嗎?」
-
If both people in a relationship are saying yes to this,
如果一對情侶的兩個人都說「喜歡」
-
then they are going to have massive problems.
那麼他們就有大問題了
-
We realized this early on,
我們很早就知道這點
-
and so we decided we needed
所以我們決定
-
a bit more data from each question.
每個問題都需要再多一點資訊
-
We had to ask people to specify not only their own answer,
我們要求使用者不只回答問題本身
-
but the answer they wanted from someone else.
同時也回答他們對別人的期望
-
That worked really well,
這效果真的很好
-
but we needed one more dimension.
但我們還須要另一個衡量的維度
-
Some questions tell you more about a person than others.
有一些問題比起其它問題,更能提供一個人的個性
-
For example, a question about politics, something like,
比如說政治議題如:
-
"Which is worse: book burning or flag burning?"
「哪一個比較糟:燒書或是燒國旗?」
-
might reveal more about someone than their taste in movies.
比起對電影的品味,這可能透露更多這個人的個性
-
And it doesn't make sense to weigh all things equally,
而把所有事情都看得一樣重要並不合理
-
so we added one final data point.
所以我們加入了最後一個資料點
-
For everything that OK Cupid asks you,
每一個 OK Cupid 問你的問題
-
you have a chance to tell us
你都可以告訴我們
-
the role it plays in your life,
它在你生活中扮演的角色
-
and this ranges from irrelevant to mandatory.
而選項是從「不相關」到「極重要」
-
So now, for every question,
所以現在,每一個問題
-
we have three things for our algorithm:
我們都有三筆資訊可以給我們的演算法:
-
first, your answer;
第一,你的答案
-
second, how you want someone else,
第二,你對別人答案的期望
-
your potential match,
也就是對於可能會跟你配對的人
-
to answer;
的答案的期望
-
and three, how important the question is to you at all.
第三,這問題究竟對你有多重要
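-
The three pieces of information per question could be held in a small record. This is only a minimal sketch, not OK Cupid's actual data model: the class name `QuestionResponse`, its field names, and the sample values for the scary-movies question are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionResponse:
    """One user's response to one question (hypothetical structure)."""
    own_answer: str      # first: the user's own answer
    desired_answer: str  # second: the answer wanted from a potential match
    importance: str      # third: how much the question matters to the user


# Made-up sample values for "Do you like scary movies?"
scary_movies = QuestionResponse(
    own_answer="yes",
    desired_answer="yes",
    importance="a little important",
)

print(scary_movies)
```

Keeping the desired answer separate from the user's own answer is what allows a question like "Do you like to be the center of attention?" to match people who answer differently.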
-
With all this information,
有了全部的這些資訊
-
OK Cupid can figure out how well two people will get along.
OK Cupid 就可以算出這兩個人相處會多融洽
-
The algorithm crunches the numbers and gives us a result.
演算法會在數值運算後,給我們一個答案
-
As a practical example,
舉一個實際的例子
-
let's look at how we'd match you with another person,
讓我們來看看你和另一個人有多速配
-
let's call him, "B".
姑且叫他 B
-
Your match percentage with B is based on
你和 B 的速配指數是基於
-
questions you've both answered.
你們雙方都回答過的問題
-
Let's call that set of common questions, "s".
這些問題的集合叫做 s
-
As a very simple example, we use a small set "s"
舉一個非常簡單的例子, 我們用很小的集合 s
-
with just two questions in common
只包含兩個雙方都回答過的問題
-
and compute a match from that.
然後由它算出速配程度
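-
The set s of common questions can be pictured as a set intersection. The question identifiers below are hypothetical labels chosen for this sketch; only the idea that s holds the questions both people answered comes from the lesson.

```python
# Hypothetical question identifiers for what each person has answered.
you_answered = {"messiness", "center_of_attention", "scary_movies"}
b_answered = {"messiness", "center_of_attention", "kids"}

# s is the set of questions that both people answered.
s = you_answered & b_answered

print(sorted(s))  # the two questions in common
print(len(s))     # the number of questions in s
```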
-
Here are our two example questions.
舉例來說,他們回答了這兩個問題
-
The first one, let's say, is, "How messy are you?"
第一個,比如說:「你有多不愛乾淨?」
-
and the answer possibilities are
而可能的答案是
-
very messy,
「很髒亂」
-
average,
「普通」
-
and very organized.
及「很愛乾淨」
-
And let's say you answered "very organized,"
假設你的答案是「很愛乾淨」
-
and you'd like someone else to answer "very organized,"
而你期望別人也回答「很愛乾淨」
-
and the question is very important to you.
並且這問題對你來說「非常重要」
-
Basically you are a neat freak.
基本上你有潔癖
-
You're neat,
你愛乾淨
-
you want someone else to be neat,
你也希望別人愛乾淨
-
and that's it.
這是你的結果
-
And let's say B is a little bit different.
又假設 B 的回答有點不一樣
-
He answered very organized for himself,
他回答自己「很愛乾淨」
-
but average is OK with him
但別人回答「普通」
-
as an answer from someone else,
對他來說就可以了
-
and the question is only a little important to him.
而且這問題對他只有「些許重要」
-
Let's look at the second question,
接著我們來看第二個問題
-
it's the one from our previous example:
是我們先前說過的例子:
-
"Do you like to be the center of attention?"
「你喜歡成為眾人的焦點嗎?」
-
The answers are just yes and no.
而答案只有「是」或「否」
-
Now you've answered "no,"
假設你的答案是「否」
-
how you want someone else to answer is "no,"
而你希望對方回答「否」
-
and the question is only a little important to you.
並且這問題對你只有「些許重要」
-
Now B, he's answered "yes,"
換 B,他回答「是」
-
he wants someone else to answer "no,"
而他希望對方回答「否」
-
because he wants the spotlight on him,
因為他希望焦點是在他身上
-
and the question is somewhat important to him.
而這問題對他「蠻重要的」
-
So, let's try to compute all of this.
好,讓我們試著來算看看
-
Our first step is,
我們的第一個步驟是
-
since we use computers to do this,
因為要用電腦計算
-
we need to assign numerical values
我們必須對不同答案如「蠻重要的」和「非常重要」
-
to ideas like "somewhat important" and "very important"
賦予相對應的數字
-
because computers need everything in numbers.
因為電腦必須透過數字才能運算
-
We at OK Cupid decided on the following scale:
在 OK Cupid 裡我們訂定了這樣的量表:
-
irrelevant is worth 0,
「不相關」是 0
-
a little important is worth 1,
「些許重要」是 1
-
somewhat important is worth 10,
「蠻重要的」是 10
-
very important is 50,
「非常重要」是 50
-
and absolutely mandatory is 250.
而「極重要」是 250
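-
The scale above maps naturally onto a lookup table. The dictionary name `IMPORTANCE_POINTS` and the exact label strings are assumptions for this sketch; the point values are the ones just listed.

```python
# Importance labels mapped to the point values from the scale above.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "mandatory": 250,
}

# Each step up the scale is worth far more than the one before it,
# so a single mandatory question outweighs many minor ones.
print(IMPORTANCE_POINTS["very important"])  # 50
```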
-
Next, the algorithm makes two simple calculations.
接著,演算法會進行兩個簡單的運算
-
The first is how much did B's answers satisfy you,
第一是 B 的答案有多符合你的期望
-
that is, how many possible points did B score on your scale?
也就是,B 在你的量表上會得到幾分?
-
Well, you indicated that B's answer
嗯,你在第一個愛乾淨的問題中
-
to the first question about messiness
表示 B 的答案
-
was very important to you.
對你非常重要
-
It's worth 50 points and B got that right.
它佔 50 分而 B 正好符合
-
The second question is worth only 1
而第二個問題只佔 1 分
-
because you said it was only a little important,
因為你說它只有些許重要
-
and B got that wrong.
而 B 答得不對
-
So B's answers were 50 out of 51 possible points.
所以 B 的答案在總分 51 分裡得到 50 分
-
That's 98% satisfactory.
這樣是 98% 的滿意度
-
It's pretty good.
相當不錯
-
And the second question the algorithm looks at
而演算法第二步要做的是
-
is how much did you satisfy B.
你有多符合 B 的要求
-
Well, B placed 1 point on your answer
嗯,B 認為你對整潔問題
-
to the messiness question
的答案佔 1 分
-
and 10 on your answer to the second.
而第二個問題的答案佔 10 分
-
Of those, 11, that's 1 plus 10,
總共是 11 分,也就是 1 + 10
-
you earned 10,
你得到 10 分
-
because your answer satisfied B on the second question.
因為你在第二個問題的答案符合 B 的期望
-
So your answers were 10 out of 11
所以你的答案是 11 分裡得 10 分
-
equals 91% satisfactory to B.
也就是對於 B 來說 91% 的滿意度
-
That's not bad.
也是不錯
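-
The two satisfaction calculations can be reproduced with a short function. This is a sketch under stated assumptions, not OK Cupid's code: the function name `satisfaction`, the tuple layout, and the question identifiers are hypothetical, each person is assumed to want exactly one answer per question, and returning 0.0 when no points are possible is a choice made only for this example.

```python
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "mandatory": 250,
}

# Per question: (own answer, answer wanted from a match, importance).
you = {
    "messiness": ("very organized", "very organized", "very important"),
    "center_of_attention": ("no", "no", "a little important"),
}
b = {
    "messiness": ("very organized", "average", "a little important"),
    "center_of_attention": ("yes", "no", "somewhat important"),
}


def satisfaction(rater, other):
    """Fraction of the rater's possible points earned by the other's answers."""
    earned = 0
    possible = 0
    # Only questions that both people answered are counted.
    for question in rater.keys() & other.keys():
        _, wanted, importance = rater[question]
        other_answer = other[question][0]
        points = IMPORTANCE_POINTS[importance]
        possible += points
        if other_answer == wanted:
            earned += points
    # Assumption: treat "no possible points" as zero satisfaction.
    return earned / possible if possible else 0.0


print(round(satisfaction(you, b) * 100))  # 98: B earned 50 of your 51 points
print(round(satisfaction(b, you) * 100))  # 91: you earned 10 of B's 11 points
```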
-
The final step is to take these two match percentages
而最後一步, 是把這兩個數字
-
and get one number for the both of you.
變成你們兩個速配指數
-
To do this, the algorithm multiplies your scores,
要完成這件事, 演算法會把你們的分數乘起來
-
then takes the nth root,
然後開 n 次方根
-
where n is the number of questions.
這裡 n 是問題的數目
-
Because the size of s, which is the number of questions,
因為在我們例子的 s 裡
-
in this sample, is only 2,
問題數只有 2
-
we have match percentage equals
我們就算出速配指數
-
the square root of 98% times 91%.
是 98% 乘 91% 的開根號
-
That equals 94%.
也就是 94%
-
That 94% is your match percentage with B.
這 94% 就是你和 B 的速配指數
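-
The final step can be checked numerically. The variable names are assumptions for this sketch; the values are the two satisfaction scores and the question count from the example, using the exact fractions 50/51 and 10/11 rather than the rounded percentages.

```python
your_satisfaction = 50 / 51  # how well B's answers satisfied you
b_satisfaction = 10 / 11     # how well your answers satisfied B
n = 2                        # number of questions in s for this example

# Multiply the two scores, then take the nth root (a square root here).
match = (your_satisfaction * b_satisfaction) ** (1 / n)
match_percent = round(match * 100)

print(match_percent)  # 94
```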
-
It's a mathematical expression
這是我們基於對於你們的了解
-
of how happy you'd be with each other
透過數學算式
-
based on what we know.
來表現出你們在一起會多快樂的方式
-
Now, why does the algorithm multiply as opposed to, say,
而,為什麼演算法要用相乘的
-
average the two match scores together
而不是把兩個分數平均
-
and do the square-root business?
並且要取平方根呢?
-
In general, this formula is called the geometric mean,
一般來說,這個公式叫作幾何平均數
-
which is a great way to combine values
它是將範圍很廣
-
that have wide ranges
並表達不同特性的數據合在一起的
-
and represent very different properties.
一種很棒的方法
-
In other words, it's perfect for romantic matching.
也就是說,它對浪漫的配對來說是很完美的
-
You've got wide ranges
你會有很廣的數據
-
and you've got tons of different data points,
你也有許多不一樣的資訊
-
like I said, about movies,
比如說,關於電影
-
about politics,
關於政治
-
about religion,
關於信仰
-
about everything.
關於所有事
-
Intuitively, too, this makes sense.
直覺來說,這也合理
-
Two people satisfying each other 50%
兩個人互相有 50% 的滿意度
-
should be a better match
應該會比
-
than two others who satisfy 0 and 100,
一人是 0% 另一人是 100% 來得好
-
because affection needs to be mutual.
因為感情是互相的
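-
A quick comparison shows why multiplying behaves differently from averaging in this case. The helper names `arithmetic_mean` and `geometric_mean` are defined here only for illustration; the two pairs are the 50%/50% and 0%/100% cases just described.

```python
import math


def arithmetic_mean(a, b):
    """Plain average of two satisfaction scores."""
    return (a + b) / 2


def geometric_mean(a, b):
    """Square root of the product of two satisfaction scores."""
    return math.sqrt(a * b)


mutual = (0.5, 0.5)     # two people who satisfy each other 50%
one_sided = (0.0, 1.0)  # one person fully satisfied, the other not at all

# Averaging cannot tell the two couples apart.
print(arithmetic_mean(*mutual), arithmetic_mean(*one_sided))  # 0.5 0.5

# The geometric mean rewards the mutual pair and zeroes out the one-sided pair.
print(geometric_mean(*mutual), geometric_mean(*one_sided))    # 0.5 0.0
```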
-
After adding a little correction for margin of error,
再加上一些誤差範圍的修正
-
in the case when we have a very small number of questions,
就是說當問題數很少的時候的修正
-
like we do in this example,
像是我們這個例子
-
we're good to go.
我們就完成了
-
Any time OK Cupid matches two people,
每一次 OK Cupid 在幫兩人配對時
-
it goes through the steps we just outlined.
都經過了我們所講的那些步驟
-
First it collects data about your answers,
首先從你的答案收集資訊
-
then it compares your choices and preferences
然後用簡潔的數學方法
-
to other people in simple, mathematical ways.
來將你和其它人的偏好作比較
-
This, the ability to take real world phenomena and make them something a microchip can understand, is, I think, the most important skill anyone can have these days.
這樣把真實世界的現象變成微晶片能運作的一種能力,我認為是我們現今可以擁有最重要的技能
-
Like you use sentences to tell a story to a person,
就像是你用句子來向別人說故事一樣
-
you use algorithms to tell a story to a computer.
你會用演算法來對電腦說故事
-
If you learn the language,
如果你學會這種語言
-
you can go out and tell your stories.
你就可以把你的故事告訴別人
-
I hope this will help you do that.
這就是我希望幫助你達成的事情