MALE SPEAKER: Today we're very pleased, very happy, to have
Luis Von Ahn here today, from Carnegie Mellon University.
His talk is on human computation.
Luis is a very new assistant professor in computer science
at the School of Computer Science at Carnegie Mellon
University.
He received his Ph.D. in 2005, and I'm told he was the
hottest new graduate on the market, with offers from just
about every university out there, including corporate
offers, too.
He received his B.S. from Duke University.
He received a Microsoft Research Fellowship Award.
His research interests include encouraging people to work for
free, as well as catching and thwarting cheaters in online
environments.
His work has appeared in over a hundred news publications
around the world.
New York Times, CNN, USA Today, BBC, and
the Discovery Channel.
Luis holds four patent applications and has licensed
technology to major internet companies.
Please join me in welcoming Luis Von Ahn.
[APPLAUSE]
LUIS VON AHN: Can you hear me now?
OK.
So, I want to start by asking a question to the people in
the audience.
How many of you have had to fill out a registration form
for something?
Like Yahoo, Hotmail, or Gmail, or some sort of web form where
you've been asked to read a distorted sequence of
characters or a distorted word such as this one?
How many of you found it annoying?
Awesome.
OK, well, that was part of my thesis.
That thing is called a CAPTCHA, and the reason it's
there is to make sure that you, the entity filling out
the web form, are actually a human, and not some sort of
computer program that was written to submit the form
millions and millions of times.
The reason it works is because humans--
at least non-visually impaired humans--
have no trouble reading distorted characters, whereas
computer programs simply can't do it as well yet.
More generally, a CAPTCHA is just a program that can tell
whether its user is a human or a computer.
OK, let me say that another way.
A CAPTCHA is a program that can generate and grade tests
that most humans can pass, but current computer
programs cannot.
Notice the paradox here.
A CAPTCHA is a program that can generate and grade tests
that it itself cannot pass.
So in that way, CAPTCHAs are a lot like some professors.
[LAUGHTER]
Just to make things crystal clear, let me give you an
example of one of these programs that can generate and
grade tests that most humans can pass, but current computer
programs cannot.
Here's how the program works.
First, the program picks a random string of letters.
O-A-M-G, in this case.
Then the program renders the string into a randomly
distorted image, and then the program generates a test,
which consists of the randomly distorted image and the
question, "What are the characters in this image?"
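The generate-and-grade loop just described can be sketched in a few lines of Python. The distorted-image rendering step is deliberately omitted -- it is a separate graphics problem -- so this is only a toy model of the logic, not working CAPTCHA code.

```python
import random
import string

def generate_captcha(length=4, rng=None):
    """Pick a random string of letters (e.g. 'OAMG'). Rendering it
    into a randomly distorted image is omitted here; in a real
    CAPTCHA that image, not the string, is what the user sees."""
    rng = rng or random.Random()
    answer = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
    return {"question": "What are the characters in this image?",
            "answer": answer}

def grade_captcha(challenge, response):
    """Grading is a trivial string comparison -- the paradox is that
    the grader itself could not *read* the image it generated."""
    return response.strip().upper() == challenge["answer"]
```

Note that this captures the paradox from the talk: the program can generate and grade a test that it itself could not pass, because passing requires reading the distorted image.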
CAPTCHAs are used all over the place, for all kinds of
things, and I could spend the next hour talking about all
the different applications of CAPTCHAs.
But since I don't want to do that, I want to illustrate one
of the applications through a little story.
So a few years ago, Slashdot--
which is a very popular website--
put up this poll on their site, asking which is the best
computer science graduate school in the United States?
This is a very dangerous question to ask over the web.
As with most online polls, IP addresses of voters were
recorded to make sure that each person could only vote,
at most, once.
However, as soon as the poll went up, students at CMU wrote
a program that voted for CMU thousands and
thousands of times.
The next day, students at MIT wrote their own program.
And a few days later, the poll had to be taken down with CMU
and MIT having, like, a gazillion votes and every
other school having less than 1,000.
I guess the poll worked in this case.
[LAUGHTER]
I'm just kidding.
But in general, this is a huge problem.
You simply cannot trust the results of an online poll,
because anybody could just write a program to vote for
their favorite option thousands and
thousands of times.
One solution is to use a CAPTCHA to make sure that only
humans can vote.
CAPTCHAs have many, many other applications.
Another one is in free email services.
For instance, there are several companies that offer
free email services--
Yahoo, Microsoft, Google--
and up until a few years ago, all of them were suffering
from a very specific type of attack.
It was people who wrote programs to obtain millions of
email accounts every day, and the people who wrote these
programs were usually spammers.
So if you're a spammer and you want to send spam from, say,
Yahoo, you run into the problem that each Yahoo
account only allows you to send, like,
100 messages a day.
So if you want to send millions of messages a day
from Yahoo accounts, you have to own
millions of Yahoo accounts.
And this is why spammers wrote programs to obtain millions of
Yahoo accounts.
And the solution--
or one solution-- and this is what we originally suggested
to Yahoo-- was to use a CAPTCHA to make sure that only
humans can obtain free email accounts.
Now, since CAPTCHAs are used all over the place to stop
spammers from doing bad things, spammers have started
coming up with all kinds of dirty hacks to get around the
CAPTCHAs that are being used in practice.
So let me explain a couple of them.
Here's one.
I'm sure a lot of you have heard of this.
CAPTCHA sweatshops.
Spam companies actually are hiring people to solve
CAPTCHAs all day long.
And they are usually being hired in other countries where
the minimum wage is a lot lower, and this
is currently happening.
But there's at least two consolations.
First, it's at least costing them some.
So whereas before, they could get the accounts for free, now
it costs them a fraction of a cent per account, so they
can't get that many.
Second, CAPTCHAs are actually generating jobs in
underdeveloped countries.
[LAUGHTER]
So this is one dirty hack.
There's an even dirtier hack, and I'm sure a lot of you have
heard of it, and this is what some porn companies are
allegedly doing.
And I'm going to emphasize the word "allegedly." So, porn
companies also want to send spam.
They also want to break CAPTCHAs, and here's how they
are allegedly doing it.
They write a program that fills out the entire registration
form, say, at Yahoo.
And whenever the program gets to the CAPTCHA,
it can't solve it.
So what it does is it copies the CAPTCHA
back to the porn page.
Now, back at the porn page, there's a lot of people
looking at porn.
And suddenly, one of them gets this screen saying, "If you
want to see the next picture, you got to tell me what word
is in the box below." And you know what people do?
They type the word as fast as possible.
[LAUGHTER]
And by doing so, they are effectively solving the
CAPTCHA for the porn company bot.
That is, they're effectively obtaining a free
email account for them.
So pornographers, they're really, really smart.
So CAPTCHAs take advantage of human processing power in
order to differentiate humans from computers, and it turns
out that being able to do so has some very, very nice
applications in practice.
Now that I've told you about CAPTCHAs, now I can tell you
what this talk really is about.
This talk is not about CAPTCHAs.
This talk is about human computation.
Sort of the flipside of CAPTCHAs.
The idea is there's a lot of things that humans can easily
do that computers cannot yet do.
I want to show you how we can solve some of these problems
by just making good use of human processing power.
And I think the best way to introduce the rest of the talk
is with a little statistic, and the statistic is that over
9 billion human hours of Solitaire were played in 2003.
9 billion.
Now, some people talk about wasted computer cycles.
What about wasted human cycles?
Just to give you an idea of how large this number really
is, let me give you two other numbers.
First is the number of human hours that it took to build
the Empire State Building.
Turns out it took 7 million human hours to build the
entire Empire State Building.
That's equivalent to about 6.8 hours of people playing
Solitaire around the world.
Now, in case you don't think the Empire State Building is a
monumental enough task, let me give you another number.
The Panama Canal.
It turns out it took 20 million human hours to build
the entire Panama Canal, and that's equivalent to a little
less than a day of people playing Solitaire around the world.
I want to show how we can make good use of these
wasted human cycles.
And that is what I mean by human computation.
In this talk, we're going to consider the human brain as an
extremely advanced processing unit that can solve problems
that computers cannot yet solve.
Even more, we're going to consider all of humanity as an
extremely advanced and large scale distributed processing
unit that can solve large scale problems that computers
cannot yet solve.
I claim that the current relationship between humans
and computers is extremely parasitic.
We're parasites of computers.
What I want to advocate for in this talk is more of a
symbiotic relationship, a symbiosis.
One in which humans solve some problems, computers solve some
other problems, and together we work to
create a better world.
[LAUGHTER]
OK, I'm getting freaky.
But more seriously, I want to talk about some problems that
computers cannot yet solve, and I want to show you how we
can easily solve a lot of these problems by just making
good use of human processing power.
The first problem that I'm going to talk about is that of
labeling images with words.
So the problem is as follows.
When inputting an arbitrary image, we want to output a set
of key words that properly and correctly describe this image.
[LAUGHTER]
As you should all probably know, this is still a
completely open problem in computer vision and artificial
intelligence, in the sense that computer programs simply
can't do this.
However, a method that could accurately label images with
words would have several applications, one of which
you've probably already seen, and that is image
search on the web.
So Google, for instance, has Google Images.
You can go there, type a word like "dog," and get back a lot
of images related to the word "dog." Now, it is the case
that there's no computer program out there that can
tell you whether an arbitrary image from the web contains a
dog or not, so the way Google Images works-- and image
search on the web works, roughly--
is by using file names and HTML text.
So if you search for "dog," you get back a lot of images
named dog.jpg or dog.gif, or that have the word
"dog" very near them.
Of course, the problem with this method is that it doesn't
always work very well.
For instance, this is not the case anymore, but it used to be the first page of results for the query "dog" on Google Images.
There is an image of a rabbit, there.
There's a guy in a blue suit.
What the hell?
But if we had a method that, for every image on the web, could give us accurate textual descriptions of those images, we could potentially improve the accuracy of image search on the web.
Such a method would have many other applications.
Another one is accessibility.
So it turns out that the majority of the web is not
fully accessible to visually impaired individuals, and one
of the biggest reasons is images.
So blind people actually surf the web.
The way they do it is they use screen readers, programs that
read the entire screen to them out loud.
But whenever a screen reader reaches an image, it can't do
anything other than read the caption of that image.
Of course, the majority of images on the web don't have
proper captions associated to them.
So again, if we had a method that, for every image on the web, could give us accurate textual descriptions of those images, we could improve the accessibility of the web.
Such a method would have many other applications, and so
what we want-- and what I'm going to tell you right now--
is a method that can label all images on the web.
Not only that, it's a method that can label all images on
the web in a way that's fast and cheap.
How are we going to do it?
Well, we're going to use humans, but we're going to use
them cleverly.
So normally, if you ask people to label images for you, you'd
have to pay them to do so.
And if you wanted to label all images on the web by paying
people, you'd have to pay a lot of money.
And even if you had a lot of money, if you wanted to label
all images on the web by paying people fast, you'd have
to find a lot of people who were willing to label images
for a living.
Good luck with that.
My approach is much better.
Rather than paying people to label images for me, I get
them to want to label the images for free.
And in fact, they want to label the images so much that
in some cases they're even willing to pay me to label the
images for me.
How do I do that?
Well, I have an extremely, extremely enjoyable,
multiplayer online game called the ESP game that people
really, really like to play, and as people play, sort of as
a side effect, they actually label images for me.
Now, the ESP game has two very nice properties.
First, as people play the game, the labels that they
generate for images are accurate even if the players
don't want them to be so.
Second, as people play the game, they actually label
images very, very fast. And in fact, using a conservative
estimate, I'm going to show you later in the talk that if
the ESP game is put on a popular gaming site, we could
actually label all images on Google Image Search
in just a few weeks.
So how does the game work?
Well, first and foremost, the ESP game is a two player
online game.
So there's a web site.
You can go there to play the game.
Whenever you go to the website, you get randomly
paired with somebody else wanting to play the game.
That's your partner.
Now, you're not allowed to communicate with them, and
you're not told who they are.
It's just a complete stranger from the web.
And the goal of the game is for both you and your partner
to type the exact same word, given that the only thing you
two have in common is an image.
So you can both see the same image.
You know you can both see the same image, and now you're
told to type whatever the other guy's typing.
Turns out that what people do, the best strategy, is just to
type a lot of words related to the common image.
So basically, both players are going to be typing a lot of
words related to the common image until one of player
one's words is equal to one of player two's words.
They agree, they get points, and then they get happy.
That's the basic idea of the game.
Now, this word that the two players agree on is usually a
very, very good label for the image, because it comes from
two independent sources.
Let me give you a better idea of the basic move of the game.
Imagine you have two players, player one and player two.
And they're both paired, so they can both
see the same image.
And now they're told, "Type whatever the other guy's
typing." Notice, the players are not told, "Label the
image," or even what labeling an image might mean.
They're just told, type whatever
the other guy's typing.
So say at first, player one types "car," player two types
"boy." It's not the same word, so the game still goes on.
Say then, player one types "hat" and then "kid." Still
none of player one's words is equal to one of player two's
words, so the game's still going on.
By the way, player one cannot see any of player two's
guesses, and vice versa.
So they're just typing words completely independently,
until, say, player two types a word that player one had
already entered.
They agree and then they get a lot of points.
This is the basic move of the game.
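The basic move just described can be sketched in a few lines of Python. The guess lists are hypothetical transcripts; in the real game the two streams arrive asynchronously rather than in neat lockstep.

```python
def esp_round(guesses_one, guesses_two):
    """One basic move of the ESP game: each player types guesses
    independently (neither sees the other's guesses); the round ends
    as soon as any word one player has typed matches any word the
    other has typed. The agreed-on word becomes a label."""
    seen_one, seen_two = set(), set()
    # Interleave the two guess streams to mimic real-time typing.
    for g1, g2 in zip(guesses_one, guesses_two):
        seen_one.add(g1)
        seen_two.add(g2)
        match = seen_one & seen_two
        if match:
            return match.pop()  # the label for the image
    return None  # no agreement yet (in the real game, time or a pass)

# The example from the talk: player one types car, hat, kid;
# player two types boy, then a word player one had already entered.
label = esp_round(["car", "hat", "kid"], ["boy", "purse", "car"])
```

The key property is that the returned word comes from two independent sources, which is why it is usually a very good label.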
The actual game looks a little more like this.
Basically, both players have a certain amount of time to
agree on as many images as they can.
So in 2 and 1/2 minutes, they have to agree on as many
images as they can.
That's basically the game.
Each time they agree on an image, they get a certain
number of points.
There's also a thermometer at the bottom that measures how
many images the two players have agreed on, and if you
fill the thermometer, you get like a gazillion points.
There's also a pass button, so players can agree to pass on
difficult images.
And another really important component of the game is this
thing we call "taboo words." If you've ever played the game
Taboo, you should be able to guess what these are.
Taboo words are words that are related to the image the
players cannot use when trying to agree on that image.
So in this case, for instance, you can't use "hat" or
"sunglasses," or any plural or singular of these words.
Now, where do taboo words come from?
They come from the game itself.
The taboo words are words that two other players have already
agreed on for this particular image.
So the nice thing about taboo words is that they guarantee that each time an image passes through the game, it gets a brand new, different label.
The other nice thing about taboo words is they make the
game more difficult, and therefore more fun.
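A minimal sketch of the taboo-word bookkeeping might look like this. The real game also blocks singulars and plurals of taboo words; that matching is simplified here to exact string comparison.

```python
class TabooTracker:
    """Tracks taboo words per image. Each word two players agree on
    becomes taboo, so every later pass of the image through the game
    must yield a brand-new label. A simplified model of the
    mechanism described in the talk."""

    def __init__(self):
        self.taboo = {}  # image id -> set of already-agreed labels

    def is_allowed(self, image, word):
        return word not in self.taboo.get(image, set())

    def record_agreement(self, image, word):
        if not self.is_allowed(image, word):
            raise ValueError("players agreed on a taboo word")
        self.taboo.setdefault(image, set()).add(word)

    def reset(self, image):
        # Every now and then the taboo list is cleared, so the image
        # re-enters the game afresh and labels can be re-collected.
        self.taboo.pop(image, None)
```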
Now, I keep talking about fun.
Is this game fun?
Well, amazingly, it really is a lot of fun.
So far, we've gotten over 15 million agreements-- that's
over 15 million labels-- with about 75,000 players.
Let me say that another way.
75,000 players have given us over 15 million agreements.
That means that on average, each player is playing a lot.
We have many people that play over 20 hours a week.
That's like a full time job.
We've had playing streaks that are longer
than 15 hours straight.
[LAUGHTER]
I feel a little bad about this.
So by now, the game has a mechanism that if you've been
playing for longer than 15 hours, it will cut you off.
And as a promise to my department head, it's 10 hours
if you're from a .edu domain.
[LAUGHTER]
So, so far, over 15 million agreements.
What if you wanted to label the entire web?
Well, 5,000 people playing the game simultaneously could
label all images on Google Images in about two months.
The striking thing here is that 5,000 is not
a very large number.
In fact, individual games in popular gaming sites, such as
Yahoo, Pogo.com, or MSN average over 5,000 players at a time.
So if you put the ESP game on a popular gaming site, you
could potentially label a lot of the images on the web in
just a few months.
A few more things about the game.
There's also a single player version of the game.
It's important to have a single player version of the
game for several reasons.
One of them is that the number of people playing the game is not always even. But also, whenever a player drops, it's important to just basically have them keep on playing the single player version of the game.
And how do you get a single player game?
Well, you simply can pair up a single person with a
prerecorded set of moves.
The idea is as follows.
Whenever you have two people playing, you record everything
that they do and when they do it.
So you record all the words they enter, along with timing
information.
And whenever we want to have a single player play, we simply
pair them up with a prerecorded set of moves.
So that single player is playing with somebody else,
just not at the same time.
One nice thing to notice about this is that it actually doesn't stop the labeling process. That single player is playing with somebody else, just not at the same time, so everything that I've said about labeling remains true.
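The prerecorded-partner replay can be sketched in Python, assuming the recorded log is simply a list of (offset-in-seconds, word) pairs -- that field layout is my assumption for illustration, not the game's actual format.

```python
import time

def replay_partner(recorded_moves, speedup=1.0):
    """Yield a prerecorded partner's guesses at their recorded times.

    recorded_moves is a hypothetical log of (offset_seconds, word)
    pairs captured from a real two-player game -- every word a past
    player entered, along with its timing information. speedup > 1
    compresses the delays (handy for testing)."""
    start = time.monotonic()
    for offset_s, word in recorded_moves:
        # Wait until the recorded moment before revealing this guess,
        # so the live player experiences the original pacing.
        remaining = offset_s / speedup - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)
        yield word
```

Pairing two such recordings against each other is the "zero player" variant: the match logic runs over two logs with no live player at all.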
In fact, we can even go one step further.
We can do the zero player game.
We can also pair up prerecorded games with each
other to get more labels, and if you count all the extra
labels that the ESP game has collected so far, you get that
so far the ESP game has collected over 39 million
labels for images on the web, if you count all these.
Now, one thing that some of you may be wondering about is
what about cheating?
So for instance, could you try to cheat to
screw up the labels?
Something like, my officemate and I could try to log in to
the game at exactly the same time.
Maybe we'll get paired with each other, and if we get
paired with each other, we can agree on any word we
want for any image.
Or even worse, somebody could go to Slashdot and type,
"Hey, everybody, let's all play the ESP game, and let's
all agree on the word 'A' for every image." Could happen.
Fortunately I've thought about this, and the ESP game has
several mechanisms that fully prevent cheating.
Let me tell you a few of the things that we
do to prevent cheating.
Here's one.
At random, we actually give players test images.
These are just images that are just there to test whether the
players are playing honestly or not.
And what they are is they are images for which we know all
the most common things that people enter for them.
And we only store a player's guesses and the words they
agree on if they successfully label the test images.
So if you think about it, in a way, this sort of gives a
probabilistic guarantee that a given label is not corrupt.
What's the probability that a given label is corrupt, given
that the players successfully label all
of their test images?
And this probability can be boosted by using the next
strategy, which is repetition.
So we only store a label after n pairs of players have agreed on it, where n is a parameter that can be tweaked.
So every now and then, we actually delete all the taboo
lists for the images, and we put the image back into the
game afresh.
And we only store a label after n pairs of players have
agreed on it.
So if we let x be the probability of a label being corrupt given that players successfully labeled all of their test images, then after n repetitions, the probability of corruption is x to the n.
This is assuming that the n repetitions are independent of
each other, but if x is very small, x to the n is really,
really small.
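The test-image filter and the repetition argument can be sketched as follows; the data shapes are illustrative, and n is the tweakable parameter from the talk.

```python
def passes_test_images(test_guesses, known_answers):
    """Secretly inserted test images are images for which the most
    common human answers are already known. A pair's labels are
    stored only if their guesses on the test images match those
    known answers. Data shapes here are illustrative."""
    return all(guess in known_answers[image]
               for image, guess in test_guesses.items())

def corruption_probability(x, n):
    """If a label that survives the test-image check is still corrupt
    with probability x, then requiring n independent pairs to agree
    on it drives the corruption probability down to x**n."""
    return x ** n
```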
I'm just going to say that, so far, we've collected lots and lots of
labels, and we have not seen cheating be able to screw up
our labels.
In fact, the quality of the labels that the ESP game has
collected so far is very high.
Let me now show you some search results.
Let me show you what happens when we search for the word
"dog," for instance.
Here's some dogs.
More dogs.
More dogs.
More dogs.
And I could go on forever.
Here's what happens when you search for "Britney Spears." You
got to show this whenever you show search results.
Here's what happens when you search for "Google." I
prepared this for this talk.
You get the founders.
And one really nice thing about this is that this slide
constitutes a proof that the word "Google" and the word
"search" really are synonyms. On input "search," people agreed on "Google."
OK.
So let me now show you some sample labels.
So what I'm going to show you right now are some images,
along with the labels that the ESP game has collected for them so far.
So here's an image, and here are the labels that the ESP
game has collected for it so far.
By the way, these could be ordered in terms of frequency.
They're not.
This is just the list of all the words that the ESP game
has collected for this image so far.
You should notice two things about this list of words.
First of all, it's extremely accurate, meaning all of these
words actually make sense with respect to the image.
Second, it's extremely complete, meaning almost
anything that you can imagine to describe something in this
image is in this list. Not everything, but
a lot of the things.
And in fact, this is true in general of the word lists
generated by the ESP game.
They are as accurate and as complete as those generated by
participants who are just paid to label images.
Let me show you more sample labels.
Here's another image.
Anybody know who this is?
Walter Matthau.
He's an actor.
And just to prep you for one of the labels, Walter Matthau
was in the movie Dennis The Menace, and he played the
character of Mr. Wilson.
Some of the labels that the ESP game has collected for
this image so far are--
[LAUGHTER]
So that first one seems a little wrong, but actually, if
you look carefully, you realize it's
really not that bad.
I like to tell people the ESP game has uncovered a major
conspiracy.
Now that we're on this topic, here's another image.
By the way, I have no political affiliations
whatsoever.
I'm not a US citizen, and what I'm about to show you are
simply the scientific results of what happens when you put
this image on the ESP game.
So some of the labels that the ESP game has collected for
this image so far are--
[LAUGHTER]
That last one, can you imagine how awesome the two players
must have felt when they agreed on that last one?
It must have felt great.
And in fact, this brings us to one of the reasons why people
really like the ESP game.
It's because they can feel a special connection with their
partner, especially when they agree on an off the wall word
like "yuck" for an image of President Bush.
In fact, it gets even better.
A lot of the emails we get actually suggest that players
feel a very, very, very special
connection with their partner.
Players like playing with partners of the
opposite sex better.
They want to know whether their partner's of the
opposite sex, and a lot of the emails say things like, "My
partner and I, we look at the world in exactly the same way.
Can you tell me their email address?" This is great, because I'm going to be rich soon.
More seriously, this brings us to the question of why do
people like the ESP game?
I mean, it's true that the game was designed to be
enjoyable, but what are the reasons that people like the
ESP game so much?
And to address that question, let me show you some of the
most common things that people have said of why
they like the game.
Here's what one person said.
By the way, I'm just going to let you read.
So this is the sense of connection with your partner.
Here are some of the other most common
things that people said.
[LAUGHTER]
That last one, if you think about it, it
makes perfect sense.
Although that was not expected, it
makes perfect sense.
The ESP game helps people learn English, because you've
got an image, you've got to say what it is in English.
And that brings up the question, could you have the
ESP game in multiple languages?
The answer is sure, but I don't want to talk about that.
So those are some of the most common things that people have told us about why they like the game.
In addition, let me show you some of the things that people
have said about the game in blogs.
It was, at some point, in literally hundreds of blogs.
Here's a couple of them.
Here's what one guy said.
Sense of achievement.
But the best is the way this guy ends.
[LAUGHTER]
Here's another one.
So this guy actually likes the concept of the game, but
again, the best is the way this guy ends.
So not everybody likes their partner, and it completely
depends on whether you do well with them or not.
If you do well with them, you fall in love with them.
If you do badly, you think they're an idiot.
Of course, you're not the idiot.
They're an idiot.
Even though the game is symmetric.
But in addition to all those things that people have told
us, we continually do measurements to try to figure out what makes people play longer.
So let me explain one of these measurements to you.
At some point in the history of the game, I added this very small message in the corner of the screen alerting you to whether your partner has already entered a guess or not.
It's a very tiny message.
This is just a magnification of it.
It just tells you when your partner has already entered a
guess or not.
When this was added to the game, it wasn't added to all
the players.
It was just added to a small, random subset of the players.
And then we measured whether the players who had this
feature played longer than those who didn't.
And it turns out that those who had this feature played a
whopping 4% longer than those who didn't.
Now, you might not think that 4% is very large, but
actually, it's a statistically significant difference, and if
you think about it, just a very tiny message in the
corner of the screen makes people play 4% longer.
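The measurement just described is a randomized controlled experiment. A minimal sketch, assuming hash-style bucketing of players into groups and per-player session lengths in minutes (the fraction, seed, and data shapes are my assumptions, not the game's actual values):

```python
import random
import statistics

def assign_group(player_id, fraction=0.1, seed="esp-indicator"):
    """Deterministically place a small random subset of players into
    the treatment group that sees the new message. Seeding on the
    player id keeps the assignment stable across sessions."""
    return random.Random(f"{seed}:{player_id}").random() < fraction

def relative_lift(control_minutes, treatment_minutes):
    """Relative change in mean session length between the group
    without the feature and the group with it; 0.04 means the
    treatment group played 4% longer."""
    mc = statistics.fmean(control_minutes)
    mt = statistics.fmean(treatment_minutes)
    return (mt - mc) / mc
```

A real analysis would also run a significance test on the two samples, which is how a 4% difference can be called statistically significant.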
Now, in a way, the ESP game is kind of like an algorithm.
Much like an algorithm, it has an input/output behavior. Its input is an image. Its output is a set of key words that properly describe the image.
Much like an algorithm, you can analyze its efficiency.
You can prove that its output is not corrupt with high
probability, et cetera.
So what I want to do now is I want to refer to all games--
like the ESP game, that are kind of like algorithms--
I want to refer to them as games with a purpose.
And the idea that I want you to have in your mind is that
games with a purpose is like running a computation in
people's brains instead of silicon processors.
And what I want to do now is give you other examples of games with a purpose.
So the next problem that I'm going to talk about is that of
locating objects in images.
On input an arbitrary image, the ESP game tells us what
objects are in the image, but it does not tell us where in
the image each object is located.
So what we would like to know is, we would like to know,
yes, there's a man in the image, but the
man is right there.
There's a plant in the image, but the plant is right there.
And not only that.
We would like to know precisely which pixels belong to the man, which pixels belong to the plant, et cetera.
And we would like to have this information for a large
percentage of the images on the web.
If we could have this information, we could do a lot
of really cool things.
For instance, we could have an image search engine where the results are highlighted.
It tells you this is where the man is in
each one of your images.
That would be pretty cool.
But even better, if we have this information for a lot of
images, we could use this for training computer vision
algorithms. So computer vision has advanced significantly
over the last 20 or 40 years, but so far, it hasn't been
able to create a program that can, with high probability,
figure out where in the image each object is located.
And one of the major stumbling blocks is the
lack of training data.
But if we had this data for a lot of images on the web, we
could use it to train better computer vision algorithms.
So this is what the next game is going to do, and the next
game is called Peekaboom.
And here's how it works.
It's a two player game.
Much like the ESP game, both players don't know anything
about each other, and they can't
communicate with each other.
At the beginning of every round-- oh, by the way, the player on the left, we're going to call him "Peek." The player on the right, we're going to call him "Boom." So Peek and Boom.
At the beginning of every round, Boom gets an image
along with a word.
In this case, it's the image of a butterfly and the word is
"butterfly." That image word pair comes directly
from the ESP game.
Peek, at the beginning of every round, gets nothing.
Just a completely blank screen.
And the goal of the game is for Boom to get Peek to guess
the word "butterfly." And the only thing that Boom can do to
help Peek guess the word "butterfly" is he can take his
mouse, put it somewhere in the image, and click.
And whenever Boom clicks, a circular area around that
click is revealed to Peek.
The actual circular area is a lot smaller
than the one I revealed.
I just didn't want to go through all the clicks.
But basically, when Boom clicks, a circular area around
the click is revealed to Peek.
And then Peek, given only the circular areas, has to guess
what word Boom is trying to make them guess.
Whenever Peek guesses the correct word, both players get
a lot of points, and then they switch roles.
Peek becomes Boom, and Boom becomes Peek.
Now notice, in this case the word was "butterfly," so Boom
clicked on the butterfly.
But had the word been "flower," Boom would have
clicked on the flower.
So by just watching where Boom clicks, we get information
about where each object is located in every image.
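A minimal sketch of how the click stream could be turned into location data, under the illustrative assumption that clicks are (x, y) pixel coordinates:

```python
def object_bounding_box(clicks, radius=20):
    """Approximate an object's location from Boom's clicks: each click
    reveals a circle of the given radius to Peek, and the union of
    those circles sits on the object. A bounding box around the
    clicked circles is a crude location estimate (the radius and the
    box heuristic are illustrative, not Peekaboom's actual values)."""
    xs = [x for x, _ in clicks]
    ys = [y for _, y in clicks]
    return (min(xs) - radius, min(ys) - radius,
            max(xs) + radius, max(ys) + radius)
```

Pings would refine this further: a pointed-at coordinate is an even stronger signal that the object is exactly there.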
By the way, I'm brushing over a lot of details.
For instance, there are also hints.
So Boom can give hints to Peek about whether the word is a
noun, a verb, text in the image, et cetera.
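As an aside, the click-and-reveal mechanic described above is simple to sketch in code. This is a minimal illustration, not the real implementation; the function names and the reveal radius are assumptions.

```python
REVEAL_RADIUS = 20  # pixels revealed around each click (assumed value)

def is_revealed(pixel, clicks, radius=REVEAL_RADIUS):
    """A pixel is visible to Peek if it lies within `radius` of any of Boom's clicks."""
    px, py = pixel
    return any((px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2
               for cx, cy in clicks)

def object_bounding_box(clicks, radius=REVEAL_RADIUS):
    """Estimate the object's location as the box covering all revealed circles."""
    xs = [x for x, _ in clicks]
    ys = [y for _, y in clicks]
    return (min(xs) - radius, min(ys) - radius,
            max(xs) + radius, max(ys) + radius)
```

The key point is that the same click log serves double duty: it drives what Peek sees, and it records where the object is.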
Now, just to make things more clear, let's play a couple
rounds of Peekaboom.
So you guys have to guess what I'm trying to make you guess.
Here we go.
So now?
"Bush," awesome.
OK, you got it.
Bush.
Here's another one.
It's a verb.
"Pick." OK, very nice.
I love this image.
So, imagine we were back here, and I gave
you a different hint.
I told you it was a noun, and not only that, I started
pointing there.
What would you say this is?
Hair, exactly.
So this is another mechanism of Peekaboom, and it's
something called pings.
So not only can Boom reveal part of the image.
After something has been revealed, he can also point
somewhere, saying, it's this, it's this.
This gives us extra information about where each
object is located in the image.
So this is the basic idea of Peekaboom.
This is what the Peekaboom screen looks like for one of
the players.
This is for the Boom player.
And now the first question is, is this game fun?
Well, it turns out it really is a lot of fun.
By the way, the statistics I'm going to show you right now
are a little outdated.
This is just for the first four months of gameplay.
So in the first four months of gameplay, 27,000 players gave
us 2.1 million pieces of data.
By a piece of data, I mean an image along with a word
correctly analyzed by a pair of players.
In the first 10 days after release, actually many people
played over 120 hours.
That's an average of over 12 hours a day.
So it's a lot of fun.
Here's the top scores list of Peekaboom, just to put things
in perspective.
This is for the first four months of gameplay.
Each time you play Peekaboom, you get on average 800 points.
So the top player there has 3.3 million points.
Even the lowest player in this list, in the first four
months, has played at least 270 hours of gameplay.
So people really love this game.
Now what about the data that it produces?
Is it any good, or how do we get
good data out of Peekaboom?
So let me explain how we get good data out of Peekaboom.
This is an image of Ronald.
The word is "Ronald." By the way, I love this image.
And the last three images were collected by searching for the
word "funny" using the ESP game.
So we get an image of Ronald, a word "Ronald," here's what
we do to get good data out of this.
We give the same image word pair to a
bunch of pairs of players.
And from each pair of players, we get a region of the image
that is related to the word.
Now we take all of these regions and intelligently
combine them, and get a really good idea of where the object
is located in the image.
And on top of that, we can add sort of where the pings are to
get more information about what the most
salient parts are.
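One plausible way to "intelligently combine" the regions from many player pairs is simple per-pixel voting: keep the pixels that a majority of pairs agreed on. The talk doesn't specify the actual combination algorithm, so the threshold and names here are assumptions.

```python
def combine_regions(regions, min_votes):
    """regions: a list of sets of (x, y) pixels, one set per player pair.
    Returns the pixels covered by at least `min_votes` of the regions."""
    votes = {}
    for region in regions:
        for pixel in region:
            votes[pixel] = votes.get(pixel, 0) + 1
    return {p for p, v in votes.items() if v >= min_votes}
```

Pings could be folded in the same way, by weighting pixels near a ping more heavily.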
And we can go even one step further.
We can take this information and combine it with image
segmentation algorithms to get pretty much the precise
outline of where the object is in the image.
Now, I'll say, this doesn't work for all
objects in the images.
It works like that perfectly for about 50% of the objects
in the images that we have data for.
For the rest, it works mostly, but it can miss
like a foot or something.
But even without using segmentation, we could just
use the Peekaboom data in a really, really boneheaded way
to come up with a search engine in which the results
are sort of highlighted.
And we've done this.
We have a search engine where you can search for "man,"
"dog." And for each image, it tells you here's the man,
here's the dog, here's the man, here's the dog.
And more man, more dog.
OK.
Forget about Peekaboom.
Brand new game, Verbosity.
So this next game that I'm going to talk about, by the
way, has not yet been released, so I'm not going to
be able to show you any statistics.
But I'm just going to quickly explain what the idea is.
So what does Verbosity do?
The idea is it collects common-sense facts.
So what's a common-sense fact?
Here's an example of a common-sense fact.
Water quenches thirst. It's a true fact
that everybody knows.
Here's another common-sense fact.
Cars usually have four wheels.
Now, the thing about common-sense facts is that it
is estimated that each one of us has literally hundreds of
millions of them in our head.
And these are what allow us to act normal and navigate our
world successfully.
The other thing about common-sense facts is that
computers don't yet have them.
But if we could somehow put common-sense facts into
computers, we could potentially make them more
intelligent.
And I'm not even talking about making computers as
intelligent as humans.
Just a little more intelligent.
Like for instance, transforming our search query
into something better, that works better, or
something like that.
So if we could somehow collect a lot of common-sense facts
and put them into a computer, we could potentially use this
to make computers more intelligent.
And in fact, there's been a lot of projects that have
tried to do this, including one at MIT, and so far they
haven't been able to collect enough common-sense facts in
order to really make a difference, because the
process of entering common-sense facts into a
computer is extremely tedious.
So we're going to turn this into a game.
So for the next game that I'm going to talk about, the
input-output behavior of this game is as follows.
On input a word, this game is going to output a set of
common-sense facts about that word.
By the way, I'm oversimplifying here.
These common-sense facts are not just going to be
common-sense facts in English.
They're going to have some structure to them.
So there's going to be logical operators
inside them, et cetera.
So this is the input-output behavior.
On input a word, it's going to give common-sense
facts about that word.
And the way the game is going to work--
game called Verbosity--
and the way it's going to work is as follows.
It's a two-player word guessing game.
There's two players, a Narrator and a Guesser.
Same idea as the ESP game.
Basically, both players can't communicate with each other.
They don't know anything about each other.
At the beginning of every round, the Narrator gets a
word and has to get the Guesser to guess that word.
And what the Narrator can do to get the Guesser to guess
that word is he can pick one among many sentence templates
that they have. Which sentence templates are available to
them at the time varies depending on the word.
So he can pick one among many sentence templates, and fill
it with an appropriate word.
An appropriate word is a word that's not the secret word
itself, in this case "milk," and it's also a word that fits
in grammatically with the sentence template.
Whenever the sentence template is filled in,
it's sent to the Guesser.
Then the Narrator can pick another sentence template,
fill it with an appropriate keyword, and
send it to the Guesser.
And the Guesser, given enough hints about it, eventually has
to guess what word it is, and whenever the Guesser guesses
the correct word, both players get points.
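The Narrator's side of a round can be sketched roughly like this. The templates, dictionary, and validation rules below are illustrative assumptions, not the game's real data or code.

```python
DICTIONARY = {"cow", "white", "cereal", "fridge", "drink"}

TEMPLATES = ["it comes from a ___", "it is usually ___", "you ___ it"]

def valid_hint(secret_word, template, fill):
    """Check that the fill obeys the rules (a real dictionary word, and not
    the secret word itself), then render the hint sent to the Guesser."""
    if fill == secret_word or fill not in DICTIONARY:
        return None
    return template.replace("___", fill)
```

Each rendered hint is exactly the kind of templated, structured common-sense fact the game harvests: the template tells you the relation, the fill tells you the argument.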
The way we get common-sense facts out of this game is by
just watching what the Narrator says for each word.
By the way, I'm brushing over a lot of
details for this game.
This is just the basic idea, so the high-level idea is it's
a two-player game.
Player one and player two.
At the beginning of every round, player one gets a word,
and because of the rules of the game, has to give some
common-sense facts about the word.
Then those common-sense facts are sent to player two, and
player two, given only the common-sense facts, has to
guess what word player one got as input.
And if player two can guess the correct word, both players
get points.
This is the core mechanism of Verbosity.
Now, I want you to notice two things
about this core mechanism.
First, it's fun.
This is very similar to the core mechanism of a lot of
popular party games.
Basically, just word guessing games.
Second, this core mechanism actually gives output that is
already, in a way, verified.
Notice, we're getting all the common-sense
facts from player one.
But what's player two doing?
In a way, player two is verifying the output.
Because if player two can guess the word given only the
common-sense facts, then those common-sense facts must have
something to do with the word.
So in a way, it's giving output
that is already verified.
And the same core mechanism is exactly the same core
mechanism that was used in Peekaboom.
So in the case of Peekaboom, it's a two-player game.
Player one and player two.
At the beginning of every round, player one gets an
image along with a word, then has to give a region of the
image that is related to the word.
Then that region is sent to player two, and player two,
given only the region, has to guess what word
player one got as input.
The same mechanism as Verbosity.
And again, it's fun, and also gives output that is, in a
way, verified.
We're going to call all games that satisfy this
mechanism asymmetric verification games.
So this is a general mechanism for building
games with a purpose.
So in general, for an arbitrary input-output
behavior, we could define a game as follows.
It's a two-player game.
We give the input to player one and have
them give an output.
Then we send the output to player two, and given only the
output, player two has to guess what
input player one got.
If player two can guess the correct input, both players
get points.
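That generic mechanism can be written down almost verbatim. The player "strategies" here are stand-in functions, since in the real games the players are humans.

```python
def play_asymmetric_round(secret_input, player_one, player_two):
    """One round of a generic asymmetric verification game.
    Returns (output, verified): the output counts as verified exactly
    when player two recovers the original input from it alone."""
    output = player_one(secret_input)   # player one maps input to output
    guess = player_two(output)          # player two tries to invert it
    verified = (guess == secret_input)  # points awarded only on success
    return output, verified
```

For Peekaboom, `player_one` produces a region from an image-word pair; for Verbosity, it produces common-sense facts from a word. Either way, player two's correct guess is what certifies the output.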
This mechanism has two very nice properties, that for a
lot of input-output behaviors, it's fun, and also, it gives
output that is, in a way, verified.
Of course, this doesn't work for all input-output
behaviors, but it works for a large class of them.
And these are asymmetric verification games, and it's
asymmetric because both players are doing something
slightly different than each other.
And it's also asymmetric, as opposed to symmetric
verification games; you've already seen an example
of a symmetric verification game, and that's the ESP game.
So this is another general mechanism for creating games
with a purpose.
So for an arbitrary input-output behavior, you can
give both players the same input, and ask them to guess
what output the other player is going to give.
So if they both give the same output, they get points.
Again, this mechanism is fun for a lot of input-output
behaviors, and also has the property that the output it
gives is, in a way, verified because it comes from two
independent sources.
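A symmetric round, ESP-game style, then reduces to intersecting the two players' independent guesses. Again the player functions are stand-ins for human players.

```python
def play_symmetric_round(shared_input, player_a, player_b):
    """Both players see the same input and produce candidate outputs.
    Returns the set of agreed outputs; each is verified in the sense
    that it came from two independent sources."""
    return set(player_a(shared_input)) & set(player_b(shared_input))
```

Note how this sketch exposes the constraint discussed next: if an input admits too many plausible outputs, the intersection is usually empty and no one scores.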
And now, we can start looking at the differences between
symmetric and asymmetric verification games.
So for instance, symmetric verification games, I claim,
put a constraint on the number of inputs per output.
The number of outputs per input, sorry.
If a given input has too many outputs, then a symmetric
verification game is never going to work, because both
players are never going to agree on the same output.
Asymmetric verification games put a constraint on the number
of inputs that yield the same output.
If there's too many inputs that yield the same output,
then given only the output, you'll never be able to guess
what input it came from.
I'm going to finish, now.
Hopefully, I've been able to convince you that there's a
lot of power in looking for clever ways of
utilizing human cycles.
In fact, if you think about it, this talk hints at a
paradigm for dealing with open problems in artificial
intelligence.
If you have something that you really can't solve in
artificial intelligence, then maybe you can turn the problem
into a test that distinguishes humans from computers.
Turns out that being able to do so has some very nice
applications in practice.
Or alternatively, maybe you can turn the problem into a
game, in which case you don't even need to solve your
problem anymore.
People will solve it for you.
One nice thing about this whole research agenda is that
it provides a much better motivation for
the movie The Matrix.
If you think about it, the motivation for the movie was
that in the future, computers become a lot more intelligent
than humans.
But rather than killing us, they actually have to keep us
around, because we generate power.
That makes no sense.
A much better motivation would be in the future, computers
become a lot more intelligent than humans.
But rather than killing us, they actually have to keep us
around, because there's a couple of problems that we can
solve that they cannot yet solve.
My ultimate research goal is to transform our human
existence to just eating, sleeping, drinking, playing--
never mind.
[LAUGHTER]
www.captcha.net.
www.espgame.org.
peekaboom.org, and that's it.
Thank you.
[APPLAUSE]
Yes?
AUDIENCE: Does it concern you at all that the fact that
you're using a game will automatically give you a very
biased population of people that are giving answers to the
problems we want answers to?
And this population of people are the people that have way
more time on their hands, and are not motivated to maybe get
a job or do something [UNINTELLIGIBLE]?
[LAUGHTER]
LUIS VON AHN: Very good question.
It's true that the population is biased.
There's no question about that.
But for a lot of really simple things, I mean,
anybody can do it.
But it's true that the population is biased.
That's definitely true.
AUDIENCE: Have you seen any results?
LUIS VON AHN: I can tell you that the population is biased,
but I have not seen anything that really can tell me that
because we're using gamers, this is happening instead of
with the general population.
I have not seen that.
Yes?
AUDIENCE: I have a concern with asymmetric games where
the input is very similar to the [UNINTELLIGIBLE].
For example, when you said milk is close to cereal.
It's like a fraud question.
What if someone types in milk and I come up with pail--
P-A-I-L. I think it would be very obvious for his partner
to guess which question to ask.
LUIS VON AHN: Sure.
I didn't mention a lot of the mechanisms that we use to stop
that sort of cheating, but there's a lot of mechanisms.
For instance, we don't let them type anything that's not
a dictionary word.
Second, that word has to fit in with the template,
grammatically.
But still, I mean, there's a lot of mechanisms that try to
prevent that.
But you're right, that's a concern.
AUDIENCE: So I think the popularity of a lot of games
and these games in particular are [UNINTELLIGIBLE].
They're novel and different.
This is a new thing, let's try it out.
We might spend 100, 120 hours on this.
There was a site I remembered called Am I Hot or Not?
a few years ago.
Maybe I'm confessing something I shouldn't confess.
But you spend a few hours.
And 5 years from now the game won't exist. The question is,
if you view this as a strategic shift in how we use
human cycles, you're kind of hindered by the fact that this
will probably die out within a few months.
LUIS VON AHN: The answer to that is yes and no.
So, there are games whose popularity lasts for
thousands of years.
And there's a lot of these gaming sites that have games
that the popularity only lasts like six months or a year.
And what they do is very simple.
They have the same game concept and just redress it
with another name and something else, and all the
people come back.
This is also well known to nightclub designers.
Just change the name.
But it is true that popularity does die, but that is
completely game-dependent.
Some games, the popularity lasts longer than others.
So the ESP game has been running for well over two
years now, and the popularity has not died.
I mean, there was definitely an initial surge,
but it has not died.
So the amount of time that it works varies, and hopefully we
can find games that last for thousands of years.
Yes?
AUDIENCE: Towards the beginning of the talk, you
talked about accessibility and how vision impaired people
[UNINTELLIGIBLE] screen reader.
But I don't really see how these games
close the loop on that.
LUIS VON AHN: That's a very good question.
The way I explained, the ESP game only gives you keywords.
That's not quite enough for accessibility.
It's better than nothing, but it's not quite enough.
AUDIENCE: But you're not putting them
back into the websites.
LUIS VON AHN: Very good point.
I see what you're talking about.
But that's an engineering problem.
You could actually do that just with a server that they
connect to.
FEMALE SPEAKER: Plug in an extension to the browser.
Something like that.
LUIS VON AHN: Sure.
FEMALE SPEAKER: Some people are better than others in any
game, so can you take people from the opposite end of the
spectrum, and there will be people who try
to dissect your brain.
And the really bad people, you try to see how they improve.
LUIS VON AHN: I don't understand.
Say that again?
AUDIENCE: There's a spectrum of ability in any game.
So we can look at either end of a spectrum, and find the
really good people, and study what algorithm they use.
The really bad people, when they improve, see if
[UNINTELLIGIBLE] to your algorithm.
LUIS VON AHN: Right.
You can do that.
Yes, I agree.
Yes?
AUDIENCE: So you said you've been running this game for
two years now, which means you must have an
obscene amount of data.
LUIS VON AHN: Yes and no.
I do have an obscene amount of data, but I recycle the
images, because I just don't want to have that many images.
So there are 39 million labels, but it's
not that many images.
AUDIENCE: You said the facts you get out of Verbosity are
not simply English sentences, but have some more logical
structure to them.
I wonder if you could say a few more words about that.
LUIS VON AHN: The reason is because of the templates.
We have templates.
We don't just let people free flow write English.
We have templates.
So out of those templates, we know things like, well, this
is used for this purpose.
So things like that.
AUDIENCE: Let's say I have a really boring job, like I'm
looking for defects in a manufacturing process.
How do I turn that into a fun game?
[LAUGHTER]
LUIS VON AHN: That's a very good question.
That would be really cool if we could figure out how to do
that for everything.
I don't know how to do that for everything.
I don't even know if it's possible to do it for
everything.
But that'd be really cool if we could figure out how to do
it for everything.
I don't know if it's possible to do it for everything.
AUDIENCE: From an ethical point of view, is there any
problem that people would probably be spending their
work hours playing the game, rather than
their free time hours?
And so you're not really gaining any productivity in
society as a whole.
LUIS VON AHN: Well, depending.
I mean, you're right about that.
But imagine we could turn everybody's work into
something fun.
That'd be really cool.
So, depending.
But one thing I should say about ethical is all these
games, they don't try to trick you into doing anything.
I mean, everybody knows what the purpose is.
AUDIENCE: Maybe "ethical" is the wrong word.
LUIS VON AHN: Right.
Yes?
AUDIENCE: A lot of these games are very good at getting basic
facts out of people.
Have you thought about how to get stuff that's a little bit
more nuanced?
Like if you leave milk in the fridge for three weeks, it's
going to go bad?
LUIS VON AHN: That's a very good question.
I mean, it depends a lot on the particular domain.
I don't know how to do it in general, but for instance, for
images, I can tell you.
The ESP game for images, most of the stuff that you get out
of the general ESP game is very general stuff.
I mean, the first word is going to be like "dog." Then
once "dog" becomes a taboo word, it's probably going to
be the breed of the dog, or something.
But very generally, usually things that everybody knows.
If you want to start getting things that only a few people
know, then you can do a few things.
So for instance, you can have people tell you what they want
to see images of.
So for instance, I like cars.
Can I see images of cars?
Then I'll be an expert on that sort of thing, and then you
can do that better.
Or you can use collaborative filtering
to try to give people--
you figure out what they're good at, and you give them
more images like that.
And so you can start getting better things like that.
But yeah, that's a very good question.
Yes?
AUDIENCE: It seems like you could also use these games to
solve problems that computers are already good at solving,
like you could have people add up numbers,
of things like that.
But those games likely would not be very fun.
LUIS VON AHN: They might be.
So like Sudoku.
AUDIENCE: I guess my question is-- right, but that's like a
constraint propagation problem that is a little
bit harder to solve.
LUIS VON AHN: Sure.
But given that Sudoku's human solved, computers are a lot
better at those.
AUDIENCE: Have you or anyone thought about the cognitive
aspects of games that are fun in this
model, versus that aren't?
And the computational models that are associated with it?
It seems like there's a lot of human
cognition interests there.
LUIS VON AHN: Yeah, definitely.
So part of the problem--
I should say two things.
There's a lot of research on trying to define and figure
out how to make things more fun.
Non-general computational things, but just how to make
games more fun.
But nobody really knows the answer to this.
I mean, this is an open problem.
AUDIENCE: In the future, if the market becomes competitive
do you think you'll have to start paying people money?
LUIS VON AHN: I don't know.
AUDIENCE: It depends on how much you're
making out of that.
LUIS VON AHN: Yeah, I don't know.
Yeah?
AUDIENCE: In the asymmetric verification games, how do you
eliminate the case where the first person makes a mistake,
and that mistaken output is sent to the second player, and
then, after more output from the first guy is sent
to the second guy, they get the right answer?
How do you know what--
LUIS VON AHN: The way to do that is by using the
single-player game.
We take all these facts that we get and treat them all
separately, and sometimes you're just playing with a
computer and we're giving you certain facts that
we want you to verify.
And eventually, you just try to intersect which ones are
good and which ones are not, and you try to figure out.
Yeah, all the way in back.
AUDIENCE: How and when is the concept of the game presented
to the players?
LUIS VON AHN: How and when?
AUDIENCE: Yeah.
Like before they start playing or after they finish, you just
say, oh, by the way--
LUIS VON AHN: Oh, no.
Before.
Beforehand.
Beforehand.
Yeah.
Yes?
AUDIENCE: With the security issues now, with things like
Flickr or Picasa Web, where the images are controlled by a
specific entity?
LUIS VON AHN: What do you mean?
AUDIENCE: I could see implications coming from this
such that, you know, well, we want to provide all these
results, but we're going to basically try and have a
monopoly on the best images available.
Which seems kind of anti--
I don't know.
It just seems like it would be more--
people trying to get more control over content.
LUIS VON AHN: I don't understand what you mean.
There are problems with copyright.
I mean, Google knows about that.
So, yeah, there are problems with that.
But I don't know what else.
I mean--
Yes?
AUDIENCE: So, this talk was about generalizing past ESP
games to all kinds of things with human computation.
So you've looked at a bunch of these now.
From what I can tell so far, it's opportunistic.
Given a new task where you know people are better than
computers, is there some procedure for coming to figure
out what the right game is to get at that?
LUIS VON AHN: That would be great.
But I mean, in the same way that, for instance, if I give
you a new task and you have to come up with an efficient
algorithm for it, there's no procedure to coming up with an
efficient algorithm to solve something.
I don't think there will be a procedure, given a problem,
here's a game for it.
I think it's going to be an art, much like coming up with
efficient algorithms.
AUDIENCE: What does that mean for your research strategy and
research agenda?
Is it to just continue to find,
opportunistically, more of these?
LUIS VON AHN: Yes.
So basically, it's similar to what happens
in algorithm design.
I mean, people try to come up with general things.
So there's things like dynamic programming that works for a
lot of things.
So that's the best you can hope for, and that's sort of
what I'm trying to do.
But yeah, I don't think there'll ever be--
well, I don't know.
I'm not holding my breath that there'll ever be a method that
will just, given a problem, output a game.
MALE SPEAKER: OK, I'll ask if nobody else wants to.
So are you worried about the interface
between these two things?
Like, does the existence of these games and their
popularity reduce the value of the CAPTCHAs?
LUIS VON AHN: Oh, the CAPTCHA.
Yeah, yeah.
You can use these games to break the CAPTCHAs.
Yeah, definitely.
[LAUGHTER]
It's good to do research that breaks each other.
[LAUGHTER]
MALE SPEAKER: Thanks, Luis.
[APPLAUSE]