A Fireside Chat with Turing Award Winner Geoffrey Hinton, Pioneer of Deep Learning (Google I/O '19)

  • [MUSIC PLAYING]

  • NICHOLAS THOMPSON: Hello, I'm Nicholas Thompson.

  • I'm the editor in chief of "Wired."

  • It is my honor today to get the chance

  • to interview Geoffrey Hinton.

  • There are a couple-- well, there are

  • many things I love about him.

  • But two that I'll just mention in the introduction.

  • The first is that he persisted.

  • He had an idea that he really believed in

  • that everybody else said was bad.

  • And he just kept at it.

  • And it gives a lot of faith to everybody who has bad ideas,

  • myself included.

  • Then the second, as someone who spends half his life

  • as a manager adjudicating job titles,

  • I was looking at his job title before the introduction.

  • And he has the most non-pretentious job

  • title in history.

  • So please welcome Geoffrey Hinton, the engineering fellow

  • at Google.

  • [APPLAUSE]

  • Welcome.

  • GEOFFREY HINTON: Thank you.

  • NICHOLAS THOMPSON: So nice to be here with you.

  • All right, so let us start.

  • 20 years ago, when you wrote some of your early, very influential

  • papers, everybody started to say, it's a smart idea,

  • but we're not actually going to be able to design computers

  • this way.

  • Explain why you persisted, why you were so confident that you

  • had found something important.

  • GEOFFREY HINTON: So actually it was 40 years ago.

  • And it seemed to me there's no other way the brain could work.

  • It has to work by learning the strengths of connections.

  • And if you want to make a device do something intelligent,

  • you've got two options.

  • You can program it, or it can learn.

  • And we certainly weren't programmed.

  • So we had to learn.

  • So this had to be the right way to go.

  • NICHOLAS THOMPSON: So explain, though--

  • well, let's do this.

  • Explain what neural networks are.

  • Most of the people here will be quite familiar.

  • But explain the original insight and how

  • it developed in your mind.

  • GEOFFREY HINTON: So you have relatively simple processing

  • elements that are very loosely models of neurons.

  • They have connections coming in.

  • Each connection has a weight on it.

  • That weight can be changed to do learning.

  • And what a neuron does is take the activities

  • on the connections times the weights, adds them all up,

  • and then decides whether to send an output.

  • And if it gets a big enough sum, it sends an output.

  • If the sum is negative, it doesn't send anything.

  • That's about it.

  • And all you have to do is just wire up

  • a gazillion of those with a gazillion squared weights

  • and just figure out how to change the weights,

  • and it'll do anything.

  • It's just a question of how you change the weights.
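
The unit Hinton describes maps almost directly onto a few lines of code. Below is a minimal sketch (not from the talk) of one such model neuron: incoming activities times their weights, summed, with a threshold deciding whether an output is sent.

```python
import numpy as np

def neuron(inputs, weights, bias=0.0):
    """One crude model neuron: weighted sum of the incoming activities,
    then a threshold.  Sends an output (1.0) if the sum is big enough,
    otherwise stays silent (0.0) -- roughly the behavior described above."""
    total = np.dot(inputs, weights) + bias
    return 1.0 if total > 0 else 0.0

# Example: three incoming connections with (learned) weights.
print(neuron(np.array([0.5, 1.0, 0.2]), np.array([0.4, -0.1, 0.9])))
```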

  • NICHOLAS THOMPSON: So when did you

  • come to understand that this was an approximate representation

  • of how the brain works?

  • GEOFFREY HINTON: Oh, it was always designed as that.

  • NICHOLAS THOMPSON: Right.

  • GEOFFREY HINTON: It was designed to be like how the brain works.

  • NICHOLAS THOMPSON: But let me ask you this.

  • So at some point in your career, you

  • start to understand how the brain works.

  • Maybe it was when you were 12.

  • Maybe it was when you were 25.

  • When do you make the decision that you

  • will try to model computers after the brain?

  • GEOFFREY HINTON: Sort of right away.

  • That was the whole point of it.

  • The whole idea was to have a learning device that

  • learned like the brain, the way people

  • think the brain learns, by changing connection strengths.

  • And this wasn't my idea.

  • Turing had the same.

  • Turing, even though he invented a lot

  • of the basis of standard computer science,

  • he believed that the brain was this unorganized device

  • with random weights.

  • And it would use reinforcement learning

  • to change the connections.

  • And it would learn everything, and he

  • thought that was the best route to intelligence.

  • NICHOLAS THOMPSON: And so you were following Turing's idea

  • that the best way to make a machine is to model it

  • after the human brain.

  • This is how a human brain works.

  • So let's make a machine like that.

  • GEOFFREY HINTON: Yeah, it wasn't just Turing's idea.

  • Lots of people thought that back then.

  • NICHOLAS THOMPSON: All right, so you have this idea.

  • Lots of people have this idea.

  • You get a lot of credit.

  • In the late '80s, you start to come

  • to fame with your published work, is that correct?

  • GEOFFREY HINTON: Yes.

  • NICHOLAS THOMPSON: When is the darkest moment?

  • When is the moment where other people who

  • have been working who agreed with this idea from Turing

  • start to back away and yet you continue to plunge ahead?

  • GEOFFREY HINTON: There were always

  • a bunch of people who kept believing in it, particularly

  • in psychology.

  • But among computer scientists, I guess

  • in the '90s, what happened was data sets were quite small.

  • And computers weren't that fast.

  • And on small data sets, other methods, like things

  • called support vector machines, worked a little bit better.

  • They didn't get confused by noise so much.

  • And so that was very depressing because we developed back

  • propagation in the '80s.

  • We thought it would solve everything.

  • And we were a bit puzzled about why it didn't solve everything.

  • And it was just a question of scale.

  • But we didn't really know that then.

  • NICHOLAS THOMPSON: And so why did

  • you think it was not working?

  • GEOFFREY HINTON: We thought it was not

  • working because we didn't have quite the right algorithms.

  • We didn't have quite the right objective functions.

  • I thought for a long time it's because we

  • were trying to do supervised learning

  • where you have to label data.

  • And we should have been doing unsupervised learning, where

  • you just learn from the data with no labels.

  • It turned out it was mainly a question of scale.

  • NICHOLAS THOMPSON: Oh, that's interesting.

  • So the problem was you didn't have enough data.

  • You thought you had the right amount of data,

  • but you hadn't labeled it correctly.

  • So you just misidentified the problem?

  • GEOFFREY HINTON: I thought that using labels at all

  • was a mistake.

  • You would do most of your learning

  • without making any use of labels just

  • by trying to model the structure in the data.

  • I actually still believe that.

  • I think as computers get faster, for any given size data set,

  • if you make computers fast enough,

  • you're better off doing unsupervised learning.

  • And once you've done the unsupervised learning,

  • you'll be able to learn from fewer labels.

  • NICHOLAS THOMPSON: So in the 1990s,

  • you're continuing with your research.

  • You're in academia.

  • You are still publishing, but it's not coming to acclaim.

  • You aren't solving big problems.

  • When do you start--

  • well, actually, was there ever a moment

  • where you said, you know what, enough of this.

  • I'm going to go try something else?

  • GEOFFREY HINTON: Not really.

  • NICHOLAS THOMPSON: Not that I'm going to go sell burgers,

  • but I'm going to figure out a different way of doing this.

  • You just said we're going to keep doing deep learning.

  • GEOFFREY HINTON: Yes, something like this has to work.

  • I mean, the connections in the brain are learning somehow.

  • And we just have to figure it out.

  • And probably there's a bunch of different ways of learning

  • connection strengths.

  • The brain's using one of them.

  • There may be other ways of doing it.

  • But certainly, you have to have something that can learn

  • these connection strengths.

  • And I never doubted that.

  • NICHOLAS THOMPSON: OK, so you never doubt it.

  • When does it first start to seem like it's working?

  • OK, you know, we've got this.

  • I believe in this idea, and actually, if you look at that,

  • if you squint, you can see it's working.

  • When did that happen?

  • GEOFFREY HINTON: OK, so one of the big disappointments

  • in the '80s was if you made networks

  • with lots of hidden layers, you couldn't train them.

  • That's not quite true because convolutional networks designed

  • by Yann LeCun, you could train for fairly simple tasks

  • like recognizing handwriting.

  • But most of the deep nets, we didn't know how to train them.

  • And in about 2005, I came up with a way

  • of doing unsupervised training of deep nets.

  • So you take your input, say your pixels,

  • and you'd learn a bunch of feature detectors

  • that were just good at explaining why the pixels were

  • behaving like that.

  • And then you treat those feature detectors as the data

  • and then you learn another bunch of feature detectors.

  • So you get to explain why those feature detectors have

  • those correlations.

  • And you keep learning layers and layers.

  • And what was interesting was you could do some math

  • and prove that each time you learned another layer,

  • you didn't necessarily have a better model of the data,

  • but you had a bound on how good your model was.

  • And you could get a better bound each time

  • you added another layer.
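
A rough sketch of the layer-by-layer idea described above. Hinton's 2005 method stacked restricted Boltzmann machines; the sketch below substitutes a simple reconstruction-trained (autoencoder) layer, which keeps the stacking pattern -- learn feature detectors that explain the layer below, then treat those detectors as data for the next layer -- without the probabilistic machinery or the bound. All sizes and data here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_layer(data, n_hidden, lr=0.05, epochs=200):
    """Learn one layer of feature detectors that can reconstruct its input
    (a stand-in for the RBM training Hinton actually used)."""
    n_vis = data.shape[1]
    W1 = rng.normal(0, 0.1, (n_vis, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_vis))
    b2 = np.zeros(n_vis)
    for _ in range(epochs):
        h = sigmoid(data @ W1 + b1)          # feature detector activities
        recon = h @ W2 + b2                  # attempt to explain the data
        d_r = 2 * (recon - data) / len(data) # gradient of squared error
        d_h = (d_r @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_r
        b2 -= lr * d_r.sum(axis=0)
        W1 -= lr * data.T @ d_h
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1

# Stack layers greedily: each layer's feature detectors become the "data"
# that the next layer has to explain.
pixels = rng.random((500, 64))               # stand-in for e.g. digit images
layers, x = [], pixels
for n_hidden in (32, 16):
    W, b = train_layer(x, n_hidden)
    layers.append((W, b))
    x = sigmoid(x @ W + b)
```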

  • NICHOLAS THOMPSON: What do you mean you had a bound on how good

  • your model was?

  • GEOFFREY HINTON: OK, so once you got a model,

  • you can say how surprising does a model find this data?

  • You showed some data and you say,

  • is that the kind of thing you believe in or is

  • that surprising?

  • And you can sort of measure something that says that.

  • And what you'd like to do is have a model,

  • a good model is one that looks at the data and says yeah,

  • I knew that.

  • It's unsurprising, OK?

  • And it's often very hard to compute

  • exactly how surprising this model finds the data.

  • But you can compute a bound on that.

  • You can say this model finds the data less surprising than this.

  • And you could show that, as you add extra layers of feature

  • detectors, you get a model.

  • And each time you add a layer, the bound

  • on how surprising it finds the data

  • gets better.

  • NICHOLAS THOMPSON: Oh, I see.

  • OK, so that makes sense.

  • So you're making observations, and they're not correct.

  • But you know they're closer and closer to being correct.

  • I'm looking at the audience.

  • I'm making some generalization.

  • It's not correct, but I'm getting better and better

  • at it, roughly?

  • GEOFFREY HINTON: Roughly.

  • NICHOLAS THOMPSON: OK, so that's about 2005

  • where you come up with that mathematical breakthrough?

  • GEOFFREY HINTON: Yeah.

  • NICHOLAS THOMPSON: When do you start getting answers

  • that are correct and what data are you working on?

  • This is speech data where you first have your breakthrough.

  • GEOFFREY HINTON: This was just handwritten digits.

  • Very simple data.

  • Then around the same time, they started developing GPUs.

  • And the people doing neural networks

  • started using GPUs in about 2007.

  • I had one very good student called

  • Vlad Mnih, who started using GPUs for finding roads

  • in aerial images.

  • He wrote some code that was then used by other students

  • for using GPUs to recognize phonemes in speech.

  • And so they were using this idea pre-training.

  • And after they'd done all this pre-training,

  • then they'd just stick labels on top and use back propagation.

  • And it turned out that way, you could have a very deep net

  • that was pre-trained this way.

  • And you could then use back propagation,

  • and it actually worked.

  • And it sort of beat the benchmarks

  • for speech recognition initially just by a little bit.
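
A sketch of the "stick labels on top" step described above. It assumes `features` are the top-layer activities of a pre-trained stack (for example, `x` from the earlier sketch) and `labels` are class indices such as phonemes or digits; for brevity only the new output layer is trained here, whereas the full recipe backpropagates the label error into the pre-trained layers as well.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune(features, labels, n_classes, lr=0.1, epochs=300):
    """Stick a label layer on top of pre-trained features and train it
    with gradient descent on the cross-entropy error."""
    rng = np.random.default_rng(1)
    W = rng.normal(0, 0.01, (features.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        p = softmax(features @ W + b)
        d = (p - onehot) / len(features)     # cross-entropy gradient
        W -= lr * features.T @ d
        b -= lr * d.sum(axis=0)
    return W, b

# Hypothetical example: random "features" standing in for the pre-trained
# stack's top-layer activities, and random class labels.
rng = np.random.default_rng(0)
features = rng.random((200, 16))
labels = rng.integers(0, 3, size=200)
W, b = finetune(features, labels, n_classes=3)
```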

  • NICHOLAS THOMPSON: It beat the best commercially available

  • speech recognition.

  • It beat the best academic work on speech recognition.

  • GEOFFREY HINTON: On a relatively small data set called TIMIT,

  • it did slightly better than the best academic work.

  • It also worked at IBM.

  • And very quickly people realized that this stuff,

  • since it was beating standard models that

  • had taken 30 years to develop, with a bit more development,

  • it would do really well.

  • And so my graduate students went off to Microsoft and IBM

  • and Google.

  • And Google was the fastest to turn it into a production

  • speech recognizer.

  • And by 2012, that work that was first done in 2009

  • came out in Android.

  • And Android suddenly got much better in speech recognition.

  • NICHOLAS THOMPSON: So tell me about that moment where you've

  • had this idea for 40 years, you've

  • been publishing on it for 20 years,

  • and you're finally better than your colleagues?

  • What did that feel like?

  • GEOFFREY HINTON: Well, back then I'd

  • only had the idea for 30 years.

  • NICHOLAS THOMPSON: Correct, correct, sorry, sir.

  • Just a new idea.

  • It's fresh.

  • GEOFFREY HINTON: It felt really good

  • that it finally got the state of the art on a real problem.

  • NICHOLAS THOMPSON: And do you remember

  • where you were when you first got the revelatory data?

  • GEOFFREY HINTON: No.

  • NICHOLAS THOMPSON: No, no, OK.

  • All right, so you realize it works on speech recognition.

  • When do you start applying it to other problems?

  • GEOFFREY HINTON: So then we start applying it

  • to all sorts of other problems.

  • So George Dahl, who was one of the people who

  • did the original work on speech recognition, applied it to--

  • I give you a lot of descriptors of a molecule

  • and you want to predict if that molecule will bind to something

  • to act as a good drug.

  • And there was a competition on Kaggle.

  • And he just applied our standard technology design

  • for speech recognition to predicting

  • the activity of drugs and it won the competition.

  • So that was a sign that this stuff was sort of fairly universal.

  • And then I had a student called [INAUDIBLE], who said,

  • you know, Geoff, this stuff is going

  • to work for image recognition.

  • And Fei-Fei Li has created the correct data set for it,

  • and it's a public competition.

  • We have to do that.

  • And so what we did was take an approach originally developed

  • by Yann LeCun.

  • A student called Alex Krizhevsky was a real wizard.

  • He could make GPUs do anything.

  • He programmed the GPUs really, really well.

  • And we got results that were a lot better

  • than standard computer vision.

  • That was 2012.

  • And it was a coincidence I think of the speech recognition

  • coming out in the Android.

  • So you knew this stuff could solve production problems.

  • And on vision in 2012, it had done much better

  • than the standard computer vision.

  • NICHOLAS THOMPSON: So those are three areas where it succeeded.

  • So modeling chemicals, speech, vision, where was it failing?

  • GEOFFREY HINTON: The failure is only temporary, you understand.

  • [LAUGHTER]

  • NICHOLAS THOMPSON: Where was it failing?

  • GEOFFREY HINTON: For things like machine translation,

  • I thought it would be a very long time before we could

  • do that because machine translation,

  • you've got a string of symbols coming in

  • and a string of symbols going out.

  • And it's fairly plausible to say in between you do manipulations

  • on strings of symbols, which is what classical AI is.

  • Actually, it doesn't work like that.

  • Strings of symbols come in.

  • You turn those into great big vectors in your brain.

  • These vectors interact with each other.

  • And then you convert it back into strings of symbols

  • to go out.

  • And if you told me in 2012 that in the next five years,

  • we'll be able to translate between many languages using

  • just the same technology, recurrent nets,

  • but just with stochastic gradient descent

  • from random initial weights, I wouldn't have believed you.

  • It happened much faster than expected.

  • NICHOLAS THOMPSON: But so what distinguishes

  • the areas where it works the most quickly

  • and the areas where it will take more time?

  • It seems like the visual processing, speech recognition,

  • sort of core human things that we

  • do with our sensory perception seem to be the first barriers

  • to clear.

  • Is that correct?

  • GEOFFREY HINTON: Yes and no, because there are other things

  • we do like motor control.

  • We're very good at motor control.

  • Our brains are clearly designed for that.

  • And it's only just now that neural nets are beginning

  • to compete with the best other technologies there.

  • They will win in the end.

  • But they're only just winning now.

  • I think things like reasoning, abstract reasoning,

  • they're the kind of last things we learn to do.

  • And I think they'll be among the last things

  • these neural nets learn to do.

  • NICHOLAS THOMPSON: And so you keep

  • saying that neural nets will win at everything eventually.

  • GEOFFREY HINTON: Well, we are neural nets, right?

  • Anything we can do they can do.

  • NICHOLAS THOMPSON: Right, but just

  • because the human brain is not necessarily the most efficient

  • computational machine ever created.

  • GEOFFREY HINTON: Almost certainly not.

  • NICHOLAS THOMPSON: So why could there not be--

  • certainly not my human brain.

  • Couldn't there be a way of modeling machines

  • that is more efficient than the human brain?

  • GEOFFREY HINTON: Philosophically, I

  • have no objection to the idea there could be some completely

  • different way to do all this.

  • It could be that if you start with logic

  • and you're trying to automate logic,

  • and you make some really fancy theorem prover,

  • and you do reasoning, and then you decide

  • you're going to do visual perception by doing reasoning,

  • it could be that that approach would win.

  • It turned out it didn't.

  • But I have no philosophical objection to that winning.

  • It's just we know that brains can do it.

  • NICHOLAS THOMPSON: Right, but there are also things

  • that our brains can't do well.

  • Are those things that neural nets also

  • won't be able to do well?

  • GEOFFREY HINTON: Quite possibly, yes.

  • NICHOLAS THOMPSON: And then there's

  • a separate problem, which is we don't know entirely

  • how these things work, right?

  • GEOFFREY HINTON: No, we really don't know how they work.

  • NICHOLAS THOMPSON: We don't understand how top down neural

  • networks work.

  • There is even a core element of how

  • neural networks work that we don't understand, right?

  • GEOFFREY HINTON: Yes.

  • NICHOLAS THOMPSON: So explain that

  • and then let me ask the obvious follow up,

  • which is, we don't know how these things work.

  • How can those things work?

  • GEOFFREY HINTON: OK, you ask that when I finish explaining.

  • NICHOLAS THOMPSON: Yes.

  • GEOFFREY HINTON: So if you look at current computer vision

  • systems, most of them, they're basically feed forward.

  • They don't use feedback connections.

  • There's something else about current computer vision

  • systems, which is they're very prone to

  • adversarial examples.

  • You can change a few pixels slightly

  • and something that was a picture of a panda

  • and still looks exactly like a panda to you,

  • it suddenly says that's an ostrich.

  • Obviously, the way you change the pixels is cleverly

  • designed to fool it into thinking it's an ostrich.

  • But the point is it still looks just like a panda to you.

  • And initially, we thought these things work really well.

  • But then when confronted with the fact

  • that they can look at a panda and be confident it's an ostrich,

  • you get a bit worried.

  • And I think part of the problem there

  • is that they're not trying to reconstruct from the high level

  • representations.

  • They're trying to do discriminative learning, where you just

  • learn layers of feature detectors,

  • and the whole objective is just to change the weights

  • so you get better at getting the right answer.

  • They're not doing things like at each level of feature

  • detectors, check that you can reconstruct

  • the data in the layer below from the activities of these feature

  • detectors.

  • And recently in Toronto, we've been discovering,

  • or Nick Frosst has been discovering,

  • that if you introduce reconstruction then

  • it helps you be more resistant to adversarial attack.

  • So I think because in human vision we do

  • a lot of our learning

  • by doing reconstruction, we are much more resistant

  • to adversarial attack.
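
One simple way reconstruction can be used defensively, sketched below with hypothetical `encode` and `decode` functions (the talk does not describe Frosst's method in detail, so this is only an illustration of the idea): inputs whose high-level representation cannot reconstruct them well are treated as suspicious.

```python
import numpy as np

def looks_adversarial(x, encode, decode, threshold):
    """Flag inputs the model finds 'surprising' under reconstruction.

    `encode` maps an input to its high-level representation and `decode`
    tries to reconstruct the input from it -- both are assumed to exist
    (e.g. a trained network plus a decoder).  A clean panda should
    reconstruct well; an input perturbed to be read as an ostrich tends
    to reconstruct badly, so its reconstruction error is large."""
    error = np.mean((decode(encode(x)) - x) ** 2)
    return error > threshold

# Toy stand-ins: a random linear "encoder" to 4 dimensions and its
# least-squares "decoder"; a real system would use learned networks.
rng = np.random.default_rng(0)
E = rng.normal(size=(16, 4))
encode = lambda x: x @ E
decode = lambda z: z @ np.linalg.pinv(E)
x_clean = rng.normal(size=16)
print(looks_adversarial(x_clean, encode, decode, threshold=1.0))
```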

  • NICHOLAS THOMPSON: But you believe

  • that top down communication in a neural network

  • is how you reconstruct, how you test

  • and make sure it's a panda, not an ostrich?

  • GEOFFREY HINTON: I think that's crucial, yes.

  • Because I think if you--

  • NICHOLAS THOMPSON: But brain scientists

  • are not entirely agreed on that, correct?

  • GEOFFREY HINTON: Brain scientists

  • are all agreed on the idea that if you

  • have two areas of the cortex in a perceptual pathway,

  • if there's connections from one to the other,

  • there will always be backwards connections, not necessarily

  • point to point.

  • But there will always be a backwards pathway.

  • They're not agreed on what it's for.

  • It could be for attention.

  • It could be for learning, or it could be for reconstruction,

  • or it could be for all three.

  • NICHOLAS THOMPSON: And so we don't know what

  • the backwards communication is.

  • You are building your new neural networks on the assumption

  • that-- or you're building backwards communication that

  • is for reconstruction into your neural networks

  • even though we're not sure that's how the brain works.

  • GEOFFREY HINTON: Yes.

  • NICHOLAS THOMPSON: Isn't that cheating?

  • GEOFFREY HINTON: Not at all.

  • NICHOLAS THOMPSON: If you're trying

  • to make it like the brain, you're

  • doing something we're not sure is like the brain.

  • GEOFFREY HINTON: Not at all.

  • NICHOLAS THOMPSON: OK.

  • GEOFFREY HINTON: There's two--

  • I'm not doing computational neuroscience.

  • That is, I'm not trying to make a model of how the brain works.

  • I'm looking at the brain and saying this thing works.

  • And if we want to make something else that works,

  • we should sort of look to it for inspiration.

  • So this is neuro inspired, not a neural model.

  • So the neurons we use, they're inspired

  • by the fact neurons have a lot of connections

  • and they change their strengths.

  • NICHOLAS THOMPSON: That's interesting.

  • So if I were in computer science and I

  • was working on neural networks, and I

  • wanted to beat Geoff Hinton, one thing I could do

  • is I could build in top down communication

  • and base it on other models of brain science.

  • So based on learning, not on reconstructing.

  • GEOFFREY HINTON: If they were better models, then you'd win,

  • yeah.

  • NICHOLAS THOMPSON: That's very, very interesting.

  • All right, so let's move to a more general topic.

  • So neural networks will be able to solve all kinds of problems.

  • Are there any mysteries of the human brain that will not be

  • captured by neural networks or cannot?

  • For example, could the emotion--

  • GEOFFREY HINTON: No.

  • NICHOLAS THOMPSON: No.

  • So love could be reconstructed by a neural network?

  • Consciousness can be constructed?

  • GEOFFREY HINTON: Absolutely, once you've figured

  • out what those things mean--

  • we are neural networks, right?

  • Now consciousness is something I'm particularly interested in.

  • I get by fine without it.

  • But um--

  • [LAUGHTER]

  • So people don't really know what they mean by it.

  • There's all sorts of different definitions.

  • And I think it's a pre-scientific term.

  • So 100 years ago, if you asked people, what is life?

  • They would have said, well, living things have vital force.

  • And when they die, the vital force goes away.

  • And that's the difference between being alive and being

  • dead, whether you got vital force or not.

  • And now we don't think that sort of--

  • we don't have vital force.

  • We just think it's a pre-scientific concept.

  • And once you understand some biochemistry

  • and molecular biology, you don't need vital force anymore.

  • You understand how it actually works.

  • And I think it's going to be the same with consciousness.

  • I think consciousness is an attempt

  • to explain mental phenomena with some kind of special essence.

  • And this special essence, you don't need it.

  • Once you can really explain it, then you'll

  • explain how we do the things that make

  • people think we're conscious.

  • And you'll explain all these different meanings

  • of consciousness without having some special essence

  • as consciousness.

  • NICHOLAS THOMPSON: Right, so there's no emotion that

  • couldn't be created.

  • There's no thought that couldn't be created.

  • There's nothing that a human mind

  • can do that couldn't theoretically

  • be recreated by a fully functioning neural network

  • once we truly understand how the brain works.

  • GEOFFREY HINTON: There's something in a John Lennon song

  • that sounds very like what you just said.

  • [LAUGHTER]

  • NICHOLAS THOMPSON: And you're 100% confident of this?

  • GEOFFREY HINTON: No, I'm a Bayesian.

  • So I'm 99.9% confident.

  • NICHOLAS THOMPSON: OK, and what is the 0.1%?

  • GEOFFREY HINTON: Well, we might, for example, all

  • be part of a big simulation.

  • NICHOLAS THOMPSON: True, fair enough, OK.

  • [LAUGHTER]

  • [APPLAUSE]

  • That actually makes me think it's more likely that we are.

  • All right, so what are we learning as we do this

  • and as we study the brain to improve computers?

  • How does it work in reverse?

  • What are we learning about the brain

  • from our work in computers?

  • GEOFFREY HINTON: So I think what we've learned in the last 10

  • years is that if you take a system with billions

  • of parameters, and you use stochastic gradient

  • descent on some objective function,

  • and the objective function might be to get the right labels

  • or it might be to fill in the gap in a string of words,

  • or any objective function, it works much better than it

  • has any right to.

  • It works much better than you would expect.

  • You would have thought, and most people in conventional AI

  • thought, take a system with a billion parameters,

  • start them off with random values,

  • measure the gradient of the objective function.

  • That is, for each parameter figure

  • out how the objective function would change if you change

  • that parameter a little bit.

  • And then change it in that direction that improves

  • the objective function.

  • You would have thought that would

  • be a kind of hopeless algorithm that will get stuck.

  • And it turns out, it's a really good algorithm.

  • And the bigger you scale things, the better it works.

  • And that's just an empirical discovery really.

  • There's some theory coming along,

  • but it's basically an empirical discovery.

  • Now because we've discovered that,

  • it makes it far more plausible that the brain

  • is computing the gradient of some objective function

  • and updating the weights, the strengths of synapses,

  • to follow that gradient.

  • We just have to figure out how it gets the gradient

  • and what the objective function is.
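
The recipe Hinton describes, shrunk to a toy scale. This sketch (not from the talk) nudges each parameter to see how the objective changes, a finite-difference stand-in for the gradient that mirrors the description above; real systems compute the gradient with backpropagation, and "stochastic" refers to estimating it on mini-batches rather than the whole data set.

```python
import numpy as np

def sgd_step(params, objective, lr=0.01, eps=1e-5):
    """One step of the recipe above: for each parameter, nudge it a little,
    see how the objective changes, then move every parameter a little in
    the direction that improves the objective."""
    grad = np.zeros_like(params)
    base = objective(params)
    for i in range(len(params)):
        nudged = params.copy()
        nudged[i] += eps
        grad[i] = (objective(nudged) - base) / eps
    return params - lr * grad

# Toy objective: mean squared error of a linear model on random data.
rng = np.random.default_rng(0)
X, true_w = rng.normal(size=(100, 5)), np.arange(5.0)
y = X @ true_w
loss = lambda w: np.mean((X @ w - y) ** 2)

w = rng.normal(size=5)                 # start from random values
for _ in range(2000):
    w = sgd_step(w, loss)              # ends up close to true_w
```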

  • NICHOLAS THOMPSON: But we didn't understand

  • that about the brain.

  • We didn't understand the re-weighted [INAUDIBLE].

  • GEOFFREY HINTON: It was a theory.

  • It was-- I mean, a long time ago,

  • people thought that's a possibility.

  • But in the background, there was always

  • sort of conventional computer scientists saying, yeah,

  • but this idea of everything's random,

  • you just learn it all by gradient descent,

  • that's never going to work for a billion parameters.

  • You have to wire in a lot of knowledge.

  • NICHOLAS THOMPSON: All right, so--

  • GEOFFREY HINTON: And we know now that's wrong.

  • You can just put in random parameters

  • and learn everything.

  • NICHOLAS THOMPSON: So let's expand this out.

  • So as we learn more and more, we will presumably

  • continue to learn more and more about how the human brain

  • functions as we run these massive tests on models

  • based on how we think it functions.

  • Once we understand it better, is there

  • a point where we can, essentially,

  • rewire our brains to be more like the most

  • efficient machines or change the way we think?

  • GEOFFREY HINTON: You'd have thought--

  • NICHOLAS THOMPSON: If it's a simulation that should be easy,

  • but not in a simulation.

  • GEOFFREY HINTON: You'd have thought

  • that if we really understand what's going on,

  • we should be able to make things like education work better,

  • and I think we will.

  • NICHOLAS THOMPSON: We will?

  • GEOFFREY HINTON: Yeah.

  • It would be very odd if you could finally

  • understand what's going on in your brain

  • and how it learns and not be able to adapt the environment

  • so you can learn better.

  • NICHOLAS THOMPSON: Well, OK, I don't want

  • to go too far into the future.

  • But a couple of years from now, how

  • do you think we will be using what we've learned

  • about the brain and about how deep learning works to change

  • how education functions?

  • How would you change a class?

  • GEOFFREY HINTON: In a couple of years,

  • I'm not sure we'll learn much.

  • I think it's going to change education.

  • It's going to take longer than that.

  • But if you look at it, Assistants

  • are getting pretty smart now.

  • And once Assistants can really understand conversations,

  • Assistants can have conversations with kids

  • and educate them.

  • So already, I think most of the new knowledge I acquire

  • comes from me thinking, I wonder,

  • and typing something into Google, and Google

  • tells me. If I could just have a conversation,

  • I'd acquire knowledge even better.

  • NICHOLAS THOMPSON: And so theoretically,

  • as we understand the brain better,

  • and as we set our children up in front of Assistants.

  • Mine right now almost certainly based on the time in New York

  • is yelling at Alexa to play something on Spotify, probably

  • "Baby Shark"--

  • you will program the Assistants to have better conversations

  • with the children based on how we know they'll learn?

  • GEOFFREY HINTON: Yeah, I haven't really thought much about this.

  • It's not what I do.

  • But it seems quite plausible to me.

  • NICHOLAS THOMPSON: Will we be able to understand

  • how dreams work, one of the great mysteries?

  • GEOFFREY HINTON: Yes, I'm really interested in dreams.

  • NICHOLAS THOMPSON: Good, well, let's talk about that,

  • GEOFFREY HINTON: I'm so interested.

  • I have at least four different theories of dreams.

  • NICHOLAS THOMPSON: Let's hear them all--

  • 1, 2, 3, 4.

  • GEOFFREY HINTON: So a long time ago, there were things called--

  • OK, a long time ago there were Hopfield networks.

  • And they would learn memories as local attractors.

  • And Hopfield discovered that if you try and put

  • too many memories in, they get confused.

  • They'll take two local attractors

  • and merge them into an attractor sort of halfway in between.

  • Then Francis Crick and Graeme Mitchison came along and said,

  • we can get rid of these false minima by doing unlearning.

  • So we turn off the input.

  • We put the neural network into a random state.

  • We let it settle down, and we say that's bad.

  • Change the connections so you don't settle to that state.

  • And if you do a bit of that, it will

  • be able to store more memories.
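
A small sketch (not from the talk) of the Hopfield picture above: memories are stored as attractors with a Hebbian rule, and the Crick-Mitchison "unlearning" step weakens whatever state the net settles into when it runs with no input. The sizes and rates here are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

def settle(W, state, steps=50):
    """Let a Hopfield net run until it falls into a nearby attractor."""
    for _ in range(steps):
        i = rng.integers(len(state))
        state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Store memories as attractors with a Hebbian rule (outer products).
memories = rng.choice([-1, 1], size=(5, 32))     # five +/-1 patterns
W = sum(np.outer(m, m) for m in memories).astype(float)
np.fill_diagonal(W, 0)

# Crick & Mitchison-style unlearning: start from a random state, let the
# net settle into whatever (possibly spurious) attractor it likes, and
# weaken that state slightly so the false minima get cleaned out.
for _ in range(20):
    fantasy = settle(W, rng.choice([-1, 1], size=32))
    W -= 0.01 * np.outer(fantasy, fantasy)
    np.fill_diagonal(W, 0)
```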

  • And then Terry Sejnowski and I came along and said, look,

  • if we have not just the neurons where you're

  • storing the memories, but lots of other neurons, too,

  • can we find an algorithm that we'll

  • use all these other neurons to help you store memories?

  • And it turned out in the end, we came up with the Boltzmann

  • machine learning algorithm.

  • And the Boltzmann machine learning algorithm

  • had a very interesting property which is I show you data.

  • That is, I fixed the states of the observable units.

  • And it sort of rattles around the other units

  • until it's got a fairly happy state.

  • And once it's done that, it increases

  • the strength of the connections:

  • if two units are both active, it

  • increases the connection strength.

  • That's a kind of Hebbian learning.

  • But if you just do that, the connection strengths

  • just get bigger and bigger.

  • You also have to have a phase where you cut it off

  • from the input.

  • You let it rattle around to settle

  • into a state it's happy with.

  • So now it's having a fantasy.

  • And once it's had the fantasy you

  • say, take all pairs of neurons that are active

  • and decrease the strength of the connection.

  • So I'm explaining the algorithm to you just as a procedure.

  • But actually that algorithm is the result of doing some math

  • and saying, how should you change these connection

  • strengths so that this neural network with all

  • these hidden units finds the data unsurprising?

  • And it has to have this other phase.

  • It has to have this what we call the negative phase when

  • it's running with no input.

  • And it's canceling out--

  • it's unlearning whatever state it settles into.
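
The two phases correspond to the standard Boltzmann machine weight update (stated here from the usual formulation, not quoted from the talk): strengthen the correlations seen with the data clamped, weaken the correlations the free-running "fantasy" produces.

```latex
\Delta w_{ij} \;=\; \varepsilon \left( \langle s_i s_j \rangle_{\text{data}} \;-\; \langle s_i s_j \rangle_{\text{model}} \right)
```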

  • Now what Crick pointed out about dreams

  • is that, we know that you dream for many hours every night.

  • And if I wake you up at random, you

  • can tell me what you were just dreaming about because it's

  • in your short term memory.

  • So we know you dream for many hours.

  • But in the morning, you wake up, you

  • can remember the last dream, but you

  • can't remember all the others, which is lucky because you

  • might mistake them for reality.

  • So why is it that we don't remember our dreams at all?

  • And Crick's view was that the whole point of dreaming

  • is to unlearn those things, so you put the learning

  • rule in reverse.

  • And Terry Sejnowski and I showed that actually that

  • is a maximum [INAUDIBLE] learning procedure

  • for Boltzmann machines.

  • So that's one theory of dreaming.

  • NICHOLAS THOMPSON: You showed that theoretically?

  • GEOFFREY HINTON: Yeah, we showed theoretically

  • that's the right thing to do if you want

  • to change the weights so that your big neural network finds

  • the observed data less surprising.

  • NICHOLAS THOMPSON: And I want to go to your other theories,

  • but before we lose this thread, you've

  • proved that it's efficient.

  • Have you actually set any of your deep learning algorithms

  • to essentially dream?

  • Right, study this image data set for a period of time,

  • resort, study again, resort versus a machine

  • that's running continuously?

  • GEOFFREY HINTON: So yes, we had machine learning algorithms.

  • Some of the first algorithms that

  • could learn what to do with hidden units

  • were Boltzmann machines.

  • They were very inefficient.

  • But then later on, I found a way of making approximations

  • to them that was efficient.

  • And those were actually the trigger

  • for getting deep learning going again.

  • Those were the things that learned one layer of feature

  • detectors at a time.

  • And it was an efficient form of restricted Boltzmann machine.

  • And so it was doing this kind of unlearning.

  • But rather than going to sleep, that one

  • would just fantasize for a little bit

  • after each data point.
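
A sketch of that "fantasize a little after each data point" idea, which is essentially one-step contrastive divergence (CD-1) for a restricted Boltzmann machine. The layer sizes and data below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def cd1_step(v_data, W, a, b, lr=0.05):
    """One CD-1 update for a restricted Boltzmann machine.

    Positive phase: hidden activities driven by a real data point.
    Brief fantasy: reconstruct the visibles and re-infer the hiddens,
    then unlearn that fantasy -- the short daydream after each data
    point described above."""
    h_data = sigmoid(v_data @ W + b)
    v_fant = sigmoid(sample(h_data) @ W.T + a)
    h_fant = sigmoid(v_fant @ W + b)
    W += lr * (np.outer(v_data, h_data) - np.outer(v_fant, h_fant))
    a += lr * (v_data - v_fant)
    b += lr * (h_data - h_fant)

# Tiny example: 16 visible units, 8 hidden feature detectors.
W = rng.normal(0, 0.01, (16, 8))
a, b = np.zeros(16), np.zeros(8)
for v in rng.choice([0.0, 1.0], size=(200, 16)):
    cd1_step(v, W, a, b)
```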

  • NICHOLAS THOMPSON: So Androids do dream of electric sheep.

  • So let's go to theories 2, 3, and 4.

  • GEOFFREY HINTON: OK, theory 2 was called

  • the wake-sleep algorithm.

  • And you want to learn a generative model.

  • So you have the idea that you're going to have

  • a model that can generate data.

  • It has layers of feature detectors.

  • And it activates the high level ones and the low level ones

  • and so on, until it activates pixels, and that's an image.

  • You also want to learn the other way.

  • You want to learn to recognize data.

  • And so you're going to have an algorithm that has two phases.

  • In the wake phase, data comes in.

  • It tries to recognize it.

  • And instead of learning the connections

  • it is using for recognition, it's

  • learning the generative connections.

  • So data comes in.

  • I activate the hidden units, and then I

  • learn to make those hidden units be good at reconstructing

  • that data.

  • So it's learning to reconstruct at every layer.

  • But the question is, how do you learn the forward connections?

  • So the idea is, if you knew the forward connections,

  • you could learn the backward connections because you

  • could learn to reconstruct.

  • NICHOLAS THOMPSON: Yeah.

  • GEOFFREY HINTON: Now it also turns out

  • that if you knew the backward connections,

  • you could learn the forward connections

  • because what you could do is start at the top

  • and just generate some data.

  • And because you generated the data,

  • you'd know the states of all the hidden layers.

  • And so you could learn the forward connections

  • to recover those states.

  • So that would be the sleep phase.

  • When you turn off the input, you just generate data

  • and then you try and reconstruct the hidden units

  • that generated the data.

  • And so if you know the top down connections,

  • you'd learn the bottom up ones.

  • If you know the bottom up ones, you

  • could learn the top down ones.

  • And so what happens if you start

  • with random connections and try doing both, alternating both

  • kinds of learning? It works.

  • Now to make it work well, you have

  • to do all sorts of variations of it.

  • But it works.
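
A single-layer sketch of the wake-sleep scheme just described, with separate recognition (bottom-up) and generative (top-down) weights. The real algorithm stacks several such layers, and the data here is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

n_vis, n_hid, lr = 16, 8, 0.05
R = rng.normal(0, 0.1, (n_vis, n_hid))   # recognition (bottom-up) weights
G = rng.normal(0, 0.1, (n_hid, n_vis))   # generative (top-down) weights
g_bias = np.zeros(n_hid)                 # top-level generative bias

data = rng.choice([0.0, 1.0], size=(200, n_vis))
for v in data:
    # Wake phase: recognize the data, then improve the generative
    # connections so the inferred hidden state reconstructs the data.
    h = sample(sigmoid(v @ R))
    G += lr * np.outer(h, v - sigmoid(h @ G))
    g_bias += lr * (h - sigmoid(g_bias))
    # Sleep phase: fantasize from the generative model, then improve the
    # recognition connections so they recover the hidden cause.
    h_fant = sample(sigmoid(g_bias))
    v_fant = sample(sigmoid(h_fant @ G))
    R += lr * np.outer(v_fant, h_fant - sigmoid(v_fant @ R))
```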

  • NICHOLAS THOMPSON: All right, that is--

  • do you want to go through the other two theories?

  • We only have eight minutes left.

  • I think we should probably jump through some other questions.

  • We'll deal with--

  • GEOFFREY HINTON: If you give me another hour,

  • I could do the other two theories.

  • [LAUGHTER]

  • NICHOLAS THOMPSON: All right, well, Google I/O 2020.

  • So let's talk about what comes next.

  • So where is your research headed?

  • What problem are you trying to solve now?

  • GEOFFREY HINTON: The main thing I'm trying to solve,

  • which I've been doing for a number of years now--

  • actually, I'm reminded of a soccer commentator.

  • You may notice soccer commentators,

  • they always say things like they're doing very well,

  • but they always go wrong on the last pass.

  • And they never seem to sort of notice there's something

  • funny about that.

  • It's a bit circular.

  • So I'm working-- eventually, you're

  • going to end up working on something you don't finish.

  • And I think I may well be working

  • on the thing I never finish.

  • But it's called capsules, and it's

  • a theory of how you do visual perception using reconstruction

  • and also how you route information

  • to the right places.

  • And the two main motivating factors

  • were in standard neural nets, the information--

  • the activity in the layer just automatically goes somewhere.

  • You don't make decisions about where to send it.

  • The idea of capsules was to make decisions

  • about where to send information.

  • Now since I started working on capsules,

  • some other very smart people at Google

  • invented transformers, which are doing the same thing.

  • They're deciding where to route information,

  • and that's a big win.

  • The other thing that motivated capsules was coordinate frames.

  • So when humans do vision, they're

  • always using coordinate frames.

  • And if they impose the wrong coordinate frame on an object,

  • they don't even recognize the object.

  • So I'll give you a little task.

  • Imagine a tetrahedron.

  • It's got a triangular base and three triangular faces,

  • all equilateral triangles.

  • Easy to imagine, right?

  • Now imagine slicing it with a plane.

  • So you get a square cross-section.

  • That's not so easy, right?

  • Every time you slice it, you get a triangle.

  • It's not obvious how you get a square.

  • It's not at all obvious.

  • OK, but I give you the same shape described differently.

  • I need your pen.

  • Imagine, the shape you get, if you take a pen

  • like that, another pen at right angles like this,

  • and you connect all points on this pen

  • to all points on this pen.

  • That's a solid tetrahedron.

  • OK, you're seeing it relative to a different coordinate frame

  • where the edges of the tetrahedron--

  • these two line up with the coordinate frame.

  • And for this, if you think of the tetrahedron that way,

  • it's pretty obvious that at the top,

  • you've got a long rectangle this way.

  • At the bottom, you get a long rectangle that way.

  • And there's [INAUDIBLE] that you've got

  • to get a square in the middle.

  • So it's pretty obvious how you can slice it to get a square.

  • But that's only obvious if you think of it

  • with that coordinate frame.
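
A quick numerical check of the "two pens" construction (not from the talk): build the tetrahedron from two perpendicular edges, confirm it is regular, and slice it halfway up to find the square cross-section.

```python
import numpy as np

# The "two pens" construction: one edge along x at the bottom, the other
# edge along y at the top, every point of one joined to every point of
# the other.  With the right height this is a regular tetrahedron.
h = np.sqrt(2.0)                       # height that makes every edge length 2
A, B = np.array([-1, 0, 0]), np.array([1, 0, 0])        # bottom edge
C, D = np.array([0, -1, h]), np.array([0, 1, h])        # top edge

# Check it really is regular: every pair of vertices is distance 2 apart.
verts = [A, B, C, D]
print({round(np.linalg.norm(p - q), 6) for p in verts for q in verts if p is not q})

# Slice halfway up.  Each of the four slanted edges crosses the plane
# z = h/2 at its midpoint; those four points are the cross-section.
cross = np.array([(p + q) / 2 for p in (A, B) for q in (C, D)])
print(cross[:, :2])                    # corners (+-1/2, +-1/2): a square
```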

  • So it's obvious that for humans, coordinate frames are

  • very important for perception.

  • And they're not at all important for conv nets.

  • For conv nets, if I show you a tilted square

  • and an upright diamond, which is actually the same thing,

  • they look the same to a conv net.

  • It doesn't have two alternative ways

  • of describing the same thing.

  • NICHOLAS THOMPSON: But how is adding coordinate frames

  • to your model not the same as the error

  • you were making in the '90s where

  • you were trying to put rules into the system as opposed

  • to letting the system be unsupervised?

  • GEOFFREY HINTON: It is exactly that error.

  • And because I am so adamant that that's a terrible error,

  • I'm allowed to do a tiny bit of it.

  • It's sort of like Nixon negotiating with China.

  • [LAUGHTER]

  • Actually that puts me in a bad role.

  • Anyway, so if you look at conv nets,

  • they're just neural nets where you wired

  • in a tiny bit of knowledge.

  • You add in the knowledge that if a feature detector is good

  • here, it's good over there.

  • And people would love to wire in just a little bit

  • more knowledge about scale and orientation.

  • But if you do it in the obvious way

  • of having a 4D grid instead of a 2D grid,

  • the whole thing blows up on you.

  • But you can get in that knowledge about what viewpoint

  • does to an image by using coordinate frames the same way

  • they do them in graphics.

  • So now you have a representation in one layer.

  • When you try and reconstruct the parts of an object in the layer

  • below, when you do that reconstruction,

  • you can take the coordinate frame of the whole object

  • and multiply it by the part whole relationship

  • to get the coordinate frame of the part.

  • And you can wire that into the network.

  • You can wire into the network the ability

  • to do those coordinate transformations.

  • And that should make it generalize much, much better.

  • It should mean the networks just find viewpoint very easy

  • to deal with.

  • Current neural networks find viewpoint other

  • than translation very hard to deal with.
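
A sketch of the coordinate-frame arithmetic Hinton wants wired in, using 4x4 homogeneous matrices as in graphics. The "face" and "nose" names and numbers are made-up illustrations; the point is that the part's frame is the whole's frame multiplied by the part-whole relationship.

```python
import numpy as np

def pose(rotation_z_deg, translation):
    """A 4x4 homogeneous coordinate frame, as used in computer graphics."""
    t = np.radians(rotation_z_deg)
    M = np.eye(4)
    M[:2, :2] = [[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]]
    M[:3, 3] = translation
    return M

# Viewpoint-invariant knowledge: where a nose sits relative to a face.
# This part-whole relationship is what the network would learn.
face_to_nose = pose(0, [0.0, -0.1, 0.05])

# The pose of the whole object in the image changes with viewpoint...
face_in_image = pose(30, [2.0, 1.0, 5.0])

# ...but the part's pose is always just the product of the two, which is
# the coordinate transformation to be wired into the network.
nose_in_image = face_in_image @ face_to_nose
print(nose_in_image[:3, 3])
```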

  • NICHOLAS THOMPSON: So your current task

  • is specific to visual recognition,

  • or it is a more general way of improving or coming up

  • with the rule set for coordinate frames?

  • GEOFFREY HINTON: OK, it could be used for other things.

  • But I'm really interested in the use for visual recognition.

  • NICHOLAS THOMPSON: OK, last question.

  • I was listening to a podcast you gave the other day.

  • And in it, you said that the people whose ideas you value

  • most are the young graduate students who come into your lab

  • because they aren't locked into the old perceptions.

  • They have fresh ideas, and yet they also know a lot.

  • Is there anything that you, sort of looking outside yourself,

  • you think you might be locked into that a new graduate

  • student or somebody in this room who came to work with you

  • would shake up?

  • GEOFFREY HINTON: Yeah, everything I said.

  • NICHOLAS THOMPSON: Everything you said.

  • [LAUGHTER]

  • Take out those coordinate units.

  • Work on feature three, work on feature four.

  • I wanted to ask you a separate question.

  • So deep learning used to be a distinct thing,

  • and then it became sort of synonymous with the phrase AI.

  • And then AI is now a marketing term

  • that basically means using a machine in any way whatsoever.

  • How do you feel about the terminology

  • as the man who helped create this?

  • GEOFFREY HINTON: Well, I was much happier when

  • there was AI, which meant you're logic-inspired

  • and you do manipulations on symbol strings.

  • And there was neural nets, which means

  • you want to do learning in a neural network.

  • And they were completely different enterprises

  • that really sort of didn't get along too well

  • and fought for money.

  • That's how I grew up.

  • And now I see sort of people who spent

  • years saying neural networks are nonsense,

  • saying I'm an AI professor.

  • So I need money.

  • And it's annoying.

  • NICHOLAS THOMPSON: So your field succeeded and

  • kind of ate or subsumed the other field, which

  • then gave them an advantage in asking for money,

  • which is frustrating?

  • GEOFFREY HINTON: Yeah, now it's not entirely fair

  • because a lot of them have actually converted.

  • NICHOLAS THOMPSON: Right, so wonderful.

  • Well, I've got time for one more question.

  • So in that same interview, you were talking about AI.

  • And you said, think of it like a backhoe, a backhoe that

  • can dig a hole, or if not constructed properly,

  • can wipe you out.

  • And the key is when you work on your backhoe

  • to design it in such a way that it's best at digging holes

  • and not at clocking you in the head.

  • As you think about your work, what

  • are the choices you make like that?

  • GEOFFREY HINTON: I guess I would never deliberately

  • work on making weapons.

  • I mean, you could design a backhoe

  • that was very good at knocking people's heads off.

  • And I think that would be a bad use of a backhoe,

  • and I wouldn't work on it.

  • NICHOLAS THOMPSON: All right, well, Geoffrey Hinton--

  • extraordinary interview.

  • All kinds of information-- we'll be back

  • next year to talk about dream theories three and four.

  • That was so much fun.

  • Thank you.

  • [MUSIC PLAYING]
