A Fireside Chat with Turing Award Winner Geoffrey Hinton, Pioneer of Deep Learning (Google I/O '19)

  • [MUSIC PLAYING]

  • NICHOLAS THOMPSON: Hello, I'm Nicholas Thompson.

  • I'm the editor in chief of "Wired."

  • It is my honor today to get the chance

  • to interview Geoffrey Hinton.

  • There are a couple-- well, there are

  • many things I love about him.

  • But two that I'll just mention in the introduction.

  • The first is that he persisted.

  • He had an idea that he really believed in

  • that everybody else said was bad.

  • And he just kept at it.

  • And it gives a lot of faith to everybody who has bad ideas,

  • myself included.

  • Then the second, as someone who spends half his life

  • as a manager adjudicating job titles,

  • I was looking at his job title before the introduction.

  • And he has the most non-pretentious job

  • title in history.

  • So please welcome Geoffrey Hinton, the engineering fellow

  • at Google.

  • [APPLAUSE]

  • Welcome.

  • GEOFFREY HINTON: Thank you.

  • NICHOLAS THOMPSON: So nice to be here with you.

  • All right, so let us start.

  • 20 years ago when you write some of your early very influential

  • papers, everybody starts to say, it's a smart idea,

  • but we're not actually going to be able to design computers

  • this way.

  • Explain why you persisted, why you were so confident that you

  • had found something important.

  • GEOFFREY HINTON: So actually it was 40 years ago.

  • And it seemed to me there's no other way the brain could work.

  • It has to work by learning the strengths of connections.

  • And if you want to make a device do something intelligent,

  • you've got two options.

  • You can program it, or it can learn.

  • And we certainly weren't programmed.

  • So we had to learn.

  • So this had to be the right way to go.

  • NICHOLAS THOMPSON: So explain, though--

  • well, let's do this.

  • Explain what neural networks are.

  • Most of the people here will be quite familiar.

  • But explain the original insight and how

  • it developed in your mind.

  • GEOFFREY HINTON: So you have relatively simple processing

  • elements that are very loosely models of neurons.

  • They have connections coming in.

  • Each connection has a weight on it.

  • That weight can be changed to do learning.

  • And what a neuron does is take the activities

  • on the connections times the weights, adds them all up,

  • and then decides whether to send an output.

  • And if it gets a big enough sum, it sends an output.

  • If the sum is negative, it doesn't send anything.

  • That's about it.

  • And all you have to do is just wire up

  • a gazillion of those with a gazillion squared weights

  • and just figure out how to change the weights,

  • and it'll do anything.

  • It's just a question of how you change the weights.
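
To make the description above concrete, here is a minimal sketch in Python of the kind of unit Hinton describes: multiply the incoming activities by the weights, add them up, and send an output only if the sum is big enough. The threshold at zero and the specific numbers are illustrative assumptions, not anything stated in the talk.

    import numpy as np

    def neuron_output(activities, weights, bias=0.0):
        """One loosely brain-inspired unit: a weighted sum of inputs, then a threshold."""
        total = np.dot(activities, weights) + bias   # activities times weights, all added up
        return total if total > 0.0 else 0.0         # send an output only for a big enough sum

    # Hypothetical example: three incoming connections with learned weights.
    incoming = np.array([0.5, 1.0, -0.3])
    learned_weights = np.array([0.8, -0.2, 1.5])
    print(neuron_output(incoming, learned_weights))

Learning, as Hinton says, then amounts to changing the entries of `weights`.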

  • NICHOLAS THOMPSON: So when did you

  • come to understand that this was an approximate representation

  • of how the brain works?

  • GEOFFREY HINTON: Oh, it was always designed as that.

  • NICHOLAS THOMPSON: Right.

  • GEOFFREY HINTON: It was designed to be like how the brain works.

  • NICHOLAS THOMPSON: But let me ask you this.

  • So at some point in your career, you

  • start to understand how the brain works.

  • Maybe it was when you were 12.

  • Maybe it was when you were 25.

  • When do you make the decision that you

  • will try to model computers after the brain?

  • GEOFFREY HINTON: Sort of right away.

  • That was the whole point of it.

  • The whole idea was to have a learning device that

  • learned like the brain, the way people

  • think the brain learns, by changing connection strengths.

  • And this wasn't my idea.

  • Turing had the same idea.

  • Turing, even though he invented a lot

  • of the basis of standard computer science,

  • he believed that the brain was this unorganized device

  • with random weights.

  • And it would use reinforcement learning

  • to change the connections.

  • And it would learn everything, and he

  • thought that was the best route to intelligence.

  • NICHOLAS THOMPSON: And so you were following Turing's idea

  • that the best way to make a machine is to model it

  • after the human brain.

  • This is how a human brain works.

  • So let's make a machine like that.

  • GEOFFREY HINTON: Yeah, it wasn't just Turing's idea.

  • Lots of people thought that back then.

  • NICHOLAS THOMPSON: All right, so you have this idea.

  • Lots of people have this idea.

  • You get a lot of credit.

  • In the late '80s, you start to come

  • to fame with your published work, is that correct?

  • GEOFFREY HINTON: Yes.

  • NICHOLAS THOMPSON: When is the darkest moment?

  • When is the moment where other people who

  • have been working who agreed with this idea from Turing

  • start to back away and yet you continue to plunge ahead?

  • GEOFFREY HINTON: There were always

  • a bunch of people who kept believing in it, particularly

  • in psychology.

  • But among computer scientists, I guess

  • in the '90s, what happened was data sets were quite small.

  • And computers weren't that fast.

  • And on small data sets, other methods like things

  • called support vector machines, worked a little bit better.

  • They didn't get confused by noise so much.

  • And so that was very depressing because we developed back

  • propagation in the '80s.

  • We thought it would solve everything.

  • And we were a bit puzzled about why it didn't solve everything.

  • And it was just a question of scale.

  • But we didn't really know that then.

  • NICHOLAS THOMPSON: And so why did

  • you think it was not working?

  • GEOFFREY HINTON: We thought it was not

  • working because we didn't have quite the right algorithms.

  • We didn't have quite the right objective functions.

  • I thought for a long time it's because we

  • were trying to do supervised learning

  • where you have to label data.

  • And we should have been doing unsupervised learning, where

  • you just learn from the data with no labels.

  • It turned out it was mainly a question of scale.

  • NICHOLAS THOMPSON: Oh, that's interesting.

  • So the problem was you didn't have enough data.

  • You thought you had the right amount of data,

  • but you hadn't labeled it correctly.

  • So you just misidentified the problem?

  • GEOFFREY HINTON: I thought that using labels at all

  • was a mistake.

  • You would do most of your learning

  • without making any use of labels just

  • by trying to model the structure in the data.

  • I actually still believe that.

  • I think as computers get faster, for any given size data set,

  • if you make computers fast enough,

  • you're better off doing unsupervised learning.

  • And once you've done the unsupervised learning,

  • you'll be able to learn from fewer labels.

  • NICHOLAS THOMPSON: So in the 1990s,

  • you're continuing with your research.

  • You're in academia.

  • You are still publishing, but it's not coming to acclaim.

  • You aren't solving big problems.

  • When do you start--

  • well, actually, was there ever a moment

  • where you said, you know what, enough of this.

  • I'm going to go try something else?

  • GEOFFREY HINTON: Not really.

  • NICHOLAS THOMPSON: Not that I'm going to go sell burgers,

  • but I'm going to figure out a different way of doing this.

  • You just said we're going to keep doing deep learning.

  • GEOFFREY HINTON: Yes, something like this has to work.

  • I mean, the connections in the brain are learning somehow.

  • And we just have to figure it out.

  • And probably there's a bunch of different ways of learning

  • connection strengths.

  • The brain's using one of them.

  • There may be other ways of doing it.

  • But certainly, you have to have something that can learn

  • these connection strengths.

  • And I never doubted that.

  • NICHOLAS THOMPSON: OK, so you never doubt it.

  • When does it first start to seem like it's working?

  • OK, you know, we've got this.

  • I believe in this idea, and actually, if you look at that,

  • if you squint, you can see it's working.

  • When did that happen?

  • GEOFFREY HINTON: OK, so one of the big disappointments

  • in the '80s was if you made networks

  • with lots of hidden layers, you couldn't train them.

  • That's not quite true, because convolutional networks, designed

  • by Yann LeCun, could be trained for fairly simple tasks

  • like recognizing handwriting.

  • But most of the deep nets, we didn't know how to train them.

  • And in about 2005, I came up with a way

  • of doing unsupervised training of deep nets.

  • So you take your input, say your pixels,

  • and you'd learn a bunch of feature detectors

  • that were just good at explaining why the pixels were

  • behaving like that.

  • And then you treat those feature detectors as the data

  • and then you learn another bunch of feature detectors.

  • So we get to explain why those feature detectors have

  • those correlations.

  • And you keep learning layers and layers.

  • And what was interesting was you could do some math

  • and prove that each time you learned another layer,

  • you didn't necessarily have a better model of the data,

  • but you had a bound on how good your model was.

  • And you could get a better bound each time

  • you added another layer.
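
As a rough illustration of the layer-by-layer idea described here, the sketch below trains one set of binary feature detectors at a time and then treats their activities as the data for the next layer. The `train_rbm` function is a simplified restricted Boltzmann machine update (one step of contrastive divergence, no biases, made-up hyperparameters), an approximation rather than the exact procedure whose bound improves with each layer.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_rbm(data, n_hidden, epochs=10, lr=0.05):
        """Learn one layer of binary feature detectors (a simplified CD-1 sketch)."""
        W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
        for _ in range(epochs):
            h_prob = sigmoid(data @ W)                           # detectors respond to real data
            positive = data.T @ h_prob
            h_sample = (h_prob > rng.random(h_prob.shape)).astype(float)
            v_recon = sigmoid(h_sample @ W.T)                    # one step of 'fantasy' reconstruction
            negative = v_recon.T @ sigmoid(v_recon @ W)
            W += lr * (positive - negative) / len(data)
        return W

    def stack_layers(data, layer_sizes):
        """Greedy stacking: each layer's detector activities become the next layer's data."""
        weights, layer_input = [], data
        for n_hidden in layer_sizes:
            W = train_rbm(layer_input, n_hidden)
            weights.append(W)
            layer_input = sigmoid(layer_input @ W)               # treat the detectors as data
        return weights

    pixels = (rng.random((200, 64)) > 0.5).astype(float)         # toy stand-in for images
    stacked_weights = stack_layers(pixels, layer_sizes=[32, 16])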

  • NICHOLAS THOMPSON: What do you mean you had a bound on how good

  • your model was?

  • GEOFFREY HINTON: OK, so once you got a model,

  • you can say how surprising does a model find this data?

  • You showed some data and you say,

  • is that the kind of thing you believe in or is

  • that surprising?

  • And you can sort of measure something that says that.

  • And what you'd like to do is have a model,

  • a good model is one that looks at the data and says yeah,

  • I knew that.

  • It's unsurprising, OK?

  • And it's often very hard to compute

  • exactly how surprising this model finds the data.

  • But you can compute a bound on that.

  • You can say this model finds the data less surprising than this.

  • And you could show that, as you add extra layers of feature

  • detectors, you get a model.

  • And each time you add a layer, the

  • bound on how surprising it

  • finds the data gets better.

  • NICHOLAS THOMPSON: Oh, I see.

  • OK, so that makes sense.

  • So you're making observations, and they're not correct.

  • But you know they're closer and closer to being correct.

  • I'm looking at the audience.

  • I'm making some generalization.

  • It's not correct, but I'm getting better and better

  • at it, roughly?

  • GEOFFREY HINTON: Roughly.

  • NICHOLAS THOMPSON: OK, so that's about 2005

  • where you come up with that mathematical breakthrough?

  • GEOFFREY HINTON: Yeah.

  • NICHOLAS THOMPSON: When do you start getting answers

  • that are correct and what data are you working on?

  • Is this speech data where you first have your breakthrough?

  • GEOFFREY HINTON: This was just handwritten digits.

  • Very simple data.

  • Then around the same time, they started developing GPUs.

  • And the people doing neural networks

  • started using GPUs in about 2007.

  • I had one very good student called

  • Vlad Mnih, who started using GPUs for finding roads

  • in aerial images.

  • He wrote some code that was then used by other students

  • for using GPUs to recognize phonemes in speech.

  • And so they were using this idea of pre-training.

  • And after they'd done all this pre-training,

  • then they'd just stick labels on top and use back propagation.

  • And it turned out that way, you could have a very deep net

  • that was pre-trained this way.

  • And you could then use back propagation,

  • and it actually worked.

  • And it sort of beat the benchmarks

  • for speech recognition initially just by a little bit.
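
The "stick labels on top and use back propagation" step can be sketched like this: a small network whose lower layers stand in for pre-trained weights, with a new label layer added and all the weights fine-tuned by backpropagation on a cross-entropy objective. The random data, layer sizes, and learning rate are stand-ins; the structure of the forward and backward passes is the point.

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    # Stand-ins for weights produced by unsupervised pre-training (random here).
    W1 = 0.01 * rng.standard_normal((64, 32))
    W2 = 0.01 * rng.standard_normal((32, 16))
    W_out = 0.01 * rng.standard_normal((16, 10))     # the label layer 'stuck on top'

    X = rng.random((200, 64))                        # toy inputs
    Y = np.eye(10)[rng.integers(0, 10, size=200)]    # toy one-hot labels

    lr = 0.1
    for _ in range(100):
        # Forward pass through the pre-trained layers plus the new label layer.
        h1 = sigmoid(X @ W1)
        h2 = sigmoid(h1 @ W2)
        p = softmax(h2 @ W_out)
        # Backpropagation: cross-entropy gradients flow back through every layer.
        d_out = (p - Y) / len(X)
        d_h2 = (d_out @ W_out.T) * h2 * (1 - h2)
        d_h1 = (d_h2 @ W2.T) * h1 * (1 - h1)
        W_out -= lr * h2.T @ d_out
        W2 -= lr * h1.T @ d_h2
        W1 -= lr * X.T @ d_h1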

  • NICHOLAS THOMPSON: It beat the best commercially available

  • speech recognition?

  • It beat the best academic work on speech recognition?

  • GEOFFREY HINTON: On a relatively small data set called TIMIT,

  • it did slightly better than the best academic work.

  • It was also worked on at IBM.

  • And very quickly people realized that this stuff,

  • since it was beating standard models that

  • had taken 30 years to develop, with a bit more development,

  • would do really well.

  • And so my graduate students went off to Microsoft and IBM

  • and Google.

  • And Google was the fastest to turn it into a production

  • speech recognizer.

  • And by 2012, that work that was first done in 2009

  • came out in Android.

  • And Android suddenly got much better in speech recognition.

  • NICHOLAS THOMPSON: So tell me about that moment where you've

  • had this idea for 40 years, you've

  • been publishing on it for 20 years,

  • and you're finally better than your colleagues?

  • What did that feel like?

  • GEOFFREY HINTON: Well, back then I'd

  • only had the idea for 30 years.

  • NICHOLAS THOMPSON: Correct, correct, sorry, sir.

  • Just a new idea.

  • It's fresh.

  • GEOFFREY HINTON: It felt really good

  • that it finally got the state of the art on a real problem.

  • NICHOLAS THOMPSON: And do you remember

  • where you were when you first got the revelatory data?

  • GEOFFREY HINTON: No.

  • NICHOLAS THOMPSON: No, no, OK.

  • All right, so you realize it works on speech recognition.

  • When do you start applying it to other problems?

  • GEOFFREY HINTON: So then we start applying it

  • to all sorts of other problems.

  • So George Dahl, who was one of the people who

  • did the original work on speech recognition, applied it to--

  • I give you a lot of descriptors of a molecule

  • and you want to predict if that molecule will bind to something

  • to act as a good drug.

  • And there was a competition on Kaggle.

  • And he just applied our standard technology design

  • for speech recognition to predicting

  • the activity of drugs and it won the competition.

  • So that was a sign that this stuff was sort of fairly universal.

  • And then I had a student called [INAUDIBLE], who said,

  • you know, Geoff, this stuff is going

  • to work for image recognition.

  • And Fei-Fei Li has created the correct data set for it,

  • and it's a public competition.

  • We have to do that.

  • And so what we did was take an approach originally developed

  • by Yann LeCun.

  • A student called Alex Krizhevsky was a real wizard.

  • He could make GPUs do anything.

  • He programmed the GPUs really, really well.

  • And we got results that were a lot better

  • than standard computer vision.

  • That was 2012.

  • And it was a coincidence I think of the speech recognition

  • coming out in the Android.

  • So you knew this stuff could solve production problems.

  • And on vision in 2012, it had done much better

  • than the standard computer vision.

  • NICHOLAS THOMPSON: So those are three areas where it succeeded.

  • So modeling chemicals, speech, voice, where was it failing?

  • GEOFFREY HINTON: The failure is only temporary, you understand.

  • [LAUGHTER]

  • NICHOLAS THOMPSON: Where was it failing?

  • GEOFFREY HINTON: For things like machine translation,

  • I thought it would be a very long time before we could

  • do that because machine translation,

  • you've got a string of symbols coming in

  • and a string of symbols goes out.

  • And it's fairly plausible to say in between you do manipulations

  • on strings of symbols, which is what classical AI is.

  • Actually, it doesn't work like that.

  • Strings of symbols come in.

  • You turn those into great big vectors in your brain.

  • These vectors interact with each other.

  • And then you convert them back into strings of symbols

  • to go out.

  • And if you told me in 2012 that in the next five years,

  • we'll be able to translate between many languages using

  • just the same technology, recurrent nets,

  • with just stochastic gradient descent

  • from random initial weights, I wouldn't have believed you.

  • It happened much faster than expected.

  • NICHOLAS THOMPSON: But so what distinguishes

  • the areas where it works the most quickly

  • and the areas where it will take more time?

  • It seems like the visual processing, speech recognition,

  • sort of core human things that we

  • do with our sensory perception seem to be the first barriers

  • to clear.

  • Is that correct?

  • GEOFFREY HINTON: Yes and no because there's other things

  • we do like motor control.

  • We're very good at motor control.

  • Our brains are clearly designed for that.

  • And it's only just now that neural nets are beginning

  • to compete with the best other technologies there.

  • They will win in the end.

  • But they're only just winning now.

  • I think things like reasoning, abstract reasoning,

  • they're the kind of last things we learn to do.

  • And I think they'll be among the last things

  • these neural nets learn to do.

  • NICHOLAS THOMPSON: And so you keep

  • saying that neural nets will win at everything eventually.

  • GEOFFREY HINTON: Well, we are neural nets, right?

  • Anything we can do they can do.

  • NICHOLAS THOMPSON: Right, but just

  • because the human brain is not necessarily the most efficient

  • computational machine ever created.

  • GEOFFREY HINTON: Almost certainly not.

  • NICHOLAS THOMPSON: So why could there not be--

  • certainly not my human brain.

  • Couldn't there be a way of modeling machines

  • that is more efficient than the human brain?

  • GEOFFREY HINTON: Philosophically, I

  • have no objection to the idea there could be some completely

  • different way to do all this.

  • It could be that if you start with logic

  • and you're trying to automate logic,

  • and you make some really fancy theorem prover,

  • and you do reasoning, and then you decide

  • you're going to do visual perception by doing reasoning,

  • it could be that that approach would win.

  • It turned out it didn't.

  • But I have no philosophical objection to that winning.

  • It's just we know that brains can do it.

  • NICHOLAS THOMPSON: Right, but there are also things

  • that our brains can't do well.

  • Are those things that neural nets also

  • won't be able to do well?

  • GEOFFREY HINTON: Quite possibly, yes.

  • NICHOLAS THOMPSON: And then there's

  • a separate problem, which is we don't know entirely

  • how these things work, right?

  • GEOFFREY HINTON: No, we really don't know how they work.

  • NICHOLAS THOMPSON: We don't understand how top down neural

  • networks work.

  • There is even a core element of how

  • neural networks work that we don't understand, right?

  • GEOFFREY HINTON: Yes.

  • NICHOLAS THOMPSON: So explain that

  • and then let me ask the obvious follow up,

  • which is, we don't know how these things work.

  • How can those things work?

  • GEOFFREY HINTON: OK, you ask that when I finish explaining.

  • NICHOLAS THOMPSON: Yes.

  • GEOFFREY HINTON: So if you look at current computer vision

  • systems, most of them, they're basically feed forward.

  • They don't use feedback connections.

  • There's something else about current computer vision

  • systems, which is they're very prone to have

  • adversarial examples.

  • You can change a few pixels slightly

  • and something that was a picture of a panda

  • and still looks exactly like a panda to you,

  • it suddenly says that's an ostrich.

  • Obviously, the way you change the pixels is cleverly

  • designed to fool it into thinking it's an ostrich.

  • But the point is it still looks just like a panda to you.

  • And initially, we thought these things work really well.

  • But then when confronted with the fact

  • that they can look at a panda and be confident it's an ostrich,

  • you get a bit worried.

  • And I think part of the problem there

  • is that they're not trying to reconstruct from the high level

  • representations.

  • They're trying to do discriminative learning where you just

  • learn layers of feature detectors

  • and the whole objective is just to change the weights

  • so you get better at getting the right answer.

  • They're not doing things like at each level of feature

  • detectors, check that you can reconstruct

  • the data in the layer below from the activities of these feature

  • detectors.

  • And recently in Toronto, we've been discovering,

  • or Nick Frosst's been discovering,

  • that if you introduce reconstruction then

  • it helps you be more resistant to adversarial attack.

  • So I think in human vision, the

  • learning we do uses reconstruction, and

  • because we're doing a lot of learning

  • by doing reconstructions, we are much more resistant

  • to adversarial attack.
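
The reconstruction idea can be illustrated with a deliberately small example: learn a handful of feature directions from clean data, then use how well an input can be reconstructed from those features as a sanity check. Everything here is a toy assumption (smooth signals instead of images, principal components instead of learned feature detectors, a random rather than adversarially crafted perturbation); it only shows why poor reconstruction can flag an input that has drifted away from what the model knows.

    import numpy as np

    rng = np.random.default_rng(2)

    # Toy in-distribution data: smooth signals living near a low-dimensional subspace.
    t = np.linspace(0, 1, 64)
    basis = np.stack([np.sin(2 * np.pi * k * t) for k in range(1, 6)])
    clean = rng.standard_normal((500, 5)) @ basis

    # 'Feature detectors' for this layer: the top principal components of the clean data.
    _, _, Vt = np.linalg.svd(clean, full_matrices=False)
    components = Vt[:5]

    def reconstruction_error(x):
        """Reconstruct from the feature activities and measure what is left unexplained."""
        return np.linalg.norm(x - (x @ components.T) @ components)

    x = rng.standard_normal(5) @ basis               # a fresh in-distribution input
    x_perturbed = x + 0.3 * rng.standard_normal(64)  # a small off-manifold perturbation

    print("clean error:    ", reconstruction_error(x))            # near zero
    print("perturbed error:", reconstruction_error(x_perturbed))  # noticeably larger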

  • NICHOLAS THOMPSON: But you believe

  • that top down communication in a neural network

  • is how you test, how you reconstruct,

  • how you test and make sure it's a panda not an ostrich?

  • GEOFFREY HINTON: I think that's crucial, yes.

  • Because I think if you--

  • NICHOLAS THOMPSON: But brain scientists

  • are not entirely agreed on that, correct?

  • GEOFFREY HINTON: Brain scientists

  • all agreed on the idea that if you

  • have two areas of the cortex in a perceptual pathway,

  • if there's connections from one to the other,

  • there will always be backwards connections, not necessarily

  • point to point.

  • But there will always be a backwards pathway.

  • They're not agreed on what it's for.

  • It could be for attention.

  • It could be for learning, or it could be for reconstruction,

  • or it could be for all three.

  • NICHOLAS THOMPSON: And so we don't know what

  • the backwards communication is.

  • You are building your new neural networks on the assumption

  • that-- or you're building backwards communication that

  • is for reconstruction into your neural networks

  • even though we're not sure that's how the brain works.

  • GEOFFREY HINTON: Yes.

  • NICHOLAS THOMPSON: Isn't that cheating?

  • GEOFFREY HINTON: Not at all.

  • NICHOLAS THOMPSON: If you're trying

  • to make it like the brain, you're

  • doing something we're not sure is like the brain.

  • GEOFFREY HINTON: Not at all.

  • NICHOLAS THOMPSON: OK.

  • GEOFFREY HINTON: There's two--

  • I'm not doing computational neuroscience.

  • That is, I'm not trying to make a model of how the brain works.

  • I'm looking at the brain and saying this thing works.

  • And if we want to make something else that works,

  • we should sort of look to it for inspiration.

  • So this is neuro inspired, not a neural model.

  • So the neurons we use, they're inspired

  • by the fact neurons have a lot of connections

  • and they change the strengths.

  • NICHOLAS THOMPSON: That's interesting.

  • So if I were in computer science and I

  • was working on neural networks, and I

  • wanted to beat Geoff Hinton, one thing I could do

  • is I could build in top down communication

  • and base it on other models of brain science.

  • So based on learning, not on reconstructing.

  • GEOFFREY HINTON: If they were better models, then you'd win,

  • yeah.

  • NICHOLAS THOMPSON: That's very, very interesting.

  • All right, so let's move to a more general topic.

  • So neural networks will be able to solve all kinds of problems.

  • Are there any mysteries of the human brain that will not be

  • captured by neural networks or cannot?

  • For example, could the emotion--

  • GEOFFREY HINTON: No.

  • NICHOLAS THOMPSON: No.

  • So love could be reconstructed by a neural network?

  • Consciousness can be constructed?

  • GEOFFREY HINTON: Absolutely, once you've figured

  • out what those things mean--

  • we are neural networks, right?

  • Now consciousness is something I'm particularly interested in.

  • I get by fine without it.

  • But um--

  • [LAUGHTER]

  • So people don't really know what they mean by it.

  • There's all sorts of different definitions.

  • And I think it's a pre-scientific term.

  • So 100 years ago, if you asked people, what is life?

  • They would have said, well, living things have vital force.

  • And when they die, the vital force goes away.

  • And that's the difference between being alive and being

  • dead, whether you got vital force or not.

  • And now we don't think that sort of--

  • we don't have vital force.

  • We just think it's a pre-scientific concept.

  • And once you understand some biochemistry

  • and molecular biology, you don't need vital force anymore.

  • You understand how it actually works.

  • And I think it's going to be the same with consciousness.

  • I think consciousness is an attempt

  • to explain mental phenomena with some kind of special essence.

  • And this special essence, you don't need it.

  • Once you can really explain it, then you'll

  • explain how we do the things that make

  • people think we're conscious.

  • And you'll explain all these different meanings

  • of consciousness without having some special essence

  • as consciousness.

  • NICHOLAS THOMPSON: Right, so there's no emotion that

  • couldn't be created.

  • There's no thought that couldn't be created.

  • There's nothing that a human mind

  • can do that couldn't theoretically

  • be recreated by a fully functioning neural network

  • once we truly understand how the brain works.

  • GEOFFREY HINTON: There's something in a John Lennon song

  • that sounds very like what you just said.

  • [LAUGHTER]

  • NICHOLAS THOMPSON: And you're 100% confident of this?

  • GEOFFREY HINTON: No, I'm a Bayesian.

  • So I'm 99.9% confident.

  • NICHOLAS THOMPSON: OK, and what is the 0.1%?

  • GEOFFREY HINTON: Well, we might, for example, all

  • be part of a big simulation.

  • NICHOLAS THOMPSON: True, fair enough, OK.

  • [LAUGHTER]

  • [APPLAUSE]

  • That actually makes me think it's more likely that we are.

  • All right, so what are we learning as we do this

  • and as we study the brain to improve computers?

  • How does it work in reverse?

  • What are we learning about the brain

  • from our work in computers?

  • GEOFFREY HINTON: So I think what we've learned in the last 10

  • years is that if you take a system with billions

  • of parameters, and you'd use stochastic gradient

  • descent on some objective function,

  • and the objective function might be to get the right labels

  • or it might be to fill in the gap in a string of words,

  • or any objective function, it works much better than it

  • has any right to.

  • It works much better than you would expect.

  • You would have thought, and most people in conventional AI

  • thought, take a system with a billion parameters,

  • start them off with random values,

  • measure the gradient of the objective function.

  • That is, for each parameter figure

  • out how the objective function would change if you change

  • that parameter a little bit.

  • And then change it in that direction that improves

  • the objective function.

  • You would have thought that would

  • be a kind of hopeless algorithm that will get stuck.

  • And it turns out, it's a really good algorithm.

  • And the bigger you scale things, the better it works.

  • And that's just an empirical discovery really.

  • There's some theory coming along,

  • but it's basically an empirical discovery.

  • Now because we've discovered that,

  • it makes it far more plausible that the brain

  • is computing the gradient of some objective function

  • and updating the weights, the strengths of synapses,

  • to follow that gradient.

  • We just have to figure out how it gets the gradient

  • and what the objective function is.
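
Here is a minimal sketch of the recipe just described: start with random parameter values, repeatedly measure the gradient of an objective on a random mini-batch, and move each parameter a little in the direction that improves the objective. The linear model and squared-error objective are stand-ins chosen to keep the gradient easy to write by hand.

    import numpy as np

    rng = np.random.default_rng(3)

    # Made-up data for a toy objective: squared error of a linear model.
    X = rng.standard_normal((256, 20))
    true_w = rng.standard_normal(20)
    y = X @ true_w + 0.1 * rng.standard_normal(256)

    w = rng.standard_normal(20)        # start the parameters off with random values
    lr, batch = 0.05, 32
    for step in range(500):
        idx = rng.integers(0, len(X), size=batch)   # 'stochastic': a random mini-batch
        xb, yb = X[idx], y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / batch     # how the objective changes per parameter
        w -= lr * grad                              # nudge each parameter downhill

    print("final objective:", np.mean((X @ w - y) ** 2))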

  • NICHOLAS THOMPSON: But we didn't understand

  • that about the brain.

  • We didn't understand the re-weighted [INAUDIBLE].

  • GEOFFREY HINTON: It was a theory.

  • It was-- I mean, a long time ago,

  • people thought that's a possibility.

  • But in the background, there was always

  • sort of conventional computer scientists saying, yeah,

  • but this idea of everything's random,

  • you just learn it all by gradient descent,

  • that's never going to work for a billion parameters.

  • You have to wire in a lot of knowledge.

  • NICHOLAS THOMPSON: All right, so--

  • GEOFFREY HINTON: And we know now that's wrong.

  • You can just put in random parameters

  • and learn everything.

  • NICHOLAS THOMPSON: So let's expand this out.

  • So as we learn more and more, we will presumably

  • continue to learn more and more about how the human brain

  • functions as we run these massive tests on models

  • based on how we think it functions.

  • Once we understand it better, is there

  • a point where we can, essentially,

  • rewire our brains to be more like the most

  • efficient machines or change the way we think?

  • GEOFFREY HINTON: You'd have thought--

  • NICHOLAS THOMPSON: If it's a simulation that should be easy,

  • but not in a simulation.

  • GEOFFREY HINTON: You'd have thought

  • that if we really understand what's going on,

  • we should be able to make things like education work better,

  • and I think we will.

  • NICHOLAS THOMPSON: We will?

  • GEOFFREY HINTON: Yeah.

  • It would be very odd if you could finally

  • understand what's going on in your brain

  • and how it learns and not be able to adapt the environment

  • so you can learn better.

  • NICHOLAS THOMPSON: Well, OK, I don't want

  • to go too far into the future.

  • But a couple of years from now, how

  • do you think we will be using what we've learned

  • about the brain and about how deep learning works to change

  • how education functions?

  • How would you change a class?

  • GEOFFREY HINTON: In a couple of years,

  • I'm not sure we'll learn much.

  • I think it's going to change education.

  • It's just going to take longer than that.

  • But if you look at it, Assistants

  • are getting pretty smart now.

  • And once Assistants can really understand conversations,

  • Assistants can have conversations with kids

  • and educate them.

  • So already, I think most of the new knowledge I acquire

  • comes from me thinking, I wonder,

  • and typing something into Google, and Google

  • tells me. If I could just have a conversation,

  • I'd acquire knowledge even better.

  • NICHOLAS THOMPSON: And so theoretically,

  • as we understand the brain better,

  • and as we set our children up in front of Assistants--

  • Mine right now almost certainly based on the time in New York

  • is yelling at Alexa to play something on Spotify, probably

  • "Baby Shark"--

  • you will program the Assistants to have better conversations

  • with the children based on how we know they'll learn?

  • GEOFFREY HINTON: Yeah, I haven't really thought much about this.

  • It's not what I do.

  • But it seems quite plausible to me.

  • NICHOLAS THOMPSON: Will we be able to understand

  • how dreams work, one of the great mysteries?

  • GEOFFREY HINTON: Yes, I'm really interested in dreams.

  • NICHOLAS THOMPSON: Good, well, let's talk about that.

  • GEOFFREY HINTON: I'm so interested.

  • I have at least four different theories of dreams.

  • NICHOLAS THOMPSON: Let's hear them all--

  • 1, 2, 3, 4.

  • GEOFFREY HINTON: So a long time ago, there were things called--

  • OK, a long time ago there were Hopfield networks.

  • And they would learn memories as local attractors.

  • And Hopfield discovered that if you try and put

  • too many memories in, they get confused.

  • They'll take two local attractors

  • and merge them into an attractor sort of halfway in between.

  • Then Francis Crick and Graeme Mitchison came along and said,

  • we can get rid of these false minima by doing unlearning.

  • So we turn off the input.

  • We put the neural network into a random state.

  • We let it settle down, and we say that's bad.

  • Change the connections so you don't settle to that state.

  • And if you do a bit of that, it will

  • be able to store more memories.

  • And then Terry Sejnowski and I came along and said, look,

  • if we have not just the neurons where you're

  • storing the memories, but lots of other neurons, too,

  • can we find an algorithm that will

  • use all these other neurons to help you store memories?

  • And it turned out in the end, we came up with the Boltzmann

  • machine learning algorithm.

  • And the Boltzmann machine learning algorithm

  • had a very interesting property which is I show you data.

  • That is, I fixed the states of the observable units.

  • And it sort of rattles around the other units

  • until it's got a fairly happy state.

  • And once it's done that, it increases

  • the strength of all the connections, based

  • on the rule that if two units are both active, it

  • increases the connection strength.

  • That's a kind of Hebbian learning.

  • But if you just do that, the connection strengths

  • just get bigger and bigger.

  • You also have to have a phase where you cut it off

  • from the input.

  • You let it rattle around to settle

  • into a state it's happy with.

  • So now it's having a fantasy.

  • And once it's had the fantasy you

  • say, take all pairs of neurons that are active

  • and decrease the strength of the connection.

  • So I'm explaining the algorithm to you just as a procedure.

  • But actually that algorithm is the result of doing some math

  • and saying, how should you change these connection

  • strengths so that this neural network with all

  • these hidden units finds the data unsurprising?

  • And it has to have this other phase.

  • It has to have this what we call the negative phase when

  • it's running with no input.

  • And it's canceling out--

  • its unlearning whatever state it settles into.
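
A stripped-down version of the two phases just described can be written for a fully visible, Hopfield-style network (no hidden units, made-up sizes and step sizes). The positive phase strengthens connections between units that are on together in the stored patterns; the negative, "dreaming" phase lets the network settle from a random state and weakens whatever spurious attractor it settles into, in the spirit of Crick and Mitchison's unlearning.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 50                                            # binary (+1/-1) units

    def settle(W, state, steps=200):
        """Let the network rattle around until it reaches a state it is 'happy' with."""
        s = state.copy()
        for _ in range(steps):
            i = rng.integers(n)
            s[i] = 1 if W[i] @ s > 0 else -1          # flip one unit at a time toward lower energy
        return s

    # Positive (Hebbian) phase: strengthen connections between co-active units in the memories.
    memories = rng.choice([-1, 1], size=(8, n))
    W = np.zeros((n, n))
    for m in memories:
        W += np.outer(m, m) / n
    np.fill_diagonal(W, 0)

    # Negative ('dreaming') phase: settle from a random state and unlearn that fantasy a little.
    for _ in range(20):
        fantasy = settle(W, rng.choice([-1, 1], size=n))
        W -= 0.01 * np.outer(fantasy, fantasy) / n
        np.fill_diagonal(W, 0)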

  • Now what Crick pointed out about dreams

  • is that, we know that you dream for many hours every night.

  • And if I wake you up at random, you

  • can tell me what you were just dreaming about because it's

  • in your short term memory.

  • So we know you dream for many hours.

  • But in the morning, you wake up, you

  • can remember the last dream, but you

  • can't remember all the others, which is lucky because you

  • might mistake them for reality.

  • So why is it that we don't remember our dreams at all?

  • And Crick's view was it's the whole point of dreaming

  • is to unlearn those things so you put the learning

  • rule in reverse.

  • And Terry Sejnowski and I showed that actually that

  • is a maximum likelihood learning procedure

  • for Boltzmann machines.

  • So that's one theory of dreaming.

  • NICHOLAS THOMPSON: You showed that theoretically?

  • GEOFFREY HINTON: Yeah, we showed theoretically

  • that's the right thing to do if you want

  • to change the weights so that your big neural network finds

  • the observed data less surprising.

  • NICHOLAS THOMPSON: And I want to go to your other theories,

  • but before we lose this thread, you've

  • proved that it's efficient.

  • Have you actually set any of your deep learning algorithms

  • to essentially dream?

  • Right, study this image data set for a period of time,

  • resort, study again, resort versus a machine

  • that's running continuously?

  • GEOFFREY HINTON: So yes, we had machine learning algorithms.

  • Some of the first algorithms that

  • could learn what to do with hidden units

  • were Boltzmann machines.

  • They were very inefficient.

  • But then later on, I found a way of making approximations

  • to them that was efficient.

  • And those were actually the trigger

  • for getting deep learning going again.

  • Those were the things that learned one layer of feature

  • detectors at a time.

  • And it was an efficient form of restricted Boltzmann machine.

  • And so it was doing this kind of unlearning.

  • But rather than going to sleep, that one

  • would just fantasize for a little bit

  • after each data point.

  • NICHOLAS THOMPSON: So Androids do dream of electric sheep.

  • So let's go to theories 2, 3, and 4.

  • GEOFFREY HINTON: OK, theory 2 was called

  • the wake-sleep algorithm.

  • And you want to learn a generative model.

  • So you have the idea that you're going to have

  • a model that can generate data.

  • It has layers of feature detectors.

  • And it activates the high level ones and the low level ones

  • and so on, until it activates pixels, and that's an image.

  • You also want to learn the other way.

  • You want to learn to recognize data.

  • And so you're going to have an algorithm that has two phases.

  • In the wake phase, data comes in.

  • It tries to recognize it.

  • And instead of learning the connections

  • it is using for recognition, it's

  • learning the generative connections.

  • So data comes in.

  • I activate the hidden units, and then I

  • learn to make those hidden units be good at reconstructing

  • that data.

  • So it's learning to reconstruct at every layer.

  • But the question is, how do you learn the forward connection?

  • So the idea is, if you knew the forward connections,

  • you could learn the backward connections because you

  • could learn to reconstruct.

  • NICHOLAS THOMPSON: Yeah.

  • GEOFFREY HINTON: Now it also turns out

  • that if you knew the backward connections,

  • you could learn the forward connections

  • because what you could do is start at the top

  • and just generate some data.

  • And because you generated the data,

  • you'd know the states of all the hidden layers.

  • And so you could learn the forward connections

  • to recover those states.

  • So that would be the sleep phase.

  • When you turn off the input, you just generate data

  • and then you try and reconstruct the hidden units

  • that generated the data.

  • And so if you know the top down connections,

  • you'd learn the bottom up ones.

  • If you know the bottom up ones, you

  • could learn the top down ones.

  • And so what's going to happen if you start

  • with random connections and try doing both-- alternating both

  • kinds of learning? And it works.

  • Now to make it work well, you have

  • to do all sorts of variations of it.

  • But it works.
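
For one hidden layer, the alternation Hinton describes can be sketched as follows. In the wake phase, recognition weights infer a hidden state from data and the generative weights are nudged to reconstruct that data; in the sleep phase, a fantasy is generated top-down and the recognition weights are nudged to recover the hidden state that produced it. The layer sizes, learning rate, flat top-level prior, and lack of biases are all simplifying assumptions.

    import numpy as np

    rng = np.random.default_rng(5)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample(p):
        return (p > rng.random(p.shape)).astype(float)

    n_visible, n_hidden, lr = 64, 32, 0.05
    R = 0.01 * rng.standard_normal((n_visible, n_hidden))   # recognition (bottom-up) weights
    G = 0.01 * rng.standard_normal((n_hidden, n_visible))   # generative (top-down) weights

    data = (rng.random((500, n_visible)) > 0.5).astype(float)   # toy binary 'images'

    for v in data:
        # Wake phase: recognize the data, then make the generative weights
        # better at reconstructing it from the inferred hidden state.
        h = sample(sigmoid(v @ R))
        G += lr * np.outer(h, v - sigmoid(h @ G))

        # Sleep phase: fantasize top-down, then make the recognition weights
        # better at recovering the hidden state that generated the fantasy.
        h_dream = (rng.random(n_hidden) > 0.5).astype(float)    # flat prior over hidden states
        v_dream = sample(sigmoid(h_dream @ G))
        R += lr * np.outer(v_dream, h_dream - sigmoid(v_dream @ R))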

  • NICHOLAS THOMPSON: All right, that is--

  • do you want to go through the other two theories?

  • We only have eight minutes left.

  • I think we should probably jump through some other questions.

  • We'll deal with--

  • GEOFFREY HINTON: If you give me another hour,

  • I could do the other two theories.

  • [LAUGHTER]

  • NICHOLAS THOMPSON: All right, well, Google I/O 2020.

  • So let's talk about what comes next.

  • So where is your research headed?

  • What problem are you trying to solve now?

  • GEOFFREY HINTON: The main thing I'm trying to solve,

  • which I've been doing for a number of years now--

  • actually, I'm reminded of a soccer commentator.

  • You may notice soccer commentators,

  • they always say things like they're doing very well,

  • but they always go wrong on the last pass.

  • And they never seem to sort of notice there's something

  • funny about that.

  • It's a bit circular.

  • So I'm working-- eventually, you're

  • going to end up working on something you don't finish.

  • And I think I may well be working

  • on the thing I never finish.

  • But it's called capsules, and it's

  • a theory of how you do visual perception using reconstruction

  • and also how you route information

  • to the right places.

  • And the two main motivating factors

  • were in standard neural nets, the information--

  • the activity in the layer just automatically goes somewhere.

  • You don't make decisions about where to send it.

  • The idea of capsules was to make decisions

  • about where to send information.

  • Now since I started working on capsules,

  • some other very smart people at Google

  • invented transformers, which are doing the same thing.

  • They're deciding where to route information,

  • and that's a big win.

  • The other thing that motivated capsules was coordinate frames.

  • So when humans do vision, they're

  • always using coordinate frames.

  • And if they impose the wrong coordinate frame on an object,

  • they don't even recognize the object.

  • So I'll give you a little task.

  • Imagine a tetrahedron.

  • It's got a triangular base and three triangular faces,

  • all equilateral triangles.

  • Easy to imagine, right?

  • Now imagine slicing it with a plane.

  • So you get a square cross-section.

  • That's not so easy, right?

  • Every time you slice it, you get a triangle.

  • It's not obvious how you get a square.

  • It's not at all obvious.

  • OK, but I give you the same shape described differently.

  • I need your pen.

  • Imagine, the shape you get, if you take a pen

  • like that, another pen at right angles like this,

  • and you connect all points on this pen

  • to all points on this pen.

  • That's a solid tetrahedron.

  • OK, you're seeing it relative to a different coordinate frame

  • where the edges of the tetrahedron--

  • these two line up with the coordinate frame.

  • And for this, if you think of the tetrahedron that way,

  • it's pretty obvious that at the top,

  • you've got a long rectangle this way.

  • At the bottom, you get a long rectangle that way.

  • And there's [INAUDIBLE] that you've got

  • to get a square in the middle.

  • So it's pretty obvious how you can slice it to get a square.

  • But that's only obvious if you think of it

  • with that coordinate frame.

  • So it's obvious that for humans, coordinate frames are

  • very important for perception.

  • And they're not at all important for conv nets.

  • For conv nets, if I show you a tilted square

  • and an upright diamond, which is actually the same thing,

  • they look the same to a conv net.

  • It doesn't have two alternative ways

  • of describing the same thing.

  • NICHOLAS THOMPSON: But how is adding coordinate frames

  • to your model not the same as the error

  • you were making in the '90s where

  • you were trying to put rules into the system as opposed

  • to letting the system be unsupervised?

  • GEOFFREY HINTON: It is exactly that error.

  • And because I am so adamant that that's a terrible error,

  • I'm allowed to do a tiny bit of it.

  • It's sort of like Nixon negotiating with China.

  • [LAUGHTER]

  • Actually that puts me in a bad role.

  • Anyway, so if you look at conv nets,

  • they're just neural nets where you wired

  • in a tiny bit of knowledge.

  • You add in the knowledge that if a feature detector is good

  • here, it's good over there.

  • And people would love to wire in just a little bit

  • more knowledge about scale and orientation.

  • But if you do it in the obvious way

  • of having a 4D grid instead of a 2D grid,

  • the whole thing blows up on you.

  • But you can get in that knowledge about what viewpoint

  • does to an image by using coordinate frames the same way

  • they do them in graphics.

  • So now you have a representation in one layer.

  • When you try and reconstruct the parts of an object in the layer

  • below, when you do that reconstruction,

  • you can take the coordinate frame of the whole object

  • and multiply it by the part-whole relationship

  • to get the coordinate frame of the part.

  • And you can wire that into the network.

  • You can wire into the network the ability

  • to do those coordinate transformations.

  • And that should make it generalize much, much better.

  • It should mean the networks just find viewpoint very easy

  • to deal with.

  • Current neural networks find viewpoint other

  • than translation very hard to deal with.
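
The coordinate-frame arithmetic Hinton refers to is ordinary graphics-style matrix composition: the pose of a part is the pose of the whole multiplied by the fixed part-whole relationship. The "face" and "nose" frames below are invented purely to show the operation.

    import numpy as np

    def pose(theta, tx, ty):
        """A 2-D coordinate frame as a homogeneous transform (rotation plus translation)."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, tx],
                         [s,  c, ty],
                         [0.0, 0.0, 1.0]])

    # Hypothetical example: a face seen tilted by 30 degrees at position (5, 2),
    # and a nose whose relationship to the face is fixed: centred, slightly below.
    face_pose = pose(np.deg2rad(30), 5.0, 2.0)     # coordinate frame of the whole object
    nose_relative_to_face = pose(0.0, 0.0, -1.0)   # the part-whole relationship

    # Multiplying the whole's frame by the part-whole relation predicts the part's frame,
    # the kind of viewpoint arithmetic a capsule-style network would wire in.
    nose_pose = face_pose @ nose_relative_to_face
    print(nose_pose)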

  • NICHOLAS THOMPSON: So your current task

  • is specific to visual recognition,

  • or it is a more general way of improving or coming up

  • with the rule set for coordinate frames?

  • GEOFFREY HINTON: OK, it could be used for other things.

  • But I'm really interested in the use for visual recognition.

  • NICHOLAS THOMPSON: OK, last question.

  • I was listening to a podcast you gave the other day.

  • And in it, you said that the people whose ideas you value

  • most are the young graduate students who come into your lab

  • because they aren't locked into the old perceptions.

  • They have fresh ideas, and yet they also know a lot.

  • Is there anything that you, sort of looking outside yourself,

  • you think you might be locked into that a new graduate

  • student or somebody in this room who came to work with you

  • would shake up?

  • GEOFFREY HINTON: Yeah, everything I said.

  • NICHOLAS THOMPSON: Everything you said.

  • [LAUGHTER]

  • Take out those coordinate units.

  • Work on feature three, work on feature four.

  • I wanted to ask you a separate question.

  • So deep learning used to be a distinct thing,

  • and then it became sort of synonymous with the phrase AI.

  • And then AI is now a marketing term

  • that basically means using a machine in any way whatsoever.

  • How do you feel about the terminology

  • as the man who helped create this?

  • GEOFFREY HINTON: Well, I was much happier when

  • there was AI, which meant you're logic-inspired

  • and you do manipulations on symbol strings.

  • And there was neural nets, which means

  • you want to do learning in a neural network.

  • And they were completely different enterprises

  • that really sort of didn't get along too well

  • and fought for money.

  • That's how I grew up.

  • And now I see sort of people who spent

  • years saying neural networks are nonsense,

  • saying I'm an AI professor.

  • So I need money.

  • And it's annoying.

  • NICHOLAS THOMPSON: So your field succeeded and

  • kind of ate or subsumed the other field, which

  • then gave them an advantage in asking for money,

  • which is frustrating?

  • GEOFFREY HINTON: Yeah, now it's not entirely fair

  • because a lot of them have actually converted.

  • NICHOLAS THOMPSON: Right, so wonderful.

  • Well, I've got time for one more question.

  • So in that same interview, you were talking about AI.

  • And you said, think of it like a backhoe, a backhoe that

  • can dig a hole, or if not constructed properly,

  • can wipe you out.

  • And the key is when you work on your backhoe

  • to design it in such a way that it's best at digging a hole

  • and not to clock you in the head.

  • As you think about your work, what

  • are the choices you make like that?

  • GEOFFREY HINTON: I guess I would never deliberately

  • work on making weapons.

  • I mean, you could design a backhoe

  • that was very good at knocking people's heads off.

  • And I think that would be a bad use of a backhoe,

  • and I wouldn't work on it.

  • NICHOLAS THOMPSON: All right, well, Geoffrey Hinton--

  • extraordinary interview.

  • All kinds of information-- we'll be back

  • next year to talk about dream theories three and four.

  • That was so much fun.

  • Thank you.

  • [MUSIC PLAYING]
