
  • [ MUSIC ]

  • [ APPLAUSE ]

  • BENGIO: Thank you.

  • All right.

  • Thank you for being here and participating in this colloquium.

  • So, I'll tell you about some of the things that are happening in deep learning,

  • but I only have 30 minutes so I'll be kind of quickly going through some subjects

  • and some challenges for scaling up deep learning towards AI.

  • Hopefully you'll have chances to ask me some questions during the panel that follows.

  • One thing I want to mention is I'm writing a book.

  • It's called Deep Learning, and you can already download most of the chapters.

  • These are draft versions of the chapters from my web page.

  • It's going to be an MIT Press book hopefully next year.

  • So, what is deep learning and why is everybody excited about it?

  • First of all, deep learning is just an approach to machine learning.

  • And what's particular about it, as Terry was saying, is that it's inspired by brains.

  • Inspired in the sense that we're trying to understand some of the computational

  • and mathematical principles that could explain the kind of intelligence based

  • on learning that we see in brains.

  • But from a computer science perspective,

  • the idea is that these algorithms learn representations.

  • So, representation is a central concept in deep learning, and, of course,

  • the idea of learning representations is not new.

  • It was part of the idea behind the original neural nets,

  • like the Boltzmann machine and the back prop from the '80s.

  • But what's new here and what happened about ten years ago is a breakthrough that allowed us

  • to train deeper neural networks, meaning networks that have multiple levels of representation.

  • And why is that interesting?

  • So already I mentioned that there are some theoretical results showing

  • that you can represent some complicated functions that are the result of many levels

  • of composition efficiently with these deep networks, whereas, in general,

  • you won't be able to represent these kinds of functions

  • with a shallow network that doesn't have enough levels.

  • What does it mean to have more depth?

  • It means that you're able to represent more abstract concepts,

  • and these more abstract concepts allow these machines to generalize better.

  • So, that's the essence of what's going on here.
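
To make the idea of depth as composition concrete, here is a minimal NumPy sketch (an editorial illustration, not code from the talk; the layer widths and random weights are made up): each layer transforms the previous layer's output, so stacking layers composes simple functions into increasingly abstract representations.

```python
# Minimal sketch of depth as composition of learned representations.
# Illustration only: layer widths and random weights are placeholders.
import numpy as np

def layer(h, W, b):
    """One level of representation: affine map followed by a ReLU nonlinearity."""
    return np.maximum(0.0, h @ W + b)

rng = np.random.default_rng(0)
widths = [784, 256, 128, 64]                      # hypothetical layer sizes
params = [(rng.normal(0.0, 0.01, (m, n)), np.zeros(n))
          for m, n in zip(widths[:-1], widths[1:])]

x = rng.normal(size=(1, 784))                     # e.g. a flattened image
h = x
for W, b in params:                               # depth = repeated composition
    h = layer(h, W, b)                            # each h is a more abstract representation
```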

  • All right.

  • So, the breakthrough happened in 2006 when, for the first time,

  • we were able to train these deeper networks and we used unsupervised learning for that,

  • but it took a few years before these advances made their way

  • to industry and to large scale applications.

  • So, it started around 2010 with speech recognition.

  • By 2012, if you had an Android phone, like this one, well,

  • you had neural nets doing speech recognition in them.

  • And now, of course, it's everywhere.

  • For speech, it's changed the field of speech recognition.

  • Everything uses it, essentially.

  • Then about two years later, in 2012, there was another breakthrough using convolutional networks,

  • which are a particular kind of deep networks that had been around for a long time

  • but that have been improved using some

  • of the techniques we discovered in recent years.

  • This really allowed us to make a big impact in the field of computer vision

  • and object recognition, in particular.

  • So, I'm sure Fei-Fei will say a few words later about that event and then the role

  • of the ImageNet dataset in this.

  • But what's going on now is that neural nets are going beyond their traditional realm

  • of perception and people are exploring how to use them for understanding language.

  • Of course, we haven't yet solved that problem.

  • This is where a lot of the action is now and, of course,

  • a lot of research and R&D continues in computer vision.

  • Now, for example, expanding to video and many other areas.

  • But I'm particularly interested in the extension of this field in natural language.

  • There are other areas.

  • You've heard about reinforcement learning.

  • There is a lot of action there, robotics, control.

  • So, many areas of AI are now more and more seeing the potential gain coming

  • from using these more abstract systems.

  • So, today, I'm going to go through three of the main challenges that I see

  • for bringing deep learning, as we know it today, closer to AI.

  • One of them is computational.

  • Of course, for a company like IBM and other companies

  • that build machines, this is an important challenge.

  • It's an important challenge because what we've observed is

  • that the bigger the models we are able to train,

  • given the amount of data we currently have, the better they are.

  • So, you know, we just keep building bigger models

  • and hopefully we're going to continue improving.

  • Now, that being said, I think it's not going to be enough so there are other challenges.

  • One of them I mentioned has to do with understanding language.

  • But understanding language actually requires something more.

  • It requires a form of reasoning.

  • So, people are starting to use these recurrent nets you heard about, recurrent networks

  • that can be very deep, in some sense, when you consider time in order

  • to combine different pieces of evidence, in order to provide answers to questions.

  • and, essentially, to display different forms of reasoning.

  • So, I'll say a few words about that challenge.

  • And finally, maybe one of the most important challenges, and maybe even more fundamental, is

  • the unsupervised learning challenge.

  • Up to now, all of the industrial applications of deep learning have exploited supervised learning

  • where we have labeled data: we've said, in that image, it's a cat;

  • in that image, there's a desk, and so on.

  • But there's a lot more data we could take advantage of that's unlabeled,

  • and that's going to be important because all of the information we need to build these AIs has

  • to come from somewhere, and we need enough data, and most of it is not going to be labeled.

  • Right. So, as I mentioned, and I guess as my colleague,

  • Ilya Sutskever from Google keeps saying, bigger is better.

  • At least up to now, we haven't seen the limitations.

  • I do believe that there are obstacles, and bigger is not going to be enough.

  • But clearly, there's an easy path forward with the current algorithms just

  • by making our neural nets a hundred times faster and bigger.

  • So, why is that?

  • Basically, what I see in many experiments with neural nets right now is that they --

  • I'm going to use some jargon here.

  • They underfit, meaning that they're not big enough or we don't train them long enough

  • for them to exploit all of the information that there is in the data.

  • And so they're not even able to learn the data by heart, right,

  • which is the thing we usually want to avoid in machine learning.

  • But that comes almost for free with these networks, and so we just have to press

  • on the pedal of more capacity and we're almost sure to get an improvement here.

  • All right.

  • To just illustrate graphically that we have some room to approach the size of human brains,

  • this picture was made up by my former student, Ian Goodfellow, where we see the sizes

  • of different organisms and neural nets over the years so the DBN here was from 2006.

  • The AlexNet is the breakthrough network of 2012 for computer vision,

  • and the AdamNet is maybe a couple of years old.

  • So, we see that the current technology is maybe between a bee and a frog in terms of size

  • of the networks for about the same number of synapses.

  • So, we've almost reached the kind of average number of synapses per neuron you see in natural brains,

  • between a thousand and ten thousand.

  • In terms of number of neurons, we're several orders of magnitude away.

  • So, I'm going to tell you a little bit about a stream of research we've been pushing in my lab,

  • which is more connected to the computing challenge and potentially part

  • of our implementation, which is: can we train neural nets that have very low precision?

  • So, we had a first paper at ICLR.

  • By the way, ICLR is the deep learning conference, and it happens every year now.

  • Yann LeCun and I started it in 2013, and it's been an amazing success

  • that year and every year since then.

  • We're going to have a third version next May.

  • And so we wanted to know how many bits you actually require.

  • Of course, people have been asking these kinds of questions for decades.

  • But using sort of the current state-of-the-art neural nets, we found 12,

  • and I can show you some pictures of how we got these numbers on different datasets,

  • comparing different ways of representing numbers with fixed point or dynamic fixed point.

  • And also, depending on where you use those bits, you actually need fewer bits

  • in the activations than in the weights.

  • So, you need more precision in the weights.
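
As a rough illustration of what a low-precision fixed-point representation means (my own sketch, not the paper's code; the bit allocations are only examples), values can be rounded to a signed fixed-point grid, with more bits kept for the weights than for the activations:

```python
# Hedged sketch: rounding values to a signed fixed-point grid.
# The bit widths below are illustrative, not the paper's exact settings.
import numpy as np

def to_fixed_point(x, total_bits=12, frac_bits=10):
    """Round x to a signed fixed-point format with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    max_int = 2 ** (total_bits - 1) - 1            # one bit reserved for the sign
    q = np.clip(np.round(x * scale), -max_int - 1, max_int)
    return q / scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(4, 4))
activations = rng.random(size=(4,))

w_q = to_fixed_point(weights, total_bits=12, frac_bits=10)    # weights: more precision
a_q = to_fixed_point(activations, total_bits=8, frac_bits=6)  # activations: fewer bits suffice
```

In the dynamic fixed-point variant mentioned above, the scaling factor would be chosen per group of values (for example, per layer) rather than fixed globally.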

  • So, that was the first investigation.

  • But then we thought -- so that's the --

  • for the weights, that's the number of bits you actually need to keep the information

  • that you are accumulating from many examples.

  • But when you actually run your system during training, especially,

  • maybe you don't need all those bits.

  • Maybe you can get the same effect by introducing noise

  • and discretizing randomly those weights to plus one or minus one.

  • So, that's exactly what we did.

  • The idea is -- the cute idea here is that we can replace a real number by a binary number

  • that has the same expected value by, you know, sampling those two values with a probability

  • such that the expected value is the correct one.

  • And now, instead of having a real number to multiply,

  • we have a bit to multiply, which is easy.

  • It's just an addition.
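
To make that trick concrete, here is a small NumPy sketch of stochastic binarization (an illustration of the idea as described, not the actual code from the paper): each weight in [-1, 1] is replaced by +1 with probability (w + 1) / 2 and by -1 otherwise, so the expected value of the sampled bit equals the original weight.

```python
# Hedged sketch of stochastic binarization: replace each real-valued weight by
# +1 or -1, sampled so that the expectation equals the original weight.
import numpy as np

def stochastic_binarize(w, rng):
    w = np.clip(w, -1.0, 1.0)
    p_plus = (w + 1.0) / 2.0                        # P(b = +1); then E[b] = 2*p_plus - 1 = w
    return np.where(rng.random(w.shape) < p_plus, 1.0, -1.0)

rng = np.random.default_rng(0)
w = np.array([0.3, -0.7, 0.0])
samples = np.stack([stochastic_binarize(w, rng) for _ in range(10000)])
print(samples.mean(axis=0))                         # close to [0.3, -0.7, 0.0]
```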

  • And why would we do that?

  • Because we want to get rid of multiplications.

  • Multiplications are what take up most of the surface area on chips for doing neural nets.

  • So, we had a first try at this, and this is going to be presented at the next NIPS

  • in the next few weeks in Montreal.

  • And it allows us to get rid of the multiplications in the feed forward computation

  • and in the backward computation where we compute gradients.

  • But we are left with one multiplication: even if you discretize the weights,

  • there is another multiplication at the end of the back prop

  • where you don't multiply weights.

  • You multiply activations and gradients.
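
As a sketch of where that remaining multiplication sits (my own illustration, not the paper's code): in the weight-gradient step of backprop, the incoming activations are multiplied with the back-propagated errors, typically as an outer product over the batch, and this product is still real-valued even when the weights themselves have been binarized.

```python
# Hedged sketch: the weight-gradient computation in backprop multiplies
# activations by back-propagated gradients; binarizing the weights alone
# does not remove this multiplication.
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(32, 256))      # activations entering a layer (batch of 32)
delta = rng.normal(size=(32, 128))  # gradients flowing back from the layer above

grad_W = h.T @ delta                # activations x gradients: the multiplication that remains
```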