

  • [ MUSIC ]

  • [ APPLAUSE ]

  • KARASICK: You all know...now know why Dr. Bengio did not need much of an introduction.

  • Thank you.

  • Let me also introduce Fei-Fei Li from Stanford.

  • She's head of the AI lab at Stanford.

  • I know her as the ImageNet lady.

  • And I guess we'll talk a little bit about the ImageNet challenges as we go through the discussion.

  • And John Smith is an IBM Fellow.

  • Long history of research in visual information retrieval, deep learning.

  • And I'm going to be a little selfish.

  • So, I'm a trained computer scientist, but I'm not an AI person

  • by training; lots of other things.

  • So, I thought this was a real opportunity to educate me.

  • So, I'm going to ask questions that I am curious about in order to get the discussion started.

  • And you know, the first thing I wanted to ask the three of you is what I would call kind of a hype-correction sort of question.

  • Computer science has a kind of a shiny object property that we're all pretty familiar with.

  • And every so often, a group or individual comes up with a breakthrough.

  • And everybody kind of runs over and the teeter-totter tips to the side.

  • And so, I always worry, whenever I see that, it's oh, God, what aren't people working on?

  • So, I guess I was interested, Yoshua, your talk really got to this.

  • It was a discussion about what we know how to do and what we're still trying to figure out how to do.

  • And I guess I'd like to ask the three of you, maybe starting with Fei-Fei,

  • what do we think the limits of deep learning technologies are?

  • What won't it be able to do?

  • You weren't here at the beginning, but Terry Sejnowski,

  • in the previous session, said don't trust anybody.

  • So, what do you think is sort of beyond this kind of technology?

  • LI: Okay. So, thank you, first of all, for the invitation.

  • And great talk, Yoshua.

  • I think already Yoshua mentioned the law.

  • So, first of all, deep learning is a dynamic changing area of research.

  • So, it's very hard to pinpoint what exactly is deep learning.

  • In computer vision, when a lot of people talk about deep learning and the success of deep learning, what they really refer to is a specific convolutional neural network model and architecture, trained with supervision on big data, meaning ImageNet mostly, that does object recognition.

  • And that is a very narrow definition of deep learning.

  • So, when you ask about the limitations of deep learning, one way to answer is that there's no limitation if deep learning keeps evolving.

  • That's a little bit of an irresponsible answer, I recognize, so just to be brief, I also want to echo Yoshua: in my opinion, the quest towards AI, especially in my own area of computer vision, goes from perception to cognition to reasoning.

  • And we have just begun to get a grasp of that whole path.

  • We're doing very well with perception, thanks to data and these high capacity models,

  • but beyond the basic building blocks of perception such as speech

  • and object recognition, the next thing is really a slew of cognitive tasks

  • that we're not totally getting our hands on yet.

  • We're beginning to see question answering, or QA.

  • We begin to see image captions with grounding.

  • We're beginning, just beginning to see these budding areas of research

  • and down the road, how do we reason?

  • How do we reason in novel situations?

  • How do we learn to learn?

  • How do we incorporate intentions, predictions, emotions?

  • So, all those are still on the horizon.

  • KARASICK: Yoshua.

  • BENGIO: So, I will repeat what Terry said.

  • Like, until we have a mathematical proof, we don't know that something isn't possible.

  • That being said, for sure, if you look at the current technology, there are challenges.

  • I don't know if there are impossibilities, but there are clearly challenges.

  • One of the challenges I've worked on for more than two decades is the long-term dependencies challenge.

  • So, as soon as you start dealing with sequences, there are optimization challenges

  • and that makes it hard to learn, to train those neural nets to do their job.

  • Even for simple tasks.

  • And we've been studying this problem for 20 years.

  • We're making incredible progress, but it's still an obstacle.

  • And there are other challenges; like I mentioned, some of the challenges that come up in inference, in [INAUDIBLE] learning, seem intractable.

  • But of course, at the same time, we know brains do a pretty good job of these tasks.

  • So, there have got to be some approximate methods, and we already have some that are doing very well on these things.

  • And that's what the research is really about.

  • KARASICK: John?

  • SMITH: Yes.

  • I think this is a good question to ask, is there hype and if there's hype, why is there hype?

  • I think the one thing that's clear is there's a lot of attention being given to deep learning.

  • But to some extent, it is warranted because performance is there.

  • And it's hard to argue against performance.

  • So, for many years, my colleague here, Professor Bengio, has worked on neural nets, and it actually took a convergence, I think, of many things at once: Fei-Fei's work on ImageNet sort of coming at the right time, together with computation, that actually let people realize that a class of problems that were previously very difficult, like classifying 1,000 object categories, is now essentially solvable.

  • So, I think what we're seeing, some of these tasks

  • which we thought were very difficult are now solved.

  • So, ImageNet, you know, 1,000 categories, is essentially solved.

  • Some other data sets, like labeled faces in the wild,

  • which is face recognition, essentially solved.

  • So, I think it's hard to argue against that kind of performance.

  • And I think the question for us now is, what else should we do?

  • So, it is a shiny object.

  • But there's a lot more out there, at least in the vision sense.

  • I think we know very little about what the world looks like

  • or how to teach a computer what the world looks like.

  • But I think we're in a very good time now that we have this shiny object and we can think

  • about scaling it to a much larger set of tasks.

  • KARASICK: Thanks.

  • One of the...this is a good segue to something else I wanted to talk about.

  • One of the things that makes mine one of the most fun jobs on the planet, which is managing a bunch of researchers and developers building up Watson, is bridging between, frankly...what did you say?...constantly changing technologies and sort of the pragmatics of those pesky customers who want to use the system to do things.

  • One of the biggest challenges we have is this whole area we talk

  • about called real world evidence.

  • And it really is a discussion about reasoning in a particular domain.

  • So, if you are going to put a system like Watson in front of an oncologist, and we have,

  • and they're going to ask questions, they're going to get answers.

  • The first thing they're going to want to know is why.

  • Why did the linguistic inference engine decide that this particular passage, phrase, or document was a better answer to the question than that one?

  • And I also get this when I ask my team about how much fun it is to debug these things, and your [hat on a hat] example, or whatever that was, is maybe a good illustration of some of the challenges of really trying to get underneath how these things work fundamentally.

  • So, how about this notion of why as opposed to what these things do?

  • Anybody?

  • BENGIO: It's interesting you ask this question.

  • It's a very common question.

  • What if we had a human in front of us doing the job?

  • Sometimes a human is able to explain their choice.

  • And sometimes they're not really able to explain their choice.

  • And the way we trust that person is mostly because he does the right thing most of the time

  • or we have some reasons to believe that.

  • So, I think there will be progress in our technical abilities to figure out the why,

  • why is it taking those decisions, but it's always going

  • to be an approximation to the real thing.

  • The real thing is very complicated.

  • You have these millions of computations taking place.

  • The reason why it's making this decision is hidden in those millions of computations.

  • And it's going to be true essentially of any complex enough system.

  • So, the why is going to be an approximation, but still, sometimes it can give you the cues that you need to figure it out.

  • But ultimately we can't really have a completely clear picture of why it's doing it.

  • One thing I want to add is I think there's going to be progress in that direction

  • as we advance on the natural language side.

  • For example, think of the example I gave with the images and the sentences.

  • So, maybe you can imagine the task was not to actually describe the image but to do something with it.

  • But now you can ask the computer about what it sees in the image,

  • even though that was not the task, to get a sense of, you know,

  • why it's getting things wrong and even ask where it was seeing these things.

  • So, we can design the system so that we can have some answers,

  • and the machine can actually talk back in English about what's going on inside.

  • LI: And just to add, I think, you know, in most of our research, interpretability is what you call the why.

  • And a lot of us are putting effort into that.

  • In addition to the image captioning work that both of our labs have worked on, in terms of not only generating the sentence but also grounding the words back into the spatial regions where they make sense, we've recently been working on videos, for example, using a lot of attention-based LSTM models.

  • There we're looking at how we can actually explain, using some of these attention models, where actions are taking place in the temporal and spatial segments of a long video.

  • So, all these attempts are trying to understand the why question

  • or at least make the model interpretable.
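
A minimal sketch, assuming PyTorch and pre-extracted per-frame features, of the kind of attention-based temporal grounding described above; the dimensions, class count, and architecture are illustrative assumptions, not the panelists' actual models. The attention weights returned alongside the prediction indicate which frames the model relied on, which is one way to surface a partial "why."

```python
import torch
import torch.nn as nn

class TemporalAttentionClassifier(nn.Module):
    """Attention over time: classify an action and report which frames mattered."""
    def __init__(self, feat_dim=512, hidden_dim=256, num_actions=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)         # one relevance score per frame
        self.classifier = nn.Linear(hidden_dim, num_actions)

    def forward(self, frame_feats):                   # (batch, time, feat_dim)
        h, _ = self.lstm(frame_feats)                 # (batch, time, hidden_dim)
        scores = self.attn(h).squeeze(-1)             # (batch, time)
        weights = torch.softmax(scores, dim=1)        # attention distribution over frames
        pooled = (weights.unsqueeze(-1) * h).sum(dim=1)
        return self.classifier(pooled), weights       # prediction plus "where in time"

model = TemporalAttentionClassifier()
feats = torch.randn(2, 30, 512)                       # 2 clips, 30 frames of CNN features each
logits, attn = model(feats)
print(attn[0].topk(3))                                # the frames the model attended to most
```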

  • SMITH: Yes, I think the question of why is actually a very important one in applications.

  • I think particularly as we look to apply deep learning techniques in industry problems.

  • I'll give one example.

  • So, one of the areas where we're applying deep learning techniques is around melanoma detection.

  • So, looking at skin cancer, looking at skin lesion images

  • and essentially training the computer based on those types of lesions.

  • And what we know is possible, actually, the essential value proposition of deep learning, is that we can learn a representation from those images, from those pixels, that can be very effective for then building discrimination and so on.

  • So, we can actually get the systems to be accurate using deep learning techniques.

  • But these representations are not easy for humans to understand.

  • They're actually very different

  • from how clinicians would look at the features of those images.

  • So, around melanoma, around skin lesions in particular,

  • doctors are trained to look at sort of ABCDE.

  • Asymmetry, border, color, diameter, evolution, those kinds of things.

  • And so when our system is making some decisions about these images,

  • it's not conveying that information in ABCDE.

  • So, it actually can get to a better result in the end,

  • but it's not something that's easily consumable by that clinician,

  • ultimately who needs to make the decision.

  • So, I think we have to...we do have to think about how we're going to design these systems

  • to convey not only final classifications, but a set of information,

  • a set of features in some cases that make sense to those humans who need...

  • BENGIO: You could just train to also output...

  • SMITH: You can do that, yes, absolutely.

  • Yes. Right.

  • So, I think there are things that can be done, but the applications may bring these requirements, and that may influence how we use deep learning.
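
A minimal sketch of the "train to also output" idea just raised, assuming PyTorch: a shared backbone with one head for the malignancy decision and an auxiliary head for ABCDE-style attributes a clinician can read. The tiny backbone, attribute labels, and loss weighting are illustrative assumptions, not IBM's actual melanoma system.

```python
import torch
import torch.nn as nn

class LesionNet(nn.Module):
    """Multi-task model: malignancy score plus human-readable ABCDE attribute scores."""
    def __init__(self, num_abcde=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.malignancy = nn.Linear(32, 1)       # main task: benign vs. malignant logit
        self.abcde = nn.Linear(32, num_abcde)    # auxiliary: asymmetry, border, color, ...

    def forward(self, x):
        z = self.backbone(x)
        return self.malignancy(z), self.abcde(z)

model = LesionNet()
images = torch.randn(4, 3, 224, 224)                  # stand-in lesion images
diag_labels = torch.randint(0, 2, (4, 1)).float()     # benign/malignant
abcde_labels = torch.rand(4, 5)                       # hypothetical clinician-graded scores in [0, 1]
diag_logit, abcde_pred = model(images)
loss = nn.BCEWithLogitsLoss()(diag_logit, diag_labels) \
     + 0.5 * nn.MSELoss()(abcde_pred, abcde_labels)   # auxiliary loss keeps the ABCDE head readable
loss.backward()
```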

  • KARASICK: Yes.

  • I think there's going to be kind of a long interesting discussion as you look at the use

  • of these algorithms in regulated settings, how to characterize them in such a way

  • that the regulators are happy campers, whatever the technical term is.

  • So, let's continue on this discussion around, if you like, domains.

  • One of the things that I've seen about systems like this, you know,

  • the notion of what's an application is a function of who you are.

  • So, an application of deep learning: we talk about image, speech, question answering, natural language processing.

  • When you climb up into, if you like, an industrial domain, the things that people who give IBM money understand, banks, governments, insurance companies, now increasingly folks in the healthcare industry, there's really a lot of very, very deep domain knowledge that we have used to train systems like Watson.

  • One of the things that's both a blessing and a curse

  • with deep learning is this...you get taken away from some of the more traditional things

  • like feature engineering that we've all seen.

  • But on the other hand, the feature engineering

  • that you see really embeds deep understanding and knowledge of the domain.

  • So, to me, and I'm pretty simple-minded about this stuff, we are going to have to see how these two different worlds come together, so that we can mix understanding and knowledge and reasoning in a domain with the kinds of things that we're beginning to see, you know, starting with classification and lifting up on deep learning.

  • So, research in this area?

  • What are people...

  • BENGIO: So, first of all, if you have features that you believe are good,

  • there's nothing that prevents you from using them as extra input.

  • KARASICK: Absolutely.

  • BENGIO: You can use the raw thing.

  • You can use your features.

  • You can use both.

  • That's perfectly fine.

  • But you sometimes have to think about where you are going to put them in the system.

  • But typically, there's nothing that prevents you from using them.
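
A minimal sketch of that point, assuming PyTorch: hand-engineered domain features are simply concatenated with a representation learned from the raw input before the final layers. The names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridModel(nn.Module):
    """Combine a learned representation of the raw input with expert-designed features."""
    def __init__(self, raw_dim=100, engineered_dim=12, num_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(raw_dim, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(64 + engineered_dim, 32), nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, raw, engineered):
        z = self.encoder(raw)                                # learned representation
        return self.head(torch.cat([z, engineered], dim=1))  # learned + hand-crafted features

model = HybridModel()
raw = torch.randn(8, 100)           # raw measurements
engineered = torch.randn(8, 12)     # domain-expert features (hand-crafted ratios, scores, etc.)
logits = model(raw, engineered)
```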

  • Also, researchers working on deep learning have been very creative in ways

  • of incorporating prior knowledge.

  • So, in computer vision, they could tell you

  • about the different approaches that people have used.

  • There are lots of things we know about images that we can use essentially to provide more data, more examples.

  • Like transformations of images.
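
A minimal sketch of that point, assuming torchvision: label-preserving transformations encode what we know about images (a flipped or slightly rotated cat is still a cat) and effectively provide more training examples. The specific transforms are illustrative choices.

```python
from PIL import Image
import torchvision.transforms as T

# Prior knowledge expressed as label-preserving transformations.
augment = T.Compose([
    T.RandomHorizontalFlip(),                      # mirroring keeps the object class
    T.RandomRotation(15),                          # small rotations keep the label
    T.ColorJitter(brightness=0.2, contrast=0.2),   # lighting changes keep the label
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),    # the object survives mild crops
    T.ToTensor(),
])

img = Image.new("RGB", (256, 256))                 # stand-in for a real training image
extra_examples = [augment(img) for _ in range(5)]  # five "new" examples from one image
```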

  • And of course, the architectures themselves that we're using, like the convolutional nets, also incorporate prior knowledge.

  • And we can play with that: if we have other kinds of knowledge, we can sometimes change the architecture accordingly.

  • And one of the most powerful ways in which we can incorporate prior knowledge is that we have these intermediate representations and we can preassign meaning to some of these representations.

  • You could say, well, okay, so that part of the representation is supposed