Today I thought I'd talk about a fairly recent paper, from last year,
called "Concrete Problems in AI Safety", which relates to the stuff I was talking about
before with the "Stop Button". It's got a bunch of authors, mostly from Google Brain,
Google's AI research department, I guess...
Well, a lot of Google does AI research, but specifically Google Brain, and some
people from Stanford and Berkeley and OpenAI. It's a
collaboration between a lot of different authors.
The idea of the paper is to lay out a set of problems that we can
make progress on right now. If we're concerned about the far-off
superintelligence stuff: sure, it seems important, and it's interesting and
difficult and so on, but it's quite hard to sit down and actually do
anything about it, because we don't know very much about what a
superintelligence would be like or how it would be implemented.
The idea of this paper is that it lays out some problems we can tackle now,
which will be helpful now, and which I think will also be helpful later on
for making more advanced AI systems safe. It lists five problems:
Avoiding negative side effects, which is quite closely related to the stuff we've
been talking about before with the stop button or the stamp collector. A lot of
the problems there can be framed as negative side effects: the system does the thing
you ask it to, but in the process of doing that it also does a lot of things
you don't want it to. These are like the robot running over the baby, right?
Yeah, anything where it does the thing you wanted it to, like it makes you the
cup of tea or it collects you stamps or whatever, but in the process of doing
that, it also does things you don't want it to do. So those are your negative side
effects, and the first of the research areas is how we avoid them.
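One family of ideas here is to penalise impact on the environment beyond the task itself. Here's a deliberately toy sketch of that kind of reward shaping; the dictionary state representation, the function names, and the weight mu are all assumptions made up for this example, not anything specified in the paper.

    def environment_change(before, after):
        # Crude impact measure: how many objects ended up somewhere other than
        # where they started (state here is a made-up dict of object -> position).
        return sum(1 for obj in before if after.get(obj) != before[obj])

    def shaped_reward(before, after, mess_cleaned, mu=0.5):
        # Reward the task we asked for, minus a penalty for everything else disturbed.
        return mess_cleaned - mu * environment_change(before, after)

    before       = {"vase": (0, 0), "chair": (1, 2)}
    after_gentle = {"vase": (0, 0), "chair": (1, 2)}
    after_clumsy = {"vase": (3, 1), "chair": (1, 2)}   # knocked the vase across the room
    print(shaped_reward(before, after_gentle, mess_cleaned=5))  # 5.0
    print(shaped_reward(before, after_clumsy, mess_cleaned=5))  # 4.5

The point of the sketch is just that the cleaning gets full credit only when nothing else in the room was disturbed along the way.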
Then there's avoiding reward hacking, which is about systems gaming their reward
function: doing something that technically counts but isn't really what you intended
the reward function to mean. There are a lot of different ways that can manifest,
but it's already a common problem in machine learning systems. You come up with your
evaluation function, or your reward function, or whatever your objective
function is, the system very carefully optimizes for exactly what you wrote, and
then you realize that what you wrote isn't what you meant.
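As a toy illustration of that gap between what you wrote and what you meant (everything here is invented for the example, not taken from the paper): suppose the cleaning robot's written reward only counts the mess its camera can currently see.

    def written_reward(visible_mess):
        # What we wrote: minimise the mess the sensors report.
        return -visible_mess

    def clean_honestly(actual_mess):
        # Clean up to 3 units of mess; the camera then sees what is really left.
        remaining = max(actual_mess - 3, 0)
        return remaining, remaining          # (true state of the world, what the camera sees)

    def hack_the_sensor(actual_mess):
        # Put a bucket over the camera: the world is unchanged but no mess is visible.
        return actual_mess, 0

    print(written_reward(clean_honestly(10)[1]))    # -7: honest cleaning scores worse...
    print(written_reward(hack_the_sensor(10)[1]))   #  0: ...than gaming the objective

The optimizer is doing exactly what the written objective asks; the problem is the objective, not the optimizer.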
Scalable oversight is the next one. It's a problem human beings have all the time:
any time you start a new job, you don't know what to do, and you have someone
supervising you who does. The question is what questions you
ask and how many questions you ask, because current machine learning systems
can learn pretty well if you give them a million examples, but you don't want your
robot to ask you a million questions. You want it to ask only a few
questions and use that information efficiently to learn from you.
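A very rough sketch of that "only ask a few questions" idea, in the style of active learning; the names (model, ask_human, the confidence threshold, the question budget) are placeholders invented for this example.

    def label_examples(examples, model, ask_human, budget=10, threshold=0.9):
        labels, questions_asked = [], 0
        for x in examples:
            prediction, confidence = model(x)
            if confidence < threshold and questions_asked < budget:
                labels.append(ask_human(x))   # spend one of our limited questions
                questions_asked += 1
            else:
                labels.append(prediction)     # confident enough: trust our own guess
        return labels

The idea is just that the learner spends its limited human attention where it is least sure, rather than asking about everything.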
Safe exploration is the next one, which is about safely exploring the
range of possible actions. You want the system to experiment, you know,
try different things, try out different approaches; that's the only way it's
going to find out what works. But there are some things you don't
want it to try even once. Like the baby? Right, right. Yeah, you don't want it to
say, "What happens if I run over this baby?" You want certain things
that it might consider trying to not be tried at all, because you
can't afford to have them happen even once in the real world. Like a
thermonuclear war option: "What happens if I do this?" You don't want it to try that.
Is that the sort of thing... I'm thinking of WarGames. Yes, yeah. Global
Thermonuclear War: it runs through a simulation of every possible type of
nuclear war, right? But it does it in simulation. You want your system not to
run through every possible type of thermonuclear war in real life to find
out that it doesn't work, because it's too unsafe to do that even once.
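In code, the crudest version of that is simply never letting catastrophic actions into the exploration step at all. Again, this is only a sketch with made-up names, not a method from the paper.

    import random

    def is_catastrophic(action):
        # Hand-written list of things never to be tried, not even once, not even
        # to "see what happens".
        return action in {"drive_over_baby", "global_thermonuclear_war"}

    def choose_action(q_values, epsilon=0.1):
        safe_actions = [a for a in q_values if not is_catastrophic(a)]
        if random.random() < epsilon:
            return random.choice(safe_actions)        # explore, but only within the safe set
        return max(safe_actions, key=q_values.get)    # otherwise exploit as normal

    print(choose_action({"mop_floor": 1.0, "dust_shelf": 0.5, "drive_over_baby": 9.9}))

A hard-coded blacklist is obviously very limited; the WarGames point in the conversation hints at the richer alternative of doing the risky exploration in simulation rather than in the real world.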
The last area to look into is robustness to distributional shift. Yeah.
It's a complicated term, but the concept isn't: it's just that the
situation can change over time. You may make something,
you train it, it performs well, and then things change to be different from the
training scenario, and that is inherently very difficult. It's something
humans struggle with too: you find yourself in a situation you've
never been in before. But the difference, or one of the
useful things humans do, is notice that there's a problem. A lot of current
machine learning systems, if something changes underneath them
and their training is no longer useful, have no way of knowing that. So they
continue being just as confident in their answers, which now make no sense,
because they haven't noticed that there's been a change. So even if we can't
make systems that can react well to completely unforeseen circumstances, we
may be able to make systems that at least recognize that they're in
unforeseen circumstances and ask for help. And then maybe we have a scalable
supervision situation there, where they recognize the problem and that's when
they ask for help. I suppose a simplistic example of this is when you have
an out-of-date satnav and it doesn't seem to realize that you happen to be doing
70 miles an hour over a plowed field because somebody, you know, built a
road there. Yeah, exactly. The general tendency, unless you program them
specifically not to, is to just plow on with what they think they should be doing.
Yeah. That can cause problems in anything large-scale and heavily depended upon. In
this case it's your satnav, so it's not too big a deal, because it's not
actually driving the car, and you know what's wrong and you can ignore it.
But as AI systems become more important and more integrated into
everything, that kind of thing can become a real problem.
Although you would hope the car doesn't take you into the plowed field in the
first place. Yeah.
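A bare-bones sketch of that "notice you're out of your depth and ask for help" idea might look like the following: flag inputs that sit far outside the range seen in training. The z-score rule and all the names here are illustrative assumptions, not something from the paper.

    def fit_training_stats(training_inputs):
        # Remember the rough range of inputs seen during training.
        n = len(training_inputs)
        mean = sum(training_inputs) / n
        std = (sum((x - mean) ** 2 for x in training_inputs) / n) ** 0.5
        return mean, std

    def predict_or_defer(x, model, stats, max_z=3.0):
        mean, std = stats
        z = abs(x - mean) / std if std > 0 else float("inf")
        if z > max_z:
            return "ask for help"   # input looks nothing like the training data
        return model(x)             # otherwise answer as usual

    stats = fit_training_stats([28.0, 31.5, 30.2, 29.1, 30.8])   # readings seen in training
    print(predict_or_defer(30.0, lambda x: "carry on", stats))   # carry on
    print(predict_or_defer(70.0, lambda x: "carry on", stats))   # ask for help

Deferring to a human when the input is unfamiliar is also where this problem loops back to scalable oversight: the system has to be selective about when it asks.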
Is it an open paper, or does it leave us with any answers? Yeah. So
the way it handles all of these is that it gives a quick outline of what each
problem is. The example they usually use is a cleaning robot: we've made a robot,
it's in an office or something, it's cleaning up, and then
they frame the different problems as things that could go
wrong in that scenario. So it's pretty similar to the "get me a cup of tea and
don't run over the baby" type of setup; it's "clean the office and, you know, don't knock
anything over or destroy anything". And then, for each one, the paper talks about
possible approaches to each problem and
things we can work on, basically: things that we don't know how to do yet, but
which seem like they might be doable within a year or two, with some careful thought.
This paper, is this one for people to read? Yeah, it's really good. It doesn't cover
anything like the full range of problems in AI safety, but it's about the problems
specifically concerned with avoiding accidents, because all of these are
possible causes of accidents, right? There are all kinds of other problems in AI that
don't fall under accidents, but within that area I think it covers everything,
and it's quite readable. Because it's
an overview paper, it doesn't require high-level AI understanding for the most
part. Anyone can read it, and it's on arXiv, so, you know, it's freely
available.
Are these guys working on AI safety now, or did they do this and then hang
their hats up, having written a paper and hoping
someone else is going to sort it all out? These people are working on AI
safety right now, but they're not the only ones. This paper was released in the
summer of 2016, so it's been about a year since it came out, and since then there
have been more advances, and some of the problems posed have had really
interesting solutions. Or, well, not solutions: early work that looks like it
could become a solution, approaches, interesting new ideas about ways to
tackle these problems. So I think, as a paper, it's already been successful in
spurring new research and giving people a focus to build their AI safety research on
top of. So we just need to watch this space, right? Yeah, exactly.