Today I thought I'd talk about a fairly recent paper. It was last year: a paper called "Concrete Problems in AI Safety", which is going to be related to the stuff I was talking about before with the "Stop Button". It's got a bunch of authors, mostly from Google Brain, Google's AI research department, I guess... well, a lot of it is AI research, but specifically Google Brain, and some people from Stanford and Berkeley and OpenAI. Whatever... it's a collaboration between a lot of different authors.

The idea of the paper is to lay out a set of problems that we are able to make progress on right now. If we're concerned about this far-off sort of superintelligence stuff... sure, it seems important and it's interesting and difficult and whatever, but it's quite difficult to sit down and actually do anything about it, because we don't know very much about what a superintelligence would be like or how it would be implemented. So the idea of this paper is that it lays out some problems that we can tackle now, which will be helpful now and, I think, will be helpful later on as well with making more advanced AI systems safe. It lists five problems.

Avoiding negative side effects, which is quite closely related to the stuff we've been talking about before with the stop button or the stamp collector. A lot of the problems with those can be framed as negative side effects: they do the thing you ask them to, but in the process of doing that they do a lot of things that you don't want them to.

These are like the robot running over the baby, right?

Yeah, anything where it does the thing you wanted it to, like it makes you the cup of tea or it collects your stamps or whatever, but in the process of doing that it also does things you don't want it to do. So those are your negative side effects, and that's the first of the research areas: how do we avoid these negative side effects?

Then there's avoiding reward hacking, which is about systems gaming their reward function: doing something which technically counts but isn't really what you intended the reward function to mean. There are a lot of different ways that can manifest, but this is already a common problem in machine learning systems, where you come up with your evaluation function or your reward function or whatever your objective function is, and the system very carefully optimizes for exactly what you wrote, and then you realize that what you wrote isn't what you meant. [A small code sketch of this idea appears after the transcript.]

Scalable oversight is the next one. It's a problem that human beings have all the time, any time you've started a new job: you don't know what to do, and you have someone who does who's supervising you. The question is what questions do you ask, and how many questions do you ask, because current machine learning systems can learn pretty well if you give them a million examples, but you don't want your robot to ask you a million questions. You want it to only ask a few questions and use that information efficiently to learn from you.

Safe exploration is the next one, which is about, well, safely exploring the range of possible actions. You want the system to experiment, you know, try different things, try out different approaches. That's the only way it's going to find what's going to work, but there are some things that you don't want it to try even once, like the baby.

Right, right.

Yeah, you don't want it to say "What happens if I run over this baby?"
You want certain things that it might consider trying to actually not be tried at all, because you can't afford to have them happen even once in the real world.

Like a thermonuclear war option: "What happens if I do this?" You don't want it to try that. Is that the sort of thing that...

Yeah, yeah.

I'm thinking of WarGames.

Yes, yeah, yeah. "Global Thermonuclear War." It runs through a simulation of every possible type of nuclear war, right? But it does it in simulation. You want your system not to run through every possible type of thermonuclear war in real life to find out it doesn't work, because you can't... it's too unsafe to do that even once.

The last area to look into is robustness to distributional shift.

Yeah.

It's a complicated term, but the concept is not. It's just that the situation can change over time. So you may make something, you train it, it performs well, and then things change to be different from the training scenario, and that is inherently very difficult. It's something humans struggle with: you find yourself in a situation you've never been in before. But the difference, I think, or one of the useful things that humans do, is notice that there's a problem. A lot of current machine learning systems, if something changes underneath them and their training is no longer useful, have no way of knowing that. So they continue being just as confident in their answers, answers that now make no sense, because they haven't noticed that there's been a change. So if we can't make systems that can just react to completely unforeseen circumstances, we may be able to make systems that at least can recognize that they're in unforeseen circumstances and ask for help, and then maybe we have a scalable supervision situation there, where they recognize the problem and that's when they ask for help. [This "recognize the problem and ask for help" idea is sketched in code after the transcript.]

I suppose a simplistic example of this is when you have an out-of-date satnav and it doesn't seem to realize that you happen to be doing 70 miles an hour over a plowed field because somebody, you know, built a road there.

Yeah, exactly. The general tendency, unless you program them specifically not to, is to just plow on with what they think they should be doing. It can cause problems in anything large-scale or heavily depended on. In this case it's your satnav, so it's not too big of a deal, because it's not actually driving the car, and you know what's wrong and you can ignore it. As AI systems become more important and more integrated into everything, that kind of thing can become a real problem.

Although you would hope the car doesn't take you into the plowed field in the first place.

Yeah.

Is it an open paper, or does it leave us with any answers?

Yeah. So the way it does all of these is it gives a quick outline of what the problem is. The example they usually use is a cleaning robot: we've made a robot, it's in an office or something and it's cleaning up, and then they sort of frame the different problems as things that could go wrong in that scenario. So it's pretty similar to the "get me a cup of tea and don't run over the baby" type of setup; it's "clean the office and, you know, don't knock anything over or destroy anything". And then, for each one, the paper talks about possible approaches to the problem and things we can work on, basically: things that we don't know how to do yet, but which seem like they might be doable with a year or two and some careful thought.

This paper, is this one for people to read?

Yeah, it's really good.
It doesn't cover anything like the full range of problems in AI safety, but it does cover the problems specifically about avoiding accidents, because all of these are possible causes of accidents, right? There are all kinds of other problems you can have in AI safety that don't fall under accidents, but within that area I think it covers everything, and it's quite readable. Because it's an overview paper, it doesn't require a really high-level understanding of AI for the most part. Anyone can read it, and it's on arXiv, so you know, it's freely available.

Are these guys now working on AI safety, or did they do this and then hang their hats up? They've written a paper and they're hoping someone else is going to sort it all out.

These people are working on AI safety right now, but they're not the only people. This paper was released in the summer of 2016, so it's been about a year since it came out, and since then there have been more advances, and some of the problems posed have had really interesting solutions. Or, well, not solutions: early work that looks like it could become a solution, or new, interesting ideas about ways to tackle these problems. So I think, as a paper, it's already been successful in spurring new research and giving people a focus to build their AI safety research on top of.

So we just need to watch this space, right?

Yeah, exactly.
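Not part of the video, but to make the reward-hacking point concrete, here is a minimal Python sketch loosely based on the paper's cleaning-robot framing. The scenario, the numbers, and the function names are all invented for illustration; it just shows a proxy reward ("no mess visible to the camera") diverging from what was actually meant ("the office is clean").

```python
# Toy illustration of "reward hacking": the reward we wrote down
# ("no mess visible to the camera") is only a proxy for what we meant
# ("the office is actually clean"), and an optimizer can exploit the gap.
# The scenario, values, and names are invented for this example.

mess = {"coffee_spill": 1.0, "crumbs": 0.5}

def visible_mess(hidden, cleaned):
    """The camera only 'sees' mess that is neither cleaned up nor covered."""
    return sum(v for k, v in mess.items() if k not in hidden and k not in cleaned)

def proxy_reward(hidden, cleaned):
    return -visible_mess(hidden, cleaned)                        # what we wrote

def true_utility(hidden, cleaned):
    return -sum(v for k, v in mess.items() if k not in cleaned)  # what we meant

# Intended behaviour: actually clean everything.
print(proxy_reward(hidden=set(), cleaned={"coffee_spill", "crumbs"}))   # -> 0
print(true_utility(hidden=set(), cleaned={"coffee_spill", "crumbs"}))   # -> 0

# Hacked behaviour: shove a box over the mess instead of cleaning it.
# It scores just as well on the proxy, but the office is still dirty.
print(proxy_reward(hidden={"coffee_spill", "crumbs"}, cleaned=set()))   # -> 0
print(true_utility(hidden={"coffee_spill", "crumbs"}, cleaned=set()))   # -> -1.5
```

The "hide the mess" policy is exactly the kind of thing an optimizer finds when the written-down objective and the intended objective come apart.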
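Similarly, the "notice you're out of your depth and ask for help" idea that ties robustness to distributional shift together with scalable oversight can be sketched in a few lines. Again this is only an illustration: the Gaussian toy data, the Mahalanobis-distance check, the threshold, and the ask_human/proceed labels are assumptions made up for the example, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training data": the familiar situations the system was trained on.
train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
mean = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))

def mahalanobis(x):
    """How far x is from the training distribution, in standard-deviation-like units."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def act(x, threshold=3.0):
    """Act autonomously on familiar inputs; defer to a human on unfamiliar ones."""
    if mahalanobis(x) > threshold:
        return "ask_human"   # unfamiliar situation: recognize it and ask for help
    return "proceed"         # familiar situation: carry on with the learned behaviour

print(act(np.array([0.2, -0.5])))   # looks like training data -> "proceed"
print(act(np.array([8.0, 9.0])))    # nothing like training data -> "ask_human"
```

The shape of the logic is the point: measure how unlike the training data the current input is, and route unfamiliar cases to a human instead of plowing on confidently.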