Active (Machine) Learning - Computerphile


So a lot of your videos have been about machine learning, right, and we know that this now really works.
Deep learning in particular has really transformed what's possible, and many tasks are now simply possible using machine learning.
But there's a small catch, and that is that all these techniques need data, and they need a lot of data.
Data comes in two parts.
There are the features, as we call them, which is the raw input (the images, the audio signals, your sensor data), and then there are the labels.
Labels usually have to be provided by humans.
So if you're, for example, trying to find all the faces in an image, then somebody needs to actually look at the image and draw little bounding boxes around every face so you know where they are. That is time consuming and also very boring.
So people don't like it.
It's a job that needs to be done.
Recently there has been progress in a couple of techniques that make that a lot easier to do, and they're called active learning and cooperative learning, which is a form of active learning. The trick is quite simple.
Actually, it's working with the machine.
So you have a little bit of data that you've collected, and part of that, let's say 10% of that, you have annotated painstakingly.
That's hours and hours of work, but you've got a tenth of your data annotated.
Now you can train a machine learner on that 10% of data.
And then you can make predictions on the other 90% of it.
Now you need a machine learner that gives a confidence on all its predictions.
So it doesn't just say there's a face.
It has to say, "I'm pretty sure that's a face", or "I'm not sure, that might be a face, but, you know, 10% sure", or something like that.
And what you can then do in active learning is you ask the human to annotate the data that it's not sure about, and that way you only annotate data that it really needs to know about.
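The selection step just described can be sketched in a few lines. This is only an illustration, not the speaker's implementation: the function name `select_uncertain`, the threshold of 0.6, and the confidence values are all assumptions made up for the example.

```python
def select_uncertain(confidences, threshold=0.6):
    """Return the indices of predictions the learner is unsure about.

    `confidences[i]` is the learner's confidence in its own prediction for
    unlabeled item i. The 0.6 threshold is an arbitrary, assumed value.
    Indices are returned least confident first.
    """
    uncertain = [i for i, c in enumerate(confidences) if c < threshold]
    return sorted(uncertain, key=lambda i: confidences[i])


# Hypothetical confidences for five unlabeled images ("is there a face?").
confidences = [0.97, 0.10, 0.55, 0.88, 0.30]
print(select_uncertain(confidences))  # prints [1, 4, 2]
```

Only items 1, 4 and 2 would be sent to the human; the two high-confidence items are left alone.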
So basically, you have your database.
Let's say you start with 10%.
You're going to train a machine learner on that.
That gives you a hypothesis, and that hypothesis will take the other 90% of the data that it hasn't been trained on, and it makes predictions on that.
Now, some of those predictions will have high confidence, and, you know, those are probably not that interesting, and some others have low confidence, and they might be useful to have a look at.
So you ask, "Are you confident, machine?"
If it's not, then you're going to involve a person, who is happy because they only need to look at a little bit of the data. So that person will label that data.
And that goes back into your machine learning training algorithm, which creates a new hypothesis.
So we start with h0; after training it and going through this loop one time, we get h1, et cetera, et cetera, and you again apply that to the data that you haven't labeled yet.
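The loop that produces h0, h1, and so on can be sketched as follows. Everything here is a toy stand-in chosen for illustration: the one-dimensional nearest-centroid learner, the sigmoid-based confidence, the `human_label` function that plays the role of the annotator, and the values of `rounds` and `batch` are all assumptions, not anything stated in the talk.

```python
import math


def train(labeled):
    """Fit a toy 1-D nearest-centroid model: one mean per class.

    Assumes `labeled` holds at least one example of class 0 and of class 1.
    """
    means = {}
    for label in (0, 1):
        xs = [x for x, y in labeled if y == label]
        means[label] = sum(xs) / len(xs)
    return means


def predict(model, x):
    """Return (label, confidence) with confidence in [0.5, 1.0]."""
    d0, d1 = abs(x - model[0]), abs(x - model[1])
    p1 = 1.0 / (1.0 + math.exp(d1 - d0))  # sigmoid(d0 - d1)
    return (1, p1) if p1 >= 0.5 else (0, 1.0 - p1)


def human_label(x):
    """Stand-in for the human annotator (hypothetical ground truth)."""
    return 1 if x >= 5.0 else 0


def active_learning(labeled, unlabeled, rounds=3, batch=2):
    """Train, query the least confident items, label them, repeat."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    hypotheses = []
    for _ in range(rounds):
        model = train(labeled)  # h0, then h1, then h2, ...
        hypotheses.append(model)
        # Rank the unlabeled pool, least confident prediction first.
        ranked = sorted(unlabeled, key=lambda x: predict(model, x)[1])
        for x in ranked[:batch]:
            labeled.append((x, human_label(x)))  # the human labels it
            unlabeled.remove(x)                  # it leaves the unlabeled pool
    return hypotheses, labeled, unlabeled


seed = [(1.0, 0), (9.0, 1)]
pool = [0.5, 2.0, 4.0, 4.8, 5.2, 6.0, 8.0, 9.5]
hypotheses, labeled, unlabeled = active_learning(seed, pool)
print(len(hypotheses), len(labeled), len(unlabeled))  # prints 3 8 2
```

In the first round the two points nearest the class boundary (4.8 and 5.2) are the ones handed to the annotator, which is the behaviour the confidence test is meant to produce.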
So actually the labeling really goes to the database, and the database has a division into what is labeled and what isn't labeled yet, and that is used for the training. So that's active learning, and that works quite well. In practice, when you decide what to label, you don't just look at this confidence.
You can also include tricks like asking what data looks very different from the data that I've already labeled, based on some sort of similarity measure on the features. Because, of course, if you've got images from a video, the subsequent frames will look very similar, and it's probably not very valuable to annotate the next frame, then the next, and the next.
It's probably more valuable to annotate frames that look very different, so you can include that as well.
That all goes into the selection process of what to label. The other part is cooperative learning, and really that's where you're going to say:
If I've got a very good machine learner and I'm confident that it creates good confidence levels, then you can say, OK, for this data where we said "Yes, we are confident", we're basically going to accept the machine label. And that means that this data is now labeled when it wasn't previously. So this is data from the data set that wasn't previously labeled.
We have such a high confidence on this that we're going to accept it, and that way it goes back into the database as well and is used in the next training iterations as well.
So this way we have cooperation in the labeling between the machine learner and the human, who both together help to label the data.
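The division of labour can be sketched as a single split over the learner's predictions. The acceptance threshold of 0.95, the function name `cooperative_split`, and the example predictions are assumptions for illustration; the talk only says the confidence has to be high.

```python
def cooperative_split(predictions, accept_at=0.95):
    """Split predictions into machine-labeled items and items for the human.

    `predictions` is a list of (item, predicted_label, confidence) tuples.
    Items at or above `accept_at` (an assumed threshold) keep the machine's
    label; everything else is queued for a human annotator.
    """
    machine_labeled, ask_human = [], []
    for item, label, confidence in predictions:
        if confidence >= accept_at:
            machine_labeled.append((item, label))
        else:
            ask_human.append(item)
    return machine_labeled, ask_human


# Hypothetical predictions on previously unlabeled images.
predictions = [
    ("img1", "face", 0.99),
    ("img2", "no face", 0.60),
    ("img3", "face", 0.96),
]
machine_labeled, ask_human = cooperative_split(predictions)
print(machine_labeled)  # prints [('img1', 'face'), ('img3', 'face')]
print(ask_human)        # prints ['img2']
```

Both lists end up in the labeled part of the database before the next training iteration. This only works if the confidence values are trustworthy, which is the condition the speaker attaches to it.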
And people have done studies on how well this works.
And it basically means that you can spend, you know, 10 to 20% of the time you would normally spend annotating databases and still get the same performance.
So it saves people a lot of time, and I think these are the kinds of new areas in machine learning that you will see happening now.
Is this why we're seeing all these CAPTCHAs?
"Can you see a pedestrian crossing in this image?" Is that surreptitiously collecting data from people?
Yes, absolutely.
What we want to do is change the network so that when we put these images through, these two are rated very similar and these two are rated very different.
And actually, what we usually do is we label this one an anchor, this one a positive sample, and this one a negative sample.
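An anchor, a positive sample and a negative sample are the ingredients of what is commonly called a triplet loss. The transcript does not name the loss or give a formula, so the sketch below is an assumption: it uses the standard margin form with Euclidean distance, and the margin of 1.0 and the embedding values are made up.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on embedding vectors (assumed form).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings produced by the network.
print(triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))  # prints 0.0
print(triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0)))  # prints 5.0
```

In the first call the positive is already much closer than the negative, so there is nothing to correct; in the second the roles are swapped and the loss is large.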