AI YouTube Comments - Computerphile

  • Michael, is that YouTube comments you've got on your screen?

  • No, these are completely fake comments that I have generated using a neural network. So I've trained it on YouTube comments for Computerphile.

  • So basically, it's read a lot of Computerphile comments and is now generating semi-plausible new comments.

  • They talk about Linux and floppies and Java, for example, all kinds of semi-plausible things, but not really, if you actually read it.

  • Let's talk a bit about recurrent neural networks.

  • We've covered neural networks a few times.

  • We have videos on machine learning, deep learning, convolutional neural nets, Google Deep Dream, this sort of thing.

  • A lot of people are doing research in neural networks now, and some of them are doing things like natural language processing.

  • So things like trying to predict sentences, or work out sentiment analysis from sentences: things like, is this sentence a happy sentence or a sad sentence?

  • Is it an angry tweet or a relaxed tweet? I'm gonna very briefly recap, but I would advise people to go back to those other videos first.

  • So say you've got a neural network and you're trying to make some predictions on text, or anything sequential, really.

  • So we've got various links going up here and like this, and we put something in on this side.

  • Let's say a sentence and we want to predict something about that sentence.

  • Maybe what the next word is going to be, or whether or not the sentence is a positive or a negative one, you know, this sort of thing.

  • The problem is that this is a very fixed structure.

  • It's not very good at sequential data.

  • You can imagine.

  • I might put the first letter in here and the second letter in here and the third letter in here.

  • And as long as all my sentences have the exact same length and the exact same structure, that kind of thing would work.

  • But as soon as things start to move around a bit, like if the word we're looking for is, say, the first name of the person in the sentence, or the subject, it's gonna become much more difficult.

  • So let's imagine, you know, using that example that I'm trying to predict the subject of a sentence based on some sentences, right?

  • So I've collected together lots and lots of sentences, and I've got the subject of each of those sentences, and I put all the sentences in and try to learn the subject.

  • The problem is that the subject moves around in a sentence.

  • Sometimes it's over here.

  • Sometimes it's over here.

  • So this network has got quite a difficult job to learn not only how to determine what the subject is, but also how to do that when it could be anywhere.

  • So you might say, "I went to the park on Tuesday", but you might also say, "On Tuesday I went to the park". So if you're trying to predict when you went to the park, it's more complicated than you might at first think. So what we really want to try and do is put in sequential data.

  • We want to put in one word or one letter at a time and then try and make decisions on that.

  • So we need some way of examining this over time.

  • So if we simplify this into a slightly different structure: let's imagine we're training a very simple neural network to take characters.

  • Let's say T, H, E, and produce a guess as to what the next character will be.

  • That way we can work out things like what the rest of the word might be.

  • We could also, for example, have an output that does sentiment analysis, because it's starting to generate and work out the structure of words.

  • So let's say our input is T and then H and then E in a row, right?

  • So we have the input nodes here of our network.

  • We have our hidden nodes.

  • We have some kind of learned mapping between these two things.

  • And then we have an output, which is our prediction of what the next character will be. Now, in a normal neural network situation, although, you know, I've turned this on its side a bit:

  • Essentially, what we're trying to do is say: well, look, the output here is an H, or it should be an H.

  • So we go forward through the network, we see whether it guesses H, and we essentially calculate a loss and say, well, that wasn't very good.

  • You thought it would be a Z. And so we then adjust these weights here to try and make that a little bit better.
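
A minimal numpy sketch of that single prediction-and-loss step, purely for illustration; the sizes, indices and names here are my assumptions, not the network from the video:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes: 27 characters (a-z plus space), 64 hidden units
vocab_size, hidden_size = 27, 64
x = np.zeros(vocab_size); x[19] = 1.0   # one-hot input for 't'
target = 7                               # index of 'h', the character we hoped for

# Randomly initialized weights; these are what training adjusts
W_xh = np.random.randn(hidden_size, vocab_size) * 0.01
W_hy = np.random.randn(vocab_size, hidden_size) * 0.01

h = np.tanh(W_xh @ x)                    # hidden layer
p = softmax(W_hy @ h)                    # predicted distribution over the next character
loss = -np.log(p[target])                # cross-entropy: "that wasn't very good"
# Backpropagation would then nudge W_xh and W_hy to reduce this loss.
```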

  • The problem with this sort of structure is that we're only ever trying to make a decision based on one character input.

  • So, you know, the letter that follows is going to differ depending on what the other letters are; it isn't always going to be an A or a T or something else.

  • This is inherently gonna struggle with this kind of problem.

  • So what we do is we also want to bear in mind what we've seen before.

  • We want to say: we've seen an H, but we've also seen a T previously.

  • Can we make a better decision about what the next letter will be?

  • And we do that by basically making more weights and connecting up these hidden layers like this.

  • This is essentially a recurrent neural network.

  • This is the expanded, unrolled structure.
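
Roughly, that unrolled structure looks like this in code; again a hedged numpy sketch with made-up names, where the important points are that the hidden state h is carried from one step to the next and the same weight matrices are reused at every step:

```python
import numpy as np

def rnn_forward(char_indices, W_xh, W_hh, W_hy, b_h, vocab_size):
    """Unrolled recurrent forward pass over a sequence of character indices."""
    h = np.zeros(W_hh.shape[0])            # h0: empty memory before anything is seen
    logits_per_step = []
    for c in char_indices:                 # e.g. the indices for 't', 'h', 'e'
        x = np.zeros(vocab_size)
        x[c] = 1.0
        # The same W_xh, W_hh and W_hy are reused at every step (shared weights),
        # and h carries forward what has been seen so far.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        logits_per_step.append(W_hy @ h)   # a guess at the next character at this step
    return logits_per_step, h
```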

  • But it's the same kind of principle. When we first start, this is h zero; we put in the T, and it goes: well, I'm predicting, given a T, what the most likely next letter is.

  • It could be an H, it could be an A, it could be lots of different things.

  • All right, so we make an attempt at guessing here.

  • We don't know very much.

  • We put in an H and we say, well, we've already seen a T, and we know about it because it's stored somewhere in this hidden area.

  • So maybe we can make a slightly better guess: we've seen a T, now we've seen an H.

  • So, you know, it could be an E next, for "the", or it could be an A, for "that", that kind of guess.

  • It's unlikely to be a Z, you know, unless people make lots of typos. We can predict maybe that this could be an E next.

  • Then we actually do see an E now, and so we can put it in here and say: look, we saw a T, we saw an H, now we're seeing an E.

  • Can we make an even more informed decision as to what's going on here?

  • And it turns out we can. So it could be a space, because maybe we're at the end of the word.

  • Or it could be an A, because the word is "theatre", or an N, because the word is "then". There are lots of different things it could be.

  • But you can imagine if this was very long, we could be bringing in words from previously as well.

  • So we could say, well, look, "and the" makes more sense than "and then"... actually, that's not true, it doesn't really make sense; it depends, you've got to have more of the sentence.

  • The more of the sentence you have, the more powerful your prediction is going to be.

  • What this will actually predict is basically the likelihood of the next letter.

  • So it will say: given T, H, E, I think the next output has a 10% chance of being an A, but a 50% chance of being a space, and some other chances of being other characters. You get the idea.

  • The only other interesting thing about recurrent neural networks is that the weights here, and here, and here are shared.

  • So they are the same because otherwise this one here would always be learning what it is to see the first character.

  • And this one will be, you know, only ever seeing the first two characters.

  • It doesn't generalize very well.

  • So what we do, when we train this network up, is we actually train these hidden neurons to predict in the general case, wherever they are in the sentence, and that makes them really good at variable-length input.

  • Right?

  • So this works really well for YouTube comments, for example, because they're often different lengths.

  • They might have a similar structure, but they're very, very different lengths.

  • So I've moved off the sentiment analysis and, you know, word-prediction kind of thing, to a kind of character-level, low-level thing.

  • But you could imagine, if you just ignored these outputs here and had only one output at the end of a sentence, you could try and predict something like the sentiment of a sentence, whether it's a happy one, or predict the subject of the sentence, or something like that.
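
As a sketch of that "one output at the end" idea, assuming the same made-up weights as above: run the recurrence over the whole sequence, ignore the per-step outputs, and classify from the final hidden state only.

```python
import numpy as np

def classify_sequence(char_indices, W_xh, W_hh, W_out, b_h, vocab_size):
    """Many-to-one: read the whole sequence, classify from the final hidden state."""
    h = np.zeros(W_hh.shape[0])
    for c in char_indices:
        x = np.zeros(vocab_size)
        x[c] = 1.0
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # per-step outputs are simply ignored
    scores = W_out @ h                            # e.g. two scores: happy vs. sad
    return scores.argmax()
```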

  • So natural language processing lends itself quite well to this kind of idea.

  • So I've trained one of these up, and I thought, well, what I want to do is some word prediction. There are lots of fun examples online, and we'll link a really good blog post in the comments, with some really fun examples of predicting text.

  • You can basically create a neural network that does this and then basically just start generating random text.

  • It's quite fun.

  • So I thought: what text do I have access to which would be fun to generate?

  • And then I realized that we have access to all the YouTube comments for Computerphile, right, all the historic YouTube comments for Computerphile.

  • So what I've done is I've downloaded all the comments since the beginning of Computerphile, and I've put them into a network a little bit like this and tried to get it to predict more words.

  • And it has a stab at it, right?

  • So let's have a look and see what it does.

  • So, yeah, this is one of our machine learning servers that I've used to train up this recurrent network.

  • The machine's name is Deadpool.

  • You're welcomed by an ASCII-art Deadpool when you log in.

  • Time well spent, that was. It's the only server that does it; no one else seems to be interested.

  • Now, what I described just now is a recurrent neural network.

  • That's its sort of traditional form. The modern neural networks that we use to do this are slightly more powerful, called long short-term memory networks; we'll maybe cover the architecture of those in a different video.

  • But that's what I've trained here.
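
The training code isn't shown in the video, but a character-level model of this kind can be assembled from an off-the-shelf LSTM layer; this PyTorch sketch is just one plausible shape for it, with the layer sizes chosen arbitrarily:

```python
import torch.nn as nn

class CharLSTM(nn.Module):
    """Character-level language model: embed characters, run an LSTM, predict the next one."""
    def __init__(self, vocab_size, hidden_size=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        # x: (batch, sequence) of character indices
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state   # logits over the next character at every position
```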

  • Okay, so, you know, YouTube comments come with things like timestamps and what their comment IDs are and who they've replied to, things like this.

  • I decided not to make my life too complicated, so I got rid of most of that.

  • And the structure, broadly speaking, is the word "comment", then in brackets the person who commented, you can see a username here, then a colon, and then what they actually said.

  • So I haven't done any kind of filtering of expletives and things like this.

  • So let's hope nothing comes up.

  • But I have done some things like removing slightly odd characters and things like this, just to make it a bit simpler.

  • You've got to think that if you're training your neural network, you have to have output nodes for every possible character.

  • So the fewer of those you have, the easier it'll be.
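
A guess at what that clean-up might look like; the exact format markers and the character whitelist here are assumptions, not the script that was actually used:

```python
import string

# Hypothetical whitelist: letters, digits, common punctuation, space and newline
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n")

def format_comment(username, text):
    # Keep the character set small so the network needs fewer output nodes
    cleaned = "".join(ch for ch in text if ch in ALLOWED)
    return f"COMMENT [{username}]: {cleaned}\n"
```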

  • The most recent comments will be these.

  • But of course, I scraped them off a couple of weeks ago, and so maybe some more people have commented since then.

  • Now, I've trained this network up and basically, you know, I left it running for a few hours.

  • What we're doing is we're putting in strings of text and having it learn what the next character is likely to be, right?

  • So when it sees T, H, E, it's probably going to see a space or an N. When it sees "comment" and a space, it's probably going to put in an opening bracket, because that's the beginning of a comment.

  • It learns these kinds of rules, and then what we can do, once it's trained, is basically just ask it to keep spitting out text based on the most probable letters.

  • So we start it off. We say, right, go, and it says: well, there's probably a fairly high chance that the first letter is a D, let's say. So it puts out a D, and then it goes: well, given that we've seen a D, the next letter is probably an A, let's say, but it could be an E, or it could be an R. So we pick one at random, based on how likely they are. So we're always basically typing at random, but using the suggestions of this network, and it produces semi-plausible text.

  • I mean, it's nonsense, but it's also not totally stupid.
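
That sampling loop might look roughly like this, assuming a trained model object that returns a probability distribution over the next character (the model and its step method are placeholders, not a real API):

```python
import numpy as np

def generate(model, char_to_idx, idx_to_char, seed="COMMENT [", length=5000):
    """Sample text one character at a time, feeding each choice back into the model."""
    state, probs = None, None
    for ch in seed:                      # warm the model up on the seed text
        probs, state = model.step(char_to_idx[ch], state)
    out = list(seed)
    for _ in range(length):
        # Pick the next character at random, weighted by how likely the model thinks it is
        idx = np.random.choice(len(probs), p=probs)
        out.append(idx_to_char[idx])
        probs, state = model.step(idx, state)
    return "".join(out)
```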

  • So let's see what it does.

  • Based on my pre-trained network, I'm going to produce 5000 characters of random text, and let's see what it looks like. So it'll just boot up for a minute, and then it will run pretty quickly once it gets going.

  • There we go so you can see we've got comments.

  • We've got replies.

  • At first glance it looks plausible; when you actually read into the comments, they're bizarre.

  • Let's pick a couple and look at them. So this guy, "blocking flop": now, that might be a real person, but it's probably just a completely made-up username. "I was no pretty puzzling the code gaming so they would have found out something tread the largest stop information killed."

  • This guy, "Hessel to hundreds", replied: "I find your own difference profound. Certainly hards use useful, usually a and so", full stop.

  • It's got the punctuation correct.

  • It's finishing sentences with full stops.

  • Let's try again.

  • It was a long one.

  • There we go, so some of them are very long: "new oranges who I match it humans, but for cooling of the most complex or magnetic consistent", brackets, "like a computer to expect founding creating organizations".

  • The interesting thing about this is that sometimes it'll say, "gets looking at YouTube mind to let you stand out". Now, the sentence doesn't really make sense, but the words are actually spelt correctly.

  • And "YouTube", for example, has a capital Y and a capital T, which is quite clever.

  • It's learned that usually, when you see "YouTube", there's a capital Y and a capital T. But we're not picking the most likely letter every time, we're picking one of the plausible possibilities, so sometimes it's going to pick the wrong one.

  • That's because we're doing it with sampling.

  • So it might say: we've seen Y, O, U, T, U, B; there's a 99% chance of an E, but there's also a 1% chance of a space, and so it'll put in a space that 1% of the time.

  • But sometimes you're going to see typos basically using this approach.

  • Another interesting thing is that the replies sometimes specify the target of the reply with this plus sign, plus another username, also probably made up. This username, "IceMetalPunk", appears quite a lot in my output.

  • Now, I actually think that that's a real username.

  • So if you're watching, you know, congrats.

  • You've posted so many times on our comments that my network has learned your name and is now just generating your comments for you.

  • This is quite fun, it is.

  • I mean, I'm quite impressed by this.

  • People might be thinking, well, it's just generating nonsense, but you've got to consider this is quite a small neural network, looking at the random stuff in Computerphile comments, bearing in mind there are lots of different types of people commenting on lots of different video content.

  • And it's learned a kind of broad structure for a comment system.

  • It's got comments.

  • It's got replies.

  • It's tagging people in replies.

  • It's making up usernames.

  • It's learned some people's usernames.

  • I mean, it's surprisingly impressive. In the blog post that I linked to, there are some other really impressive examples of the sorts of things these things can do.

  • This has got uses in things like natural language processing, but convolutional versions of these things are starting to see use in image processing as well.

  • Can we, for example, start to do things over video sequences or over three-dimensional data?

  • Things like this.

  • So LSTMs and recurrent neural networks are going to see a lot of use going forward, because they can do some really impressive things over time.

  • Oh, look, so that's interesting.

  • Has it gone wrong?

  • Yeah, people tend to put numbers in a row.

  • Let's say when they're typing out, you know, big numbers or something like that.

  • So when it starts with numbers, it tends to just get stuck in a loop.

  • Just producing more numbers. It doesn't happen very often, but it does happen.

  • This is not infallible, as you might have noticed from the grammar.

  • Obviously, this is a bit of fun, but could this be used as a chatbot or something like that?

  • I mean, theoretically you could, yes. The text that I've trained this on is far too broad, in some sense, to make a reasonable chatbot, because clearly none of it makes any sense.

  • I mean, you might get the occasional sentence that kind of makes sense.

  • It seems a bit like broken English, but it's not that useful.

  • If you targeted this, though, at, let's say, someone's own emails, it would probably start to generate quite plausible emails of theirs, or suggest words.

  • So, for example, for autocomplete purposes. Usually, what we would be using this kind of thing for is not necessarily generating text, but making some decision based on what it's seen.

  • Because it can chain together all of these kinds of sentences, you could think of it as starting to look at least 50 or 100 characters into the past.

  • We can start to maybe do sentiment analysis, or do predictions of the kind of sentence or the content of a sentence.

  • Um, these are the perhaps more practical purposes of this; apart from that, it's just a bit of fun.

  • I like this one: "great video".

  • See, now we're getting somewhere.

  • "Computerphile sounds more helpful than", than what? It doesn't go into detail. "He's working with the phone customization."

  • I mean, we know we're not ready to take over the world with this network yet, but, you know, one step at a time. People complain about my fingerprints on the monitor when I have it off, but I haven't cleaned it yet because I'm lazy, right?

  • I thought by now, with all these views on YouTube, you'd have someone to clean it for you.

  • And yet no one appears. Unbelievable.

  • Oh, there's a reply from Computerphile. Where was that, then?

  • About five lines.

  • Oh, and a reply from Professor Dave B as well.
