Name: Unicorn AI - Computerphile
Uploaded: 2021-01-14T10:34:32.000Z
Duration: 11 min 57 s

In the previous video we were talking about

transformers, this architecture that uses attention to give

unprecedentedly good performance on, sort of, language modeling tasks and some other tasks as well.

But we were looking at language modeling, and that was in preparation to make a video about

GPT-2, which is this very giant language model that was recently,

well, it was recently not released, actually, by OpenAI. The way that they generated the data set for this is pretty cool.

To get enough text they went to Reddit and

they pulled every website that is linked to from Reddit. Do we have any idea of how many that is? Lots.

Literally everything, well, everything that had more than three karma,

I think, or maybe more than two karma, something like that.

Anything that somebody had thought to post on Reddit and that at least two or three people had thought was good enough to upvote.
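The selection rule described above can be sketched as a simple filter. This is only an illustration: the `Submission` type, the `select_urls` helper and the example URLs are hypothetical, and the exact threshold is left as a parameter because the speaker is unsure whether it was two or three karma.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """A hypothetical record for one Reddit link post."""
    url: str
    karma: int


def select_urls(submissions, min_karma=3):
    """Return de-duplicated URLs whose post reached at least min_karma.

    min_karma=3 is an assumed default; the transcript only says
    "more than three, or maybe more than two".
    """
    seen = set()
    kept = []
    for post in submissions:
        if post.karma >= min_karma and post.url not in seen:
            seen.add(post.url)
            kept.append(post.url)
    return kept


posts = [
    Submission("https://example.com/a", 5),
    Submission("https://example.com/b", 1),
    Submission("https://example.com/a", 7),  # duplicate link, kept once
    Submission("https://example.com/c", 3),
]
print(select_urls(posts))
```

Only the pages behind the surviving URLs would then be scraped for text; the scraping step itself is not shown here.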

They scraped the text from that. It's pretty much just a transformer. The

architecture is not especially novel. They haven't done any, like, amazing new

the more data you give them the better they do and the bigger you make them the better they do and

everything that we've built up until this point is clearly not,

like, we haven't hit the limits of what this can do.

We're bottlenecked on data and maybe network size.

So what happens if we turn that up to 11? What happens if we just give this all

the data and make a really big one? It makes sense to talk about the acronym, right? So it's a generative pre-trained

transformer. So generative, same as generative adversarial network: it generates outputs, it generates samples.

Pre-trained is this thing I was talking about: all of the different things

you can use a language model for, right? You can do translation. You can try and resolve ambiguities.

You can do summarization. You can answer questions. You can use the probabilities for augmenting other systems.

So yeah, there's a bunch of different benchmarks for these different tasks

that you might want your language model to do and

This is what we talked about in the grid worlds video of having these like standardized problems with standardized metrics and standardized data sets

So that if you're comparing two different methods, you know that you're actually comparing apples to apples

And this is, like, very important: it gives you numbers on these things. It's often quite difficult,

especially when you're generating samples of text and it's like, how plausible is this text? How realistic does it look?

How do you put a number on that? It's kind of difficult. So there's all of these standardized metrics, and

people came to realize, and actually, I mean, I say that as though it's some amazing discovery,

it's fairly obvious: if you train your system in an unsupervised way on a large corpus of just general English text and

then train that with the data from this benchmark or the data from that benchmark,

you can fine-tune it. So you start with something which has a decent

understanding of how English works, more or less, and then you say, now I'm going to give you these

samples for, like, question answering, or I'm going to build a system using that to go for this benchmark.

So it's pre-trained: you start with something that's like a general-purpose language model, and then fine-tuning from that to the

actual benchmark or problem you're trying to solve

can give you better performance than starting from nothing and training on each of the benchmarks from scratch.

The point of the GPT-2 paper, the thing that makes it cool, is they said, okay, what if we make a really huge one?

What if we just make a giant model and then just try and run it on the benchmarks without messing with it,

without showing it any of the specialized data for that benchmark? Just the raw

general-purpose language model: how does that perform? And it turns out

Actually doesn't sound like very much, but for text that's insane, right? It's

Google's entire index of the Internet in 98

and they trained it on that and they ended up with a

1.5 billion parameter model, whereas a previous state-of-the-art system was 345 million.

So they've just made the thing much, much bigger, and it performs really well. Some of the samples that they published are quite

you could say. And now that we've talked a little about the problems that

neural networks, or any language model really, have,

we can now realise just how impressive these samples are, because when you look at them, you know,

if you look at them uninitiated, you're like, yeah, that's pretty realistic.

It seems to make sense and it's cool. But when you look at it knowing how language models work, it's

very impressive: the coherence and the

consistency and the long-range dependencies. So we can look at this one that got everybody's attention, the unicorns one.

So they prompted it with: "In a shocking finding, scientists discovered a herd of unicorns

living in a remote, previously unexplored valley in the Andes Mountains.

Even more surprising to the researchers was the fact that the unicorns spoke perfect English."

And from there you go to your language model, GPT-2, and you say: given that we started with this,

what's the next word, and what's the word after that, and so on?
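That "what's the next word, and the word after that" loop can be sketched with a toy model. The assumptions are loud ones: this stand-in counts word pairs in a tiny made-up corpus and always picks the most frequent follower, whereas GPT-2 is a transformer that scores every token in its vocabulary given the whole preceding text. The function names `train_bigram` and `generate` are hypothetical; only the shape of the loop carries over.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_words=5):
    """Extend the prompt one word at a time.

    Each step conditions only on the last word (a toy simplification)
    and appends its most frequent follower, stopping early if the
    last word was never seen with a follower.
    """
    words = prompt.split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Tiny made-up corpus, standing in for the scraped web text.
corpus = "the unicorns spoke perfect english and the unicorns lived in the andes"
model = train_bigram(corpus)
print(generate(model, "the unicorns spoke", max_new_words=2))
```

The generated words are fed back in as context for the next step, which is the same feedback loop used when a real language model continues a prompt.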

So it goes on: "The scientist named the population, after their distinctive horn, Ovid's Unicorn.

These four-horned, silver-white unicorns were previously unknown to science."

We do have a clue here, as a human being: unicorns being four-horned doesn't quite make sense.

"Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr.

Jorge, an evolutionary biologist from the University of La Paz"

This is impressive because we've mentioned the Andes Mountains in our prompt, and so now it's saying, okay,

this is clearly, you know, "in a shocking finding": this is a science press release news article.

It's seen enough of those because it has every single one that was ever linked to from Reddit, right?

So it knows how these go. It knows, okay, third paragraph:

this is when we talk about the scientist, we interview the scientist, right? Okay.

First word of the scientist paragraph: "Dr.", obviously, right? Because now we're in the name of the scientist,

conditioning on the fact that we have the Andes Mountains.

So we need to get where we're in South America