Name: AI語言模型與變形金剛 - Computerphile (AI Language Models & Transformers - Computerphile)
Uploaded: 2021-01-14T10:13:34.000Z
Duration: 20 min 40 s
Description: 【看影片學英語】數萬部 YouTube 影片，搭配英漢字典即點即查，輕鬆掌握單字發音與用法，長久累積看電影不必再看字幕。

過去簡單式

this very powerful language model from open AI and I thought it would make sense to start by just doing a video about

transformers and language models in general because

Language model implemented as a transformer, but you have a previous video about generating YouTube comments, which is the same kind of task, right?

That's a language modeling task from language processing to generate new samples for cooling of the most complex or magnetic

Consistent brackets like a computer to expect found in creating organizations

I believe that video was made October 2017 and this paper came out December 2017, which has kind of

Revolutionized the way that people carry out that kind of task. That's not the GPT -  2 that's something before that, right?

That's the transformer, which is a new realm. Yeah relatively new

for neural networks, that can do actually all kinds of tasks, but they're especially good at this kind of

a language model is a probability distribution over like sequences of

Tokens or symbols or words or whatever in a language?

So for any given like sequence of tokens, it can tell you how likely that is

So if you have a good language model of English

It can look at a sequence of you know words or characters or whatever and say how likely that is to occur in English

How likely that is to be an English phrase or sentence or whatever

And when you have that you can use that for a lot of different tasks. So

If you want to generate text, then you can you can just sort of sample from that distribution and keep giving it

so you you sample a word and then you say

And to be clear sampling from a distribution means you're just taking

Your you're sort of rolling the dice on that probability distribution and taking whichever one comes out. So

And then say okay conditioning on that given that the first word of this sentence is V

What does the probability distribution look like for the second word?

And then you sample from that distribution and then it's you know

with a cat and you say given that it's the cat what's likely to come next and so on so you can you can build

Your distribution that's one of the things you could use it for

most of us kind of have an example of this sort of in our pockets of

Its actual absolutely right and that's like that's the that's the way that most people interact with a language model

I guess this is how I often start a sentence

apparently with I I am not sure if you have any questions or concerns, please visit the

Plugin settings so I can do it for the first time in the future of that's no good

Here's a different option. Let's just see what this way. Maybe the same

But I can't find it on the phone screen from the phone screen on the phone screen on the phone screen on the phone screen

On the phone screen. I don't actually know how this is implemented

it might be a neural network, but my guess is that it's some kind of

like Markov model Markov chain type setup where you just

for each word in your language you look at your data set and you see how often a particular

Following that word and then that's how you build your distribution

So like for the word "I" the most common word to follow that is "am" and there are a few others, you know

This sentence on the phone screen on the phone screen on the phone screen on the phone screen on the phone screen

This is the super low probability sentence where I would somebody type this and the thing is it's like myopic

It's only I'm not sure I even it's probably only looking at the previous word

It might be looking at like the previous two words, but the problem is to look back. It becomes extremely expensive

Like you've got I don't know 50,000 words that you might be looking at and so then it so you're you're you're remembering

2, that's 50,000 squared right and if you want to go back three words

You have to cube it. So you like raising it to the power of the number of words back you want to go which is

Basically doesn't look back by the time we're saying on the it's already forgotten the previous time

It said on the it doesn't realize that it's repeating itself and there are slightly better things you can do in this general area

But like fundamentally if you don't remember you're not going to be able to make good sentences

If you can't remember the beginning of the sentence by the time you're at the end of it, right?

One of the big areas of progress in language models is handling long term dependencies

I mean handling dependencies of any kind but especially long term dependencies

You've got a sentence that's like Shawn came to the hack space to record a video and I talked to

Blank right in that situation if your model is good

you're expecting like a pronoun probably so it's it's she they

You know them whatever and but the relevant piece of information is the words short

Which is like all the way at the beginning of the sentence

so your model needs to be able to say oh, okay, you know Shawn that's

Usually associated with male pronouns, so we'll put the male pronoun in there. And if your model doesn't have that ability to look back

Or to just remember what it's just said then

It's just a slight like it might make a guess

just a random guess at a pronoun and might get it wrong or it might just

Frank, you know just like introduced a new name because it's guessing at what's likely to come there and it's completely forgotten that sure was

Ever like a thing. So yeah, these kind of dependencies are a big issue with things that you would want to language model to do

Language models for generating text in this way, but you can also use them for all kinds of different things. So like

people use language models for translation

Obviously you have some input sequence that's like in English and you want to output a sequence in French or something like that

Having a good language model is really important so that you end up with something. That makes sense

Summarization is a task that people often want

Where you read in a long piece of text and then you generate a short piece of text. That's like a summary of that

that's the kind of thing that you would use a language model for or

reading a piece of text and then answering questions about that text or

If you want to write like a chatbot that's going to converse with people having a language model as good like basically almost all

right is it's useful to have this the other thing is

字幕列表影片播放

AI語言模型與變形金剛 - Computerphile (AI Language Models & Transformers - Computerphile)

stuff

sort

relevant

entire