LAURENCE MORONEY: Hi, and welcome
to this episode in "Natural Language Processing, Zero
to Hero" with TensorFlow.
In the previous videos in this series,
you saw how to tokenize text and how to use sequences of tokens
to train a neural network.
In particular, you saw how to create a neural network that
classified text by sentiment.
And in this case, you trained a classifier
on sarcastic headlines.
But the next step I'm often asked when it comes to text
is, what about generating text?
Can a neural network create text based on the corpus
that it's trained on, and can we get an AI to write poetry?
Well, the answer to this is yes.
And over the next few videos, I'll
show you a simple example on how you can achieve this.
Before we can do that, though, an important concept
that you'll need to understand is recurrent neural networks.
This type of neural network takes the sequence of data
into account when it's learning.
So for example, in the case of a classifier for text
that we just saw, the order in which
the words appear in the sentence doesn't really matter.
What determined the sentiment was the vector that
resulted from adding up all of the individual vectors
for the individual words.
The direction of that vector roughly gave us the sentiment.
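As a rough sketch, and not the exact model from the previous video, an order-insensitive classifier like that could look something like this in Keras. The vocabulary size, embedding size, and layer sizes are assumed for illustration:

import tensorflow as tf

vocab_size = 10000     # assumed vocabulary size
embedding_dim = 16     # assumed embedding size

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # Averaging the word vectors throws away word order entirely.
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')   # sentiment score
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])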
But if we're going to generate text, the order does matter.
For example, consider this sentence.
"Today the weather is gorgeous, and I see a beautiful blue"--
something.
If you were trying to predict the next word--
and the concept of creating text really
boils down to predicting the next word--
you'd probably say, "sky," because that
comes after "beautiful" and "blue,"
and the context is the weather, which
we saw earlier in the sentence.
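To make that concrete, here is an illustrative sketch, using the Tokenizer from the earlier videos, of how a sentence can be split into "input sequence, next word" training pairs. The sentence and the loop are just for illustration:

from tensorflow.keras.preprocessing.text import Tokenizer

sentence = "Today the weather is gorgeous and I see a beautiful blue sky"

tokenizer = Tokenizer()
tokenizer.fit_on_texts([sentence])
tokens = tokenizer.texts_to_sequences([sentence])[0]

# Each prefix of the sentence becomes an input, and the word that
# follows it becomes the label we want the network to predict.
for i in range(1, len(tokens)):
    input_sequence, next_word = tokens[:i], tokens[i]
    print(input_sequence, "->", next_word)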
So how do we fit this to neural networks?
Let's take a look at what's involved
in changing from sequence-less data to sequential data.
Neural networks for classification or regression
tend to look like this.
It's kind of like a function that you
feed in data and labels, and it infers the rules that
fit the data to the labels.
But you could also express it like this.
The rules are a function of the data and the labels: rules = f(data, labels).
But there's no sequence inherent in this.
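As a minimal sketch of that idea, the fit step is where the rules get inferred from the data and the labels. The random data, shapes, and layer sizes here are assumed purely for illustration:

import numpy as np
import tensorflow as tf

data = np.random.random((100, 4))           # 100 examples, 4 features each
labels = np.random.randint(0, 2, (100,))    # one binary label per example

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam')

# fit() is the f(data, labels) step: the inferred "rules" end up in the weights.
model.fit(data, labels, epochs=5)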
So let's take a look at some numeric sequences
and explore the anatomy of them.
And here's a very famous one called the Fibonacci sequence.
To describe the rules that make this sequence,
let's describe the numbers using a variable.
So for example, we can say n0 for the first number, n1
for the next, and so on.
And the rule that then defines the sequence
is that any number in the sequence
is the sum of the two numbers before it.
So if we start with 1 and 2, the next number
is 1 plus 2, which is 3.
The next number is 5, which is 2 plus 3, and so on.
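In code, that rule is just a couple of lines; the starting values of 1 and 2 match the example above:

def fibonacci(count):
    sequence = [1, 2]
    while len(sequence) < count:
        # Any number in the sequence is the sum of the two numbers before it.
        sequence.append(sequence[-1] + sequence[-2])
    return sequence

print(fibonacci(8))   # [1, 2, 3, 5, 8, 13, 21, 34]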
We could also try to visualize it like this on a computation
graph.
If the function is plus, we feed in 1 and 2 to get 3.
We also pass this answer and the second parameter,
which in this case was 2, onto the next computation.
This gives us 2 plus 3, which is 5.
This gets fed into the next computation
along with the second parameter, so 5 and 3 are added to get 8,
and so on.
So every number is in essence contextualized
into every other number.
We started with 1, and added it to 2 to get 3.
The 1 and the 3 still exist.
And when added to 2 again, we get 5.
That 1 still continues to exist throughout the series.
Thus, a numeric value can recur throughout the life
of the series.
And this is the basis of the concept
of a recurrent neural network.
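To make that carry-forward pattern concrete, here is a small illustrative sketch of the computation graph above: each step adds its two inputs, then passes the second input and the answer on to the next step, the same way a recurrent neuron threads a value through a sequence.

def step(first, second):
    answer = first + second
    # Pass the second parameter and this answer on to the next computation.
    return answer, (second, answer)

inputs = (1, 2)
for _ in range(6):
    answer, inputs = step(*inputs)
    print(answer)   # 3, 5, 8, 13, 21, 34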
Let's take a look at this type of network
in a little more detail.
Typically, a recurrent neuron is drawn like this.
There's a function that takes an input value
and produces an output value.
In addition to the output, it also
produces another feed-forward value that
gets passed to the next neuron.
So a bunch of them together can look like this.
And reading from left to right, we can feed x0 into the neuron,
and it calculates a result, y0, as well as a value that
gets passed to the next neuron.
That gets x1 along with the fed-forward value
from the previous neuron and calculates y1.
And its output is combined with x2
to get y2 and a feed-forward value to the next neuron,
and so on.
Thus, the sequence is encoded into the outputs, a little bit
like the Fibonacci sequence.
This recurrence of data gives us the name
recurrent neural networks.
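In Keras, that chain of recurrent neurons is available as a SimpleRNN layer. The sequence length, feature size, and unit count here are assumed example values:

import tensorflow as tf

timesteps, features = 10, 8   # assumed example values

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, features)),
    # Each timestep's output depends on the current input x_t and the
    # value fed forward from the previous timestep.
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(1)
])
model.summary()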
So that's all very well.
But you may have spotted a little catch
in how this could work with natural language processing.
A simple RNN like the one that I've just shown
is a bit like the Fibonacci sequence
in that the signal from an earlier value can be very strong close by,
but it weakens as the context spreads.
The number at position 1 has very little impact
on the number at position 100, for example.
It's there, but it's tiny.
And that could be useful for predicting text
where the signal that determines the next word
is close by, for example, the beautiful blue something
that we mentioned earlier.
It's easy for us to see that "sky" is the next word.
But what about a sentence like this?
"I lived in Ireland, so they taught me how to speak"--
something.
Now, you might think it's "Irish,"
but the correct answer is "Gaelic."
But think about how you predicted that word.
The key word that dictated it was much further back
in the sentence, and it's the word "Ireland."
If we were only predicting based on the words that
are close to the desired one, we'd miss that completely,
and we'd get a bad prediction.
The key there is to go beyond the very short-term memory
of a simple recurrent neural network
with a longer short-term memory,
using a network type not surprisingly
called long short-term memory, or LSTM.
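As a preview only, and not the exact model from the next video, swapping the simple recurrent layer for an LSTM layer looks something like this; the vocabulary size, embedding size, and unit count are assumed:

import tensorflow as tf

vocab_size, embedding_dim = 10000, 64   # assumed example values

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.LSTM(64),                                 # long short-term memory
    tf.keras.layers.Dense(vocab_size, activation='softmax')   # predict the next word
])
model.compile(loss='categorical_crossentropy', optimizer='adam')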
You'll see that in the next video,
so don't forget to hit that Subscribe
button for more great episodes of "Coding TensorFlow at Home."
[MUSIC PLAYING]