Learning to read with TensorFlow and Keras (TF Dev Summit '20)

  • [MUSIC PLAYING]

  • PAIGE BAILEY: Hi, my name is Paige Bailey.

  • And I am here today to deliver the talk of Karmel

  • Allison and Yash Katariya, who unfortunately couldn't be here

  • today.

  • But for the purposes of this talk,

  • just imagine that I am Karmel.

  • So to get started--

  • my son is in kindergarten.

  • And he's just starting to learn to read.

  • And one of my favorite parts of this process

  • is that even before he could reliably decode words,

  • he started to produce them.

  • And, of course, doing what I do, I

  • can't help but be amazed at how similar his learning

  • process is to neural nets.

  • The rule set for spelling and grammar

  • is so complex in English that we rarely present

  • a set of instructions that he can follow.

  • Instead, he starts almost randomly with letters

  • and sounds.

  • And then he gets feedback from me

  • and from other readers in his life which he incorporates.

  • Some things he memorizes.

  • Some are lucky guesses.

  • And somehow over months of learning,

  • he started to form a consistently interpretable

  • mental model of written words.

  • And he produces mostly understandable text.

  • So here you can see, "I was eating breakfast

  • with my cousins and my sister."

  • I've been particularly fascinated by my son's learning

  • here, because it happens to align with what many of us

  • have started to call an NLP revolution.

  • We have enough data and enough tools

  • that we've started to rapidly push the cutting

  • edge in natural language understanding and related

  • tasks.

  • I've heard it said that, with text today,

  • we're at a Cambrian explosion, like when

  • ResNet was first published.

  • We're at the beginning of this faster-than-expected

  • progression of language models.

  • So we're going to take advantage of all of this research

  • and tooling.

  • And we're going to teach a simple neural net to generate

  • the next word of a phrase.

  • And we'll use this to tell a robot children's story.

  • So the first thing we need is data.

  • And we're going to take advantage

  • of the children's book test corpus released by Facebook.

  • This data set is a set of Creative Commons

  • children's books that have been converted into a series

  • of passages and fill-in-the-blank style

  • questions about each passage.

  • For this model, we just need the raw book text, which

  • looks like what you see here.

  • As an aside, these books are out of copyright,

  • which means they're often old.

  • And that means the corpus is full of literature

  • that's problematic by today's standards.

  • I'm going to gloss over that for the sake of this talk.

  • But if you go and actually use this data set,

  • please do consider what cultural norms

  • you might be teaching your machine learning models.

  • So first, we're going to load the data.

  • Now, that we have a corpus, we can

  • load into our Python interpreter and start playing with it.

  • Here, we use the TextLineDataset to load the data.

  • And we can simply print out a few lines to see what we have.
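
As a minimal sketch (assuming the raw book text has been downloaded to a local file; the path here is hypothetical), the loading step might look like this:

```python
import tensorflow as tf

# Each element of this dataset is one line of raw book text.
lines = tf.data.TextLineDataset("cbt_train.txt")

# Peek at a few lines to see what cleaning is needed.
for line in lines.take(3):
    print(line.numpy().decode("utf-8"))
```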

  • So tell me what you notice here.

  • It looks like we have some cleaning work to do.

  • So using our data set much like we would use a NumPy array

  • or Pandas data frame, we can filter

  • and map transformations across it.

  • Here, I'm dropping those pesky book titles

  • and filtering out punctuation within each row.

  • And after that transforming, I can print out a few lines

  • to check and make sure the data looks as expected.
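
A rough sketch of that cleanup, using the dataset's filter and map methods; the exact patterns used in the talk aren't shown, so these regexes are illustrative assumptions:

```python
def not_title(line):
    # Drop header lines such as "_BOOK_TITLE_ : ..." (pattern assumed).
    return tf.logical_not(tf.strings.regex_full_match(line, r"_BOOK_TITLE_.*"))

def strip_punctuation(line):
    # Remove anything that isn't a word character or whitespace.
    return tf.strings.regex_replace(line, r"[^\w\s]", "")

cleaned = lines.filter(not_title).map(strip_punctuation)

# Check that the data looks as expected after the transformations.
for line in cleaned.take(3):
    print(line.numpy().decode("utf-8"))
```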

  • Now, I have a data set where each row is

  • a sentence of arbitrary length.

  • And I've decided I want to train a windowed model to predict

  • a new word, given some words to start.

  • So I need to take these data set rows and make

  • them all equal length.

  • Once again, without leaving TensorFlow data sets,

  • I can split, flatten, and regroup my rows

  • so that each row is a set of 11 words.

  • But I want that last word in each row

  • to be a separate label.

  • So I just define a simple row-wise function

  • to pop out my labels.

  • And voila, we have pairs of examples and labels.
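
One way to express that split/flatten/regroup pipeline plus the label-popping function, staying entirely inside tf.data; rejoining the ten input words into a single string is an illustrative assumption that keeps the later vectorization simple:

```python
# Split each sentence into words and flatten into one long stream of words.
words = cleaned.flat_map(
    lambda line: tf.data.Dataset.from_tensor_slices(tf.strings.split(line)))

# Regroup into rows of exactly 11 words.
windows = words.batch(11, drop_remainder=True)

def pop_label(window):
    # The first 10 words become the example; the 11th word becomes the label.
    return tf.strings.reduce_join(window[:-1], separator=" "), window[-1]

examples = windows.map(pop_label)

for x, y in examples.take(1):
    print(x.numpy(), "->", y.numpy())
```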

  • But there's still a problem with this data.

  • Why can't I just feed this into a dense network

  • or any arbitrary model?

  • Well, it's words.

  • Machine learning models don't speak English.

  • And, of course, the problem with this data set

  • is that we need numbers--

  • only numbers, not lines, just numbers.

  • So we're going to need to transform our input sentences

  • into numeric representations.

  • And the way that we do this is with preprocessing

  • layers, which are highly exciting

  • and a new addition to TensorFlow 2.

  • So preprocessing layers were recently

  • reviewed as part of the Keras API specification.

  • And they are a set of layers that

  • take care of data transformations

  • that are outside the training path.

  • These follow the same APIs as normal layers

  • and can be called and composed just like layers.

  • They play nicely with TensorFlow data sets as well.

  • So you can parallelize preprocessing transformations.

  • And importantly, just like normal Keras layers,

  • these become part of your model and therefore get

  • serialized in the SavedModel and become part of inference

  • and prediction, which is critical to minimizing

  • training/serving skew.

  • This is the set of preprocessing layers

  • that is already complete.

  • They are experimental in 2.2, as we ensure that the APIs match

  • how you use them.

  • So please check them out for all of your preprocessing needs.
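
For example, the TextVectorization preprocessing layer (experimental in TF 2.2) covers the string-to-index transformation needed here; the vocabulary size and sequence length below are assumptions:

```python
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

vectorize = TextVectorization(max_tokens=5000, output_sequence_length=10)

# adapt() builds the vocabulary from the example text (labels excluded).
vectorize.adapt(examples.map(lambda x, y: x).batch(256))

# Called like any other Keras layer, it maps strings to integer indices.
print(vectorize(tf.constant(["i was eating breakfast with my cousins"])))
```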

  • So now we know how to get our data into the correct format.

  • And the next step is to build a language model that

  • can learn a representation of all of these words

  • so that we can generate text on the fly.

  • One of the classic models used for text translation

  • and generation is a sequence-to-sequence model.

  • A sequence-to-sequence model typically has two parts--

  • an encoder that uses RNN blocks to encode

  • input data and a decoder that borrows state from the encoder

  • in order to correctly predict the target outputs.

  • Now, sequence-to-sequence models have some complicated

  • moving parts.

  • It's not a simple feed-forward network,

  • and there are a lot of parameters to keep track of.

  • But luckily, we don't have to go it alone here.

  • TensorFlow AddOns is a community-maintained repository

  • built on top of TensorFlow 2 that

  • provides especially complex or experimental layers,

  • optimizers, and other utilities.

  • And one such utility is the seq2seq package,

  • which provides a number of layers and classes

  • that make building sequence-to-sequence models

  • much easier.

  • And you'll see me use these throughout this example.

  • Because the architecture of the sequence-to-sequence model

  • is fairly complex and requires special state passing,

  • we're going to subclass the Keras base model

  • and build our network explicitly.

  • We start here with the TextVectorization layer

  • we've already discussed, since that's

  • what we're going to feed our input data

  • through to convert it to indices.

  • After we vectorize the inputs, we're

  • going to pass them through the encoder blocks--

  • first an embedding, then an LSTM.

  • And you'll note that in our init function here,

  • we're just configuring the layers.

  • And we're not actually passing any data through them yet.

  • In Keras, each layer has two stages--

  • the construction, when you parametrize

  • your layer, as seen here, and the calling

  • of the layer, which will come later

  • when we pass our data through.

  • The decoder is somewhat more complicated.

  • But here, we can leverage the TensorFlow AddOns sampler

  • and decoder and set up the decoder LSTM

  • and connect it to a projection layer, which

  • is our final dense layer that maps to the vocabulary

  • we want to predict, plus two tokens for padding

  • and out-of-vocabulary words.

  • The final set of layers we will set up

  • are a pair of attention layers.

  • There is a large and growing field of attention research

  • in machine learning.

  • But given the time constraints of this talk,

  • I'm only going to give you a very hand wavy explanation

  • and say, attention is a technique

  • that allows the model to track intermediate states as it

  • steps along sequences.

  • And it will allow the model to give more weight

  • to certain time steps of a sequence

  • when predicting the final word.

  • Here, we use a simple dense attention

  • layer that comes with tf.keras.

  • And you'll see how we connect this between our encoder

  • and decoder in a minute.
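
To make the structure concrete, here is a minimal sketch of what such a constructor could look like. The talk builds the decoder with the TensorFlow AddOns sampler and decoder classes; to stay short and self-contained, this simplified sketch uses plain tf.keras layers instead, and all sizes are illustrative placeholders rather than the values used in the talk.

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM = 5000, 256  # illustrative sizes, not the talk's values

class EncoderDecoder(tf.keras.Model):
    def __init__(self, vectorize_layer, rnn_units=512):
        super().__init__()
        self.vectorize = vectorize_layer  # strings -> integer indices

        # Encoder: an embedding followed by an LSTM that also returns its state.
        self.encoder_embedding = tf.keras.layers.Embedding(VOCAB_SIZE + 2, EMBED_DIM)
        self.encoder_lstm = tf.keras.layers.LSTM(
            rnn_units, return_sequences=True, return_state=True)

        # Decoder: its own embedding and LSTM, plus a projection (dense) layer
        # onto the vocabulary, +2 for the padding and out-of-vocabulary tokens.
        self.decoder_embedding = tf.keras.layers.Embedding(VOCAB_SIZE + 2, EMBED_DIM)
        self.decoder_lstm = tf.keras.layers.LSTM(rnn_units, return_sequences=True)
        self.projection = tf.keras.layers.Dense(VOCAB_SIZE + 2)

        # Simple dot-product attention layer that ships with tf.keras.
        self.attention = tf.keras.layers.Attention()
```

Note that nothing is called here; the layers are only configured, matching the two-stage construct-then-call pattern described above.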

  • So we have a lot of state to pass between our encoder

  • and decoder, which means we can't just use

  • the standard Keras fit call.

  • And one of the things I'm most excited about in TensorFlow 2

  • is that we have refactored and cleaned

  • up the Keras training loop.

  • And we've made it much more modular.

  • So instead of overriding the entire fit loop

  • or throwing it out altogether, I'm

  • just going to define a single forward pass

  • for my model in this special function,

  • train_step, and that is going to get called

  • by model.fit with each step.

  • So we override train_step in our encoder-decoder model.

  • That train_step is going to get one batch of data at a time,

  • so we just need to define the forward pass for that one

  • batch.

  • And the first thing we do here is unpack our data

  • and separate the example from the label.

  • You'll note that we call our own vectorization layer here

  • to ensure that our input strings get correctly

  • transformed to indices.

  • Next, still inside our single training step,

  • we're going to record our forward pass

  • under a GradientTape.

  • Anything that needs to be back propped through

  • should go under here.

  • So while the vectorization layer, or preprocessing layer, was

  • outside of the tape, our encoder embedding, LSTM, and so forth

  • all belong under the tape.

  • And here we pass our inputs through the set

  • of layers we defined on our init to encode them.

  • We also set up our attention layers

  • to track the intermediate state coming out

  • of the encoding layers.

  • And next, we decode, which is to say

  • we try to predict our targets using

  • the state from our encoder.

  • The decoder here will, over the many epochs it runs for,

  • train its own weights separately from the encoder weights.

  • And in concert, the encoder and decoder

  • will learn to predict text based on the outputs.

  • And now that we've run all of our layers

  • necessary to form the forward pass, we can compute the loss

  • and collect the outputs of this step

  • so that they can be optimized.

  • The Keras model takes care of collecting variables

  • and setting the optimizer, so we just

  • choose how we want to pass things through here.

  • As the final step, we collect and return

  • the metrics we set in our model.
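
Continuing the hypothetical EncoderDecoder sketch above, an overridden train_step along these lines would follow the pattern just described: vectorize outside the tape, run the encoder, attention, and decoder under the tape, then compute the loss, apply gradients, and return metrics. Details such as how the single-word label is vectorized are assumptions.

```python
    def train_step(self, data):
        inputs, labels = data                  # one batch: (input strings, next word)
        x = self.vectorize(inputs)             # preprocessing stays outside the tape
        y = self.vectorize(labels)[:, :1]      # index of the single target word

        with tf.GradientTape() as tape:
            # Encoder forward pass.
            enc_emb = self.encoder_embedding(x)
            enc_seq, state_h, state_c = self.encoder_lstm(enc_emb)

            # Decoder forward pass, seeded with the encoder's final state, with
            # attention over the encoder's intermediate outputs.
            dec_seq = self.decoder_lstm(
                self.decoder_embedding(y), initial_state=[state_h, state_c])
            context = self.attention([dec_seq, enc_seq])
            logits = self.projection(tf.concat([dec_seq, context], axis=-1))

            loss = self.compiled_loss(y, logits)

        # Back-propagate and update the weights Keras has collected for us.
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, logits)
        return {m.name: m.result() for m in self.metrics}
```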

  • The next step is to pick an optimizer, loss, and accuracy

  • metric for our model.

  • And these are going to govern the actual training

  • and optimization process.

  • And we select them from a bunch of independently parametrizable

  • options that are built into tf.keras, and then we train.

  • It might take a while to converge.

  • So I threw in a ModelCheckpoint callback

  • to make sure I can save my model weights as I go.
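
A sketch of what that compile-and-fit step could look like under the assumptions above; the optimizer, loss, metric, checkpoint path, and epoch count are typical choices, not necessarily the ones used in the talk:

```python
model = EncoderDecoder(vectorize)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# Save weights periodically in case training is interrupted.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "checkpoints/weights_{epoch}", save_weights_only=True)

model.fit(examples.batch(64), epochs=45, callbacks=[checkpoint])
```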

  • We can monitor progress as we train.

  • And our goal is to reach some degree of convergence

  • of the reported accuracy.

  • And with this model, this happens somewhere around 45

  • epochs at about 70% accuracy.

  • And 70% accuracy is pretty good.

  • But we might ask ourselves, can we do better?

  • And the parameters I chose when training

  • were thoroughly arbitrary, copied

  • from somewhere on the internet.

  • Maybe the models should be bigger or smaller.

  • Typically, we spend some time tuning these model parameters

  • to ensure that we have the best results for a given model

  • architecture.

  • We call this process hyperparameter tuning.

  • And notably, easy hyperparameter tuning

  • is one of the most requested features for Keras.

  • And good news, we have a package for that.

  • KerasTuner was released last October.

  • And it works with TensorFlow, Keras, and even scikit-learn.

  • It allows you to create hypermodels

  • that encode tunable parameters, distribute the tuning,

  • and share the model for others to use.

  • So let's take it for a quick spin

  • in order to tune some of our model's parameters.

  • For example, let's say we wanted to tune the number of RNN units

  • in our model.

  • We can import the pip installable KerasTuner package

  • and then define a function that takes a hyperparameter object

  • and uses it to build a compiled model.

  • Inside this function, instead of passing

  • in a fixed int for RNN units, we

  • can use the magic Int selector object,

  • which will allow us to try any integer in this range.

  • There are a number of different selectors you can use here,

  • including floating point numbers and enums.

  • We can then define a tuner algorithm

  • for searching our hyperparameter space

  • and use the tuner to build and fit our model intelligently

  • across that space.
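
As a rough sketch, assuming the hypothetical EncoderDecoder model and vectorize layer from earlier, and with illustrative ranges, trial counts, and directory names, the tuning setup could look like this:

```python
import kerastuner as kt

def build_model(hp):
    # hp.Int lets the tuner try any number of RNN units in this range.
    units = hp.Int("rnn_units", min_value=256, max_value=1024, step=256)
    model = EncoderDecoder(vectorize, rnn_units=units)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model

tuner = kt.RandomSearch(
    build_model,
    objective=kt.Objective("sparse_categorical_accuracy", direction="max"),
    max_trials=4,
    directory="tuning",
    project_name="next_word")

# The tuner calls .fit internally, so our custom train_step is used as-is.
tuner.search(examples.batch(64), epochs=5)
print(tuner.get_best_hyperparameters(1)[0].get("rnn_units"))
```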

  • Because we overrode train_step with our custom functionality,

  • everything works within the Keras ecosystem.

  • And the tuner will be able to call .fit,

  • just like we did to get the correct training behavior.

  • The tuner will run through all of the different hyperparameter

  • combinations you've configured.

  • And it even works with Colab and prints out trial information,

  • as you see here.

  • And after a few tuning sessions, we

  • see that, in fact, the best RNN unit count is 1,024.

  • And rerunning with 1,024 units, we improve our accuracy.

  • And now we have an even better model

  • that gets above 90% accuracy.

  • So now that we have a trained model, of course,

  • our goal was not just to create it,

  • but to actually generate text.

  • So how do we take the model we've built

  • and use it to write a sentence?

  • The first step is to use our model to predict one word

  • given a fixed-length input, which is exactly what we

  • trained the model to do.

  • Just as we did with train_step, we

  • can override predict_step to define the operations

  • on just one batch of data.

  • In our predict_step, we run the inputs

  • through the same encoder and decoder

  • we saw in training, but now with fixed weights.

  • We can also throw in some custom logic here.

  • And we allow the model to predict from the top N

  • choices instead of always the most likely word.

  • And we also convert back to the actual English word

  • rather than just returning the numeric indices.
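
Continuing the same hypothetical sketch, an overridden predict_step might look like the following; the top-N sampling and the index-to-word mapping are simplified stand-ins for the logic described in the talk.

```python
    def predict_step(self, data):
        x = self.vectorize(data)
        enc_emb = self.encoder_embedding(x)
        enc_seq, state_h, state_c = self.encoder_lstm(enc_emb)

        # One decoding step, seeded with the last input position and the
        # encoder state (simplified; ignores padding).
        dec_seq = self.decoder_lstm(
            self.decoder_embedding(x[:, -1:]), initial_state=[state_h, state_c])
        context = self.attention([dec_seq, enc_seq])
        logits = self.projection(tf.concat([dec_seq, context], axis=-1))[:, 0, :]

        # Sample one of the top N most likely words instead of always the argmax.
        top_values, top_indices = tf.math.top_k(logits, k=5)
        choice = tf.random.categorical(top_values, num_samples=1)
        word_ids = tf.gather(top_indices, choice, batch_dims=1)

        # Map indices back to English words using the vectorizer's vocabulary.
        vocab = tf.constant(self.vectorize.get_vocabulary())
        return tf.gather(vocab, tf.minimum(word_ids, tf.size(vocab) - 1))
```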

  • We can try this out.

  • And we see that, indeed, we produce the next predicted word

  • correctly.

  • But, of course, we don't want single words.

  • We want a whole sentence so we can go further and define

  • a custom predict that just takes a single string

  • and then generates one word at a time

  • to continuously append to that starting string

  • and generate a much longer string.
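
A sketch of such a generation helper, assuming the model above; the function name, seed text, and word count are illustrative:

```python
def generate(model, seed, num_words=20):
    text = seed
    for _ in range(num_words):
        # predict_step returns the predicted word as a string tensor.
        next_word = model.predict(tf.constant([text]))[0, 0]
        text += " " + next_word.decode("utf-8")
    return text

print(generate(model, "once upon a time there was"))
```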

  • And lo and behold, we can generate

  • vaguely meaningful statements.

  • It's not perfect.

  • And it's entirely unpunctuated.

  • But it can be a lot of fun.

  • And unlike Karmel's six-year-old,

  • it doesn't lie and happens to like telling funny stories

  • and doesn't get tired.

  • You can see that the output is

  • vaguely human-understandable, even though the model is

  • quite simple and the data is relatively small

  • and constrained.

  • And indeed, the model we just built

  • is the very first baby step of text processing.

  • And it is heavily restricted by having to fit onto slides.

  • But these same tools and techniques

  • are used to build some amazing large-scale models that

  • run at Google scale.

  • So if you're interested in moving

  • from just learning to read to, say, sending emails,

  • and going big with text in TensorFlow 2,

  • check out some of the code that researchers and engineers

  • at Google have released, including

  • the tf.text repository and KerasBert,

  • as well as keras-transformer, which

  • is an example of a truly cutting-edge NLP model.

  • You'll also hear more about TFHub in the next talk,

  • so stay tuned.

  • So to summarize, use TensorFlow data sets and preprocessing

  • layers to transform your inputs.

  • Check out TensorFlow AddOns for special-use layers

  • and utilities, and subclass Keras models for complicated training

  • pipelines.

  • Tune your hyperparameters with KerasTuner.

  • And don't forget to check out the entire ecosystem

  • of NLP tools built on top of TensorFlow.

  • So thank you so much to the authors of the presentation,

  • Karmel and Yash, to the illustrator

  • of the presentation, the artist responsible for all

  • of these drawings, and to all of you who are listening online.

  • Very excited to talk about all of these tf.keras and tf.text

  • improvements.

  • Thank you.

  • [MUSIC PLAYING]
