[MUSIC PLAYING] PAIGE BAILEY: Hi, my name is Paige Bailey. And I am here today to deliver the talk of Karmel Allison and Yash Katariya, who unfortunately couldn't be here today. But for the purposes of this talk, just imagine that I am Karmel.

So to get started-- my son is in kindergarten. And he's just starting to learn to read. And one of my favorite parts of this process is that even before he could reliably decode words, he started to produce them. And, of course, doing what I do, I can't help but be amazed at how similar his learning process is to neural nets. The rule set for spelling and grammar is so complex in English that we rarely present a set of instructions that he can follow. Instead, he starts almost randomly with letters and sounds. And then he gets feedback from me and from other readers in his life, which he incorporates. Some things he memorizes. Some are lucky guesses. And somehow, over months of learning, he started to form a consistently interpretable mental model of written words. And he produces mostly understandable text. So here you can see, "I was eating breakfast with my cousins and my sister."

I've been particularly fascinated by my son's learning here, because it happens to align with what many of us have started to call an NLP revolution. We have enough data and enough tools that we've started to rapidly push the cutting edge in natural language understanding and related tasks. I've heard it said that, with text today, we're at a Cambrian explosion, like when ResNet was first published. We're at the beginning of this faster-than-expected progression of language models. So we're going to take advantage of all of this research and tooling. And we're going to teach a simple neural net to generate the next word of a phrase. And we'll use this to tell a robot children's story.

So the first thing we need is data. And we're going to take advantage of the Children's Book Test corpus released by Facebook. This data set is a set of Creative Commons children's books that have been converted into a series of passages and fill-in-the-blank style questions about each passage. For this model, we just need the raw book text, which looks like what you see here. As an aside, these books are out of copyright, which means they're often old. And that means the corpus is full of literature that's problematic by today's standards. I'm going to gloss over that for the sake of this talk. But if you go and actually use this data set, please do consider what cultural norms you might be teaching your machine learning models.

So first, we're going to load the data. Now that we have a corpus, we can load it into our Python interpreter and start playing with it. Here, we use a TextLineDataset to load the data. And we can simply print out a few lines to see what we have. So tell me what you notice here. It looks like we have some cleaning work to do. So using our data set much like we would use a NumPy array or pandas DataFrame, we can filter and map transformations across it. Here, I'm dropping those pesky book titles and filtering out punctuation within each row. And after those transformations, I can print out a few lines to check and make sure the data looks as expected. Now, I have a data set where each row is a sentence of arbitrary length. And I've decided I want to train a windowed model to predict a new word, given some words to start. So I need to take these data set rows and make them all equal length.
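Before we do that, a minimal sketch of the loading and cleaning just described might look something like the following. It assumes the raw Children's Book Test text lives in a local file called cbt_train.txt and that title rows start with the _BOOK_TITLE_ marker used in the raw corpus files; the punctuation pattern is only a rough placeholder.

import tensorflow as tf

# Hypothetical path to the raw Children's Book Test text.
lines = tf.data.TextLineDataset("cbt_train.txt")

# Drop the book-title rows and strip (most) punctuation from each remaining row.
lines = lines.filter(
    lambda line: tf.logical_not(
        tf.strings.regex_full_match(line, "_BOOK_TITLE_.*")))
lines = lines.map(
    lambda line: tf.strings.regex_replace(line, "[,.;:!?\"`()-]", ""),
    num_parallel_calls=tf.data.experimental.AUTOTUNE)

# Print a few rows to make sure the data looks as expected.
for line in lines.take(3):
    print(line.numpy())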
Once again, without leaving TensorFlow data sets, I can split, flatten, and regroup my rows so that each row is a set of 11 words. But I want that last word in each row to be a separate label. So I just define a simple row-wise function to pop out my labels. And voila, we have pairs of examples and labels. But there's still a problem with this data. Why can't I just feed this into a dense network or any arbitrary model? Well, it's words. Machine learning models don't speak English. And, of course, the problem with this data set is that we need numbers-- only numbers, not strings, just numbers. So we're going to need to transform our input sentences into numeric representations. And the way that we do this is with preprocessing layers, which are highly exciting and a new addition to TensorFlow 2.

So preprocessing layers were recently reviewed as part of the Keras API specification. And they are a set of layers that take care of data transformations that are outside the training path. These follow the same APIs as normal layers and can be called and composed just like layers. They play nicely with TensorFlow data sets as well, so you can parallelize preprocessing transformations. And importantly, just like normal Keras layers, these become part of your model and, therefore, get serialized in the SavedModel and become part of inference and prediction, which is critical to minimizing training-serving skew. This is the set of preprocessing layers that is already complete. They are experimental in 2.2, as we ensure that the APIs match how you use them. So please check them out for all of your preprocessing needs.

So now we know how to get our data into the correct format. And the next step is to build a language model that can learn a representation of all of these words so that we can generate text on the fly. One of the classic models used for text translation and generation is a sequence-to-sequence model. A sequence-to-sequence model typically has two parts-- an encoder that uses RNN blocks to encode input data and a decoder that borrows state from the encoder in order to correctly predict the target outputs. Now, sequence-to-sequence models have some complicated moving parts. It's not a simple feed-forward network, and there are a lot of parameters to keep track of. But luckily, we don't have to go it alone here. TensorFlow Addons is a community-maintained repository built on top of TensorFlow 2 that provides especially complex or experimental layers, optimizers, and other utilities. And one such utility is the seq2seq package, which provides a number of layers and classes that make building sequence-to-sequence models much easier. And you'll see me use these throughout this example.

Because the architecture of the sequence-to-sequence model is fairly complex and requires special state passing, we're going to subclass the Keras base model and build our network explicitly. We start here with the text vectorization layer we've already discussed, since that's what we'll feed our input data through to convert it to indices. After we vectorize the inputs, we're going to pass them through the encoder blocks-- first an embedding, then an LSTM. And you'll note that in our init function here, we're just configuring the layers. We're not actually passing any data through them yet. In Keras, each layer has two stages-- the construction, when you parametrize your layer, as seen here, and the calling of the layer, which will come later when we pass our data through.
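To make that construction-versus-call distinction concrete, here is a small, hypothetical sketch using the encoder-side layers. The vocabulary size, sequence length, and unit counts are arbitrary placeholders rather than the values used in the talk.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

# Construction stage: the layers are only parametrized here; no data flows yet.
vectorize = TextVectorization(max_tokens=5000, output_sequence_length=10)
embedding = tf.keras.layers.Embedding(input_dim=5000, output_dim=128)
encoder_lstm = tf.keras.layers.LSTM(512, return_state=True)

# Calling stage: data actually flows through when the layers are called.
vectorize.adapt(np.array(["i was eating breakfast with my cousins",
                          "once upon a time there was a princess"]))
tokens = vectorize(tf.constant(["once upon a time there was a cat"]))
output, state_h, state_c = encoder_lstm(embedding(tokens))
print(output.shape, state_h.shape, state_c.shape)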
The decoder is somewhat more complicated. But here, we can leverage the TensorFlow Addons sampler and decoder, set up the decoder LSTM, and connect it to a projection layer, which is our final dense layer that maps to the vocabulary we want to predict, plus two tokens for padding and out-of-vocabulary words. The final set of layers we will set up are a pair of attention layers. There is a large and growing field of attention research in machine learning. But given the time constraints of this talk, I'm only going to give you a very hand-wavy explanation and say, attention is a technique that allows the model to track intermediate states as it steps along sequences. And it will allow the model to give more weight to certain time steps of a sequence when predicting the final word. Here, we use a simple dense attention layer that comes with tf.keras. And you'll see how we connect this between our encoder and decoder in a minute.

So we have a lot of state to pass between our encoder and decoder, which means we can't just use the standard Keras fit call. And one of the things I'm most excited about in TensorFlow 2 is that we have refactored and cleaned up the Keras training loop. And we've made it much more modular. So instead of overriding the entire fit loop or throwing it out altogether, I'm just going to define a single forward pass for my model in this special function, train_step, and that is going to get called by model.fit with each step. So we override train_step in our encoder-decoder model. That train_step is going to get one batch of data at a time, so we just need to define the forward pass for that one batch. And the first thing we do here is unpack our data and separate the example from the label. You'll note that we call our own vectorization layer here to ensure that our input strings get correctly transformed to indices.

Next, still inside our single training step, we're going to record our forward pass under a gradient tape. Anything that needs to be backpropagated through should go under here. So while the vectorization layer, our preprocessing layer, stays outside the tape, our encoder embedding, LSTM, and so forth all belong under the tape. And here we pass our inputs through the set of layers we defined in our init to encode them. We also set up our attention layers to track the intermediate state coming out of the encoding layers. And next, we decode, which is to say we try to predict our targets using the state from our encoder. The decoder here will, over the many epochs it runs for, train its own weights separately from the encoder weights. And in concert, the encoder and decoder will learn to predict text based on the encoder's outputs. And now that we've run all of the layers necessary to form the forward pass, we can compute the loss and collect the outputs of this step so that they can be optimized. The Keras model takes care of collecting variables and setting the optimizer, so we just choose how we want to pass things through here. As the final step, we collect and return the metrics we set in our model.

The next step is to pick an optimizer, loss, and accuracy metric for our model. These are going to govern the actual training and optimization process. And we select them from a bunch of independently parametrizable options that are built into tf.keras, and then we train. It might take a while to converge, so I threw in a ModelCheckpoint callback to make sure I can save my model weights as I go. We can monitor progress as we train.
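As a rough, much-simplified sketch of that train_step pattern: the version below skips the Addons decoder and the attention layers and just predicts the next word from the encoder output, so it only shows the shape of the override, not the full sequence-to-sequence model from the talk. The names vectorize and train_ds are assumptions, standing in for an adapted TextVectorization layer and a batched tf.data.Dataset of (ten-word string, next-word string) pairs.

import tensorflow as tf

class NextWordModel(tf.keras.Model):
    """Hypothetical, stripped-down model showing the train_step override."""

    def __init__(self, vectorize_layer, vocab_size=5000, units=512):
        super().__init__()
        self.vectorize = vectorize_layer                     # preprocessing, kept off the tape
        self.embedding = tf.keras.layers.Embedding(vocab_size, 128)
        self.encoder_lstm = tf.keras.layers.LSTM(units)
        self.projection = tf.keras.layers.Dense(vocab_size)  # logits over the vocabulary

    def train_step(self, data):
        text, label_text = data                              # one batch of (example, label) strings
        x = self.vectorize(text)                             # strings -> integer indices
        y = self.vectorize(label_text)[:, 0]                 # index of the single target word
        with tf.GradientTape() as tape:                      # record the forward pass
            encoded = self.encoder_lstm(self.embedding(x))
            logits = self.projection(encoded)
            loss = self.compiled_loss(y, logits)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, logits)
        return {m.name: m.result() for m in self.metrics}

# `vectorize` and `train_ds` are assumed to come from the data pipeline sketched earlier.
model = NextWordModel(vectorize)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["sparse_categorical_accuracy"])
model.fit(train_ds, epochs=45,
          callbacks=[tf.keras.callbacks.ModelCheckpoint(
              "next_word_{epoch}", save_weights_only=True)])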
And our goal is to reach some degree of convergence of the reported accuracy. And with this model, this happens somewhere around 45 epochs at about 70% accuracy. And that's pretty good. But we might ask ourselves, can we do better? The parameters I chose when training were thoroughly arbitrary, copied from somewhere on the internet. Maybe the model should be bigger or smaller. Typically, we spend some time tuning these model parameters to ensure that we have the best results for a given model architecture. We call this process hyperparameter tuning. And notably, easy hyperparameter tuning is one of the most requested features for Keras. And good news, we have a package for that. KerasTuner was released last October. And it works with TensorFlow, Keras, and even scikit-learn. It allows you to create hypermodels that encode tunable parameters, distribute tuning, and share the model for others to use.

So let's take it for a quick spin in order to tune some of our model's parameters. For example, let's say we wanted to tune the number of RNN units in our model. We can import the pip-installable KerasTuner package and then define a function that takes a hyperparameter object and uses it to build a compiled model. Inside this function, instead of passing in a fixed int for RNN units, we can use the magic Int selector object, which will allow us to try any integer in this range. There are a number of different selectors you can use here, including floating-point numbers and enums. We can then define a tuner algorithm for searching our hyperparameter space and use the tuner to build and fit our model intelligently across that space. Because we overrode train_step with our custom functionality, everything works within the Keras ecosystem. And the tuner will be able to call .fit, just like we did, to get the correct training behavior. The tuner will run through all of the different hyperparameter combinations you've configured. And it even works with Colab and prints out trial information, as you see here. And after a few tuning sessions, we see that, in fact, the best RNN unit count is 1,024. And rerunning with 1,024 units, we improve our accuracy. And now we have an even better model that gets above 90% accuracy.

So now that we have a trained model-- of course, our goal was not just to create it, but to actually generate text. So how do we take the model we've built and use it to write a sentence? The first step is to use our model to predict one word given a fixed-length input, which is exactly what we trained the model to do. Just as we did with train_step, we can override predict_step to define the operations on just one batch of data. In our predict_step, we run the inputs through the same encoder and decoder we saw during training, but now with fixed weights. We can also throw in some custom logic here. We allow the model to predict from the top-n choices instead of always the most likely word. And we also convert back to the actual English word rather than just returning the numeric indices. We can try this out. And we see that, indeed, we produce the next predicted word correctly. But, of course, we don't want single words. We want a whole sentence. So we can go further and define a custom predict function that takes a single string, generates one word at a time, and continually appends to that starting string to build a much longer one. And lo and behold, we can generate vaguely meaningful statements. It's not perfect.
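As a rough illustration of that generation loop, here is a small sketch. It assumes a trained model whose predict call, via the overridden predict_step described above, returns the predicted next word as a string for each input string; the seed phrase and word count are arbitrary.

import tensorflow as tf

def generate_sentence(model, seed_text, num_words=20):
    """Repeatedly predict the next word and append it to the running sentence."""
    sentence = seed_text
    for _ in range(num_words):
        # predict_step is assumed to return the predicted next word as a string.
        next_word = model.predict(tf.constant([sentence]))[0]
        if isinstance(next_word, bytes):
            next_word = next_word.decode("utf-8")
        sentence = sentence + " " + str(next_word)
    return sentence

print(generate_sentence(model, "once upon a time there was"))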
The output is entirely unpunctuated. But it can be a lot of fun. And unlike Karmel's six-year-old, it doesn't lie, it happens to like telling funny stories, and it doesn't get tired. You can see that the generated text is vaguely human-understandable, even though the model is quite simple and the data is relatively small and constrained. And indeed, the model we just built is the very first baby step of text processing. And it is heavily restricted by having to fit onto slides. But these same tools and techniques are used to build some amazing large-scale models that run at Google scale. So if you're interested in moving beyond just learning to read and going big with text in TensorFlow 2, check out some of the code that researchers and engineers at Google have released, including the tf.text repository and KerasBERT, as well as keras-transformer, which is an example of a truly cutting-edge NLP model. You'll also hear more about TFHub in the next talk, so stay tuned.

So to summarize: use TensorFlow data sets and preprocessing layers to transform your inputs. Check out TensorFlow Addons for special-use layers and utilities. Subclass Keras models for complicated training pipelines. Tune your hyperparameters with KerasTuner. And don't forget to check out the entire ecosystem of NLP tools built on top of TensorFlow.

So thank you so much to the authors of the presentation, Karmel and Yash, to the illustrator of the presentation, the artist responsible for all of these drawings, and to all of you who are listening online. Very excited to talk about all of these tf.keras and tf.text improvements. Thank you. [MUSIC PLAYING]