[MUSIC PLAYING]

KARMEL ALLISON: Hi, and welcome to Coding TensorFlow. I'm Karmel Allison, and I'm here to guide you through a scenario using TensorFlow's high-level APIs. This video is the second in a three-part series. In it, we'll dig deeper into preparing the data for machine learning, including using feature columns, categorical data, and much more. We'll also explore a machine learning model built using Keras that can be trained with this data.

In the previous video, we spoke about a complex data set and how you can load it and get it ready to use in TensorFlow. We used the Covertype data set from the US Forest Service and Colorado State University, which has about 500,000 rows of geophysical data collected from particular regions in national forest areas. We are going to use the features in this data set to try to predict the soil type that was found in each region. We took the raw data and put it into a TensorFlow dataset that generates dictionaries of feature tensors and labels, but we still have lots of feature types: some are continuous, some are categorical, some are one-hot encoded. We need to represent these in a way that is meaningful to an ML model. You'll learn how to do that in this video, so let's get started.

We are going to use feature columns for that. In TensorFlow, a feature column is a configuration class. It doesn't itself hold any data, but it tells our model how to transform the raw data so that it matches the expectation in many ML models that the data is numeric and continuous. If you're working with data that is already numeric (image data, for example), feature columns may not be necessary, but for many real-world applications, data is structured and represents vocabularies or human concepts that we need to transform before we can use them in machine learning models. Feature columns are a great way to do that.

Let's take, for example, our Covertype category, which is an integer between 1 and 7 that represents the type of tree in the region. You'll note that all we've done here is define the type of the feature; we haven't passed any of our data into it yet. It is just a configuration object that tells our model to expect categorical IDs below the upper bound of 8. Now we have to configure how we want to transform our categorical data for use in a model that expects continuous data. Using feature columns, we can trivially build a set of instructions that allows the model to convert the categories into an embedding column, as shown here. Now, we could have done this processing ourselves in our data-parsing function, converting the categorical IDs to a one-hot vector manually. The advantage of using feature columns is that the transformations they encode become part of the model's graph and can therefore be exported with the saved model. So you should push any transformations that you want applied to the data, both during training and at inference time, into feature columns.

We can define columns for each of our features. Data that is already numeric is straightforward: we just use a numeric column. Sometimes, as in the case of the wilderness area data here, data is spread out over a vector, and numeric feature columns allow us to easily capture that relationship with the shape argument, so that our model understands wilderness area as a single length-4 tensor rather than 4 independent tensors. All right, so we configure all of our features, and then what? Well, these become the first layer of our model, using a feature layer.
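To make those steps concrete, here is a minimal sketch of the column definitions described above, using the tf.feature_column API. The feature keys ('Cover_Type', 'Elevation', 'Wilderness_Area') and the embedding dimension are illustrative assumptions, not taken from the video:

```python
import tensorflow as tf

# Covertype is a categorical ID between 1 and 7, so valid IDs are
# anything below the upper bound of 8. This column holds no data; it
# just tells the model how to interpret the 'Cover_Type' feature.
cover_type = tf.feature_column.categorical_column_with_identity(
    'Cover_Type', num_buckets=8)

# Wrapping the categorical column in an embedding column instructs the
# model to learn a dense vector for each category during training.
# The dimension here is an illustrative choice.
cover_embedding = tf.feature_column.embedding_column(cover_type, dimension=10)

# Features that are already numeric just need a numeric column.
elevation = tf.feature_column.numeric_column('Elevation')

# Wilderness area is spread out over a vector of 4 values; the shape
# argument makes the model treat it as one length-4 tensor rather
# than 4 independent tensors.
wilderness_area = tf.feature_column.numeric_column(
    'Wilderness_Area', shape=(4,))
```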
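A minimal sketch of that feature layer, continuing the names above and assuming tf.keras.layers.DenseFeatures, with hypothetical example values:

```python
# Collect the configured columns; the feature layer will apply each
# column's transformation to incoming feature dictionaries.
feature_columns = [cover_embedding, elevation, wilderness_area]
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

# Given a batch of raw features (a dict of tensors, as produced by the
# dataset from part one), the layer emits one dense float tensor. The
# values here are made up for illustration.
example_batch = {
    'Cover_Type': tf.constant([1, 4]),
    'Elevation': tf.constant([2596.0, 2804.0]),
    'Wilderness_Area': tf.constant([[1., 0., 0., 0.],
                                    [0., 1., 0., 0.]]),
}
print(feature_layer(example_batch).shape)  # (2, 15): embedding 10 + 1 + 4
```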
When we train our model, this first layer will act like any other Keras layer, but its primary role will be to take in the raw data, including the categorical indices, and transform it into the representations that our neural net is expecting. This layer will also handle creating and training our Covertype embedding. So if you have data that needs transformation before it fits into a model, maybe it's categorical like ours, or even has string names and vocabularies, you can use feature columns to handle those transformations, batch by batch, in TensorFlow, rather than having a whole separate pipeline to do feature transformations in memory. TensorFlow provides many feature columns, and even ways to combine individual columns into more complex representations of the data that your model can learn.

So, before we wrap up, let me quickly show you how this would be a layer in a Keras model, which we'll go into in more detail in the next video. Note that we are using tf.keras here, which implements the Keras API spec but adds additional TensorFlow-specific features on top of it, like support for TensorFlow's eager execution, optimizers, and so on. Since the first thing I want to try is a simple sequence of deep learning layers, Keras is the easiest way to start. We will start with a simple sequential model, but what I want to focus on right now is just this first layer. Our first layer is a feature layer that will do all the data transformation we just discussed and feed the transformed data into the rest of the model (see the sketch below). We'll train the model in part three of this series, where we'll look at feeding in the data and training the model with it, including choosing loss functions and optimizers. It will be right here on the TensorFlow YouTube channel, so don't forget to hit that Subscribe button, and I'll see you there.

[MUSIC PLAYING]
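For reference, a minimal sketch of the simple sequential model described above, with the feature layer first. The hidden-layer sizes and the 40-way soil type output are assumptions for illustration; compiling and training are covered in part three:

```python
# A simple tf.keras sequential model: the feature layer transforms the
# raw feature dictionaries, and dense layers learn from the result.
model = tf.keras.Sequential([
    feature_layer,
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    # Assumed output: one probability per soil type class
    # (40 soil types in the Covertype data set).
    tf.keras.layers.Dense(40, activation='softmax'),
])
```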