Hi, I'm Robert Crowe, and today I'm going to be talking about TensorFlow Extended, also known as TFX, and how it helps you put your amazing machine learning models into production. This is episode three of our five-part series on real-world machine learning in production. We've covered a lot so far in episodes one and two, so if you haven't seen those yet, I'd really recommend watching them. In today's episode, we'll be asking the question: why do I need metadata? Let's find out.

♪ (music) ♪

TFX implements a metadata store using ML Metadata, an open-source library for doing exactly that: storing ML metadata. The metadata lives in a relational database, such as SQLite or MySQL.

But what exactly do we store in our metadata store? First, we store things like the models that we've trained, the data that we trained them on, and their evaluation results. We refer to the things that we store in metadata as artifacts, and artifacts have properties. The data itself is stored outside the database, but the properties and location of each data object are kept in the metadata store.

Next, we keep an execution record for every component each time it runs. Remember that an ML pipeline is often run frequently over a long lifetime as new data comes in or as conditions change, so keeping that history becomes important.

Finally, we also record the lineage, or provenance, of the data objects as they flow through the pipeline. That allows us to trace forward and backward through the pipeline to understand the origins and results of running our components as our data and code change. This is really important when we need to optimize or debug our pipeline, which would be very hard without it. Now that we have some idea of what's in our metadata store, let's look at some of the functionality that we get from it.
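The three kinds of records described above (artifacts with properties and a location, execution records, and the lineage that links them) can be modeled in a few lines. The sketch below is a hypothetical, in-memory toy written only to illustrate the idea; it is not the ML Metadata API, and the class names, method names, artifact types, and paths are all made up for illustration. Lineage here is inferred from which execution consumed or produced which artifact ids.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """A tracked data object. Only its location (uri) and properties are stored."""
    id: int
    type_name: str
    uri: str
    properties: dict = field(default_factory=dict)


@dataclass
class Execution:
    """A record of one run of one pipeline component."""
    id: int
    component: str
    inputs: list   # ids of artifacts the run consumed
    outputs: list  # ids of artifacts the run produced


class ToyMetadataStore:
    """Hypothetical in-memory stand-in for a metadata store (not the MLMD API)."""

    def __init__(self):
        self.artifacts = {}
        self.executions = []

    def put_artifact(self, type_name, uri, **properties):
        artifact = Artifact(len(self.artifacts) + 1, type_name, uri, properties)
        self.artifacts[artifact.id] = artifact
        return artifact.id

    def record_execution(self, component, inputs, outputs):
        execution = Execution(len(self.executions) + 1, component,
                              list(inputs), list(outputs))
        self.executions.append(execution)
        return execution.id

    def _walk(self, start_id, backward):
        """Collect every artifact id reachable from start_id through executions."""
        found, stack = [], [start_id]
        while stack:
            current = stack.pop()
            for execution in self.executions:
                # Backward: from a run that produced `current` to that run's inputs.
                # Forward: from a run that consumed `current` to that run's outputs.
                source, target = ((execution.outputs, execution.inputs) if backward
                                  else (execution.inputs, execution.outputs))
                if current in source:
                    for linked in target:
                        if linked not in found:
                            found.append(linked)
                            stack.append(linked)
        return found

    def trace_back(self, artifact_id):
        return self._walk(artifact_id, backward=True)

    def trace_forward(self, artifact_id):
        return self._walk(artifact_id, backward=False)


# Hypothetical pipeline: Transform builds a vocabulary, Trainer builds a model.
store = ToyMetadataStore()
raw = store.put_artifact("Examples", "/data/day1", split="train")
vocab = store.put_artifact("Vocabulary", "/pipeline/transform/vocab")
model = store.put_artifact("Model", "/pipeline/trainer/model", version=1)
store.record_execution("Transform", inputs=[raw], outputs=[vocab])
store.record_execution("Trainer", inputs=[raw, vocab], outputs=[model])

print(store.trace_back(model))   # [1, 2]: the raw data and vocabulary behind the model
print(store.trace_forward(raw))  # [2, 3]: everything derived from the raw data
```

Note that the store never holds the data itself, only its uri and properties, which mirrors the split between the database and external storage described above.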
First, having the lineage or provenance of all of our data artifacts allows us to trace forward and backward in our pipeline: for example, to see what data our model was trained with, or what impact some new feature engineering had on our evaluation metrics. In some use cases, this ability to trace the origins and results of our data may even be a regulatory or legal requirement. Remember that it's not just for today's model or today's results. We're also interested in understanding how our data and results change over time as we take in new data and retrain our model. We often want to compare against model runs from yesterday or last week to understand why our results got better or worse. Production solutions aren't one-time things. They live for as long as you need them, and that can be months or years.

We can also make our pipeline much more efficient by only rerunning components when necessary and by using a warm start to continue training. Remember that we're often dealing with large datasets that can take hours or days to process. If we've already trained our model for a day and we want to train it some more, let's start from where we left off instead of starting over from the beginning. That's only possible if our previously trained model was recorded in metadata.

In the same way, we only need to rerun our other components when their inputs or code have changed. Instead of rerunning a component, we can just pull its previous result from the cache. For example, if a new run of the pipeline only changes parameters of the trainer, then the pipeline can reuse any data preprocessing artifacts, such as vocabularies, and this can save a lot of time, given that large data volumes make data preprocessing expensive. With TFX and ML Metadata, this reuse comes out of the box: the user sees a simple "run pipeline" interface and doesn't have to worry about manually selecting which components to run. Again, that can save us hours of processing.
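To make the caching idea concrete, here is a minimal sketch, assuming a component's output is fully determined by its name, its code version, its inputs, and its parameters. Everything here (`cache_key`, `CachingRunner`, the toy `transform` and `train` functions, the `"v1"` version strings) is hypothetical and is not how TFX implements caching internally. For simplicity the key hashes input values directly; a real system would fingerprint artifact references instead.

```python
import hashlib
import json


def cache_key(component, code_version, inputs, params):
    """Fingerprint everything that determines a component's output."""
    payload = json.dumps(
        {"component": component, "code": code_version,
         "inputs": inputs, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachingRunner:
    """Hypothetical runner that skips a component when its cache key was seen before."""

    def __init__(self):
        self.cache = {}  # cache key -> previous result
        self.runs = []   # names of components that actually executed

    def run(self, component, code_version, inputs, params, fn):
        key = cache_key(component, code_version, inputs, params)
        if key in self.cache:
            return self.cache[key]  # reuse the earlier result, skip the work
        result = fn(inputs, params)
        self.runs.append(component)
        self.cache[key] = result
        return result


def transform(inputs, params):
    # Toy "preprocessing": build a sorted vocabulary from the text.
    return sorted(set(" ".join(inputs["text"]).split()))


def train(inputs, params):
    # Toy "training": just report what it was given.
    return {"vocab_size": len(inputs["vocab"]), "steps": params["steps"]}


def run_pipeline(runner, text, steps):
    vocab = runner.run("Transform", "v1", {"text": text}, {}, transform)
    return runner.run("Trainer", "v1", {"vocab": vocab}, {"steps": steps}, train)


runner = CachingRunner()
first = run_pipeline(runner, ["a b", "b c"], steps=100)
second = run_pipeline(runner, ["a b", "b c"], steps=200)  # only trainer params changed
print(runner.runs)  # ['Transform', 'Trainer', 'Trainer']
```

On the second run only the trainer's parameters differ, so the Transform result (the vocabulary) is pulled from the cache and only the Trainer executes again.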
Skipping unchanged components like this is much easier if we've saved our components' inputs and results in metadata.

In our next episode, we'll discuss orchestration and dive into each of the components that come standard with TFX. For more information on TFX, visit us at tensorflow.org/tfx, and don't forget to comment and like us below. And thanks for watching.

♪ (music) ♪
Why do I need metadata? (TensorFlow Extended). Posted by 林宜悉 on January 14, 2021.