
  • Hi, I'm Robert Crowe,

  • and today I'm going to be talking about TensorFlow Extended,

  • also known as TFX, and how it helps you put your amazing machine learning models

  • into production.

  • This is episode three of our five-part series

  • on real world machine learning in production.

  • We've covered a lot so far in episodes one and two,

  • so if you haven't seen those yet, I'd really recommend watching them.

  • In today's episode, we'll be asking the question,

  • why do I need metadata?

  • Let's find out.

  • ♪ (music) ♪

  • TFX implements a metadata store using ML Metadata,

  • which is an open-source framework for doing exactly that:

  • storing ML metadata.

  • It's stored in a relational database,

  • so, basically, any SQL-compatible database will do.
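
The point that any SQL-compatible backend suffices can be sketched with Python's standard-library `sqlite3` module. This is a toy single-table stand-in, not MLMD's real schema; the actual library is `ml_metadata`, which defines and manages its own tables.

```python
import sqlite3

# Toy illustration only: MLMD's real schema is much richer. A single
# table stands in for the store to show that an ordinary SQL database
# is all the backend needs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artifact (
        id   INTEGER PRIMARY KEY,
        type TEXT NOT NULL,
        uri  TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO artifact (type, uri) VALUES (?, ?)",
    ("Model", "/serving/models/my_model/1"),  # hypothetical path
)
conn.commit()

rows = conn.execute("SELECT type, uri FROM artifact").fetchall()
print(rows)  # [('Model', '/serving/models/my_model/1')]
```

Swapping SQLite for MySQL or another SQL database changes only the connection configuration, not the idea.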

  • But what exactly do we store in our metadata store?

  • First, we store things like the models that we've trained,

  • the data that we trained them on, and their evaluation results.

  • We refer to the things that we store in metadata as artifacts,

  • and artifacts have properties.

  • The data itself is stored outside the database,

  • but the properties and location of the data object

  • are kept in the metadata store.
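
The split between payload and metadata can be sketched as follows: the data file lives on disk, and the store keeps only its location (URI) and properties. The file name, property names, and schema here are all hypothetical, not MLMD's own.

```python
import json
import os
import sqlite3
import tempfile

# The payload (a small dataset file) lives outside the database.
data_dir = tempfile.mkdtemp()
data_path = os.path.join(data_dir, "train.csv")  # hypothetical artifact
with open(data_path, "w") as f:
    f.write("feature,label\n1.0,0\n2.0,1\n")

# The store records only the artifact's type, location, and properties.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifact (id INTEGER PRIMARY KEY, type TEXT, uri TEXT, properties TEXT)"
)
conn.execute(
    "INSERT INTO artifact (type, uri, properties) VALUES (?, ?, ?)",
    ("Examples", data_path, json.dumps({"split": "train", "num_rows": 2})),
)

# The store tells us *where* the data is; reading it is a separate step.
uri, props = conn.execute("SELECT uri, properties FROM artifact").fetchone()
with open(uri) as f:
    first_line = f.readline().strip()
print(first_line)  # feature,label
```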

  • Next, we keep execution records for every component,

  • each time it is run.

  • Remember that an ML pipeline is often run frequently over a long lifetime

  • as new data comes in or as conditions change,

  • so keeping that history becomes important.
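
Keeping that history amounts to appending one record per component run, as in this minimal sketch (table and column names are illustrative, not MLMD's):

```python
import sqlite3
import time

# Sketch: one row per component run, so the full history survives
# across many pipeline invocations.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE execution (
        id        INTEGER PRIMARY KEY,
        component TEXT NOT NULL,
        run_at    REAL NOT NULL,
        state     TEXT NOT NULL
    )
""")

def record_run(component):
    conn.execute(
        "INSERT INTO execution (component, run_at, state) VALUES (?, ?, 'COMPLETE')",
        (component, time.time()),
    )

# The same component run on three different occasions keeps three records.
for _ in range(3):
    record_run("Trainer")

count = conn.execute(
    "SELECT COUNT(*) FROM execution WHERE component = 'Trainer'"
).fetchone()[0]
print(count)  # 3
```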

  • Finally, we also include the lineage or provenance of the data objects

  • as they flow through the pipeline.

  • That allows us to track forward and backward through the pipeline

  • to understand the origins and results of running our components

  • as our data and code change.

  • This is really important when we need to optimize

  • or debug our pipeline,

  • which would be very hard without it.
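
Lineage can be sketched as a small graph: artifacts and executions are nodes, and events are the edges marking each artifact as an INPUT to or OUTPUT of an execution. Walking the edges backward from a model recovers the data it was trained on. The schema below is a toy stand-in for MLMD's real event model.

```python
import sqlite3

# Toy lineage graph: data -> Trainer -> model, recorded as two events.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artifact  (id INTEGER PRIMARY KEY, type TEXT, uri TEXT);
    CREATE TABLE execution (id INTEGER PRIMARY KEY, component TEXT);
    CREATE TABLE event     (artifact_id INT, execution_id INT, direction TEXT);
""")
conn.execute("INSERT INTO artifact VALUES (1, 'Examples', '/data/day1')")
conn.execute("INSERT INTO artifact VALUES (2, 'Model', '/models/1')")
conn.execute("INSERT INTO execution VALUES (1, 'Trainer')")
conn.execute("INSERT INTO event VALUES (1, 1, 'INPUT')")   # data fed the trainer
conn.execute("INSERT INTO event VALUES (2, 1, 'OUTPUT')")  # trainer produced model

# Backward trace: which artifacts fed the execution that produced model 2?
origins = conn.execute("""
    SELECT a.uri FROM artifact a
    JOIN event e_in  ON e_in.artifact_id = a.id AND e_in.direction = 'INPUT'
    JOIN event e_out ON e_out.execution_id = e_in.execution_id
    WHERE e_out.direction = 'OUTPUT' AND e_out.artifact_id = 2
""").fetchall()
print(origins)  # [('/data/day1',)]
```

Tracing forward (from data to everything derived from it) is the same join with INPUT and OUTPUT swapped.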

  • Now that we have some idea what's in our metadata store,

  • let's look at some of the functionality that we get from it.

  • First, having a lineage or provenance of all of our data artifacts

  • allows us to trace forward and backward in our pipeline;

  • for example, to see what data our model was trained with,

  • or what impact some new feature engineering had on our evaluation metrics.

  • In some use cases,

  • this ability to trace the origins and results of our data

  • may even be a regulatory or legal requirement.

  • Remember that it's not just for today's model or today's results.

  • We're also interested in understanding how our data and results change over time

  • as we take in new data and retrain our model.

  • We often want to compare against model runs from yesterday or last week

  • to understand why our results got better or worse.

  • Production solutions aren't one-time things.

  • They live for as long as you need them, and that can be months or years.
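
With evaluation results recorded per run, "did we get better or worse since last week?" becomes a simple query rather than an archaeology project. A minimal sketch, with hypothetical dates, paths, and metrics:

```python
import sqlite3

# Sketch: one evaluation row per model run, queried in time order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE evaluation (run_date TEXT, model_uri TEXT, accuracy REAL)"
)
conn.executemany(
    "INSERT INTO evaluation VALUES (?, ?, ?)",
    [
        ("2021-01-07", "/models/1", 0.91),  # last week's run
        ("2021-01-14", "/models/2", 0.89),  # this week's run
    ],
)
prev, latest = [
    row[0]
    for row in conn.execute("SELECT accuracy FROM evaluation ORDER BY run_date")
]
delta = round(latest - prev, 2)
print(delta)  # -0.02: a regression worth investigating
```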

  • We can also make our pipeline much more efficient

  • by only rerunning components when necessary

  • and using a warm start to continue training.

  • Remember that we're often dealing with large data sets

  • that can take hours or days to run.

  • If we've already trained our model for a day

  • and we want to train it some more, let's start from where we left off

  • instead of starting over from the beginning.

  • That's only possible if we saved our model in metadata.
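
A warm start can be sketched as: if the metadata points at a previous checkpoint, resume from its recorded step instead of step 0. The JSON checkpoint format and all names here are toy stand-ins; real trainers use their framework's own checkpointing.

```python
import json
import os
import tempfile

ckpt_dir = tempfile.mkdtemp()
ckpt_path = os.path.join(ckpt_dir, "checkpoint.json")  # hypothetical

def train(total_steps, metadata):
    """Train up to total_steps, resuming from a checkpoint if one is recorded."""
    start = 0
    uri = metadata.get("last_checkpoint")
    if uri and os.path.exists(uri):
        with open(uri) as f:
            start = json.load(f)["step"]  # warm start from where we left off
    for step in range(start, total_steps):
        pass  # one real training step would go here
    with open(ckpt_path, "w") as f:
        json.dump({"step": total_steps}, f)
    metadata["last_checkpoint"] = ckpt_path  # record it for the next run
    return start

metadata = {}
first = train(100, metadata)   # cold start: begins at step 0
second = train(150, metadata)  # warm start: resumes at step 100
print(first, second)  # 0 100
```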

  • We can also make our pipeline much more efficient

  • by only rerunning our other components when the input or code has changed.

  • Instead of rerunning the component again,

  • we can just pull the previous result from cache.

  • For example, if a new run of the pipeline only changes parameters of the trainer,

  • then the pipeline can reuse any data pre-processing artifacts,

  • such as vocabularies,

  • and this can save a lot of time

  • given that large data volumes make data pre-processing expensive.

  • With TFX and ML metadata, this re-use comes out of the box,

  • while the user sees a simpler run pipeline interface

  • and does not have to worry

  • about manually selecting which components to run.

  • Again, that can save us hours of processing.

  • That's much easier if we've saved our components' inputs

  • and results in metadata.
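
The caching idea can be sketched as a lookup keyed on a hash of a component's inputs and code version: on a hit, return the stored artifact instead of recomputing. This is an illustrative stand-in for TFX's cache, not its actual implementation; every name below is hypothetical.

```python
import hashlib
import json

cache = {}            # stands in for cached executions in the metadata store
calls = {"count": 0}  # counts how often the expensive work actually runs

def run_component(inputs, code_version):
    # Key = hash of the inputs plus the component's code version, so any
    # change to either forces a real re-run.
    key = hashlib.sha256(
        json.dumps([sorted(inputs.items()), code_version]).encode()
    ).hexdigest()
    if key in cache:
        return cache[key]  # cache hit: skip the expensive work
    calls["count"] += 1    # cache miss: actually run the component
    # Stand-in for expensive pre-processing: build a vocabulary.
    artifact = {"vocab": sorted(set(inputs["text"].split()))}
    cache[key] = artifact
    return artifact

a = run_component({"text": "to be or not to be"}, code_version="v1")
b = run_component({"text": "to be or not to be"}, code_version="v1")
print(calls["count"], a is b)  # 1 True: second call came from cache
```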

  • In our next episode, we'll discuss orchestration

  • and dive into each of the components that come standard with TFX.

  • For more information on TFX, visit us at tensorflow.org/tfx

  • and don't forget to comment and like us below.

  • And thanks for watching.

  • ♪ (music) ♪

Why do I need metadata? (TensorFlow Extended)

Published by 林宜悉 on January 14, 2021