Hi, I'm Robert Crowe, and today I'm going to be talking about TensorFlow Extended, also known as TFX, and how it helps you put your amazing machine-learning models into production. This is Episode 2 of our five-part series on "Real World Machine Learning in Production." We covered a lot in Episode 1, so if you haven't seen that yet, I'd really recommend watching it. In today's episode, we'll be asking the question, "How do these pipeline things work?" Let's find out.

TFX pipelines are created as a sequence of components, each of which performs a different task. Components are organized into directed acyclic graphs, or "DAGs." But what exactly is a component? A TFX component has three main parts: a driver, an executor, and a publisher. Two of these parts, the driver and the publisher, are mostly boilerplate code that you could change but probably will never need to. Where you insert your code and do your customization is really in the executor. The driver handles coordinating job execution and feeding data to the executor. The publisher takes the results of your executor and updates the metadata store, which we'll talk about more in the next episode. But the executor is really where the work is done for each component.

So, first, we need a configuration for our component, and with TFX, that configuration is done using Python. Next, we need some input for our component and a place to send our results. That's where the metadata store comes in. We'll talk more about the metadata store in our next episode, but for now, just be aware that, for most components, the input will come from the metadata store and the results will be written back to the metadata store. So, as our data moves through the pipeline, components will read metadata that was produced by an earlier component and write metadata that will probably be used by a component farther down the pipeline.
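The driver/executor/publisher split described above can be sketched in plain Python. This is a simplified illustration of the pattern, not real TFX source; the class names (`Component`, `MetadataStore`) and the dict-backed store are invented for the example:

```python
# Illustrative sketch of a TFX-style component. The names here are
# invented for this example and do not match real TFX classes.

class MetadataStore:
    """Toy stand-in for the metadata store: records artifacts by key."""
    def __init__(self):
        self.artifacts = {}

    def get(self, key):
        return self.artifacts[key]

    def put(self, key, value):
        self.artifacts[key] = value


class Component:
    def __init__(self, name, input_key, output_key, executor_fn):
        self.name = name
        self.input_key = input_key
        self.output_key = output_key
        self.executor_fn = executor_fn  # the part you customize

    def run(self, store):
        # Driver: coordinate execution and feed input data to the executor.
        inputs = store.get(self.input_key)
        # Executor: where the actual work of the component happens.
        outputs = self.executor_fn(inputs)
        # Publisher: write the executor's results back to the metadata store.
        store.put(self.output_key, outputs)


store = MetadataStore()
store.put("raw_data", [3, 1, 2])
normalize = Component("normalize", "raw_data", "sorted_data", sorted)
normalize.run(store)
print(store.get("sorted_data"))  # [1, 2, 3]
```

A downstream component would then be constructed with `input_key="sorted_data"`, which is how artifacts flow from one component to the next through the store.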
There are some exceptions, like at the beginning and end of the pipeline, but for the most part, that's how data flows through a TFX pipeline.

To organize all these components and manage these pipelines, we need orchestration. But what is orchestration, exactly, and how does it help us? If all you need to do is kick off the next stage of the pipeline, a task-aware architecture is enough: you can simply start the next component as soon as the previous component finishes. But a task- and data-aware architecture is much more powerful, and really almost a requirement for any production system, because it stores all the artifacts of every component over many executions. Having that metadata creates a much more powerful pipeline and enables a lot of things that would otherwise be very difficult. So TFX implements a task- and data-aware pipeline architecture. We'll be discussing that in detail in the next episode, so stay tuned.

To put an ML pipeline together, define the sequence of components that make up the pipeline, and manage their execution, we need an orchestrator. An orchestrator provides a management interface that we can use to trigger tasks and monitor our components. One of the ways that TFX is open and extensible is with orchestration. We provide support for Apache Airflow and Kubeflow out of the box, but you can write code to use a different orchestrator if you need to. So if you've already got an orchestrator that you like, you can use it with TFX; we don't force you to change. Here's what a TFX DAG, or directed acyclic graph, looks like in two different orchestrators: Airflow and Kubeflow. It's the same DAG, just two slightly different ways of displaying it.

In our next episode, we'll discuss the role of metadata and how it helps us create much more powerful pipelines. For more information on TFX, visit us at tensorflow.org/tfx, and don't forget to comment and like us below. Thanks for watching.
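The task- and data-aware idea above can be illustrated with a toy orchestrator that walks a DAG of components in dependency order, keeping every artifact it produces. This is a hedged sketch to show the concept only; real orchestrators like Airflow and Kubeflow do far more (scheduling, retries, monitoring), and all the names below are invented for the example:

```python
# Toy orchestrator: runs a DAG of steps in topological order and
# retains every artifact, making it data-aware as well as task-aware.
from graphlib import TopologicalSorter

def run_pipeline(steps, deps, initial_artifacts):
    """steps: name -> fn(upstream_artifacts) -> artifact
    deps:  name -> list of upstream step names."""
    artifacts = dict(initial_artifacts)  # persisted artifact store
    # static_order() yields each node only after all its predecessors.
    for name in TopologicalSorter(deps).static_order():
        if name not in steps:  # a raw input, not an executable step
            continue
        upstream = {d: artifacts[d] for d in deps.get(name, [])}
        artifacts[name] = steps[name](upstream)
    return artifacts

# A three-step pipeline, loosely echoing ExampleGen -> Transform -> Trainer.
steps = {
    "example_gen": lambda up: [4, 2, 6],
    "transform": lambda up: [x / 2 for x in up["example_gen"]],
    "trainer": lambda up: sum(up["transform"]),
}
deps = {"transform": ["example_gen"], "trainer": ["transform"]}
artifacts = run_pipeline(steps, deps, {})
print(artifacts["trainer"])  # 6.0
```

Because `artifacts` keeps every intermediate result, a later run could compare or reuse them, which is the kind of capability the transcript says a purely task-aware setup lacks.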
How do TFX pipelines work? (TensorFlow Extended). Published January 14, 2021.