♪ (music) ♪

Hi, everybody, and welcome to TensorFlow Meets. I'm really excited to have Clemens Mewald with us. Now, Clemens did a couple of talks at the TensorFlow Developers Summit about TFX, about data, and you had a really busy Summit, right?

Yeah, it's been a lot of content.

So what exactly is TensorFlow Extended?

TensorFlow Extended is an end-to-end machine learning platform that was built at Google for our product teams to accelerate the path from training a model to deploying it in production, and it really contains all of the components you need to go from data to serving. And now we are in the process of making it available to the community through open source.

Now, I know there's been a few pretty big releases in TensorFlow, like in the pipeline stuff. Could you talk us through that a little bit?

So, in the past, we've released components of TFX, so those were the libraries, along the way. TensorFlow Data Validation was the most recent one. But today we've actually released some of the horizontal components that integrate all of these libraries into one end-to-end product so that you can actually configure and run it as one end-to-end pipeline and not as individual libraries.

Okay, now just taking a step back for a second, so the whole idea around TensorFlow Extended (TFX) is really if I want to productize my machine learning. I remember you once showed a diagram where the machine learning itself is this little box. In order to be able to deploy a system there's all this stuff around outside it. So lots of great progress has been made in that. Any particular ones that excite you?

So I think, in general... As you mentioned, we've released a lot of these components in the past. And, by the way, these are the same components that we use at Google for our production workloads.
The thing that I'm most excited about these days is the horizontal pieces that pull them all together, and one of the ones that we've launched today is actually the metadata piece that keeps track of all the artifacts and all of the metadata of those runs, which enables a lot of very exciting features and use cases.

Can we dig a little deeper into the metadata? How would I start using it, for example? What use would it be for me if I'm training for production?

So if you use any of our TFX examples, you are already using metadata, so for the user, it should come out of the box. And one of the most obvious use cases that people get excited about is experiment tracking. So, every single time we train a new model we keep a record of these models, and then, after the fact, you can go back and look at TensorBoard for these models, you can compare TensorBoard runs for these models. And we also keep track of all of the other artifacts that have been created in this pipeline.

I see, okay, cool! Now, obviously, data is the lifeblood of any machine learning, training, and building any kind of production system, so there's lots of great tools for managing data, and I know one of your talks was around data. So where are we at with all of that?

So, specifically, TensorFlow Data Validation is the library that we use to make sure that the data that we ingest into these systems is actually good data, and we're aware that it doesn't have any errors in it or it's not missing any specific features. And that library has been open source for a while, but now, with this end-to-end release, we actually built a few components using this library. So the first one computes statistics. You can actually look at the distribution of your data. The second one infers a schema from those statistics.
And the third one validates your data against the schema, so it can actually create an anomaly report to tell you if your data conforms to the schema, or why it doesn't-- if there's new types of values, or if there's distributional shift in your data.

And that comes in really handy to prevent you from making training mistakes, right?

Exactly, because you need to catch these things early. Because if you train a model on bad data, you will find out late that your model isn't performing well. So what you really want to do is catch these mistakes early so that you can fix your data before you even start training a model.

Yep, and also just prevent burning training time.

Exactly.

Right, so okay. So now, if I want to just start taking advantage of all these things. If I want to start taking advantage of metadata. I know we said a lot of it's out of the box... And if I want to start using Data Validation, where should I, as a developer, go to get started with all of this?

The easiest place is probably on the TensorFlow.org website. So on tensorflow.org/tfx is where we linked our new user guides. And the user guide actually walks you through how to use TensorFlow Extended and explains each and every component along the way.

Oh, nice. That's really handy. And are there some samples I can kick around?

Yes, so we actually ported our Chicago Taxi example to use this orchestration...

Oh, cool!

And to use that metadata. So we actually showcase how you can orchestrate the pipeline using Airflow, on a local machine, and then how to actually scale all of this on Cloud using Kubeflow Pipelines.

Nice, nice. So, thank you so much, Clemens. This is always inspirational and is always informative. And thanks, everybody, for watching this episode of TensorFlow Meets. If you have any questions for me, if you have any questions for Clemens, just please leave them in the comments below. And links to everything that we discussed today I'll put into the description text. So, thanks so much.
♪ (music) ♪
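
Editorial sketch: the three Data Validation steps described in the conversation (compute statistics, infer a schema from them, validate new data against the schema) can be illustrated roughly as follows. This is a minimal plain-Python sketch of the idea, not the TensorFlow Data Validation API; the function names, the dictionary layout, and the sample rows are all assumptions made up for illustration. In the real library these steps correspond to calls such as `tfdv.infer_schema` and `tfdv.validate_statistics`. The sketch only checks for missing values, unexpected types, and unseen string values; it does not attempt to detect distributional shift.

```python
from collections import Counter


def compute_statistics(rows):
    """Step 1: summarize each feature (presence count, observed types, string values)."""
    features = {}
    for row in rows:
        for name, value in row.items():
            if value is None:  # None is treated as "missing" in this sketch
                continue
            entry = features.setdefault(
                name, {"count": 0, "types": Counter(), "values": Counter()}
            )
            entry["count"] += 1
            entry["types"][type(value).__name__] += 1
            if isinstance(value, str):
                entry["values"][value] += 1
    return {"num_rows": len(rows), "features": features}


def infer_schema(stats):
    """Step 2: derive expectations (type, required, string domain) from statistics."""
    schema = {}
    for name, entry in stats["features"].items():
        dominant_type = entry["types"].most_common(1)[0][0]
        spec = {
            "type": dominant_type,
            "required": entry["count"] == stats["num_rows"],
        }
        if dominant_type == "str":
            spec["domain"] = set(entry["values"])
        schema[name] = spec
    return schema


def validate(stats, schema):
    """Step 3: compare new statistics to the schema and return an anomaly report."""
    anomalies = {}
    for name, spec in schema.items():
        entry = stats["features"].get(name)
        problems = []
        if entry is None:
            if spec["required"]:
                problems.append("required feature is absent")
        else:
            missing = stats["num_rows"] - entry["count"]
            if spec["required"] and missing > 0:
                problems.append(f"missing in {missing} of {stats['num_rows']} rows")
            unexpected_types = sorted(set(entry["types"]) - {spec["type"]})
            if unexpected_types:
                problems.append(f"unexpected types: {unexpected_types}")
            new_values = sorted(set(entry["values"]) - spec.get("domain", set()))
            if "domain" in spec and new_values:
                problems.append(f"unexpected values: {new_values}")
        if problems:
            anomalies[name] = problems
    for name in stats["features"]:
        if name not in schema:
            anomalies[name] = ["feature not present in schema"]
    return anomalies


# Made-up rows, loosely styled after a taxi-trip dataset.
training_rows = [
    {"payment_type": "Cash", "fare": 12.5},
    {"payment_type": "Credit Card", "fare": 7.0},
]
new_rows = [
    {"payment_type": "Bitcoin", "fare": 9.0},
    {"payment_type": "Cash", "fare": None},
]

schema = infer_schema(compute_statistics(training_rows))
report = validate(compute_statistics(new_rows), schema)
for feature, problems in report.items():
    print(feature, problems)
```

With these sample rows, the report flags `payment_type` for a value never seen in training and `fare` for a missing value, which is the kind of problem the conversation says should be caught before any training time is spent.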
TensorFlow Extended (TFX) and Metadata (TensorFlow Meets). Posted by 林宜悉 on January 14, 2021.