  • ♪ (music) ♪

  • Hi, everybody, and welcome to TensorFlow Meets.

  • I'm really excited to have Clemens Mewald with us.

  • Now Clemens did a couple of talks

  • at the TensorFlow Developer Summit

  • about TFX, about data,

  • and you had a really busy Summit, right?

  • Yeah, it's been a lot of content.

  • So what exactly is TensorFlow Extended?

  • TensorFlow Extended is an end-to-end machine learning platform

  • that was built at Google for our product teams

  • to accelerate the path from training a model

  • to deploying it in production,

  • and it really contains all of the components you need

  • to go from data to serving.

  • And now we are in the process

  • of making it available to the community through open source.

  • Now, I know there have been a few pretty big releases in TensorFlow,

  • like the pipeline work.

  • Could you talk us through that a little bit?

  • So, in the past, we've released components of TFX,

  • the individual libraries, along the way.

  • TensorFlow Data Validation was the most recent one.

  • But today we've actually released

  • some of the horizontal components

  • that integrate all of these libraries

  • into one end-to-end product

  • so that you can actually configure and run it

  • as one end-to-end pipeline

  • and not as individual libraries.

  • Okay, now just taking a step back for a second,

  • so the whole idea around TensorFlow Extended (TFX)

  • is really for when I want to productize my machine learning.

  • I remember you once showed a diagram

  • where the machine learning itself is this little box,

  • and in order to be able to deploy a system

  • there's all this stuff around it.

  • So lots of great progress has been made in that.

  • Any particular ones that excite you?

  • So I think, in general...

  • As you mentioned,

  • we've released a lot of these components in the past.

  • And, by the way, these are the same components

  • that we use at Google for our production workloads.

  • The thing that I'm most excited about these days

  • is the horizontal pieces that pull them all together,

  • and one of the ones that we've launched today

  • is actually the metadata piece

  • that keeps track of all the artifacts

  • and all of the metadata of those runs

  • that enable a lot of very exciting features and use cases.

  • Can we dig a little deeper into the metadata?

  • How would I start using it, for example?

  • What use would it be for me if I'm training models for production?

  • So if you use any of our TFX examples,

  • you are already using metadata,

  • so for the user, it should just come out of the box.

  • And one of the most obvious use cases

  • that people get excited about is experiment tracking.

  • So, every single time we train a new model

  • we keep a record of these models,

  • and then, after the fact, you can go back

  • and look at TensorBoard for these models,

  • you can compare TensorBoard runs for these models.

  • And we also keep track of all of the other artifacts

  • that have been created in this pipeline.

  • I see, okay, cool!

  • Now, obviously, data is the lifeblood of any machine learning training

  • and of building any kind of production system,

  • so there are lots of great tools for managing data,

  • and I know one of your talks was around data.

  • So where are we at with all of that?

  • So, specifically, TensorFlow Data Validation

  • is the library that we use

  • to make sure that the data that we ingest into these systems

  • is actually good data,

  • and that it doesn't have any errors in it

  • or it's not missing any specific features.

  • And that library has been open source for a while,

  • but now, with this end-to-end release,

  • we actually built a few components using this library.

  • So the first one computes statistics.

  • You can actually look at the distribution of your data.

  • The second one infers a schema from those statistics.

  • And the third one validates your data against the schema,

  • so it can actually create an anomaly report

  • to tell you if your data conforms to the schema,

  • or why it doesn't--

  • if there are new types of values,

  • or if there's distributional shift in your data.

  • And that comes in really handy

  • to prevent training mistakes, right?

  • Exactly, because you need to catch these things early.

  • Because if you train a model on bad data,

  • you will find out late that your model isn't performing well.

  • So what you really want to do is catch these mistakes early

  • so that you can fix your data

  • before you even start training a model.

  • Yep, and also to avoid burning training time.

  • - Exactly. - Right, so okay.

  • So now, if I want to just start

  • taking advantage of all these things,

  • if I want to start taking advantage of metadata,

  • I know we said a lot of it's out of the box...

  • And if I want to start using Data Validation,

  • where should I, as a developer, go

  • to get started with all of this?

  • The easiest place is probably on the TensorFlow.org website.

  • So tensorflow.org/tfx is where we've linked our new user guides.

  • And the user guide actually walks you through

  • how to use TensorFlow Extended

  • and explains each and every component along the way.

  • Oh, nice. That's really handy.

  • And are there some samples I can kick around?

  • Yes, so we actually ported our Chicago Taxi example

  • to use this orchestration...

  • - Oh, cool! - And to use that metadata.

  • So we actually showcase how you can orchestrate the pipeline

  • using Airflow, on a local machine,

  • and then how to actually scale all of this on Cloud

  • using Kubeflow Pipelines.

  • Nice, nice. So, thank you so much, Clemens.

  • This is always inspirational and is always informative.

  • And thanks, everybody,

  • for watching this episode of TensorFlow Meets.

  • If you have any questions for me,

  • if you have any questions for Clemens,

  • please just leave them in the comments below.

  • And links to everything that we discussed today

  • I'll put into the description text.

  • So, thanks so much.

  • ♪ (music) ♪
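The three Data Validation components Clemens describes (compute statistics, infer a schema from those statistics, then validate new data against the schema to produce an anomaly report) can be illustrated with a small, self-contained sketch. This is not the TensorFlow Data Validation API; the real library exposes functions such as tfdv.generate_statistics_from_csv, tfdv.infer_schema and tfdv.validate_statistics. Every function below is hypothetical, works only on lists of Python dicts with categorical values, and measures distribution shift with the L-infinity distance between value frequencies as one simple, assumed choice of metric and threshold.

```python
from collections import Counter

# Hypothetical toy version of the three steps; this is NOT the TFDV API.


def compute_statistics(rows):
    """Step 1: count, per feature, how often each value occurs."""
    features = {}
    for row in rows:
        for name, value in row.items():
            features.setdefault(name, Counter())[value] += 1
    return {"num_rows": len(rows), "features": features}


def infer_schema(stats):
    """Step 2: every feature seen is required; its observed values form its domain."""
    return {name: set(counts) for name, counts in stats["features"].items()}


def l_infinity_distance(p_counts, q_counts):
    """Largest absolute gap between two normalized value frequencies."""
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    values = set(p_counts) | set(q_counts)
    return max(abs(p_counts[v] / p_total - q_counts[v] / q_total) for v in values)


def validate(stats, schema, baseline=None, drift_threshold=0.25):
    """Step 3: return an anomaly report (a list of strings) for `stats`.

    `baseline` is assumed to be the statistics the schema was inferred from;
    the 0.25 drift threshold is an arbitrary illustrative value.
    """
    anomalies = []
    for name, domain in schema.items():
        counts = stats["features"].get(name, Counter())
        if sum(counts.values()) < stats["num_rows"]:
            anomalies.append(f"{name}: missing in some examples")
        new_values = set(counts) - domain
        if new_values:
            anomalies.append(f"{name}: unexpected values {sorted(new_values)}")
        if baseline is not None and counts:
            distance = l_infinity_distance(baseline["features"][name], counts)
            if distance > drift_threshold:
                anomalies.append(f"{name}: distribution shift (L-inf {distance:.2f})")
    return anomalies


# Hypothetical example data, loosely styled after a taxi-trips dataset.
train_rows = [
    {"payment_type": "Cash", "company": "A"},
    {"payment_type": "Credit Card", "company": "B"},
    {"payment_type": "Cash", "company": "A"},
    {"payment_type": "Credit Card", "company": "B"},
]
new_rows = [
    {"payment_type": "Mobile", "company": "A"},
    {"payment_type": "Cash"},
    {"payment_type": "Cash", "company": "A"},
    {"payment_type": "Cash", "company": "A"},
]

train_stats = compute_statistics(train_rows)
schema = infer_schema(train_stats)
new_stats = compute_statistics(new_rows)

for anomaly in validate(new_stats, schema, baseline=train_stats):
    print(anomaly)
```

For new_rows the report lists the unseen "Mobile" payment type, the example that is missing "company", and a shift in the value frequencies of both features, which is exactly the kind of problem you want flagged before any training starts.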
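The metadata piece Clemens mentions is, in TFX, backed by the ML Metadata (MLMD) library. The sketch below does not use MLMD; it is a hypothetical in-memory stand-in (ToyMetadataStore and all of its method names are invented for illustration) that shows the idea behind experiment tracking: every artifact the pipeline produces is recorded, every run records which artifacts it read and wrote, and afterwards you can go back, list all trained models, and trace each one to the data it came from.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Artifact:
    id: int
    type_name: str  # e.g. "Examples" or "Model"
    uri: str  # where the artifact's files live
    properties: dict = field(default_factory=dict)


@dataclass
class Run:
    id: int
    input_ids: list
    output_ids: list
    properties: dict = field(default_factory=dict)


class ToyMetadataStore:
    """Hypothetical, in-memory stand-in for a metadata store (not the MLMD API)."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.artifacts = {}
        self.runs = []

    def put_artifact(self, type_name, uri, **properties):
        artifact = Artifact(next(self._next_id), type_name, uri, properties)
        self.artifacts[artifact.id] = artifact
        return artifact

    def record_run(self, inputs, outputs, **properties):
        run = Run(len(self.runs) + 1, [a.id for a in inputs], [a.id for a in outputs], properties)
        self.runs.append(run)
        return run

    def artifacts_of_type(self, type_name):
        return [a for a in self.artifacts.values() if a.type_name == type_name]

    def lineage(self, artifact):
        """Return the input artifacts of the run that produced `artifact`."""
        for run in self.runs:
            if artifact.id in run.output_ids:
                return [self.artifacts[i] for i in run.input_ids]
        return []  # e.g. raw data that no recorded run produced


# Two hypothetical training runs on the same data, with different learning rates.
store = ToyMetadataStore()
examples = store.put_artifact("Examples", "/tmp/pipeline/examples/1")
for learning_rate in (0.1, 0.01):
    model = store.put_artifact(
        "Model", f"/tmp/pipeline/model/lr_{learning_rate}", learning_rate=learning_rate
    )
    store.record_run(inputs=[examples], outputs=[model], component="Trainer")

# Going back after the fact: list every model with its settings and source data.
for model in store.artifacts_of_type("Model"):
    sources = [a.uri for a in store.lineage(model)]
    print(model.uri, model.properties["learning_rate"], sources)
```

In a real pipeline the model URIs would point at the training output directories, which is what makes it possible to open and compare the corresponding TensorBoard runs later.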
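The points about running the components as one end-to-end pipeline rather than as individual libraries, and about catching bad data before training, can be sketched as a toy orchestrator in which a validation step stops the run before any training time is spent. The step names and functions below are hypothetical placeholders, not TFX components and not the Airflow or Kubeflow Pipelines APIs, and the only check performed is an assumed "every example has a label" rule.

```python
class DataValidationError(Exception):
    """Raised by a validation step to stop the pipeline before training."""


def run_pipeline(steps, context):
    """Run (name, step) pairs in order; each step reads and writes `context`.

    Returns the names of the steps that completed.
    """
    completed = []
    for name, step in steps:
        try:
            step(context)
        except DataValidationError as err:
            print(f"pipeline stopped at {name!r}: {err}")
            break
        completed.append(name)
    return completed


# Hypothetical steps standing in for ingestion, validation and training.
def ingest(context):
    context["rows"] = list(context["source"])


def validate(context):
    unlabeled = [i for i, row in enumerate(context["rows"]) if "label" not in row]
    if unlabeled:
        raise DataValidationError(f"rows {unlabeled} have no 'label'")


def train(context):
    # Placeholder for the expensive part we want to avoid on bad data.
    context["model"] = f"model trained on {len(context['rows'])} rows"


steps = [("ingest", ingest), ("validate", validate), ("train", train)]

good_context = {"source": [{"x": 1, "label": 0}, {"x": 2, "label": 1}]}
print(run_pipeline(steps, good_context))  # all three steps run

bad_context = {"source": [{"x": 1, "label": 0}, {"x": 2}]}
print(run_pipeline(steps, bad_context))  # stops after ingest; train never runs
```

The design choice here is simply that a failed check raises and halts the run, so the model is never trained on data that did not pass validation; in the setup described in the interview, the orchestrator (Airflow locally, Kubeflow Pipelines on Cloud) would take the place of run_pipeline.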

TensorFlow Extended (TFX) and Metadata (TensorFlow Meets)

Posted by 林宜悉 on January 14, 2021