TensorFlow Extended (TFX) (TensorFlow Dev Summit 2018)

  • ♪ (music) ♪

  • Hello, everyone.

  • First, thanks everyone for coming to attend the Dev Summit.

  • And second, thanks for staying around this long.

  • I know it's been a very long day.

  • And there has been a lot of information that we've been throwing at you.

  • But we've got much, much more and many more announcements to come.

  • So please stick with me.

  • My name is Clemens, and this is Raz.

  • We're going to talk about TensorFlow Extended today.

  • But before we do this, I'm going to do a quick survey.

  • Can I get a quick show of hands?

  • How many of you do machine learning in a research or academic setting?

  • Okay.

  • Quite a big number.

  • Now how many of you do machine learning in a production setting?

  • Okay.

  • That looks about half-half.

  • Obviously, also a lot of overlap.

  • So for those of you who do machine learning in a production setting,

  • how many of you agree with this statement?

  • Yeah? Some? Okay.

  • I see a lot of hands coming up.

  • So everyone that I speak with who's doing machine learning

  • in production agrees with this statement:

  • "Doing machine learning in production is hard," and it's too hard.

  • Because after all, we actually want to democratize machine learning

  • and allow more and more people to deploy machine learning

  • in their products.

  • One of the main reasons why it's still hard is that, in addition

  • to the actual machine learning--

  • this small orange box where you actually use TensorFlow,

  • and maybe Keras to put together your layers

  • and train your model--

  • you need to worry about so much more.

  • There's all of these other things that you have to worry about

  • to actually deploy machine learning in a production setting

  • and serve it within your product.

  • Now the good news is that this is exactly

  • what TensorFlow Extended is about.

  • TFX in [inaudible] Google is an [inaudible] machine learning

  • platform that allows our developers to go all the way from data to production

  • and serving machine learning models

  • as fast as possible.

  • Now before we introduced TFX,

  • we saw that going through this process--

  • writing some of these components (some of them didn't exist before),

  • gluing them together, and actually getting to

  • a launch--took anywhere between six and nine months,

  • sometimes even a year.

  • Once we deployed TFX and allowed developers to use it,

  • in many cases, people can use this platform and get up and running

  • with it in a day and actually get to a deployable model in production

  • on the order of weeks, or in just a month.

  • Now, TFX is a very large system and platform that consists

  • of a lot of components and a lot of services

  • so unfortunately I can't talk about all of this in the next 25 minutes.

  • So we're only going to be able to cover a small part of it but we're talking

  • about the things that we've already open sourced and made available to you.

  • First, we're going to talk about TensorFlow Transform

  • and show you how to apply transformations on your data

  • consistently between training and serving.

  • Next, Raz is going to introduce you to a new product that we're open sourcing

  • called TensorFlow Model Analysis.

  • We're going to give a demo of how all of this works together end to end

  • and then make a broader announcement of our plans for TensorFlow Extended

  • and for sharing it with the community.

  • Let's jump into TensorFlow Transform first.

  • So, a typical ML pipeline that you may see in the wild

  • is during training,

  • you usually have a distributed data pipeline that applies transformations

  • to your data.

  • Because usually you train on a large amount of data,

  • this needs to be distributed,

  • and you run this pipeline

  • and sometimes materialize the output before you actually

  • put it into your trainer.

  • Now at serving time,

  • we need to find a way to somehow replay those exact transformations online.

  • As a new request comes in, it needs to be sent to your model.

  • There's a couple of challenges with this.

  • The first one is, usually those two things are very different code paths.

  • The data distribution systems that you would use for batch processing

  • are very different from the libraries and tools that you would use

  • to transform data in real time to make a request to your model.

  • So now we have two different code paths.

  • Second, in many cases, it's very hard to keep those two in sync.

  • I'm sure a lot of you have seen this.

  • You change your batch processing pipeline and introduce a new feature or change

  • how it behaves and you somehow need to make sure that the code

  • that you actually use in your production system is changed

  • at the same time and is kept in sync.

  • The third problem is, sometimes you actually want to deploy

  • your TensorFlow machine learning model in many different environments.

  • You want to deploy it on a mobile device; you want to deploy it on a server;

  • maybe you want to put it in a car; now suddenly you have

  • three different environments where you have to apply

  • these transformations, and maybe there's different languages

  • that you use for those, and it's also very hard

  • to keep those in sync.

  • And this introduces something that we call training-serving skew,

  • where the transformations that you do at training time may be different

  • from the ones in serving time, which usually leads to bad quality

  • of your serving model.

  • TensorFlow Transform addresses this by helping you write

  • your data processing job at training time,

  • so it actually helps you create those data pipelines to do those

  • transformations, and at the same time,

  • it emits a TensorFlow graph that can be

  • in-lined with your training model and also your serving model.

  • Now what this does is, it actually hermetically seals the model,

  • and your model takes a raw data request as input,

  • and all of the transformations are actually happening

  • within the TensorFlow graph.

  • This has a lot of advantages; one of them is that you no longer

  • have any code in your serving environment that does these

  • transformations because they're all being done in the TensorFlow graph.

  • Another one is wherever you deploy this TensorFlow model,

  • all of those transformations are applied in a consistent way.

  • No matter where this graph is being evaluated.

  • Let's see what that looks like.

  • This is a code snippet of a pre-processing function

  • that you would write with TF Transform.

  • I'm just going to walk you through what happens here

  • and what we need to do for this.

  • First thing we do is normalize this feature.

  • As all of you know, in order to normalize a feature,

  • we need to compute the mean and the standard deviation,

  • and to actually apply this transformation, we need to subtract the mean

  • and divide by the standard deviation.

  • So what has to happen is, for the input feature X,

  • we have to compute these statistics, which is a trivial task

  • if the data fits on a single machine; you can do it easily.

  • It's a non-trivial task if you have a gigantic training data set

  • and actually have to compute these metrics

  • efficiently.

  • Once we have these metrics we can actually apply this transformation

  • to the feature.

  • This is to show you that the output of this transformation can then be,

  • again, multiplied with another tensor--

  • which is just a regular TensorFlow transformation.

  • And then in order to bucketize a feature, you also again need to compute

  • the bucket boundaries to actually apply this transformation.

  • And again, this is a distributed data job that computes those metrics

  • over the result of an already transformed feature--

  • this chaining is another benefit--and then actually applies this transformation.

  • The next examples just show you that in the same function you can apply

  • any other tensor-in, tensor-out function, and there are also some

  • of what we call mappers in TF Transform that don't require this analyze phase.

  • So, N-grams doesn't require us to actually run a data pipeline

  • to compute anything.
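
To make the walkthrough above concrete, here is a minimal sketch of such a preprocessing function, assuming the tf.Transform Python API of this era (tft.scale_to_z_score, tft.bucketize, tft.ngrams); the feature names are hypothetical.

```python
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # 'x', 'y', and 's' are hypothetical raw feature names.
    x, y, s = inputs['x'], inputs['y'], inputs['s']

    # Analyze phase: mean/stddev are computed over the full dataset by a
    # distributed data job; the transform itself is plain TensorFlow ops.
    x_normalized = tft.scale_to_z_score(x)

    # The output of a transform is a regular tensor, so it can feed
    # further TensorFlow transformations.
    x_times_y = x_normalized * y

    # Bucketizing also needs an analyze phase to find bucket boundaries,
    # here over an already transformed feature (chaining).
    x_bucketized = tft.bucketize(x_normalized, num_buckets=10)

    # Mappers like ngrams require no analyze phase at all.
    s_ngrams = tft.ngrams(tf.string_split(s), ngram_range=(1, 2), separator=' ')

    return {
        'x_normalized': x_normalized,
        'x_times_y': x_times_y,
        'x_bucketized': x_bucketized,
        's_ngrams': s_ngrams,
    }
```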

  • Now what happens here is that these orange boxes

  • are what we call analyzers.

  • We realize those as actual data pipelines that compute those metrics over your data.

  • They're implemented using Apache Beam.

  • And we're going to talk about this more later.

  • But what this allows us to do is actually run this distributed data pipeline

  • in different environments.

  • There's different runners for Apache Beam.

  • And all of the transforms are just simple instance to instance transformations

  • using pure TensorFlow code.

  • What happens when you run TensorFlow Transform

  • is that we actually run these analyze phases,

  • compute the results of those analyze phases,

  • and then inject the result as a constant in the TensorFlow graph--

  • so this is on the right-- and in this graph,

  • it's a hermetic TensorFlow graph that applies all the transformations,

  • and it can be in-lined in your serving graph.

  • So now your serving graph has the transform graph

  • as part of it and can play through all of these transforms

  • wherever you want to deploy this TensorFlow model.
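
A sketch of how this is typically wired up with the Beam-based API, assuming raw_data (a PCollection of feature dicts) and raw_metadata (its schema) are already defined; exact import paths for helpers like WriteTransformFn vary across tf.Transform releases.

```python
import apache_beam as beam
import tensorflow_transform.beam as tft_beam

with beam.Pipeline() as pipeline:
    with tft_beam.Context(temp_dir='/tmp/tft_tmp'):  # scratch space for analyzers
        # Runs the analyze phases as a distributed Beam job, then injects
        # their results as constants into a hermetic TensorFlow graph.
        (transformed_data, transformed_metadata), transform_fn = (
            (raw_data, raw_metadata)
            | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

        # Persist the transform graph so it can be in-lined into the
        # training and serving graphs later.
        _ = transform_fn | tft_beam.WriteTransformFn('/tmp/transform_output')
```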

  • What can be done with TensorFlow Transform?

  • At training time for the batch processing, really anything that you can do

  • with a distributed data pipeline.

  • So there's a lot of flexibility here with types of statistics you can compute.

  • We provide a lot of utility functions for you,

  • but you can also write custom data pipelines.

  • And at serving time because we generate a TensorFlow graph that applies

  • these transformations-- we're limited to what you can do

  • with a TensorFlow graph, but for all of you who know TensorFlow,

  • there's a lot of flexibility in there as well.

  • Anything that you can do in a TensorFlow graph,

  • you can do with your transformations.

  • Some of the common use cases that we've seen, the ones on the left

  • I just spoke about: you can scale a continuous value to its z-score,

  • which is normalization, or to a value between 0 and 1.

  • You can bucketize a continuous value.

  • If you have text features, you can apply Bag of Words or N-grams,

  • or for feature crosses, you can actually cross

  • those strings and then generate vocabs of the result of those crosses.

  • As mentioned before, TF Transform is extremely powerful

  • in actually being able to chain together these transforms, so you can apply

  • a transform on the result of a transform, and so on.

  • Another particularly interesting transform is actually applying

  • another TensorFlow model.

  • You've heard about the saved model before?

  • If you have a saved model that you can apply as a transformation,

  • you can use it in TF Transform.

  • Let's say you have an image and you want to apply

  • an Inception model as a transform and then use the output of that

  • inception model maybe to combine it with some other feature

  • or use it as an input feature to your model.

  • You can use any other TensorFlow model

  • that ends up being in-lined in your transform graph

  • and also in-lined in your serving graph.
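
A hedged sketch of that idea; some tf.Transform releases shipped a helper along the lines of apply_saved_model in tensorflow_transform.pretrained_models, and the exact import path, signature, and paths below are assumptions.

```python
from tensorflow_transform import pretrained_models

def preprocessing_fn(inputs):
    # Hypothetical: run a pre-trained Inception saved model over a raw
    # image feature and use its output as just another input feature.
    embedding = pretrained_models.apply_saved_model(
        model_dir='/models/inception',        # placeholder path
        inputs={'images': inputs['image']},   # hypothetical feature name
        tags=['serve'])
    return {'image_embedding': embedding}
```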

  • All of this is available today and you can go check it out

  • on github.com/tensorflow/transform.

  • With this I'm going to hand it over to Raz who's going to talk

  • about TensorFlow Model Analysis.

  • Alright, thanks Clemens.

  • Hi, everyone.

  • I'm really excited to talk about

  • TensorFlow Model Analysis today.

  • We're going to talk a little bit about metrics.

  • Let's see, next slide.

  • Alright, so we can already get metrics today right?

  • We use TensorBoard. TensorBoard's awesome.

  • You saw an earlier presentation today about TensorBoard.

  • It's a great tool-- while you're training,

  • you can watch your metrics, right?

  • If your training isn't going well, you can save yourself

  • a couple of hours of your life, right?

  • Terminate the training, fix some things...

  • Let's say you have your trained model already.

  • Are we done with metrics? Is that it?

  • Is there any more to be said about metrics after we're done training?

  • Well, of course, there is.

  • We want to know how well our trained model actually does

  • for our target population.

  • I would argue that we want to do this in a distributed fashion

  • over the entire data set.

  • Why wouldn't we just sample?

  • Why wouldn't we just save more hours of our lives, right?

  • And just sample, make things fast and easy.

  • Let's say you start with a large data set.

  • Now you're going to slice that data set.

  • You're going to say, "I'm going to look at people at noon time."

  • Right? That's a feature.

  • From Chicago, my hometown.

  • Running on this particular device.

  • Each of these slices reduces the size

  • of your evaluation dataset by a factor.

  • This is an exponential decline.

  • By the time you're looking at the experience for a particular...

  • ...set of users, you're not left with very much data.

  • And the error bars on your performance measures, they're huge.

  • I mean, how do you know that the noise doesn't exceed your signal

  • by that point, right?

  • So really you want to start with your larger dataset

  • before you start slicing.

  • Let's talk about a particular metric.

  • I'm not sure--

  • Who's heard of the ROC Curve?

  • It's kind of an unknown thing in machine learning these days.

  • Okay.

  • We have our ROC Curve, and I'm going to talk about a concept

  • that you may or may not be familiar with

  • which is ML Fairness.

  • So what is fairness?

  • Fairness is a complicated topic.

  • Fairness is basically how well does our machine learning model do

  • for different segments of our population, okay?

  • You don't just have one ROC Curve,

  • you have an ROC Curve for every segment.

  • You have an ROC Curve for every group of users.

  • Who here would run their business

  • based on their top line metrics?

  • No one! Right? That's crazy.

  • You have to slice your metrics; you have to go in and dive in

  • and find out how things are going.

  • So that lucky user, the black curve on the top? Great experience.

  • That unlucky user, the blue curve?

  • Not such a great experience.

  • When can our models be unfair to various users?

  • One instance is if you simply don't have a lot of data

  • from which to draw your inferences.

  • Right?

  • We use stochastic optimizers,

  • and if we re-train the model,

  • it does something slightly different every time.

  • You're going to get a high variance for some users just because

  • you don't have a lot of data there.

  • We may be incorporating data from multiple data sources.

  • Some data sources are more biased than others.

  • So some users just get the short end of the deal, right?

  • Whereas other users get the ideal experience.

  • Our labels could be wrong. Right?

  • All of these things can happen.

  • Here's TensorFlow Model Analysis.

  • You're looking here at the UI hosted within a Jupyter Notebook.

  • On the X-axis, we have our loss.

  • You can see there's some natural variance in the metrics.

  • We're not always going to get exactly the same precision

  • and recall for every segment of the population.

  • But sometimes you'll see... what about those guys

  • at the top there experiencing the highest amount of loss?

  • Do they have something in common?

  • We want to know this.

  • Sometimes our users that...

  • ...get the poorest experience,

  • they're sometimes our most vocal users, right?

  • We all know this.

  • I'd like to invite you to come visit ml-fairness.com.

  • There's a deep literature about

  • the mathematical side of ML Fairness.

  • Once you've figured out how to measure fairness,

  • there's a deep literature about what to do about it.

  • How does TensorFlow Model Analysis actually give you these sliced metrics?

  • How do you go about getting these metrics?

  • Today you export a saved model for serving.

  • It's kind of a familiar thing.

  • TensorFlow Model Analysis is simple.

  • And as it's simple, it's similar:

  • you export a saved model for evaluation.

  • Why are these models different? Why export two?

  • Well the eval graph that we serialize as a saved model

  • has some additional annotations

  • that allow our evaluation batch job

  • to find the features, to find the prediction, to find the label.

  • We don't want those things mixed in with our serving graph,

  • so you export a second one.
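
A sketch of the two exports, assuming the TF 1.x Estimator API and the TFMA export helper of this era (tfma.export.export_eval_savedmodel); the input receiver functions are assumed to be defined elsewhere.

```python
import tensorflow_model_analysis as tfma

# The familiar export: a saved model for serving.
estimator.export_savedmodel(serving_dir, serving_input_receiver_fn)

# The second export: an eval saved model whose extra annotations tell
# the evaluation batch job where the features, prediction, and label live.
tfma.export.export_eval_savedmodel(
    estimator=estimator,
    export_dir_base=eval_dir,
    eval_input_receiver_fn=eval_input_receiver_fn)
```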

  • So this is the GitHub.

  • We just opened it, I think, last night at 4:30 pm.

  • Check it out.

  • We've been using it internally for quite some time now.

  • Now it's available externally as well.

  • The GitHub has an example

  • that kind of puts it all together

  • so that you can try all these components that we're talking about

  • from your local machine.

  • You don't have to get an account anywhere.

  • You just git clone it and run the scripts

  • and run the codelab.

  • This is the Chicago Taxi Example.

  • So we're using publicly available data

  • to determine which riders will tip their driver

  • and which riders, shall we say,

  • don't have enough money to tip today.

  • What does fairness mean in this context?

  • So our model is going to make some predictions.

  • We may want to slice these predictions by time of day.

  • During rush hour we're going to have a lot of data so hopefully

  • our model's going to be fair if that data is not biased.

  • At the very least it's not going to have a lot of variance.

  • But how's it going to do at 4 a.m.?

  • Maybe not so well.

  • How's it going to do when the bars close?

  • An interesting question.

  • I don't know yet, but I challenge you to find out.

  • So this is what you can run using your local scripts.

  • We start with our raw data.

  • We run TF Transform; TF Transform emits

  • a transform function and our transformed examples.

  • We train our model.

  • Our model, again, emits two saved models as we talked about.

  • One for serving and one for eval.

  • And we try this all locally, just run scripts and play with the stuff.

  • Clemens talked a little bit about transform.

  • Here we see that we want to take our dense features,

  • and we want to scale them to their z-scores.

  • And we don't want to do that batch by batch

  • because the mean for each batch is going to differ,

  • and there's going to be fluctuations.

  • We may want to do that across the entire data set.

  • We may want to normalize these things across the entire data set.

  • We build a vocabulary; we bucketize for the wide part of our model,

  • and we emit our transform function, and into the trainer we go.

  • You heard earlier today about TF Estimators,

  • and here is a wide and deep estimator

  • that takes our transformed features

  • and emits two saved models.
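
A minimal sketch of such an estimator, assuming wide_columns (bucketized, vocab, and crossed features) and deep_columns (scaled numeric features) were built from the transformed features:

```python
import tensorflow as tf

# Wide-and-deep model over the transformed features; the column lists
# and input_fn are placeholders for what TF Transform produced.
estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,  # e.g. bucketized, vocab, crosses
    dnn_feature_columns=deep_columns,     # e.g. z-scored dense features
    dnn_hidden_units=[100, 70, 50, 25])

estimator.train(input_fn=train_input_fn)
```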

  • Now we're in TensorFlow Model Analysis,

  • which reads in the saved model

  • and runs it against all of the raw data.

  • We call render_slicing_metrics from the Jupyter Notebook,

  • and you see the UI.

  • The thing to notice here is that this UI is immersive, right?

  • It's not just a static picture that you can look at and go,

  • "Huh" and then walk away from.

  • It lets you see your errors broken down

  • by bucket or broken down by feature,

  • and it lets you drill in and ask questions

  • and be curious about how your models are actually treating various subsets

  • of your population.

  • Those subsets may be the lucrative subsets

  • you really want to drill in.
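
A hedged sketch of that evaluation step, based on the early TFMA API used in the Chicago Taxi example; the paths and the trip_start_hour slice column are assumptions.

```python
import tensorflow_model_analysis as tfma

# Run the eval saved model against the raw (untransformed) dataset,
# slicing the metrics by a feature of interest.
result = tfma.run_model_analysis(
    model_location=eval_model_dir,   # the eval export from earlier
    data_location=raw_data_path,
    slice_spec=[tfma.SingleSliceSpec(columns=['trip_start_hour'])])

# Inside a Jupyter Notebook, this renders the interactive slicing UI.
tfma.view.render_slicing_metrics(result, slicing_column='trip_start_hour')
```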

  • And then you want to serve your models so our demo--

  • our example has a one-liner here

  • that you can run to serve your model.

  • Make a client request--

  • the thing to notice here is that we're making

  • a gRPC request to that server.

  • We're taking our feature tensors, we're serializing them

  • into the gRPC request, sending them to the server,

  • and back comes a probability.
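
A sketch of such a client call, assuming the TensorFlow Serving gRPC API; host, port, model name, and the serialized tf.Example are placeholders.

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:9000')  # placeholder address
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'chicago_taxi'  # placeholder model name
# Serialize the feature tensors into the request proto.
request.inputs['examples'].CopyFrom(
    tf.make_tensor_proto([serialized_example], shape=[1]))

response = stub.Predict(request, 10.0)  # and back comes a probability
```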

  • But that's not quite enough, right?

  • We've heard a little bit of feedback about this server.

  • The thing that we've heard is that gRPC is cool,

  • but REST is really cool.

  • I tried.

  • This is actually one of the top feature requests

  • on GitHub for model serving.

  • You can now pack your tensors into a JSON object,

  • send that JSON object to the server

  • and get a response back to [inaudible].

  • Much more convenient, and I'm very excited to say

  • that it'll be released very soon.

  • Very soon.

  • I see the excitement out there.
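
A sketch of what such a REST call could look like, assuming the JSON endpoint shape that TensorFlow Serving later shipped (/v1/models/NAME:predict with an "instances" list); the feature payload is hypothetical.

```python
import json
import requests

payload = {'instances': [{'trip_start_hour': 4, 'trip_miles': 2.5}]}
response = requests.post(
    'http://localhost:8501/v1/models/chicago_taxi:predict',  # placeholder
    data=json.dumps(payload))
print(response.json())  # e.g. {'predictions': [...]}
```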

  • Back to the end to end.

  • You can try all of these pieces end to end all on your local machine.

  • Because they're using Apache Beam direct runners, and direct runners

  • allow you to take your distributed jobs and run them all locally.

  • Now if you swap in Apache Beam's Dataflow runner,

  • you can now run against the entire data set in the cloud.

  • The example also shows you how to run the big job

  • against the cloud version as well.
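
A sketch of that swap, assuming a Google Cloud project; the pipeline code stays the same and only the Beam pipeline options change.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',             # instead of the default DirectRunner
    project='my-gcp-project',            # placeholder project
    temp_location='gs://my-bucket/tmp',  # placeholder bucket
    region='us-central1')

with beam.Pipeline(options=options) as pipeline:
    ...  # the same transform/analysis steps as in the local run
```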

  • We're currently working with the community to develop

  • a runner for Apache Flink and a runner for Spark.

  • Stay tuned to the TensorFlow blog

  • and to our GitHub...

  • ...and you can find the example at tensorflow/model-analysis

  • and back to Clemens.

  • Thank you, Raz.

  • (applause)

  • Alright, so we've heard about Transform.

  • We've heard how to train models, how to use model analysis

  • and how to serve them.

  • But I hear you say you want more.

  • Right? Is that enough?

  • You want more? Alright.

  • You want more.

  • And I can think of why you want more.

  • Maybe you read the paper we published last year and presented

  • at KDD about TensorFlow Extended.

  • In this paper we laid out this broad vision of how

  • this platform works within Google and all of the features that it has

  • and all the impact that we have by using it.

  • Figure one, which holds all of these boxes, describes

  • what TensorFlow Extended actually is.

  • Although overly simplified, this is still much more

  • than we've discussed today.

  • Today, we spoke about these four components

  • of TensorFlow Extended.

  • Now it's important to highlight that this is not yet an end to end

  • machine learning platform.

  • This is just a very small piece of TFX.

  • These are the libraries that we've open-sourced

  • for you to use.

  • But we haven't yet released the entire platform.

  • We're working very hard on this because we've seen

  • the profound impact that it had internally--

  • how people could start using this platform

  • and apply machine learning in production using TFX.

  • And we've been working very hard to actually make

  • more of these components available to you.

  • So in the next phase, we're actually looking into our data components

  • and looking to make those available

  • so that you can analyze your data, visualize the distributions,

  • and detect anomalies because it's an important part

  • of any machine learning pipeline

  • to detect changes and shifts in your data and anomalies.

  • After this we're actually looking into some of the horizontal pieces

  • that helped tie all of these components together

  • because if they're only single libraries, you still have

  • to glue them together yourself.

  • You still have to use them individually.

  • They have well-defined interfaces, but you still have to combine them

  • by yourself.

  • Internally we have a shared configuration framework that allows you

  • to configure the entire pipeline and a nice integrated frontend

  • that allows you to monitor the status of these pipelines

  • and see progress and inspect the different artifacts

  • that have been produced by all of the components.

  • So this is something that we're also looking to release

  • later this year.

  • And I think you get the idea.

  • Eventually we want to make all of this available to the community

  • because internally, hundreds of teams use this

  • to improve our products.

  • We really believe that this will be as transformative

  • to the community as it is at Google.

  • And we're working very hard to release more of these technologies

  • into the entire platform to see what you can do

  • with them for your products and for your companies.

  • Keep watching the TensorFlow blog posts for a more detailed announcement

  • about TFX and our future plans.

  • And as mentioned, you can already use

  • some of these components today.

  • Transform is released.

  • Model Analysis was just released yesterday,

  • Serving is also released,

  • and the end-to-end example is available

  • under the shortlink and you can find it on the model analysis [inaudible].

  • So with this, thank you from both myself and Raz,

  • and I'm going to ask you to join me in welcoming

  • a special external guest, Patrick Brand, who's joining us from Coca-Cola,

  • who's going to talk about applied AI at Coca-Cola.

  • Thank you.

  • (applause)

  • ♪ (music) ♪
