What exactly is this TFX thing? (TensorFlow Extended)

Hi, I'm Robert Crowe
and today I'm going to be talking about TensorFlow Extended,
also known as TFX, and how it helps you put
your amazing machine learning models into production.
This is the first episode in our five-part series
on real world machine learning which will help you
get up to speed on using TFX to create your own
production machine learning pipelines.
In today's episode, we'll be asking the question,
what exactly is this TFX thing anyway?
Let's find out.
♪ (upbeat music) ♪
When we think about ML, we usually only think
about the great models that we can now create.
After all, that's what all the research papers
are focused on.
But when we want to take that amazing model
and make it available to the world, we need to think about
all the things that a production solution requires.
So that's why we have TFX, to build production pipelines
so that we can offer our amazing models to the world.
Google created TFX because we needed it.
And there was nothing already available that could meet our needs.
Google does a ton of ML.
And not just Google, but all of the Alphabet companies.
There's ML in almost everything we do.
In fact, TFX wasn't the first ML pipeline framework
that Google created.
It evolved out of earlier attempts and is now the default framework
for the majority of Google's ML production solutions.
And now, Google has open-sourced TFX
and is making it available to everyone.
And it's not just Google.
TFX has had a deep impact on our partners,
including Twitter, Airbnb and PayPal.
As ML developers putting a model into production,
what do we need to think about?
First, when we start planning to develop an ML application,
we have all the normal ML things to think about.
That includes getting labeled data if we're doing supervised learning,
and making sure that our data set covers the space
of possible inputs well.
We also want to minimize the dimensionality of our feature set
while maximizing the predictive information it contains.
And we need to think about fairness
and make sure that our application won't be unfairly biased.
We also need to consider rare conditions,
especially in applications like healthcare
where we might be making predictions
for conditions that only occur in rare, but important, situations.
And finally, we need to consider that this will be a living solution
that will evolve over time as new data flows in
and as conditions change, so we need to plan for life cycle management
of our data.
But in addition to all that, we need to remember that
we're putting a software application into production.
That means that we still have all the requirements that
any production software application has,
including scalability, consistency, modularity
and testability, as well as safety and security.
We're way beyond just training a model now.
By themselves, these are challenges
for any production software deployment.
And we can't forget about them just because we're doing ML.
How are we going to meet all these needs
and get our amazing new model into production?
We don't pretend to have all the answers.
This is an evolving field within the ML community
and we welcome contributions.
If you're interested in a more in-depth discussion
of the challenges of machine learning in production environments,
the paper shown here is a great place to start.
That's what TFX is all about.
TFX allows you to create production ML pipelines that include
many of the requirements for production software deployments
and best practices.
It starts with ingesting your data and flows through data validation,
feature engineering, training, evaluating and serving.
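The flow just described (ingest, validate, transform, train, evaluate, serve) can be sketched in plain Python to make the hand-offs concrete. This is a minimal, hypothetical illustration, not TFX code: every function name, the one-feature schema, the made-up data and the quality threshold are assumptions for the example. It shows validation filtering bad records before training, and serving reusing the exact transform that was fitted on the training data.

```python
from statistics import mean, pstdev

# Hypothetical sketch of the pipeline phases: plain functions, NOT TFX components.
# Assumed schema: one numeric feature "x" and one numeric label "y".
REQUIRED_FEATURES = ("x", "y")


def ingest(raw_lines):
    """Ingestion: split raw 'x,y' text lines into (still unparsed) records."""
    return [dict(zip(REQUIRED_FEATURES, line.strip().split(","))) for line in raw_lines]


def validate(records):
    """Validation: keep records that match the schema, report the rest as anomalies."""
    valid, anomalies = [], []
    for rec in records:
        try:
            valid.append({k: float(rec[k]) for k in REQUIRED_FEATURES})
        except (KeyError, ValueError):  # missing feature or non-numeric value
            anomalies.append(rec)
    return valid, anomalies


def fit_transform(train_rows):
    """Feature engineering: z-score "x" using statistics from the training rows only."""
    mu = mean(r["x"] for r in train_rows)
    sigma = pstdev(r["x"] for r in train_rows) or 1.0  # guard against zero spread
    return lambda x: (x - mu) / sigma


def train(rows, transform):
    """Training: least-squares fit of y = w * transform(x) + b."""
    xs = [transform(r["x"]) for r in rows]
    ys = [r["y"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    if var == 0:
        raise ValueError("feature 'x' has no variance in the training rows")
    w = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / var
    return w, y_bar - w * x_bar


def evaluate(rows, transform, model):
    """Evaluation: mean squared error on held-out rows."""
    w, b = model
    return mean((w * transform(r["x"]) + b - r["y"]) ** 2 for r in rows)


def serve(transform, model):
    """Serving: apply the same transform used in training, then the model."""
    w, b = model
    return lambda x: w * transform(x) + b


raw = ["1,3", "2,5", "3,7", "4,9", "bad,row", "5"]  # made-up data on y = 2x + 1
rows, anomalies = validate(ingest(raw))  # "bad,row" and "5" become anomalies
train_rows, eval_rows = rows[:3], rows[3:]
transform = fit_transform(train_rows)
model = train(train_rows, transform)
mse = evaluate(eval_rows, transform, model)
if mse < 1e-6:  # assumed quality gate before the model is "pushed" to serving
    predict = serve(transform, model)
    print(predict(10.0))  # close to 21.0, since the data follows y = 2x + 1
```

Passing the fitted transform into both training and serving, instead of recomputing it at serving time, is the design choice that keeps feature engineering consistent between the two, which is one of the production requirements mentioned earlier.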
In addition to TensorFlow itself, we've created libraries
for each of the major phases of an ML pipeline:
TensorFlow Data Validation, TensorFlow Transform
and TensorFlow Model Analysis.
TFX implements a series of pipeline components
that leverage these libraries
(shown in orange in this diagram), and it allows you to create
your own components too.
To tie all this together, we created some horizontal layers
for things like pipeline storage, configuration and orchestration.
These layers are really important
for managing and optimizing your pipelines
and the applications that you run on them.
We'll be discussing those more in later episodes.
For now, that should give you an idea of what we're talking about
when we think about implementing a production ML pipeline with TFX.
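The idea of components tied together by shared storage, configuration and orchestration layers can be illustrated with a toy orchestrator. This is a hypothetical sketch, not TFX's API: `Component`, `run_pipeline` and the three steps are invented for the example. Each component declares the artifacts it consumes and produces, the orchestrator derives the execution order from those declarations, a shared `config` dict plays the configuration role, an artifact dict plays the storage role, and a lineage list stands in for pipeline metadata.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Tuple


@dataclass
class Component:
    """Hypothetical pipeline step (not a TFX class): consumes and produces named artifacts."""
    name: str
    fn: Callable[..., Dict[str, object]]
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def run_pipeline(components, config):
    """Toy orchestrator: run components in dependency order, keep every artifact
    ("storage") and record what each step consumed and produced ("metadata")."""
    producers = {out: c.name for c in components for out in c.outputs}
    graph = {c.name: {producers[i] for i in c.inputs} for c in components}
    by_name = {c.name: c for c in components}
    artifacts: Dict[str, object] = {}
    lineage: List[Tuple[str, Tuple[str, ...], Tuple[str, ...]]] = []
    for name in TopologicalSorter(graph).static_order():
        comp = by_name[name]
        produced = comp.fn(config, **{i: artifacts[i] for i in comp.inputs})
        missing = set(comp.outputs) - set(produced)
        if missing:
            raise RuntimeError(f"{name} did not produce {sorted(missing)}")
        artifacts.update(produced)
        lineage.append((name, tuple(comp.inputs), tuple(comp.outputs)))
    return artifacts, lineage


# Three made-up steps; the last one stands in for a user-written custom component.
def load_examples(config):
    return {"examples": [1.0, 2.0, 3.0, config["bad_value"]]}


def drop_negative(config, examples):
    return {"clean_examples": [x for x in examples if x >= 0]}


def mean_feature(config, clean_examples):
    return {"mean": sum(clean_examples) / len(clean_examples)}


steps = [  # deliberately listed out of order; the orchestrator sorts them
    Component("MeanFeature", mean_feature, ["clean_examples"], ["mean"]),
    Component("Ingest", load_examples, outputs=["examples"]),
    Component("Validate", drop_negative, ["examples"], ["clean_examples"]),
]
artifacts, lineage = run_pipeline(steps, config={"bad_value": -1.0})
print(artifacts["mean"])                 # 2.0
print([step for step, _, _ in lineage])  # ['Ingest', 'Validate', 'MeanFeature']
```

Because execution order comes from declared inputs and outputs rather than list position, a new component can be slotted in without editing the others, which is one way to make room for your own components alongside built-in ones.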
In our next episode, we'll discuss how TFX pipelines actually work.
For more information on TFX, visit us at tensorflow.org/tfx
and don't forget to comment and like us below
and thanks for watching.
♪ (music) ♪