[MUSIC PLAYING]
MARTIN WICKE: I'm Martin Wicke.
I'm the engineering lead for TensorFlow 2,
and I am going to, unsurprisingly, talk
about TensorFlow 2.
TensorFlow has been extremely successful.
And inside of Google, outside of Google,
it has grown into a vibrant community
that we really were not able to imagine when we started it.
And our users are building these ingenious, clever, elegant
things every day, from art and music,
to what you just heard about medieval manuscripts, to science
and medicine.
And the things people created were unexpected and beautiful.
And we learned a lot from that.
And we learned a lot from how people did it.
And TensorFlow has enabled all this creativity
and really jumpstarted this whole AI democratization,
which is great.
But it has been a little bit hard to use sometimes,
and we know that.
At times, it's been a little painful.
So using sessions wasn't the most natural thing
to do when coming from regular Python.
And the TensorFlow API has grown over time
and got a little bit cluttered, a little bit confusing.
In the end, you can do everything with TensorFlow,
but it wasn't always clear what the best
way to do it with TensorFlow was.
And so we've learned a lot since we started this,
and we realized that you need rapid prototyping,
you need easier debugging, there's a lot of clutter.
So we're fixing this.
We're addressing these issues with a new version
of TensorFlow--
today, releasing an alpha of TensorFlow 2.
Many of you have participated in the design reviews
that went into this for all the features
that we have implemented.
Those of you who are really living
on the bleeding edge of development
have probably played with a nightly.
You have installed it.
You have tested it.
You have found issues.
You have fixed issues.
So thank you very much for your help.
You can now install the alpha release of TensorFlow 2
using just pip install --pre tensorflow.
And you can go and play with it.
So what has changed?
When we created TensorFlow 2, really the biggest focus
was on usability.
We've adopted Keras as the high level API for TensorFlow.
We've integrated it very deeply with TensorFlow.
Keras gives you a clear path of how to develop models,
how to deploy models.
It has a [INAUDIBLE] API that makes people productive,
and we know that it gets you started quickly.
TensorFlow also includes a complete implementation
of Keras, of course.
But we went further.
We have extended Keras so that you
can use all of the advanced features in TensorFlow
directly from tf.keras.
The other big change in TensorFlow 2
is that it executes eagerly by default. Traditional 1.x
TensorFlow used a declarative style,
and it was often a little bit dissonant with the surrounding
Python.
And in TensorFlow 2, TensorFlow behaves just
like the surrounding code.
So if you add up two numbers, you
get a result back immediately, which is amazing, really.
[LAUGHTER]
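In code, that's just this (a trivial sketch):

    import tensorflow as tf  # TensorFlow 2.x

    a = tf.constant(2)
    b = tf.constant(3)
    # No session, no graph-building step: the result comes back immediately.
    print(a + b)  # tf.Tensor(5, shape=(), dtype=int32)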
But you still get all the benefits
of working with a graph--
robust program serialization, easy distribution
of programs, distributed computation,
optimizations on the graph.
All of that stuff is not going away, it's just getting easier.
So TensorFlow 2 is a major release,
which means that we have the ability to clean it up.
And we did-- like, a lot.
So we had a lot of duplicate functionality
that we have removed.
We've consolidated the API.
We've organized the API.
We've made sure that the API looks and feels consistent.
And this is not only about TensorFlow itself,
this is about the whole ecosystem of tools that
has grown around TensorFlow.
We've spent a lot of effort to try and align
all of these interfaces.
And we've defined exchange formats
that'll allow you to move all throughout this ecosystem
without hitting any barriers.
We have removed a lot of stuff from TensorFlow,
but that doesn't mean it's less capable--
it has gotten more flexible, not less.
It retains and expands on the flexibility
we had that enabled all of this development.
And we've created a more complete low level
API that now exposes all of the internally used ops
to the users via the tf.raw_ops module.
We provide inheritable interfaces
for all of the crucial concepts in TensorFlow,
like variables and checkpoints.
And this allows framework authors
to build on top of TensorFlow while maintaining
interoperability with the rest of the ecosystem.
And you'll hear a lot of these examples later.
There'll be talk about TF agents later today.
There'll be a talk about Sonnet by DeepMind tomorrow.
So you'll see examples of how this works in practice.
Now, the real question for you is, of course, what
do I have to do to be a part of this great new era
of TensorFlow?
And we know it's always hard to upgrade to anything.
And that's especially true for major version upgrades.
At Google, we will now start the process
of converting one of the largest codebases in the world
to TensorFlow 2.
And while we're doing that, we'll
be writing a lot of migration guides.
We have started that already, and some of them
are online and will provide a lot of best practices for you.
And if we can do it at Google, you can probably do it too.
So we're giving you and us a lot of tools
to make this transition easier.
And first of all, as part of TensorFlow 2,
we're shipping a compatibility module--
we call it tf.compat.v1--
which contains all of the 1.x API.
So if you rely on a specific deprecated function,
you can still use it, and you can still find it there,
except for tf.contrib, which is not
going to be included at all.
But we've created community-supported alternatives
that support your use case if you rely on something in there.
We're also publishing a script which will automatically
update your code so it runs on TensorFlow 2.
Let me show you how that works.
Let's say I want to convert this program, which
is a simple example that we have online of a language
model trained on Shakespeare.
So what I do is I simply run tf_upgrade_v2,
which is a utility that's included with any installation
of TensorFlow 2.
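The invocation is roughly this (the file and directory names here are just placeholders):

    # Upgrade a single file.
    tf_upgrade_v2 --infile shakespeare_lm.py --outfile shakespeare_lm_v2.py

    # Or upgrade a whole tree and write a report of every change it made.
    tf_upgrade_v2 --intree models/ --outtree models_v2/ --reportfile report.txt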
It tells me what it did.
And in this case, there's really just a handful of things
that it changed.
And you can see, if you look at what changed,
there's some functions that it renamed.
The reorganization of the API leads to renamed functions.
So multinomial was actually the wrong name for this function.
So it renamed it to random.categorical.
That's something the script will do for you,
so you don't have to worry too much about it.
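For example, assuming some logits already at hand, this particular rename looks like this:

    import tensorflow as tf

    logits = tf.math.log([[0.5, 0.5]])  # shape [batch, num_classes]

    # TensorFlow 1.x:  samples = tf.multinomial(logits, num_samples=1)
    # TensorFlow 2:    the same op under its corrected name.
    samples = tf.random.categorical(logits, num_samples=1)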
A lot of functions had arguments renamed, their order changed,
or some arguments added or deleted.
As far as possible, the script will make those changes
for you.
And then, if all else fails, sometimes there
isn't really a perfect equivalent in TensorFlow 2
to a symbol that existed in TensorFlow 1.
And then, we'll use the compatibility module
that I talked about earlier.
If there is no perfect replacement,
we'll use that in order to make sure
that your code still works as expected after the conversion.
So it's pretty conservative.
So for instance, the AdamOptimizer
has a very subtle behavior change that most probably
won't affect you.
But just in case it might, we will convert it
to the compat.v1 AdamOptimizer.
Once the conversion is complete and there were no errors,
then you can run the new program in TensorFlow 2,
and it'll work.
That's the idea.
So it should be pretty easy for you.
Hopefully, you won't run into trouble.
Note that this automatic conversion
will fix your code so that it works, but it won't convert it
to the new TensorFlow 2 style.
That cannot be done automatically.
And there's a lot to learn about this process,
so if you want to know more, check out the talk by Anna
and Tomer tomorrow at 10:30 about 2.0
and porting your models.
That'll be worthwhile.
Of course, as we go through this process at Google,
we'll also publish a lot more documentation.
But this talk is the best start.
All right.
So let me give you an idea about the timeline of all of this.
We have cut the 2.0 alpha today or yesterday--
it took some time to build it.
We're now working on implementing some missing
features that we already know about.
We're converting libraries.
We're converting Google.
There's lots of testing, lots of optimization
that's going to happen over the next couple of months.
We're targeting to have a release candidate in spring.
Spring is a sort of flexible concept--
[LAUGHTER]
But in spring.
After we have this release candidate,
it still has to go through release testing and integration
testing, so I expect that to take a little bit longer
than for our regular releases.
But that's when you can see an RC, and then,
eventually, a final.
So if you want to follow along with the process, please do so.
All of this is tracked online.
It's all on the GitHub TensorFlow 2 project tracker.
So I would just go and look at that
to stay up to date if you are looking
for a particular feature or just want to know what's going on.
If you file any bugs, those end up there as well so everybody
knows what's going on.
So TensorFlow 2 really wouldn't be possible without you.
That has been the case already.
But in the future, even more so, we'll need you to test this.
We'll need you to tell us what works and what doesn't.
So please install the alpha today.
We're extremely excited to see what you can create with us.
So please go, install the alpha, check out the docs.
They're at tensorflow.org/r2.0.
And with that, I think you should probably
hear about what it's like to be working with TensorFlow 2.
We'll have plenty of content over the next couple of days
to tell you exactly about all the different aspects of it.
But we'll start with Karmel, who will speak about high level
APIs in TensorFlow 2.0.
Thank you.
[APPLAUSE]
KARMEL ALLISON: Hi.
My name is Karmel Allison, and I'm an engineering manager
for TensorFlow.
Martin just told you about TensorFlow 2.0
and how to get there. And I'm going
to tell you a little bit about what we're bringing
in our high level API and what you can
expect to find when you arrive.
But first, what do I mean when I say high level APIs?
There are many common pieces and routines
in building machine learning models, just
like building houses.
And with our high level APIs, we bring you tools
to help you more easily and reproducibly build your models
and scale them.
As Martin mentioned, a couple of years
ago, TensorFlow adopted Keras as one of the high level
APIs we offered.
Keras is, at its heart, a specification
for model building.
It works with multiple machine learning frameworks,
and it provides a shared language
for defining layers, models, losses, optimizers, and so on.
We implemented a version of Keras
that has been optimized for TensorFlow
inside of core TensorFlow, and we
called it TF Keras, one of our high level APIs.
Raise your hand if you've used TF Keras already?
OK, a lot of you.
That's good, because this next slide will seem familiar.
In its simplest form, Keras is just that, simple.
It was built from the ground up to be
Pythonic and easy to learn.
And as such, it has been instrumental in inviting people
in to the machine learning world.
The code here represents an entire model definition
and training loop, meaning it's really easy for you
to design and modify your model architectures without needing
to write tons of boilerplate.
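The slide itself isn't reproduced here, but a minimal sketch of such a complete definition and training loop looks roughly like this (the layer sizes and dummy data are illustrative):

    import numpy as np
    import tensorflow as tf

    # The entire model definition...
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

    # ...and the entire training loop.
    x, y = np.random.rand(32, 10), np.random.rand(32, 1)
    model.fit(x, y, epochs=2)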
At the same time, by relying on inheritance and interfaces,
Keras is extremely flexible and customizable,
which is critical for machine learning applications.
Here we have a subclass model.
I can define arbitrary model architectures,
modify what happens at each training step,
and even change the whole training loop if I want to.
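A minimal sketch of such a subclassed model (the architecture is illustrative):

    import tensorflow as tf

    class MyModel(tf.keras.Model):
        def __init__(self):
            super(MyModel, self).__init__()
            self.dense1 = tf.keras.layers.Dense(64, activation='relu')
            self.dense2 = tf.keras.layers.Dense(1)

        def call(self, inputs):
            # Arbitrary Python logic can run here on each forward pass.
            return self.dense2(self.dense1(inputs))

    model = MyModel()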
Which is to say, Keras is simple and effective.
Anyone can figure out how to use it.
But we had a problem: TF Keras was built for smaller models.
And many machine learning problems,
including the ones we see internally at Google,
need to operate at a much larger scale.
And so we have estimators.
Estimators were built from the ground up to distribute
and scale, with fault tolerance, across hundreds of machines,
no questions asked.
This here is the much-loved wide and deep model,
a workhorse of the machine learning
world that took many years of research to define
but is available as a built-in for training and deployment.
Which is to say, estimators are powerful machines.
But you've told us that there's a steep learning curve,
and it's not always easy to figure out
which parts to connect where.
And we learned a lot in the past two years
building out estimators and TF Keras.
And we've reached the point that we
don't think you should have to choose
between a simple API and a scalable API.
We want a higher level API that takes
you all the way from [INAUDIBLE] to planet scale,
no questions asked.
So in TensorFlow 2.0, we are standardizing on the Keras API
for building layers and models.
But we are bringing all the power of estimators
into TF Keras, so that you can move from prototype
to distributed training to production serving in one go.
OK, so brass tacks--
how are we doing this?
Well, this is a TF Keras model definition in TensorFlow 1.13.
And this is the same model definition in 2.0.
[LAUGHTER]
Some of you may notice the code is exactly the same.
OK, so what actually is different?
Well, we've done a lot of work to integrate Keras
with all of the other features that TensorFlow 2.0 brings
to the table.
For example, in 1.13 this built a graph-based model that
ran a session under the hood.
In 2.0, the same model definition
will run in Eager mode without any modification.
This lets you take advantage of everything Eager can do for us.
Here we see that our data set pipeline behaves just
like a numpy array: it's easy to debug
and flows natively into our Keras model.
But at the same time, data sets have
been optimized for performance so that you
can iterate over and train with a data set
with minimal performance overhead.
We're able to achieve this performance
with data sets in Keras by taking advantage of graphs,
even in an Eager context.
Eager makes debugging and prototyping easy,
and all the while Keras builds an Eager-friendly function,
under the hood, that tracks your model,
and the TensorFlow runtime then takes
care of optimizing for performance and scaling.
That said, you can also explicitly run
your model step-by-step in Eager mode with the flag run_eagerly.
Even though for this example, it might be overkill,
you can see that run_eagerly allows
you to use Python control flow and Eager mode for debugging
while you're prototyping your model.
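In code, that's one extra argument to compile (the model and data here are illustrative):

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    # run_eagerly=True forces every training step to run eagerly,
    # so ordinary Python debugging tools work inside the model.
    model.compile(optimizer='adam', loss='mse', run_eagerly=True)
    model.fit(np.random.rand(8, 4), np.random.rand(8, 1), epochs=1)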
Another big change coming in 2.0 is
that we have consolidated many APIs across TensorFlow
under the Keras heading, reducing duplicative classes
and making it easier to know what you should use and when.
We now have one set of optimizers.
These optimizers work across TensorFlow
in and out of eager mode, on one machine or distributed.
You can retrieve and set hyperparameters
like normal Python attributes, and these optimizers
are fully serializable.
The weights and configuration can
be saved both in the TensorFlow checkpoint
format and the backend-agnostic Keras format.
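For instance (a minimal sketch):

    import tensorflow as tf

    opt = tf.keras.optimizers.Adam(learning_rate=0.001)
    opt.learning_rate = 0.01       # set like a normal Python attribute
    config = opt.get_config()      # the full configuration serializes
    print(config['learning_rate'])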
We have one set of metrics that encompasses all the former TF
metrics and Keras metrics and allows for easy subclassing
if you want even more.
Losses, similarly, have been consolidated into a single set
with a number of frequently used built-ins and an easily
customizable interface if you so choose.
And we have one set of layers, built
in the style of Keras layers,
which means that they are class-based
and fully configurable.
There are a great number of built-in layers, including
all of the ones that come with the Keras API specification.
One set of layers that we took particular care with were
the RNN layers in TensorFlow.
Raise your hand if you've used RNN layers in TensorFlow?
OK, these slides are for you.
[LAUGHTER]
In TensorFlow v1 we had several different versions
of LSTMs and GRUs, and you had to know ahead
of time what device you were optimizing for in order
to get peak performance with cuDNN kernels.
In 2.0, there is one version of the LSTM
and one version of the GRU layer.
And they select the right operation
for the available device at runtime
so that you don't have to.
This code here runs the cuDNN kernel
if you have GPUs available, but also
falls back to CPU ops if you don't have GPUs available.
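Again, the slide isn't reproduced here, but the code is essentially just this:

    import tensorflow as tf

    # One LSTM layer; the runtime picks the cuDNN kernel when a GPU
    # is available and falls back to the standard CPU ops otherwise.
    layer = tf.keras.layers.LSTM(64)
    outputs = layer(tf.zeros([1, 10, 8]))  # [batch, time, features]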
In addition to all of those, if you
need layers that are not included in the built-in set,
TF Keras provides an API that is easy to subclass and customize
so that you can innovate on top of the existing layers.
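A toy sketch of such a subclassed layer (the layer itself is made up for illustration):

    import tensorflow as tf

    class Scale(tf.keras.layers.Layer):
        """Toy custom layer: multiplies its input by a learned scalar."""

        def build(self, input_shape):
            self.alpha = self.add_weight(
                name='alpha', shape=(), initializer='ones', trainable=True)

        def call(self, inputs):
            return inputs * self.alpha

    print(Scale()(tf.constant([1.0, 2.0])))  # [1. 2.] before training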
This, in fact, is how the community repository
TensorFlow Addons operates.
They provide specialized and particularly complex layers--
metrics, optimizers, and so on--
to the TensorFlow community by building
on top of the Keras base classes.
All right.
So we streamlined the API--
that's a start.
The next thing we did was integrate Keras
across all of the tools and capabilities
that TensorFlow has.
One of the tools we found critical to the development
of estimators was easily configurable structured data
parsing with feature columns.
In TensorFlow 2.0 you can use feature columns
to parse your data and feed it directly
into downstream Keras layers.
These feature columns work both with Keras and estimators,
so you can mix and match to create reusable data input
pipelines.
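A minimal sketch (the 'age' column is a hypothetical feature):

    import tensorflow as tf

    age = tf.feature_column.numeric_column('age')

    # DenseFeatures turns parsed feature columns into a dense tensor
    # that flows into ordinary downstream Keras layers.
    model = tf.keras.Sequential([
        tf.keras.layers.DenseFeatures([age]),
        tf.keras.layers.Dense(1)
    ])
    output = model({'age': tf.constant([[23.0], [31.0]])})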
OK.
So you have data in your model, and you're ready to train.
One of the most loved tools we have for running your training
jobs is TensorBoard.
And I'm pleased to say in TensorFlow 2.0,
TensorBoard integration with Keras is as simple as one line.
Here we add a TensorBoard callback to our model
when training.
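That one line, in a runnable sketch (the model, data, and log directory are illustrative):

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer='adam', loss='mse')

    # The one line: a TensorBoard callback pointed at a log directory.
    tb = tf.keras.callbacks.TensorBoard(log_dir='./logs')
    model.fit(np.random.rand(8, 4), np.random.rand(8, 1),
              epochs=2, callbacks=[tb])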
And this gets us both our training progress--
here we see accuracy and loss--
and a conceptual graph representing the model we
built layer-by-layer.
As an added bonus, this same callback even
includes full profiling of your model,
so you can better understand your model's performance
and device placement, and you can more readily find
ways to minimize bottlenecks.
Speaking of performance, one of the exciting pieces
we're bringing to TensorFlow 2.0 is
the tf.distribute.Strategy API.
In TensorFlow 2.0 we will have a set
of built-in strategies for distributing your training
workflows that work natively with Keras.
These APIs have been designed to be easy to use,
to have great out-of-the-box performance,
and to be versatile enough to handle many different
distribution architectures and devices.
So how do they work?
Well, with Keras you can add and switch distribution strategies
with only a few lines of code.
Here we are distributing the same model you just
saw across multiple GPUs by defining the model network,
compiling, and fitting within the distribution strategy
scope.
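A minimal sketch of that pattern (the model is illustrative):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()

    # Define and compile within the strategy scope; fit as usual.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(1, input_shape=(10,))
        ])
        model.compile(optimizer='adam', loss='mse')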
Because we've integrated distribution strategies
throughout Keras in TensorFlow, you get a lot
with these few lines of code.
Data will be prefetched to GPUs batch-by-batch,
variables will be mirrored in sync
across all available devices using allreduce,
and we see greater than 90% scaling efficiency over multiple GPUs.
This means you can scale your model up without changing code
and without losing any of the conveniences of Keras.
Once you've trained your model, you're
likely going to want to package it for use with production
systems, mobile phones, or other programming languages.
Keras models can now export directly
to SavedModel, the serialization
format that works across the TensorFlow ecosystem.
This functionality is experimental
while we iron out a few API details, but today,
and going into 2.0, you can easily
export your models for use with TF Serving, TF Lite, and more.
You can also easily reload these models back
into Python to continue training and using
in your normal workflow.
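As of this alpha, the experimental entry points look roughly like this (the path is a placeholder, and the exact names may change before the final release):

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer='adam', loss='mse')

    # Experimental in the alpha; may change by the final 2.0 release.
    tf.keras.experimental.export_saved_model(model, '/tmp/my_saved_model')
    restored = tf.keras.experimental.load_from_saved_model('/tmp/my_saved_model')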
So that's where we are today in the alpha.
But we're obviously not done yet,
so let me tell you a little bit about what's coming up
in the next few months.
We talked about distributing over multi-GPU,
but that's just the start.
Using the same model code, we can swap out our strategy
and scale up to multiple nodes.
Here we take the same Keras model
we just saw and distribute it across multiple machines,
all working in sync using collective ops
to train your model faster than ever before.
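A minimal sketch, assuming the cluster itself is already configured (for example via TF_CONFIG, which is not shown):

    import tensorflow as tf

    # Experimental as of the alpha; available in the nightlies.
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(1, input_shape=(10,))
        ])
        model.compile(optimizer='adam', loss='mse')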
Multi-worker mirrored strategy currently supports ring
and NCCL for allreduce, which allows us to achieve great
out-of-the-box performance.
This API is still being developed,
but you can try it out today in the nightlies
if you are interested in a TensorFlow-native solution
to synchronous multi-worker training.
And we are very excited to say that using Keras
and distribution strategies you will also
be able to use the same code to distribute your model on TPUs.
You'll have to wait for the next TensorFlow release for this
to actually work on TPUs.
But when it's released, this strategy
will still work on Google Cloud TPUs
and with Colab, which can now access TPUs directly.
And as we approach the final 2.0 release,
we will continue bringing scalability into Keras.
We will be enabling distribution with parameter server
strategy, which is the multi-node asynchronous
training that estimator does today.
We will also be exposing canned estimators like wide and deep
directly from the Keras API for those of you
who want an even higher level API.
And we will be adding support for partitioned variables
across many machines for some of the largest scale models,
like those we run at Google.
And that's just a sneak peek.
So please do check out the 2.0 preview
and try out TF Keras if you haven't already.
If you have more questions, you'll
be hearing more throughout the day about Keras and high level
APIs and how they integrate across TensorFlow.
And we also have a workshop tomorrow that Martin mentioned.
[MUSIC PLAYING]