ALEX PASSOS: I'm Alex Passos, and I'm here again to talk about functions, not sessions. tf.function is the new way of using graphs in TensorFlow in TF2. All the material I'm going to cover here, the design and the motivation, is mostly described in one of the RFCs in the TensorFlow community GitHub repo. So if you go to github.com/tensorflow/community and look in the rfcs directory, you will see an RFC with exactly this title, where we go over a bunch of the motivation and a bunch of the high-level design. Here I'm mostly going to focus on some nitty-gritty details of the motivation and more details about the implementation, and on things that, if you're working on TensorFlow and you're using functions to do something or you're curious about function internals, I hope will at least point you to the right places to start reading the code to understand what's happening. I'm mostly going to focus today on the high-level Python side of things. There's another training session later, I think titled "eager execution runtime," that's going to focus more on the C++ side of stuff.

So I think to understand functions, it helps if you understand where we're coming from, which is the session.run world in TensorFlow 1. In TF1, when TensorFlow was originally designed, it was designed as a C++ runtime first, and only later came a Python API. And as far as a C++ runtime goes, the API of graphs and sessions is pretty reasonable. You build a graph with some function that the runtime does not care about, and then you connect to the runtime by opening a session. This connection is important because a runtime can be local or distributed, and there are all sorts of in-between things. And to actually run computation, you just call session.run. Because you have a graph, you give it the names of your inputs, the names of your outputs, the names of particular nodes that you want to run, and the runtime will go do its thing and return the results to you as normal C++ arrays that you can use to manipulate your data. This is very nice and convenient if you're writing in C++ and programming at this level. You generally write code that looks like this once, and you spend your entire life as a TensorFlow developer writing the little part that I abstracted out called BuildMyGraph. And I think it's an understatement to say that manually writing protocol buffers is very awkward. So we very, very quickly decided this was not a good way to go and built an API around it.

The first version of the API was very explicit. You created a graph, and then every time you created an op, you passed the graph as an argument. This is still fine, because it's very explicit that you're building a graph, so you can keep the mental model that you're building a graph that you're then going to give to a runtime to execute. But it's not really idiomatic Python code. It's also very easy to see how to make it idiomatic Python code: you just stick the graph in a global context manager and add a bunch of operator overloads and things like that. And you end up with code that looks like what TensorFlow code looks like today, which is a little unfortunate, because by reading the same code you can't really tell whether an object is a tensor, and hence only has a value during an execution of a session, is this deferred quantity that has a name and might have some known properties about its shape but not all of them, or whether it's just a normal Python object or NumPy array. And this creates a lot of confusion and, I think, leads to a very unnatural programming model.
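To make that ambiguity concrete, here is a minimal sketch of TF1-style graph-and-session code (written against the tf.compat.v1 API for illustration; the variable names are not from the talk):

    import numpy as np
    import tensorflow.compat.v1 as tf  # TF1-style API

    tf.disable_eager_execution()

    x = tf.placeholder(tf.float32, shape=[None, 3])  # symbolic tensor: no value until session.run
    w = np.ones([3, 2], dtype=np.float32)            # plain NumPy array with a value right now
    y = tf.matmul(x, w)                              # looks like ordinary code, but only adds a node to a graph

    with tf.Session() as sess:
        # only here does y get a concrete value, computed by the runtime
        print(sess.run(y, feed_dict={x: np.ones([4, 3], np.float32)}))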
The session.run thing also has a granularity problem, which is that in the way it was originally built, the stuff that you pass to session.run is a quantum of all the stuff you want to execute. And around it is this very rigid boundary where you keep stuff in the host memory of your client program, give it to session.run, and then get results back into the host memory of your client program.

One example that I think illustrates why this is not ideal is a reinforcement learning agent implemented over a recurrent neural network. In that scenario, your agent is going to run a loop where it reads an observation from your environment, which is some arbitrary code that runs on your host, and it has some state. The state is initialized to zero. It looks at the observation, runs it through a neural network, and that neural network spits out a new state and an action for the agent to perform in the environment. You take that action, bring it to client memory, and give it to the C++ code for the environment, your Atari game or whatever. That will run for a while and then give you back a new observation. You want to ship this new observation, together with the RNN state, back to the RNN. But if your RNN is running on another device, say a GPU, there was really no reason for you to ship your RNN state back to your client and then from the client back to the device. So the boundary for stuff you want to run here is not really working: the boundary for stuff you want to run is not the same as the boundary for stuff that wants to live on a device or wants to live on the host.

And this gets even more complicated once you put automatic differentiation into the story, because TensorFlow uses a symbolic representation of your computation that we call a graph, and we do automatic differentiation on this symbolic representation. So now the graph not only has to be a quantum of stuff you want to run, but it also has to be a quantum of stuff you differentiate. If you stick with this reinforcement learning agent example, a popular thing people used to do, before we had the substantially better deep reinforcement learning algorithms we have now, is policy gradients. The simplest policy gradient algorithm is called REINFORCE, and what it amounts to is this: you run your agent for m time steps, you get the probability of the agent taking the actions it actually took, you take the gradient of that probability, multiply it by the reward your agent got, and apply that to the weights. So now not only do you want to avoid transferring the RNN state back and forth between the host and your accelerator, but you also want to backprop through a number of steps that might not even be known before you start your computation.

Another issue is that session.run asks for too much information every single time. What most training loops or inference loops, or really anything using TensorFlow, look like is not a single call to session.run, but a bunch of calls to session.run in a loop. And in all those calls you're executing the same tensors, you're fetching the same tensors, and you're feeding the same symbolic tensors with slightly different numerical values.
And because the session.run API doesn't know that you're going to be calling those things in a loop where some things don't change and some things do change, it has to re-perform a bunch of validation. So we put a cache in front of that validation, and the cache key becomes a performance problem. Derek had the idea of just separating the stuff that changes from the stuff that doesn't change into this session.make_callable API, where you call it once with the stuff that doesn't change, and you get back a function that you call with just the stuff that changes. So now all the validation that you perform n times is of the stuff that changes, and the validation that you perform only once is of the stuff that stays the same. This is not just a performance win, but it's also kind of a usability win, because just by looking at the call in your code, you know what is fixed and what varies.

And finally, the last very awkward thing about session.run is that graph pruning is a very complicated model to program against when you're writing in an imperative host programming language. For example, take my first function there, where I create a variable, assign a value to it, increment it a little bit, and then return something that uses the variable times some constant. If you just write code like this, because I didn't keep the return value of the assign_add, that assignment will never happen. There's no way to make that assignment happen in TensorFlow, because you created a tensor, threw it away, and did not keep a reference to it so that you could session.run it later. And you might think, well, that's crazy, why don't you just keep those references under the hood and do something magical to fix it? The problem is that it's very easy for you as a user to rely on the fact that this pruning is going to be performed, to try to encapsulate your code a little better. A design pattern that I've seen a lot is that when you have some structure -- for example, my fn2 there has a reinforcement learning environment, and that env object is some complicated Python thing that knows how to build a bunch of graphs -- you can encapsulate that in a single function in your code that returns how to get the current observation, how to apply an action, and how to reset the environment. So your code is now very concise, but in practice you have a function that returns three things, and you never want those three things to run together. You always want, at most, one of them to run at any point in time. So this is a little frustrating, because we've kind of locked ourselves out of being able to fix this problem.

TensorFlow has a few partial solutions here. I think the most comprehensive partial solution to the problems in session.run is called partial_run. But it's inherently limited, because it requires you to have a fully unrolled graph, it does not work with arbitrary control flow, and it requires a complicated dance of specifying everything you're likely to fetch in the future, then the things you're going to fetch now, and keeping and passing tensor handles around. It's very, very easy to make mistakes when you're doing it. Plus, what happens is that you as a user often write a Python function, TensorFlow then runs the function to create a graph, and then we take the graph, we validate it, we prune it, and we do a bunch of transformations.
And we hope that what we got out to run is exactly the nodes that you had intended to run in that Python function in the first place. But because we have all these steps in the middle, it's very easy to drop things and confuse things. So all these usability problems are inherent, I think, to coupling this session.run API with a host programming language that tries to make your code look very imperative, like native Python code.

And the way we break this and solve those problems is with tf.function. So what are the core ideas of tf.function? One is that your function's inputs and outputs live on devices; they don't have to live on the host. Another is that a function is differentiable, and a function is an execution unit, but it's not forced to be the whole execution unit or the whole unit of differentiation. You should be able to differentiate through many calls in an execution, and you should be able to make an execution unit out of many functions. This way you get to break that single-quantum-of-work requirement of session.run and write your programs in a more idiomatic way.

AUDIENCE: Could I clarify the first point? I assume device also includes CPU device? ALEX PASSOS: Yes, it also includes CPU. AUDIENCE: --host memory. ALEX PASSOS: It lives on host-- AUDIENCE: [INAUDIBLE]. ALEX PASSOS: No. It lives in host memory, but it's not immediately accessible to your client program. The act of running a function does not require that its inputs are visible to the client program, and does not immediately make its outputs visible to the client program. To make the outputs visible, you need to run an operation, and to ship the inputs into the runtime, you need to run an operation. So we put the boundaries not at every call, but at the points where you have the data and where you get the data.

Another fundamental property of tf.function is that the notion of what should run is not a property of control edges or graph pruning, but a property of the stuff that happens in the body of the function. So when we trace the Python code to build the function graph, any stateful operation that lands in there must run, and it's up to the TensorFlow runtime to run those operations in an order that is indistinguishable, as far as the user is concerned, from running them in program order. And finally, every bit of state that outlives the function call should be an argument: either an explicit argument passed by the user from Python, or an argument that's implicitly captured, closure-capture style, and just passed to the runtime. By making all the state that outlives a function call an argument, we get to enforce property three without too much complexity.

And incidentally, once you've looked at those requirements, you kind of see why we have eager execution. Because once you have the ability to run functions like that, really every single operation should act like a function when you just run it, which is why I think eager execution is important as a mental model, even if in practice you might want almost all of your code to live inside a graph, for performance or for portability to lighter runtimes and things like that. So once you do go that way, there is a problem, because you now can take an arbitrary piece of TensorFlow code and run it eagerly or run it inside a tf.function.
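As a minimal sketch of that idea (assuming TF 2.x; the little dense layer here is illustrative, not from the talk):

    import tensorflow as tf

    def dense(x, w, b):
        # plain TensorFlow code: nothing here knows whether it runs eagerly or in a graph
        return tf.nn.relu(tf.matmul(x, w) + b)

    x = tf.random.normal([2, 3])
    w = tf.random.normal([3, 4])
    b = tf.zeros([4])

    eager_out = dense(x, w, b)      # runs op by op, eagerly
    graph_fn = tf.function(dense)   # wraps the same Python code
    graph_out = graph_fn(x, w, b)   # traces it once into a graph, then executes the graph

    # Both results should agree (up to floating point); any semantic difference
    # between the two modes is exactly what would break this abstraction.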
And this means that any semantic difference between those two modes is going to break the abstraction barrier and can cause a lot of lost productivity, which is why we want to do things like Autograph, which is why we want to do things like automatic control dependencies, and why we try to have TensorFlow work as hard as it can to reduce those differences, make them explicit, and raise errors as soon as we can, so that we don't lock ourselves into a bad state.

The really big caveat, the easiest place for you to make mistakes, is when it comes to variables, because variables in TF1 -- if you've watched the previous video, you should know this -- behave very differently from how you'd expect variables to behave in an eager language. So this is one of those places where naively writing code can very easily get you in trouble, even with code that looks very reasonable. My favorite short example is this: a function that creates a variable and returns an operation that uses the value of the variable. If you run this in TF1, in graph mode, what happens is that you run this Python code once and then call session.run on the result many times. Moreover, as a side effect of running this code, that variable gets added to some global graph collection that allows you to modify its value later, even though it's completely scoped inside this function. So in TF1, you run this function, it uses a single variable, you get to modify its value, and then you call session.run on the result and get different numbers out. But in eager mode, every time you run this function, we create a new variable, do a matrix multiplication, and throw it away. So if you have code like this, your code is going to visibly see the differences between TensorFlow v1 and TensorFlow v2. So we just disallow this.

There are a few reasonable options we could have taken with tf.function. One is to say that tf.function should follow the eager semantics for variables: when you have code like this, we insert a create-variable op, insert a matmul, insert a destroy-variable op, and then return the value of the matmul. We could also have chosen for tf.function to follow v1 graph semantics for variables, where every time you create a variable, we reuse it based on its name or something like that. A third option that is also very easy to implement is to just disallow creating variables in tf.function. These are all reasonable options, relatively straightforward to implement, and not what we chose to do. What we chose is a compromise that lets us turn more code into tf.functions while avoiding code that would behave differently in eager and in graph mode, and while avoiding breaking the expectations of code that was written with TF1 in mind and then got wrapped into a tf.function so it works inside TF2. And the compromise we adopted is this: if you just try to create a variable inside a tf.function, that's disallowed and raises an exception. However, if you guard your code so that the variable is only created the first time the function is called, we allow it. The reason we allow this is that if you run a function like the one at the bottom eagerly, it will create a variable a single time and use it many times, and that has exactly the same semantics as calling session.run many times on the result of the function at the top.
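A minimal sketch of the two cases just described (assuming TF 2.x; the class and names are illustrative, not the slide's exact code):

    import tensorflow as tf

    @tf.function
    def broken():
        v = tf.Variable(1.0)  # a fresh variable on every trace: tf.function raises an error if you call this
        return v * 2.0

    class Model(tf.Module):
        def __init__(self):
            self.v = None

        @tf.function
        def __call__(self, x):
            if self.v is None:            # guard: the variable is only created on the first call
                self.v = tf.Variable(2.0)
            return self.v * x

    m = Model()
    print(m(tf.constant(3.0)))  # creates the variable once...
    print(m(tf.constant(4.0)))  # ...and reuses it on later calls, matching the eager semantics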
So this way, by writing your code in a way that promises you'll respect the semantics -- that you'll act in a way that doesn't see the difference in semantics between eager and graph -- we allow it. Right now we have a relatively big hammer to detect these failures, but I expect that over time we'll make this a little more precise and start allowing more and more code. For example, an issue right now is that if you create a tf.function and pass something like an optimizer as an argument to the function, we trace it once, and the first time you use an optimizer, the optimizer might create some variables for the internal state of Adam or Adagrad or momentum or something like that. But now if you pass a different optimizer, this different optimizer might try to create its own variables. This is, again, perfectly safe code, because you're passing the optimizer as an argument, which means we're retracing the function, so the variable creation happens while we're building the graph. But currently, as of the recording of this video, we raise an exception. It's an exception we can probably stop raising if we're a little more precise, and I'd like to do that at some point. The idea is that I want to expand the scope of code that's allowed to create variables in tf.function, as long as it still encompasses only code that behaves the same in eager mode and in TF1, just because this way there's less room for mistakes. Once there is no more TF1 code out there, then I think we can flip the flag and allow all variable creation inside tf.function with fully eager semantics. But that's going to take a while.

AUDIENCE: How do you detect whether the code is in a disallowed format? ALEX PASSOS: Ah, I will actually get to a slide on that later. But essentially, what we do is we run it twice. We run it once and see if you've created any variables. If you haven't created any variables, you're safe. If you have created some variables, we retrace your Python code again, see if it creates more new variables, and raise an exception if it does. And we also set it up so that any time we ever need to retrace your tf.function, if you do create variables on a subsequent call, we'll raise an error.

Another issue is that Python is a very dynamic language, while TensorFlow graphs are very static. TensorFlow graphs are not as static as XLA HLO graphs, but they're still very static when it comes to the types of things and the number of outputs of an operation. And sometimes TensorFlow graphs are also static when it comes to shapes, in that if you run the same graph-building code with input tensors that have slightly different shapes, we will generate different graphs. When we know more about the shapes, we sometimes specialize to generate a faster graph that knows more information statically, or that can do some assertions and validation statically instead of having you do them at runtime. So tf.function has a choice: either trace a function once and raise an error if we think you're calling it with arguments that are incompatible with the arguments we used to trace, or accept that we'll likely need to trace the function many times and set a policy for how we do this. We chose to go with option two. And the policy mostly looks like this: we use tf.nest -- the nest library -- to unpack your inputs. Once we've unpacked your inputs, we split them into Python objects and tensors, and we replace the tensors with TensorSpecs, which is the [INAUDIBLE] in the tf public API. A TensorSpec just has a shape, a dtype, and a name. And then we re-form the whole thing into a structure and use that as a key into a dictionary.
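A small sketch of what this cache-key policy means in practice (assuming TF 2.x; the function is illustrative, not from the talk):

    import tensorflow as tf

    @tf.function
    def f(x, training):
        print("tracing with training =", training)  # Python-level print: runs only while tracing
        return x * 2.0 if training else x

    f(tf.constant(1.0), True)          # first call: traces
    f(tf.constant(2.0), True)          # same Python value, same tensor shape/dtype: no retrace
    f(tf.constant(3.0), False)         # different Python value: retraces
    f(tf.constant([1.0, 2.0]), True)   # different tensor shape: retraces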
AUDIENCE: What's the name here? ALEX PASSOS: If you're in eager mode, there is no such thing as a name. But if you're building a graph, we do look at the name of the tensor in graph mode, just to try to preserve a little more information. So we use this whole structure as a dictionary key. We actually can't use the exact structure: we replace lists with tuples, dictionaries with lists of pairs, and a few other things, just to make sure Python doesn't yell at us for trying to put unhashable things in a dictionary. But the idea, from a mile away, is that anything you pass to a tf.function that is a Python object, if you change it in a way that a Python dictionary would notice, will trigger a tf.function retrace. If it's a tensor, though, we explicitly do not key on the value of the tensor. We key only on its shape and type, and retrace only when the shape and type change. This is a little controversial, because there are some types of Python values, specifically scalars and NumPy arrays, that you might want to treat as tensors. This is a decision we might want to revisit at some point, because it leads to a lot of problems for our users. But for now, we're conservative and retrace when you change the identity of Python values.

As I was saying, there are some downsides to this choice, and the two biggest ones are, first, that we do too much retracing, as I just mentioned, and second, that shapes are hard. Specifically, the more static shapes we have, the more efficient we can make our graphs. And in practice, due to the hardware we use and the way we write our code, you kind of want your graphs to mostly have static shapes in them so that we can be as performant as possible. However, it's often very convenient to have things like a dynamic batch size or a dynamic sequence length, and those things might not even incur very large performance penalties. On GPUs, for example, dynamic batch sizes and static batch sizes tend to take about the same amount of time; not necessarily on TPUs. However, if we try to relax the shapes as you call the function, retracing your code with shapes that have partially unknown dimensions might cause some graph-building code to explode. So we have a few ways for you to control this now, and we'll try to refine the default policy to make it better. But you can always choose the extremes.

So we give you essentially three knobs to control tracing. If you have a tf.function that you built from a Python function and you want to force it to retrace, you can always build another tf.function object; two separate tf.function objects share no state. This is a cheap way to force a retrace. It gets you around the limitation on creating variables, it gets you around the shapes, and things like that. You also have the flip side of this, which is to prevent retraces, and there you have two options. One is you can call get_concrete_function on the tf.function object, passing a signature, and you get back a function that you can call with that signature; it will specialize on the particular properties of that signature. Or you can pass a signature when you define your tf.function. That also works.
And finally, you have an experimental knob, whose behavior is likely to change in the future, that if you set it to true will try to relax shapes for you. So if you know you're likely to have shapes with a dynamic batch size or a dynamic sequence length, and a few other cases where running your graph-building code with partially unknown shapes is perfectly fine, then you can set this to true and enjoy fewer retracings. We might need to add more knobs, but I think this policy is mostly fine, and we'll iterate on and refine the existing tracing policy to make it better.

AUDIENCE: [INAUDIBLE] tracing happen when you call the function the first time or every time? ALEX PASSOS: We try to trace as little as we can. So the first time you call it, we clearly have to trace; there is no alternative, because you don't have a graph. The second time you call it, if you call it with the same Python objects and with tensors of compatible shapes and types, we will not retrace. But if you change the shapes and types of the tensors, then we're likely to have to retrace.

AUDIENCE: Question. So the trace cache key, does it include the global variables accessed [INAUDIBLE] function, or just [INAUDIBLE]? ALEX PASSOS: We do not put the variables accessed by the function into the cache key, because I don't know how we would check that without running the Python code. AUDIENCE: So which means it may miss a change in the type of [INAUDIBLE] we accessed [INAUDIBLE]? ALEX PASSOS: Yes. Any kind of reliance on global Python state that is not an argument to the tf.function might lead to breakage. Python is a funny language, because you actually can check this: you can take a Python function and get the transitive closure of all the modules and objects it has access to. The problem is that this ends up being thousands and thousands of symbols, so we can't feasibly check whether the value of any one of those has changed between function executions. So this is kind of best effort, and again, it's a little bit of a caveat. If you have global state that you want the tf.function to depend on, put that state in a tf.Variable, because if you change the value of a tf.Variable, the tf.function will see it.

AUDIENCE: Another question -- so the actual tracing is [INAUDIBLE] by running that function. Certain functions have a side effect, say [INAUDIBLE] a file. Can this be executed twice? ALEX PASSOS: Right -- your Python code is only going to get executed during tracing. So if there are side effects that you care about, you should make those side effects TF side effects. Don't use Python's file writing, use TF's file writing. Don't use Python's file reading, use TF's file reading. Don't use Python's random number generation, use TF's random number generators. In general, anything that you can make a TF thing -- that is the way to make it work reliably in tf.function. And part of this is due to Autograph. I'm not going to talk about Autograph here; there will be a separate training talk on it, because it's full of very interesting and cool little bits.

AUDIENCE: I don't get how using the [INAUDIBLE] version of these APIs [INAUDIBLE] side effects [INAUDIBLE]. ALEX PASSOS: If you use the tf version of these APIs -- if you use a tf op to write to your file or to generate a random number -- then if you run it in graph mode, we don't do anything.
We just create a symbolic expression that, when evaluated, will have the side effect. AUDIENCE: So it actually doesn't execute it? ALEX PASSOS: It doesn't execute it in graph mode. In eager mode, it will execute it. But in graph mode, it just builds a graph that, when executed, will have the desired side effect. AUDIENCE: So just to [INAUDIBLE] decide to use Python [INAUDIBLE] it's undefined behavior, essentially. ALEX PASSOS: Yes, it's undefined behavior. And if you want to define it, you need to control how often you trace the function. You can also choose to force Python things to happen using tf.py_function or tf.numpy_function, which run Python code at function execution time, by explicitly delineating the Python code that you want to be dynamic. This has some limitations, though, because we're not very good at shipping Python code from one host to another host. So in general, models that rely on py_function or numpy_function are not serializable and do not run well in distributed settings.

So how do we make this work in practice? Here I want to give you a walkthrough of interesting pieces in the code -- some screenshots, some bits lightly rewritten for readability -- so that you know what to look for if you want to understand or change the behavior of tf.function. The first structure that I think is particularly interesting is the FuncGraph. The FuncGraph is a subclass of the TensorFlow Graph that overrides a lot of interesting behavior. It's where the code to do automatic control dependencies lives, it's where the code to do Autograph lives, and it's also where we do closure capturing. Closure capturing is maybe the most interesting part, because in normal TensorFlow graphs, if you try to use a value from outside of the graph, you immediately get an error. But with functions, as with most programming languages' functions, you expect to be able to use values from the defining context. So FuncGraph has some logic to do that, and it has two capturing modes: capturing by value and capturing by reference. By default, we capture by reference, but we can turn capturing by value on. The way this works is that when you try to create an operation in the graph, we look at the inputs and capture them if we have to. Capturing is done by creating a placeholder that has the same shape and dtype as the tensor you're trying to capture, and storing inside the FuncGraph a map from the tensor that we captured to the placeholder that we created, so that later, when we call the function, we feed in that tensor as the value for the placeholder. We do this for every external value: for constants, for eager values, for graph values.

The way this is set up, capturing a value is something that is visible to the gradient code, so you can differentiate a function with respect to its implicit variable captures. There are a lot of subtle little issues to get right there, but the general idea is that we create placeholders of the proper shape and type, and at function call time we pass the original argument as the value for the placeholder. And the nice thing about this is that at the point where you try to pass the original argument for the placeholder, if that is happening inside another FuncGraph and the original argument does not belong to that graph, it will recursively trigger a capture, and this way value capture propagates properly through functions that call functions that call functions. All of those correctly handle differentiation and a few other things, so it should be mostly seamless.
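A minimal sketch of that capture-and-differentiate behavior (assuming TF 2.x; the numbers are illustrative, not from the talk):

    import tensorflow as tf

    v = tf.Variable(3.0)
    c = tf.constant(2.0)

    @tf.function
    def f(x):
        # v and c are not arguments; they are captured from the defining context
        # via placeholders in the FuncGraph, as described above
        return v * x + c

    with tf.GradientTape() as tape:
        y = f(tf.constant(4.0))

    print(tape.gradient(y, v))  # 4.0: the gradient flows through the implicit capture of v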
Another thing FuncGraph does is hold the core code to take your Python function and build a FuncGraph from it. As you can see, it has many, many options; there are all sorts of ways you can control this. You can override shapes, you can capture by value, you can add automatic control dependencies, you can do Autograph, you can pass a signature. But this is the general workhorse that we use every time we trace the Python code that you pass in to create a tf.function.

One of the particularly important things it does is automatic control dependencies. What this is trying to do is enforce that, inside a tf.function, program order is execution order as far as the TensorFlow runtime is concerned. We also try to do this in a way that does not harm performance. If you remember last week's video on resources and variables, we're moving to a model in TensorFlow where all stateful ops manipulate an explicitly named resource, and that's what let the first version of the automatic control dependencies code be as simple as this. Now it's a lot more complicated, because it tries to handle stateful ops that do not have an explicitly declared resource, and it tries to handle control flow v1, so it's far messier. But essentially, all you do is iterate over all the ops in the graph and look at every input of each op. If an input is a resource, you add a control edge from the last op that used this resource to this op, and you update a map to make this work. And finally, at the bottom of the function, you return, for every resource, the last op that used it, so that we can make those operations control outputs of the function. These control outputs are important because we can then tell the TensorFlow runtime not to accidentally prune the side effects that we intend to happen. And if you have operations in a function that no output depends on and that no side effects depend on, we know we can prune those. This also means that as we move to a model where TensorFlow is more compiled instead of interpreted, the fact that these control dependencies enforce a linear order makes it relatively easy -- well, easier -- to take TensorFlow code and turn it into some intermediate representation that will feed into a compiler, which you might hear about in the MLIR talk. So this just clarifies the semantics and removes a lot of weird behaviors you can get in TF1. And because we are tracing, all this assumes is that we want to execute code in program order, which is a far, far easier thing to guarantee than the complicated pruning and partitioning process we have to rely on in TF1.

The FuncGraph, though, is a Python-only construct. To actually turn a FuncGraph into something we can execute, there's a function in the C API that takes a TF_Graph and generates a TF_Function out of it. It again has a bunch of options so that you can control your inputs and your outputs, uniquify names, et cetera. But if you're looking for where we turn a TensorFlow graph into a function, this is the entry point you want to look at in our C API.
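As a small illustration of the program-order guarantee that automatic control dependencies provide (assuming TF 2.x; the example is illustrative, not from the talk):

    import tensorflow as tf

    v = tf.Variable(1.0)

    @tf.function
    def update():
        v.assign(2.0)          # stateful op whose output is never consumed...
        v.assign_add(3.0)      # ...still runs, and runs before the read below,
        return v.read_value()  # because control edges chain the ops that touch v

    print(update())  # 5.0: nothing gets pruned away, unlike in the session.run world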
However, now we need to call those functions that we've created. Technically, you can call a function just like you would call any op. Once you've registered a function with a graph's function library or with the eager context, you can just use TFE_Execute or TF_NewOperation to define a function call, just like you would define any operation execution. Under the hood, this is going to use an op named StatefulPartitionedCall. So if you look at the source code for that operation, you will see the place where we partition the function graph across multiple devices, in case you want to run your function over multiple GPUs or TPU cores. It's also where we run a lot of [INAUDIBLE] -- I hope there will be a separate training talk about it -- which is this cool thing that runs all sorts of graph optimizations and backend-specific rewrites and things like that. And the place in the Python code where we take you trying to call a function and turn it into this partitioned call is a class called _EagerDefinedFunction. If you search for that in the TensorFlow source code, you can read how exactly we set up the function call, correctly handling things that were captured, correctly handling gradients, and things like that.

Differentiating functions, then, is built on top of _EagerDefinedFunction. The idea is that if you have a function you want to call, and that function is differentiable, we need to generate three things under the hood. One is what I call the inference version, which just runs the function and returns the results; that's what you call if you're not trying to differentiate through it. But because we do reverse-mode automatic differentiation in TensorFlow by default, to differentiate a function we need to run a backward version of it, and the backward version might need to use intermediate values from the forward pass. So what we do is generate a clone of your inference function that returns all the intermediate values the gradient code is likely to need, and then we make another concrete function, which is the backward thing. The interesting thing is that the forward and the inference versions are just defined functions -- things you can just call -- but the backward thing is a concrete function, so it also has the differentiation code, which means you get to differentiate the gradient of a gradient of a gradient of a gradient of a function, because it recurses in that direction. So we get a closed system for automatic differentiation, which is really important, because as machine learning research moves forward, more and more algorithms end up relying on limited forms of higher-order [INAUDIBLE].

One thing you might be thinking about right now is that a particularly important feature for a lot of people who rely on reverse-mode autodiff is to not keep all the intermediate state alive between the forward and the backward pass. You could make this a feature of tf.function, but I think it should be implemented separately, built on top of tf.custom_gradient, since this is just another customization of the differentiation code, and it's very easy to do generically; it does not depend on tf.function. This is being added to the v2 API now, even though it has existed in [INAUDIBLE] for a while. So the rematerialization, recompute-gradients thingy is not going to go away. But it's also completely orthogonal to tf.function, which makes me a little happier, because smaller pieces that compose tend to be nice.
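To make the closed-system point concrete, here is a minimal sketch of taking a second derivative through a tf.function (assuming TF 2.x; the function is illustrative, not from the talk):

    import tensorflow as tf

    @tf.function
    def f(x):
        return x ** 3

    x = tf.constant(2.0)
    with tf.GradientTape() as outer:
        outer.watch(x)
        with tf.GradientTape() as inner:
            inner.watch(x)
            y = f(x)
        dy_dx = inner.gradient(y, x)       # 3 * x**2 = 12, via the generated backward function
    d2y_dx2 = outer.gradient(dy_dx, x)     # 6 * x = 12, differentiating the backward function itself

    print(dy_dx, d2y_dx2)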
And here, on top of this differentiation code, we have the code that goes from an abstract function to a concrete function, which is the thing that does the function cache. Here we start getting into a little bit of cruft, because for a while the tf.function code base was in contrib.eager [INAUDIBLE], and so it had this class, function.Function, that handled going from the cache key. The difference between the contrib eager [INAUDIBLE] and the tf.function we have today is that we have fixed the story around variables to be much better behaved. So if you look in our code base, you'll see that the function.py file has a Function class which does the cache key logic. The cache key is mostly implemented in C now; there's a C API called TFE_Py_EncodeArg that takes a bunch of arguments, replaces tensors with TensorSpecs, and uses this to form the cache key. We did this in C because it's a little faster than doing it in Python and gives us a little more control over how we handle lists and dictionaries and things like that. If you're interested, that's where you'd want to look.

And finally, the last bit of the pile of complexity that we've been looking at is how we do the variable lifting and initialization. This is handled in another class, confusingly also named Function, but in a different file, def_function.py. Internally, it calls the other Function class, so at least we have proper layering, but we do need to clean up the naming of these things a little bit. The variable lifting is a little tricky, so I'll give you a quick walkthrough of how it works. The first thing is that we define our own type of variable, which is a subclass of the normal tf.Variable. The idea is that when you create this variable, it inserts into the graph a control flow conditional: if the variable is not initialized, initialize it. This way, because the graph at variable creation time has this conditional, by the time you get to use the variable, you've already passed it. So if you run this graph, you're guaranteed to initialize the variable only once, and after that you see its initialized value. But this is a little sad, because if you've used TensorFlow for a while, you know that [INAUDIBLE] is somewhat expensive. So what we do is the thing I said earlier: we trace your code twice. First, we trace it under a scope that captures all the variables that get created -- these are variable capturing scopes; I went over them a little bit in last week's video. And the second time we trace, we trace under a scope where, if you try to create a variable, we raise an error. This is how we implement the policy of only letting you create variables once. Now that we have these two versions of the function, one that we call the stateful fn and one the stateless fn, we can put a cond in there where, if all the variables are initialized, we call the one that does not do any of the complicated computation, while if any of the variables is not initialized, we have to, for safety, call the function that has all the complicated bits. But now that we've built this monstrous graph that has your function inside of it twice, one version with conditionals, one with [INAUDIBLE] conditionals, and the whole thing inside a conditional, ideally we would never want to execute this, because conditionals, again, are expensive.
And they have this particular property where, in the current version of the TensorFlow runtime, you pay for the nodes that don't get executed. So what we do on top of that is try to lift the initialization code out. We look at every variable and call this lift_to_graph thingy, where we try to copy all the initializers of all the variables into a separate graph. This copy is set up so that it raises an exception we control if we ever find a variable whose initializer depends on the value of another variable, or depends on a function argument, or depends on something that we can't cleanly isolate. So for the common case, where your variables are all independent of each other, we don't actually run any of those complicated graphs. We just run the simple graph, because we can run the initializers once. It's only if we cannot lift the initialization graph that stuff breaks. And this lift_to_graph thing is actually a pretty nice internal TensorFlow library that you can use to manipulate graphs. You give it some tensors, a graph, and some sources, and it will walk the graph backwards, copy everything you need to compute those tensors into your target graph, and return a map from every source tensor to the target tensor it copied, so you can use this map to run things in the target graph as if they were in the source graph.

So that is mostly it for the Python-level runtime code for tf.function. I'm not going to talk about the TensorFlow runtime today; that's for another meeting, because we're mostly out of time. Any remaining questions before we go away?

AUDIENCE: One question on performance. Let's say we have a tf graph. It's got multiple [INAUDIBLE] so I can run different subgraphs, and they have a lot of overlap. Now if I compare that to a set of independent functions, things like the size of the whole model will go up. [INAUDIBLE] ALEX PASSOS: So in principle, the size of the whole model would go up. But if you were to convert this to tf.functions, hopefully you'd convert it to functions that call each other. And as with normal programming language code, as soon as you have functions that call each other, you can control the complexity of your code, especially if you have repeated calls to the same function, and you can end up with a smaller size. So as far as size goes, in practice, using tf.function often leads to smaller graphs, because you often end up calling the same function multiple times. At execution time, we can inline those functions for performance. I think right now we essentially always inline tf.functions. So if you are really calling the power set of the nodes in your graph, then you would see an explosion, but in most of the models I've seen, we can avoid this. As far as I can tell, the performance overhead of using tf.function now comes from the fact that if you're creating variables, you need to trace your code at least twice and generate all those extra conditionals. And this is something that I think, with a little more engineering, we can make happen only when it's strictly necessary instead of always, with the back-off optimizations being optional. If we have no more questions, then I think we're good. Thank you. [APPLAUSE]