DAN MOLDOVAN: Thank you all for coming. My name is Dan Moldovan. And today I will talk about some of the internals and functionality of AutoGraph. Now, this is definitely not an introductory talk. And if you would like to learn more about the background or the motivation behind AutoGraph, here are a few resources that I believe can help. The talk will otherwise be fairly fast paced and quite dense. I'm hoping we'll be able to get through all of it in time. But if not, I'm hoping the slides will serve as a good reference, should you decide to come back and look at everything more closely. I should caution, though, that I am oversimplifying a lot of things for the sake of brevity and time. But the essential things are in there.

The talk will be structured in roughly three parts. First, I'll talk about some of the more relevant implementation details, which are useful for understanding some of AutoGraph's behavior. Then I'll describe the various ways in which you can interact with it. And lastly, I'll go through various use cases that highlight what works, what doesn't work, common pitfalls, how to stay away from them, and what our plans are to eventually address them.

So let's begin with the implementation. From a systems perspective, this is roughly what AutoGraph looks like. In broad strokes, we have the following. Going from the bottom to the top, we have an infrastructure for performing source code transformations, with various helpers. And on top of that, we have individual transformations. For instance, there is a separate transformation that handles function calls. Another one handles break statements. And yet another transformation handles if statements. And these transformations are independent and composable. Many of these transformations then replace your code with calls to special AutoGraph functions. We call them overloads or operators. The reason is that they are similar to Python's overloaded operators. Now, among those overloads, the most interesting ones are the ones that specialize in creating TensorFlow ops. And lastly, there's a high-level API that glues them all together. And this is typically what you interact with as a user. One last note I should make is that of all these pieces, only the TensorFlow-specialized overloads and perhaps the high-level APIs are specific to TensorFlow. Everything else is fairly generic and reusable, and we hope to eventually have it in a separate library that can be used for other purposes as well.

So one of the fundamental pieces of AutoGraph is, of course, the source code transformation bit. So let's look at that a bit more closely. Source code transformation is essentially what makes AutoGraph a transpiler. Its unit of work is functions. That is, at runtime, a function is converted into a new function. So let's look more closely at that process. It is, loosely speaking, a five-step process. The first step is to obtain the source code of the function. Now, the standard Python library makes that easy for us. It provides the inspect module, which is built in and lets us do that. This also highlights one of the fundamental requirements of AutoGraph. In order to convert a function, that function must expose its source code. And that's typically true for almost all functions in Python, although there are certain exceptions. Normally, you can test this on your function by calling inspect.getsource.
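For example, a minimal check along these lines (the helper name is mine, not an AutoGraph API) could look like this:

    import inspect

    def has_visible_source(fn):
        # Purely illustrative: if getsource raises -- e.g. for builtins, or for
        # code defined interactively without a backing source file -- then
        # AutoGraph will not be able to convert fn.
        try:
            inspect.getsource(fn)
            return True
        except (OSError, TypeError):
            return False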
If inspect.getsource returns data, then AutoGraph should be fine with it.

The second step in this process is to parse the code into an AST. And once more, there is a standard Python API for this, which is good. We, in fact, use a thin layer on top of that. It's a third-party library called gast. It's practically identical to ast, but it handles all the version differences between Python 2 and Python 3. It's worth mentioning at this point that AutoGraph operates entirely at the AST level. There is no lower-level intermediate representation. And we never interact with the bytecode. And that has some unique advantages.

Now, the third step does the bulk of the work, both literally and figuratively. The standard Python library offers a mechanism that helps us with that as well. The ast module provides a mechanism for visiting and transforming ASTs. That mechanism uses the visitor pattern, and it's sketched here. Basically you get callbacks whenever the visitor encounters different types of nodes. And on top of that, we have built an entire library of such transformations, as we've seen in the previous diagram. These transformations are called in sequence.

Now, once transformed, the AST is unparsed back into source code in the form of a string. There isn't a standard library for doing that. But thankfully there's a third-party library called astor, which does a decent job of it. Essentially, it's lots of string concatenations. There's nothing special about that. Finally, the source code is written out to a file and then loaded using a mechanism that's identical to writing an import statement. Once more, Python helps us with that, with the standard module called imp. The special thing about imp is that it only works with files on disk, hence the need to generate a temporary file. I should also make a slight note that another mechanism we could have used would be exec. And we've been going back and forth between using that and imp. There are pros and cons to each. So we might revisit this in the future.

A few other mechanisms are worth mentioning. One of them is the templating system that we developed to help us with generating code. It essentially lets us write templates in the form of strings, code blocks as strings. They support placeholders, and they let us generate more complex or new ASTs. If you ever poke inside the transformations library, you will see plenty of such templates. Another important piece is the static analysis, which is critical in supporting certain transformations, and we'll see more about that in a bit. The analysis itself can be viewed as just a simple walk over the AST. And it annotates nodes with relevant information. Another important mechanism is caching. Caching itself is completely transparent to the user. But it does help us make sure that every function is converted no more than once, loosely speaking. This cache relies on the key assumption that the conversion process is entirely static. That is, what ends up in the generated code does not depend on any arguments or variables or any other Python state. Basically, if you look at some plain code on paper, you would know exactly what the output code should be.
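Putting the five steps together, here is a rough, standard-library-only sketch of the shape of that pipeline. This is not how AutoGraph itself is implemented -- it uses gast, code templates, astor, and a temporary file loaded with imp -- but the overall flow is similar:

    import ast
    import inspect
    import textwrap

    def convert(fn):
        """Toy pipeline for a plain, undecorated function."""
        # 1. Obtain the source code.
        source = textwrap.dedent(inspect.getsource(fn))
        # 2. Parse it into an AST.
        tree = ast.parse(source)

        # 3. Transform the AST. A do-nothing visitor stands in for
        #    AutoGraph's library of transformations, applied in sequence.
        class NoopTransform(ast.NodeTransformer):
            pass

        tree = NoopTransform().visit(tree)
        ast.fix_missing_locations(tree)

        # 4. Unparse back into source code (Python 3.9+).
        new_source = ast.unparse(tree)

        # 5. Load the generated code and return the new function.
        namespace = {}
        exec(compile(new_source, '<generated>', 'exec'), namespace)
        return namespace[fn.__name__]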
Next, let's talk about some of the actual transformations that are being made. And before I proceed, I should clarify that I'll use the word variable a lot. These are meant to be Python variables, not to be confused with TensorFlow variables, which are not involved here.

One such transformation is simplifying the code by replacing some statements with other, simpler statements. For instance, we replace break statements with variables and additional if statements. This essentially helps us avoid the need to build special handlers in TensorFlow for statements like break. They're just easier to lower into something simpler. And the process-- yes?

AUDIENCE: Does that imply that if I do "while n less than a million: ... break", that's going to be converted very inefficiently, because it will still loop over the maximal range?

DAN MOLDOVAN: It will not loop over the maximal range, because the while statement, as seen in this example, will have its condition augmented. So yes, the overhead is one, maybe two, extra conditionals, not more than that. Yes?

AUDIENCE: Are you guaranteed [INAUDIBLE] variable names?

DAN MOLDOVAN: We have small modules. It's not mentioned here [INAUDIBLE]. We look at the function's globals, its closure. And those indeed depend on the context variables. So if you take a function, you convert it, and you get a certain name. And then suppose you create some other function. And then you run the converted function. It might clash. That's very unlikely. Because if you change the code that we transformed, then the function will be re-converted, right? So I'm not sure it's even possible to get a-- well, you could get a clash in theory, but you would have to work very hard to do that. But, yeah, that's a very good observation. That is one of the deviations from the conversion being entirely static. There are some minor exceptions.

So going back to the lowering, I'm not going to describe the entire process because it's fairly straightforward. I think an example will suffice. For instance, here, notice that the break statement was replaced with a did_break variable. And then we have a few extra conditionals, like, for instance, this one at the bottom, "if not did_break: i *= 2", to protect the code. The conversions for continue and return statements are similar.

Another important type of conversion is for function calls. We do this for all function calls. We replace them with a call to a special wrapper. This wrapper, as its name suggests, might decide to convert the target function at runtime. But it may decide not to do that. Many functions don't need to be converted. But the important part is that we replace all function calls, because statically we do not know what their type is, and we do not know whether we should convert them or not. So we defer that decision to runtime. Another mention that's probably worth making here is that from the graph's perspective, functions are inlined. We don't create any tf.functions. We don't create any graph functions. So from this perspective, AutoGraph is consistent with V1-style graph code.

AUDIENCE: What do you mean by runtime? Do you mean when TensorFlow runs it or when the Python user runs it?

DAN MOLDOVAN: That's a very good question. There's more than one runtime. In this case, I'm referring to the Python runtime.

Next, the quintessential transformation, we might say, is converting control flow. So let's look at if statements first. The transformation itself is actually quite mechanical. For example, the body of the if statement becomes a new function, and we add some return values to it. And we'll see more about that in a moment. And the if itself becomes a call to a special AutoGraph overload.
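As a rough illustration, this is the shape of that lowering. The if_stmt helper below is a stand-in I wrote, not the actual generated code, and it glosses over how modified variables are threaded through:

    import tensorflow as tf

    def if_stmt(cond, body, orelse):
        # Dispatch at runtime: build a tf.cond for tensor conditions,
        # run plain Python otherwise.
        if tf.is_tensor(cond):
            return tf.cond(cond, body, orelse)
        return body() if cond else orelse()

    # Original:
    #   if x > 0:
    #       x = x * x
    #
    # Converted, roughly:
    def f(x):
        def if_body():
            return x * x

        def else_body():
            # The original has no else, so this just passes x through.
            return x

        x = if_stmt(x > 0, if_body, else_body)
        return x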
On the slide, I'm, of course, omitting the else block for the sake of brevity. But it's handled equivalently to the main body of the if statement. Once more, all the if statements are converted in this way. And here we have an example for an if statement. Note that there's nothing out of the ordinary here. The body of the if becomes the body of a new function. And the if statement is replaced with a function call.

Now, loops are ever so slightly more complicated because they use state variables, but not by much. Once more, the body of the loop is transformed into a new function. This time the function has arguments representing the loop variables. The conditional also becomes a function this time, because it depends on the loop variables. And once more, the statement itself is replaced with a function call. Now, the more interesting question is, how do we decide which of the variables in your program are loop variables? We could of course take all the variables in scope and make them loop variables. But that would be quite inefficient. The heuristic we use to do that is actually quite simple. It relies on static analysis. And in short, a loop variable must satisfy both of these two conditions. First, it has to be modified inside the loop, which is quite evident. If the loop doesn't modify it, then it's invariant to it. And the second condition is that it has to be either live into or out of the loop. Now, what live means is-- live in means that the loop depends on the value of the variable before it entered the loop.

AUDIENCE: It seems like if something is a loop condition, it wouldn't have to be live into or out of the loop to be a loop variable. So is it either of these conditions?

DAN MOLDOVAN: If it's a loop condition, then the variable would be live into the loop, because it's read before anything else. I'll show an example that hopefully clarifies that a bit. Live out of the loop is similar. If the variable is used after the loop, then it's live out.

So here we have an example. To Josh's remark, a is both modified by the loop but also live into the loop, because once you enter the loop, the first thing that happens is a is read. So the value of a before the loop is definitely relevant. If a starts positive, then the loop will cycle. If a is 0, then it will not. So a is live into the loop. b, on the other hand, is not modified by the loop. So we can leave that one out. And c is also interesting, because it is modified inside the loop, but it is not live. That's because as soon as you enter the loop, the c variable is immediately overwritten. So the fact that it had the value 3 before the loop is completely irrelevant to the loop, because that value is destroyed regardless. And this is a sketch of the resulting code. I'll leave it as an exercise to verify that it is indeed correct.

Next, as I mentioned, the conversion process is entirely static. And all the statements are being converted. All the function calls are being converted. And that means that the overloads must handle any type or value verification at runtime, once more, at the Python runtime. Yes?

AUDIENCE: So when you say modified in the loop, I guess it's not only things that are assigned. A function within the loop can also modify variables, is that true?

DAN MOLDOVAN: That's a great observation. And I will talk about that in a bit more detail.
In order for AutoGraph to correctly convert code-- this is only when it transforms the loop into a TensorFlow loop-- the side effects, such as these modifications, have to be visible. So if you build a function that hides that modification, then AutoGraph will not detect it. That's an excellent observation. I'll get to a specific example of that in a moment.

So as I was mentioning, there's the dynamic dispatch that's handled by all operators. An interesting observation here is that if we were to convert pure Python code in this way with AutoGraph, it would become quite a bit slower, because every if statement will do an isinstance or some type of [INAUDIBLE]. So you can imagine it would be much slower than normal Python code. However, in the case of TensorFlow, for our purposes, this overhead is peanuts compared to the cost of creating ops. So it doesn't really bother us in the case of building graphs.

So far, when describing the process, I ignored an important piece of Python, and that is variable scoping. So let's look at an example of that issue with a simple conditional, which just increments a value in an if statement. Now, naively copying this block inside a function won't work, because due to Python's scoping rules, x would become a variable local to the if_true function. So any modification that you make to it would be lost to the Python runtime. In fact, you actually get an error here, because inside if_true, x is a local variable, and by incrementing it, you're trying to access an undefined variable. So the way we solved that was by renaming the variables inside the function body. And essentially what we're after is avoiding modifying the x variable directly, because that's what causes Python to consider it a variable local to the function.

Now, a quick note on mutating variables. What I just showed was valid for simple variables, like x = 1, and so on and so forth. Mutating them, like this statement, where we say x.a = something, is fine, because that will not cause x or x.a or anything to become local to the function. So x still points to the correct object, and we're safe. So mutating objects, in this case, doesn't bother us. But--

AUDIENCE: But with the x.a example in there, you have the problem that, because we have to execute both branches of the function, we're going to unconditionally mutate x, right?

DAN MOLDOVAN: That's exactly so, yes. And that's exactly what I'm going into, with a bit more detail on these mutations. And, as Alex alluded, you have to be careful about the effects of tracing when running TensorFlow statements. By "you" I mean we, AutoGraph. So we have to handle that case.

So one of the complications with mutation is probably best explained with this simple example. Suppose we have a method that mutates its object. It makes some changes in a loop to one of its properties, nothing out of the ordinary. With a naive transformation of this loop, not doing anything special is fine. It is correct if that loop is executed as a Python loop. And the effective code that executes looks kind of like this. This is not what's generated. That while loop at the bottom is, in fact, an AutoGraph overload. But the effective code that runs is this. So we have a loop body function and a loop cond function. And there's the while loop overload, which calls them, as you might expect. All this works fine if it's a Python loop. However, it no longer works fine if it's a TensorFlow loop. Why is that?
Well, I'll leave it as an exercise to think about what the value of self.a is as this statement executes and after this tf.while_loop runs, and also what happens to the tf.while_loop at runtime. But I'll just go to a possible solution for this. And that is to create a temporary-- a special loop variable corresponding to self.a itself. And if we do this with a bit of careful handling, so that self.a points to the correct value both inside the body and inside the cond and after we have executed the while loop-- if we do this, the tf.while_loop will execute correctly, and you will get the result that you would expect.

The problem with that is that it breaks Python semantics. And that's something we do not want to happen. To show that, let's consider that a is not a trivial property, but has a custom setter that you defined in your code. And let's consider that that custom setter has some side effects. For instance, it prints some messages. Now, with this equivalent code, what would happen when it ran as a Python loop? Well, there are too many assignments to self.a, and that will trigger the side effects in your custom property. So you will see way too many prints, in this case. And that's definitely something we do not want to happen, because we want to preserve Python semantics. I'm sorry. There was a question?

AUDIENCE: It seems like this transformation is just wrong, though. Shouldn't all the self.a's be replaced with self_a's?

DAN MOLDOVAN: That's a very good question. If we did go ahead and replace all those self.a's with self_a's, then if you called any method-- suppose you called a method that itself, behind the scenes, did some more modifications to self.a-- then that method would not capture the value of self.a, right? So we have to make sure to put the proper values inside self.a, because some other code might need it. So the way we solve this problem is to put these external modifications into separate functions. We call them get_state and set_state because, in a way, they capture the state of the Python execution at runtime during tracing. What these allow us to do is make this kind of modification in the case of tf.while_loops. And then in the case when we run regular Python loops, we just don't call these methods. So both paths are happy now.

Next, if there are no additional questions-- I know this is definitely one of the trickier parts of AutoGraph itself. But if there are no questions, let's go to the second part and talk about the APIs that users can interact with. First and foremost, the absolute recommendation is that if you can use tf.function, just use that. It has certain advantages. It can add automatic control dependencies. There are some types of APIs that only work inside tf.function. It also caches the graph, among other things. And it has additional smarts to handle exceptions. So tf.function is definitely recommended. But if you really, really can't use tf.function, you can call AutoGraph directly. There's a specialized API. But keep in mind that it does not add automatic control dependencies. And it's also less user friendly. Other APIs that you can use to tweak things include the do_not_convert annotation, which has the obvious effect. Although, even in that case, it's still preferable to use tf.function with autograph=False. Also, if you'd like to have a look at the generated code, there's an API for that.
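In TF 2.x, the entry points mentioned here roughly map to the following (square_if_positive and helper are just illustrative functions):

    import tensorflow as tf

    def square_if_positive(x):
        if x > 0:
            x = x * x
        return x

    # Recommended: wrap in tf.function; AutoGraph is applied automatically.
    fast = tf.function(square_if_positive)

    # Opting out of AutoGraph while keeping tf.function:
    no_autograph = tf.function(square_if_positive, autograph=False)

    # Excluding a single function from conversion:
    @tf.autograph.experimental.do_not_convert
    def helper(x):
        return x + 1

    # Peeking at the generated code:
    print(tf.autograph.to_code(square_if_positive))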
And as we experiment with new transformations, there is a feature enumeration that you can use to enable them, to enable transformations that are not stable enough to release into production.

Now, a few words on debugging. By far, the best way to debug code is to run it Eagerly. And there is a function in TensorFlow that helps you do that. Or you can just remove the tf.function decorator. But this one lets you do it without changes to the code. And that causes tf.functions to run Eagerly. And you can use PDB and everything else inside there. If running Eagerly is not an option, then one way to peek inside what AutoGraph is doing is to crank up the verbosity. There's an API for that. Increasing the verbosity is also useful when filing bugs. There is a caution I should make. Of course, increasing the verbosity can cause quite a bit of log spam. But it will also dump data in addition to code. It will log function arguments and things like that. So please be careful when sharing verbose logs.

Now, if you do drop into PDB inside AutoGraph code using, for instance, pdb.set_trace, that will not crash. It will work in some way. Just remember that set_trace is a function and, like every other function call, will be routed through an AutoGraph overload. And what that means is that PDB will land somewhere inside the AutoGraph API. At the time of this talk, you have to step out twice to get back into the generated code. And the other caveat is that, of course, you will land inside generated code.

Now, a note on this generated code. It definitely contains lots of boilerplate. It is designed to be robust, not pretty. And ideally, in a perfect world, you should never have to deal with it. You should never have to see that code. And we're working towards achieving that. But until then, if you do end up in a situation where you have to deal with generated code, whether you see it by accident or actually have to work with it, please file a bug so that we can work towards avoiding that kind of exposure.

In the next section, I want to mention some of the semantics related to AutoGraph, because these dictate what you should and should not expect of it. Now, rather than a detailed explanation, I'll just list some broad guiding principles. And the first such principle is that we intend AutoGraph to preserve the semantics of existing well-behaved code. By well-behaved I mean, in general, that it runs without raising an exception. So traditional pure Python code should be functionally unchanged under AutoGraph. And the same should be valid for existing TF v1 graph code. Now, to be clear, if you give such code to AutoGraph, it will transform it. But you should not expect the functionality of the transformed code to change. Then, with respect to Eager code, AutoGraph obviously supports a subset of Eager. But for the parts that it does support, those should also preserve functionality as they go back and forth between Eager and AutoGraph. That means that AutoGraph code should not change its functionality when executed Eagerly. And that essentially is what lets you remove the tf.function annotation without having to modify the code in any way. So at least in theory, when you remove tf.function, the behavior of the code should not change. Yes?

AUDIENCE: Does this actually happen at all, Eagerly executing AutoGraph code? I guess I sort of assumed that we just disabled tf.function at the same time we disabled AutoGraph.

DAN MOLDOVAN: Yeah, you disable everything.
Basically, you run the code exactly as it looks.

AUDIENCE: So this is something that could happen. You could execute--

AUDIENCE: You said it had a flag that turned off.

DAN MOLDOVAN: Yes, there is a flag to turn it off, or you could remove the tf.function annotation. Either of those should not change the behavior of the code.

AUDIENCE: So that flag still does the AutoGraph transformation. It just runs it Eagerly?

DAN MOLDOVAN: It doesn't do the AutoGraph transformation anymore.

AUDIENCE: OK. So there is a mode where we can do the AutoGraph transformation and then still run Eagerly?

DAN MOLDOVAN: There is, but not with tf.function. With tf.function, it's either running graph with AutoGraph or running Eager without AutoGraph.

AUDIENCE: Well, like the explicit convert API, if I ran that--

DAN MOLDOVAN: Yes, with that one, you could potentially create some graph-like code and execute it Eagerly. And that one should also preserve its functionality. But that's, of course, in the Eager semantics, right? Eager should execute graph code as normal, right?

AUDIENCE: But there's no reason to do that, right?

DAN MOLDOVAN: No, this is just for kicks, I guess.

AUDIENCE: So if you want to maybe debug an issue with AutoGraph, right?

AUDIENCE: It's true, yeah.

AUDIENCE: Theoretically, you could.

DAN MOLDOVAN: You could. But even then, keep in mind that there's code that runs in graph mode and there's code that runs Eagerly, so just make sure that those two are truly consistent, right?

An implication of these guiding principles is that code which does not depend on TensorFlow objects will not run as a TensorFlow statement. And that should hopefully make it easy to reason about code. So let me show you an example. Of these for statements, for loops, the one at the top is legal Python code. It doesn't depend on any tensors. Therefore it will run as a Python loop. The other three will run as TensorFlow loops because, well, one depends on tf.range, another one depends on a dataset, and the third one depends on a distribution strategy.

In this last section, I'll go through some usage examples which I believe are most interesting from a user's perspective. Now, I will focus a bit on use cases which are illegal in AutoGraph, because we have lots and lots of samples of fairly complex code that works, but we have fewer examples of code that doesn't work. So here they are.

I'll begin with control flow. And as a warm-up, I'll show the ideal code for AutoGraph. This is definitely code that AutoGraph can handle. And it's the code that we like most. And that's because it does its operations in plain sight, no hidden side effects, no hidden triggers. Everything is plain. Another example that works well is using statements like break. As we've seen, these are lowered, so AutoGraph can deal with them fine as well.

Now, here's an example of code that has certain limitations when running as TensorFlow control flow. This if statement depends on a tensor, therefore it will run as a tf.cond. However, notice that this x variable is only initialized in the if branch and not in the else branch. And TensorFlow, as we know, does not have the notion of None values or undefined variables. So we cannot just do this in a tf.cond. And instead we raise an error. And this is actually one of the better error messages that we raise, where we explain that you have to initialize x in the else branch.

AUDIENCE: So here you could do this by making x into an optional value.
DAN MOLDOVAN: That is true, yes. And that's why I'm very excited about optionals.

AUDIENCE: What if x is local to that branch? What happens then?

DAN MOLDOVAN: Then it's fine, yes. If it's local, then it's fine. That's why we go through all that pain to do liveness analysis and the renaming, just so that local variables don't trip it.

Now, the same restriction can extend to things that you might not expect. For instance, a return in only one branch would cause the if statement to deal with an undefined value. This is also illegal. And the error message is pretty nice in this case as well. It tells you that you have to return a value from the else branch as well. Another example of the same limitation, this time involving object properties. In this case, the error message is a bit confusing, and we're working to fix it. The error message is, in my opinion, very confusing, because it's about a tf.cond that would execute the else branch. But there's no else branch. So when is it trying to access it? It's definitely a confusing error. But it will hopefully be much more friendly soon.

One quick note: these limitations around None or undefined symbols can easily be avoided by initializing your variables early. So if you initialize, for instance, our x at the top with some default value, then everything would work nicely once more. So they're fairly easy to prevent. Now, if I could be pedantic for just a moment: when you have to deal with situations where you have some default values, I definitely recommend that you have a separate Boolean to represent the state of "not initialized" or "not valid", rather than using magic values. Doing that can save you a world of pain later on. And this is totally unrelated to AutoGraph. It's just a recommendation of good practice in general.

Now, a more significant limitation in TensorFlow control flow is around hidden side effects, as we actually had a question alluding to earlier. So let's look at this example where we have a simple helper method that mutates the state of self, and then there is another method that calls this helper. When converting this larger method, this method f, AutoGraph's static analysis, when looking at the variables for that if statement, has no way of seeing that self.a is being modified, because that's hidden inside the helper method. And we do not do cross-function analysis. Well, not yet, at least. So that means that the tf.cond will ultimately end up not accounting for that modification to self.a, and you get this rather obscure error message, which is quite confusing. Once more, the error message can and hopefully will be more helpful. But the limitation itself remains. Solutions are definitely possible. And I think they would make a nice future project. But for now, they are a matter of future development.

Now, a good defense against these kinds of patterns is to use a more functional style in your code. Functional style means that if your function modifies a value, it returns it. And that helps AutoGraph. For instance, in this example, if we modify our code to return the new value that should go into self.a, and then do the assignment in plain sight inside the converted function, then things are happy once more, and everything works. And this is my last bit of pedantry, I hope, for this talk. In general, functional style tends to be loved by machines.
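A minimal sketch of that rewrite, with illustrative names (the actual example on the slide differs), might look like this:

    import tensorflow as tf

    class Model(object):

        def __init__(self):
            self.a = tf.constant(1)

        def next_a(self, a):
            # Functional style: take the current value in, return the new one,
            # instead of assigning to self.a behind the scenes.
            return a * 2

        @tf.function
        def f(self, x):
            a = self.a
            if x > 0:              # becomes a tf.cond when x is a tensor
                a = self.next_a(a)
            self.a = a             # the assignment happens in plain sight
            return a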
Compilers and static analyzers have a much easier time dealing with functional code, code that takes its inputs as arguments and returns everything that's modified. And sometimes it can help the code become more readable, too.

Next up, a few examples around datasets, which are quite satisfying, in my opinion, because they show that the underlying tf.data and distribution strategy APIs are powerful enough to facilitate these kinds of conversions. So the first example is that iterators, the tf.data iterators, work in almost all cases. And we'll see the few exceptions in a second. But essentially, you can take the iterator, you can go through part of it, you can break out of the loop, and then you can resume it. And everything works as you would expect. And if you're curious, the implementations of for loops over tf.data iterators and datasets are quite an interesting read. I think they're quite a nice feat. The code, with all those callbacks, might be difficult to follow. But in my opinion, it's quite interesting. And you can find it in the specialized control flow operators of AutoGraph. For datasets especially, to handle a for loop we actually end up applying three operations in sequence: scan, take_while, and reduce. And I think that's pretty nice. Consuming an iterator with the next function also works. And just to be clear, this is code that runs as graph code.

Next, let's talk about handling runtime exceptions. And since we were just talking about iterators, let's talk about a common pattern in Python. It's generally considered Pythonic to just try things and catch an exception if they don't work. So instead of a pattern where you say, if I can do foo, do foo, the pattern is: try to do foo, except I can't do it, right? And one of the most common uses of such a pattern is the use of iterators, where you have a loop, and inside the try/except block you just try to call next. Now, for pure Python control flow under AutoGraph, this works just fine. It works the way you would expect. Unfortunately, that doesn't work for TensorFlow control flow. And that's for a twofold reason. One of them is that there is no exception catching in graph mode. There is no op that catches TensorFlow runtime exceptions. Now, on the other hand, you could conceive that AutoGraph could lower exceptions. I mean, we lower return statements, therefore we should be able to lower exceptions as well. However, that could make the code prohibitively complex and slow, for instance, because any statement could conceivably raise an exception. You would have to wrap each line with an if statement. So the lowering would not look very pretty, at least not in the trivial case.

The implication of this is that if you have a TensorFlow control flow statement and you wrap a next call into such a try/except block, AutoGraph will not complain about it. It will leave the try/except in the code. It will not transform it. But if you think about it, the effective code will completely ignore it. In terms of runtime TensorFlow exceptions, there is no catching inside the graph. Any runtime error will bubble all the way through the TensorFlow runtime. So essentially what this means is that if you put a try/except inside graph code, the exception will not be caught. It will just fly past the except statement. It also means that if you do end up trying to catch exceptions, you should do that in Eager mode, outside of the tf.function.
Because in Eager mode, you can catch them, right? The runtime exception bubbles through the TensorFlow runtime. And then it's captured by tf.function and re-raised as a Python exception. So you can catch it, just not inside the graph.

AUDIENCE: Should we add try/catch to TensorFlow?

DAN MOLDOVAN: I would love that. I don't know what the implications for optimization and XLA are.

AUDIENCE: I think it's a little scary because of the unknown semantics of what ops could be running in parallel at the same time when only one of them generates an error. So it's [INAUDIBLE] the other ones get canceled or what.

DAN MOLDOVAN: That's true.

AUDIENCE: We can't actually use real exceptions to implement this, because we build without them.

AUDIENCE: Or we wouldn't use C++ exceptions. It would have to be a new TensorFlow runtime language feature.

DAN MOLDOVAN: Exception tensor.

AUDIENCE: You could have an op that takes three function [INAUDIBLE] and calls another [INAUDIBLE] an exception, then always calls the third one to simulate the behavior of try/catch/finally. That would have the consequence that Josh pointed out, that cancellation in TensorFlow is very non-deterministic about what actually gets executed in the presence of an error. But, yeah. It's a separate discussion. I just thought I'd bring it up.

DAN MOLDOVAN: We've definitely stirred the hornet's nest on this. Yes?

AUDIENCE: So a question about the previous slides. How can you tell whether a loop is over a TensorFlow iterator and whether a loop is just over a Python iterator?

DAN MOLDOVAN: That's an excellent question. In this particular case, I'm just implying that the stop condition returns a tensor. In general, we look at the condition of the loop. If the condition object is a tensor, then it will be a tf.while_loop. If it's a Python Boolean or whatnot, then it will be a plain Python loop, and it will be unrolled. Does this answer your question?

So, going past adding exception catching to TensorFlow, there is good news about this. And that is that in most code, you can avoid having to catch exceptions altogether. For instance, with datasets, you can transpose the computation a bit. Instead of having a while loop, you can have a for loop over the dataset, and that will make sure that the loop stops when the dataset is exhausted. And then you move the condition inside an if statement, because we do support break statements. So these two pieces of code, the one on the right and the one on the left, are functionally equivalent. And if you squint, I think it's actually even shorter. So I dare say it's actually cleaner.

Next up, let's discuss collections a bit. Again, in normal pure Python code, this code snippet is, as you'd expect, very common: you just take a list and operate on it. We do that in Python a lot, right? And as with any other pure Python code, this works just fine under AutoGraph. But if the loop is a TensorFlow loop, things don't work anymore. And once more you get a rather obscure error message that we will hopefully fix soon. But in general, the rule is that you cannot operate on a Python collection from TensorFlow control flow. That just doesn't fit, and it's not supported by AutoGraph. Instead, it's a good practice to use specialized TensorFlow collections, like, for instance, TensorArray. And it's a good idea to do that even when you work in Eager mode, because it means you don't have to modify the code if you ever want to go to AutoGraph.
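For instance, a minimal sketch of that pattern (the function name is illustrative) could be:

    import tensorflow as tf

    @tf.function
    def collect_squares(n):
        # The TensorArray plays the role of a Python list, but it works
        # inside TensorFlow control flow.
        ta = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
        for i in tf.range(n):
            ta = ta.write(ta.size(), i * i)
        return ta.stack()

    # collect_squares(tf.constant(5)) -> [0, 1, 4, 9, 16]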
AUDIENCE: You don't want to do that transformation yourself, to switch it to a TensorArray?

DAN MOLDOVAN: The problem with that transformation is that it's difficult to do.

AUDIENCE: It's sort of retroactive [INAUDIBLE].

DAN MOLDOVAN: The main problem, if we look back at this slide, is that there is this l being initialized as an empty Python list. First and foremost, we don't know the type of that list. And at this line, it's unclear whether the user even wanted a Python list or a TensorFlow list. So we'd be forced to make assumptions that would violate the semantics of normal Python code. It's definitely a challenge. That's why we resorted to the rule that, OK, if you want a TensorFlow list, please be explicit about it. And we'll offer as much syntactic sugar around that as possible. But you have to explicitly request it.

Let's see a few other examples that need special attention, this time around loops that change the type or shape of their variables. So first off, a probably familiar example: a TensorFlow while loop. You are probably already aware that TensorFlow limits the degree to which tensors can change shape or dtype inside the loop itself. And you typically get an error message of this kind, saying that some tensor enters the loop with one shape and has a different shape after one iteration. Now, there are ways to deal with this, one of which is specifying shape invariants for the loop. And we're working to add support for that in AutoGraph as well, with a special function call that lets you specify them. Another thing that we're working to address is making the error message more useful as well. For example, here it would be nicer if the error message said something about the variable a, rather than some obscure cond_0.

A more subtle effect of changing types in a loop is shown here. This variable a starts as a plain Python value. And then inside the loop, it becomes a tensor. Now, according to AutoGraph's dispatch rules, when we execute the first iteration, it would appear that it's a Python loop, because a is a Python scalar, therefore, Python loop. But after the first iteration, it would appear that the loop is, in fact, a TensorFlow loop. So we're working to improve the error message here as well, to be explicit that that happens. But in the future, we hope to actually just deal with this directly. For instance, you can envision that we could cycle through the loop a couple of times, and if we decide that the Python loop should become a TensorFlow loop, we just do that. And then we'd only have a few unrolled iterations before the loop.

Now, going back to exceptions and errors, let me show you a few examples of how AutoGraph modifies them so that they don't point to generated code. First, graph construction errors are modified by expanding their error message. And that is purely the error message itself. We don't touch the traceback of that error. That will still point to generated code. However, the error message has this stack trace-like message that helps you locate the cause of the error in the original source code. And if you think about it, this is very similar to the way TensorFlow runtime errors have a second stack trace showing you the location where the op was created. Now, this augmentation of the error message is done using a mechanism that wraps the original exception into a new one. And unfortunately, we don't have time today to discuss a lot of the details of how it's done.
But what's important to mention is that most of the time, the type of the error does not change. So if your op raises a ValueError, then users will see a ValueError as well. It's just that its message will be changed. However, sometimes, when we cannot replace errors using the same type, you might see a StagingError instead. And that typically happens when you have constructors that are complex, and we cannot figure out how to call the constructor in a way that keeps the data. Most errors just have a simple init constructor with just a string message. And there we can just create a new error of the same type with an expanded message. But where we can't do that, you will see this StagingError. The original type of the exception can still be recovered. So you can still inspect the exception to find the original exception type and its original message and so forth.

Now, lastly, runtime errors are modified in a similar fashion so that they don't point to generated code. In this case, the error message is not actually changed from what was originally raised. We simply replace the references. So here-- I ran this code in IPython, and that's why you see this ipython-input reference. That's the reference to the cell. What's important is that you don't see a reference to some temporary file that contains generated code. Now, once more-- I probably repeated this quite a bit in the talk-- we try as much as possible to remove the references to generated code from error messages. If you do see any messages where that's still not the case, please do file a bug.

One last example that I want to show is how decorators are handled. In general in Python, decorators are just syntactic sugar. They are just higher-order functions that get executed when the code is loaded. That's also why decorators are actually difficult to detect: they're not materialized in the AST. At least most of the time they're not. Anyway, when you convert decorated functions with AutoGraph, the decorator will be converted. And that's the reason why you will usually see the source code of the wrapper. For instance, if you have this decorator that just replaces the function with a wrapper, and you try to convert the decorated function, you will see the source code of the wrapper instead. That should be no cause for alarm, because the recursive conversion will step into the wrapper and convert the target function as well. So things do work as expected.

That's it for now. That's all I had for today. There are a lot of other topics that I didn't cover for lack of time. And I hope these and many other examples will be discussed in more detail in a more comprehensive reference guide that is currently in the works. And that's it. Thank you very much.

[APPLAUSE]