Inside TensorFlow: Functions, not sessions

  • ALEX PASSOS: I'm Alex Passos, and I'm

  • here again to talk about functions, not sessions.

  • tf.function is the new way of using graphs

  • in TensorFlow in TF2.

  • All the material I'm going to cover here,

  • the design and the motivation, is mostly

  • described in one of the RFCs in the TensorFlow community GitHub

  • repo.

  • So if you go to github.com/tensorflow/community/rfcs,

  • you will see an RFC with exactly this title

  • in there, where we go over a bunch of the motivation and a bunch

  • of the high-level design.

  • So here I'm mostly going to focus

  • on some nitty gritty details of the motivation

  • and more details about the implementation.

  • And things that if you're working on TensorFlow

  • and you're using functions to do something

  • or you're curious about function internals,

  • I hope to at least point you to the right

  • places to start reading the code to understand what's happening.

  • I'm mostly going to focus today on the high-level Python

  • side of things.

  • And there's another training session later.

  • I think the title's going to be eager execution runtime.

  • That's going to focus more on the C++ side of stuff.

  • So I think to understand functions,

  • it helps if you understand where we're coming from,

  • which is the session.run world in TensorFlow 1.

  • And I think in TF1, when TF was originally designed,

  • it was designed as a C++ runtime first and only later came

  • a Python API.

  • And as far as a C++ runtime goes,

  • the API of graphs and sessions is pretty reasonable.

  • So you build a graph by some function

  • that the runtime does not care about,

  • and then you connect to the runtime

  • by opening this session.

  • This connection is important because a runtime can

  • be local, can be distributed.

  • There are all sorts of in between things.

  • And to actually run computation, you just call session.run.

  • Because you have a graph, you give it

  • the names of your inputs, the names of your outputs,

  • the names of particular nodes that you want to run.

  • And the runtime will go do its thing and return to you

  • the results as normal C++ arrays that you can use to manipulate

  • your data.

  • So this is very nice and convenient if you're writing

  • in C++ and if you're programming at this level.

  • You generally write the code that looks like this once,

  • and you spend your entire life as a TensorFlow developer

  • writing the little part that I abstracted out

  • called BuildMyGraph.

  • And I think it's an understatement

  • to say that just manually writing protocol buffers

  • is very awkward.

  • So we very, very quickly decided this is not a good way to go

  • and built an API around it.

  • And the first version of the API was very explicit.

  • So you created a graph, and then every time you created an op,

  • you pass the graph as an argument,

  • and this is still fine because it's very explicit

  • that you're building a graph.

  • So you can have this mental model

  • that you're building a graph that then you're

  • going to give to a runtime to execute.

  • This is not really idiomatic Python code.

  • So it's also very easy to see how to make

  • this idiomatic Python code.

  • You just stick the graph in a global context manager

  • and add a bunch of operator overloads and things like that.

  • And you end up with code that looks

  • like what TensorFlow code looks like today, which

  • is a little unfortunate, because the same code, by reading it,

  • you can't really tell whether an object is a tensor,

  • and hence only has a value during

  • an execution of a session, and is

  • thus a deferred quantity, et cetera, et cetera, that

  • has a name that might have some known

  • properties about the shape, but not all,

  • or if this is just a normal Python object or NumPy array.

  • And this creates a lot of confusion

  • and I think leads to a very unnatural programming model.
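
A minimal sketch of this graphs-and-sessions world, using the tf.compat.v1 API:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Build a graph: these lines create symbolic tensors, not values.
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
w = tf.constant([[1.0], [2.0], [3.0]])
y = tf.matmul(x, w)

# Connect to a runtime (local here; it could be remote or distributed)
# and run the computation by feeding inputs and fetching outputs.
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))
```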

  • The session.run thing also has a granularity problem,

  • which is that in the way it was originally built,

  • the graph-- like, the stuff that you pass to session.run--

  • is a quantum of all the stuff you want to execute.

  • And around it is this very rigid boundary

  • where you keep stuff in the host memory of your client program,

  • give it to session.run, and then get results back

  • into host memory of your client program.

  • So one example that I think is illustrative of why this is not

  • ideal is if you have a reinforcement learning agent

  • that's implemented over a recurrent neural network,

  • in that scenario, your agent's going to run a loop where it's

  • going to read an observation from your environment, which

  • is some arbitrary code that runs in your host,

  • and has some state.

  • The state is initialized at zero.

  • The agent looks at the observation,

  • runs it through a neural network,

  • and that neural network spits out a new state and an action

  • for the agent to perform in the environment.

  • You take that action, bring it to client memory,

  • give it to the C++ code for an environment, your Atari game,

  • or whatever.

  • That will run for a while and then give you back a new state.

  • You want to ship this new observation and the old state

  • back to the RNN.

  • But if your RNN is running on another device, say, a GPU,

  • there was really no reason for you to ship your RNN state back

  • to your client and then from the client back to the device.

  • So the boundary for stuff you want to run here

  • is not really working.

  • The boundary for stuff you want to run

  • is not the same as the boundary for stuff

  • that wants to live in on a device

  • or wants to live in the host.

  • And this gets even more complicated

  • once you put automatic differentiation into the story,

  • because TensorFlow uses the symbolic representation

  • for your computation that we call a graph.

  • We do automatic differentiation on

  • this symbolic representation.

  • So now the graph not only has to be a quantum for stuff

  • you want to run, but it has to be a quantum for stuff you

  • differentiate.

  • So if you stick to this reinforcement learning agent

  • example, a popular thing that people

  • used to do, before the substantially better

  • deep reinforcement learning algorithms we have now, is policy gradient.

  • And the simplest policy gradient algorithm is called REINFORCE.

  • And what it amounts to doing is it will run your agent

  • for m time steps.

  • You'll get the probability of the agent

  • taking the actions that it actually took.

  • And you'll take the gradient of that probability,

  • multiply by the reward your agent got,

  • and apply this to the weights.

  • And now not only do we want to avoid transferring the RNN

  • state back and forth between the host and your accelerator,

  • but also you want to back prop through a number of steps

  • that might not even be known before you

  • start your computation.

  • Another issue is that session.run has a kind

  • of like--

  • it asks for too much information every single time.

  • So what every training loop or inference loop or anything using

  • TensorFlow looks like is not--

  • well, not every, but what most look

  • like-- is not a single call to session.run,

  • but a bunch of calls to session.run in a loop.

  • And in all those calls, you're executing the same tensors,

  • you're fetching the same tensors,

  • and you're feeding the same symbolic tensors slightly

  • different numerical values.

  • And because the session.run API doesn't

  • know that you're going to be calling

  • those things in a loop where most of the arguments-- where

  • some things don't change and some things do change,

  • it has to re-perform a bunch of validation.

  • And so we put a cache in front of that validation.

  • And computing that cache key becomes a performance problem.

  • Derek had the idea of just separating the stuff

  • the changes from the stuff that doesn't change

  • into this session.make_callable API

  • where you call it once with this stuff that doesn't change,

  • and you get back a function that you

  • call with just the stuff that changes.

  • So now all the validation that you're performing n times

  • is of the stuff that changes.

  • And the validation that you're performing only once

  • is of the stuff that stays the same.

  • This is not just a performance win,

  • but it's also kind of a usability win,

  • because just by looking at the call to your code,

  • you know what is fixed and what is variant.
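
A minimal sketch of the make_callable split, using the tf.compat.v1 API:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[])
y = x * 2.0

with tf.Session() as sess:
    # Validate the fixed parts (the fetches and the feed structure) once...
    step = sess.make_callable(y, feed_list=[x])
    # ...then call repeatedly with only the values that change.
    for i in range(3):
        print(step(float(i)))
```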

  • And finally, the last like, very awkward thing about session.run

  • is that graph pruning is a very complicated model to program

  • to when you're writing in an imperative host programming

  • language.

  • So for example, I have my first function in there

  • where I create a variable, I assign a value to it,

  • I increment it a little bit, and then

  • it returns something that uses the variable times

  • some constant.

  • And if you just write code like this, because I didn't look

  • at the return value of the assign_add,

  • that assignment will never happen.

  • Like, there's no way to make that assignment happen

  • in TensorFlow because you created a tensor

  • and you threw it away, and you did not

  • keep a reference to it so that you can session.run it later.
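
A minimal sketch of the pruning pitfall just described (tf.compat.v1, graph mode):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

v = tf.Variable(0.0)
update = v.assign_add(1.0)  # reference kept here only to show the contrast
out = v * 2.0

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out))            # 0.0: the assign_add is pruned, it never ran
    print(sess.run([out, update]))  # the update only runs because we kept a reference
```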

  • And you think well, that's crazy.

  • Why don't you just keep those references under the hood

  • and do something magical to fix it?

  • And the problem is that it's very easy for you

  • as a user to rely on the fact that this pruning is going

  • to be performed to try to encapsulate

  • your code a little better.

  • So a design pattern that I've seen a lot

  • is that when you have some structure--

  • so for example, my fn2 there has a reinforcement

  • learning environment.

  • And that env object is some complicated Python thing

  • that knows how to build a bunch of graphs.

  • And you can encapsulate that in a single function

  • in your code that returns how to get the current observation,

  • how to apply an action, and how to reset that environment.

  • So your code is now very concise,

  • but in practice, you have a function

  • that returns three things.

  • And you never want those three things to run together.

  • You always want, at most, one of them

  • to run at any point in time.

  • So this is a little frustrating, because we've

  • kind of locked ourselves out of being able to fix this problem.

  • And TensorFlow has a few partial solutions to this problem.

  • I think the most comprehensive partial solution

  • to the problems in session.run is called partial_run.

  • But it's inherently limited, because it requires you

  • to have a fully unrolled graph.

  • It does not work with arbitrary control flow.

  • And it requires a complicated dance of like,

  • specifying everything you're likely

  • to fetch in the future, then

  • the things you're going to fetch now,

  • and keeping them, passing tensor handles around.

  • And it's very, very easy to make mistakes when you're doing it.

  • Plus, what happens is that you as a user often

  • write a Python function.

  • TensorFlow then runs the function to create a graph.

  • Then we take a graph, we validate it, we prune it,

  • we do a bunch of transformations.

  • And we hope that what we got out to run

  • is exactly the nodes that you had

  • intended to run in that Python function in the first place.

  • But because we have all these steps in the middle,

  • it's very easy to drop things and confuse things.

  • So all these usability problems are

  • inherent I think, to coupling this session run

  • API with this host programming language that

  • tries to make your code look very imperative,

  • like native Python code.

  • And so the way we break this and solve those problems

  • is with tf.function.

  • So what are the core ideas of tf.function?

  • It's that your function's inputs and outputs,

  • they live on devices.

  • They don't have to live on the host.

  • Another thing is that a function is differentiable,

  • and a function is an execution unit.

  • But it's not forced to be the whole execution

  • unit or the whole differentiable thing.

  • Like, you should be able to differentiate through many calls

  • to an execution unit.

  • And you should be able to make an execution unit out

  • of many functions.

  • So this way you get to break that like, single quantum

  • of work requirement of session.run

  • and be able to write your programs in a more

  • idiomatic way.

  • AUDIENCE: Could I clarify the first point?

  • I assume device also includes CPU device?

  • ALEX PASSOS: Yes, it also includes CPU.

  • AUDIENCE: --host memory.

  • ALEX PASSOS: It lives on host--

  • AUDIENCE: [INAUDIBLE].

  • ALEX PASSOS: No.

  • It lives on the host memory.

  • But it's not immediately accessible to your client

  • program.

  • Like, the act of running a function does

  • not immediately-- does not require

  • that its inputs are visible to the client program,

  • and does not immediately make its outputs

  • visible to the client program.

  • To make the outputs visible, you need to run an operation.

  • And to make the inputs--

  • to ship them into the runtime, you need to run an operation.

  • So we put the boundaries not at every call,

  • but at when you want-- when you have the data

  • and when you get the data.

  • Another fundamental property of tf.function

  • is that the notion of what should run

  • is not a property of control edges or graph pruning,

  • but is a property of the stuff that happens

  • in the body of the function.

  • So while we trace the Python code to build the function

  • graph, any stateful operation that ends up in there must run.

  • And it's up to the TensorFlow runtime to run those operations

  • in an order that is indistinguishable from running

  • those operations in order as far as the user is concerned.

  • And finally, every bit of state that outlives the function call

  • should be an argument, either an explicit argument passed

  • by the user from Python, or an argument that's implicitly

  • captured, like a closure capture or something like that, that's just

  • passed to the runtime.

  • And by making all the state that outlives the

  • function call an argument, we get to enforce property three

  • without too much complexity.

  • And incidentally, once you've looked at those requirements,

  • you kind of see why we have eager execution.

  • Because once you have the ability

  • to run functions like that, really, every single operation

  • should act like a function if you just run it,

  • which is why I think your execution is

  • important as a mental model, even if in practice, you

  • might want almost all of your code to live inside

  • of a graph for performance or for deployability to lighter

  • runtimes and things like that.

  • So once you do go that way, there

  • is a problem, because you now can

  • take an arbitrary piece of TensorFlow code

  • and run it eagerly or run it inside a tf function.

  • And this means that any semantic difference between those two

  • modes is going to break the abstraction barrier,

  • and can cause a lot of lost productivity, which

  • is why we want to do things like autograph, which is why we want

  • to do things like automatic control dependencies,

  • and try to have TensorFlow work as hard as it can to reduce

  • those differences and make them explicit

  • and raise errors as soon as we can so that we don't get--

  • lock ourselves into a bad state.

  • And the really big caveat, that is the easiest place for you

  • to make mistakes is when it comes to variables,

  • because variables in tfv1 are--

  • if you've watched the previous video, you should know--

  • they behave very differently from how

  • you'd expect variables to behave in an eager way.

  • So this is one of those places where naively writing code can

  • very easily get you in trouble even with code

  • that looks very reasonable.

  • And my favorite short example is this, a function

  • that creates a variable and returns an operation that

  • uses the value of the variable.

  • If you run this in tfv1 or in graph mode, what will happen

  • is you will run this Python code once,

  • and then you'll call a session.run

  • on the result many times.

  • And moreover, as a side effect of running this code,

  • that variable is going to be added

  • to some global graph collection that

  • will allow you to modify its value later,

  • even though it's completely scoped inside this function.

  • So in tfv1, you run this function,

  • it uses a single variable, and you get to modify its value,

  • and then call session.run on the result

  • and get different numbers out.

  • But in eager mode, every time you run this function,

  • we create a new variable, do a matrix multiplication,

  • and throw it away.

  • So if you have code like this, your code

  • is something that's going to visibly see

  • the differences between TensorFlow v1 and TensorFlow

  • v2.

  • So we just disallow this.

  • And there are a few reasonable options

  • we could do with tf.function, which

  • is to say that tf.function should

  • follow the eager semantics for variables.

  • So when you have code like this, we

  • will insert a create variable op, insert a matmul,

  • insert a destroy a variable op, and then return

  • the value of the matmul.

  • We could also choose that tf.function

  • is going to follow v1 graph semantics for variables.

  • And every time you create a variable,

  • we reuse it based on the name or something like that.

  • These are very easy options to implement.

  • A third option that also very easy to implement

  • is just disallow creating variables in tf.function.

  • These are reasonable options relatively

  • straightforward to implement, and not what we chose to do.

  • What we choose to do is a compromise

  • that would allow us to turn more code into tf.functions

  • while avoiding allowing code that would behave differently

  • in Eager and in graph mode, and while avoiding breaking

  • the expectations of code that was written with tfv1 in mind

  • and then got wrapped into tf.function so it

  • works inside tfv2.

  • And the compromise we adopted is that if you just

  • try to create a variable inside of tf.function

  • that's disallowed, and it raises an exception.

  • However, if you guard your code so that the variable is only

  • created the first time the function is called,

  • we allow it.

  • And the reason why we allow this is

  • that if you run a function like the one in the bottom eagerly,

  • it'll create a variable a single time, use it many times.

  • And that will have exactly the same semantics

  • as just calling session.run many times on the result

  • of the function on top.

  • So this way by like, writing your code

  • in a way that promises that you will respect the semantics,

  • that you'll act in a way that does not

  • see the difference in semantics between eager and graph,

  • we allow it.
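
A minimal sketch of the allowed pattern: guard the creation so the variable is only made on the first call.

```python
import tensorflow as tf

class Layer:
    def __init__(self):
        self.v = None

    @tf.function
    def __call__(self, x):
        if self.v is None:  # trace-time check: only true on the first call
            self.v = tf.Variable(tf.ones([3, 1]))
        return tf.matmul(x, self.v)

layer = Layer()
print(layer(tf.ones([2, 3])))  # creates the variable once
print(layer(tf.ones([2, 3])))  # reuses it; same semantics eagerly or traced
```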

  • Right now we have a relatively big hammer

  • to detect these failures.

  • But I expect that over time, we'll

  • make this a little more precise, and we'll start

  • allowing more and more code.

  • So for example, an issue now is that if you create

  • a tf.function, and you pass something

  • like an optimizer as an argument to the function, we trace it once.

  • The first time you use an optimizer,

  • the optimizer might create some variables for the internal

  • state of Adam or Adagrad or momentum

  • or something like that.

  • But now if you pass a different optimizer,

  • this different optimizer might try

  • to create its own variables.

  • And this is again, perfectly safe code,

  • because you're passing the optimizer

  • as an argument, which means we're retracing the function.

  • So we're capturing the graph creation.

  • But currently, as of the recording of this video,

  • we raise an exception.

  • It's an exception that we can probably

  • stop raising if we're a little more precise.

  • And I'd like to do this at some point.

  • But the idea is that I want to expand the scope of code that

  • we allow that creates variables in tf.function until it

  • encompasses--

  • as long as it still encompasses code

  • that behaves the same in eager mode and in tfv1.

  • Just because this way there is less room for mistakes.

  • Once there is no more tfv1 code out there,

  • then I think we can flip the flag

  • and allow all variable creation inside tf.function

  • with fully eager semantics.

  • But that's going to take a while.

  • AUDIENCE: How do you detect like,

  • if the code is in a disallowed format?

  • ALEX PASSOS: Ah, I will actually go for a slide of that later.

  • But essentially, what we do is we run it twice.

  • We run it once, see if you've created any variables.

  • If you haven't created any variables, you're safe.

  • But if you have created some variables,

  • we retrace your Python code again, see

  • if it creates new variables again,

  • and raise an exception if it does.

  • And we also set it up that every time we ever need

  • to retrace your tf.function, if you do create variables

  • on a subsequent call, we'll raise an error.

  • Another issue is that Python is a very dynamic language.

  • While TensorFlow graphs are very static--

  • TensorFlow graphs are not as static as XLA HLO graphs,

  • but they're still very static when

  • it comes to the types of things and the number of outputs

  • of an operation.

  • And sometimes also TensorFlow graphs

  • are static when it comes to shape in

  • that if you run the same graph building code, with input

  • tensors that have slightly different shapes,

  • we will generate different graphs.

  • When we know more about the shapes, sometimes they're

  • specialized to generate a faster graph that

  • knows more information statically, or can

  • do some assertions and validation statically, instead

  • of having you do them at runtime.

  • So tf.function has a choice, which

  • is to either trace a function once and raise an error if we

  • think you're calling it with arguments that

  • are incompatible with the argument

  • that we use to trace, or accept that we'll likely need to trace

  • the function many times, and set a policy for how we do this.

  • We chose to go with option two.

  • And the policy mostly looks like we use Nest--

  • the tf.nest library to unpack your inputs.

  • Once you've unpacked your inputs,

  • we kind of split them into Python objects and tensors.

  • We replace the tensors by TensorSpec,

  • which is exposed in the tf public API.

  • And it just has a shape, a dtype, and a name.

  • And then we re-form the thing into a structure,

  • and use that as a key into a dictionary.

  • AUDIENCE: What's the name here?

  • ALEX PASSOS: If you are in eager mode,

  • there is no such thing as a name.

  • But if you're building a graph, we do look at the name to--

  • the name of a tensor in graph mode, just

  • to try to preserve a little more information.

  • So we use this whole structure as a dictionary key.

  • We actually have to not quite use the exact structure.

  • We replace lists with tuples and dictionaries

  • with lists of pairs, and a few other things

  • just to make sure that Python doesn't yell at us

  • for trying to put unhashable things in a dictionary.

  • But the idea from a mile away, is that anything

  • that you pass through a tf.function that is a Python

  • object, if you change that in a way that a Python

  • dictionary would notice, that will trigger a tf.function

  • retracing.

  • If it's a tensor, though, we explicitly do not key

  • on the value of the tensor.

  • We key only on its shape and type.

  • And retrace only when the shape and type changes.
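
A minimal sketch of that policy:

```python
import tensorflow as tf

@tf.function
def f(x, training):
    print("tracing:", x.shape, training)  # Python print fires once per trace
    return x * 2

f(tf.ones([2, 2]), True)    # traces
f(tf.zeros([2, 2]), True)   # same shape/dtype, same Python value: no retrace
f(tf.ones([3, 2]), True)    # new shape: retrace
f(tf.ones([3, 2]), False)   # new Python value: retrace
```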

  • This is a little controversial, because there

  • are some types of Python values, specifically

  • scalars and NumPy arrays, that you

  • might want to treat as tensors.

  • And this is a decision that we might

  • want to revisit at some point, because it

  • leads to a lot of problems for our users.

  • But for now, we're conservative and retrace

  • when you change the identity of Python values.

  • So as I was saying, there are some downsides of this choice.

  • And the two biggest ones--

  • or the first, is that we do too much retracing,

  • as I just mentioned.

  • And the second one is that shapes are hard.

  • Specifically, the more we have static shapes,

  • the more efficient we can make our graphs.

  • And in practice, due to the hardware that we use

  • and the way we write, you kind of want to have your code--

  • your graphs mostly have static shapes in them

  • so that we can be as performant as possible.

  • However, it's often very convenient

  • to have things like a dynamic batch size

  • or a dynamic sequence length.

  • And those things might not even incur very large performance

  • penalties.

  • On GPUs, for example, dynamic batch sizes and static batch

  • sizes, they tend to consume about the same amount of time,

  • not necessarily on TPUs.

  • However, if we try to relax the shapes as you

  • call the function, retracing your code with a shape

  • that has partially unknown dimensions

  • might cause some graph building code to explode.

  • So we have an optional way for you to--

  • we have a few ways for you to control this now.

  • And we'll try to refine the default

  • policy to make it better.

  • But you can always choose the extremes.

  • So we give you essentially three knobs to control tracing.

  • So if you have a tf.function that you

  • built over a Python function, if you want to force it to retrace,

  • you can always build another tf.function object.

  • And two separate tf.function objects share no state.

  • This is a cheap way to force a retrace.

  • This gets you around the limitation

  • of not creating variables.

  • It gets you around the shapes and things like that.

  • You also have the flip side of this,

  • which is to prevent retraces, you have two options.

  • One is you can call get_concrete_function on the tf.function

  • object.

  • On that you pass a signature.

  • And you get a function back that you

  • can call with that signature.

  • And it will specialize on the particular properties

  • of that signature.

  • Or you can pass a signature when you define your tf.function.

  • That also works.

  • And finally, you have an experimental knob

  • whose behavior is likely to change in the future,

  • that if you set it to true, will try to relax shapes for you.

  • So if you know you're likely to have shapes with dynamic batch

  • size or dynamic sequence length, and a few other cases

  • where running in your graph building code with partially

  • unknown shapes is perfectly fine,

  • then you can set this to true and enjoy fewer retracings.
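
A minimal sketch of the three knobs (the last flag is the experimental one, so its name and behavior may change):

```python
import tensorflow as tf

def square(x):
    return x * x

# Knob 1: a fresh tf.function object shares no state, so it retraces.
f1, f2 = tf.function(square), tf.function(square)

# Knob 2a: pin a signature via get_concrete_function...
concrete = f1.get_concrete_function(tf.TensorSpec([None], tf.float32))
concrete(tf.constant([1.0, 2.0]))

# Knob 2b: ...or declare it at definition time.
f3 = tf.function(square, input_signature=[tf.TensorSpec([None], tf.float32)])

# Knob 3: opt in to shape relaxation to trace less often.
f4 = tf.function(square, experimental_relax_shapes=True)
```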

  • We might need to add newer knobs.

  • But I think our policy--

  • I think this is mostly fine.

  • And we'll iterate and refine the existing tracing policy

  • to make it better.

  • AUDIENCE: [INAUDIBLE] tracing happen

  • when you call the function the first time or every time?

  • ALEX PASSOS: We try to trace as little as we can.

  • So the first time you call it, we clearly have to trace.

  • There is no alternative, because you don't have a graph.

  • The second time you call it, if you call it with

  • the same Python objects

  • and tensors of compatible shapes and types,

  • we will not retrace.

  • But if you change the shapes and types of tensors,

  • then we're likely to have to retrace them.

  • AUDIENCE: Question.

  • So when the trace cache key, does it

  • include a global variable access [INAUDIBLE] function, or just

  • [INAUDIBLE]?

  • ALEX PASSOS: We do not put in the cache key

  • the variables accessed by the function

  • because I don't know how we would check this

  • without running the Python code.

  • AUDIENCE: So which means it may use

  • a change in a type of [INAUDIBLE] we

  • accessed [INAUDIBLE]?

  • ALEX PASSOS: Yes.

  • Any kind of reliance on global Python state that

  • is not an argument to the tf.function

  • might lead to breakage.

  • Yeah.

  • Python is a funny language, because you can actually

  • check this.

  • You can take a Python function.

  • You can get the transitive closure of all the modules

  • and objects it has access to.

  • The problem is that this ends up being in the thousands

  • and thousands of symbols.

  • So we can't feasibly check whether the value

  • of any one of those has changed between function executions.

  • So this is kind of best effort.

  • And again, it's a little bit of a caveat.

  • So if you have global state that you

  • want the tf.function to depend on,

  • put that state in a tf.Variable.

  • Because if you change the value of a tf.Variable,

  • a tf.function will see it.
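
A minimal sketch of the difference:

```python
import tensorflow as tf

scale_py = 2.0
scale_tf = tf.Variable(2.0)

@tf.function
def f(x):
    return x * scale_py * scale_tf

print(f(tf.constant(1.0)))  # 4.0
scale_py = 10.0             # invisible: 2.0 was baked in at trace time
scale_tf.assign(10.0)       # visible: the variable is read at execution time
print(f(tf.constant(1.0)))  # 20.0, not 100.0
```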

  • AUDIENCE: Another question-- so in the actual tracing

  • is [INAUDIBLE] by running that function.

  • Certain functions have a side effect, say [INAUDIBLE] file.

  • Can this be executed twice?

  • ALEX PASSOS: Then again, if you want your side effects

  • to happen as the function is--

  • your Python code is only going to get

  • executed as part of the tracing.

  • So if there are side effects that you care about,

  • you should make those side effects TF side effects.

  • So don't use Python's file writing, use tf's file writing.

  • Don't use Python's file reading, use tf's file reading.

  • Don't use Python random number generation, use TF's random

  • number generators.

  • In general, making anything you care about a tf thing

  • is the way to make it work reliably in tf.function.
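
A minimal sketch contrasting Python side effects with TF side effects:

```python
import tensorflow as tf

@tf.function
def f(x):
    print("Python print: runs only while tracing")
    tf.print("tf.print: runs on every call")
    return x + tf.random.uniform([])  # TF randomness: fresh value per call

f(tf.constant(1.0))  # both messages appear (this call traces)
f(tf.constant(2.0))  # only the tf.print appears
```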

  • And part of this is due to Autograph.

  • I'm not going to talk about Autograph here.

  • There will be another separate training talk on it,

  • because it's full of like, very interesting and cool

  • little bits.

  • AUDIENCE: I don't get how using the [INAUDIBLE] version

  • of these APIs [INAUDIBLE] side effects [INAUDIBLE]..

  • ALEX PASSOS: If you use a tf version of this API,

  • like, if there's a tf thing to write to your file

  • or to generate a random number, if you run it in graph mode,

  • we don't do anything.

  • We just create a symbolic expression that when evaluated,

  • will have the side effect.

  • AUDIENCE: So it actually doesn't execute it?

  • ALEX PASSOS: It doesn't execute it in graph mode.

  • In eager mode, it will execute it.

  • But in graph mode, it just builds a graph

  • that when executed, will have the desired side effect.

  • AUDIENCE: So just to [INAUDIBLE] decide

  • to use Python [INAUDIBLE] It's undefined behavior,

  • essentially.

  • ALEX PASSOS: Yes, it's undefined behavior.

  • And if you want to define it, you

  • need to control how often you trace the function.

  • You can choose also to force Python things to happen

  • using tf.py_function or tf.numpy_function, which

  • will run Python code at function execution time

  • by explicitly delineating the Python code

  • that you want to be dynamic.

  • This has some limitations though,

  • because we're not very good at shipping Python code from one

  • host to another host.

  • So in general, models that rely on py_function or

  • numpy_function are not serializable

  • and they do not run well in distributed settings.
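
A minimal sketch of delineating dynamic Python code with tf.py_function:

```python
import tensorflow as tf

def log_in_python(x):
    # Arbitrary host-side Python, run at function execution time.
    print("saw value:", x.numpy())
    return x

@tf.function
def f(x):
    y = tf.py_function(log_in_python, inp=[x], Tout=tf.float32)
    return y * 2.0

print(f(tf.constant(3.0)))  # the Python print happens on every call
```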

  • So how do we make this work in practice?

  • And I think here, I want to give you

  • a walk-through of interesting pieces in the code,

  • some screenshots, some like, lightly rewritten

  • for readability bits, so that you

  • know what things to look for if you

  • want to understand or change the behavior of tf.function.

  • So the first structure that I think

  • is particularly interesting is the FuncGraph.

  • And the FuncGraph is a subclass of the TensorFlow graph

  • that overrides a lot of interesting behavior.

  • It's where the code to do automatic control dependencies

  • lives.

  • It's where the code to do Autograph lives.

  • It's also where we do closure capturing.

  • And closure capturing is maybe the most interesting part,

  • because in normal TensorFlow graphs,

  • if you try to use a value from outside of the graph,

  • you immediately get an error.

  • But with functions, like in most programming languages,

  • you expect to be able to use values

  • from the defining context.

  • So FuncGraph has some logic to do that.

  • And it has two capturing modes, capturing by value

  • and capturing by reference.

  • By default, we capture by reference.

  • But you can turn capturing by value on.

  • And the way we do this is that when

  • you try to create an operation in the graph,

  • we look at the inputs and capture them if we have to.

  • The way this is done is by creating a placeholder that

  • has the same shape and dtype as the tensor

  • that you're trying to capture, and storing inside the

  • FuncGraph a map from the tensor that we captured

  • to the placeholder that we created,

  • so that later when we call the function,

  • we feed in that tensor as the value for the placeholder.

  • We do this for every external value.

  • Like, we do this with constants, for eager values,

  • for graph values.

  • The way this is set up, capturing a value

  • is something that is visible to the gradient code.

  • So you can differentiate a function with respect

  • to its invisible variable captures.

  • There's a lot of like, little subtle issues

  • there to get right.

  • But the general idea is that we're

  • going to create placeholders of the proper shape and type.

  • And at function call time, we're going

  • to try to pass the original argument as the placeholder.

  • And the nice thing about this is that the point where

  • you try to pass the original argument

  • as the placeholder, if that is happening inside another

  • FuncGraph, and the original argument does not

  • belong to that graph, that will recursively

  • trigger a capture, which will recursively trigger a capture.

  • And this way we'll properly propagate value capture

  • throughout functions that call functions that call functions

  • that call functions.

  • And all of those are going to correctly handle

  • differentiation and a few other things.

  • So it should be mostly seamless.
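
A minimal sketch of that seamless capture, from the user's side:

```python
import tensorflow as tf

v = tf.Variable(3.0)
c = tf.constant(2.0)

@tf.function
def f(x):
    return x * v + c  # v and c are captured, not explicit arguments

with tf.GradientTape() as tape:
    y = f(tf.constant(4.0))
print(tape.gradient(y, v))  # 4.0: gradient w.r.t. the captured variable
```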

  • Another thing the FuncGraph code does is

  • it has the core code to take your Python function

  • and build a FuncGraph from it.

  • As you can see, it has many, many options.

  • There are all sorts of ways you can control this.

  • You can override shapes, you can capture by value,

  • you can add automatic control dependencies,

  • you can do Autograph, you can pass a signature.

  • But this is like the general workhorse

  • that we use every time we're tracing the Python

  • code that you pass us to create a tf.function.

  • And one of the particularly important things that it does

  • is do automatic control dependencies.

  • And what this is trying to do is enforce that inside tf.function

  • program order is execution order as far as the TensorFlow

  • runtime is concerned.

  • We also try to do this in a way that does not

  • harm performance. But if you remember last week's

  • video on resources and variables,

  • we're moving to a model in TensorFlow

  • where all stateful ops manipulate

  • an explicitly named resource.

  • And what this lets us do is that the first version

  • of the automatic control dependencies code was this.

  • Now it's a lot more complicated than that

  • because it tries to handle ops that

  • are stateful that do not have an explicitly declared resource.

  • And it tries to handle control flow v1.

  • So it's far messier.

  • But essentially, all you do is you

  • iterate over all your ops in the graph,

  • look at every input of an op.

  • If an input is a resource, you just

  • add a control edge from the last op that

  • used this resource to this op.

  • And you update a map as you go to make this work.

  • And finally, at the bottom of the function,

  • you just return,

  • for every resource, the last op that

  • was supposed to have used it, so that we

  • can make those operations control outputs of the function.
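
A paraphrased sketch of the algorithm just described (not the actual TensorFlow source, which handles many more cases); the Tensor and Op structures here are stand-ins for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Tensor:
    name: str
    dtype: str = "float"

@dataclass
class Op:
    name: str
    inputs: list
    control_inputs: list = field(default_factory=list)

def add_automatic_control_dependencies(ops):
    last_op_using = {}  # resource name -> last op that touched it
    for op in ops:      # ops in program (trace) order
        for inp in op.inputs:
            if inp.dtype == "resource":
                prev = last_op_using.get(inp.name)
                if prev is not None:
                    op.control_inputs.append(prev)  # run strictly after prev
                last_op_using[inp.name] = op
    # The final user of each resource becomes a control output of the
    # function, so the runtime cannot prune its side effects.
    return list(last_op_using.values())

v = Tensor("v", "resource")
ops = [Op("assign", [v]), Op("assign_add", [v]), Op("read", [v])]
control_outputs = add_automatic_control_dependencies(ops)
print([op.name for op in ops[1].control_inputs])  # ['assign']
print([op.name for op in control_outputs])        # ['read']
```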

  • And these control outputs are important because we can then

  • tell the TensorFlow runtime not to accidentally prune the side

  • effects that we intend to happen.

  • And so if you have operations in a function that an output does

  • not depend on, and that no side effects depend on,

  • we know that we can prune those.

  • And this also means that as we move

  • to a model where TensorFlow is more compiled instead

  • of interpreted, the fact that these control

  • dependencies enforce a linear order,

  • means it's relatively easy to--

  • well, easier to take TensorFlow code

  • and turn it into some intermediate representation

  • that will feed into a compiler, which you might hear about

  • in the MLIR talk.

  • So this just clarifies the semantics

  • and removes a lot of weird behaviors

  • you can get with tfv1.

  • And as you can see, just because we are tracing--

  • what this assumes is that code that--

  • we just want to execute code in program order.

  • And that's a far, far easier thing

  • to guarantee than a complicated pruning and partitioning

  • process like the one we have to rely on in tfv1.

  • The FuncGraph, though, is a Python-only construct.

  • And to actually turn this FuncGraph into something

  • that we can execute, there's a function in the C API

  • that takes a TF_Graph and generates a TF_Function

  • out of it.

  • It again, has a bunch of options so that you control

  • your inputs, your outputs, you can uniquify things, et cetera.

  • But if you're looking for where do we turn a TensorFlow

  • graph into a function, this is the entry point

  • that you want to look at in our C API.

  • However, now we need to call those functions

  • that we've created.

  • And technically, you can call a function

  • just like you would call any op.

  • Once you've registered a function

  • with a graph's function library

  • or into the eager context, you can just

  • use TFE_Execute or TF_NewOperation

  • to define a function call, just like you would

  • define any operation execution.

  • Under the hood, this is going to use an op named

  • StatefulPartitionedCall.

  • So if you look at the source code for that operation,

  • you will see the place where we partition the function graph

  • into multiple devices, in case you

  • want to run your function over multiple GPUs or TPU cores.

  • It's also where we run a lot of Grappler, which is--

  • I hope there will be a separate training talk

  • about Grappler, but it's this cool thing

  • that runs all sorts of graph optimizations

  • and back-end specific rewrites and things like that.

  • And the place in the Python code where

  • we take you trying to call a function,

  • and we turn it into this partitioned call,

  • is this class called

  • _EagerDefinedFunction.

  • So if you search for that in the TensorFlow source code,

  • you can try to read like, how do we exactly set up the function

  • call and correctly handle things

  • like, things that were captured, correctly handling gradients,

  • and things like that.

  • Differentiating functions then, is

  • built on top of the EagerDefinedFunction.

  • And the idea is that if you have a function that you want

  • to call, and that function is differentiable,

  • we need to generate three things under the hood.

  • One is what I call an inference version, which

  • is just a thing that runs the function

  • and returns the results.

  • That's what you're going to call if you're not trying

  • to differentiate through it.

  • But because we do a reverse mode automatic differentiation

  • in TensorFlow by default, to differentiate a function,

  • we need to run a backwards version of it.

  • And the backwards version might need to use intermediate values

  • from the forward pass.

  • So what we do is we generate a clone

  • of your inference function that returns

  • all the intermediate values that the gradient code is

  • likely to need, and then we make another concrete function,

  • which is the backward thing.

  • And the interesting thing is that the forward

  • and the inference versions are just defined functions.

  • They are these things that you can just call.

  • But the backwards thing is a concrete function.

  • So it also has the differentiation code,

  • which means you get to differentiate

  • the gradient of a gradient of a gradient, of a gradient,

  • of a gradient, of a gradient of a function, because it recurses

  • in that direction.

  • So we get a closed system for automatic differentiation,

  • which is really important, because as machine learning

  • research moves forward, generally,

  • more and more algorithms end up relying

  • on limited forms of higher-order differentiation.
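
A minimal sketch of that closed system, taking a second derivative through a tf.function:

```python
import tensorflow as tf

@tf.function
def f(x):
    return x ** 3

x = tf.constant(2.0)
with tf.GradientTape() as outer:
    outer.watch(x)
    with tf.GradientTape() as inner:
        inner.watch(x)
        y = f(x)
    dy = inner.gradient(y, x)  # 3x^2 = 12
d2y = outer.gradient(dy, x)    # 6x   = 12
print(dy, d2y)
```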

  • One thing that you might think about right

  • now is that a particularly important feature

  • to a lot of people who rely on reverse-mode autodiff

  • is to not keep all the intermediate state alive

  • in between the forward and backward pass.

  • And you could make this a feature of tf.function,

  • but I think it should be implemented separately, built

  • on top of tf.custom_gradient, since this is just

  • another customization of the differentiation code.

  • And it's very easy to do this generically.

  • No, it does not depend on tf.function.

  • And this is being added to the v2 API

  • now, even though it has existed for a while.

  • So the rematerialization, recompute-gradients thingy

  • is not going to go away.

  • But it's also completely orthogonal to

  • tf.function, which makes me a little happier,

  • because smaller pieces that compose tend to be nice.
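
A minimal sketch of that composition, assuming the tf.recompute_grad API, which re-runs the forward pass during backprop instead of keeping intermediates alive:

```python
import tensorflow as tf

@tf.recompute_grad
def block(x):
    return tf.nn.relu(x * x)  # intermediates are not kept for the backward pass

x = tf.constant([1.0, -2.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(block(x))
print(tape.gradient(y, x))  # block is recomputed during the gradient
```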

  • And here, on top of this differentiation code,

  • we have the code that goes from an abstract function

  • to a concrete function, which is the thing that

  • does the function cache.

  • And here we start getting into a little bit of cruft,

  • because for a while, the tf function code base was

  • in contrib.eager.

  • And so it had this class, function.Function, that did

  • the going

  • from the cache key to a concrete function.

  • And the difference between the contrib.eager version

  • and the tf.function we have today,

  • is that we have fixed the story around variables

  • to be much more well behaved.

  • So if you look in our code base, you see the function.py file

  • has a function class which does the cache key logic.

  • And the cache key is mostly implemented in C now.

  • There's this C API called

  • TFE_Py_EncodeArg that takes a bunch of arguments,

  • replaces tensors with TensorSpecs,

  • and uses this to form a dictionary key.

  • We did this in C because it's a little faster than doing it

  • in Python, and gives us a little more control

  • over how we handle lists and dictionaries

  • and things like that.

  • If you're interested, that's where you'd want to see it.

  • And finally, the last bit of the pile of complexity

  • that we've been looking at, is how we do the variable lifting

  • and initialization.

  • And this is handled in another class,

  • confusingly also named Function,

  • but in a different file, called def_function.Function.

  • Internally, it calls the other function class.

  • So at least we have proper layering.

  • But we do need to clean up the naming

  • of these things a little bit.

  • And the variable lifting is a little tricky.

  • So I'll give you a quick walkthrough of how it works.

  • First thing is that we define our own type

  • of variable, which is a subclass of the normal tf.Variable.

  • And the idea is that when you create this variable,

  • it inserts in the graph a control flow conditional, where

  • if it's not initialized, it initializes the variable.

  • This way, because the graph at variable creation time

  • has this conditional, by the time

  • you get to use the variable, you've already passed this.

  • So if you run this graph, you're guaranteed to only initialize

  • this variable once.

  • And you only ever see its value initialized.

  • But this is a little sad, because if you've

  • used TensorFlow for a while, you know that a cond

  • is somewhat expensive.

  • So what we do is we try to--

  • the thing that I said earlier, we trace your code twice.

  • First, we trace with a scope that captures all the variables that

  • are created.

  • These are variable capturing scopes.

  • I went over them a little bit in last week's video.

  • And the second time we trace, we trace with this scope

  • where if you try to create a variable, we raise an error.

  • And this is how we control the policy for only letting

  • you create variables once.

  • And now that we have these two versions of the function, one

  • that we call Stateful, one that we call Stateless FN,

  • we can put a cond in there where if all the variables are

  • initialized, we can call the one that does not

  • do any of the complicated computation,

  • while if any of the variables is not initialized,

  • we have to, for safety, call the function that has

  • all the complicated bits.

  • But now that we've built this monstrous graph that

  • has your function inside of it twice, one with conditionals,

  • one without conditionals,

  • the whole thing inside a conditional, ideally,

  • we would never want to execute this,

  • because conditionals again, are expensive.

  • And they have this particular property

  • where you pay for the nodes that don't

  • get executed in the current version of the TensorFlow

  • runtime.

  • So what we do on top of that is we

  • try to lift the initialization code out.

  • So we look at every variable.

  • And we call this lift_to_graph thingy,

  • where we try to copy into a separate graph

  • all the initializers of all the variables.

  • And this copy is set up in a way that

  • raises an exception we control if we ever

  • find a variable whose initializer depends

  • on the value of another variable,

  • or depends on a function argument,

  • or depends on something that we can't cleanly isolate.

  • So for the common case, where your variables are all

  • independent of each other, we don't actually

  • run any of those complicated graphs.

  • We just run the simple graph, because we

  • can run this initializer once.

  • It's only if we cannot lift the initialization graph,

  • that stuff breaks.

  • And this lift_to_graph thing is actually a pretty nice

  • internal TensorFlow library that you

  • can use to manipulate graphs.

  • You give it some tensors, a graph, some sources.

  • And it will walk the graph backwards, and copy all the things

  • that you need from that tensor to your target graph,

  • and return a map from every source

  • to the target tensor of the copies it did.

  • So you can use this map to run things in a target graph

  • as if they were in the source graph.

  • So this is mostly it for the Python-level runtime code

  • for tf.function.

  • But I'm not going to talk about the TensorFlow runtime today.

  • This is for another meeting, because we're mostly

  • running out of time.

  • So any remaining questions before we go away?

  • AUDIENCE: One question on performance.

  • Let's say we have tf graph.

  • It's got multiple fetches so I

  • can run different subgraphs.

  • They have a lot of overlap.

  • Now if I compare that to a set of independent functions,

  • things like the size of the whole model will go up.

  • [INAUDIBLE]

  • ALEX PASSOS: So in principle, the size of the whole model

  • would go up.

  • But if you were to convert this to use tf.function,

  • hopefully you'd convert this to functions that call each other.

  • And as with normal programming language code,

  • as soon as they have functions that call each other,

  • you can control the complexity of your code,

  • especially if you have calls to the same function repeated.

  • And you can end up with a smaller size.

  • So as far as graph size goes, in practice, using tf.function

  • often leads to smaller graphs, because you often

  • end up calling the same function multiple times.

  • At execution time, we can inline those functions

  • for performance.

  • I think right now we essentially always

  • inline well-formed tf.functions.

  • So if you are really calling like,

  • the power set of the nodes in your graph,

  • then you would see an explosion in the size of things.

  • But in most of the models I've seen, we can avoid this.

  • As far as I can tell, the performance overhead

  • for using tf.function now comes from the fact

  • that if you're creating variables,

  • you need to trace your code at least twice

  • and generate all those extra conditionals.

  • And this is something that I think

  • with a little more engineering, we

  • can make it only happen when it's strictly necessary instead

  • of always happening, and the back off

  • optimizations being optional.

  • If we have no more questions, then I think we're good.

  • Thank you.

  • [APPLAUSE]
