
  • ALEX PASSOS: I'm Alex Passos, and I'm

  • here again to talk about functions, not sessions.

  • tf.function is the new way of using graphs

  • in TensorFlow, in TF2.

  • All the material I'm going to cover here,

  • the design and the motivation, is mostly

  • described in one of the RFCs in the TensorFlow community GitHub

  • repo.

  • So if you go to github.com/tensorflow/community/rfcs,

  • you will see an RFC with exactly this title

  • in there, where we go over a bunch of the motivation and a bunch

  • of the high-level design.

  • So here I'm mostly going to focus

  • on some nitty gritty details of the motivation

  • and more details about the implementation.

  • And things that if you're working on TensorFlow

  • and you're using functions to do something

  • or you're curious about function internals,

  • I hope to at least point you to the right

  • places to start reading the code to understand what's happening.

  • I'm mostly going to focus today on the high-level Python

  • side of things.

  • And there's another training session later.

  • I think the title's going to be eager execution runtime.

  • That's going to focus more on the C++ side of stuff.

  • So I think to understand functions,

  • it helps if you understand where we're coming from,

  • which is the session.run world in TensorFlow 1.

  • And I think in TF1, when TF was originally designed,

  • it was designed as a C++ runtime first and only later came

  • a Python API.

  • And as far as a C++ runtime goes,

  • the API of graphs and sessions is pretty reasonable.

  • So you build a graph by some function

  • that the runtime does not care about,

  • and then you connect to the runtime

  • by opening this session.

  • This connection is important because a runtime can

  • be local, can be distributed.

  • There are all sorts of in between things.

  • And to actually run computation, you just call session.run.

  • Because you have a graph, you give it

  • the names of your inputs, the names of your outputs,

  • the names of particular nodes that you want to run.

  • And the runtime will go do its thing and return to you

  • the results in normal C++ arrays that you can use to manipulate

  • your data.
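
At the Python level that came later, the same build-a-graph, open-a-session, run flow looks roughly like this (a minimal TF1-style sketch; the op names and shapes are illustrative, not from the talk):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Build a graph: a placeholder input and a small computation.
graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
    y = tf.reduce_sum(x * 2.0, name="y")

# Connect to the runtime by opening a session (local here,
# but the runtime could just as well be remote or distributed).
with tf.Session(graph=graph) as sess:
    # Feed inputs by name, fetch outputs by name, get plain arrays back.
    result = sess.run("y:0", feed_dict={"x:0": [[1.0, 2.0, 3.0]]})
    print(result)  # 12.0
```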

  • So this is very nice and convenient if you're writing

  • in C++ and if you're programming at this level.

  • You generally write the code that looks like this once,

  • and you spend your entire life as a TensorFlow developer

  • writing the little part that I abstracted out

  • called BuildMyGraph.

  • And I think it's an understatement

  • to say that just manually writing protocol buffers

  • is very awkward.

  • So we very, very quickly decided this is not a good way to go

  • and built an API around it.

  • And the first version of the API was very explicit.

  • So you created a graph, and then every time you created an op,

  • you passed the graph as an argument,

  • and this is still fine because it's very explicit

  • that you're building a graph.

  • So you can have this mental model

  • that you're building a graph that then you're

  • going to give to a runtime to execute.

  • This is not really idiomatic Python code.

  • So it's also very easy to see how to make

  • this idiomatic Python code.

  • You just stick the graph in a global context manager

  • and add a bunch of operator overloads and things like that.

  • And you end up with code that looks

  • like what TensorFlow code looks like today, which

  • is a little unfortunate, because by reading the same code,

  • you can't really tell whether an object is a tensor--

  • and hence only has a value during an execution of a session,

  • and is this deferred quantity, et cetera, et cetera,

  • that has a name and might have some known

  • properties about its shape, but not all--

  • or whether it is just a normal Python object or NumPy array.

  • And this creates a lot of confusion

  • and I think leads to a very unnatural programming model.
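
A minimal TF1-style sketch of that ambiguity (the variable names are illustrative): the two additions below read identically, but one produces a value and the other produces a symbolic tensor that only has a value during a session.run.

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# NumPy: `c` is a concrete value as soon as this line runs.
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
c = a + b                      # [4.0, 6.0]

# Graph mode: the same-looking code adds nodes to an implicit
# global graph via operator overloads; `tc` has a name and maybe
# a known shape, but no value yet.
ta = tf.constant([1.0, 2.0])
tb = tf.constant([3.0, 4.0])
tc = ta + tb                   # a symbolic Tensor, e.g. "add:0"

with tf.Session() as sess:
    print(sess.run(tc))        # only now does tc get a value: [4. 6.]
```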

  • The session.run thing also has a granularity problem,

  • which is that in the way it was originally built,

  • the graph, like, the stuff that you pass to session.run,

  • is a quantum of all the stuff you want to execute.

  • And around it is this very rigid boundary

  • where you keep stuff in the host memory of your client program,

  • give it to session.run, and then get results back

  • into host memory of your client program.

  • So one example that I think is illustrative of why this is not

  • ideal is if you have a reinforcement learning agent

  • that's implemented over a recurrent neural network,

  • in that scenario, your agent's going to run a loop where it's

  • going to read an observation from your environment, which

  • is some arbitrary code that runs in your host,

  • and has some state.

  • The state is initialized at 0.

  • The agent takes the observation,

  • runs it through a neural network.

  • And that neural network spits out a new state and an action

  • for the agent to perform in the environment.

  • You take that action, bring it to client memory,

  • give it to the C++ code for an environment, your Atari game,

  • or whatever.

  • That will run for a while and then give you back a new state.

  • You want to ship this new observation

  • and the old state

  • back to the RNN.

  • But if your RNN is running on another device, say, a GPU,

  • there was really no reason for you to ship your RNN state back

  • to your client and then from the client back to the device.

  • So the boundary for stuff you want to run here

  • is not really working.

  • The boundary for stuff you want to run

  • is not the same as the boundary for stuff

  • that wants to live on a device

  • or wants to live in the host.
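
A minimal sketch of that agent loop in TF1 style (the environment and the network are stand-ins, not a real implementation); note how the RNN state is fetched back to the client and fed in again on every step, even when it never needed to leave the device:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

obs_ph = tf.placeholder(tf.float32, [1, 16], name="obs")
state_ph = tf.placeholder(tf.float32, [1, 32], name="state")

cell = tf.nn.rnn_cell.BasicRNNCell(32)
output_t, new_state_t = cell(obs_ph, state_ph)
action_t = tf.argmax(tf.layers.dense(output_t, 4), axis=-1)

def env_step(action):
    # Stand-in for the real environment (an Atari game, etc.).
    return np.random.rand(1, 16).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = np.zeros((1, 32), np.float32)   # state initialized at 0
    obs = env_step(action=None)
    for _ in range(100):
        # The RNN state round-trips through host memory here.
        action, state = sess.run(
            [action_t, new_state_t],
            feed_dict={obs_ph: obs, state_ph: state})
        obs = env_step(action)
```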

  • And this gets even more complicated

  • once you put automatic differentiation into the story,

  • because TensorFlow uses the symbolic representation

  • for your computation that we call a graph.

  • We do automatic differentiation on

  • this symbolic representation.

  • So now the graph not only has to be a quantum for stuff

  • you want to run, but it has to be a quantum for stuff you

  • differentiate.

  • So if you stick to this reinforcement learning agent

  • example, a popular thing that people

  • used to do, before we had the substantially better

  • deep reinforcement learning algorithms we have now, is policy gradients.

  • And the simplest policy gradient algorithm is called REINFORCE.

  • And what it amounts to doing is it will run your agent

  • for some number of time steps.

  • You'll get the probability of the agent

  • taking the actions that it actually took.

  • And you'll take the gradient of that probability,

  • multiply by the reward your agent got,

  • and apply this to the weights.

  • And now not only do we want to avoid transferring the RNN

  • state back and forth between the host and your accelerator,

  • but also you want to back prop through a number of steps

  • that might not even be known before you

  • start your computation.
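
Stripped of the RNN and the unknown episode length, the REINFORCE update described here amounts to something like this (a schematic sketch; the tiny feedforward policy and the random trajectory are stand-ins for illustration only):

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Stand-ins for a trajectory collected by the agent.
obs = tf.placeholder(tf.float32, [None, 16])    # observations, one per step
actions = tf.placeholder(tf.int32, [None])      # actions actually taken
rewards = tf.placeholder(tf.float32, [None])    # rewards the agent got

# A tiny policy network producing action logits.
logits = tf.layers.dense(obs, 4)
log_probs = -tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=actions, logits=logits)              # log pi(a_t | s_t)

# REINFORCE: gradient of the log-prob of the taken actions, times reward.
loss = -tf.reduce_sum(log_probs * rewards)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    n = 7  # number of steps, only known after the rollout
    sess.run(train_op, feed_dict={
        obs: np.random.rand(n, 16),
        actions: np.random.randint(0, 4, size=n),
        rewards: np.ones(n, np.float32),
    })
```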

  • Another issue is that session.run has a kind

  • of like--

  • it asks for too much information every single time.

  • So what every training loop or inference loop or anything using

  • TensorFlow looks like is not--

  • well, not every, but what most look

  • like is not a single call to session.run,

  • but a bunch of calls to session.run in a loop.

  • And in all those calls, you're executing the same tensors,

  • you're fetching the same tensors,

  • and you're feeding the same symbolic tensors slightly

  • different numerical values.
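
Concretely, a typical loop looks something like this (a minimal sketch with a stand-in model):

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)
loss = tf.reduce_mean(tf.square(pred - y))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        batch_x = np.random.rand(32, 10)
        batch_y = np.random.rand(32, 1)
        # Same fetches and same symbolic feeds every time; only the
        # numerical values change, yet session.run revalidates its
        # arguments on every call.
        sess.run([train_op, loss], feed_dict={x: batch_x, y: batch_y})
```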

  • And because the session.run API doesn't

  • know that you're going to be calling

  • those things in a loop where most of the arguments-- where

  • some things don't change and some things do change,

  • it has to re-perform a bunch of validation.

  • And so we put a cache in front of that validation.

  • And the cache key becomes a performance problem.

  • Derek had the idea of just separating the stuff

  • that changes from the stuff that doesn't change

  • into this session.make_callable API

  • where you call it once with this stuff that doesn't change,

  • and you get back a function that you

  • call with just the stuff that changes.
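
With make_callable, the loop above turns into something like this (a sketch; the stand-in model is reused just for illustration):

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [None, 10])
pred = tf.layers.dense(x, 1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Pass the stuff that doesn't change (fetches, feed structure) once...
    run_step = sess.make_callable(pred, feed_list=[x])
    for step in range(1000):
        # ...and call the returned function with only the values that change.
        out = run_step(np.random.rand(32, 10).astype(np.float32))
```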