ALEX PASSOS: I'm Alex Passos, and I'm here again to talk about functions, not sessions. tf.function is the new way of using graphs in TensorFlow 2. All the material I'm going to cover here, the design and the motivation, is mostly described in one of the RFCs in the TensorFlow community GitHub repo. So if you go to github.com/tensorflow/community/rfcs, you will see an RFC with exactly this title, where we go over a bunch of the motivation and a bunch of the high-level design.

So here I'm mostly going to focus on some nitty-gritty details of the motivation and more details about the implementation. Things that, if you're working on TensorFlow and you're using functions to do something, or you're curious about function internals, I hope will at least point you to the right places to start reading the code to understand what's happening. I'm mostly going to focus today on the high-level Python side of things. There's another training session later, I think the title is going to be "Eager Execution Runtime," that's going to focus more on the C++ side of stuff.

So I think to understand functions, it helps if you understand where we're coming from, which is the session.run world of TensorFlow 1. In TF1, when TensorFlow was originally designed, it was designed as a C++ runtime first, and only later came a Python API. And as far as a C++ runtime goes, the API of graphs and sessions is pretty reasonable. You build a graph by some function that the runtime does not care about, and then you connect to the runtime by opening a session. This connection is important because a runtime can be local or can be distributed; there are all sorts of in-between things. And to actually run computation, you just call session.run. Because you have a graph, you give it the names of your inputs, the names of your outputs, and the names of particular nodes that you want to run.
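The graph-then-session shape described above can be sketched in plain Python. This is a toy stand-in (`ToySession` and the dict-based graph are hypothetical names invented for illustration, not the real TensorFlow runtime API), just to show the contract: build a graph once, then run it by naming feeds and fetches.

```python
# Toy stand-in for the graph/session runtime API, not real TensorFlow.

class ToySession:
    """Pretend connection to a runtime that can execute a graph."""

    def __init__(self, graph):
        # graph: node name -> (callable op, list of input node names)
        self.graph = graph

    def run(self, fetches, feeds):
        # Resolve each fetched node by name, evaluating inputs recursively.
        def evaluate(name):
            if name in feeds:
                return feeds[name]
            op, inputs = self.graph[name]
            return op(*[evaluate(i) for i in inputs])
        return [evaluate(name) for name in fetches]


# "BuildMyGraph": the part you end up writing over and over.
graph = {
    "mul": (lambda a, b: a * b, ["x", "y"]),
    "out": (lambda m, c: m + c, ["mul", "z"]),
}

sess = ToySession(graph)
# Feed input names and values, fetch output names: the session.run contract.
print(sess.run(["out"], {"x": 2, "y": 3, "z": 4}))  # [10]
```

The key point is that the runtime only ever sees names and a graph; everything else is the client's problem, which is fine at the C++ level.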
And the runtime will go do its thing and return the results to you in normal C++ arrays that you can use to manipulate your data. So this is very nice and convenient if you're writing in C++ and programming at this level. You generally write the code that looks like this once, and you spend your entire life as a TensorFlow developer writing the little part that I abstracted out called BuildMyGraph. And I think it's an understatement to say that manually writing protocol buffers is very awkward. So we very, very quickly decided this is not a good way to go and built an API around it.

The first version of the API was very explicit. You created a graph, and then every time you created an op, you passed the graph as an argument. And this is still fine, because it's very explicit that you're building a graph, so you can keep the mental model that you're building a graph that you're then going to give to a runtime to execute. But this is not really idiomatic Python code. It's also very easy to see how to make it idiomatic Python code: you just stick the graph in a global context manager and add a bunch of operator overloads and things like that.

And you end up with code that looks like what TensorFlow code looks like today, which is a little unfortunate, because by reading the code you can't really tell whether an object is a tensor, and hence only has a value during an execution of a session, and is this deferred quantity that has a name and might have some known properties, say about its shape, but not a value; or whether it's just a normal Python object or NumPy array. This creates a lot of confusion, and I think it leads to a very unnatural programming model.

The session.run thing also has a granularity problem, which is that, in the way it was originally built, the stuff that you pass to a single session.run call is the quantum of all the stuff you want to execute.
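The ambiguity described above is easy to reproduce in a few lines of plain Python. This is a toy illustration, not the real TF API: `SymbolicTensor` and `_default_graph` are hypothetical names standing in for graph tensors and the default-graph context manager. With operator overloads, the exact same function body works on real values and on deferred symbolic tensors, and nothing in the code tells you which you have.

```python
# Toy illustration of deferred symbolic tensors, not the real TF API.

_default_graph = []  # stand-in for the global default-graph context


class SymbolicTensor:
    """A deferred quantity: it has a name, but no value until a
    session executes the graph it belongs to."""

    def __init__(self, name):
        self.name = name
        _default_graph.append(self)

    def __add__(self, other):
        other_name = other.name if isinstance(other, SymbolicTensor) else repr(other)
        return SymbolicTensor(f"add({self.name}, {other_name})")


def f(a, b):
    # Reading this line, you cannot tell whether a and b are numbers,
    # NumPy arrays, or graph tensors.
    return a + b


print(f(1, 2))                                            # 3, an actual value
print(f(SymbolicTensor("x"), SymbolicTensor("y")).name)   # add(x, y), just a name
```

Same source line, two completely different meanings depending on the types flowing through it, which is exactly the confusion the talk is pointing at.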
And around it is this very rigid boundary where you keep stuff in the host memory of your client program, give it to session.run, and then get results back into the host memory of your client program. One example that I think is illustrative of why this is not ideal: say you have a reinforcement learning agent implemented over a recurrent neural network. In that scenario, your agent is going to run a loop where it reads an observation from your environment, which is some arbitrary code that runs on your host, and it has some state. The state is initialized to zero. The agent looks at the observation and runs it through the neural network, and that neural network spits out a new state and an action for the agent to perform in the environment. You take that action, bring it to client memory, and give it to the C++ code for the environment, your Atari game or whatever. That will run for a while and then give you back a new observation. You want to ship this new observation and the old state back to the RNN. But if your RNN is running on another device, say a GPU, there's really no reason to ship your RNN state back to your client and then from the client back to the device.

So the boundary for stuff you want to run here is not really working: the boundary for stuff you want to run is not the same as the boundary for stuff that wants to live on a device or wants to live on the host. And this gets even more complicated once you put automatic differentiation into the story, because TensorFlow uses a symbolic representation of your computation, which we call a graph, and we do automatic differentiation on this symbolic representation. So now the graph not only has to be the quantum of stuff you want to run, but it also has to be the quantum of stuff you differentiate.
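The agent loop described above looks roughly like the following. Everything here is a toy stand-in (`env_step` and `rnn_step` are hypothetical names; in real code `rnn_step` would be a session.run call and `env_step` your actual environment), used only to mark where the session.run boundary forces data across the host/device line.

```python
# Schematic of the RL agent loop; toy stand-ins, not real TF or a real env.

def env_step(action):
    """Arbitrary host-side environment code (your Atari game, etc.)."""
    return [float(action)] * 4  # a new observation


def rnn_step(observation, state):
    """Stand-in for one session.run call executing the RNN on a device."""
    new_state = [s + o for s, o in zip(state, observation)]  # pretend recurrent update
    action = int(sum(new_state) > 0)                          # pretend policy head
    return action, new_state


state = [0.0] * 4   # state initialized to zero
obs = [1.0] * 4
for _ in range(5):
    # With session.run, both obs AND state cross the host/device boundary
    # on every iteration, even though the state never needs to leave the
    # device between steps.
    action, state = rnn_step(obs, state)
    obs = env_step(action)
```

The observation genuinely has to cross the boundary, because the environment lives on the host; the RNN state only crosses it because session.run's boundary is the wrong shape.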
So sticking with this reinforcement learning agent example: a popular thing that people used to do, before we had the substantially better deep reinforcement learning algorithms we have now, is policy gradient. And the simplest policy gradient algorithm is called REINFORCE. What it amounts to is this: you run your agent for m time steps, you get the probability of the agent taking the actions it actually took, and you take the gradient of that probability, multiply it by the reward your agent got, and apply that to the weights. And now not only do you want to avoid transferring the RNN state back and forth between the host and your accelerator, but you also want to backprop through a number of steps that might not even be known before you start your computation.

Another issue is that session.run asks for too much information every single time. What every training loop or inference loop or anything using TensorFlow looks like, well, not every, but what most look like, is not a single call to session.run, but a bunch of calls to session.run in a loop. And in all those calls, you're executing the same ops, you're fetching the same tensors, and you're feeding the same symbolic tensors slightly different numerical values. And because the session.run API doesn't know that you're going to be calling it in a loop where some things don't change and some things do change, it has to re-perform a bunch of validation on every call. So we put a cache in front of that validation, and computing the cache key becomes a performance problem. Derek had the idea of just separating the stuff that changes from the stuff that doesn't change in this session.make_callable API, where you call it once with the stuff that doesn't change, and you get back a function that you call with just the stuff that changes.
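The make_callable idea can be sketched in plain Python. This is a toy stand-in with hypothetical names, not the real tf.Session.make_callable signature: the point is only the pattern of validating the unchanging parts once and returning a function that takes just the values that change.

```python
# Toy sketch of the make_callable pattern, not the real tf.Session API.

def make_callable(graph, fetches, feed_names):
    # One-time validation: every fetched or fed name must exist in the graph.
    for name in list(fetches) + list(feed_names):
        if name not in graph:
            raise KeyError(f"unknown node: {name}")

    def run(*feed_values):
        # Fast per-call path: only the numerical values change, so no
        # re-validation (and no cache-key computation) is needed.
        feeds = dict(zip(feed_names, feed_values))

        def evaluate(name):
            if name in feeds:
                return feeds[name]
            op, inputs = graph[name]
            return op(*[evaluate(i) for i in inputs])

        return [evaluate(n) for n in fetches]

    return run


graph = {
    "x": (None, []),  # placeholder, must be fed
    "y": (None, []),  # placeholder, must be fed
    "mul": (lambda a, b: a * b, ["x", "y"]),
}

step = make_callable(graph, ["mul"], ["x", "y"])
for x in range(3):
    print(step(x, 10))  # [0], then [10], then [20]
```

The expensive name checking runs once at make_callable time; the loop body only does the work that actually depends on the per-step values, which is the same split that tf.function later generalizes.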