[MUSIC PLAYING]
KRZYSZTOF OSTROWSKI: All right, TensorFlow Federated--
what's exciting about TFF is that it enables everyone
to experiment with computations on decentralized data.
And decentralized data is exciting,
because it's everywhere.
It's in intelligent home devices,
in sensor networks, in distributed medical databases.
And of course, there's a ton of it
on personal devices like cell phones.
And we love our cell phones.
We want them to be intelligent.
This data could help.
Traditionally, the way we implement intelligence
is on the server.
So here we have a model on the server.
The clients all talk to the server to make predictions.
So all the data accumulates on the server as well.
So the model, the data, it's all in one place-- super easy.
The downside of this is that all this back and forth
communication can hurt user experience
due to network latency, lack of connectivity, shortened battery
life.
And of course, there's a ton of data
that would be really useful in implementing intelligence
but that, for various reasons, you may choose not to collect.
So what can we do?
Well, one idea is take all the TensorFlow machinery
and put it on-device.
So here we have each client independently
training its own model using only its own local data.
No communication necessary-- great.
Well, maybe not so great--
actually, we realize that, very often, there's
just not enough data on each individual device
to learn a good model.
And unlike before, even though there
might be millions of clients, you
can't benefit from the data.
We can mitigate this by pre-training the model
on the server on some proxy data.
But just think of a smart keyboard.
If, today, everyone starts using a new word,
then a smart model trained on yesterday's data
won't be able to pick it up.
So this technique has limitations.
OK, so now what?
Do we just give up?
Do we have to choose between more intelligence and more privacy?
Or can we have both?
Until a few years ago, we didn't know the answer.
It turns out the answer is yes, we can.
In fact, it's very simple.
It goes like this-- you start with the model on the server.
You distribute it to some of the clients.
Now each client trains the model locally using
its own local data--
and that doesn't have to mean training to convergence.
It could be just training a little bit.
Each client then produces a new, locally trained model
and sends it to the server.
And in practice, we would send updates and not models,
but that's an implementation detail.
All right, so now the server gets locally trained models
from all the clients.
And now is the crazy part.
We just average them out--
so simple.
So OK, the average model, trivially, it
reflects the training from every client, right?
So it's good.
But how do we know it's a good model, that this procedure is
doing something meaningful?
In fact, you would think it's too simple.
There's just no way-- no way-- this can possibly work.
And you would be correct.
It's not enough to do it once.
You have to earn it.
So we repeat the process.
The combined model becomes the initial model
for the next round.
And so it goes in rounds.
In every round, the combined model
gets a little bit better thanks to the data
from all the clients.
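To make the procedure concrete, here is a minimal sketch in plain Python and NumPy-- a toy illustration only, not the TFF API-- of clients training locally on their own data and the server averaging the results over many rounds.

import numpy as np

def local_train(model, local_data, lr=0.1, steps=5):
    # Toy local training: a few gradient steps of linear regression
    # on this client's own data (x, y).
    w = model.copy()
    x, y = local_data
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_averaging(initial_model, client_datasets, rounds=100):
    model = initial_model
    for _ in range(rounds):
        # The server distributes the current model; each client trains locally.
        client_models = [local_train(model, data) for data in client_datasets]
        # The server simply averages the locally trained models.
        model = np.mean(client_models, axis=0)
    return model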
And now hundreds or thousands, many, many rounds later,
your smart keyboard begins to show signs of intelligence.
So this is quite amazing.
It's mind-boggling that something this incredibly
simple can actually work in practice.
And yet it does.
And then it gets even more crazy.
You can do things like compress the update
from each client down to one bit,
or add some random noise to it to implement differential
privacy.
Many extensions are possible.
And it still works.
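As a rough illustration of the noise-adding extension mentioned above-- a toy sketch only, not a production differential-privacy mechanism-- you can clip each client's update and add Gaussian noise before averaging it on the server:

import numpy as np

def noisy_average(updates, clip_norm=1.0, noise_multiplier=1.0):
    # Clip each client's update to a bounded norm.
    clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
               for u in updates]
    # Add Gaussian noise scaled to the clipping bound and the number of clients.
    noise = np.random.normal(
        scale=noise_multiplier * clip_norm / len(updates),
        size=clipped[0].shape)
    return np.mean(clipped, axis=0) + noise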
And you can apply it to things other than learning.
For example, you can use it to compute a statistic
over sensitive data.
So experimenting with all the different things
you can do with federated learning
is actually a lot of fun.
And TFF is here basically just so that everyone
can have fun doing it.
It is open source.
It's inspired by our experiences with federated learning
at Google, but now generalized to non-learning use cases
as well.
We're doing it in the open, in the public.
It's on GitHub.
We just recently started.
So now is actually a great time to jump in and contribute,
because you can have influence on the way this
goes from the early stages.
We want to create an ecosystem, so TFF is all
about composability.
If you're building a new extension,
you should be able to combine it with all of the existing ones.
If you're interfacing with a new deployment platform,
you should be able to deploy all of the existing code to it.
So we've made a number of design decisions
to really promote composability.
And speaking of deployments, in order
to enable flexibility in this regard,
TFF compiles all your code into an abstract representation,
which, today, you can run in a simulator,
but that, in the future, could potentially
run on real devices--
no promises here.
In the first release, we only provide a simulation runtime.
I mentioned that TFF was all about having fun experimenting.
In our past work on federated learning--
and that's before TFF was born--
we've discovered certain things that consistently
get in the way of having fun.
And the worst offender was really all the different types
of logic getting interleaved.
So it's model logic, communication, checkpointing,
differential privacy.
All this stuff gets mixed up, and it gets very confusing.
So in order to avoid this, to preserve the joy of creation
for you, we've designed programming abstractions
that will allow you to write your federated learning
code at a similar level as when you write pseudocode
or draw on a whiteboard.
You'll see an example of this later in the talk.
And I hope that it will work for you.
OK, so what's in the box?
You get two sets of interfaces.
The upper layer allows you to create
a system that can perform federated training
or evaluation using your existing model.
And this sits on top of a layer of lower-level, modular
abstractions that allow you to express
and simulate custom types of computations.
And this layered architecture is designed
to enable a clean separation of concerns
so that developers who specialize in different areas,
whether that be federated learning,
machine learning, compiler theory, or systems integration,
can all independently contribute without stepping
on each other's toes.
OK, federated learning, we've talked about this as an idea.
Now let's look at the code.
We provide interfaces to represent federated data sets
for simulations and a couple of data sets for experiments.
If you have a Keras model, you can wrap it like this with
a one-liner for use with TFF--
very easy.
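As a sketch-- assuming the tff.learning API from around the time of this talk, whose exact module paths and arguments have shifted across releases, and with create_keras_model and example_dataset as hypothetical placeholders-- the wrapping looks roughly like this:

import tensorflow as tf
import tensorflow_federated as tff

def model_fn():
    keras_model = create_keras_model()  # your existing tf.keras model
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=example_dataset.element_spec,  # shapes/dtypes of the input
        loss=tf.keras.losses.SparseCategoricalCrossentropy())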
And now we can use one of the build functions
we provide to construct various kinds
of federated computations.
And those are essentially abstract representations
of systems that can perform various federated tasks.
And I'll explain what that means in a minute.
Training, for instance, is represented
as a pair of computations, one of them that constructs
the initial state of a federated training system,
and the other one that executes a single round
of federated averaging.
And those are still kind of abstract.
But you can also invoke them just like functions in Python.
And when you do, they, by default,
execute in a local simulation runtime.
So this is actually how you can write little experiment loops.
You can do things like pick a different set of clients
in each round and so on.
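A sketch of such an experiment loop-- again assuming the early tff.learning constructors, with federated_train_data and sample_client_ids as hypothetical helpers-- might look like this:

iterative_process = tff.learning.build_federated_averaging_process(model_fn)

# The first computation constructs the initial state of the system.
state = iterative_process.initialize()

num_rounds = 10
for round_num in range(num_rounds):
    # Pick a different subset of simulated clients each round.
    round_data = [federated_train_data[i] for i in sample_client_ids(round_num)]
    # The second computation executes a single round of federated averaging.
    state, metrics = iterative_process.next(state, round_data)
    print('round {:2d}, metrics={}'.format(round_num, metrics))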
The state of the system includes the model being trained.
So this is how you can very easily
simulate federated evaluation of your model.
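For instance-- again a sketch against the early tff.learning API, in which the trained model weights live in state.model, with federated_test_data as a hypothetical placeholder-- federated evaluation can be simulated like this:

evaluation = tff.learning.build_federated_evaluation(model_fn)
# Evaluate the current global model on held-out federated test data.
eval_metrics = evaluation(state.model, federated_test_data)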
All of this sits on top of FC API, which
is basically a language for constructing distributed
systems.
It is embedded in Python.
So you just write Python code as usual.
It does introduce a couple of new, abstract concepts
that are worth explaining.
So maybe let's take a deep dive and look.
All right, first concept--
imagine you have a group of clients again.
Each of them has a temperature sensor
that generates a reading, some floating-point number.
I'm going to refer to the collective of all these sensor
readings as a federated value, a single value.
So you can think of a federated value as a multi-set.
Now in TFF, values like this are first-class citizens,
which means, among other things, that they have types.
The types of those kinds of values consist of the identity
of the group of devices that are hosting the value--
we call that the placement--
and the local type-- the type of the local data items
that are hosted by each member of the group.
All right, now let's throw the server into the mix.
There's a number on the server.
We can also give it a federated type.
In this case, I'm dropping the curly braces
to indicate that there's actually just one number,
not many.
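In code-- a sketch using the TFF type constructors, whose exact spelling may differ by release-- these two federated types can be written and printed like this:

import tensorflow as tf
import tensorflow_federated as tff

# A federated float hosted by the group of clients; each member holds its own
# item, so the type prints with curly braces.
readings_type = tff.FederatedType(tf.float32, tff.CLIENTS)
print(readings_type)   # {float32}@CLIENTS

# A single float on the server; no braces, since there is just one number.
threshold_type = tff.FederatedType(tf.float32, tff.SERVER)
print(threshold_type)  # float32@SERVER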
OK, now let's introduce a distributed aggregation
protocol that runs among these system participants.
So let's say it computes the number on the server based
on all the numbers on the clients.
Now in TFF, we can think of that as a function
even though the inputs and outputs of that
function reside in different places--
the inputs on the clients and the output on the server.
Indeed, we can give it a functional type signature
that looks like this.
So in TFF, you can think of distributed systems,
or components of distributed systems,
distributed protocols, as functions, simply.
We also provide a library of what
we call federated operators that represent, abstractly,
very common types of building blocks like, in this case,
computing an average among client values
and putting their result in the server.
Now with all this that I've just described,
you can actually draw system diagrams in code, so to speak.
It goes like this-- you declare the federated type that
represents the inputs to your distributed system.
Now you pass it as an argument to a special function decorator
to indicate that, in a system you're constructing,
this is going to be the input.
Now in the body of the decorated function,
you invoke all the different federated operators
to essentially populate your data flow diagram like this.
It works conceptually in very much the same way
as when you construct non-eager TensorFlow graphs.
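Here is that pattern as a small sketch, close to the example in the TFF tutorials (details may differ by release): declare the federated input type, hand it to the decorator, and invoke a federated operator in the body.

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
    # federated_mean: average the values on the clients and place the result
    # on the server.
    return tff.federated_mean(sensor_readings)

print(get_average_temperature.type_signature)
# ({float32}@CLIENTS -> float32@SERVER)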
OK, now let's look at something more complex and more exciting.
So again, we have a group of clients.
They have temperature sensors.
Suppose you want to compute what fraction of your clients
have temperatures exceeding some threshold.
So in this system, in this computation I'm constructing,
there are two inputs.
One is the temperature readings on the clients.
The other is the threshold on the server.
And again, the inputs can be in different places,
and that's OK.
All right, how do I execute this?
First, we probably want to just broadcast the threshold
to all the clients.
So that's our first federated operator in action.
Now that each client has both the threshold
and its own local temperature reading,
you can run a little bit of TensorFlow
to compute 1 if it's over the threshold, 0 otherwise.
OK, you can think of this as basically a map
step in MapReduce.
And the result of that is a federated float, yet
another one.
OK, now we have all these ones and zeros.
Actually, the only thing that remains to do
is to perform a distributed aggregation
to compute the average of those ones and zeros
and place the result in the server.
OK, that's the third federated operator in our system.
And that's it.
That's a complete example.
Now let's look at how this example works in the code.
Again, you declare the federated types of your inputs.
You pass them as arguments to the function decorator.
And now, in the body of the decorated function,
you simply invoke all the federated operators
you need in the proper sequence so the broadcast, the map,
and the average are all there.
And that piece of TensorFlow that
was a parameter to the mapping operator
is expressed using ordinary TensorFlow ops just as normal.
And this is the complete example.
It's working code that you can copy-paste into a code lab
and try it out.
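Here is a sketch of that complete example, written against the early TFF API (details such as how the two client-side values get zipped together have varied across releases):

# Ordinary TensorFlow: 1.0 if a reading exceeds the threshold, 0.0 otherwise.
@tff.tf_computation(tf.float32, tf.float32)
def exceeds_threshold(reading, threshold):
    return tf.cast(reading > threshold, tf.float32)

@tff.federated_computation(
    tff.FederatedType(tf.float32, tff.CLIENTS),
    tff.FederatedType(tf.float32, tff.SERVER))
def get_fraction_over_threshold(readings, threshold):
    # Broadcast the server-side threshold to all the clients.
    threshold_at_clients = tff.federated_broadcast(threshold)
    # Map the TensorFlow predicate over each (reading, threshold) pair.
    ones_and_zeros = tff.federated_map(
        exceeds_threshold,
        tff.federated_zip((readings, threshold_at_clients)))
    # Average the ones and zeros and place the result on the server.
    return tff.federated_mean(ones_and_zeros)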
OK, so this example obviously has nothing
to do with federated learning.
However, in tutorials on our website,
you can find examples of fully implemented federated training
and federated evaluation code that look, basically,
just like this.
Aside from some variable renaming,
they also fit on one screen.
So yeah, in TFF, you can express your federated learning logic
very concisely in a way that you can just look at it
and understand what it does.
And it's actually really easy to modify.
Yeah, and I personally--
I feel it's liberating to be able to express
my ideas at this level without getting bogged down in all
the unnecessary detail.
And this empowers me to create
and experiment with new things.
And I hope that you will check it out, and try it,
and that you'll feel the same.
And that's all I have.
Everything you've seen is on GitHub.
As I mentioned, there are many ways
to contribute depending on what your interests are.
Thank you very much.
[MUSIC PLAYING]