[MUSIC PLAYING] DANIEL SITUNAYAKE: Hey, everybody. So my name's Daniel. LAURENCE MORONEY: And I'm Laurence. DANIEL SITUNAYAKE: And we have an awesome talk for you this afternoon. So I've been super excited watching the keynote, because there's just been so much stuff that is relevant to what we're going to be talking about today, which is running machine learning on devices. So we've seen the kind of amazing things that you can do if you're running ML at the edge and on device. But there are a ton of options that developers have for doing this type of stuff. And it can be a little bit hard to navigate. So we decided to put this session together to give you an overview. We're going to be showing you all the options, walking you through some code, and showing you some awesome demos. So hopefully, you'll enjoy.

So first, we're going to walk through using TensorFlow to train models and then export saved models, which you can then convert to deploy on devices. We're then going to see a number of different ways that you can deploy to Android and iOS devices. And finally, we're going to talk about some new and super exciting hardware devices that you can use to run your models. So first of all, I want to give you an overview of some different technologies and the device types they each allow you to support. So to begin with, we have ML Kit, which is designed to make it super easy to deploy ML inside of mobile apps. We then have TensorFlow.js, which basically lets you target any device that has a JavaScript interpreter, whether that's in the browser or through Node.js. So that even supports embedded platforms. And finally, TensorFlow Lite gives you high performance inference across any device or embedded platform, all the way from mobile phones to microcontrollers.

So before we get any further, let's talk a little bit about TensorFlow itself. So TensorFlow is Google's tool chain for absolutely everything to do with machine learning. And as you can see, there are TensorFlow tools for basically every part of the ML workflow, from loading data through to building models and then deploying them to devices and servers. So for this section of the talk, we're going to focus on building a model with TensorFlow and then deploying it as a TensorFlow Lite model. There are actually tons of ways to get up and running with TensorFlow on device. So the quickest way is to try out our demo apps and sample code. And we also have a big library of pretrained models that you can drop into your apps that are ready to use. You can also take these and retrain them based on your own data using transfer learning. You can, as you've seen this morning, use Federated Learning to train models based on distributed data across a pool of devices. And you can finally build models from scratch, which is what Laurence is now going to show off.

LAURENCE MORONEY: Thank you, Daniel. Quick question for everybody. How many of you have ever built a machine-learned model? Oh, wow. DANIEL SITUNAYAKE: Wow. LAURENCE MORONEY: Oh, wow. Big round of applause. So hopefully, this isn't too basic for you, what I'm going to be showing. But I want to show just the process of building a model and some of the steps that you can then take to prepare that model to run on the mobile devices that Daniel was talking about. Can we switch to the laptop, please? Can folks at the back read that code? Just wave your hands if you can. OK, good. Wave them like this if you need it bigger. OK, some do, or you just want to stretch. Let's see. How's that?
OK, cool. So I'm just going to show some very basic TensorFlow code here. And I wanted to show the simplest possible neural network that I could. So for those of you who've never built something in machine learning or have never built a machine-learned model, the idea is that with a neural network, you can do some basic pattern matching from inputs to outputs. We're at Google I/O, so I'm going to talk about inputs and outputs a lot. And in this case, I'm creating the simplest possible neural network I can. And this is a neural network with a single layer and a single neuron in that layer. And that's this line of code, right here: keras.layers.Dense(units=1, input_shape=[1]). And I'm going to then train this neural network on some data, and that's what you can see in the line underneath-- the Xs and the Ys. Now, there is a relationship between these data points. Can anybody guess what that relationship is? There's a clue in the 0 and 32.

AUDIENCE: Temperature.

LAURENCE MORONEY: Yeah, a temperature conversion, right? So the idea is I could write code that multiplies by 9 over 5 and adds 32. But I want to do it as a machine-learned model, just to give an example. So in this case, I'm going to create this model. And with this model, I'm just training it with six pairs of data. And then what it will do is it will start trying to infer the relationship between these data points. And then from that, going forward, it's a super simple model that can do a temperature conversion.

So how it's going to work is it's going to make a guess. And this is how machine learning actually works. It just makes a wild guess as to what the relationship between these data points is. And then it's got something called a loss function. And what that loss function is going to do is see how good or how bad that guess actually is. And then based on the data from the guess and the data from the loss function, it then has an optimizer, which is this. And what the optimizer does is it creates another guess, and then it will measure that guess to see how well or how badly it did. It will create another guess, and it will measure that, and so on, and so on, until I ask it to stop or until it has done it 500 times, which is what this line of code is actually doing.

So if I create this model quite simply, we'll see it's going to train. There was one that I created earlier, so my workbook's taking a moment to get running. My network connection's gone down. Hang on. Let me refresh and reload. You love it when you dry run a demo, and it works great. We get that warning. I'll run that. I'll do that. And now, it starts training, hopefully. There we go. It's starting to train now. It's going through all these epochs. So it's going to do that 500 times. And then at the end of the 500 times, it's going to have this trained model. And then this trained model, I'm just going to ask it to predict. So for example, if I give it 100 degrees centigrade, what's that going to be in Fahrenheit? The real answer is 212, but it's going to give me 211 and something, because this isn't very accurate, because I've only trained it on six points of data. So if you think about it, there's a linear relationship between the centigrade and Fahrenheit values on those six points, but the computer doesn't know that. It doesn't know the relationship carries on linearly forever. It could go like this, or it could change. So it's giving me a very high probability that for 100 degrees centigrade, it would be about 212 Fahrenheit.
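For reference, the complete model Laurence is describing fits in a few lines of Keras. This is a minimal sketch assuming TensorFlow 2.x; the choice of optimizer and the exact Celsius/Fahrenheit training pairs are assumptions, since the talk only describes them verbally.

```python
import numpy as np
import tensorflow as tf

# A single layer with a single neuron: one value in, one value out.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])

# The loss function measures each guess; the optimizer makes the next one.
# (The talk doesn't name the optimizer, so Adam here is an assumption.)
model.compile(optimizer=tf.keras.optimizers.Adam(0.1),
              loss='mean_squared_error')

# Six pairs of data: Celsius in, Fahrenheit out (illustrative values).
celsius = np.array([-40.0, -10.0, 0.0, 8.0, 22.0, 38.0], dtype=float)
fahrenheit = np.array([-40.0, 14.0, 32.0, 46.4, 71.6, 100.4], dtype=float)

# Guess, measure, adjust -- repeated 500 times.
model.fit(celsius, fahrenheit, epochs=500, verbose=0)

# Prints something close to, but not exactly, 212, because the model
# has only ever seen six data points.
print(model.predict(np.array([100.0])))
```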
And that comes out as about 211 degrees Fahrenheit. So I've just built a model. And what we're going to take a look at next is, how do I get that model to work on mobile? Can we switch back to the slides, please?

So the process is pretty simple. The idea is, using Keras or using an estimator in TensorFlow, you build a model. You then save that model out in a file format called SavedModel. And in TensorFlow 2, we're standardizing on that file format to make it easier for us to go across different types of runtimes, like JavaScript in the browser, TFX, or TensorFlow Lite. By the way, the QR code on this slide is to the workbook that I just showed a moment ago. So if you want to experiment with that workbook for yourself, if you're just learning, please go ahead and do so. It's a public URL, so feel free to have fun with it. And I've put a whole bunch of QR codes in the rest of the slides.

Now, once you've done that, in TensorFlow Lite, there's something called the TensorFlow Lite Converter. And that will convert our SavedModel into a TensorFlow Lite model. So the process of converting means it's going to shrink the model. It's going to optimize the model for running on small devices, for running on devices where battery life is a concern, and things like that. So out of that process, I get a TensorFlow Lite model, which I can then run on different devices. And here's the code to actually do that. So we've got a little bit of a breaking change between TensorFlow 1 and TensorFlow 2. So in the workbook that was on that QR code, I've put both pieces of code on how to create the SavedModel. And then once you've done that, the third line from the bottom here is the TF Lite Converter. And all you have to do is say here's the SavedModel directory. Run the TF Lite Converter from the SavedModel in that directory, and it will generate a .tflite file for me. And that .tflite file is what I can then use on mobile.

So let's take a look at that in action, if we can switch back to the laptop. So all I'm going to do within the same workbook is run that code that I just saw. And I'm using TensorFlow 1.x in Colab here. And we shall see that it actually has saved out a model for me in this directory. And I need that in the next piece of code, because I have to tell it the directory that it got saved to. So I'll just paste that in, and then I'll run this. And we can see the TF Lite Converter is what will do the conversion for us. So if I run that, it gives me the number 612. Can anybody guess why it gives me the number 612? It's not an HTTP code. I thought it was that, at first, too. That's actually just the size of the model. So the model that I just trained off those six pieces of data, when that got compiled down, it's a 612-byte model. So if I go in there, you can see I saved it in /tmp/model.tflite. And in my Colab, if I go and look at the /tmp directory, we'll see model.tflite is there. And I could download that then to start using it in my mobile apps if I like. Can we switch back to the slides, please?

So now, we have the model. We've trained the model. Obviously, the models you're going to train are hopefully a little bit more complicated than the one that I did. You've been able to convert that model to TF Lite. And now, what can you do with that model, particularly on mobile? Well, there are three sets of options that I want to cover. The first one, if you were at the developer keynote, you probably saw ML Kit. And ML Kit is super cool.
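Before moving on to ML Kit, here is a rough sketch of the export-and-convert step demoed above, using the TensorFlow 2.x APIs (the notebook behind the QR code also shows the TensorFlow 1.x variant). The file paths are illustrative, and the quantization line is an optional extra rather than something shown in the talk.

```python
import tensorflow as tf

# 'model' is the trained Keras model from the previous step.
export_dir = '/tmp/saved_model'
tf.saved_model.save(model, export_dir)

# Convert the SavedModel into a TensorFlow Lite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
# Optional: ask the converter to shrink/optimize further for small,
# battery-powered devices (not shown in the talk's demo).
# converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# write() returns the number of bytes written, which is why a tiny model
# like this prints a number such as the 612 seen in the demo.
with open('/tmp/model.tflite', 'wb') as f:
    print(f.write(tflite_model))
```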
For me, in particular, it uses the Firebase programming API. Any Firebase fans here? Yeah, woo! The Firebase API for programming, I find particularly cool. It's got a really nice asynchronous API. And when you think about it, when I'm using a model, I'm going to be passing data to the model. The model is going to run some inference, and it's going to send something back to me. So it's perfect for Firebase, and it's perfect for that asynchronous API that Firebase gives us. And remember, Firebase ships with a bunch of models that work out of the box for vision detection and some of the AutoML stuff that we saw today, but you can also ship your custom TF Lite model into Firebase if you want. But if you don't want to use Firebase, or maybe your model is going to be deployed in a country where Firebase isn't supported, or you want it to work completely offline, and things like that, then the idea is you can still deploy a model directly to your app. And I'm going to show TensorFlow Lite for that: getting low level and using TensorFlow Lite directly instead of going through the ML Kit wrapper. And then finally, there's the mobile browser. So you can actually deploy a model. You can convert it to JSON, and you can deploy it to run, actually, in a mobile browser, which I find pretty cool.

So first, let's take a look at ML Kit. So ML Kit is Google's solution for Firebase developers and for mobile developers who want to have machine learning models running in their applications. Any ML Kit users here, out of interest? Oh, not many. Wow. Well, you're in for a treat if you haven't used it yet. Go check out the Firebase booth, the Firebase sandbox. They've got some really cool stuff that you can play with. But just to show how it works, the idea is that in the Firebase console, you can either pick one of the preexisting models that Firebase gives you, or you can upload the model that you just created. So in Firebase, you've got the option to say here's a custom model I've uploaded-- here, you can see one that I did a couple of weeks ago, this model that I uploaded. It's now in the Firebase console, and I can use it within my Firebase app. And I can use it alongside a lot of the other Firebase goodies like analytics. Or a really cool one is A/B testing. So I could have two versions of my model, and I could A/B test to see which works best. Those kinds of services are available to Firebase developers, and when integrated with machine learning, I find that makes it pretty cool.

And then once I've done that, now, when I start building my application, I get all of the goodness of the Firebase programming API. So if this is on Android, the idea is that with TensorFlow Lite, there's a TensorFlow Lite runtime object that we'll often call the interpreter. And here, you can see, I'm just calling interpreter.run. I'm passing it my inputs. So in this case, if it's a centigrade-to-Fahrenheit conversion, I'm just going to pass it a float. And then in its onSuccessListener, it's going to give me a callback when the model has finished executing. So it's really nice in the sense that it can be very asynchronous. If you have a really big model that might take a long time to run, instead of locking up your UI thread, it's going to work nicely and asynchronously through ML Kit. So with addOnSuccessListener, I'm adding a SuccessListener. It's going to give me a callback with the results. And then that result, I can parse to get my output from the machine-learned model.
And it's really as simple as that. And in this case, I'm passing it in a float. It's converting the temperature. It's sending a float back to me. And that's why my getOutput array is a float array with a single element in it. One thing you'll encounter a lot, if you haven't built machine learning models before, is that when you're passing data in, you pass data in as tensors. But when you're mapping those tensors to a high-level programming language, like Java or Kotlin, you tend to use arrays. And when it's passing stuff back to you, it's passing back a tensor. And again, those tend to map to arrays, and that's why in the code here, you're seeing arrays.

So iOS. Any iOS fans here? Oh, a few. Hey, nobody booed. You said they would boo. [CHUCKLING] So it also works in iOS. So for example, again, I have my interpreter in iOS. This is Swift code. I'll call the .run method on my interpreter. I'll pass it the inputs, and I will get the outputs back. And again, in this very simple model, I'm just getting a single value back. So it's just my outputs at index 0 that I'm going to read. If you're doing something more complex, your data-in and data-out structures are going to be a bit more complex than this. But as Daniel mentioned earlier on, we have a bunch of sample applications that you can dissect to take a look at how they actually do it.

So that's ML Kit. And that's a rough look at how it can work with the custom models that you build and convert to run in TensorFlow Lite. But let's take a look at the TensorFlow Lite runtime itself. So now, say I'm building an Android application, and I've built my model, and I don't want to depend on an external service like Firebase to deploy the model for me. I want to bundle the model with my app, so that however the user gets the app, via the Play Store or via other means, the model is a part of it. It's very easy for me to do that. So that .tflite file that I created earlier on, all I have to do is put that in my assets folder in Android as an asset, just like any other-- like any image, or any JPEG, or any of those kinds of things. It's just an asset.

But the one thing that's really important, and it's the number one bug that most people will hit when they first start doing this, is that when Android deploys your app to the device to run it, it will zip it up. It will compress everything in the assets folder. The model will not work if it is compressed. It has to be uncompressed. So in your build.gradle, you just specify aaptOptions. You say noCompress "tflite", and then it won't compress the tflite file for you. And then you'll be able to run it and do inference. So many times, I've worked with people building their first TF Lite application, it failed loading the model into the interpreter, and they had no idea why. And it's because they've forgotten to put this line in. So if you only take one thing away from this talk, take this slide away, because it will solve a lot of your problems when you get started with TF Lite. Then of course, still in build.gradle, all you have to do is, in your dependencies, add an implementation of the TensorFlow Lite runtime. And what that's going to do is give you the latest version of TensorFlow Lite, and that will give you the interpreter that you can use. And this QR code is a link.
I've put the full app that I'm going to show in a moment on GitHub, so you can go and have a play and hack around with it if you like. So now, if I want to actually do inference-- so this is Kotlin code. And there are a few things to take a look at here on how you'll do inference and how you'll actually be able to get your model up and running to begin with.

So first of all, there are two things that I'm declaring here. Remember earlier, we tend to use the term interpreter for the TF Lite runtime. So I'm creating a TF Lite object, which I'm going to call interpreter. Sorry, I'm going to create an Interpreter object, which I'm going to call tflite. And then I'm going to create a MappedByteBuffer object, which is the TF Lite model. Now earlier, remember, I said you put the TF Lite model into your assets folder. How you read it out of the assets folder is as a MappedByteBuffer. I'm not going to show the code for that in the slides, but it's available in the download, if you want to have a look at it for yourself. And then you're also going to need a TF Lite Options object. And that Options object is used to set things like the number of threads that you want to execute on.

So now, to instantiate your model so that you can start using it, it's as easy as this. So first of all, I'm going to call a loadModelFile function. That loadModelFile function is what reads the TF Lite model out of the assets folder as a MappedByteBuffer. And it gives me my MappedByteBuffer called tflitemodel. In my options, I'm going to say, for example, I just want this to run on one thread. And then when I instantiate my interpreter like this, by giving it that MappedByteBuffer of the model and giving it the options, I now have an interpreter that I can run inference on in Android itself.

And what does the inference look like? It will look something like this. So remember earlier, when I mentioned a neural network takes in a number of inputs as tensors, and it gives you a number of outputs as tensors. Those tensors, in a higher-level language like Kotlin or Java or Swift, will map to arrays. So even though I'm feeding in a single float, I have to feed that in as an array. So that's why here, my input value is a float array with a single value in it. So if I want to convert 100, for example, that's going to be a float array with a single value containing 100. And that F is for float, not for Fahrenheit. When I was rehearsing these slides before, somebody was like, oh, how'd you put Fahrenheit into code like that? But it's a float. It's not Fahrenheit.

And then when I'm reading, we have to get down a little low level here, because the model's going to send me back a stream of bytes. I know that those bytes map to a float, but Kotlin doesn't. Java doesn't. So that stream of bytes, I know it maps to a float. And a float has 4 bytes, so I'm going to create a byte buffer. I'm going to allocate 4 bytes to that byte buffer, and I'm just going to set its order to be native order, because there are different orders, like big-endian and little-endian. But when you're using TF Lite, always just use native order. And then to do my inference, I call tflite.run. I give it my input value. I give it my output value. It'll read from the input value. It'll write to the output value. And then on the output value, if I want to get my prediction, it's written those 4 bytes. I have to rewind them, and then I'm going to read a float. And what Kotlin will do is say, OK, I'm taking those 4 bytes out of that buffer.
And I'm going to give you back a float from them. So that's how I would do an inference. It seems very complex for a very simple task, like float in, float out, but the structure is the same regardless of how complex your input is and how complex your output is. So while this might seem to be the 20-pound hammer for the one-pound nail, it's also the same hammer when you have a 20-pound nail.

So that was Android; iOS is very similar. So in iOS, all I have to do is put my model in my application. So I just put my TF Lite model in there. It's an asset like any other. And then in code, first of all, in my Podfile, I'll have a pod for TensorFlow Lite. I've spoken at I/O for the last five years, and this is my first time ever showing C++ code. I'm kind of geeking out a little bit. Right now, it supports Objective-C++. We do have a Swift wrapper in some of our sample applications, but the Swift wrapper, right now, only works in a few scenarios. We're working on generalizing that. So for now, I'm just going to show C++ code. Any C++ fans here? Oh, wow. More than I thought. Nice. So then in your C++ code, it's exactly the same pattern as I was just showing. So first of all, I'm going to create an interpreter, and I'm going to call that interpreter. And now, I'm going to create two buffers. So these are buffers of unsigned ints. One buffer is my input buffer that I call ibuffer. The other buffer is my output buffer that I call obuffer. And for both of these, I'm just telling the interpreter, hey, use a typed_tensor for these. So that's my input. That's my output. And when I run the interpreter, it's going to read from the input and write to the output. Now, I have an output buffer, and I can just get my inference back from that output buffer.

So that was a quick tour of TF Lite: how you can build your model, save it as a TF Lite model, and-- I forgot to show-- oh, no. I did show, sorry, where you can actually download it as a TF Lite file. But I can demo it now running on Android. So if we can switch back to the laptop? So I'm going to go to Android Studio. And I've tried to make the font big enough so we can all see it. And let me just scroll that down a little bit. So this was the Kotlin code that I showed a moment ago, and the simplest possible application that I could build is this one. So it's a simple Android application. It's got one button on it. That button says Do Inference. When you push that, it's hard-coded: it'll pass 100 to the model and get back the response from the model. I have it in debug mode. I have some breakpoints set. So let's take a look at what happens.

So once I click Do Inference, I'm hitting this breakpoint now in Android Studio. And I've set up my input, inputVal, and we can see my inputVal contains just 100. And if I step over, my outputVal has been set up. It's a direct byte buffer. Right now, its position is 0. Its capacity is 4. I'm going to set its order, and then I'm going to pass the inputVal containing 100 and the outputVal, which is my empty 4-byte buffer, to tflite.run. Execute that, and the TF Lite interpreter has done its job, and it's written back to my outputVal. But I can't read that yet. Remember, earlier the position was 0. The limit was 4. The capacity was 4. It's written to it now, so that buffer is full. So when I rewind, now we can see my position has gone back to 0. So I know I can start reading from that buffer. So I'm going to say outputVal.getFloat, and we'll see the prediction that comes back is 211.31.
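The Kotlin, Swift, and C++ snippets described above all follow the same pattern: hand the interpreter an input buffer and an output buffer, then run it. If you want to sanity-check the converted .tflite file on a desktop before bundling it into an app, the Python tf.lite.Interpreter follows the same shape. A minimal sketch, assuming the /tmp/model.tflite produced earlier:

```python
import numpy as np
import tensorflow as tf

# Load the converted model and allocate its input/output tensors.
interpreter = tf.lite.Interpreter(model_path='/tmp/model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A single float still goes in as a (1, 1) tensor, just like the
# single-element arrays in the Kotlin and Swift examples.
interpreter.set_tensor(input_details[0]['index'],
                       np.array([[100.0]], dtype=np.float32))
interpreter.invoke()

# Should print roughly 211.x, matching the on-device demo.
print(interpreter.get_tensor(output_details[0]['index']))
```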
So that model has been wrapped by the TF Lite runtime. I've given it the input buffer. I've given it the output buffer. I've executed it, and it's given me back that result. And actually, there's one really cool Kotlin language feature that I want to demonstrate here. I don't know if anybody has seen this before. This might be the first time we've actually ever shown this on stage. But if I want to run this model again, you'll notice that there are line numbers here. All I have to do is type goto 50. I'm seeing who's still awake. Of course, it's gosub 50. So that's just a very quick and very simple example of how this would run in Android. And again, that sample is online. It's on GitHub. I've put it on GitHub so that you can have a play with it. All right, if we can switch back to the slides?

So the third of the options that I had mentioned-- the first was ML Kit, the second was TF Lite directly-- the third was to use JavaScript and run your model in a browser. So TensorFlow.js is your friend. So the idea is that with TensorFlow.js, in your Python, when you're building the model, you've pip installed a library called tensorflowjs. And that gives you a command called tensorflowjs_converter. With that converter, if you'd saved your model as a SavedModel, as we showed earlier on, you just say, hey, my input format's a SavedModel. Here's the directory the SavedModel is in. Here's the directory I want you to write it to. So once it's done that, it's actually going to take that SavedModel and convert it into a JSON object. So now, in a super, super simple web page-- and this QR code, again, has that web page-- all I have to do is say, here's the URL of that model.json. And I will say const model = await tf.loadLayersModel, giving it that URL. So if you're using TensorFlow.js in your browser with that script tag right at the top of the page, now that model is loaded from a JSON serialization. And I can start running inference on that model in the browser.

So here's how I would use it. Again, I'm setting up my inputs. And in JavaScript-- you know earlier, I was saying you pass in tensors and you get out tensors, and a high-level language tends to wrap them in arrays? TensorFlow.js actually gives you a tensor2d object, and that's what I'm using here. So the tensor2d object takes two parameters. The first parameter is the array that you want to pass in, and you can see here that array is just the value 10. It's a single-item array. And then the second parameter is the shape of that array. So here, the first parameter is the 10. The second parameter is 1, 1. And that's the shape of that array. It's just a 1 by 1 array. So once I've done that, and I have my input, now, if I want to run inference using the model, all I have to do is say model.predict(input). And it will give me back my results. In this case, I was alerting the results. But in my demo, I'm actually going to write it out.

So if we can switch back to the demo box? And I have that super simple web page hosted on the web. And I've put that model in there, and it's going to run. This is a slightly different model. I'll show training that model in a moment. This was just the model where y equals 2x minus 1. So I'm doing an inference where x equals 10. And if x equals 10, y equals 2x minus 1 will give you 19. And when I train the model on six items of data, it says 18.97.
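For reference, the conversion step described above can be run either from the command line or from Python. A minimal sketch, assuming the tensorflowjs pip package is installed; converting directly from the in-memory Keras model is an alternative route to the SavedModel-based command used in the talk, and the output path is illustrative:

```python
# pip install tensorflowjs
import tensorflowjs as tfjs

# 'model' is the trained Keras model from earlier.
# This writes model.json plus the binary weight shard(s) that go with it;
# both must end up in the same directory on your web server.
tfjs.converters.save_keras_model(model, '/tmp/linear')

# The SavedModel route shown in the talk uses the command-line tool,
# roughly: tensorflowjs_converter --input_format=tf_saved_model \
#              /tmp/saved_model /tmp/linear
```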
So again, all I do is, in Python, I can train the model. With the tensorflowjs converter, I can then convert that model to a JSON object. And then in TensorFlow.js, I can instantiate a model off of that JSON and start doing predictions with that model. If we can switch back to the demo machine for a moment? Oh, no. I'm still on the demo machine, aren't I? I can show that in action in a notebook. I lost my notebook. So this notebook is also available, where I gave those QR codes. And this notebook, again, is very similar to the one I showed earlier on: a super, super simple neural network, a single layer with a single neuron. I'm not going to step through all of it now. But the idea is, if you pip install tensorflowjs right now in Google Colab, it will upgrade Google Colab from TensorFlow 1.13 to TensorFlow 2. So if you run through this and install that, you'll see that happening. And then once you have TensorFlow 2 on your machine, you can use the tensorflowjs converter, as shown here, giving it the input format and giving it the SavedModel from the directory as I'd done earlier on. And it will write out to /tmp/linear.

The one thing to take note of, though, if you are doing this yourself, is that when it writes to that directory, it won't just write the JSON file. It also writes a binary file. So when you upload the JSON file to the web server, to be able to create a model off of that JSON file, make sure the binary file is in the same directory as the JSON file. Otherwise the model you create off of that JSON file is going to give you some really weird results. That's also the number one bug that I've found when people have been using TensorFlow.js. They will convert to the JSON file. They'll upload it to their server. They'll have no idea what that random binary file was, and they're getting indeterminate results back from the model. So make sure when you do that, you take both files. I don't know if I have one here that I prepared earlier that I can show you what it looks like. I don't. It's empty, right now. But when you run this and you write it out, you'll see that the model.json and a binary file are there. Make sure you upload both of them to use it. Can we switch back to the slides, please?

So that was a quick summary, where we saw that the model you build using Python can be saved as a SavedModel, converted to TensorFlow Lite, and then used in ML Kit or directly in TensorFlow Lite itself. Alternatively, you can convert it to a JSON file with TensorFlow.js and then use that in JavaScript. So that's the summary of being able to use those models on mobile devices. But now, Daniel is going to tell us all about going beyond phones and the web. So thank you, Daniel. Thank you.

DANIEL SITUNAYAKE: Thank you, Laurence. [APPLAUSE] Awesome. So like Laurence said, so far we've talked about phones. But these aren't the only devices that we use every day. So I'm going to talk about some new tools that our team has designed to help developers use machine learning everywhere. So our homes and cities are filled with devices that contain embedded computing power. And in fact, every year, literally billions of devices are manufactured that contain small but highly capable computation devices called microcontrollers. So microcontrollers are at the core of most of our digital gadgets, everything from the buttons on your microwave through to the electronics controlling your car.
And our team started to ask, what if developers could deploy machine learning to all of these objects? So at the TensorFlow Dev Summit, we announced an experimental interpreter that will run TensorFlow models on microcontrollers. So this is actually a new frontier for AI. We have super cheap hardware with super long battery life and no need for an internet connection, because we're doing offline inference. So this enables some incredible potential applications, where AI can become truly personal while still preserving privacy. We want to make it ridiculously easy for developers to build these new types of products, so we've actually worked with SparkFun to design a microcontroller development board that you can buy today. It's called the SparkFun Edge, and it's powered by an ultra-efficient ARM processor and packed with sensors and I/O ports. So you can use it to prototype embedded machine learning code. And we have example code available that shows how you can run speech recognition with a model that takes up less than 20 kilobytes of memory, which is crazy. So I'm now going to give you a quick demo of the device, and I'll show you what some of this code looks like for running inference. And you should remember, before we do this, that all of this is available on our website, along with documentation and tutorials. And the really cool thing is, while you're here at I/O, you should head over to the Codelabs area, and you can try hands-on development with the SparkFun Edge boards. So let's switch over to the camera, here.

LAURENCE MORONEY: I think that actual image was bigger than 20 kilobytes.

DANIEL SITUNAYAKE: Yeah, definitely. It's kind of mind-blowing that you can fit a speech model into such a small amount of memory. So this is the device itself. So it's just a little dev board. I'm going to slide the battery in. So the program we have here, basically, is just running inference. And every second of audio that comes in, it's run through a little model that looks for a couple of hot words. You can see this light is flashing. It flashes once every time inference is run. So we're getting a pretty decent frame rate, even though it's a tiny, low-powered microcontroller with a coin cell battery. So what I'm going to do now is take my life in my hands and try to get it to trigger with the hot words. And hopefully, you'll see some lights flash. Yes, yes, yes. First time, not lucky. Yes, yes, yes. Yes, yes, yes. So it's not working so great with the AC going, but you saw the lights lighting up there. And I've basically got a really simple program that looks at the confidence score that we get from the model that the word yes was detected. And the higher the confidence, the more lights appear. So we got three lights there. So it's pretty good.

Let's have a look at the code. So if we can go back to the slides? So all we do, basically, to make this work is we've got our model, which is just a plain old TensorFlow Lite model that you trained however you wanted to with the rest of our TensorFlow tool chain. And we have this model available as an array of bytes within our app. We're going to pull in some objects that we're going to use to run the interpreter. So first of all, we create a resolver, which is able to pull in the TensorFlow ops that we need to run the model. We then allocate some working memory that is going to be used as we input data and run the operations.
And then we build an interpreter object, which we pass all this stuff into, and that is actually going to execute the model for us. So the next thing we do is basically generate some features that we're going to pass into the model. So we have some code, not pictured here, which takes audio from the microphones that are on the board and transforms it into a spectrogram that we then feed into the model. Once we have done that, we invoke the model, and we get an output. So the output is just another tensor, and we can look through that tensor to find which of our classes was matched. And hopefully, in this case, it was the yes that showed up with the highest probability. So all of this code is available online. We have documentation that walks you through it. And like I said, the device is available here at I/O, in the Codelabs area, if you'd like to try it yourself.

So tiny computers are great, but sometimes, you just need more power. So imagine you have a manufacturing plant that is using computer vision to spot faulty parts on a fast-moving production line. So we recently announced the Coral platform, which provides hardware for accelerated inference at the edge. So these are still small devices, but they use something called the Edge TPU to run machine learning models incredibly fast. So one of our development boards here can run image classification on several simultaneous video streams at 60 frames per second. So it's super awesome. We have these devices available to check out in the Codelabs area, as well. And in addition, in the ML and AI sandbox, there's a demo showing a use case: spotting faulty parts in manufacturing. So once again, it's super easy to run TensorFlow Lite models on Coral devices. And this example shows how you can load a model, grab camera input, run inference, and annotate an output image in just a few lines of code. So all of this, again, is available online on the Coral site.

So we've shown a ton of exciting stuff today, and all of it is available on TensorFlow.org and the Coral site, right now. So you'll be able to find example code, example apps, pretrained models, and everything you need to get started with deploying to device. And I've got some links up here for you. But while you're here at I/O, there are a ton of other opportunities to play with on-device ML. So we have a couple of sessions that I'd like to call out here. We have the TensorFlow Lite official talk tomorrow, which is going to go into a lot more depth around TensorFlow Lite and the tools we have available for on-device inference and converting models. And we also have a talk on What's New in Android ML, which is this evening at 6:00 PM. So you should definitely check both of those out. And in the Codelabs area, we have a load of content. So if you're just learning TensorFlow, we have a six-part course you can take to basically go end to end, from nothing to knowing what you're talking about. And then we have a couple of codelabs you can use for on-device ML, and I think there's a Coral codelab, as well. So thank you so much for showing up. And I hope this has been exciting, and you've got a glimpse of how you can do on-device ML. Like you saw in the keynote, there are some amazing applications, and it's up to you to build this amazing new future. So thank you so much for being here. [MUSIC PLAYING]