  • ANDREW SELLE: So today I'm going to talk about TensorFlow Lite,

  • and I'll give you an intro to what that is.

  • My name's Andrew Selle.

  • I'm a software engineer down at Google in Mountain View.

  • All right, so introduction.

  • We're going to be talking about machine learning on-device.

  • So how many of you are interested in running

  • your machine learning models on-device?

  • How many already do that?

  • OK, so a lot.

  • So that's great.

  • So you already know that machine learning on-device

  • is important.

  • So I'm going to give you a scenario that's

  • perhaps a little bit out of the ordinary.

  • Suppose I'm going camping.

  • I don't have any power.

  • I don't have any network connectivity.

  • I'm out in the mountains.

  • And I want to detect bears.

  • So I want to hang my cell phone on the outside of my tent

  • so that if, in the middle of the night,

  • a bear is coming for me, I'll know.

  • I don't know what to do if that happens,

  • but that's a great example of what you do with on-device ML.

  • So you basically want to have low latency.

  • You don't want to wait for the bear to already be upon you.

  • You want some early warning.

  • You want to make sure it works without a data connection.

  • The data can stay on the device and you have access

  • to the sensors.

  • So there are a lot more kind of practical and important use

  • cases than that.

  • But that kind of sets the stage for it.

  • And so people are, of course, doing a lot more ML on-device.

  • So that's why we started TensorFlow Lite.

  • So in short, TensorFlow Lite is our solution

  • for running on-device machine learning

  • with low latency and a small binary size

  • but on many platforms.

  • So here's an example of what TensorFlow Lite can do.

  • So let's play our video.

  • So, suppose we have these objects

  • and we want to recognize them.

  • Well, I know what they are, but I want my phone

  • to be able to do that.

  • So we use an image classification model.

  • So here we have a marker.

  • At the bottom you see kind of the confidence.

  • So this is image recognition.

  • It's happening in real time.

  • It can do kind of ordinary office objects,

  • but also important objects like TensorFlow logos.

  • And it also runs on multiple platforms.

  • So here we have it running on an Android phone and an iOS phone.

  • But we're not just limited to phones.

  • We can also do things like this Android Things Toolkit.

  • And there are going to be a couple more later, which

  • I'll show you.

  • So now that we've seen it in action,

  • let's talk a little bit more about what TensorFlow Lite is.

  • Do you want to see the video again?

  • I don't know.

  • OK, there we go.

  • The unfortunate thing about on-device

  • though is it's much harder.

  • It has tight memory constraints, you need to be low energy,

  • and you don't have as much computation available as you do

  • in the cloud.

  • So TensorFlow Lite has a few properties

  • that are going to sort of deal with these problems.

  • So the first one is it needs to be portable.

  • You know, we run on normal PCs, but we also

  • run on mobile phones, on Raspberry Pi, and on other Linux

  • SoCs and IoT-type devices.

  • And we also want to go to much smaller devices

  • like microcontrollers.

  • So, next slide.

  • The internet doesn't like me today.

  • Well, we'll skip that slide, whatever that was.

  • But in any case, basically, this portability

  • is achieved through the TensorFlow Lite file format.

  • So once you have a trained TensorFlow model, however you

  • author it, you could author it with Swift,

  • you could author it with Python, whatever you do,

  • you produce a saved model in graph form.

  • The serialized form is then converted to TensorFlow Lite

  • format, which then is your gateway to running on all

  • these different platforms.

  • And we have one more special converter

  • which allows you to go to Core ML

  • if you want to target iOS in this special way.

  • The second property is that it's optimizable.

  • We have model compression, we have quantization,

  • we have CPU kernel fusion.

  • These are all optimization techniques

  • to ensure that we have the best performance, as well

  • as a small size.

  • And this is achieved through the architecture.

  • So we have a converter, which we already talked about,

  • and then we have an interpreter core.

  • The interpreter core delegates to kernels

  • that know how to do things, just like in TensorFlow.

  • But unlike in TensorFlow these are

  • optimized for mobile and small devices with NEON on ARM.

  • An additional thing that we have that TensorFlow

  • doesn't have is the notion of delegation.

  • So we can delegate to GPUs or Edge TPUs or accelerators.

  • And this basically gives TensorFlow Lite

  • the chance to give part of the graph

  • to a hardware accelerator that can do processing

  • in a special way.
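
As a rough sketch of what delegation looks like from the Python API in later releases (the delegate library name here is a placeholder, and the exact loading mechanism varies by platform and accelerator):

```python
import tensorflow as tf

# Load a hardware delegate -- the library name below is a placeholder for
# whatever runtime your accelerator ships (an Edge TPU runtime, for example).
delegate = tf.lite.experimental.load_delegate("libedgetpu.so.1")

# Hand the delegate to the interpreter; TF Lite will give the supported parts
# of the graph to the accelerator and run the rest on the CPU kernels.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```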

  • So one of those we've talked about before is the NN API.

  • We're excited to announce that in 2018 Q4

  • we're looking to release our OpenGL-based GPU

  • delegate, which will give us better performance on GPUs,

  • and will also accelerate things like MobileNet

  • and other vision-based models.

  • So that's really exciting.

  • In addition, at Cloud Next there was an announcement

  • about Edge TPUs.

  • And Edge TPUs are also special as well

  • because they give us the ability to do

  • high performance per watt, and also

  • fit into a small footprint.

  • So for example, the device is there on that penny there,

  • but it's on a development board.

  • So you can put it in many different form factors as well.

  • Then the third property is that it's parametrically sized.

  • So we know that TensorFlow Lite needs

  • to fit on small devices, especially very small devices

  • like MCUs. And there, you might need to include only

  • the ops that you need.

  • So our base interpreter is about 80 kilobytes,

  • and with all built-in ops it's 750 kilobytes.

  • We're moving to a world where you can parameterize

  • what you put into the TensorFlow Lite interpreter,

  • so you can trade off the ability to handle

  • new models that use new ops, and the ability to only ship what

  • you need in your application.

  • So, we introduced TensorFlow Lite last year

  • and we've been asking users what they think of TensorFlow Lite.

  • And they really love the cross-platform deployment,

  • that they can deploy to iOS and to Android

  • with the same format.

  • They like that they can decouple the distribution of the binary

  • from the distribution of the model.

  • And they like the inference speed increases.

  • And they're really excited about the hardware acceleration

  • roadmap.

  • But the biggest feedback that we got

  • is that we should focus on ease of use, we should add more ops,

  • and we should work on model optimization,

  • and we should provide more documentation.

  • And so we've listened.

  • So what I want to do in this talk

  • is focus on the user experience.

  • But before we do that, let's look

  • at some of the users that are already using it so far.

  • We have a lot of first party users

  • and a lot of third party users that are excited about it.

  • And I hope after this talk you'll be interested as well.

  • So, the user's experience.

  • I'm a new user and I want to use TensorFlow Lite.

  • How do I do it?

  • Well, I think of it as kind of like learning to swim.

  • You can think of two things you might do.

  • You might wade, where you don't really have to swim.

  • But it's really easy to get started

  • and you get to cool off.

  • The second thing is that you can swim

  • where you can do custom models.

  • So we're going to talk about both of those.

  • But before we get into that, there's

  • an easier thing and then a harder thing.

  • So the easier thing is to dip your toes, which is the demos.

  • And the harder thing is to just dive full in

  • and have full kind of mastery of the whole water,

  • and that would be optimizing models.

  • And we'll talk about those as well.

  • So as far as using demo apps, you can go to our website,

  • you can download the demos and compile and run them.

  • It'll give you a flavor of what can be done.

  • I showed you one of those demo apps.

  • You can try it for yourself.

  • The next step is to use a pre-trained model.

  • So the demo app uses a pre-trained model.

  • So you can use that model in your application.

  • So if you have something that could benefit from say ImageNet

  • style classification, you can just take that model

  • and include it.

  • Another thing that's really useful is retraining.

  • So let me show you a retraining workflow, which

  • is you take a pre-trained model and you kind of customize it.

  • Great, so we're running this video.

  • We're showing you the scissors and the Post-It Notes

  • as before, and here's an application

  • that I built on a PC that allows me to do retraining.

  • But we're running inference with TensorFlow Lite.

  • So it knows the scissors, it knows the Post-It Notes,

  • but what if we got to a really important object, one

  • that we haven't seen before, perhaps

  • like something everybody has, a steel TensorFlow logo.

  • How is it going to do on that?

  • Well, not well is the unfortunate thing.

  • But the good thing about machine learning is we can fix it.

  • We just need some examples.

  • So here, this app allows us to collect examples.

  • And you could imagine putting this into a mobile app

  • where you just move your phone around

  • and it captures a bunch of examples.

  • I'm going to do the same thing, except on Linux.

  • It's a little bit more boring, but it does the job.

  • So we want to get a lot of examples

  • to get good generalization.

  • So once we're happy with that, we'll hit the Train button

  • and that's going to do some machine learning.

  • And once it's converged, we're going to have a new model.

  • It's going to convert it to TensorFlow Lite, which

  • is going to be really great.

  • We can test it out.

  • See if it's now detecting this.

  • And indeed it is.

  • That's good.

  • The other cool thing about that is now

  • that we have this TF Lite flat buffer model,

  • we can run it on our device and it works as well.

  • All right, great.

  • So, now that we've done pre-trained models and we've done--

  • let's get into full-on swimming.

  • There are basically four steps that we need to work on.

  • The first one is building and training

  • the model, which we've already talked about.

  • You could do that with, again, Swift, which would be a great way

  • to do it.

  • The second one is converting the model.

  • Third one is validating the model.

  • And the fourth is deploying the model.

  • Let's dive into them.

  • Well, we're not going to dive yet.

  • We'll swim into them.

  • OK.

  • Build and train the model.

  • So the first thing to do is to get a saved model of your model

  • and then use the converter.

  • This can be invoked in Python.

  • So you could have your training script

  • and you could have the last thing

  • you do is to just always convert it to TensorFlow Lite.

  • I in fact recommend that, because that

  • will allow you to make sure that it's

  • convertible right from the start.

  • So you give it the saved model as input,

  • and it gives you the TF Lite flat buffer as output.
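
As a minimal sketch of that conversion step in Python (the SavedModel path is a placeholder, and in older 1.x releases the converter lives under tf.contrib.lite rather than tf.lite):

```python
import tensorflow as tf

# Placeholder path to the trained SavedModel directory.
saved_model_dir = "/tmp/my_saved_model"

# Convert the SavedModel to a TF Lite flat buffer.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Write the flat buffer out so it can be bundled with your app.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```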

  • Great.

  • And then when you're done with that, it will convert.

  • Except we don't have all the ops.

  • So sometimes it won't.

  • So sometimes you want to visualize the TensorFlow model.

  • So a lot of models do work, but some of them

  • are going to have missing ops.

  • So as we've said, we've listened to your feedback

  • and to address this we've provided these visualizers

  • so you can understand your models better.

  • They're kind of analogous to TensorBoard.

  • In addition, we've also added 75 built-in ops,

  • and we're announcing a new feature,

  • which will allow us to run TensorFlow

  • kernels in TensorFlow Lite.

  • So basically, this will allow you

  • to run normal TensorFlow kernels that we

  • don't have a built-in op for.

  • There is a trade-off to that, because that increases

  • your binary size considerably.

  • However, it's a great way to get started,

  • and it will kind of allow you to get into using TensorFlow Lite

  • and deploy your model if binary size is not

  • your primary constraint.
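
A hedged sketch of how that TensorFlow-kernel fallback is requested at conversion time (flag names have shifted across releases, so treat this as one possible spelling rather than the definitive API):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")

# Use TF Lite builtins where they exist, and fall back to the full TensorFlow
# kernels for everything else. This increases binary size considerably.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
```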

  • OK, great.

  • Once you have your model working and converted,

  • you will definitely want to validate it.

  • At every step of machine learning, it's

  • extremely important to make sure it's still

  • running the way you think.

  • So if you have it working in your Python test bench

  • and it's running, you need to make sure it's also

  • running in your app.

  • This is just good practice, that end-to-end things

  • are producing the right answer.
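
One way to do that end-to-end check, as a sketch (the test-data files and the tolerance are assumptions; the expected outputs would come from running the same inputs through your original TensorFlow model):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Hypothetical test data: an input plus the original model's answer for it.
x = np.load("test_input.npy").astype(inp["dtype"])
expected = np.load("expected_output.npy")

# Run the converted model on the same input.
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
actual = interpreter.get_tensor(out["index"])

# The two should agree within a small tolerance (looser if you quantized).
assert np.allclose(expected, actual, atol=1e-5)
```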

  • In addition, you might want to do profiling

  • and you might want to look at what your size is.

  • Once you've done that, you want to convey this model

  • to the next phase, which is optimization.

  • We're not going to talk about that until later,

  • but that's what you would do with those results.

  • OK, so how do you deploy your model? We have several APIs.

  • In the previous time I talked about this,

  • we had C++ and Java.

  • Around May or so, we introduced a Python API.

  • And I'm excited to talk about our C API, which

  • is a way in which we're going to implement

  • all of our different APIs, similar to how TensorFlow does

  • it.

  • In addition, we're also introducing an experimental C#

  • API, which allows you to use it in a lot of toolkits

  • that are C#-based.

  • The most notable of which--

  • which is a feature request--

  • was Unity.

  • So if you want to integrate it into say a game.

  • OK, and then third, Objective-C, to get a more idiomatic,

  • traditional iOS experience.

  • Great.

  • Let me just give you an example of some code here.

  • So the basic idea is you give it the file

  • name of the flat buffer model, you fill in the inputs,

  • and you call invoke.

  • Then you read out the outputs.

  • That's how all the APIs work, no matter

  • what language they are.

  • So this was Python.
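
In the Python API, that flow looks roughly like this (the model file name is a placeholder, and the zero-filled input stands in for your real data):

```python
import numpy as np
import tensorflow as tf

# Give it the file name of the flat buffer model.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Fill in the inputs.
x = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], x)

# Call invoke, then read out the outputs.
interpreter.invoke()
y = interpreter.get_tensor(output_details[0]["index"])
print(y)
```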

  • The same is true in Java.

  • The same is true in C++.

  • Perhaps the C one is a little bit more verbose,

  • but it should be pretty intuitive.

  • OK, now that we know how to swim,

  • let's go into diving into models.

  • How do we optimize a model?

  • So once you have your model working,

  • you might want to get the best performance possible,

  • and you might want to leverage custom hardware.

  • This traditionally implies modifying the model.

  • So the way in which we're going to do this is we're

  • going to put this into our [INAUDIBLE] loop.

  • We had our four steps before as part of swimming,

  • but now we have the additional diving mode

  • where we do optimization.

  • And how does optimization work?

  • We're introducing this thing called the model optimization

  • toolkit, and the cool thing about it

  • is it allows you to optimize your model either post

  • training or during training.

  • So that means that you can do a lot of things

  • without retraining the model, but to get the--

  • let me just give an example, which is right now

  • we're doing quantization.

  • So there's two ways to do quantization.

  • One is, you take your model and look at what ranges it uses

  • and then just say we're going to quantize this model right now

  • by just setting those ranges.

  • So that's called post-training quantization.

  • So all you need to do for that is add a flag to the conversion.

  • I showed you the Python converter before.

  • There's also a command line one.

  • But both of them have this option to quantize the weights.
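
A hedged sketch of the Python side of that flag (its exact name has changed between releases; older converters expose a post_training_quantize boolean, newer ones an optimizations list):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")

# Older releases: converter.post_training_quantize = True
# Newer releases spell the same idea as an optimizations list:
converter.optimizations = [tf.lite.Optimize.DEFAULT]

quantized_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(quantized_model)
```

The command-line converter exposes a matching option, so the same thing can be done without touching your training script.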

  • In addition, if you want to do training-time quantization,

  • we introduced a toolkit for doing this.

  • This is now kind of put under the model optimization toolkit,

  • and this will basically create a new training graph for you

  • that, when you run it, will give you

  • the best quantized training graph that you could get.

  • It kind of takes advantage-- it takes into account that it

  • is going to be quantized.

  • So basically, the loss function is aware of the quantization.
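
A minimal sketch of that training-time rewrite, assuming the TF 1.x contrib quantize API that this toolkit grew out of (the tiny model here is only illustrative):

```python
import tensorflow as tf  # assumes a TF 1.x release where tf.contrib is available

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, [None, 784])
    labels = tf.placeholder(tf.float32, [None, 10])

    # A tiny model, just so there is something to quantize.
    logits = tf.layers.dense(x, 10)
    loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)

    # Rewrite the training graph to insert fake-quantization ops, so the loss
    # is computed as if the weights and activations were quantized.
    tf.contrib.quantize.create_training_graph(input_graph=graph, quant_delay=2000)

    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
    # ...train as usual with train_op, then build the eval graph with
    # tf.contrib.quantize.create_eval_graph() before exporting and converting.
```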

  • So that's good.

  • OK.

  • So, just one more thing that I want to talk about,

  • which is roadmap.

  • We're looking actively into things like on-device training,

  • which will, of course, require us to investigate control flow.

  • We're adding more techniques to the optimization toolkit.

  • We'd also like to provide more hardware acceleration support.

  • And the last thing is for TensorFlow 2.0,

  • we're moving out of contrib and into TensorFlow.

  • So we'll be under tensorflow/lite.

  • OK, so a couple demos.

  • I wanted to show a couple of models that are using

  • TensorFlow Lite.

  • So for example, here's one model that

  • allows you to see the gaze.

  • So it's running in real time.

  • It basically puts boxes around people

  • and kind of gives a vector of which direction

  • they're looking.

  • And this is running in real time on top of TensorFlow

  • Lite on an Edge TPU.

  • Let me show you another one.

  • Oh, sorry.

  • OK.

  • It's very tricky.

  • There we go.

  • Here's another one that's kind of interesting.

  • Again, this is using a variant of SSD.

  • It's basically three autonomous agents,

  • or two autonomous agents and one human-driven agent.

  • Two of the agents are trying to catch the other one

  • and the other one is trying to avoid them.

  • And they all use this SSD for their input.

  • Basically, the upshot of this is that it uses SSD that's

  • accelerated with Edge TPUs.

  • It's about 40% faster using Edge TPUs and TF Lite.

  • And I have one more demo, which is an app that's

  • using TF Lite called Yummly.

  • And basically, this is able to give you

  • recipes based on what it sees.

  • So let's just see it in action.

  • So this was originally built on TF Mobile,

  • but then moved to TF Lite.

  • So, this is their demo.

  • So essentially, you point your phone at what's in your fridge

  • and it will tell you what to make with it.

  • This is good for me, because I don't

  • have any creativity in cooking and I have

  • a lot of random ingredients.

  • So we're really excited by what people are using TF Lite for.

  • I want to show you one more demo,

  • which I just made last week with some of my colleagues.

  • And it's basically running on a microcontroller.

  • So this is basically a microcontroller

  • with a touch screen that has only one

  • megabyte of flash memory and 340 kilobytes of RAM.

  • So this is sort of pretty small, and we're

  • doing speech recognition on it.

  • So I say, yes.

  • And it says, yes.

  • It says no.

  • It says no.

  • And if I say some random thing, it says unknown.

  • So this is pre-recorded.

  • Unfortunately, I don't have the sound on yet.

  • But this is just showing that we can run the same interpreter

  • code on these really small devices.

  • So we can go all the way to IoT, which I think is super exciting

  • and will introduce a whole new set of applications

  • that are possible.

  • So with that, as I already told you,

  • we're moving out of contrib.

  • I'd like you guys to try out TensorFlow Lite.

  • Send us some information.

  • If you're interested in discussing it,

  • go on to our mailing list, tflite@tensorflow.org.

  • And we're really excited to hear about new use cases

  • and to hear feedback.

  • So thank you.

  • We're going to-- both of us are going to be at the booth

  • over in the grand ballroom.

  • So if you want to talk to us more about either Swift

  • or TensorFlow Lite, that would be a great time to do it.

  • And thank you.

  • [APPLAUSE]
