[MUSIC PLAYING] SARAH SIRAJUDDIN: I'm Sarah. I'm the engineering lead for TensorFlow Lite. And I'm really happy to be back at I/O again, talking about TensorFlow Lite again. TIM DAVIS: Woo-hoo. Thank you to all of you for joining us here today. I'm Tim. I'm the product manager for TensorFlow Lite. And today we're here to tell you about doing machine learning on mobile and IoT devices. SARAH SIRAJUDDIN: So I expect that most of you here are already familiar with what machine learning is, so I won't be going into that. But let's talk instead about what TensorFlow is. TensorFlow is Google's open source, cross-platform machine learning framework. It allows you to build your models and deploy them to servers, browsers, and all the way to edge devices. It's a full end-to-end ecosystem which goes all the way from research to production, from training in the data center to deployment at the edge, like I said. TensorFlow has support for multiple languages-- Swift, JavaScript, and Python. TIM DAVIS: So what is TensorFlow Lite, and how does it fit in? TensorFlow Lite is TensorFlow's cross-platform framework for deploying ML on mobile devices and embedded systems. You can take your existing TensorFlow models and convert them over to TensorFlow Lite easily. But I wanted to walk you through why this is so important to TensorFlow. There has been a global explosion in edge ML, driven by the need for user experiences that require low latency and closer-knit interactions. Further drivers, like poor network connectivity in many geographical regions around the world and user privacy requirements, have all fueled the need for ML on device. This has led to a whole revolution of machine learning and product innovation in nearly every single industry vertical. We see it all over the place, driven by on-device machine learning running in environments with limited compute, constrained memory, and low power budgets. So that's why the TensorFlow team decided to invest heavily in making it easy to develop, build, and deploy ML that is cross-platform capable. TensorFlow Lite can be deployed on Android, iOS, Linux, and other platforms. It's easier than ever to train a model with TensorFlow, convert it to TensorFlow Lite, and deploy it anywhere. SARAH SIRAJUDDIN: So at this point, you might be wondering what you can do with TensorFlow Lite. We really want you to be able to solve any kind of problem that you can imagine directly on the device. And later in this talk, we will be talking about the many ways developers are using TensorFlow Lite. But first, I want to show you a video of a fun demo that we built, which was featured in yesterday's developer keynote. This highlights the cutting edge of what is possible with TensorFlow Lite. The demo is called Dance Like. Let's roll the video. [VIDEO PLAYBACK] [MUSIC PLAYING] - Dance Like enables you to learn how to dance on a mobile phone. - TensorFlow can take our smartphone camera and turn it into a powerful tool for analyzing body pose. - We had a team at Google that had developed an advanced model for doing pose segmentation. So we were able to take their implementation and convert it into TensorFlow Lite. Once we had it there, we could use it directly. - To run all the AI and machine learning models to detect body parts, it's a very computationally expensive process where we need to use the on-device GPU.
TensorFlow Lite made it possible for us to leverage all these resources, the compute on the device, and give a great user experience. - Teaching people to dance is just the tip of the iceberg. Anything that involves movement would be a great candidate. - So that means people who have skills can teach other people those skills. And AI is just this layer that interfaces between the two things. When you empower people to teach people, I think that's really when you have something that is game-changing. [END PLAYBACK] TIM DAVIS: All right, cool. So to build Dance Like, as I talked about yesterday, we set ourselves the audacious goal of running five on-device tasks in parallel, in real time, without sacrificing performance. And I want to walk you through what they were. So we're running two body part segmentation models. We're matching the segmentation models in real time. We're running dynamic time warping. We're playing a video and encoding a video. And let me emphasize this again-- this is all running on device. And to show you, I'm actually going to do a live demo. I've spent a lot of I/O dancing. And if we just cut to the app, what you'll see is there are a few dancers you can choose from, in real time or in slow mo. I'm going to go with slow mo because I'm a beginner dancer. And so I can fire up some dance moves. You can see me-- the pose model running. And basically what's happening now is it's segmenting me out from the background and identifying different parts of my body. And then, as I follow along with the dancer, a second segmentation model starts running, but this time on the dancer. So now there are two segmentation models running on the GPU. And that produces the matching score that you see up in the top right-hand corner. And what that's doing is giving me some feedback on how well I'm matching the dancer. It's pretty cool, right? But we went further. Dancing is cool, but dancing in slow mo isn't that cool. So what we thought we would do is use dynamic time warping to sync my slow mo moves with the real-time dancer. And so what you get is an effect where the user can output this type of content, all running on device. So you can come and try this in the AI Sandbox. You can see for yourself. You can get a video. You can share it. And it's really, really cool. And it's all because of TensorFlow Lite. SARAH SIRAJUDDIN: How awesome was that? And props to Tim for agreeing to dance on stage at I/O, not once, but twice. So besides dancing, what are some other use cases that developers are using TensorFlow Lite for? The major on-device use cases that we see are typically related to image and speech, so things like segmentation, object detection, image classification, or speech recognition. But we are also seeing a lot of new and emerging use cases come up in the areas around content generation and text prediction. TensorFlow Lite is now on more than 2 billion devices around the world, running in many different apps. Many of Google's own largest apps are using it, as are apps from many other external companies. So this is a sampling of some of the apps which are using TensorFlow Lite-- Google Photos, Gboard, YouTube, Assistant, along with several global companies, like Uber and Airbnb. TIM DAVIS: TensorFlow Lite also powers ML Kit, which is our out-of-the-box solution for deploying Google's best proprietary models on device. You may have heard about this yesterday, too. TensorFlow Lite powers that on the back end as well.
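The dynamic time warping Tim mentions is a classic sequence-alignment algorithm. The talk doesn't show the Dance Like team's implementation, so the snippet below is only a minimal, generic Python sketch of how two pose sequences recorded at different speeds can be aligned; the feature layout and the random example data are illustrative assumptions.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping between two pose sequences.

    seq_a, seq_b: arrays of shape (num_frames, num_features), e.g. flattened
    keypoint coordinates per frame. Returns the accumulated alignment cost.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between frame i of A and frame j of B.
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # A frame may be matched, repeated, or skipped in the other sequence.
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Example: compare a slowed-down copy of a movement against the original.
reference = np.cumsum(np.random.rand(30, 4), axis=0)
slowed = np.repeat(reference, 2, axis=0)  # same moves, half speed
print(dtw_distance(reference, slowed))
```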
SARAH SIRAJUDDIN: So now let's move on to how you can get started with using TensorFlow Lite yourself. It's fairly simple to get started. I'm going to walk you through how you can use an off-the-shelf model, or retrain a model, or use a custom model that you may have built for your own specific use case with TensorFlow Lite. And once you've done that, it's really about validating and optimizing your model's performance for latency, size, and accuracy. So first, let's dive into how you can get started. As a new user, the simplest way to get started is to download a pretrained model from our model repository on tensorflow.org. We have models for popular use cases like image classification, object detection, pose estimation, and smart reply. And this is an area where we plan to keep adding more and more models, so please check back often. The models hosted there are already in the TensorFlow Lite model format, so you can use them directly in your app. Now, if you did not find a model which is a good fit for your use case, you can try retraining, a technique also frequently called transfer learning. The idea here is that you reuse a model that was trained for one task as the starting point for a model for a different task. The reason this is useful is that training a model from scratch can sometimes take days, but transfer learning can be done in short order. Note that if you do retrain a model, you will still need to convert the retrained model into TensorFlow Lite's format before you use it in an app. And later in this talk, I will show you how you do that conversion. OK, so once we have a model in TensorFlow Lite format, how do you use it in an app? First, you load the model. Then you preprocess your data into a format that your model will accept. Then you change your application code to invoke the TensorFlow Lite inference library. And finally, you use the result of the inference in your code. So let's walk through some code which shows this. This is code taken from the image classifier example hosted on our website. It's in Java, written for the Android platform. You can see that the first thing we do here is load the model and construct the TensorFlow Lite interpreter. We then load the image data and preprocess it. You'll notice that we're using a byte buffer; the reason we're doing that is to optimize for performance. The next step is to run inference and classify the images. And that's it. That's all you need to do to get an image classifier on Android. I do want to highlight that the example I have run through is in Java, but TensorFlow Lite also has bindings for Objective-C, C++, and Swift, as well as Python. So the next thing I want to move on to is how you can use your own custom model with TensorFlow Lite. The high-level steps here are that you train your model with TensorFlow, you write it out in the SavedModel format, and then you convert that into TensorFlow Lite format using TensorFlow Lite's converter. And then you make the changes in your app to use the model, like I walked you through just now. So this is a code snippet showing how you can convert a SavedModel into the TensorFlow Lite model format. As you can see, it's fairly simple to do. It's a matter of constructing the converter with your SavedModel and then invoking the convert function on it. Please check out our website for more documentation on this.
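The Java walkthrough above refers to the image classifier example on the website. For prototyping on a desktop, a rough equivalent of the same flow using the Python bindings mentioned above might look like the sketch below: convert a SavedModel, then load the model, preprocess an input, invoke the interpreter, and read the result. The SavedModel directory and the random input are placeholders, and the attribute names assume the TF 2.x-style tf.lite API.

```python
import numpy as np
import tensorflow as tf

# 1. Convert a SavedModel into TensorFlow Lite format.
converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")  # placeholder path
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# 2. Load the converted model and construct the interpreter.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# 3. Preprocess the data into the shape and dtype the model expects
#    (a random tensor stands in for a real preprocessed image here).
input_data = np.random.rand(*input_details[0]["shape"]).astype(np.float32)

# 4. Run inference and read back the result.
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
predictions = interpreter.get_tensor(output_details[0]["index"])
print(predictions)
```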
The website also has documentation on how you can do this with TensorFlow 2.0, which is the newest release of TensorFlow. Speaking of conversion, we've heard feedback from our users that TensorFlow Lite conversion is sometimes hard. Developers sometimes run into issues where the ops their models are using are not supported in TensorFlow Lite, or their models use semantics which we don't support yet, for example control flow. Rest assured that we've heard this feedback, and we are actively working to improve it. We are building a brand new converter. This converter is based on MLIR, which is Google's latest compiler infrastructure. And our new converter will be a lot more extensible and easier to use and debug. So that was all about how you can convert and deploy your model with TensorFlow Lite. I want to walk through some advanced techniques, which you may use if they are useful for you. Many developers who use TensorFlow Lite care deeply about keeping the binary footprint small. Selective registration is one feature that can really help here. The idea is that you link in only the ops that your model is using, which keeps the size of the binary small. So let's see how this works in code. You create a custom op resolver, and you replace TensorFlow Lite's built-in op resolver with it. And then, in your build file, you specify your model and the custom op resolver that you just created. TensorFlow Lite will then scan over your model and create a registry of the ops that are used in your model. And then, when you build the interpreter, only those ops are linked. This in turn will reduce the size of the binary. Another advanced feature that I want to mention is TensorFlow Select. It allows developers to access many of TensorFlow's ops via TensorFlow Lite. The caveat, though, is that it does increase the size of the binary. But if your use case is not very sensitive to the size of the binary, and you're running into issues because you need ops which TensorFlow Lite doesn't support yet, I highly recommend that you check this out. So this is how it works in code. It's a small modification to how you would convert your model to TensorFlow Lite. This is pretty much the same code that you would use for normal conversion. The only difference is that in the target ops, you specify the set of TensorFlow Select ops. You can find a lot more documentation on how to do this on our website. It also has information on how this works under the hood, as well as usage examples. We get a lot of requests to support control flow in TensorFlow Lite. These are constructs like loops and conditionals. We're actively working on this, and we hope to share more with you in the coming few months. And the last thing that I want to cover in this section is on-device training. This is an exciting new area, and we believe that this will open up many new opportunities for research and product innovation. We are working to add training support to TensorFlow Lite. And at this point, I would guess that this would be available towards the end of the year for developers to try out.
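As an aside, the TensorFlow Select conversion described above might look roughly like this in Python. The talk refers to setting "target ops"; the sketch below assumes the TF 2.x-style attribute target_spec.supported_ops (older releases exposed a similar target_ops setting), and the SavedModel path is a placeholder.

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")  # placeholder path

# Prefer TensorFlow Lite builtin kernels, but allow ops without a builtin
# kernel to fall back to the full TensorFlow op set (TensorFlow Select).
# As noted above, this increases the binary size.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
```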
TIM DAVIS: Great. Thanks, Sarah. So now that you have your model up and running, you need to validate it and get it running fast. To get started, we recommend benchmarking your model with our benchmark tooling. This will enable you to validate your model's accuracy, size, and performance, and make adjustments depending on the results. Before I get into that, I wanted to share the key performance goal of TensorFlow Lite, and that is to make your models run as fast as possible on CPUs, GPUs, DSPs, and NPUs. Fast is what we care about. If you don't know what all those terms mean, I'll explain them now. Most phones have a CPU, and many have a GPU and a DSP. The CPU is typically the best option for simple ML models. GPUs are usually great for heavier processing at fast speeds. And DSPs tend to be best for low-power, complex models that require very fast execution. It depends on your use case and some experimentation, but the great thing about TensorFlow Lite is that it allows you to execute your ML on all of them. And our team has worked incredibly hard to get optimal performance across all these different architectures. For example, MobileNet V1 achieves an 83 millisecond inference time on a Pixel 3 with a single CPU thread. That drops to just 15 milliseconds when you delegate it across to the GPU. We have a lot more CPU optimizations coming in our pipeline to get even better performance across 2019, in addition to more op support on ARM and Intel architectures. So that's why the delegation API is such an important mechanism inside TensorFlow Lite. But you're probably thinking, what is this magical API and how does it work? The delegation API delegates part of your graph to another executor at runtime. It accelerates any or all parts of the graph if it can get better performance, and falls back to the CPU when it can't. So here's a great way to visualize it. A graph is made up of a series of operations. And for operations supported on a particular architecture, TensorFlow Lite will accelerate those if you ask it to. If certain operations aren't supported on that delegate, it will fall back to the CPU automatically. So now that we've spoken about the delegation API: the Android Neural Networks API uses it to standardize hardware acceleration across the Android ecosystem. In Android P, it supports around 30 graph operations. And in Android Q, it will have more than 100 and support use cases like image, audio, speech, and others. For example, you could achieve a 9x latency improvement on the ML Kit Face Detection model using the NN API. And it's really easy to use. You just flip the setUseNNAPI flag to true, and you're done. You will get acceleration. Your graph will be accelerated where possible using the NN API. The other thing we've done is release GPU acceleration using OpenGL ES 3.1 for Android and Metal shaders for iOS. This will give you a 2 to 7x speed-up in comparison to floating point on the CPU, but it does add a bit more to your binary size. And we're working on making the GPU delegate even faster. We love feedback, so please reach out to us and let us know what's important. The Edge TPU is another example of the delegation API at work, with a custom ML accelerator that provides high-performance, low-power acceleration at the edge. It accelerates TensorFlow Lite models, and you can find out more in the Edge TPU talk tomorrow. So now we've got a really exciting announcement around DSP performance. We've partnered with Qualcomm to enable DSP delegation directly through TensorFlow Lite for their 600 to 800 series devices, which is hundreds of millions of devices. While we recommend that you use the NN API on Android Q and beyond, this announcement gives you another option for accelerating your models on the DSP.
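Stepping back to the delegation API itself for a moment: the setUseNNAPI flag above is the Java route on Android. From Python, a delegate is loaded from a shared library and handed to the interpreter, which runs the ops it supports on the accelerator and falls back to the CPU for the rest. The sketch below assumes tf.lite.experimental.load_delegate is available (TF 1.15/2.x era) and uses the Edge TPU library name purely as an example; substitute whichever delegate your platform provides.

```python
import tensorflow as tf

# Load a delegate from its shared library; 'libedgetpu.so.1' is just one
# example of a delegate library name.
delegate = tf.lite.experimental.load_delegate("libedgetpu.so.1")

# Ops supported by the delegate run on the accelerator; unsupported ops
# automatically fall back to the CPU.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate])
interpreter.allocate_tensors()
```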
To use the Qualcomm DSP delegate, you'll be able to include a Qualcomm-signed binary and rely on the delegation API for acceleration. And we hope to have this released later this summer. So you're probably wondering, well, what type of performance can I get on the DSP? You can get up to an 8.3x speed-up delegating over to the DSP on Qualcomm devices, which is an incredible performance boost that we are excited to bring to TensorFlow Lite. This is just another example of how we are enabling more accelerators to work with your ML models. Lastly, as I talked about earlier, you want to ensure that you are benchmarking and validating your models. So we offer some very simple tooling to enable this, for threading and per-op profiling. And here is a way to execute the per-op profiling via the command line with Bazel and ADB. This is basically what you get as output when you are doing per-op profiling. It really does enable you to narrow down your graph execution and then go back and tune performance bottlenecks. Now let's talk about optimizing your graph using the TensorFlow Model Optimization Toolkit. We offer a simple toolkit to optimize your graphs and enable them to run faster at a smaller size. For those who are already familiar with this, we are adding more techniques for during-training and post-training quantization. And if none of these concepts are familiar to you, don't worry, I'll explain. So what is quantization, you might be wondering. Quantization is really just the reduction in precision of the weights and activations in your graph, essentially reducing from floating point to integer numbers. The reason this works so well is that we run the heaviest computations in lower precision but preserve the most sensitive ones in higher precision, so there is essentially no accuracy loss. There are two variants: post-training quantization, which happens once you have an exported graph, and during-training quantization, which quantizes the forward pass so that precision matches between training and inference. So now you know what quantization is. The goal of the toolkit is really to make your graphs run faster and be smaller by abstracting away the complexity involved in all these different techniques. We strongly recommend that you start with post-training quantization, as it's a simple flag flip to utilize it, and you can see what type of performance improvements you can achieve. We've seen up to a 4x reduction in model size, 10% to 50% faster execution for convolutional models, and 3x improvements on fully connected and RNN-based models on the CPU. And this is how simple it is to do it. It really is just a simple flag flip. In the coming months, we'll be improving our support for quantization with Keras, enabling post-training quantization with fixed-point math, and adding more advanced techniques, like connection pruning and sparsity support. But one thing we really want to emphasize is that post-training quantization is almost as good as during-training quantization. If you're an advanced practitioner and have access to the model and the training data, then during-training quantization might be for you. But for most developers, post-training quantization has almost exactly the same effect on size and accuracy. So here you can actually see the difference-- and there are really only marginal differences-- between during-training quantization and post-training quantization. So, again, please start with post-training quantization first.
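The "simple flag flip" for post-training quantization corresponds to setting a single optimization option on the converter. A minimal sketch, assuming the TF 2.x-style tf.lite API and a placeholder SavedModel path:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")  # placeholder path

# Post-training quantization: with this single option the converter quantizes
# the weights, typically shrinking the model and speeding up CPU execution
# with little accuracy loss.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)
```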
SARAH SIRAJUDDIN: Thanks, Tim. So is TensorFlow Lite only for mobile phones? It is not. TensorFlow Lite is already being used in many, many products which are not phones. It is being used in smart speakers, smart mirrors, vacuum cleaners, and even small space satellites from NASA. Because of this demand that we see from our developers, and also the fact that there are a huge number of microcontrollers out there, we have decided to invest in making TensorFlow Lite even lighter and suitable for use on these platforms. OK, so let's first talk about what a microcontroller is. Microcontrollers are essentially small computers on a single integrated circuit. They typically don't run an operating system. They have very limited RAM and flash, usually just tens of kilobytes, and they only have memory, a CPU, and perhaps some peripherals. Microcontrollers are often used in a cascading fashion: they perform lightweight processing, and, based on the result of that, they trigger heavier processing on more powerful hardware. As an example, a microcontroller can be checking to see if there is any sound. That in turn can trigger processing on a second microcontroller, which checks whether the detected sound is human speech. And that in turn triggers processing on a heavier application processor. At a high level, the architecture for TensorFlow Lite for microcontrollers is the same as what we have for TensorFlow Lite. We use the same model format, and we use the same conversion process. The only difference is that the interpreter is a stripped-down, lightweight version of TensorFlow Lite's interpreter. We have been working on getting example models ready for you to use on microcontrollers. We have one already on the website for speech, and there is another one for image classification that is coming out soon. And these models have to be pretty small, too, as you can imagine. If you go to our website, you will find instructions on how you can use these models, along with suggestions for how you can procure the hardware to get started. This brings me to another exciting announcement. We are happy to announce a closer collaboration with ARM on the development of TensorFlow Lite for microcontrollers. ARM is a well-established and respected leader in this space, and we are very excited to work closely with them. We will be working closely with ARM on the development of models, framework design, as well as performance optimizations. The Mbed community is the largest community of developers in this space, and we will be integrating deeply with the tooling there to make TensorFlow Lite easy and performant for them to use. This is a relatively new effort, and we would love to work closely with the TensorFlow community to make it successful. So please send us your feedback, ideas, and also code contributions. And if you are here at I/O, you can go to the Codelabs area, where you can try running TensorFlow Lite on a microcontroller yourself.
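The cascading pattern Sarah describes, a cheap always-on check that gates a heavier model, can be sketched in a few lines. The toy example below runs on a host with the Python interpreter rather than on a real microcontroller, and the model path, input handling, and energy threshold are all illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

def sound_detected(audio_frame, threshold=0.02):
    # Stage 1: a cheap energy check, the kind of lightweight test the first,
    # tiny microcontroller in the cascade could run continuously.
    return float(np.mean(np.square(audio_frame))) > threshold

# Stage 2: a heavier speech/keyword model, only invoked when stage 1 fires.
# "speech_model.tflite" is a placeholder for a small speech model.
interpreter = tf.lite.Interpreter(model_path="speech_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def classify_if_needed(audio_frame):
    if not sound_detected(audio_frame):
        return None  # stay on the low-power path
    data = np.reshape(audio_frame, inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])  # e.g. per-keyword scores
```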
Just go to tensorflow.org/lite to get started, and you'll see our revamped website that makes it easy to navigate and find what you're looking for. Now, as an open source product, we're working hard to engage with the community and be even more transparent about where we're headed. That's why we've published our 2019 roadmap on tensorflow.org/lite so you have visibility into our priorities. So please, feel free to check it out and give us feedback. We also have new code samples and models on the site for common use cases, like image classification, object detection, and the others that you see here. So now we're excited to show one last demo, which is very unique, fun, and shows off the performance of TensorFlow Lite in a non-mobile environment. So Sarah is going to be the star of the show for this demo, which will look at the virtual try-on of glasses and hair color. And then we'll take a look at what's going on behind the scenes. So Sarah, come on up, and let's do this. SARAH SIRAJUDDIN: So as you can see, this is a smart mirror. This was built by one of our developers. As you can see, this is a touchless mirror, which is operated only by hand gestures. It's running the CareOS operating system running on Qualcomm hardware. All the machine learning on this mirror is powered by TensorFlow Lite, and we are accelerating it on the GPU to get optimal performance. OK, let's give this a whirl. TIM DAVIS: All right, so the first try-on experience we're going to show you is realistic virtual try-on of eyeglasses. So as you can see, Sarah doesn't need to touch the mirror since it works through a touchless interaction. The embedded Google AI technology runs in real time, and is driven end to end with GPU acceleration for model inference and rendering. Those glasses are looking awesome. So now let's try maybe hair recoloring. SARAH SIRAJUDDIN: Oh, I'll go for the blue. TIM DAVIS: Oh, that looks awesome. This is enabled by a state-of-the-art segmentation model that predicts for every pixel the confidence of being part of the user's hair or not. Sarah, you look great. So now, why don't we take a look inside? What's going on behind the scenes to achieve this experience? So for every frame, a TensorFlow high-fidelity geometry model is being run to predict over 400 points on the face. And it even works for multiple people. I'll show you. So all up, this is an awesome example of on-device ML using TensorFlow Lite on a non-mobile device. You can check out the segmentation models that we have available on tensorflow.org/lite. And if you'd like to find out more about the mirror, please come up to us after the show, and we can direct you where to get more information. SARAH SIRAJUDDIN: That's all we have, folks. Please try out TF Lite if you haven't done so already. We would not have gotten to this point if it wasn't for our developer community, which has helped us a lot with feedback, as well as code contributions. So thank you a lot. We're really grateful. [APPLAUSE] A few of us will be at the AI Sandbox today and tomorrow. Please come by to have a chat and try out one of our demos. TIM DAVIS: We'll also be at Office Hours today at 10:30 AM, right after this talk. Thank you very much. SARAH SIRAJUDDIN: Thank you. [MUSIC PLAYING]