[MUSIC PLAYING]

TIM DAVIS: Hi, everyone. My name's Tim Davis and I'm a product manager on TensorFlow. And I'm presenting today with TJ, a TensorFlow Lite engineer who'll be speaking in a bit. I'm super excited to be here tonight to talk about all the improvements we've made over the last few months for TensorFlow Lite.

So, first of all, what is TensorFlow Lite? Hopefully many of you know this by now, but we love to re-emphasize it and also provide context for users who are new to TensorFlow Lite. TensorFlow Lite is our production-ready framework for deploying ML models on mobile devices and embedded systems. TensorFlow Lite can be deployed on Android, iOS, Linux, and other platforms used in edge computing.

Now, let's talk about the need for TensorFlow Lite and why we built an on-device ML solution. We are in the midst of a huge demand for doing ML on the edge. It's driven by the need for user experiences that require low latency, work in situations with poor network connectivity, and enable privacy-preserving features. All of these are the reasons why we built TF Lite back in 2017. And just look at our journey since then. As the world's leading mobile ML framework, we have made a ton of improvements across the board. Recently we've increased the ops we support, delivered numerous performance improvements, developed tools which allow you to optimize models with techniques like quantization, and increased language support for our APIs, and there'll be more on that in a bit. And we're supporting more platforms like GPUs and DSPs.

Now, you're probably wondering how many devices we are on now. Boom. TensorFlow Lite is now on more than 4 billion devices around the world, across many different apps. Many of Google's own largest apps are using it, as are a large number of apps from external companies. This is a sampling of some of the apps that use TensorFlow Lite: Google Photos, Gboard, YouTube, and the Assistant, along with really popular third-party apps like Uber, Hike, and many more. So what is TensorFlow Lite being used for, you might ask? Developers are using TF Lite for use cases around image, text, and speech, but we are seeing a lot of new and emerging use cases around audio and content generation.

For the rest of the talk, we're going to focus on some of the latest updates and highlights. First up, let's talk about how we're helping developers get started quickly and easily with TensorFlow Lite. At TF World we announced the TF Lite Support Library, and today we are announcing a series of extensions to that. First, we are adding more APIs, such as our image API, and introducing new, simpler language APIs, all enabling developers to simplify their development. We are also adding Android Studio integration, which will be available in Canary in a couple of weeks. That will enable a simple drag and drop into Android Studio, which then automatically generates Java classes for the TF Lite model with just a few clicks. This is powered by our new CodeGen capability. CodeGen makes it easy for TensorFlow Lite developers to use a TF Lite model by handling the various details around inputs and outputs of the model, and it saves you a heap of time.

Here's a small example to show you what I mean. With the additions to the Support Library, you can now load a model, set an input on it, run the model, and then easily get access to the resulting classifications. The CodeGen reads the model metadata and automatically generates the Java wrapper with the model-specific API and a code snippet for you.
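The example Tim describes here is the Java Support Library and CodeGen flow on Android, which the talk does not show in full. As a rough illustration of the same load / set-input / run / read-output pattern, here is a minimal sketch using the TensorFlow Lite Python interpreter instead; the model file name is hypothetical and the input is a dummy tensor.

```python
import numpy as np
import tensorflow as tf

# Hypothetical image classification model file.
interpreter = tf.lite.Interpreter(model_path="image_classifier.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build a dummy input with the shape and dtype the model expects.
dummy_input = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy_input)

# Run inference and read back the classification scores.
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]["index"])
print(scores)
```

In the Android Studio flow described above, the generated Java wrapper handles the equivalent tensor plumbing for you.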
This makes it easy to consume and develop with TF Lite models without any ML expertise. And this is a sneak peek of how this will look in Android Studio. Here you can see all of the model metadata from the dragged-and-dropped TF Lite model. This will then generate Java classes via CodeGen, just for image models to begin with, but later for many different types of TensorFlow Lite models. How cool is that? We're committed to making mobile ML super, super easy. Check this out in the next couple of weeks on the Android Studio Canary channel.

In addition to CodeGen, all of this is made possible through the new extended model metadata. Model authors can provide a metadata spec when creating and converting models, making it easier for users of the model to understand how it works and how to use it in production. Let's take a look at an example. The metadata descriptor provides additional information about what the model does, the expected format of the model inputs, and the meaning of the model outputs. All of this is encoded via a simple schema, and our new release has tools to help you generate the right metadata for your model.

We have made our pretrained model repository much richer and added many more models, all of which are available via TensorFlow Hub. We've got new mobile-friendly flavors of BERT, including MobileBERT and ALBERT, for on-device NLP applications, in addition to EfficientNet-Lite. Again, all of these are hosted on our central model repository, TensorFlow Hub. So check out TF Hub for all the details.

Now, let's talk about transfer learning. Having a repository of ready-to-use models is great for getting started and trying them out, but developers regularly want to customize these models with their own data. That's why we're releasing a set of APIs which makes it easy to customize these models using transfer learning, and we're calling this TF Lite Model Maker. It's just four lines of code: you start by specifying your data set, then choose which model spec you would like to use, and, boom, it just works. You can see some stats on how the model performs and lastly export it to a TF Lite model (a rough sketch of this flow follows at the end of this section). We've got text and image classification already supported, and new use cases like object detection and QA are coming soon.

Now, we've got an exciting development in graph delegation. There are multiple ways to delegate your model in TF Lite: through the GPU, the DSP, or through the NNAPI on Android P and up. Recently, we've added increased GPU performance and DSP delegation through Hexagon. And we've increased the number of supported ops through the Android NNAPI. But you already knew all that, so what's new? Well, we have a really big announcement that I'm incredibly excited to share today. As of today, we are launching a Core ML delegate for Apple devices to accelerate floating point models on the latest iPhones and iPads using TensorFlow Lite. The delegate will run on iOS 11 and later, but to get benefits over using TF Lite directly, you want to use this delegate on devices with the Apple Neural Engine. The Neural Engine is dedicated hardware for accelerating machine learning computations on Apple's processors, and it's available on devices with the A12 SoC or later, such as the iPhone XS with iOS 12 or above. With Neural Engine acceleration, you can get a 4 to 14 times speedup compared to CPU execution. So that's our update on delegates.
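Here is the rough sketch of the four-line Model Maker flow referenced above. It assumes the tflite_model_maker Python package in its current pip form, which may differ slightly from the version shown at the summit; the image folder path is hypothetical.

```python
from tflite_model_maker import image_classifier
from tflite_model_maker.image_classifier import DataLoader

# 1. Load a hypothetical folder of labeled images (one subfolder per class).
data = DataLoader.from_folder("flower_photos/")
train_data, test_data = data.split(0.9)

# 2. Retrain a default pretrained model spec on the new data via transfer learning.
model = image_classifier.create(train_data)

# 3. Check how the customized model performs.
loss, accuracy = model.evaluate(test_data)

# 4. Export a TF Lite model ready for on-device deployment.
model.export(export_dir=".")
```

The exported model can then be dropped into an app, or into Android Studio with the CodeGen flow described earlier.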
We've heard from developers about the need for more and better tutorials and examples, so we're releasing several full example apps which show not only how to use a model, but the full end-to-end code that a developer would need to write to work with TF Lite. They work on multiple platforms: Android, iOS, Raspberry Pi, and even the Edge TPU. Now, I'd like to hand over to TJ, who's going to run through some more exciting improvements to TF Lite and dive deeper into the engineering roadmap.

TJ: Awesome. Thanks, Tim. Hi, everyone. I'm TJ, and I'm an engineer on the TensorFlow Lite team. So let's talk about performance. A key goal of TensorFlow Lite is to make your models run as fast as possible on CPUs, GPUs, DSPs, or other accelerators, and we've made serious investments on all these fronts. Recently we've seen significant CPU improvements, added OpenCL support for faster GPU acceleration, and have full support for all Android Q NNAPI ops and features. Our previously announced Qualcomm DSP delegate targeting low-end and mid-tier devices will be available for use in the coming weeks. We've also made some improvements in our benchmarking tooling to better assist model and app developers in identifying optimal deployment configurations. We've even got a few new CPU performance improvements since we last updated you at TensorFlow World; more on that in a bit.

To highlight these improvements, let's take a look at our performance about a year ago at Google I/O. For this example, we're using MobileNet V1, and comparing that with our performance today. This is a huge reduction in latency, and it can be expected across a wide range of models and devices, from low-end to high-end. Just pull the latest version of TensorFlow Lite into your app, and you'll benefit from these improvements without any additional changes.

Digging a bit more into these numbers, floating point CPU execution is the default path, providing a great baseline. Enabling quantization, now made easier with our post-training quantization tooling, provides nearly 3x faster inference. Enabling GPU execution provides an even more dramatic speedup, about 7x faster than our CPU baseline. And for absolute peak performance, we have the Pixel 4's Neural Core, accessible via the NNAPI TensorFlow Lite delegate. This kind of specialized accelerator, available in more and more of the latest phones, unlocks capabilities and use cases previously unseen on mobile devices.

And we haven't stopped there. Here's a quick preview of some additional CPU optimizations coming soon. In TensorFlow Lite 2.3 we're packing in even more performance improvements. Our model optimization toolkit has allowed you to produce smaller quantized models for some time, and we've recently optimized the performance of these models for our dynamic range quantization strategy (a short conversion sketch follows at the end of this section). So check out our model optimization video coming up later to learn more about quantizing your TensorFlow Lite models. On the floating point side, we have a new integration with the XNNPACK library, available through the delegate mechanism but still running on the CPU. If you're adventurous, you can make use of either of these new features by building from source, but we're shipping them in version 2.3, coming soon.

Last thing on the performance side: profiling. TensorFlow Lite now integrates with Perfetto, the new standard profiler in Android 10. You can look at overall TF Lite inference as well as op-level events on the CPU or with GPU delegation. Perfetto also supports profiling of heap allocations for tracking memory issues.
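As a minimal sketch of the post-training dynamic range quantization mentioned above, assuming the TensorFlow 2.x Python converter API; the SavedModel directory is hypothetical.

```python
import tensorflow as tf

# Hypothetical directory containing a trained SavedModel.
SAVED_MODEL_DIR = "my_saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)

# Optimize.DEFAULT enables post-training quantization. With no
# representative dataset supplied, this is dynamic range quantization:
# weights are stored as 8-bit integers while activations stay in float.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_quant_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)
```

The resulting file is typically several times smaller than the float model and benefits from the optimized quantized CPU kernels described above.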
So that's performance. OK, now let's talk about model conversion. Seamless and more robust model conversion has been a major focus for the team, so here's an update on our completely new TensorFlow Lite conversion pipeline. This new converter was built from the ground up to provide more intuitive error messages when conversion fails and to support control flow operations, and it's why we're able to deploy new NLP models like BERT or Deep Speech V2, image segmentation models like Mask R-CNN, and more. The new converter is now available in beta and will soon become the default option (a short opt-in sketch follows at the end of this section).

We want to make it easy for any app developer to use TensorFlow Lite. To that end, we've released a number of new first-class language bindings, including Swift, Obj-C, C# for Unity, and more. Thanks to community efforts, we've seen the creation of additional TensorFlow Lite language bindings in Rust, Go, and Dart. This is in addition to our existing C++, Java, and Python bindings. Our model optimization toolkit remains the one-stop shop for compressing and optimizing your models, and it's now easier than ever to use with post-training quantization. So check out the optimization talk coming up later for more details.

Finally, I want to talk a bit about our efforts in enabling ML not just for billions of phones, but also for the hundreds of billions of embedded devices and microcontrollers used in production globally. TensorFlow Lite for Microcontrollers is that effort, and it uses the same model format, converter pipeline, and kernel library as TensorFlow Lite. So what are these microcontrollers? These are small, low-power, all-in-one computers that power everyday devices all around us, like microwaves, smoke detectors, toys, and sensors. And with TensorFlow, it's now possible to use them for machine learning. You might not realize that embedded ML is already in use in devices you use every day. For example, hotword detection, like "OK Google" on many smartphones, typically runs on a small DSP, which then wakes up the rest of the phone. You can now use TensorFlow Lite to run models on these devices with the same tooling and frameworks.

We're partnering with a number of industry leaders in this area. In particular, Arm, an industry leader in the embedded market, has adopted TensorFlow as the official solution for AI on Arm microcontrollers, and together we've made optimizations that significantly improve performance on embedded Arm hardware. Another recent update: we've partnered with Arduino and just launched the official Arduino TensorFlow library. This makes it possible to start doing speech detection on Arduino hardware in about five minutes. You create your machine learning models using TensorFlow Lite and upload them to your board using the Arduino IDE. If you're curious about trying this out, this library is available for download today.

So that's a look at where we are today. Going forward, we're continuing to expand the set of supported models, make further improvements to performance, and work on some more advanced features like on-device training and personalization. So please check out our roadmap on TensorFlow.org and give us feedback. We'd love to hear from you.

[MUSIC PLAYING]
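Here is the short opt-in sketch for the new converter referenced above. It is not shown in the talk; this assumes the TensorFlow 2.x Python API, where the MLIR-based converter described by TJ is exposed behind the experimental_new_converter flag while it is in beta. The model path is hypothetical.

```python
import tensorflow as tf

# Hypothetical SavedModel directory; could also come from a Keras model
# via tf.lite.TFLiteConverter.from_keras_model(...).
SAVED_MODEL_DIR = "my_saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)

# Opt into the new MLIR-based conversion pipeline while it is in beta;
# it later becomes the default, at which point this line is unnecessary.
converter.experimental_new_converter = True

tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

The same converted flatbuffer feeds every deployment target discussed in the talk, from phones down to TensorFlow Lite for Microcontrollers.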