[MUSIC PLAYING] DANIEL SITUNAYAKE: Hey, everybody. So my name's Daniel. LAURENCE MORONEY: And I'm Laurence. DANIEL SITUNAYAKE: And we have an awesome talk for you this afternoon. So I've been super excited watching the keynote, because there's just been so much stuff that is relevant to what we're going to be talking about today, which is running machine learning on devices. So we've seen the kind of amazing things that you can do if you're running ML at the edge and on device. But there are a ton of options that developers have for doing this type of stuff. And it can be a little bit hard to navigate. So we decided to put this session together to give you an overview. We're going to be showing you all the options, walking you through some code, and showing you some awesome demos. So hopefully, you'll enjoy.

So first, we're going to walk through using TensorFlow to train models and then export saved models, which you can then convert to deploy on devices. We're then going to see a number of different ways that you can deploy to Android and iOS devices. And finally, we're going to talk about some new and super exciting hardware devices that you can use to run your models. So first of all, I want to give you an overview of some different technologies and the device types they each allow you to support. So to begin with, we have ML Kit, which is designed to make it super easy to deploy ML inside of mobile apps. We then have TensorFlow.js, which basically lets you target any device that has a JavaScript interpreter, whether that's in the browser or through Node.js. So that even supports embedded platforms. And finally, TensorFlow Lite gives you high performance inference across any device or embedded platform, all the way from mobile phones to microcontrollers.

So before we get any further, let's talk a little bit about TensorFlow itself. So TensorFlow is Google's tool chain for absolutely everything to do with machine learning. And as you can see, there are TensorFlow tools for basically every part of the ML workflow, from loading data through to building models and then deploying them to devices and servers. So for this section of the talk, we're going to focus on building a model with TensorFlow and then deploying it as a TensorFlow Lite model. There are actually tons of ways to get up and running with TensorFlow on device. So the quickest way is to try out our demo apps and sample code. And we also have a big library of pretrained models that you can drop into your apps that are ready to use. You can also take these and retrain them based on your own data using transfer learning. You can, as you've seen this morning, use Federated Learning to train models based on distributed data across a pool of devices. And you can finally build models from scratch, which is what Laurence is now going to show off.

LAURENCE MORONEY: Thank you, Daniel. Quick question for everybody. How many of you have ever built a machine-learned model? Oh, wow. DANIEL SITUNAYAKE: Wow. LAURENCE MORONEY: Oh, wow. Big round of applause. So hopefully, this isn't too basic for you, what I'm going to be showing. But I want to show just the process of building a model and some of the steps that you can then take to prepare that model to run on the mobile devices that Daniel was talking about. Can we switch to the laptop, please? Can folks at the back read that code? Just wave your hands if you can. OK, good. Wave them like this if you need it bigger. OK, some do, or you just want to stretch. Let's see. How's that?
OK, cool. So I'm just going to show some very basic TensorFlow code here. And I wanted to show the simplest possible neural network that I could. So for those of you who've never built something in machine learning or have never built a machine-learned model, the idea is that with a neural network, you can do some basic pattern matching from inputs to outputs. We're at Google I/O, so I'm going to talk about inputs and outputs a lot. And in this case, I'm creating the simplest possible neural network I can. And this is a neural network with a single layer and a single neuron in that layer. And that's this line of code, right here: keras.layers.Dense(units=1, input_shape=[1]). And I'm going to then train this neural network on some data, and that's what you can see in the line underneath-- the Xs and the Ys. Now, there is a relationship between these data points. Can anybody guess what that relationship is? There's a clue in the 0 and 32.

AUDIENCE: Temperature.

LAURENCE MORONEY: Yeah, a temperature conversion, right? So the idea is I could write code that multiplies by 9 over 5 and adds 32. But I want to do it as a machine-learned model, just to give an example. So in this case, I'm going to create this model. And with this model, I'm just training it with six pairs of data. And then what it will do is it will start trying to infer the relationship between these data points. And then from that, going forward, it's a super simple model that can do a temperature conversion.

So how it's going to work is it's going to make a guess. And this is how machine learning actually works. It just makes a wild guess as to what the relationship between these data points is. And then it's got something called a loss function. And what that loss function is going to do is see how good or how bad that guess actually is. And then based on the data from the guess and the data from the loss function, it then has an optimizer, which is this. And what the optimizer does is it creates another guess, and then it will measure that guess to see how well or how badly it did. It will create another guess, and it will measure that, and so on, and so on, until I ask it to stop or until it has done it 500 times, which is what this line of code is actually doing.

So if I create this model quite simply, we'll see it's going to train. There was one that I created earlier, so my workbook's taking a moment to get running. My network connection's gone down. Hang on. Let me refresh and reload. You love it when you dry run a demo, and it works great. We get that warning. I'll run that. I'll do that. And now, it starts training, hopefully. There we go. It's starting to train now. It's going through all these epochs. So it's going to do that 500 times. And then at the end of the 500 times, it's going to have this trained model. And then this trained model, I'm just going to ask it to predict. So for example, if I give it 100 degrees centigrade, what's that going to be in Fahrenheit? The real answer is 212, but it's going to give me 211 and something, because this isn't very accurate, because I've only trained it on six points of data. So if you think about it, there's a linear relationship between the centigrade and Fahrenheit values on those six points, but the computer doesn't know that. It doesn't know the relationship carries on linearly forever. It could go like this, or it could change. So it's giving me a very high probability that for 100 degrees centigrade, it would be about 212 Fahrenheit.
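For reference, the complete model Laurence is describing fits in a few lines of Keras. This is a minimal sketch assuming TensorFlow 2.x; the choice of optimizer and the exact Celsius/Fahrenheit training pairs are assumptions, since the talk only describes them verbally.

```python
import numpy as np
import tensorflow as tf

# A single layer with a single neuron: one value in, one value out.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])

# The loss function measures each guess; the optimizer makes the next one.
# (The talk doesn't name the optimizer, so Adam here is an assumption.)
model.compile(optimizer=tf.keras.optimizers.Adam(0.1),
              loss='mean_squared_error')

# Six pairs of data: Celsius in, Fahrenheit out (illustrative values).
celsius = np.array([-40.0, -10.0, 0.0, 8.0, 22.0, 38.0], dtype=float)
fahrenheit = np.array([-40.0, 14.0, 32.0, 46.4, 71.6, 100.4], dtype=float)

# Guess, measure, adjust -- repeated 500 times.
model.fit(celsius, fahrenheit, epochs=500, verbose=0)

# Prints something close to, but not exactly, 212, because the model
# has only ever seen six data points.
print(model.predict(np.array([100.0])))
```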
And that comes out as about 211 degrees Fahrenheit. So I've just built a model. And what we're going to take a look at next is, how do I get that model to work on mobile? Can we switch back to the slides, please?

So the process is pretty simple. The idea is, using Keras or using an estimator in TensorFlow, you build a model. You then save that model out in a file format called SavedModel. And in TensorFlow 2, we're standardizing on that file format to make it easier for us to go across different types of runtimes, like JavaScript in the browser, TFX, or TensorFlow Lite. By the way, the QR code on this slide is to the workbook that I just showed a moment ago. So if you want to experiment with that workbook for yourself, if you're just learning, please go ahead and do so. It's a public URL, so feel free to have fun with it. And I've put a whole bunch of QR codes in the rest of the slides.

Now, once you've done that, in TensorFlow Lite, there's something called the TensorFlow Lite Converter. And that will convert our SavedModel into a TensorFlow Lite model. So the process of converting means it's going to shrink the model. It's going to optimize the model for running on small devices, for running on devices where battery life is a concern, and things like that. So out of that process, I get a TensorFlow Lite model, which I can then run on different devices. And here's the code to actually do that. So we've got a little bit of a breaking change between TensorFlow 1 and TensorFlow 2. So in the workbook that was on that QR code, I've put both pieces of code on how to create the SavedModel. And then once you've done that, the third line from the bottom here is the TF Lite Converter. And all you have to do is say here's the SavedModel directory. Run the TF Lite Converter from the SavedModel in that directory, and it will generate a .tflite file for me. And that .tflite file is what I can then use on mobile.

So let's take a look at that in action, if we can switch back to the laptop. So all I'm going to do within the same workbook is run that code that I just saw. And I'm using TensorFlow 1.x in Colab here. And we shall see that it actually has saved out a model for me in this directory. And I need that in the next piece of code, because I have to tell it the directory that it got saved to. So I'll just paste that in, and then I'll run this. And we can see the TF Lite Converter is what will do the conversion for us. So if I run that, it gives me the number 612. Can anybody guess why it gives me the number 612? It's not an HTTP code. I thought it was that, at first, too. That's actually just the size of the model. So the model that I just trained off those six pieces of data, when that got compiled down, it's a 612-byte model. So if I go in there, you can see I saved it in /tmp/model.tflite. And in my Colab, if I go and look at the /tmp directory, we'll see model.tflite is there. And I could download that then to start using it in my mobile apps if I like. Can we switch back to the slides, please?

So now, we have the model. We've trained the model. Obviously, the models you're going to train are hopefully a little bit more complicated than the one that I did. You've been able to convert that model to TF Lite. And now, what can you do with that model, particularly on mobile? Well, there are three sets of options that I want to cover. The first one, if you were at the developer keynote, you probably saw ML Kit. And ML Kit is super cool.
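Before moving on to ML Kit, here is a rough sketch of the export-and-convert step demoed above, using the TensorFlow 2.x APIs (the notebook behind the QR code also shows the TensorFlow 1.x variant). The file paths are illustrative, and the quantization line is an optional extra rather than something shown in the talk.

```python
import tensorflow as tf

# 'model' is the trained Keras model from the previous step.
export_dir = '/tmp/saved_model'
tf.saved_model.save(model, export_dir)

# Convert the SavedModel into a TensorFlow Lite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
# Optional: ask the converter to shrink/optimize further for small,
# battery-powered devices (not shown in the talk's demo).
# converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# write() returns the number of bytes written, which is why a tiny model
# like this prints a number such as the 612 seen in the demo.
with open('/tmp/model.tflite', 'wb') as f:
    print(f.write(tflite_model))
```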
For me, in particular, it uses the Firebase programming API. Any Firebase fans here? Yeah, woo! The Firebase API for programming, I find particularly cool. It's got a really nice asynchronous API. And when you think about it, when I'm using a model, I'm going to be passing data to the model. The model is going to run some inference, and it's going to send something back to me. So it's perfect for Firebase, and it's perfect for that asynchronous API that Firebase gives us. And remember, Firebase ships with a bunch of models that work out of the box for vision detection and some of the AutoML stuff that we saw today, but you can also ship your custom TF Lite model into Firebase if you want. But if you don't want to use Firebase, or maybe your model is going to be deployed in a country where Firebase isn't supported, or you want it to work completely offline, and things like that, then the idea is you can still deploy a model directly to your app. And I'm going to show TensorFlow Lite for that: getting low level and using TensorFlow Lite directly instead of going through the ML Kit wrapper. And then finally, there's the mobile browser. So you can actually deploy a model. You can convert it to JSON, and you can deploy it to run, actually, in a mobile browser, which I find pretty cool.

So first, let's take a look at ML Kit. So ML Kit is Google's solution for Firebase developers and for mobile developers who want to have machine learning models running in their applications. Any ML Kit users here, out of interest? Oh, not many. Wow. Well, you're in for a treat if you haven't used it yet. Go check out the Firebase booth, the Firebase sandbox. They've got some really cool stuff that you can play with. But just to show how it works, the idea is that in the Firebase console, you can either pick one of the preexisting models that Firebase gives you, or you can upload the model that you just created. So in Firebase, you've got the option to say here's a custom model I've uploaded-- here, you can see one that I did a couple of weeks ago, this model that I uploaded. It's now in the Firebase console, and I can use it within my Firebase app. And I can use it alongside a lot of the other Firebase goodies like analytics. Or a really cool one is A/B testing. So I could have two versions of my model, and I could A/B test to see which works best. Those kinds of services are available to Firebase developers, and when integrated with machine learning, I find that makes it pretty cool.

And then once I've done that, now, when I start building my application, I get all of the goodness of the Firebase programming API. So if this is on Android, the idea is that with TensorFlow Lite, there's a TensorFlow Lite runtime object that we'll often call the interpreter. And here, you can see, I'm just calling interpreter.run. I'm passing it my inputs. So in this case, if it's a centigrade-to-Fahrenheit conversion, I'm just going to pass it a float. And then in its onSuccessListener, it's going to give me a callback when the model has finished executing. So it's really nice in the sense that it can be very asynchronous. If you have a really big model that might take a long time to run, instead of locking up your UI thread, it's going to work nicely and asynchronously through ML Kit. So with addOnSuccessListener, I'm adding a SuccessListener. It's going to give me a callback with the results. And then that result, I can parse to get my output from the machine-learned model.
And it's really as simple as that. And in this case, I'm passing it in a float. It's converting the temperature. It's sending a float back to me. And that's why my getOutput array is a float array with a single element in it. One thing you'll encounter a lot, if you haven't built machine learning models before, is that when you're passing data in, you pass data in as tensors. But when you're mapping those tensors to a high-level programming language, like Java or Kotlin, you tend to use arrays. And when it's passing stuff back to you, it's passing back a tensor. And again, those tend to map to arrays, and that's why in the code here, you're seeing arrays.

So iOS. Any iOS fans here? Oh, a few. Hey, nobody booed. You said they would boo. [CHUCKLING] So it also works in iOS. So for example, again, I have my interpreter in iOS. This is Swift code. I'll call the .run method on my interpreter. I'll pass it the inputs, and I will get the outputs back. And again, in this very simple model, I'm just getting a single value back. So it's just my outputs at index 0 that I'm going to read. If you're doing something more complex, your data-in and data-out structures are going to be a bit more complex than this. But as Daniel mentioned earlier on, we have a bunch of sample applications that you can dissect to take a look at how they actually do it.

So that's ML Kit. And that's a rough look at how it can work with the custom models that you build and convert to run in TensorFlow Lite. But let's take a look at the TensorFlow Lite runtime itself. So now, say I'm building an Android application, and I've built my model, and I don't want to depend on an external service like Firebase to deploy the model for me. I want to bundle the model with my app, so that however the user gets the app, via the Play Store or via other means, the model is a part of it. It's very easy for me to do that. So that .tflite file that I created earlier on, all I have to do is put that in my assets folder in Android as an asset, just like any other-- like any image, or any JPEG, or any of those kinds of things. It's just an asset.

But the one thing that's really important, and it's the number one bug that most people will hit when they first start doing this, is that when Android deploys your app to the device to run it, it will zip it up. It will compress everything in the assets folder. The model will not work if it is compressed. It has to be uncompressed. So in your build.gradle, you just specify aaptOptions. You say noCompress "tflite", and then it won't compress the tflite file for you. And then you'll be able to run it and do inference. So many times, I've worked with people building their first TF Lite application, it failed loading the model into the interpreter, and they had no idea why. And it's because they've forgotten to put this line in. So if you only take one thing away from this talk, take this slide away, because it will solve a lot of your problems when you get started with TF Lite. Then of course, still in build.gradle, all you have to do is, in your dependencies, add an implementation of the TensorFlow Lite runtime. And what that's going to do is give you the latest version of TensorFlow Lite, and that will give you the interpreter that you can use. And this QR code is a link.
I've put the full app that I'm going to show in a moment on GitHub, so you can go and have a play and hack around with it if you like. So now, if I want to actually do inference-- so this is Kotlin code. And there are a few things to take a look at here on how you'll do inference and how you'll actually be able to get your model up and running to begin with.

So first of all, there are two things that I'm declaring here. Remember earlier, we tend to use the term interpreter for the TF Lite runtime. So I'm creating a TF Lite object, which I'm going to call interpreter. Sorry, I'm going to create an Interpreter object, which I'm going to call tflite. And then I'm going to create a MappedByteBuffer object, which is the TF Lite model. Now earlier, remember, I said you put the TF Lite model into your assets folder. How you read it out of the assets folder is as a MappedByteBuffer. I'm not going to show the code for that in the slides, but it's available in the download, if you want to have a look at it for yourself. And then you're also going to need a TF Lite Options object. And that Options object is used to set things like the number of threads that you want to execute on.

So now, to instantiate your model so that you can start using it, it's as easy as this. So first of all, I'm going to call a loadModelFile function. That loadModelFile function is what reads the TF Lite model out of the assets folder as a MappedByteBuffer. And it gives me my MappedByteBuffer called tflitemodel. In my options, I'm going to say, for example, I just want this to run on one thread. And then when I instantiate my interpreter like this, by giving it that MappedByteBuffer of the model and giving it the options, I now have an interpreter that I can run inference on in Android itself.

And what does the inference look like? It will look something like this. So remember earlier, when I mentioned a neural network takes in a number of inputs as tensors, and it gives you a number of outputs as tensors. Those tensors, in a higher-level language like Kotlin or Java or Swift, will map to arrays. So even though I'm feeding in a single float, I have to feed that in as an array. So that's why here, my input value is a float array with a single value in it. So if I want to convert 100, for example, that's going to be a float array with a single value containing 100. And that F is for float, not for Fahrenheit. When I was rehearsing these slides before, somebody was like, oh, how'd you put Fahrenheit into code like that? But it's a float. It's not Fahrenheit.

And then when I'm reading, we have to get down a little low level here, because the model's going to send me back a stream of bytes. I know that those bytes map to a float, but Kotlin doesn't. Java doesn't. So that stream of bytes, I know it maps to a float. And a float has 4 bytes, so I'm going to create a byte buffer. I'm going to allocate 4 bytes to that byte buffer, and I'm just going to set its order to be native order, because there are different orders, like big-endian and little-endian. But when you're using TF Lite, always just use native order. And then to do my inference, I call tflite.run. I give it my input value. I give it my output value. It'll read from the input value. It'll write to the output value. And then on the output value, if I want to get my prediction, it's written those 4 bytes. I have to rewind them, and then I'm going to read a float. And what Kotlin will do is say, OK, I'm taking those 4 bytes out of that buffer.
And I'm going to give you back a float from them. So that's how I would do an inference. It seems very complex for a very simple task, like float in, float out, but the structure is the same regardless of how complex your input is and how complex your output is. So while this might seem to be the 20-pound hammer for the one-pound nail, it's also the same hammer when you have a 20-pound nail.

So that was Android; iOS is very similar. So in iOS, all I have to do is put my model in my application. So I just put my TF Lite model in there. It's an asset like any other. And then in code, first of all, in my Podfile, I'll have a pod for TensorFlow Lite. I've spoken at I/O for the last five years, and this is my first time ever showing C++ code. I'm kind of geeking out a little bit. Right now, it supports Objective-C++. We do have a Swift wrapper in some of our sample applications, but the Swift wrapper, right now, only works in a few scenarios. We're working on generalizing that. So for now, I'm just going to show C++ code. Any C++ fans here? Oh, wow. More than I thought. Nice. So then in your C++ code, it's exactly the same pattern as I was just showing. So first of all, I'm going to create an interpreter, and I'm going to call that interpreter. And now, I'm going to create two buffers. So these are buffers of unsigned ints. One buffer is my input buffer that I call ibuffer. The other buffer is my output buffer that I call obuffer. And for both of these, I'm just telling the interpreter, hey, use a typed_tensor for these. So that's my input. That's my output. And when I run the interpreter, it's going to read from the input and write to the output. Now, I have an output buffer, and I can just get my inference back from that output buffer.

So that was a quick tour of TF Lite: how you can build your model, save it as a TF Lite model, and-- I forgot to show-- oh, no. I did show, sorry, where you can actually download it as a TF Lite file. But I can demo it now running on Android. So if we can switch back to the laptop? So I'm going to go to Android Studio. And I've tried to make the font big enough so we can all see it. And let me just scroll that down a little bit. So this was the Kotlin code that I showed a moment ago, and the simplest possible application that I could build is this one. So it's a simple Android application. It's got one button on it. That button says Do Inference. When you push that, it's hard-coded: it'll pass 100 to the model and get back the response from the model. I have it in debug mode. I have some breakpoints set. So let's take a look at what happens.

So once I click Do Inference, I'm hitting this breakpoint now in Android Studio. And I've set up my input, inputVal, and we can see my inputVal contains just 100. And if I step over, my outputVal has been set up. It's a direct byte buffer. Right now, its position is 0. Its capacity is 4. I'm going to set its order, and then I'm going to pass the inputVal containing 100 and the outputVal, which is my empty 4-byte buffer, to tflite.run. Execute that, and the TF Lite interpreter has done its job, and it's written back to my outputVal. But I can't read that yet. Remember, earlier the position was 0. The limit was 4. The capacity was 4. It's written to it now, so that buffer is full. So when I rewind, now we can see my position has gone back to 0. So I know I can start reading from that buffer. So I'm going to say outputVal.getFloat, and we'll see the prediction that comes back is 211.31.
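The Kotlin, Swift, and C++ snippets described above all follow the same pattern: hand the interpreter an input buffer and an output buffer, then run it. If you want to sanity-check the converted .tflite file on a desktop before bundling it into an app, the Python tf.lite.Interpreter follows the same shape. A minimal sketch, assuming the /tmp/model.tflite produced earlier:

```python
import numpy as np
import tensorflow as tf

# Load the converted model and allocate its input/output tensors.
interpreter = tf.lite.Interpreter(model_path='/tmp/model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A single float still goes in as a (1, 1) tensor, just like the
# single-element arrays in the Kotlin and Swift examples.
interpreter.set_tensor(input_details[0]['index'],
                       np.array([[100.0]], dtype=np.float32))
interpreter.invoke()

# Should print roughly 211.x, matching the on-device demo.
print(interpreter.get_tensor(output_details[0]['index']))
```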
So that model has been wrapped by the TF Lite runtime. I've given it the input buffer. I've given it the output buffer. I've executed it, and it's given me back that result. And actually, there's one really cool Kotlin language feature that I want to demonstrate here. I don't know if anybody has seen this before. This might be the first time we've actually ever shown this on stage. But if I want to run this model again, you'll notice that there are line numbers here. All I have to do is type goto 50. I'm seeing who's still awake. Of course, it's gosub 50. So that's just a very quick and very simple example of how this would run in Android. And again, that sample is online. It's on GitHub. I've put it on GitHub so that you can have a play with it. All right, if we can switch back to the slides?

So the third of the options that I had mentioned-- the first was ML Kit, the second was TF Lite directly-- the third was to use JavaScript and run your model in a browser. So TensorFlow.js is your friend. So the idea is that with TensorFlow.js, in your Python, when you're building the model, you've pip installed a library called tensorflowjs. And that gives you a command called tensorflowjs_converter. With that converter, if you'd saved your model as a SavedModel, as we showed earlier on, you just say, hey, my input format's a SavedModel. Here's the directory the SavedModel is in. Here's the directory I want you to write it to. So once it's done that, it's actually going to take that SavedModel and convert it into a JSON object. So now, in a super, super simple web page-- and this QR code, again, has that web page-- all I have to do is say, here's the URL of that model.json. And I will say const model = await tf.loadLayersModel, giving it that URL. So if you're using TensorFlow.js in your browser with that script tag right at the top of the page, now that model is loaded from a JSON serialization. And I can start running inference on that model in the browser.

So here's how I would use it. Again, I'm setting up my inputs. And in JavaScript-- you know earlier, I was saying you pass in tensors and you get out tensors, and a high-level language tends to wrap them in arrays? TensorFlow.js actually gives you a tensor2d object, and that's what I'm using here. So the tensor2d object takes two parameters. The first parameter is the array that you want to pass in, and you can see here that array is just the value 10. It's a single-item array. And then the second parameter is the shape of that array. So here, the first parameter is the 10. The second parameter is 1, 1. And that's the shape of that array. It's just a 1 by 1 array. So once I've done that, and I have my input, now, if I want to run inference using the model, all I have to do is say model.predict(input). And it will give me back my results. In this case, I was alerting the results. But in my demo, I'm actually going to write it out.

So if we can switch back to the demo box? And I have that super simple web page hosted on the web. And I've put that model in there, and it's going to run. This is a slightly different model. I'll show training that model in a moment. This was just the model where y equals 2x minus 1. So I'm doing an inference where x equals 10. And if x equals 10, y equals 2x minus 1 will give you 19. And when I train the model on six items of data, it says 18.97.
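For reference, the conversion step described above can be run either from the command line or from Python. A minimal sketch, assuming the tensorflowjs pip package is installed; converting directly from the in-memory Keras model is an alternative route to the SavedModel-based command used in the talk, and the output path is illustrative:

```python
# pip install tensorflowjs
import tensorflowjs as tfjs

# 'model' is the trained Keras model from earlier.
# This writes model.json plus the binary weight shard(s) that go with it;
# both must end up in the same directory on your web server.
tfjs.converters.save_keras_model(model, '/tmp/linear')

# The SavedModel route shown in the talk uses the command-line tool,
# roughly: tensorflowjs_converter --input_format=tf_saved_model \
#              /tmp/saved_model /tmp/linear
```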
So again, all I do is, in Python, I can train the model. With the tensorflowjs converter, I can then convert that model to a JSON object. And then in TensorFlow.js, I can instantiate a model off of that JSON and start doing predictions with that model. If we can switch back to the demo machine for a moment? Oh, no. I'm still on the demo machine, aren't I? I can show that in action in a notebook. I lost my notebook. So this notebook is also available, where I gave those QR codes. And this notebook, again, is very similar to the one I showed earlier on: a super, super simple neural network, a single layer with a single neuron. I'm not going to step through all of it now. But the idea is, if you pip install tensorflowjs right now in Google Colab, it will upgrade Google Colab from TensorFlow 1.13 to TensorFlow 2. So if you run through this and install that, you'll see that happening. And then once you have TensorFlow 2 on your machine, you can use the tensorflowjs converter, as shown here, giving it the input format and giving it the SavedModel from the directory as I'd done earlier on. And it will write out to /tmp/linear.

The one thing to take note of, though, if you are doing this yourself, is that when it writes to that directory, it won't just write the JSON file. It also writes a binary file. So when you upload the JSON file to the web server, to be able to create a model off of that JSON file, make sure the binary file is in the same directory as the JSON file. Otherwise the model you create off of that JSON file is going to give you some really weird results. That's also the number one bug that I've found when people have been using TensorFlow.js. They will convert to the JSON file. They'll upload it to their server. They'll have no idea what that random binary file was, and they're getting indeterminate results back from the model. So make sure when you do that, you take both files. I don't know if I have one here that I prepared earlier that I can show you what it looks like. I don't. It's empty, right now. But when you run this and you write it out, you'll see that the model.json and a binary file are there. Make sure you upload both of them to use it. Can we switch back to the slides, please?

So that was a quick summary, where we saw that the model you build using Python can be saved as a SavedModel, converted to TensorFlow Lite, and then used in ML Kit or directly in TensorFlow Lite itself. Alternatively, you can convert it to a JSON file with TensorFlow.js and then use that in JavaScript. So that's the summary of being able to use those models on mobile devices. But now, Daniel is going to tell us all about going beyond phones and the web. So thank you, Daniel. Thank you.

DANIEL SITUNAYAKE: Thank you, Laurence. [APPLAUSE] Awesome. So like Laurence said, so far we've talked about phones. But these aren't the only devices that we use every day. So I'm going to talk about some new tools that our team has designed to help developers use machine learning everywhere. So our homes and cities are filled with devices that contain embedded computing power. And in fact, every year, literally billions of devices are manufactured that contain small but highly capable computation devices called microcontrollers. So microcontrollers are at the core of most of our digital gadgets, everything from the buttons on your microwave through to the electronics controlling your car.
And our team started to ask, what if developers could deploy machine learning to all of these objects? So at the TensorFlow Dev Summit, we announced an experimental interpreter that will run TensorFlow models on microcontrollers. So this is actually a new frontier for AI. We have super cheap hardware with super long battery life and no need for an internet connection, because we're doing offline inference. So this enables some incredible potential applications, where AI can become truly personal while still preserving privacy. We want to make it ridiculously easy for developers to build these new types of products, so we've actually worked with SparkFun to design a microcontroller development board that you can buy today. It's called the SparkFun Edge, and it's powered by an ultra-efficient ARM processor and packed with sensors and I/O ports. So you can use it to prototype embedded machine learning code. And we have example code available that shows how you can run speech recognition with a model that takes up less than 20 kilobytes of memory, which is crazy. So I'm now going to give you a quick demo of the device, and I'll show you what some of this code looks like for running inference. And you should remember, before we do this, that all of this is available on our website, along with documentation and tutorials. And the really cool thing is, while you're here at I/O, you should head over to the Codelabs area, and you can try hands-on development with the SparkFun Edge boards. So let's switch over to the camera, here.

LAURENCE MORONEY: I think that actual image was bigger than 20 kilobytes.

DANIEL SITUNAYAKE: Yeah, definitely. It's kind of mind-blowing that you can fit a speech model into such a small amount of memory. So this is the device itself. So it's just a little dev board. I'm going to slide the battery in. So the program we have here, basically, is just running inference. And every second of audio that comes in, it's run through a little model that looks for a couple of hot words. You can see this light is flashing. It flashes once every time inference is run. So we're getting a pretty decent frame rate, even though it's a tiny, low-powered microcontroller with a coin cell battery. So what I'm going to do now is take my life in my hands and try to get it to trigger with the hot words. And hopefully, you'll see some lights flash. Yes, yes, yes. First time, not lucky. Yes, yes, yes. Yes, yes, yes. So it's not working so great with the AC going, but you saw the lights lighting up there. And I've basically got a really simple program that looks at the confidence score that we get from the model that the word yes was detected. And the higher the confidence, the more lights appear. So we got three lights there. So it's pretty good.

Let's have a look at the code. So if we can go back to the slides? So all we do, basically, to make this work is we've got our model, which is just a plain old TensorFlow Lite model that you trained however you wanted to with the rest of our TensorFlow tool chain. And we have this model available as an array of bytes within our app. We're going to pull in some objects that we're going to use to run the interpreter. So first of all, we create a resolver, which is able to pull in the TensorFlow ops that we need to run the model. We then allocate some working memory that is going to be used as we input data and run the operations.
And then we build an interpreter object, which we pass all this stuff into, and that is actually going to execute the model for us. So the next thing we do is basically generate some features that we're going to pass into the model. So we have some code, not pictured here, which takes audio from the microphones that are on the board and transforms it into a spectrogram that we then feed into the model. Once we have done that, we invoke the model, and we get an output. So the output is just another tensor, and we can look through that tensor to find which of our classes was matched. And hopefully, in this case, it was the yes that showed up with the highest probability. So all of this code is available online. We have documentation that walks you through it. And like I said, the device is available here at I/O, in the Codelabs area, if you'd like to try it yourself.

So tiny computers are great, but sometimes, you just need more power. So imagine you have a manufacturing plant that is using computer vision to spot faulty parts on a fast-moving production line. So we recently announced the Coral platform, which provides hardware for accelerated inference at the edge. So these are still small devices, but they use something called the Edge TPU to run machine learning models incredibly fast. So one of our development boards here can run image classification on several simultaneous video streams at 60 frames per second. So it's super awesome. We have these devices available to check out in the Codelabs area, as well. And in addition, in the ML and AI sandbox, there's a demo showing a use case: spotting faulty parts in manufacturing. So once again, it's super easy to run TensorFlow Lite models on Coral devices. And this example shows how you can load a model, grab camera input, run inference, and annotate an output image in just a few lines of code. So all of this, again, is available online on the Coral site.

So we've shown a ton of exciting stuff today, and all of it is available on TensorFlow.org and the Coral site, right now. So you'll be able to find example code, example apps, pretrained models, and everything you need to get started with deploying to device. And I've got some links up here for you. But while you're here at I/O, there are a ton of other opportunities to play with on-device ML. So we have a couple of sessions that I'd like to call out here. We have the TensorFlow Lite official talk tomorrow, which is going to go into a lot more depth around TensorFlow Lite and the tools we have available for on-device inference and converting models. And we also have a talk on What's New in Android ML, which is this evening at 6:00 PM. So you should definitely check both of those out. And in the Codelabs area, we have a load of content. So if you're just learning TensorFlow, we have a six-part course you can take to basically go end to end, from nothing to knowing what you're talking about. And then we have a couple of codelabs you can use for on-device ML, and I think there's a Coral codelab, as well. So thank you so much for showing up. And I hope this has been exciting, and you've got a glimpse of how you can do on-device ML. Like you saw in the keynote, there are some amazing applications, and it's up to you to build this amazing new future. So thank you so much for being here. [MUSIC PLAYING]