KONSTANTINOS KATSIAPIS: Hello, everyone. Good morning. I'm Gus Katsiapis. And I'm a principal engineer in TFX. ANUSHA RAMESH: Hi, everyone. I'm Anusha. I'm a product manager in TFX. KONSTANTINOS KATSIAPIS: Today, we'll talk to you about our end-to-end ML platform, TensorFlow Extended, otherwise known as TFX, on behalf of the TFX team. So the discipline of software engineering has evolved over the last five-plus decades to a good level of maturity. If you think about it, this is both a blessing and a necessity because our lives usually depend on it. At the same time, the popularity of ML has been increasing rapidly over the last two-plus decades. And over the last decade or so, it's been used very actively, both in experimentation and production settings. It is no longer uncommon for ML to power widely-used applications that we use every day. So much like it was the case for software engineering, the wide use of ML technology necessitates the evolution of the discipline from ML coding to ML engineering. As most of you know, to do ML in production, you need a lot more than just a trainer. For example, the trainer code in an ML production system is usually 5% to 10% of the entirety of the code. And similarly, the amount of time that engineers spend on the trainer is often dwarfed by the amount of time engineers spend in preparing the data, ensuring it's of good quality, ensuring it's unbiased, et cetera. At the same time, research eventually makes its way into production. And ideally, one wouldn't need to change stacks in order to evolve an idea and put it into a product. So I think what is needed here is flexibility, and robustness, and a consistent system that allows you to apply ML in a product. And remember that the ML code itself is a tiny piece of the entire puzzle. ANUSHA RAMESH: Now, here is a concrete example of the difference between ML coding and ML engineering. As you can see in this use case, it took about three weeks to build a model. It's been about a year, and it's still not deployed in production. Similar stories used to be common at Google as well, but we made things noticeably easier over the past decade by building ML platforms like TFX. Now, ML platforms at Google are not a new thing. We've been building Google-scale machine learning platforms for quite a while now. Sibyl existed as a precursor to TFX. It started about 12 years ago. A lot of the design, code, and best practices that we gained through Sibyl have been incorporated into the design of TFX. Now, while TFX shares several core principles with Sibyl, it also augments it along several important dimensions. This made TFX the most widely used end-to-end ML platform at Alphabet, while being available on premises and on GCP. The vision of TFX is to provide an end-to-end ML platform for everyone. By providing this ML platform, our goal is to ensure that we can proliferate the use of ML engineering, thus improving ML-powered applications. But let's discuss what it means to be an ML platform and what the various parts are that are required to help us realize this vision. KONSTANTINOS KATSIAPIS: So today, we're going to tell you a little bit more about how we enabled global-scale ML engineering at Google, from best practices and libraries all the way to a full-fledged end-to-end ML platform. So let's start from the beginning. Machine learning is hard. Doing it well is harder. And doing it in production and powering applications is actually even harder.
We want to help others avoid the many, many pitfalls that we have encountered in the past. And to that end, we actually publish papers, blog posts, and other material that capture a lot of our learnings and our best practices. So here are but a few examples of our publications. They capture collective lessons learned over more than a decade of applied ML at Google. And several of them, like the "Rules of Machine Learning," are quite comprehensive. We won't have time to go into them today as part of this talk obviously, but we encourage you to take a look when you get a chance. ANUSHA RAMESH: While best practices are great, communication of best practices alone would not be sufficient. This does not scale because it does not get applied in code. So we want to capture our learnings and best practices in code. We want to enable our users to reuse these best practices and at the same time, give them the ability to pick and choose. To that end, we offer standard and data-parallel libraries. Now, here are a few examples of libraries that we offer for different phases of machine learning to our developers. As you can see, we offer libraries for almost every step of your ML workflow, starting from data validation to feature transformations to analyzing the quality of a model, all the way to serving it in production. We also make transfer learning easy by providing TensorFlow Hub. ML Metadata is a library for recording and retrieving metadata for ML workflows. Now, the best part about these libraries is that they are highly modular, which makes it easy to plug them into your existing ML infrastructure. KONSTANTINOS KATSIAPIS: We have found that libraries are not enough within Alphabet, and we expect the same elsewhere. Not all users need or want the full flexibility. Some of them might actually be confused by it. And many users prefer out-of-the-box solutions. So what we do is manage the release of our libraries. We ensure they're nicely packaged and optimized, but importantly, we also offer higher-level APIs. And those frequently come in the form of binaries or containers. ANUSHA RAMESH: Libraries and binaries provide a lot of flexibility to our users, but this is not sufficient for ML workflows. ML workflows typically involve inspecting and manipulating several types of artifacts. So we provide components which interact with well-defined and strongly-typed artifact APIs. The components also understand the context and environment in which they operate and can be interconnected with one another. We also provide UI components for visualization of these artifacts. That brings us to a new functionality we're launching at TensorFlow World. You can run any TFX component in a notebook. As you can see here, you can run TFX components cell by cell. This example showcases a couple of components. The first one is ExampleGen. ExampleGen ingests data into a TFX pipeline. And this is typically the first component that you use. The second one is StatisticsGen, which computes statistics for visualization and example validation. So when you run a component like StatisticsGen in a notebook, you can visualize something like this, which showcases statistics on your data and helps you detect anomalies. The benefit of running TFX components in a notebook is twofold. First, it makes it easy for users to onboard onto TFX. It helps you understand the various components of TFX, how you connect them, and the order in which to run them.
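As a minimal sketch of what running these components cell by cell looks like, assuming the `tfx` package is installed (exact import paths and constructor arguments have shifted between TFX releases, and the CSV path here is a placeholder):

```python
# Run TFX components interactively in a notebook (a sketch; API details
# vary across TFX releases).
from tfx.components import CsvExampleGen, StatisticsGen
from tfx.orchestration.experimental.interactive.interactive_context import (
    InteractiveContext,
)

context = InteractiveContext()  # keeps artifacts and metadata in a temp dir

# ExampleGen ingests data into the pipeline; typically the first component.
example_gen = CsvExampleGen(input_base="/path/to/csv_data")  # placeholder path
context.run(example_gen)

# StatisticsGen computes statistics for visualization and example validation.
statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])
context.run(statistics_gen)

# Render the statistics inline in the notebook.
context.show(statistics_gen.outputs["statistics"])
```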
It also helps with debugging the various steps of your ML workflow as you go through the notebook. KONSTANTINOS KATSIAPIS: Through our experience though, we've learned that components aren't actually sufficient for production ML. Manually orchestrating components can become cumbersome and, importantly, error-prone. And then also understanding the lineage of all the artifacts that are produced or consumed by those components is often fundamental, both from a debugging perspective, but many times from a compliance perspective as well. As such, we offer ways of creating task-driven pipelines of components. We allow you to stitch components together in a task-driven fashion. But we have also found that data scale and advanced use cases also necessitate that this pipeline actually be reactive to the environment, right? So we found that, over time, we need something more like data-driven pipelines. Now, the interesting part here is that the components we offer are the same components that can operate both in a task-driven mode and in a data-driven mode, thereby enabling more flexibility. And the most important part is that the artifact lineage is tracked throughout this ML pipeline, whether it's task- or data-driven, which helps experimentation, debugging, and compliance. So here's putting it all together. Here is kind of a canonical production end-to-end ML pipeline. It starts with example generation and statistics generation to ensure the data is of good quality, proceeds with transformations to augment the data in ways that make it easier to fit the model, and then trains the model. After we train the model, we ensure that it's of good quality. And only after we're sure it meets the quality bar that we're comfortable with do we actually push to one of the serving systems of choice, whether that's a server or a mobile application via TF Lite. Note that the pipeline topology here is fully customizable, right? So you can actually move things around as you please. And importantly, if one of the out-of-the-box components we offer doesn't work for you, you can create a custom component with custom business logic. And all of this is under a single ML pipeline. Now, what does it mean to be an end-to-end ML platform? So I think there are some key properties to it. And one is seamless integration. We want to make sure that all the components within the pipeline actually seamlessly interoperate with each other. And we have actually found that within Google, the value added for our users gets larger as they move higher up the stack-- you know, as they move from the libraries up to components and further up into the pipeline itself. This is because operating at a higher level of abstraction allows us to give better robustness and supportability. Another important aspect of an ML platform is its interoperability with the environment it operates in. So each of these deployments might be in a different environment-- you know, some on premises, some on GCP, et cetera. And we need to make sure that we interact with the ecosystem that you operate in. So TFX actually works with other fundamental parts of the ML ecosystem, like Kubeflow Pipelines, Apache Beam, Apache Spark, Flink, Airflow, et cetera. This interoperability also gives us something else that's very important here-- the flexibility, right?
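The canonical pipeline just described might be wired up roughly like the sketch below. This is not a complete, runnable recipe: constructor arguments differ across TFX versions, and the module files, paths, and pipeline name are placeholders.

```python
# Sketch of a canonical TFX pipeline (hedged: argument names vary by version).
from tfx.components import (
    CsvExampleGen, StatisticsGen, SchemaGen, Transform,
    Trainer, Evaluator, Pusher,
)
from tfx.orchestration import pipeline
from tfx.proto import pusher_pb2, trainer_pb2

example_gen = CsvExampleGen(input_base="/path/to/data")
statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])
schema_gen = SchemaGen(statistics=statistics_gen.outputs["statistics"])
transform = Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file="preprocessing.py",          # user-defined feature engineering
)
trainer = Trainer(
    module_file="trainer.py",                # user-defined model code
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
    schema=schema_gen.outputs["schema"],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=1000),
)
evaluator = Evaluator(
    examples=example_gen.outputs["examples"],
    model=trainer.outputs["model"],
)
pusher = Pusher(
    model=trainer.outputs["model"],
    model_blessing=evaluator.outputs["blessing"],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory="/serving/models/my_model")),
)

my_pipeline = pipeline.Pipeline(
    pipeline_name="canonical_pipeline",
    pipeline_root="/path/to/pipeline_root",
    components=[example_gen, statistics_gen, schema_gen, transform,
                trainer, evaluator, pusher],
)
```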
So we allow customization of components, and extension points within the ML platform, so that if something doesn't work out of the box for you, you can customize it to your business needs. TFX is by no means a perfect platform, but we strive to collect feedback and improve it, so please give it to us. ANUSHA RAMESH: Internally, the TFX platform powers several Alphabet companies. Within Google, it powers several of our most important products that you're probably familiar with. TFX also integrates with the Cloud AI Platform, ML Engine, and Dataflow products, thus helping you realize your ML needs robustly on GCP. TFX also powers several of the Cloud AutoML solutions that automate and simplify ML for you, so check them out. To the external world, TFX is available as an end-to-end solution. Our friends at Twitter, who spoke at the keynote yesterday, have already published a fascinating blog post on how they are ranking tweets on their home timeline using TensorFlow. They are using TensorFlow Model Analysis and TensorFlow Hub for sharing word embeddings. They evaluated several other technologies and frameworks and decided to go ahead with the TensorFlow ecosystem for their production requirements. Similar to Twitter, we also have several other partners who are using TFX. I hope you will join us right after this talk to hear from Spotify on how they are using TFX for their production workflow needs. We also have another detailed talk later today called "TFX, Production ML Pipelines with TensorFlow." So we have two great talks-- one by Spotify, the other a detailed talk on TFX. If you're interested in learning more, check out these two talks. Visit our web page tensorflow.org/tfx to get started. Thank you. [APPLAUSE] TONY JEBARA: Very excited to be here. So my name is Tony Jebara. Today, I'm going to be talking to you about Spotify, where I work today, and how we've basically taken personalization and moved it onto TensorFlow. I'm the VP of engineering and also the head of machine learning. And I'm going to describe our experience moving onto TensorFlow and to the Google Cloud Platform and Kubeflow, which has been really an amazing experience for us and really has opened up a whole new world of possibilities. So just a quick note, as Ben was saying, before I started at Spotify, I was at Netflix. And just like today I'm going to talk about Spotify's home page, at Netflix I was working on personalization algorithms and the home screen as well. So you may be thinking, oh, that sounds like a similar job. They both have entertainment, and streaming, and home screens, and personalization, but there are fundamental differences. And I learned about those fundamental differences recently. I joined a couple of months ago, but the biggest fundamental difference to me is a difference in volume and scale. And I'll show you what I mean in just a second. So if you look at movies versus music or TV shows versus podcasts, you'll see that there's a very different magnitude of scale. So on the movie side, there are about 158 million Netflix users. On the music side, there are about 230 million Spotify users. That's already a different scale. Also, the content really is a massively different scale problem. There are only about 5,000 movies and TV shows on the Netflix service. Whereas on Spotify, we've got about 50 million tracks and almost half a million podcasts.
So if you think about the amount of data and content you need to index, that's a huge scale difference. There's also content duration. Once you make a recommendation off the home screen on, let's say, Netflix, the user is going to consume that recommendation for 30 minutes for a TV show, maybe several seasons sometimes, two hours for a movie. It's only 3 and 1/2 minutes of consumption per track, let's say, on Spotify. And they don't replay as often on, let's say, movies, but you'll replay songs very often. So it's really a very different world of speed and scale. And we're getting a lot more granular data about the users. Every 3 and 1/2 minutes, they're changing tracks, listening to something else, engaging differently with the service, and they're touching 50 million-plus pieces of content. That's really very granular data. And that's one of the reasons why we had to move to something like TensorFlow to really be able to scale and do something that's high speed and, in fact, real time. So this is our Spotify home. How many people here use Spotify? All right, so about half of you. I'm not trying to sell Spotify to anyone. I'm just trying to say that many of you are familiar with this screen. This is the home page. So this is basically driven by machine learning. And every month, hundreds of millions of users will see this home screen. And every day, tens of millions of users will see this home screen. And this is where you get to explore what we have to offer. It's a two-dimensional grid. Every image here is what we call a card. And the cards are organized into rows we call shelves. And what we like to do is move these cards and shelves around from a massive library of possible choices and place the best ones for you at the top of your screen. And so when we open up Spotify, we have a user profile. The home algorithms will score all possible cards and all possible shelves and pack your screen with the best possible combination of cards and shelves for you. And we're doing this in real time based off of your choices of music, your willingness to accept the recommendation, how long you play different tracks, how long you listen to different podcasts. And we have dozens and dozens of features that are updating in real time. And every time you go back to the home page, it'll be refreshed with the ideal cards and shelves for you. And so we like to say there isn't a Spotify home page or a Spotify experience. Really, there are 230 million Spotifys-- one for each user. So how do we do this and how did we do this in the past? Well, up until our migration to GCP, TensorFlow, and Kubeflow, we wrote a lot of custom libraries and APIs in order to drive the machine learning algorithms behind this personalization effort. So the specific machine learning algorithm is a multi-armed bandit. Many of you have heard about that. It's trying to balance exploration and exploitation, trying to learn which cards and shelves are good for you and score them, but also trying out some new cards and shelves that it might not know are hidden gems for you or not. And we have to employ counterfactual training, and log propensities, and log some small amount of randomization in order to train these systems and avoid large-scale A/B tests and large-scale randomization. Before we moved to TensorFlow, this was all done in custom, let's say, APIs and data libraries. And that had a lot of challenges. So we'd always have to go back and rewrite code.
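As a toy illustration of the bandit idea described here (emphatically not Spotify's production system), the sketch below scores candidate cards with epsilon-greedy exploration and logs the propensity of each choice, which is what later counterfactual, off-policy analysis relies on. All names and numbers are made up.

```python
# Toy epsilon-greedy bandit over "cards", logging propensities for
# counterfactual (off-policy) evaluation. Purely illustrative.
import random

def choose_card(scores, epsilon=0.05):
    """scores: dict card_id -> model score. Returns (card_id, propensity)."""
    n = len(scores)
    best = max(scores, key=scores.get)
    if random.random() < epsilon:
        card = random.choice(list(scores))   # explore a random card
    else:
        card = best                          # exploit the top-scored card
    # Probability this policy picks `card`; logged with the impression so a
    # different ranking model can later be evaluated via importance weighting.
    propensity = (1 - epsilon) * (card == best) + epsilon / n
    return card, propensity

impression_log = []
scores = {"card_a": 0.7, "card_b": 0.4, "card_c": 0.2}  # made-up scores
card, prop = choose_card(scores)
impression_log.append({"card": card, "propensity": prop})
```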
And if we wanted to compare different choices of the model underneath the multi-armed bandit, like logistic regression versus trees versus deep neural nets, that involved tons of custom code rewriting. And so that would make the system really brittle, hard to innovate and iterate on. And then when you finally pick something you want to roll out, when you roll it out, you're also worried that it may fail because of all this custom stitching. So then we moved over to the TensorFlow ecosystem. And we said, hey, let's move on to techniques like TensorFlow Estimators and TensorFlow Data Validation to avoid having to do all this custom work. And so with TensorFlow Estimators, what we can do is now build machine learning pipelines where we get to try a variety of models and train and evaluate them very quickly-- things like logistic regression, boosted trees, and deep models-- in a much faster kind of iterative process. And then also migrating to Kubeflow was super valuable because that helped us manage the workload and accelerate the pace of experimentation and rollout. And so this has been super fast for automatically retraining, and scaling, and speeding up our machine learning training algorithms. Another thing that we really rely on heavily is TensorFlow Data Validation, which is another part of the TFX offering. One key thing we have to do is find bugs in our data pipelines and our machine learning pipelines while we're developing them, and evaluating them, and rolling them out. For example, we want to catch data issues as quickly as possible. And so one thing we can do with TFDV is quickly find out if there's some missing data or data inconsistencies in our pipelines. And we have this dashboard that quickly plots the distribution of any feature, and the counts of different data sets, and so on, and also kind of more granular things like how much the user is spending on the service, what their preferences are, and so on, looking at those distributions. And we caught a bug like this one on the left, which basically was showing us that in our training data, the Premium tier samples were missing from our training pipelines. And then on the validation side, the free shuffle tier samples were missing from our evaluation pipeline. So this is horrible from a machine learning perspective, but we caught it quickly. We're now able to trigger alarms and alerts, have dashboards, and look at these distributions daily, so the machine learning engineers don't have to worry about the data pipelines into their system. So now we have the Spotify paved path, which is a machine learning infrastructure based off of Google Cloud, Kubeflow, and TensorFlow. And it has achieved significant lifts over baseline systems and popularity-based methods. And now, we're just scratching the surface. We want to do many more sophisticated machine learning types of explorations. And we really view this as an investment. It's an investment in machine learning engineers and their productivity. We don't want machine learning engineers to spend tons of time fixing custom infrastructure, and catching kind of silly bugs, and updating libraries, and having to learn bespoke types of platforms. Instead, we want to have them go onto a great kind of lingua franca platform like GCP, Kubeflow, and TensorFlow and really think about machine learning, and the user experience, and building better entertainment for the world.
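A data check of the kind described here might look roughly like the following TensorFlow Data Validation sketch; the file locations are placeholders, and the exact workflow (schema inference, slicing, skew checks) would be tailored to the actual pipelines.

```python
# Sketch of a TFDV check: compare training vs. evaluation data and surface
# anomalies such as missing slices or skewed distributions. Paths are placeholders.
import tensorflow_data_validation as tfdv

train_stats = tfdv.generate_statistics_from_tfrecord(
    data_location="gs://bucket/train/*.tfrecord")
eval_stats = tfdv.generate_statistics_from_tfrecord(
    data_location="gs://bucket/eval/*.tfrecord")

# Infer a schema from the training data, then look for anomalies in the
# evaluation data (missing features, unexpected values, etc.).
schema = tfdv.infer_schema(train_stats)
anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)

tfdv.display_anomalies(anomalies)           # renders a table in a notebook
tfdv.visualize_statistics(                  # side-by-side distribution plots
    lhs_statistics=train_stats, rhs_statistics=eval_stats,
    lhs_name="TRAIN", rhs_name="EVAL")
```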
And that's what we want to enable, not necessarily building custom, let's say, machine learning infrastructure. And so if you're excited about working on a great platform that's got a great future ahead of it, like TFX, and Google Cloud, and Kubeflow, but also working on really deep problems around entertainment and what makes people excited and engaged with a service, and music, and audio, and podcasts, then you can get the best of both worlds. We're hiring. Please look at these links and come work with us. Thank you so much. [APPLAUSE] MIKE LIANG: Good morning, everyone. My name is Mike. I'm one of the product managers on the TensorFlow team. And today, I'd like to share with you something about TensorFlow Hub. So we've seen some amazing breakthroughs in what machine learning can do over the past few years. And throughout this conference, you've heard a lot about the services and tools that have been built on top of them. Machines are becoming capable of doing a myriad of amazing things, from vision to speech to natural language processing. And with TensorFlow, machine learning experts and data scientists are able to combine data, and algorithms, and computational power together to train machine learning models that are very proficient at a variety of tasks. But if your focus is to solve business problems or build new applications, how can you quickly use machine learning in your solutions? Well, this is where TensorFlow Hub comes in. TensorFlow Hub is a repository of pretrained, ready-to-use models to help you solve novel business problems. It has a comprehensive collection of models from across the TensorFlow ecosystem. And you can find state-of-the-art research models here in TensorFlow Hub. Many of the models here can also be composed into new models and retrained using transfer learning. And recently, we've added a lot of new models that you can deploy straight to production, from the cloud to the edge, through TensorFlow Lite or TensorFlow.js. And we're getting many contributions from the community as well. TensorFlow Hub's rich repository of models covers a wide range of machine learning problems. For example, for image-related tasks, we have a variety of models for object detection, image classification, automatic image augmentation, and some new things like image generation and style transfer. For text-related tasks, we have some of the state-of-the-art models out there, like BERT and ALBERT, and universal sentence encoders. And you heard about some of the things that machines can do with BERT just yesterday. These encoders can support a wide range of natural language understanding tasks, such as question answering, text classification, or sentiment analysis. And there are also video-related models too. So if you want to do gesture recognition, you can use some of the models there, or even do video generation. And we've recently actually just completely upgraded our front-end interface so that it's a lot easier to use. So many of these models can be easily found or searched for on TensorFlow Hub. We've invested a lot of energy in making these models in TensorFlow Hub easily reusable or composable into new models, where you can actually bring your own data and, through transfer learning, improve the power of those models. With one line of code, you can bring these models right into TensorFlow 2. And using the high-level Keras APIs or the low-level APIs, you can actually go and retrain these models.
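For example, the "one line of code" transfer-learning flow might look roughly like this sketch, which wraps a pretrained text embedding from tfhub.dev in a small Keras classifier. The module handle is just one example, and the training data is assumed to be your own.

```python
# Sketch: reuse a pretrained TF Hub text embedding inside a Keras model.
import tensorflow as tf
import tensorflow_hub as hub

embedding = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2",   # example text embedding
    input_shape=[], dtype=tf.string, trainable=True)

model = tf.keras.Sequential([
    embedding,                                        # pretrained embedding layer
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g. positive/negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # retrain on your own data
```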
And all these models can also be deployed straight into machine learning pipelines, like TFX, as you've heard about earlier today. Recently, we've added support for models that are ready to deploy. These pretrained models have been prepared for a wide range of environments across the TensorFlow ecosystem. So if you want to work in a web or a Node-based environment, you can deploy them into TensorFlow.js, or if you are working with mobile [INAUDIBLE] devices, you can employ some of these models through TensorFlow Lite. In TensorFlow Hub, you can also discover ready-to-use models for Coral edge TPU devices. And we recently started adding these. These devices combine TensorFlow Lite models with really efficient accelerators. That allows companies to create products that can run inference right on the edge. And you can learn more about that at coral.ai. So here's an example of how you can use TensorFlow Hub to do fast, artistic style transfer that can work with an arbitrary painting style, using generative models. So let's say you had an image of a beautiful yellow Labrador, and you wanted to see what it would look like in the style of Kandinsky. Well, with one line of code, you can load one of these pretrained style transfer models from the Magenta team at Google, and then you can just apply it to your content and style images and you can get a new stylized image. And you can find simple tutorials like that at the link below. Or let's say you wanted to train a new text classifier, such as predicting whether a movie review had a positive or negative rating. Well, training a text embedding layer may take a lot of time and data to make that work well, but with TensorFlow Hub, you can pull in a number of pretrained text models with just one line of code. And then you can incorporate it into TensorFlow 2. And using standard APIs like Keras, you can retrain it on your new data set just like that. We've also integrated an interactive model visualizer, in beta, for some of the models. And this allows you to immediately preview what the model would do and run that model within the web page or on a mobile app, like a Playground app. For example, here is a model from the Danish Mycological Society for identifying a wide range of fungi as part of the Svampeatlas project. You can directly drag an image onto the site and the model will run in real time and show you the results, such as what mushrooms were in that image. And then you can click on it to go and get more information. Many of the TensorFlow Hub models also have Colab links, so you can play with these models and the code right inside the browser, powered by Google infrastructure with Colab. In fact, the Google machine learning fairness team has also built some Colab notebooks that can pull text embeddings and other embeddings straight into their platform so that you can assess whether there are potential biases for a standard set of tasks. And you can come by our demo booth if you want to learn more about that. TensorFlow Hub is also powered by the community. When we launched TensorFlow Hub last year, we were sharing some of the state-of-the-art models from DeepMind and Google. But now, a wide range of publishers are beginning to share their models from a diverse set of areas, such as Microsoft AI for Earth, the Met, or NVIDIA. And these models can be used for many different tasks, such as studying wildlife populations through camera traps or automatic visual defect detection in industry.
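The Labrador-to-Kandinsky example mentioned above could be sketched roughly as follows, using the Magenta arbitrary-image-stylization model on TF Hub; the image file names are placeholders and the loading code is simplified.

```python
# Sketch of arbitrary style transfer with a TF Hub model. Inputs are
# float32 image batches in [0, 1]; file names are placeholders.
import tensorflow as tf
import tensorflow_hub as hub

def load_image(path, max_dim=512):
    img = tf.io.decode_image(tf.io.read_file(path), channels=3, dtype=tf.float32)
    img = tf.image.resize(img, (max_dim, max_dim), preserve_aspect_ratio=True)
    return img[tf.newaxis, ...]                       # add batch dimension

content = load_image("labrador.jpg")
style = load_image("kandinsky.jpg")

stylize = hub.load(
    "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")
stylized = stylize(tf.constant(content), tf.constant(style))[0]
tf.keras.preprocessing.image.save_img("stylized.png", stylized[0].numpy())
```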
And Crowdsource by Google is also generating a wide range of data through the Open Images Extended data sets. And with that, we can get an even richer set of ready-to-use models across many different specific data sets. So with hundreds of models that are pretrained and ready to use, you can use TensorFlow Hub to immediately begin using machine learning to solve some business problems. So I hope that you can come by our demo booth or go to tfhub.dev. And I'll see you there. Thank you. [APPLAUSE] UJVAL KAPASI: So the TensorFlow team with TF 2 has solved a hard problem, which is to make it easy for you to express your ideas and debug them in TensorFlow. This is a big step, but there are additional challenges in order for you to obtain the best results for your research or your product designs. And I'd like to talk about how NVIDIA is solving three of these challenges. The first is simple acceleration. The second is scaling to large clusters. And finally, providing code for every step of the deep learning workflow. One of the ingredients of the recent success of deep learning has been the use of GPUs for providing the necessary raw compute horsepower. This compute is like oxygen for new ideas and applications in the field of AI. So we designed and shipped Tensor Cores in our Volta and Turing GPUs in order to provide an order of magnitude more compute capability than was previously available. And we built libraries, such as cuDNN, to ensure that all the important math functions inside of TF can run on top of Tensor Cores. And we update these regularly as new algorithms are invented. We worked with Google to provide a simple API so that, from your TensorFlow script, you can easily activate the routines in these libraries and train with mixed precision on top of Tensor Cores and get speed-ups for your training-- for instance, 2x to 3x faster in the examples here-- which helps you iterate faster on your research and also, within a fixed budget of time, maybe get better results. Once you have a trained model, we provide a simple API inside of TensorFlow to activate TensorRT so you can get drastically lower latency for serving your predictions, which lets you deploy perhaps more sophisticated models or pipelines than you would be able to otherwise. But optimizing the performance of a single GPU is not enough. And let me give you an example. So Google, last year, released a model called BERT. As Jeff Dean explained yesterday, this model blew away the accuracy on a variety of language tasks compared to any approach or model previous to it. But on a single GPU, it takes months to train. Even on a server with eight GPUs, it takes more than a week. But if you can train with 32 servers, or 256 GPUs, training can complete with TensorFlow in mere hours. However, training at these large scales introduces several new challenges at every level of the system. If you don't properly codesign the hardware and software and precisely tune them, then as you add more compute, you will not get a commensurate increase in performance. And I think NVIDIA is uniquely suited to solve some of these challenges because we're building hardware from the level of the GPU to servers to supercomputers, and we're working on challenges at every level-- hardware design, software design, system design, and the boundaries of these. You know, the combination of a bunch of our work on this is the DGX SuperPOD.
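A minimal sketch of the mixed-precision API mentioned above, for a Keras/TensorFlow 2 script (the policy API moved from tf.keras.mixed_precision.experimental to tf.keras.mixed_precision between releases, so exact names depend on your version):

```python
# Sketch: enable mixed-precision training on Tensor Core GPUs with Keras.
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy("mixed_float16")   # fp16 compute, fp32 variables

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
    # Keep the final softmax in float32 for numerical stability.
    tf.keras.layers.Activation("softmax", dtype="float32"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),   # Keras adds loss scaling under this policy
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])
```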
And to put its capabilities sort of in visceral terms, a team at NVIDIA recently was able to on the DGX SuperPOD, as part of Project Megatron, train the largest language model ever, more than 8 billion parameters, 24 times larger than BERT. Another contribution that NVIDIA is making and what we're working on is providing reliable code that anyone from individuals to enterprises can build on top of. NVIDIA is doing the hard work of optimizing, documenting, qualifying, packaging, publishing, maintaining code for a variety of models and use cases for every step of the deep learning workflow from research to production. And we're curating this code and making it available to everyone, both at ngc.nvidia.com, but also other places where developers might frequent, such as GitHub and TF Hub, which you just heard about as well. So I hope that in the short time, I was able to convey some of the problems that NVIDIA is working on, the challenges we're working on, and how we're making available to the TensorFlow community, along with Google, simple APIs for acceleration, solving scaling challenges, putting out DGX SuperPODs, building DGX SuperPODs, and curating code that anyone can build on top of for the entire deep learning workflow. Thank you for your time. I hope you enjoy the rest of the conference. ANNA ROTH: So the world is full of experts, like pathologists who can diagnose diseases, construction workers who know that if a certain tube is more than 40% obstructed, you have to turn that machine off like right now, people who work in support and know how to, like, kind of triage tickets. And one of the exciting things about kind of the past few years is that it's become increasingly easy for people who want to take some thing that they know how to do and teach it to a machine. I think the big dream is that anybody could be able to go and do that. It's what I spent my time on in the past few years. I've worked on the team that launched Cognitive Services. And I spent the past few years working on customvision.ai. It's a tool for building image classifiers and object detectors. But it really has never been easier to build machine learning models, like the tooling is really good. We're all here at TensorFlow World. Computational techniques have gotten faster, transfer learning easier to use. You have access to compute in the cloud. And then educational materials have, like, never been better, right? One of my hobbies is to go and, like, browse the fast.ai forums just to see what learners are building. And it's completely inspiring. That being said, it's actually still really hard to build a machine learning model. In particular, it's hard to build robust production-ready models. So I've worked with hundreds-- actually, by this point, thousands of customers, who are trying to automate some particular task. And a lot of these projects fail. You know, it's really easy to build your first model. And sometimes, it's actually kind of a trick, right? Like, you can get something astonishingly good in a couple of minutes. You get some data off the web, like model.fit, and like a few minutes later, I have a model that does something and it's kind of uncanny. But getting that to be robust enough to use kind of in a real environment is actually really tough. So the first problem people run into, it's actually hard to transfer your knowledge to a machine. So like this might seem trite, but when people first train object detectors, actually a lot of people don't put bounding boxes around every single object. 
Like, the model doesn't work. Or they get stuck on the kind of parsimoniousness. So for example, I had one guy in Seattle. People like the Seahawks. He wanted to train a Seahawks detector. He puts bounding boxes around a bunch of football players and discovers that he's actually built a football player detector, as opposed to a Seahawks detector. He's really upset when he uploads an image from another team, because the model didn't have the semantic knowledge that the user had. And so, like, you know, this is stuff you can document away, right? Like, you can kind of learn this in your first hour or so, but it speaks to the unnaturalness of the way in which we train models today. Like when you teach something to a computer, you're having to kind of give it data that represents in some way a distribution. That's not how you and I would normally teach something. And it really kind of trips people up a lot. But sure, so you grok that. You figure it out. You figure out, all right, the problem is building a data set. That's really hard to do too. And so I want to walk through one kind of hypothetical case. So I get a customer. And what they really wanted to do was recognize when people would upload something to their online photo store that might be, like, personally-identifiable information. So for example, if you'd uploaded a photo of a credit card or a photo of your passport. So to start this off, they scrape some web data, right? You just, like, go. You use kind of like a search API and you get a bunch of images of credit cards off the web. You do evaluations. All right, it looks like we're going to have maybe a 1% false positive rate. Well, that's not good. I've got a million user images I want to run this on. Suddenly, I have 10,000 sort of potential false positives. So then they build the model. Let's see how it goes. And when they try it out on real user data, it turns out that the actual false positive rate, as you might expect, is much, much, much higher. All right, so now, the user has to take another round. So now, let's add some negative classes, right? We want to be able to kind of make examples of other kinds of documents, sort of non-credit card things, et cetera, et cetera. But it's still OK, right? We're on day one or day two of the project, like this still feels good. You know, we're able to kind of make progress. It's a little more tedious. Second round-- I think you guys kind of know where this is going. It doesn't work. Still, an unacceptably high number of false positives are coming up-- way too many. So now, we kind of go into stage three of the experience of trying to build a usable model, which is, all right, let's collect some more data and let's go kind of label some more data. It starts to get really expensive, right? Now, something that I thought was going to take me a day in the first round, I'm on like day seven of getting a bunch of labelers, trying to get MTurk to work, and labeling kind of very large amounts of data. It turns out the model still doesn't work. So the good news was at this point, somebody said, all right, well, let's try one of these kind of interpretability techniques, [INAUDIBLE] saliency visualization. And it turns out, the problem was thumbs.
So when you are using kind of-- when people take photos on their phone of something like a document, they're usually holding it, which is not what you see in web-scraped images, for example, but it's kind of what you tend to do. So it turned out that they had basically built a classifier that recognized: are you holding something and is your thumb in the picture? Well, that was not the goal, but OK. But this isn't just kind of a one-off problem. It happens all the time. So for example, there's that really famous Nature paper from 2017 where they were doing, like, dermatology images. And they kind of discover, all right, well, having a ruler in an image of a mole is actually a very good signal that that might be cancerous. You might think we learned from that. Except just a couple weeks ago, I think, Walker, et al published another paper where they said having surgical markings in an image, so having marked-up things around a mole, also tended to trip up the classifier because, not surprisingly, the training data didn't have any marked-up skin for people that didn't have cancerous moles. And a lot of people, I think, particularly these people who are sometimes on our team, look at that and say it's user error, it's human error. They weren't building the right distribution of data. That's, like, extremely hard to do, even for experts. And even harder to do for somebody who's just getting started. Because in reality, real-world environments are incredibly complex. This is where projects die. Out-of-domain problems-- which come up in most real-world environments people actually want to work in, whether it's a camera, a microphone, or a website, where user inputs are unconstrained-- are incredibly challenging to build good data for. One of my favorite examples: I had a customer who had built a system, [INAUDIBLE] camera, an IoT camera. And one day it hails. And it turns out, it just hadn't hailed in this town before. Model fails. You can't expect people to have had data for hail. Luckily, they had a system of multiple sensors, they had other kinds of validation, a human in the loop. It all worked out. But this is really challenging to handle: rare events. If I want to recognize explosions, how much data am I going to have from explosions? Or we had a customer who was doing hand tracking. It turned out, the model failed the first time somebody with a hand tattoo used it. There aren't that many people with hand tattoos. But you still want your model to work in that case. Look, there's a lot of techniques for being able to do this better. But I think it's worth recognizing that it's actually really hard to build a model, and that's an important problem. Once you build a model, you've got to figure out if it's going to work. A lot of the great work here is happening in the fairness and bias literature. But there is an overall impact for any customer or any person who's trying to build a high quality model. One of the big problems is that aggregate statistics hide failure conditions. You might make this beautiful PR curve. Even the slices that you have look really great. And then it turns out that you don't actually have a data set with all the features in your model. So let's say you're doing speech, you may not have actually created a data set that says, OK, well, this is a woman, a woman with an accent, or a child with an accent. All these subclasses become extremely important. And it becomes very expensive and difficult to actually go and figure out where your model is failing.
There are a lot of techniques for this: sampling techniques, pairing uninterpretable models with interpretable models, things that you can do. But it's super challenging for a beginner to figure out what their problems might be, and even for experts. You see these problems come up in real-world systems all the time. Finally, when you have a model, it can be tough to actually figure out what to do with it. Most of the programs that you use don't have probabilistic outputs in the real world. What does it mean for something to be 70% likely, or to have seven or eight trained models in a row? It might be more obvious for you. But for an end user, it can actually be hard to figure out what actions you should take. Look, nothing I've said today, I think, is particularly novel for the folks in this room. You've gone through all of these challenges before. You've built a model, you've built a data set, you've probably built it 18 times, finally gotten it to work. I had a boss who used to say that problems are inspiring. And for me, there isn't a problem more inspiring than figuring out how we can help anybody who wants to automate some task be able to do so, and be able to train a machine and have a robust, production-ready model. I can't think of a more fun problem. I can't think of a more fun problem to work on with everybody in this room. Thanks. [APPLAUSE] SARAH SIRAJUDDIN: Welcome, everyone. I'm Sarah. I'm the engineering lead for TensorFlow Lite. And I'm really happy to be here talking to you about on-device machine learning. JARED DUKE: And I'm Jared, tech lead on TensorFlow Lite. And I'm reasonably excited to share with you our progress and all the latest updates. SARAH SIRAJUDDIN: So first of all, what is TensorFlow Lite? TensorFlow Lite is our production-ready framework for deploying machine learning on mobile and embedded devices. It is cross-platform, so it can be used for deployment on Android, iOS, and Linux-based embedded systems, as well as several other platforms. Let's talk about the need for TensorFlow Lite and why we built an on-device machine learning solution. Simply put, there is now a huge demand for doing machine learning on the edge. And it is driven by the need to build user experiences which require low latency. Further factors are poor network connectivity and the need for user privacy-preserving features. All of these are easier done when you're doing machine learning directly on the device. And that's why we released TensorFlow Lite late in 2017. This shows our journey since then. We've made a ton of improvements across the board in terms of the ops that we support, performance, usability, tools which allow you to optimize your models, the number of languages we support in our API, as well as the number of platforms TensorFlow Lite runs on. TensorFlow Lite is deployed on more than three billion devices globally. Many of Google's own largest apps are using it, as are apps from several other external companies. This is a sampling of apps which use TensorFlow Lite: Google Photos, Gboard, YouTube, Assistant, as well as leading companies like Hike, Uber, and more. So what is TensorFlow Lite being used for? We find that our developers use it for popular use cases around text, image, and speech. But we are also seeing lots of emerging and new use cases come up in the areas of audio and content generation. This was a quick introduction to TensorFlow Lite. In the rest of this talk, we are going to focus on sharing our latest updates and the highlights.
For more details, please check out the TensorFlow Lite talk later in the day. Today I'm really excited to announce a suite of tools which will make it really easy for developers to get started with TensorFlow Lite. First up, we're introducing a new support library. This makes it really easy to preprocess and transform your data to make it ready for inferencing with a machine learning model. So let's look at an example. These are the steps that a developer typically goes through to use a model in their app once they have converted it to the TensorFlow Lite model format. Let's say they're doing image classification. So then they will likely need to write code which looks something like this. As you can see, it is a lot of code for loading, transforming, and using the data. With the new support library, the previous wall of code that I showed can be reduced significantly to this. Just a single line of code is needed for each of loading, transforming, and using the resultant classifications. Next up, we're introducing model metadata. Now model authors can provide a metadata spec when they are creating and converting models. And this makes it easier for users of the model to understand what the model does and to use it in production. Let's look at an example again. The metadata descriptor here provides additional information about what the model does, the expected format of the inputs, and the meaning of the outputs. Third, we've made our model repository much richer. We've added several new models across several different domains. All of them are pre-converted into the TensorFlow Lite model format, so you can download them and use them right away. Having a repository of ready-to-use models is great for getting started and trying them out. However, most of our developers will need to customize these models in some way, which is why we are releasing a set of APIs with which you can use your own data to retrain these models and then use them in your app. We've heard from our developers that we need to provide better and more tutorials and examples. So today we're releasing several full examples which show not only how to use a model but how you would write an end-to-end app. And these examples have been written for several platforms: Android, iOS, Raspberry Pi, and even Edge TPU. And lastly, I'm super happy to announce that we have just launched a brand new course on how to use TensorFlow Lite on Udacity. All of these are live right now. Please check them out and give us feedback. And this brings me to another announcement that I'm very excited about. We have worked with the researchers at Google Brain to bring MobileBERT to developers through TensorFlow Lite. BERT is a method of pre-training language representations, which gets really fantastic results on a wide variety of natural language processing tasks. Google itself uses BERT extensively to understand natural text on the web. But it is having a transformational impact broadly across the industry. The model that we are releasing is up to 4.4 times faster than standard BERT, while being four times smaller with no loss in accuracy. The model is less than 100 megabytes in size. So it's usable even on lower-end phones. It's available on our site, ready for use right now. We're really excited about the new use cases this model will unlock. And to show you all how cool this technology really is, we have a demo coming up of MobileBERT running live on a phone. I'll invite Jared to show you. JARED DUKE: Thanks, Sarah.
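As background to the workflow Sarah described (converting a model to the TensorFlow Lite format and then running it), here is a bare-bones Python sketch; the tiny Keras model is a stand-in, and on-device apps would use the Java, Swift, or C++ APIs with the support library instead.

```python
# Sketch: convert a Keras model to .tflite and run it with the Python interpreter.
import numpy as np
import tensorflow as tf

keras_model = tf.keras.Sequential(
    [tf.keras.layers.Dense(10, input_shape=(4,))])   # stand-in for a real model

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run one inference on a dummy input with the expected shape and dtype.
sample = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
```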
As we've heard, BERT can be used for a number of language-related tasks. But today I want to demonstrate it for question answering. That is, given some body of text and a question about its content, BERT can find the answer to the question in the text. So let's take it for a spin. We have an app here which has a number of preselected Wikipedia snippets. And again, the model was not trained on any of the text in these snippets. I'm a space geek, so let's dig into the Apollo program. All right. Let's start with an easy question. [BEEPING] What did Kennedy want to achieve with the Apollo program? COMPUTER GENERATED WOMAN'S VOICE: Landing a man on the moon and returning him safely to the Earth. JARED DUKE: OK. But everybody knows that. Let's try a harder one. [BEEPING] Which program came after Mercury but before Apollo? COMPUTER GENERATED WOMAN'S VOICE: Project Gemini. JARED DUKE: Not bad. Hmm. All right, BERT, you think you're so smart, [BEEPING] Where are all the aliens? COMPUTER GENERATED WOMAN'S VOICE: Moon. JARED DUKE: There it is. [LAUGHTER] Mystery solved. Now all jokes aside, you may not have noticed that this phone is running in airplane mode. There's no connection to a server. So everything from speech recognition to the BERT model to text-to-speech was all running on device using ML. Pretty neat. [APPLAUSE] Now I'd like to talk about some improvements and investments we've been making in the TensorFlow Lite ecosystem, focused on improving your model deployment. Let's start with performance. A key goal of TensorFlow Lite is to make your models run as fast as possible across mobile and edge CPUs, GPUs, DSPs, and NPUs. We've made many investments across all of these fronts. We've made significant CPU improvements. We've added OpenCL support to improve GPU acceleration. And we've updated our support for all of the Android Q NNAPI ops and features. Our previously announced Qualcomm DSP delegate, targeting mid- and low-tier devices, will be available for use in the coming weeks. We've also made some improvements in our performance and benchmark tooling to better assist both model and app developers in identifying the optimal deployment configuration. To highlight some of these improvements, let's take a look at our performance just six months ago at Google I/O, using MobileNet for classification inference, and compare that with the performance of today. This represents a massive reduction in latency. And you can expect this across a wide range of models and devices, both low end and high end. Just pull the latest version of TensorFlow Lite into your app and you can see these improvements today. Digging a little bit more into these numbers, floating point CPU execution is our default path. It represents a solid baseline. Enabling quantization, now easier with post-training quantization, provides three times faster inference. And enabling GPU execution provides yet more of a speedup, six times faster than our CPU baseline. And finally, for absolute peak performance, we have the Pixel 4 neural core, accessible via the NNAPI TensorFlow Lite delegate. This kind of specialized accelerator, available in more and more of the latest devices, unlocks capabilities and use cases that just a short time ago were thought impossible on mobile devices. But we haven't stopped there. Seamless and more robust model conversion has been a major priority for the team. And we'd like to give an update on a completely new TensorFlow Lite model conversion pipeline.
This new converter was built from the ground up to provide more intuitive error messages when conversion fails, add support for control flow, and support more advanced models, like BERT, Deep Speech v2, Mask R-CNN, and more. We're excited to announce that the new converter is available in beta, and will be available more generally soon. We also want to make it easy for any app developer to use TensorFlow Lite. And to that end, we've released a number of new first-class language bindings, including Swift, Objective-C, C# for Unity, and more. This complements our existing set of bindings in C++, Java, and Python. And thanks to community efforts, we've seen the creation of additional bindings in Rust, Go, and even Dart. As an open source project, we welcome and encourage these kinds of contributions. Our model optimization toolkit remains the one-stop shop for compressing and optimizing your model. There will be a talk later today with more details. Check out that talk. We've come a long way, but we have many planned improvements. Our roadmap includes expanding the set of supported models, further improvements in performance, as well as some more advanced features, like on-device personalization and training. Please check out our roadmap on tensorflow.org and give us feedback. Again, we're an open source project and we want to remain transparent about our priorities and where we're headed. I want to talk now about our efforts in enabling ML not just on billions of phones but on the hundreds of billions of embedded devices and microcontrollers that exist and are used in production globally. TensorFlow Lite for Microcontrollers is that effort. It uses the same model format, the same conversion pipeline, and largely the same kernel library as TensorFlow Lite. So what are these microcontrollers? These are the small, low-power, all-in-one computers that power everyday devices all around us, from microwaves and smoke detectors to sensors and toys. They can cost as little as $0.10 each. And with TensorFlow, it's possible to use them for machine learning. Arm, an industry leader in the embedded market, has adopted TensorFlow as their official solution for AI on Arm microcontrollers. And together, we've made optimizations that significantly improve performance on this embedded Arm hardware. We've also partnered with Arduino, and just launched the official Arduino TensorFlow library. This makes it possible for you to get started doing speech detection on Arduino hardware in just under five minutes. And now we'd like to demonstrate TensorFlow Lite for Microcontrollers running in production. Today, if a motor breaks down, it can cause expensive downtime and maintenance costs. But using TensorFlow, it's possible to simply and affordably detect these problems before failure, dramatically reducing these costs. Mark Stubbs, co-founder of Shoreline IoT, will now give us a demo of how they're using TensorFlow to address this problem. They've developed a sensor that can be attached to a motor just like a sticker. It uses a low-power, always-on TensorFlow model to detect motor anomalies. And with this model, their device can run for up to five years on a single small battery, using just 45 microamps with its Ambiq Cortex-M4 CPU. Here we have a motor that will simulate an anomaly. As the RPMs increase, it'll start to vibrate and shake. And the TensorFlow model should detect this as a fault and indicate so with a red LED. All right, Mark, let's start the motor.
[HIGH-PITCHED MOTOR HUMMING] Here we have a normal state. And you can see it's being detected with the green LED. Everything's fine. Let's crank it up. [MOTOR WHIRRING] OK. It's starting to vibrate, it's oscillating. I'm getting a little nervous and frankly, a little sweaty. Red light. Boom. OK. The TensorFlow model detected the anomaly. We could shut it down. Halloween disaster averted. Thank you, Mark. [APPLAUSE] SARAH SIRAJUDDIN: That's all we have, folks. Please try out TensorFlow Lite if you haven't already. And once again, we're very thankful for the contributions that we get from our community. JARED DUKE: We also have a longer talk later today. We have a demo booth. Please come by and chat with us. Thank you. [APPLAUSE] SANDEEP GUPTA: My name is Sandeep Gupta. I am the product manager for TensorFlow.js. I'm here to talk to you about machine learning in JavaScript. So you might be saying to yourself that I'm not a JavaScript developer, I use Python for machine learning, so why should I care? I'm here to show you that machine learning in JavaScript enables some amazing and useful applications, and might be the right solution for your next ML problem. So let's start by taking a look at a few examples. Earlier this year, Google released the first-ever AI-inspired Doodle, which you see on the top left. This was on the occasion of Johann Sebastian Bach's birth anniversary. And users were able to synthesize a Bach-style harmony by running a machine learning model in the browser, just by clicking on a few notes. In just about three days, more than 50 million users created these harmonies, and they saved them and shared them with their friends. Another team at Google has been creating these fun experiences. One of these is called Shadow Art, where users are shown a silhouette of a figure, and you use your hand shadow to try to match that figure. And that character comes to life. Other teams are building amazing accessibility applications, making web interfaces more accessible. On the bottom left, you see something called Creatability, where a person is trying to control a keyboard simply by moving their head. And then on the bottom right is an application called Teachable Machine, which is a fun and interactive way of training and customizing a machine learning model directly in a browser. So all of these awesome applications have been made possible by TensorFlow.js. TensorFlow.js is our open source library for doing machine learning in JavaScript. You can use it in the browser, or you can use it server-side with Node.js. So why might you consider using TensorFlow.js? There are three ways you would use this. One is you can run any of the pre-existing pre-trained models and deploy them and run them using TensorFlow.js. You could use one of the models that we have packaged for you, or you can use any of your TensorFlow saved models and deploy them on the web or on other JavaScript platforms. You can retrain these models and customize them on your own data, again using TensorFlow.js. And lastly, if you're a JavaScript developer wanting to write all your machine learning directly in JavaScript, you can use the low-level ops API and build a new model from scratch using this library. So let's see why this might be useful. First, it makes machine learning really, really accessible to a web developer and a JavaScript developer. With just a few lines of code, you can bring the power of machine learning into your web application. So let's take a look at this example.
Here we have two lines of code with which we are just sourcing our library from our hosted scripts, and we are loading a pre-trained model. In this case, the BodyPix model, which is a model that can be used to segment people in videos and images. So just with these two lines, you have the library and the model embedded in your application. Now we choose an image. We create an instance of the model. And then we call the model's estimate person segmentation method, passing it the image. And you get back an object which contains the pixel mask of where there is a person present in this image. And there are other methods that can subdivide this into various body parts. And there are other rendering utilities. So just with about five lines of code, your web application has all the power of this powerful machine learning model. The library can be used both client-side and server-side. Using it client-side in the browser has lots of advantages. You get the amazing interactivity and reach of the browser as a platform. Your application immediately reaches all your users, who have nothing to install on their end. By simply sharing the URL of your application, they are up and running. You get the benefit of the interactivity of the browser as a platform, with easy access to the webcam, and microphone, and all of the sensors that are attached to the browser. Another really important point is that because these are running client-side, user data stays client-side. So this has strong implications for privacy-sensitive applications. And lastly, we support GPU acceleration through WebGL. So you get great performance out of the box. On the server side, TensorFlow.js supports Node. Lots of enterprises use Node for their back-end operations and for a ton of their data processing. Now you can use TensorFlow directly with Node by importing any TensorFlow saved model and running it through TensorFlow.js Node. Node also has an enormous NPM package ecosystem. So you can benefit from that, and plug into the NPM repository collection. And for enterprises where your entire back-end stack is in Node, you can now bring all of the ML into Node and maintain a single stack. A natural question to ask is, how fast is it? We have done some performance benchmarking. And I'm showing here some results for MobileNet inference time. On the left, you see results on mobile devices running client-side. And on state-of-the-art mobile phones, you get really good performance, with about 20 milliseconds inference time, which means that you can run real-time applications at about 50 frames per second. Android performance has some room for improvement. Our team is heavily focused on addressing that. On the server side, because we bind to TensorFlow's native C library, we have performance parity with Python TensorFlow, both on CPU as well as on GPU. So in order to make it easy for you to get started, we have prepackaged a collection of models, pre-trained models, for most of the common ML tasks. These include things like image classification, object detection, human pose and gesture detection, speech commands models for recognizing spoken words, and a bunch of text classification models for things like sentiment and toxicity. You can use these models with very easy wrapped high-level APIs from our hosted scripts, or you can NPM install them.
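As a concrete illustration of the BodyPix walkthrough described a moment ago, here is a minimal TypeScript sketch. It assumes the @tensorflow/tfjs and @tensorflow-models/body-pix npm packages (the talk loads the same library and model from the hosted scripts instead), and it uses the estimatePersonSegmentation method as described in the talk; later releases of the package may rename or change this API.

```typescript
// Minimal sketch: load the packaged BodyPix model and segment a person in an image.
import '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

async function segmentPersonInImage(image: HTMLImageElement) {
  // Create an instance of the pre-trained model (weights download on first use).
  const net = await bodyPix.load();

  // Returns an object with width, height, and a per-pixel mask
  // (1 where a person is present, 0 elsewhere) for the given image.
  const segmentation = await net.estimatePersonSegmentation(image);
  console.log(segmentation.width, segmentation.height, segmentation.data.length);
  return segmentation;
}

// Usage: assumes an <img id="person"> element on the page.
const img = document.getElementById('person') as HTMLImageElement;
segmentPersonInImage(img);
```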
And then you can use these pre-trained models and build your applications for a variety of use cases. These include AR, VR types of applications, gesture-based interactions that help improve accessibility of your applications, detecting user sentiment and moderating content, conversational agents, chat bots, as well as a lot of things around front-end web page optimization. These pre-trained models are a great way to get started, and they are good for many problems. However, often you have the need to customize these models for your own use. And here, again, the power of TensorFlow.js with the interactivity of the web comes in handy. I want to show you this application called Teachable Machine, which is a really nice way of customizing a model in just a matter of minutes. I am going to test both the demo gods as well as the time buzzer gods here and try to show this live. What you're seeing here is-- this is the Teachable Machine web page, which has the MobileNet model already loaded. We are going to be training three classes. These are these green, purple, and orange classes. We will output words. So let's say we will do rock for green, paper for purple, and scissors for red. We're going to record some images. So let's record some images for rock. I'm going to click this button here. COMPUTER GENERATED MAN'S VOICE: Rock. SANDEEP GUPTA: And I'm going to record some images for paper. COMPUTER GENERATED MAN'S VOICE: Pa-- rock. SANDEEP GUPTA: And I'm going to record some images for scissors. COMPUTER GENERATED MAN'S VOICE: Paper. SANDEEP GUPTA: OK. So there-- COMPUTER GENERATED MAN'S VOICE: Scissors. SANDEEP GUPTA: We have customized our model with just about 50 images recorded for each class. Let's see how it works. COMPUTER GENERATED MAN'S VOICE: Rock. Paper. Rock. Paper. Rock. Scissors. Paper. Rock. Scissors. SANDEEP GUPTA: So there you go. In just a matter of-- [APPLAUSE] Pretty neat. It's really powerful to customize models like these super interactively with your own data. What if you want to train on your data at a somewhat larger scale? Here, AutoML comes in really handy. AutoML is a GCP cloud-based service which lets you bring your data to the cloud and train a custom, really high-performing model specific to your application. Today, we are really excited to announce that we now support TensorFlow.js for AutoML, meaning that you can use AutoML to train your model, and then with one click, you can export a model that's ready to be deployed in your JavaScript application. One of our early testers of this feature, the CVP Corporation, which is building some image classification applications for the mining industry, was able to use it: in just about five node-hours of training, they improved their model accuracy from 91% with their manually trained model to 99%, and got a much smaller, faster-performing model, which they then immediately deployed in a progressive web application for on-field use.
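For reference, loading such an AutoML-exported model in a web app looks roughly like the sketch below. This is a hedged illustration assuming the @tensorflow/tfjs-automl companion package and its loadImageClassification helper; the 'model.json' URL is a placeholder for wherever the exported model is actually hosted.

```typescript
// Hedged sketch: classify an image with a model exported from AutoML for TensorFlow.js.
import '@tensorflow/tfjs';
import * as automl from '@tensorflow/tfjs-automl';

async function classifyWithAutoML(image: HTMLImageElement) {
  // Placeholder URL: point this at the model.json produced by the AutoML export.
  const model = await automl.loadImageClassification('model.json');

  // Returns an array of {label, prob} pairs for the classes in your AutoML dataset.
  const predictions = await model.classify(image);
  console.log(predictions);
  return predictions;
}
```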
So in addition to models, one of the big focus areas for us has been support for a variety of platforms. And because JavaScript is a versatile language which runs on a large number of platforms, TensorFlow.js can be used on all these different platforms. And today, again, we are really happy to announce that we now support integration with React Native. So if you are a React Native developer building cross-platform native applications, you can use TensorFlow.js directly from within React Native, and you get all the power of WebGL acceleration. We've looked at the capabilities of the library. Let's look at a couple of use cases. Modiface is an AR technology company based out of Canada. They have used TensorFlow.js to build this mobile application that runs in the WeChat mini program environment. They did this for L'Oreal, and it lets users try out these beauty products instantly, running inside these instant messaging applications. They had some strict criteria about model size and frame rate performance. And they were able to achieve all of those targets with TensorFlow.js running natively, deployed on these mobile devices. In order to showcase the limits of what's possible with this, our team has built a fun game and an application to show how you can take a state-of-the-art model, a very high resolution model that can do face tracking, and we have built this lip-syncing game. So here what you will see is that a user is trying to lip sync to a song, and a machine learning model is trying to identify the lips and score how well you are doing the lip syncing. And then because it's in JavaScript, it's on the web, we have added some visualization effects and some other AR, VR effects. So let's take a look. [MUSIC PLAYING] SPEAKER 1: (SINGING) Hey, hey. Give me one more minute. I would. Hey, hey. Give me one more, one more, one more. Hey, hey. Give me one more minute. I would. Hey, hey. Make it last for... Ohh, ohh. Hey, hey. Give me one more minute. I would. Hey, hey. Give me one more, one more, one more. SANDEEP GUPTA: OK. It's pretty cool. The creator of this demo is here with us. He's at the TensorFlow.js demo station. Please stop by there, and you can try playing around with this. In the real world, we are beginning to see more and more applications of enterprises using TensorFlow.js in novel ways. Uber is using it for a lot of their internal ML tasks, visualization, and computation directly in the browser. And a research group at IBM is using it for on-the-field mobile classification of disease-carrying snails which spread certain communicable diseases. So lastly, I want to thank our community. The popularity and growth of this library is in large part due to the amazing community of our users and contributors. And thus, we are really excited to see that a lot of developers are building amazing extensions and libraries on top of TensorFlow.js to extend its functionality. This was just a quick introduction to TensorFlow.js. I hope I've been able to show you that if you have a web or a Node ML use case, TensorFlow.js is the right solution for your needs. Do check out our more detailed talk later this afternoon, where our team will dive deeper into the library. And there are some amazing talks from our users showcasing some fantastic applications. tensorflow.org/js is your one source for a lot more information, more examples, getting-started content, models, et cetera. You can get everything you need to get started. So with that, I would like to turn it over to Joseph Paul Cohen, who's from Mila Medical. And he will share with us an amazing use case of how their team is using TensorFlow.js. Thank you very much. [APPLAUSE] JOSEPH PAUL COHEN: Thanks. Great. I am very excited to be here today. So what I want to talk about is a chest X-ray radiology tool in the browser. If we look at the classic or traditional diagnostic pipeline, there is a certain area where web-based tools are used by physicians to aid them in a diagnostic decision, such as kidney donor risk or cardiovascular risk. These tools are already web-based. With the advances of deep learning, we can now do radiology tasks such as chest X-ray diagnostics, and put them in the browser. Can you imagine the use cases where this is useful? In an emergency room, where you have a time-limited human. In a rural hospital, where radiologists are not available or are very far away. The ability for a non-expert to triage cases for an expert, saving time and money. And where we'd like to go is towards rare diseases, but we're a little data-starved in this area to be able to do that. This project has been called "nice" by Yann LeCun. What we need to do to achieve this is run a state-of-the-art chest X-ray diagnostic DenseNet in the browser. For one thing, this preserves the privacy of the data, while at the same time allowing us to scale to millions of users with zero computational cost on our side. How do we achieve this? With TensorFlow.js, which gives us a one-second feed-forward pass through this DenseNet model with a 12-second initial load time. We also need to deal with processing out-of-distribution samples, where we don't want to process images of cats or images that are not properly formatted X-rays. To do this, we're going to use an autoencoder with an SSIM score, and we're going to look at the reconstruction. And then finally, we need to compute gradients in the browser to show a saliency [INAUDIBLE] of why we made such a prediction. So we could ship two models, one computing the feed-forward and the other one computing the gradient. Or we can use TensorFlow.js to compute the actual gradient graph and then compute it right in the browser, given whatever model we have already shipped.
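To illustrate the gradient computation just described, here is a hedged TypeScript sketch, not Mila's actual code. It assumes a model in the TensorFlow.js layers format at a placeholder URL (graph-format models do not support gradients) and uses tf.grad to take the gradient of the top class score with respect to the input as a crude saliency map.

```typescript
// Hedged sketch: compute a saliency map in the browser with tf.grad,
// instead of shipping a second, hand-exported gradient model.
import * as tf from '@tensorflow/tfjs';

async function saliencyMap(modelUrl: string, xray: tf.Tensor4D): Promise<tf.Tensor> {
  // Placeholder URL: a layers-format model whose predict() output is differentiable.
  const model = await tf.loadLayersModel(modelUrl);

  // Scalar score for this input: the maximum class probability/logit.
  const topScore = (x: tf.Tensor) => (model.predict(x) as tf.Tensor).max();

  // tf.grad builds d(score)/d(input) and evaluates it right in the browser.
  const gradFn = tf.grad(topScore);
  return gradFn(xray).abs(); // large values mark pixels that drove the prediction
}
```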
So this makes development really easy. And it's also pretty fast. Thank you. [APPLAUSE] TATIANA SHPEISMAN: Hi. I'm Tatiana. I'm going to talk today about MLIR. Before we talk about MLIR, let's start from the basics. We are here because artificial intelligence is experiencing tremendous growth. All three components, algorithms, data, and compute, have come together to change the world. Compute is really, really important, because that's what enables machine learning researchers to build better algorithms and to build new models. And you can see the models are becoming much, much more complex. To train a model today, we need several orders of magnitude more compute than we needed several years ago. How do we build hardware which makes that possible? For those of you who are versed in hardware details, Moore's law is ending. This is also the end of Dennard scaling. We can no longer simply say that the next CPU is going to run at a higher frequency, and that this will power machine learning. What is happening in the industry is the explosion of custom hardware. And there is a lot of innovation, which is driving this compute which makes artificial intelligence possible. So if we look at what is happening, look in your pocket. You probably have a cell phone. Inside that cell phone, most likely there is a little chip which makes artificial intelligence possible. And it's not just one chip. There is a CPU, there is a GPU, there is a DSP, there is a neural processing unit. All of that is sitting inside a little phone and seamlessly working together to make a great user experience possible. In the data center, we see the explosion of specialized hardware also. Habana, specialized accelerators in CPUs, in GPUs, many different chips. We have TPUs. All of this is powering the tremendous growth of specialized compute in data centers. Once you have more specialized accelerators, that brings more complexity. And as we all know, hardware doesn't work by itself. It is powered by software.
And so there is also tremendous growth in software ecosystems for machine learning. In addition to TensorFlow, there are many other different frameworks which are trying to solve this problem. And actually, we've got a problem with the explosive growth of hardware and software. CHRIS LATTNER: So the big problem here is that none of this scales. Too much hardware, too much complexity, too much software, too many different systems that are not working together. And what's the fundamental problem? The fundamental problem is that we as a technology industry, across the board, are re-inventing the same kinds of tools, the same kinds of technologies, and we're not working together. And this is why you see the consequences of this. You see systems that don't interoperate because they're built by different people on different teams that solve different problems. Vendor X is working on their chip, which makes perfect sense. It doesn't really integrate with all the different software. And likewise for the software people, who can't know or work with all the hardware people. This is why you see things like: you bring up your model, you try to get it to work on a new piece of hardware, and it doesn't work right the first time. You see this in the cracks that form between these systems, and that manifests as usability problems, or performance problems, or debuggability problems. And as a user, this is not something you should have to deal with. So what do we want? What we'd really love to do is take this big problem, which has many different pieces, and make it simpler by getting people to work together. And so we've thought a lot about this. And the way we think that we can move the world forward is not by saying that there is one right way to do things. I don't think that works in a field that is growing as explosively as machine learning. Instead, what we think the right way to do this is, is to introduce building blocks. And instead of standardizing the user experience or standardizing the one right way to do machine learning, we think that we as a technology industry can standardize some of the underlying building blocks that go into these tools, that can go into the compiler for a specific chip, that can go into a translator that works between one system or the other. And if we build building blocks, we know and we can think about what we want from them. We want, of course, the best-in-class graph technology. That's a given. We want the best compiler technology. Compilers are really important. We want to solve not just training but also inference, mobile, and servers, including all permutations. So training on the edge: super important, growing in popularity. We don't want this to be a new kind of technology island solution. We want this to be part of a continuous ecosystem that spans the whole problem. And so this is what MLIR is all about. MLIR is a new system that we at Google have been building, that we are bringing to the industry to help solve some of these common problems that manifest in different ways. One of the things that we're really excited about is that MLIR is not just a Google technology. We are collaborating extensively with hardware makers across the industry. We're seeing a lot of excitement and a lot of adoption by people who are building the world's biggest and most popular hardware. But what is MLIR? MLIR is a compiler infrastructure.
And if you're not familiar with compilers, what that really means is that it provides the bottom-level, low-level technology that underpins building the individual tools and individual systems that then get used to help with graphs, and help with chips, and things like that. And so how does this work? What MLIR provides, if you look at it in contrast to other systems, is that it is not, again, a one-size-fits-none kind of solution. It is trying to be technology, technology that powers these systems. Like we said before, it of course contains state-of-the-art compiler technology. Within Google, we have dozens of years of compiler experience on the team. But we probably have hundreds of years of compiler experience across the industry, all collaborating together on this common platform. It is designed to be modular and extensible, because requirements continue to change in our field. It's not designed to tell you the right way to do things as a system integrator. It's designed to provide tools so that you can solve your problems. If you dive into the compiler, there's a whole bunch of different pieces. And so there are things like low-level graph transformation systems. There are things for code generation, so that if you're building a chip, you can handle picking the right kernel. But the point of this is that MLIR does not force you to use one common pipeline. It turns out that, while compilers for code generation are really great, so are handwritten kernels. And if you have handwritten kernels that are tuned and optimized for your application, of course they should slot into the same framework and should work with existing runtimes. And we really see MLIR as providing useful value that then can be used to solve problems. It's not trying to force everything into one box. So you may be wondering, though, for you, if you're not a compiler person or a system integrator or a chip person, what does this mean to you? So let's talk about what it means for TensorFlow. TATIANA SHPEISMAN: What it means for TensorFlow is it allows us to build a better system, because integrating TensorFlow with the myriad of specialized hardware is really a hard problem. And with MLIR, we can build a unified infrastructure layer, which will make it much simpler for TensorFlow to seamlessly work with any hardware chip which comes out. For you as a Python developer, it simply means a better development experience. A lot of things that today might not work as smoothly as we would like them to can be resolved by MLIR. This is just one example. You write a model. You try to run it through the TensorFlow Lite converter. You get an error. You have no clue what it is. And right now, we see issues on GitHub and try to help you. With MLIR, you will get an error message that says, this is the line of Python code which caused the problem. You can look at it and fix the problem yourself. And just to summarize, the reason we are building MLIR is because we want to move faster, and we want the industry to move faster with us. One of the keys to making an industry work well together is neutral governance. And that's why we submitted MLIR as a project to LLVM. Now it is part of the LLVM project. The code is moving there soon. This is very important, because LLVM has a 20-year history of neutral governance and of building infrastructure which is used by everybody in the world. And this is just the beginning. Please stay tuned. We are building a global community around MLIR.
Once we are done, ML will be better for everybody. And we will see a much faster advance of artificial intelligence in the world. ANKUR NARANG: I'm Ankur. I work at Hike. And I lead AI innovations there in various areas, which I'm going to talk about today. Formerly, I worked with IBM Research in New Delhi, and also at some research labs here in Menlo Park. Here are various use cases that we address using AI, the fundamental one being Hike as a platform for messaging. And now we are driving a new social future. We are looking at a more visual way of expressing interactions between users. So instead of typing messages in a laborious way, if one could get recommended stickers which express the same thing in a more efficient, more expressive fashion, then it would be a more interesting and engaging conversation. So the first use case is essentially around multilingual sticker recommendations, where we address around eight to nine languages currently in India. And as we expand internationally, we will be supporting more languages. So we want to go hyperlocal as well as hyperpersonal. From a hyperlocal perspective, we want to address the needs of a person from his or her own personal language perspective. When you type, you would automatically get stickers recommended in the corresponding native language of the person. The second one is friend recommendation using social network analysis and deep learning, where we use graph embeddings and deep learning to recommend friends. The next one is essentially around fraud analytics. We have lots of click farms, where people try to misuse the rewards that are given on the platform in a B2C setting. And therefore, you need interesting deep learning techniques and anomaly detection to address the known knowns, the known unknowns, and the unknown unknowns. Another one is essentially around campaign tuning, hyperpersonalization, and optimization, to be able to address the needs of every user and make the experience engaging and extremely interactive. And finally, we have interesting sticker processing using vision models and graphics, which will be coming soon in later releases. Going further, we have a strong AI research focus. So we are passionate about research. We have multiple publications in ECIR this year and an IJCAI demo. And we have an arXiv publication. And we have [INAUDIBLE] to areas not directly related to messaging. But we had an ICML workshop paper as well. Fundamentally, the kinds of problems we address need to look at extensions of, and basically address the limitations of, supervised learning. We need to address cases where there's a long tail of data, very few labels available, a limited number of labels available, and it is very costly to get those labels. And the same problems occur in NLP, vision, reinforcement learning, and so on. We are looking at meta-learning formulations to address this. At Hike, we are looking at 4 billion events per day across millions of users. We collect terabytes of data, essentially using Google Cloud with various tools on Google Cloud, including Kubeflow, BigQuery, Dataproc, and Dataflow. We use it for some of the use cases which I mentioned earlier. Essentially, I will look into one particular use case right now. It is on stickers. Stickers, as I mentioned, are powerful expressions of emotion and context, with various visual expressions. The key challenge there is discovery.
If you have tens of thousands of stickers, now going into millions and further into billions of stickers, how do you discover these stickers and exchange them in real time, with a few milliseconds of latency while you are typing, in a way that matches your personal interests? What we want to solve, essentially, is this: given a chat context with time, event of the day, situation, recent messages, gender, and language, we want to predict the sticker that's most relevant to it. Building this, one essentially needs to look at all the different ways a particular text is typed. One needs to aggregate the semantically similar phrases to have the right encoding across and between these various languages, so that it does not affect the typing experience. And we need to deliver within the limited memory of the device as well as within a few milliseconds of response time. So here in [INAUDIBLE] is the sticker recommendation flow, where basically, given a chat context and what the user is currently typing, we use a message model, which predicts using a classification model. It predicts the message, and those messages are mapped to the corresponding stickers. For prediction, essentially, we use a combination of TensorFlow on the server and TensorFlow Lite on the device. And with this combination, we want to deliver, basically, a few milliseconds of latency for getting accurate stickers recommended. And here we use a combination of a neural network and a trie. Obviously, we quantize the neural network on the device using TensorFlow Lite. And we are able to get the desired amount of performance. So once the messages are predicted, the stickers are naturally mapped based on the tags of the stickers, on what intent they are meant to deliver. And corresponding to the predicted message, those stickers are delivered to the user. This is the complete flow. Basically, given a chat context, one predicts the message that the person is trying to express. Then one adds the user context from a hyperpersonalization perspective, considering sticker preferences, age, and gender, and then goes to the relevant stickers. For the stickers, we basically score using reinforcement learning algorithms, simple ones to begin with, then more complex going forward, so that as the way people behave on the platform changes, the corresponding stickers also adapt to it in real time. Thank you. [APPLAUSE]
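To make the message-to-sticker mapping described above more concrete, here is a toy TypeScript sketch of the general idea only; the intent names, sticker tags, and scoring are entirely made up, and this is not Hike's implementation. Given classifier probabilities over message intents, it looks up stickers whose tags match the predicted intents and returns the highest-scoring ones.

```typescript
// Toy sketch: map predicted message intents to stickers via tags.
// All names and data here are hypothetical.
type IntentScore = { intent: string; prob: number };
type Sticker = { id: string; tags: string[] };

const stickers: Sticker[] = [
  { id: 'hi_wave', tags: ['greeting'] },
  { id: 'lol_cat', tags: ['laugh', 'greeting'] },
  { id: 'good_night_moon', tags: ['goodnight'] },
];

function recommendStickers(intents: IntentScore[], topK = 3): Sticker[] {
  // Score each sticker by the summed probability of the intents its tags match.
  const scored = stickers.map(sticker => {
    const score = intents
      .filter(i => sticker.tags.includes(i.intent))
      .reduce((sum, i) => sum + i.prob, 0);
    return { sticker, score };
  });
  return scored
    .filter(s => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.sticker);
}

// Example: the on-device message model predicted "greeting" with high confidence.
console.log(recommendStickers([{ intent: 'greeting', prob: 0.9 }, { intent: 'laugh', prob: 0.05 }]));
```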