KONSTANTINOS KATSIAPIS: Hello, everyone. Good morning. I'm Gus Katsiapis. And I'm a principal engineer in TFX. ANUSHA RAMESH: Hi, everyone. I'm Anusha. I'm a product manager in TFX. KONSTANTINOS KATSIAPIS: Today, we'll talk to you about our end-to-end ML platform, TensorFlow Extended, otherwise known as TFX, on behalf of the TFX team. So the discipline of software engineering has evolved over the last five-plus decades to a good level of maturity. If you think about it, this is both a blessing and a necessity because our lives usually depend on it. At the same time, the popularity of ML has been increasing rapidly over the last two-plus decades. And over the last decade or so, it's been used very actively, both in experimentation and production settings. It is no longer uncommon for ML to power widely-used applications that we use every day. So much like it was the case for software engineering, the wide use of ML technology necessitates the evolution of the discipline from ML coding to ML engineering. As most of you know, to do ML in production, you need a lot more than just a trainer. For example, the trainer code in an ML production system is usually 5% to 10% of the entirety of the code. And similarly, the amount of time that engineers spend on the trainer is often dwarfed by the amount of time engineers spend in preparing the data, ensuring it's of good quality, ensuring it's unbiased, et cetera. At the same time, research eventually makes its way into production. And ideally, one wouldn't need to change stacks in order to evolve an idea and put it into a product. So I think what is needed here is flexibility, and robustness, and a consistent system that allows you to apply ML in a product. And remember that the ML code itself is a tiny piece of the entire puzzle. ANUSHA RAMESH: Now, here is a concrete example of the difference between ML coding and ML engineering. As you can see in this use case, it took about three weeks to build a model. It's been about a year, and it's still not deployed in production. Similar stories used to be common at Google as well, but we made things noticeably easier over the past decade by building ML platforms like TFX. Now, ML platforms at Google are not a new thing. We've been building Google-scale machine learning platforms for quite a while now. Sibyl existed as a precursor to TFX. It started about 12 years ago. A lot of the design, code, and best practices that we gained through Sibyl have been incorporated into the design of TFX. Now, while TFX shares several core principles with Sibyl, it also augments it along several important dimensions. This made TFX the most widely used end-to-end ML platform at Alphabet, while being available on premises and on GCP. The vision of TFX is to provide an end-to-end ML platform for everyone. By providing this ML platform, our goal is to ensure that we can proliferate the use of ML engineering, thus improving ML-powered applications. But let's discuss what it means to be an ML platform and what the various parts are that are required to help us realize this vision. KONSTANTINOS KATSIAPIS: So today, we're going to tell you a little bit more about how we enabled global-scale ML engineering at Google, from best practices and libraries all the way to a full-fledged end-to-end ML platform. So let's start from the beginning. Machine learning is hard. Doing it well is harder. And doing it in production and powering applications is actually even harder.
We want to help others avoid the many, many pitfalls that we have encountered in the past. And to that end, we actually publish papers, blog posts, and other material that capture a lot of our learnings and our best practices. So here are but a few examples of our publications. They capture collective lessons learned over more than a decade of applied ML at Google. And several of them, like the "Rules of Machine Learning," are quite comprehensive. We won't have time to go into them today as part of this talk obviously, but we encourage you to take a look when you get a chance. ANUSHA RAMESH: While best practices are great, communication of best practices alone would not be sufficient. This does not scale because it does not get applied in code. So we want to capture our learnings and best practices in code. We want to enable our users to reuse these best practices and at the same time, give them the ability to pick and choose. To that end, we offer standard and data-parallel libraries. Now, here are a few examples of libraries that we offer for different phases of machine learning to our developers. As you can see, we offer libraries for almost every step of your ML workflow, starting from data validation to feature transformations to analyzing the quality of a model, all the way to serving it in production. We also make transfer learning easy by providing TensorFlow Hub. ML Metadata is a library for recording and retrieving metadata for ML workflows. Now, the best part about these libraries is that they are highly modular, which makes it easy to plug them into your existing ML infrastructure. KONSTANTINOS KATSIAPIS: We have found that libraries are not enough within Alphabet, and we expect the same elsewhere. Not all users need or want the full flexibility. Some of them might actually be confused by it. And many users prefer out-of-the-box solutions. So what we do is manage the release of our libraries. We ensure they're nicely packaged and optimized, but importantly, we also offer higher-level APIs. And those frequently come in the form of binaries or containers. ANUSHA RAMESH: Libraries and binaries provide a lot of flexibility to our users, but this is not sufficient for ML workflows. ML workflows typically involve inspecting and manipulating several types of artifacts. So we provide components which interact with well-defined and strongly-typed artifact APIs. The components also understand the context and environment in which they operate and can be interconnected with one another. We also provide UI components for visualization of these artifacts. That brings us to a new functionality we're launching at TensorFlow World. You can run any TFX component in a notebook. As you can see here, you can run TFX components cell by cell. This example showcases a couple of components. The first one is ExampleGen. ExampleGen ingests data into a TFX pipeline. And this is typically the first component that you use. The second one is StatisticsGen, which computes statistics for visualization and example validation. So when you run a component like StatisticsGen in a notebook, you can visualize something like this, which showcases statistics on your data and helps you detect anomalies. The benefit of running TFX components in a notebook is twofold. First, it makes it easy for users to onboard onto TFX. It helps you understand the various components of TFX, how you connect them, and the order in which to run them.
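As a minimal sketch of what running these components cell by cell looks like, assuming the `tfx` package is installed (exact import paths and constructor arguments have shifted between TFX releases, and the CSV path here is a placeholder):

```python
# Run TFX components interactively in a notebook (a sketch; API details
# vary across TFX releases).
from tfx.components import CsvExampleGen, StatisticsGen
from tfx.orchestration.experimental.interactive.interactive_context import (
    InteractiveContext,
)

context = InteractiveContext()  # keeps artifacts and metadata in a temp dir

# ExampleGen ingests data into the pipeline; typically the first component.
example_gen = CsvExampleGen(input_base="/path/to/csv_data")  # placeholder path
context.run(example_gen)

# StatisticsGen computes statistics for visualization and example validation.
statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])
context.run(statistics_gen)

# Render the statistics inline in the notebook.
context.show(statistics_gen.outputs["statistics"])
```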
It also helps with debugging the various steps of your ML workflow as you go through the notebook. KONSTANTINOS KATSIAPIS: Through our experience though, we've learned that components aren't actually sufficient for production ML. Manually orchestrating components can become cumbersome and, importantly, error-prone. And then also understanding the lineage of all the artifacts that are produced or consumed by those components is often fundamental, both from a debugging perspective, but many times from a compliance perspective as well. As such, we offer ways of creating task-driven pipelines of components. We allow you to stitch components together in a task-driven fashion. But we have also found that data scale and advanced use cases also necessitate that this pipeline actually be reactive to the environment, right? So we found that, over time, we need something more like data-driven pipelines. Now, the interesting part here is that the components we offer are the same components that can operate both in a task-driven mode and in a data-driven mode, thereby enabling more flexibility. And the most important part is that the artifact lineage is tracked throughout this ML pipeline, whether it's task- or data-driven, which helps experimentation, debugging, and compliance. So here's putting it all together. Here is kind of a canonical production end-to-end ML pipeline. It starts with example generation and statistics generation to ensure the data is of good quality, proceeds with transformations to augment the data in ways that make it easier to fit the model, and then trains the model. After we train the model, we ensure that it's of good quality. And only after we're sure it meets the quality bar that we're comfortable with do we actually push to one of the serving systems of choice, whether that's a server or a mobile application via TF Lite. Note that the pipeline topology here is fully customizable, right? So you can actually move things around as you please. And importantly, if one of the out-of-the-box components we offer doesn't work for you, you can create a custom component with custom business logic. And all of this is under a single ML pipeline. Now, what does it mean to be an end-to-end ML platform? So I think there are some key properties to it. And one is seamless integration. We want to make sure that all the components within the pipeline actually seamlessly interoperate with each other. And we have actually found that within Google, the value added for our users gets larger as they move higher up the stack-- you know, as they move from the libraries up to components and further up into the pipeline itself. This is because operating at a higher level of abstraction allows us to give better robustness and supportability. Another important aspect of an ML platform is its interoperability with the environment it operates in. So each of these deployments might be in a different environment-- you know, some on premises, some on GCP, et cetera. And we need to make sure that we interact with the ecosystem that you operate in. So TFX actually works with other fundamental parts of the ML ecosystem, like Kubeflow Pipelines, Apache Beam, Apache Spark, Flink, Airflow, et cetera. This interoperability also gives us something else that's very important here-- the flexibility, right?
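The canonical pipeline just described might be wired up roughly like the sketch below. This is not a complete, runnable recipe: constructor arguments differ across TFX versions, and the module files, paths, and pipeline name are placeholders.

```python
# Sketch of a canonical TFX pipeline (hedged: argument names vary by version).
from tfx.components import (
    CsvExampleGen, StatisticsGen, SchemaGen, Transform,
    Trainer, Evaluator, Pusher,
)
from tfx.orchestration import pipeline
from tfx.proto import pusher_pb2, trainer_pb2

example_gen = CsvExampleGen(input_base="/path/to/data")
statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])
schema_gen = SchemaGen(statistics=statistics_gen.outputs["statistics"])
transform = Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file="preprocessing.py",          # user-defined feature engineering
)
trainer = Trainer(
    module_file="trainer.py",                # user-defined model code
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
    schema=schema_gen.outputs["schema"],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=1000),
)
evaluator = Evaluator(
    examples=example_gen.outputs["examples"],
    model=trainer.outputs["model"],
)
pusher = Pusher(
    model=trainer.outputs["model"],
    model_blessing=evaluator.outputs["blessing"],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory="/serving/models/my_model")),
)

my_pipeline = pipeline.Pipeline(
    pipeline_name="canonical_pipeline",
    pipeline_root="/path/to/pipeline_root",
    components=[example_gen, statistics_gen, schema_gen, transform,
                trainer, evaluator, pusher],
)
```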
So we allow customization of components, and extension points within the ML platform, so that if something doesn't work out of the box for you, you can customize it to your business needs. TFX is by no means a perfect platform, but we strive to collect feedback and improve it, so please give it to us. ANUSHA RAMESH: Internally, the TFX platform powers several Alphabet companies. Within Google, it powers several of our most important products that you're probably familiar with. TFX also integrates with the Cloud AI Platform, ML Engine, and Dataflow products, thus helping you realize your ML needs robustly on GCP. TFX also powers several of the Cloud AutoML solutions that automate and simplify ML for you, so check them out. To the external world, TFX is available as an end-to-end solution. Our friends at Twitter, who spoke at the keynote yesterday, have already published a fascinating blog post on how they are ranking tweets on their home timeline using TensorFlow. They are using TensorFlow Model Analysis and TensorFlow Hub for sharing word embeddings. They evaluated several other technologies and frameworks and decided to go ahead with the TensorFlow ecosystem for their production requirements. Similar to Twitter, we also have several other partners who are using TFX. I hope you will join us right after this talk to hear from Spotify on how they are using TFX for their production workflow needs. We also have another detailed talk later today called "TFX, Production ML Pipelines with TensorFlow." So we have two great talks-- one by Spotify, the other a detailed talk on TFX. If you're interested in learning more, check out these two talks. Visit our web page tensorflow.org/tfx to get started. Thank you. [APPLAUSE] TONY JEBARA: Very excited to be here. So my name is Tony Jebara. Today, I'm going to be talking to you about Spotify, where I work today, and how we've basically taken personalization and moved it onto TensorFlow. I'm the VP of engineering and also the head of machine learning. And I'm going to describe our experience moving onto TensorFlow and to the Google Cloud Platform and Kubeflow, which has been really an amazing experience for us and really has opened up a whole new world of possibilities. So just a quick note, as Ben was saying, before I started at Spotify, I was at Netflix. And just like today I'm going to talk about Spotify's home page, at Netflix I was working on personalization algorithms and the home screen as well. So you may be thinking, oh, that sounds like a similar job. They both have entertainment, and streaming, and home screens, and personalization, but there are fundamental differences. And I learned about those fundamental differences recently. I joined a couple of months ago, but the biggest fundamental difference to me is a difference in volume and scale. And I'll show you what I mean in just a second. So if you look at movies versus music or TV shows versus podcasts, you'll see that there's a very different magnitude of scale. So on the movie side, there are about 158 million Netflix users. On the music side, there are about 230 million Spotify users. That's already a different scale. Also, the content really is a massively different scale problem. There are only about 5,000 movies and TV shows on the Netflix service. Whereas on Spotify, we've got about 50 million tracks and almost half a million podcasts.
So if you think about the amount of data and content you need to index, that's a huge scale difference. There's also content duration. Once you make a recommendation off the home screen on, let's say, Netflix, the user is going to consume that recommendation for 30 minutes for a TV show, maybe several seasons sometimes, two hours for a movie. It's only 3 and 1/2 minutes of consumption per track, let's say, on Spotify. And they don't replay as often on, let's say, movies, but you'll replay songs very often. So it's really a very different world of speed and scale. And we're getting a lot more granular data about the users. Every 3 and 1/2 minutes, they're changing tracks, listening to something else, engaging differently with the service, and they're touching 50 million-plus pieces of content. That's really very granular data. And that's one of the reasons why we had to move to something like TensorFlow to really be able to scale and do something that's high speed and, in fact, real time. So this is our Spotify home. How many people here use Spotify? All right, so about half of you. I'm not trying to sell Spotify to anyone. I'm just trying to say that many of you are familiar with this screen. This is the home page. So this is basically driven by machine learning. And every month, hundreds of millions of users will see this home screen. And every day, tens of millions of users will see this home screen. And this is where you get to explore what we have to offer. It's a two-dimensional grid. Every image here is what we call a card. And the cards are organized into rows we call shelves. And what we like to do is move these cards and shelves around from a massive library of possible choices and place the best ones for you at the top of your screen. And so when we open up Spotify, we have a user profile. The home algorithms will score all possible cards and all possible shelves and pack your screen with the best possible combination of cards and shelves for you. And we're doing this in real time based off of your choices of music, your willingness to accept the recommendation, how long you play different tracks, how long you listen to different podcasts. And we have dozens and dozens of features that are updating in real time. And every time you go back to the home page, it'll be refreshed with the ideal cards and shelves for you. And so we like to say there isn't a Spotify home page or a Spotify experience. Really, there are 230 million Spotifys-- one for each user. So how do we do this and how did we do this in the past? Well, up until our migration to GCP, TensorFlow, and Kubeflow, we wrote a lot of custom libraries and APIs in order to drive the machine learning algorithms behind this personalization effort. So the specific machine learning algorithm is a multi-armed bandit. Many of you have heard about that. It's trying to balance exploration and exploitation, trying to learn which cards and shelves are good for you and score them, but also trying out some new cards and shelves that it might not know are hidden gems for you or not. And we have to employ counterfactual training, and log propensities, and log some small amount of randomization in order to train these systems and avoid large-scale A/B tests and large-scale randomization. Before we moved to TensorFlow, this was all done in custom, let's say, APIs and data libraries. And that had a lot of challenges. So we'd always have to go back and rewrite code.
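As a toy illustration of the bandit idea described here (emphatically not Spotify's production system), the sketch below scores candidate cards with epsilon-greedy exploration and logs the propensity of each choice, which is what later counterfactual, off-policy analysis relies on. All names and numbers are made up.

```python
# Toy epsilon-greedy bandit over "cards", logging propensities for
# counterfactual (off-policy) evaluation. Purely illustrative.
import random

def choose_card(scores, epsilon=0.05):
    """scores: dict card_id -> model score. Returns (card_id, propensity)."""
    n = len(scores)
    best = max(scores, key=scores.get)
    if random.random() < epsilon:
        card = random.choice(list(scores))   # explore a random card
    else:
        card = best                          # exploit the top-scored card
    # Probability this policy picks `card`; logged with the impression so a
    # different ranking model can later be evaluated via importance weighting.
    propensity = (1 - epsilon) * (card == best) + epsilon / n
    return card, propensity

impression_log = []
scores = {"card_a": 0.7, "card_b": 0.4, "card_c": 0.2}  # made-up scores
card, prop = choose_card(scores)
impression_log.append({"card": card, "propensity": prop})
```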
And if we wanted to compare different choices of the model underneath the multi-armed bandit, like logistic regression versus trees versus deep neural nets, that involved tons of custom code rewriting. And so that would make the system really brittle, hard to innovate and iterate on. And then when you finally pick something you want to roll out, when you roll it out, you're also worried that it may fail because of all this custom stitching. So then we moved over to the TensorFlow ecosystem. And we said, hey, let's move on to techniques like TensorFlow Estimators and TensorFlow Data Validation to avoid having to do all this custom work. And so with TensorFlow Estimators, what we can do is now build machine learning pipelines where we get to try a variety of models and train and evaluate them very quickly-- things like logistic regression, boosted trees, and deep models-- in a much faster kind of iterative process. And then also migrating to Kubeflow was super valuable because that helped us manage the workload and accelerate the pace of experimentation and rollout. And so this has been super fast for automatically retraining, and scaling, and speeding up our machine learning training algorithms. Another thing that we really rely on heavily is TensorFlow Data Validation, which is another part of the TFX offering. One key thing we have to do is find bugs in our data pipelines and our machine learning pipelines while we're developing them, and evaluating them, and rolling them out. For example, we want to catch data issues as quickly as possible. And so one thing we can do with TFDV is quickly find out if there's some missing data or data inconsistencies in our pipelines. And we have this dashboard that quickly plots the distribution of any feature, and the counts of different data sets, and so on, and also kind of more granular things like how much the user is spending on the service, what their preferences are, and so on, looking at those distributions. And we caught a bug like this one on the left, which basically was showing us that in our training data, the Premium tier samples were missing from our training pipelines. And then on the validation side, the free shuffle tier samples were missing from our evaluation pipeline. So this is horrible from a machine learning perspective, but we caught it quickly. We're now able to trigger alarms and alerts, have dashboards, and look at these distributions daily, so the machine learning engineers don't have to worry about the data pipelines into their system. So now we have the Spotify paved path, which is a machine learning infrastructure based off of Google Cloud, Kubeflow, and TensorFlow. And it has achieved significant lifts over baseline systems and popularity-based methods. And now, we're just scratching the surface. We want to do many more sophisticated machine learning types of explorations. And we really view this as an investment. It's an investment in machine learning engineers and their productivity. We don't want machine learning engineers to spend tons of time fixing custom infrastructure, and catching kind of silly bugs, and updating libraries, and having to learn bespoke types of platforms. Instead, we want to have them go onto a great kind of lingua franca platform like GCP, Kubeflow, and TensorFlow and really think about machine learning, and the user experience, and building better entertainment for the world.
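A data check of the kind described here might look roughly like the following TensorFlow Data Validation sketch; the file locations are placeholders, and the exact workflow (schema inference, slicing, skew checks) would be tailored to the actual pipelines.

```python
# Sketch of a TFDV check: compare training vs. evaluation data and surface
# anomalies such as missing slices or skewed distributions. Paths are placeholders.
import tensorflow_data_validation as tfdv

train_stats = tfdv.generate_statistics_from_tfrecord(
    data_location="gs://bucket/train/*.tfrecord")
eval_stats = tfdv.generate_statistics_from_tfrecord(
    data_location="gs://bucket/eval/*.tfrecord")

# Infer a schema from the training data, then look for anomalies in the
# evaluation data (missing features, unexpected values, etc.).
schema = tfdv.infer_schema(train_stats)
anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)

tfdv.display_anomalies(anomalies)           # renders a table in a notebook
tfdv.visualize_statistics(                  # side-by-side distribution plots
    lhs_statistics=train_stats, rhs_statistics=eval_stats,
    lhs_name="TRAIN", rhs_name="EVAL")
```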
And that's what we want to enable, not necessarily building custom, let's say, machine learning infrastructure. And so if you're excited about working on a great platform that's got a great future ahead of it, like TFX, and Google Cloud, and Kubeflow, but also working on really deep problems around entertainment and what makes people excited and engaged with a service, and music, and audio, and podcasts, then you can get the best of both worlds. We're hiring. Please look at these links and come work with us. Thank you so much. [APPLAUSE] MIKE LIANG: Good morning, everyone. My name is Mike. I'm one of the product managers on the TensorFlow team. And today, I'd like to share with you something about TensorFlow Hub. So we've seen some amazing breakthroughs in what machine learning can do over the past few years. And throughout this conference, you've heard a lot about the services and tools that have been built on top of them. Machines are becoming capable of doing a myriad of amazing things, from vision to speech to natural language processing. And with TensorFlow, machine learning experts and data scientists are able to combine data, and algorithms, and computational power together to train machine learning models that are very proficient at a variety of tasks. But if your focus is to solve business problems or build new applications, how can you quickly use machine learning in your solutions? Well, this is where TensorFlow Hub comes in. TensorFlow Hub is a repository of pretrained, ready-to-use models to help you solve novel business problems. It has a comprehensive collection of models from across the TensorFlow ecosystem. And you can find state-of-the-art research models here in TensorFlow Hub. Many of the models here can also be composed into new models and retrained using transfer learning. And recently, we've added a lot of new models that you can deploy straight to production, from the cloud to the edge, through TensorFlow Lite or TensorFlow.js. And we're getting many contributions from the community as well. TensorFlow Hub's rich repository of models covers a wide range of machine learning problems. For example, for image-related tasks, we have a variety of models for object detection, image classification, automatic image augmentation, and some new things like image generation and style transfer. For text-related tasks, we have some of the state-of-the-art models out there, like BERT and ALBERT, and universal sentence encoders. And you heard about some of the things that machines can do with BERT just yesterday. These encoders can support a wide range of natural language understanding tasks, such as question answering, text classification, or sentiment analysis. And there are also video-related models too. So if you want to do gesture recognition, you can use some of the models there, or even do video generation. And we've recently actually just completely upgraded our front-end interface so that it's a lot easier to use. So many of these models can be easily found or searched for on TensorFlow Hub. We've invested a lot of energy in making these models in TensorFlow Hub easily reusable or composable into new models, where you can actually bring your own data and, through transfer learning, improve the power of those models. With one line of code, you can bring these models right into TensorFlow 2. And using the high-level Keras APIs or the low-level APIs, you can actually go and retrain these models.
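For example, the "one line of code" transfer-learning flow might look roughly like this sketch, which wraps a pretrained text embedding from tfhub.dev in a small Keras classifier. The module handle is just one example, and the training data is assumed to be your own.

```python
# Sketch: reuse a pretrained TF Hub text embedding inside a Keras model.
import tensorflow as tf
import tensorflow_hub as hub

embedding = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2",   # example text embedding
    input_shape=[], dtype=tf.string, trainable=True)

model = tf.keras.Sequential([
    embedding,                                        # pretrained embedding layer
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g. positive/negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # retrain on your own data
```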
And all these models can also be deployed straight into machine learning pipelines, like TFX, as you've heard about earlier today. Recently, we've added support for models that are ready to deploy. These pretrained models have been prepared for a wide range of environments across the TensorFlow ecosystem. So if you want to work in a web or a Node-based environment, you can deploy them into TensorFlow.js, or if you are working with mobile [INAUDIBLE] devices, you can employ some of these models through TensorFlow Lite. In TensorFlow Hub, you can also discover ready-to-use models for Coral edge TPU devices. And we recently started adding these. These devices combine TensorFlow Lite models with really efficient accelerators. That allows companies to create products that can run inference right on the edge. And you can learn more about that at coral.ai. So here's an example of how you can use TensorFlow Hub to do fast, artistic style transfer that can work with an arbitrary painting style, using generative models. So let's say you had an image of a beautiful yellow Labrador, and you wanted to see what it would look like in the style of Kandinsky. Well, with one line of code, you can load one of these pretrained style transfer models from the Magenta team at Google, and then you can just apply it to your content and style images and you can get a new stylized image. And you can find simple tutorials like that at the link below. Or let's say you wanted to train a new text classifier, such as predicting whether a movie review had a positive or negative rating. Well, training a text embedding layer may take a lot of time and data to make that work well, but with TensorFlow Hub, you can pull in a number of pretrained text models with just one line of code. And then you can incorporate it into TensorFlow 2. And using standard APIs like Keras, you can retrain it on your new data set just like that. We've also integrated an interactive model visualizer, in beta, for some of the models. And this allows you to immediately preview what the model would do and run that model within the web page or on a mobile app, like a Playground app. For example, here is a model from the Danish Mycological Society for identifying a wide range of fungi as part of the Svampeatlas project. You can directly drag an image onto the site and the model will run in real time and show you the results, such as what mushrooms were in that image. And then you can click on it to go and get more information. Many of the TensorFlow Hub models also have Colab links, so you can play with these models and the code right inside the browser, powered by Google infrastructure with Colab. In fact, the Google machine learning fairness team has also built some Colab notebooks that can pull text embeddings and other embeddings straight into their platform so that you can assess whether there are potential biases for a standard set of tasks. And you can come by our demo booth if you want to learn more about that. TensorFlow Hub is also powered by the community. When we launched TensorFlow Hub last year, we were sharing some of the state-of-the-art models from DeepMind and Google. But now, a wide range of publishers are beginning to share their models from a diverse set of areas, such as Microsoft AI for Earth, the Met, or NVIDIA. And these models can be used for many different tasks, such as studying wildlife populations through camera traps or automatic visual defect detection in industry.
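The Labrador-to-Kandinsky example mentioned above could be sketched roughly as follows, using the Magenta arbitrary-image-stylization model on TF Hub; the image file names are placeholders and the loading code is simplified.

```python
# Sketch of arbitrary style transfer with a TF Hub model. Inputs are
# float32 image batches in [0, 1]; file names are placeholders.
import tensorflow as tf
import tensorflow_hub as hub

def load_image(path, max_dim=512):
    img = tf.io.decode_image(tf.io.read_file(path), channels=3, dtype=tf.float32)
    img = tf.image.resize(img, (max_dim, max_dim), preserve_aspect_ratio=True)
    return img[tf.newaxis, ...]                       # add batch dimension

content = load_image("labrador.jpg")
style = load_image("kandinsky.jpg")

stylize = hub.load(
    "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")
stylized = stylize(tf.constant(content), tf.constant(style))[0]
tf.keras.preprocessing.image.save_img("stylized.png", stylized[0].numpy())
```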
And Crowdsource by Google is also generating a wide range of data through the Open Images Extended data sets. And with that, we can get an even richer set of ready-to-use models across many different specific data sets. So with hundreds of models that are pretrained and ready to use, you can use TensorFlow Hub to immediately begin using machine learning to solve some business problems. So I hope that you can come by our demo booth or go to tfhub.dev. And I'll see you there. Thank you. [APPLAUSE] UJVAL KAPASI: So the TensorFlow team with TF 2 has solved a hard problem, which is to make it easy for you to express your ideas and debug them in TensorFlow. This is a big step, but there are additional challenges in order for you to obtain the best results for your research or your product designs. And I'd like to talk about how NVIDIA is solving three of these challenges. The first is simple acceleration. The second is scaling to large clusters. And finally, providing code for every step of the deep learning workflow. One of the ingredients of the recent success of deep learning has been the use of GPUs for providing the necessary raw compute horsepower. This compute is like oxygen for new ideas and applications in the field of AI. So we designed and shipped Tensor Cores in our Volta and Turing GPUs in order to provide an order of magnitude more compute capability than was previously available. And we built libraries, such as cuDNN, to ensure that all the important math functions inside of TF can run on top of Tensor Cores. And we update these regularly as new algorithms are invented. We worked with Google to provide a simple API so that, from your TensorFlow script, you can easily activate the routines in these libraries and train with mixed precision on top of Tensor Cores and get speed-ups for your training-- for instance, 2x to 3x faster in the examples here-- which helps you iterate faster on your research and also, within a fixed budget of time, maybe get better results. Once you have a trained model, we provide a simple API inside of TensorFlow to activate TensorRT so you can get drastically lower latency for serving your predictions, which lets you deploy perhaps more sophisticated models or pipelines than you would be able to otherwise. But optimizing the performance of a single GPU is not enough. And let me give you an example. So Google, last year, released a model called BERT. As Jeff Dean explained yesterday, this model blew away the accuracy on a variety of language tasks compared to any approach or model previous to it. But on a single GPU, it takes months to train. Even on a server with eight GPUs, it takes more than a week. But if you can train with 32 servers, or 256 GPUs, training can complete with TensorFlow in mere hours. However, training at these large scales introduces several new challenges at every level of the system. If you don't properly codesign the hardware and software and precisely tune them, then as you add more compute, you will not get a commensurate increase in performance. And I think NVIDIA is uniquely suited to solve some of these challenges because we're building hardware from the level of the GPU to servers to supercomputers, and we're working on challenges at every level-- hardware design, software design, system design, and the boundaries of these. You know, the combination of a bunch of our work on this is the DGX SuperPOD.
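A minimal sketch of the mixed-precision API mentioned above, for a Keras/TensorFlow 2 script (the policy API moved from tf.keras.mixed_precision.experimental to tf.keras.mixed_precision between releases, so exact names depend on your version):

```python
# Sketch: enable mixed-precision training on Tensor Core GPUs with Keras.
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy("mixed_float16")   # fp16 compute, fp32 variables

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
    # Keep the final softmax in float32 for numerical stability.
    tf.keras.layers.Activation("softmax", dtype="float32"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),   # Keras adds loss scaling under this policy
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])
```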
And to put its capabilities sort of in visceral terms, a team at NVIDIA recently was able to on the DGX SuperPOD, as part of Project Megatron, train the largest language model ever, more than 8 billion parameters, 24 times larger than BERT. Another contribution that NVIDIA is making and what we're working on is providing reliable code that anyone from individuals to enterprises can build on top of. NVIDIA is doing the hard work of optimizing, documenting, qualifying, packaging, publishing, maintaining code for a variety of models and use cases for every step of the deep learning workflow from research to production. And we're curating this code and making it available to everyone, both at ngc.nvidia.com, but also other places where developers might frequent, such as GitHub and TF Hub, which you just heard about as well. So I hope that in the short time, I was able to convey some of the problems that NVIDIA is working on, the challenges we're working on, and how we're making available to the TensorFlow community, along with Google, simple APIs for acceleration, solving scaling challenges, putting out DGX SuperPODs, building DGX SuperPODs, and curating code that anyone can build on top of for the entire deep learning workflow. Thank you for your time. I hope you enjoy the rest of the conference. ANNA ROTH: So the world is full of experts, like pathologists who can diagnose diseases, construction workers who know that if a certain tube is more than 40% obstructed, you have to turn that machine off like right now, people who work in support and know how to, like, kind of triage tickets. And one of the exciting things about kind of the past few years is that it's become increasingly easy for people who want to take some thing that they know how to do and teach it to a machine. I think the big dream is that anybody could be able to go and do that. It's what I spent my time on in the past few years. I've worked on the team that launched Cognitive Services. And I spent the past few years working on customvision.ai. It's a tool for building image classifiers and object detectors. But it really has never been easier to build machine learning models, like the tooling is really good. We're all here at TensorFlow World. Computational techniques have gotten faster, transfer learning easier to use. You have access to compute in the cloud. And then educational materials have, like, never been better, right? One of my hobbies is to go and, like, browse the fast.ai forums just to see what learners are building. And it's completely inspiring. That being said, it's actually still really hard to build a machine learning model. In particular, it's hard to build robust production-ready models. So I've worked with hundreds-- actually, by this point, thousands of customers, who are trying to automate some particular task. And a lot of these projects fail. You know, it's really easy to build your first model. And sometimes, it's actually kind of a trick, right? Like, you can get something astonishingly good in a couple of minutes. You get some data off the web, like model.fit, and like a few minutes later, I have a model that does something and it's kind of uncanny. But getting that to be robust enough to use kind of in a real environment is actually really tough. So the first problem people run into, it's actually hard to transfer your knowledge to a machine. So like this might seem trite, but when people first train object detectors, actually a lot of people don't put bounding boxes around every single object. 
Like, the model doesn't work. Or they get stuck on the kind of parsimoniousness. So for example, I had one guy in Seattle. People like the Seahawks. He wanted to train a Seahawks detector. He puts bounding boxes around a bunch of football players and discovers that he's actually built a football player detector, as opposed to a Seahawks detector. He's really upset when he uploads an image from another team, because the model didn't have the semantic knowledge that the user had. And so, like, you know, this is stuff you can document away, right? Like, you can kind of learn this in your first hour or so, but it speaks to the unnaturalness of the way in which we train models today. Like when you teach something to a computer, you're having to kind of give it data that represents in some way a distribution. That's not how you and I would normally teach something. And it really kind of trips people up a lot. But sure, so you grok that. You figure it out. You figure out, all right, the problem is building a data set. That's really hard to do too. And so I want to walk through one kind of hypothetical case. So I get a customer. And what they really wanted to do was recognize when people would upload something to their online photo store that might be, like, personally-identifiable information. So for example, if you'd uploaded a photo of a credit card or a photo of your passport. So to start this off, they scrape some web data, right? You just, like, go. You use kind of like a search API and you get a bunch of images of credit cards off the web. You do evaluations. All right, it looks like we're going to have maybe a 1% false positive rate. Well, that's not good. I've got a million user images I want to run this on. Suddenly, I have 10,000 sort of potential false positives. So then they build the model. Let's see how it goes. And when they try it out on real user data, it turns out that the actual false positive rate, as you might expect, is much, much, much higher. All right, so now, the user has to take another round. So now, let's add some negative classes, right? We want to be able to kind of make examples of other kinds of documents, sort of non-credit card things, et cetera, et cetera. But it's still OK, right? We're on day one or day two of the project, like this still feels good. You know, we're able to kind of make progress. It's a little more tedious. Second round-- I think you guys kind of know where this is going. It doesn't work. Still, an unacceptably high number of false positives are coming up-- way too many. So now, we kind of go into stage three of the experience of trying to build a usable model, which is, all right, let's collect some more data and let's go kind of label some more data. It starts to get really expensive, right? Now, something that I thought was going to take me a day in the first round, I'm on like day seven of getting a bunch of labelers, trying to get MTurk to work, and labeling kind of very large amounts of data. It turns out the model still doesn't work. So the good news was at this point, somebody said, all right, well, let's try one of these kind of interpretability techniques, [INAUDIBLE] saliency visualization. And it turns out, the problem was thumbs.
So when you are using kind of-- when people take photos on their phone of something like a document, they're usually holding it, which is not what you see in web-scraped images, for example, but it's kind of what you tend to do. So it turned out that they had basically built a classifier that recognized: are you holding something and is your thumb in the picture? Well, that was not the goal, but OK. But this isn't just kind of a one-off problem. It happens all the time. So for example, there's that really famous Nature paper from 2017 where they were doing, like, dermatology images. And they kind of discover, all right, well, having a ruler in an image of a mole is actually a very good signal that that might be cancerous. You might think we learned from that. Except just a couple weeks ago, I think, Walker, et al published another paper where they said having surgical markings in an image, so having marked-up things around a mole, also tended to trip up the classifier because, not surprisingly, the training data didn't have any marked-up skin for people that didn't have cancerous moles. And a lot of people, I think, particularly these people who are sometimes on our team, look at that and say it's user error, it's human error. They weren't building the right distribution of data. That's, like, extremely hard to do, even for experts. And even harder to do for somebody who's just getting started. Because in reality, real-world environments are incredibly complex. This is where projects die. Out-of-domain problems-- which come up in most real-world environments people actually want to work in, whether it's a camera, a microphone, or a website, where user inputs are unconstrained-- are incredibly challenging to build good data for. One of my favorite examples: I had a customer who had built a system, [INAUDIBLE] camera, an IoT camera. And one day it hails. And it turns out, it just hadn't hailed in this town before. Model fails. You can't expect people to have had data for hail. Luckily, they had a system of multiple sensors, they had other kinds of validation, a human in the loop. It all worked out. But this is really challenging to handle: rare events. If I want to recognize explosions, how much data am I going to have from explosions? Or we had a customer who was doing hand tracking. It turned out, the model failed the first time somebody with a hand tattoo used it. There aren't that many people with hand tattoos. But you still want your model to work in that case. Look, there's a lot of techniques for being able to do this better. But I think it's worth recognizing that it's actually really hard to build a model, and that's an important problem. Once you build a model, you've got to figure out if it's going to work. A lot of the great work here is happening in the fairness and bias literature. But there is an overall impact for any customer or any person who's trying to build a high quality model. One of the big problems is that aggregate statistics hide failure conditions. You might make this beautiful PR curve. Even the slices that you have look really great. And then it turns out that you don't actually have a data set with all the features in your model. So let's say you're doing speech, you may not have actually created a data set that says, OK, well, this is a woman, a woman with an accent, or a child with an accent. All these subclasses become extremely important. And it becomes very expensive and difficult to actually go and figure out where your model is failing.
There are a lot of techniques for this: sampling techniques, pairing uninterpretable models with interpretable models, things that you can do. But it's super challenging for a beginner to figure out what their problems might be, and even for experts. You see these problems come up in real-world systems all the time. Finally, when you have a model, it can be tough to actually figure out what to do with it. Most of the programs that you use don't have probabilistic outputs in the real world. What does it mean for something to be 70% likely, or to have seven or eight trained models in a row? It might be more obvious for you. But for an end user, it can actually be hard to figure out what actions you should take. Look, nothing I've said today, I think, is particularly novel for the folks in this room. You've gone through all of these challenges before. You've built a model, you've built a data set, you've probably built it 18 times, finally gotten it to work. I had a boss who used to say that problems are inspiring. And for me, there isn't a problem more inspiring than figuring out how we can help anybody who wants to automate some task be able to do so, and be able to train a machine and have a robust, production-ready model. I can't think of a more fun problem. I can't think of a more fun problem to work on with everybody in this room. Thanks. [APPLAUSE] SARAH SIRAJUDDIN: Welcome, everyone. I'm Sarah. I'm the engineering lead for TensorFlow Lite. And I'm really happy to be here talking to you about on-device machine learning. JARED DUKE: And I'm Jared, tech lead on TensorFlow Lite. And I'm reasonably excited to share with you our progress and all the latest updates. SARAH SIRAJUDDIN: So first of all, what is TensorFlow Lite? TensorFlow Lite is our production-ready framework for deploying machine learning on mobile and embedded devices. It is cross-platform, so it can be used for deployment on Android, iOS, and Linux-based embedded systems, as well as several other platforms. Let's talk about the need for TensorFlow Lite and why we built an on-device machine learning solution. Simply put, there is now a huge demand for doing machine learning on the edge. And it is driven by the need to build user experiences which require low latency. Further factors are poor network connectivity and the need for user privacy-preserving features. All of these are easier done when you're doing machine learning directly on the device. And that's why we released TensorFlow Lite late in 2017. This shows our journey since then. We've made a ton of improvements across the board in terms of the ops that we support, performance, usability, tools which allow you to optimize your models, the number of languages we support in our API, as well as the number of platforms TensorFlow Lite runs on. TensorFlow Lite is deployed on more than three billion devices globally. Many of Google's own largest apps are using it, as are apps from several other external companies. This is a sampling of apps which use TensorFlow Lite: Google Photos, Gboard, YouTube, Assistant, as well as leading companies like Hike, Uber, and more. So what is TensorFlow Lite being used for? We find that our developers use it for popular use cases around text, image, and speech. But we are also seeing lots of emerging and new use cases come up in the areas of audio and content generation. This was a quick introduction to TensorFlow Lite. In the rest of this talk, we are going to focus on sharing our latest updates and the highlights.
For more details, please check out the TensorFlow Lite talk later in the day. Today I'm really excited to announce a suite of tools which will make it really easy for developers to get started with TensorFlow Lite. First up, we're introducing a new support library. This makes it really easy to preprocess and transform your data to make it ready for inferencing with a machine learning model. So let's look at an example. These are the steps that a developer typically goes through to use a model in their app once they have converted it to the TensorFlow Lite model format. Let's say they're doing image classification. So then they will likely need to write code which looks something like this. As you can see, it is a lot of code for loading, transforming, and using the data. With the new support library, the previous wall of code that I showed can be reduced significantly to this. Just a single line of code is needed for each of loading, transforming, and using the resultant classifications. Next up, we're introducing model metadata. Now model authors can provide a metadata spec when they are creating and converting models. And this makes it easier for users of the model to understand what the model does and to use it in production. Let's look at an example again. The metadata descriptor here provides additional information about what the model does, the expected format of the inputs, and the meaning of the outputs. Third, we've made our model repository much richer. We've added several new models across several different domains. All of them are pre-converted into the TensorFlow Lite model format, so you can download them and use them right away. Having a repository of ready-to-use models is great for getting started and trying them out. However, most of our developers will need to customize these models in some way, which is why we are releasing a set of APIs with which you can use your own data to retrain these models and then use them in your app. We've heard from our developers that we need to provide better and more tutorials and examples. So today we're releasing several full examples which show not only how to use a model but how you would write an end-to-end app. And these examples have been written for several platforms: Android, iOS, Raspberry Pi, and even Edge TPU. And lastly, I'm super happy to announce that we have just launched a brand new course on how to use TensorFlow Lite on Udacity. All of these are live right now. Please check them out and give us feedback. And this brings me to another announcement that I'm very excited about. We have worked with the researchers at Google Brain to bring MobileBERT to developers through TensorFlow Lite. BERT is a method of pre-training language representations, which gets really fantastic results on a wide variety of natural language processing tasks. Google itself uses BERT extensively to understand natural text on the web. But it is having a transformational impact broadly across the industry. The model that we are releasing is up to 4.4 times faster than standard BERT, while being four times smaller with no loss in accuracy. The model is less than 100 megabytes in size. So it's usable even on lower-end phones. It's available on our site, ready for use right now. We're really excited about the new use cases this model will unlock. And to show you all how cool this technology really is, we have a demo coming up of MobileBERT running live on a phone. I'll invite Jared to show you. JARED DUKE: Thanks, Sarah.
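As background to the workflow Sarah described (converting a model to the TensorFlow Lite format and then running it), here is a bare-bones Python sketch; the tiny Keras model is a stand-in, and on-device apps would use the Java, Swift, or C++ APIs with the support library instead.

```python
# Sketch: convert a Keras model to .tflite and run it with the Python interpreter.
import numpy as np
import tensorflow as tf

keras_model = tf.keras.Sequential(
    [tf.keras.layers.Dense(10, input_shape=(4,))])   # stand-in for a real model

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run one inference on a dummy input with the expected shape and dtype.
sample = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
```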
As we've heard, BERT can be used for a number of language-related tasks. But today I want to demonstrate it for question answering. That is, given some body of text and a question about its content, BERT can find the answer to the question in the text. So let's take it for a spin. We have an app here which has a number of preselected Wikipedia snippets. And again, the model was not trained on any of the text in these snippets. I'm a space geek, so let's dig into the Apollo program. All right. Let's start with an easy question. [BEEPING] What did Kennedy want to achieve with the Apollo program? COMPUTER GENERATED WOMAN'S VOICE: Landing a man on the moon and returning him safely to the Earth. JARED DUKE: OK. But everybody knows that. Let's try a harder one. [BEEPING] Which program came after Mercury but before Apollo? COMPUTER GENERATED WOMAN'S VOICE: Project Gemini. JARED DUKE: Not bad. Hmm. All right, BERT, you think you're so smart, [BEEPING] Where are all the aliens? COMPUTER GENERATED WOMAN'S VOICE: Moon. JARED DUKE: There it is. [LAUGHTER] Mystery solved. Now all jokes aside, you may not have noticed that this phone is running in airplane mode. There's no connection to a server. So everything from speech recognition to the BERT model to text-to-speech was all running on device using ML. Pretty neat. [APPLAUSE] Now I'd like to talk about some improvements and investments we've been making in the TensorFlow Lite ecosystem, focused on improving your model deployment. Let's start with performance. A key goal of TensorFlow Lite is to make your models run as fast as possible across mobile and edge CPUs, GPUs, DSPs, and NPUs. We've made many investments across all of these fronts. We've made significant CPU improvements. We've added OpenCL support to improve GPU acceleration. And we've updated our support for all of the Android Q NNAPI ops and features. Our previously announced Qualcomm DSP delegate, targeting mid- and low-tier devices, will be available for use in the coming weeks. We've also made some improvements in our performance and benchmark tooling to better assist both model and app developers in identifying the optimal deployment configuration. To highlight some of these improvements, let's take a look at our performance just six months ago at Google I/O, using MobileNet for classification inference, and compare that with the performance of today. This represents a massive reduction in latency. And you can expect this across a wide range of models and devices, both low end and high end. Just pull the latest version of TensorFlow Lite into your app and you can see these improvements today. Digging a little bit more into these numbers, floating point CPU execution is our default path. It represents a solid baseline. Enabling quantization, now easier with post-training quantization, provides three times faster inference. And enabling GPU execution provides yet more of a speedup, six times faster than our CPU baseline. And finally, for absolute peak performance, we have the Pixel 4 neural core, accessible via the NNAPI TensorFlow Lite delegate. This kind of specialized accelerator, available in more and more of the latest devices, unlocks capabilities and use cases that just a short time ago were thought impossible on mobile devices. But we haven't stopped there. Seamless and more robust model conversion has been a major priority for the team. And we'd like to give an update on a completely new TensorFlow Lite model conversion pipeline.
This new converter was built from the ground up to provide more intuitive error messages when conversion fails, add support for control flow, and support more advanced models, like BERT, Deep Speech v2, Mask R-CNN, and more. We're excited to announce that the new converter is available in beta, and will be available more generally soon. We also want to make it easy for any app developer to use TensorFlow Lite. And to that end, we've released a number of new first-class language bindings, including Swift, Objective-C, C# for Unity, and more. This complements our existing set of bindings in C++, Java, and Python. And thanks to community efforts, we've seen the creation of additional bindings in Rust, Go, and even Dart. As an open source project, we welcome and encourage these kinds of contributions. Our model optimization toolkit remains the one-stop shop for compressing and optimizing your model. There will be a talk later today with more details. Check out that talk. We've come a long way, but we have many planned improvements. Our roadmap includes expanding the set of supported models, further improvements in performance, as well as some more advanced features, like on-device personalization and training. Please check out our roadmap on tensorflow.org and give us feedback. Again, we're an open source project and we want to remain transparent about our priorities and where we're headed. I want to talk now about our efforts in enabling ML not just on billions of phones but on the hundreds of billions of embedded devices and microcontrollers that exist and are used in production globally. TensorFlow Lite for Microcontrollers is that effort. It uses the same model format, the same conversion pipeline, and largely the same kernel library as TensorFlow Lite. So what are these microcontrollers? These are the small, low-power, all-in-one computers that power everyday devices all around us, from microwaves and smoke detectors to sensors and toys. They can cost as little as $0.10 each. And with TensorFlow, it's possible to use them for machine learning. Arm, an industry leader in the embedded market, has adopted TensorFlow as their official solution for AI on Arm microcontrollers. And together, we've made optimizations that significantly improve performance on this embedded Arm hardware. We've also partnered with Arduino, and just launched the official Arduino TensorFlow library. This makes it possible for you to get started doing speech detection on Arduino hardware in just under five minutes. And now we'd like to demonstrate TensorFlow Lite for Microcontrollers running in production. Today, if a motor breaks down, it can cause expensive downtime and maintenance costs. But using TensorFlow, it's possible to simply and affordably detect these problems before failure, dramatically reducing these costs. Mark Stubbs, co-founder of Shoreline IoT, will now give us a demo of how they're using TensorFlow to address this problem. They've developed a sensor that can be attached to a motor just like a sticker. It uses a low-power, always-on TensorFlow model to detect motor anomalies. And with this model, their device can run for up to five years on a single small battery, using just 45 microamps with its Ambiq Cortex-M4 CPU. Here we have a motor that will simulate an anomaly. As the RPMs increase, it'll start to vibrate and shake. And the TensorFlow model should detect this as a fault and indicate so with a red LED. All right, Mark, let's start the motor.
[HIGH-PITCHED MOTOR HUMMING] Here we have a normal state. And you can see it's being detected with the green LED. Everything's fine. Let's crank it up. [MOTOR WHIRRING] OK. It's starting to vibrate, it's oscillating. I'm getting a little nervous and frankly, a little sweaty. Red light. Boom. OK. The TensorFlow model detected the anomaly. We could shut it down. Halloween disaster averted. Thank you, Mark. [APPLAUSE] SARAH SIRAJUDDIN: That's all we have, folks. Please try out TensorFlow Lite if you haven't already. And once again, we're very thankful for the contributions that we get from our community. JARED DUKE: We also have a longer talk later today. We have a demo booth. Please come by and chat with us. Thank you. [APPLAUSE] SANDEEP GUPTA: My name is Sandeep Gupta. I am the product manager for TensorFlow.js. I'm here to talk to you about machine learning in JavaScript. So you might be saying to yourself that I'm not a JavaScript developer, I use Python for machine learning, so why should I care? I'm here to show you that machine learning in JavaScript enables some amazing and useful applications, and might be the right solution for your next ML problem. So let's start by taking a look at a few examples. Earlier this year, Google released the first-ever AI-inspired Doodle, which you see on the top left. This was on the occasion of Johann Sebastian Bach's birth anniversary. And users were able to synthesize a Bach-style harmony by running a machine learning model in the browser, just by clicking on a few notes. In just about three days, more than 50 million users created these harmonies, and they saved them and shared them with their friends. Another team at Google has been creating these fun experiences. One of these is called Shadow Art, where users are shown a silhouette of a figure, and you use your hand shadow to try to match that figure. And that character comes to life. Other teams are building amazing accessibility applications, making web interfaces more accessible. On the bottom left, you see something called Creatability, where a person is trying to control a keyboard simply by moving their head. And then on the bottom right is an application called Teachable Machine, which is a fun and interactive way of training and customizing a machine learning model directly in a browser. So all of these awesome applications have been made possible by TensorFlow.js. TensorFlow.js is our open source library for doing machine learning in JavaScript. You can use it in the browser, or you can use it server-side with Node.js. So why might you consider using TensorFlow.js? There are three ways you would use this. One is you can run any of the pre-existing pre-trained models and deploy them and run them using TensorFlow.js. You could use one of the models that we have packaged for you, or you can use any of your TensorFlow saved models and deploy them on the web or on other JavaScript platforms. You can retrain these models and customize them on your own data, again using TensorFlow.js. And lastly, if you're a JavaScript developer wanting to write all your machine learning directly in JavaScript, you can use the low-level ops API and build a new model from scratch using this library. So let's see why this might be useful. First, it makes machine learning really, really accessible to a web developer and a JavaScript developer. With just a few lines of code, you can bring the power of machine learning into your web application. So let's take a look at this example.
Here we have two lines of code with which we are just sourcing our library from our hosted scripts, and we are loading a pre-trained model. In this case, the BodyPix model, which is a model that can be used to segment people in videos and images. So just with these two lines, you have the library and the model embedded in your application. Now we choose an image. We create an instance of the model. And then we call the model's estimate person segmentation method, passing it the image. And you get back an object which contains the pixel mask of where there is a person present in this image. And there are other methods that can subdivide this into various body parts. And there are other rendering utilities. So just with about five lines of code, your web application has all the power of this powerful machine learning model. The library can be used both client-side and server-side. Using it client-side in the browser has lots of advantages. You get the amazing interactivity and reach of the browser as a platform. Your application immediately reaches all your users, who have nothing to install on their end. By simply sharing the URL of your application, they are up and running. You get the benefit of the interactivity of the browser as a platform, with easy access to the webcam, and microphone, and all of the sensors that are attached to the browser. Another really important point is that because these are running client-side, user data stays client-side. So this has strong implications for privacy-sensitive applications. And lastly, we support GPU acceleration through WebGL. So you get great performance out of the box. On the server side, TensorFlow.js supports Node. Lots of enterprises use Node for their back-end operations and for a ton of their data processing. Now you can use TensorFlow directly with Node by importing any TensorFlow saved model and running it through TensorFlow.js Node. Node also has an enormous NPM package ecosystem. So you can benefit from that, and plug into the NPM repository collection. And for enterprises where your entire back-end stack is in Node, you can now bring all of the ML into Node and maintain a single stack. A natural question to ask is, how fast is it? We have done some performance benchmarking. And I'm showing here some results for MobileNet inference time. On the left, you see results on mobile devices running client-side. And on state-of-the-art mobile phones, you get really good performance, with about 20 milliseconds inference time, which means that you can run real-time applications at about 50 frames per second. Android performance has some room for improvement. Our team is heavily focused on addressing that. On the server side, because we bind to TensorFlow's native C library, we have performance parity with Python TensorFlow, both on CPU as well as on GPU. So in order to make it easy for you to get started, we have prepackaged a collection of models, pre-trained models, for most of the common ML tasks. These include things like image classification, object detection, human pose and gesture detection, speech commands models for recognizing spoken words, and a bunch of text classification models for things like sentiment and toxicity. You can use these models with very easy wrapped high-level APIs from our hosted scripts, or you can NPM install them.
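As a concrete illustration of the BodyPix walkthrough described a moment ago, here is a minimal TypeScript sketch. It assumes the @tensorflow/tfjs and @tensorflow-models/body-pix npm packages (the talk loads the same library and model from the hosted scripts instead), and it uses the estimatePersonSegmentation method as described in the talk; later releases of the package may rename or change this API.

```typescript
// Minimal sketch: load the packaged BodyPix model and segment a person in an image.
import '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

async function segmentPersonInImage(image: HTMLImageElement) {
  // Create an instance of the pre-trained model (weights download on first use).
  const net = await bodyPix.load();

  // Returns an object with width, height, and a per-pixel mask
  // (1 where a person is present, 0 elsewhere) for the given image.
  const segmentation = await net.estimatePersonSegmentation(image);
  console.log(segmentation.width, segmentation.height, segmentation.data.length);
  return segmentation;
}

// Usage: assumes an <img id="person"> element on the page.
const img = document.getElementById('person') as HTMLImageElement;
segmentPersonInImage(img);
```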
And then you can use these pre-trained models and build your applications for a variety of use cases. These include AR, VR types of applications, gesture-based interactions that help improve accessibility of your applications, detecting user sentiment and moderating content, conversational agents, chat bots, as well as a lot of things around front-end web page optimization. These pre-trained models are a great way to get started, and they are good for many problems. However, often you have the need to customize these models for your own use. And here, again, the power of TensorFlow.js with the interactivity of the web comes in handy. I want to show you this application called Teachable Machine, which is a really nice way of customizing a model in just a matter of minutes. I am going to test both the demo gods as well as the time buzzer gods here and try to show this live. What you're seeing here is-- this is the Teachable Machine web page, which has the MobileNet model already loaded. We are going to be training three classes. These are these green, purple, and orange classes. We will output words. So let's say we will do rock for green, paper for purple, and scissors for red. We're going to record some images. So let's record some images for rock. I'm going to click this button here. COMPUTER GENERATED MAN'S VOICE: Rock. SANDEEP GUPTA: And I'm going to record some images for paper. COMPUTER GENERATED MAN'S VOICE: Pa-- rock. SANDEEP GUPTA: And I'm going to record some images for scissors. COMPUTER GENERATED MAN'S VOICE: Paper. SANDEEP GUPTA: OK. So there-- COMPUTER GENERATED MAN'S VOICE: Scissors. SANDEEP GUPTA: We have customized our model with just about 50 images recorded for each class. Let's see how it works. COMPUTER GENERATED MAN'S VOICE: Rock. Paper. Rock. Paper. Rock. Scissors. Paper. Rock. Scissors. SANDEEP GUPTA: So there you go. In just a matter of-- [APPLAUSE] Pretty neat. It's really powerful to customize models like these super interactively with your own data. What if you want to train on your data at a somewhat larger scale? Here, AutoML comes in really handy. AutoML is a GCP cloud-based service which lets you bring your data to the cloud and train a custom, really high-performing model specific to your application. Today, we are really excited to announce that we now support TensorFlow.js for AutoML, meaning that you can use AutoML to train your model, and then with one click, you can export a model that's ready to be deployed in your JavaScript application. One of our early testers of this feature, the CVP Corporation, which is building some image classification applications for the mining industry, was able to use it: in just about five node-hours of training, they improved their model accuracy from 91% with their manually trained model to 99%, and got a much smaller, faster-performing model, which they then immediately deployed in a progressive web application for on-field use.
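For reference, loading such an AutoML-exported model in a web app looks roughly like the sketch below. This is a hedged illustration assuming the @tensorflow/tfjs-automl companion package and its loadImageClassification helper; the 'model.json' URL is a placeholder for wherever the exported model is actually hosted.

```typescript
// Hedged sketch: classify an image with a model exported from AutoML for TensorFlow.js.
import '@tensorflow/tfjs';
import * as automl from '@tensorflow/tfjs-automl';

async function classifyWithAutoML(image: HTMLImageElement) {
  // Placeholder URL: point this at the model.json produced by the AutoML export.
  const model = await automl.loadImageClassification('model.json');

  // Returns an array of {label, prob} pairs for the classes in your AutoML dataset.
  const predictions = await model.classify(image);
  console.log(predictions);
  return predictions;
}
```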
So in addition to models, one of the big focus areas for us has been support for a variety of platforms. And because JavaScript is a versatile language which runs on a large number of platforms, TensorFlow.js can be used on all these different platforms. And today, again, we are really happy to announce that we now support integration with React Native. So if you are a React Native developer building cross-platform native applications, you can use TensorFlow.js directly from within React Native, and you get all the power of WebGL acceleration. We've looked at the capabilities of the library. Let's look at a couple of use cases. Modiface is an AR technology company based out of Canada. They have used TensorFlow.js to build this mobile application that runs in the WeChat mini program environment. They did this for L'Oreal, and it lets users try out these beauty products instantly, running inside these instant messaging applications. They had some strict criteria about model size and frame rate performance. And they were able to achieve all of those targets with TensorFlow.js running natively, deployed on these mobile devices. In order to showcase the limits of what's possible with this, our team has built a fun game and an application to show how you can take a state-of-the-art model, a very high resolution model that can do face tracking, and we have built this lip-syncing game. So here what you will see is that a user is trying to lip sync to a song, and a machine learning model is trying to identify the lips and score how well you are doing the lip syncing. And then because it's in JavaScript, it's on the web, we have added some visualization effects and some other AR, VR effects. So let's take a look. [MUSIC PLAYING] SPEAKER 1: (SINGING) Hey, hey. Give me one more minute. I would. Hey, hey. Give me one more, one more, one more. Hey, hey. Give me one more minute. I would. Hey, hey. Make it last for... Ohh, ohh. Hey, hey. Give me one more minute. I would. Hey, hey. Give me one more, one more, one more. SANDEEP GUPTA: OK. It's pretty cool. The creator of this demo is here with us. He's at the TensorFlow.js demo station. Please stop by there, and you can try playing around with this. In the real world, we are beginning to see more and more applications of enterprises using TensorFlow.js in novel ways. Uber is using it for a lot of their internal ML tasks, visualization, and computation directly in the browser. And a research group at IBM is using it for on-the-field mobile classification of disease-carrying snails which spread certain communicable diseases. So lastly, I want to thank our community. The popularity and growth of this library is in large part due to the amazing community of our users and contributors. And thus, we are really excited to see that a lot of developers are building amazing extensions and libraries on top of TensorFlow.js to extend its functionality. This was just a quick introduction to TensorFlow.js. I hope I've been able to show you that if you have a web or a Node ML use case, TensorFlow.js is the right solution for your needs. Do check out our more detailed talk later this afternoon, where our team will dive deeper into the library. And there are some amazing talks from our users showcasing some fantastic applications. tensorflow.org/js is your one source for a lot more information, more examples, getting-started content, models, et cetera. You can get everything you need to get started. So with that, I would like to turn it over to Joseph Paul Cohen, who's from Mila Medical. And he will share with us an amazing use case of how their team is using TensorFlow.js. Thank you very much. [APPLAUSE] JOSEPH PAUL COHEN: Thanks. Great. I am very excited to be here today. So what I want to talk about is a chest X-ray radiology tool in the browser. If we look at the classic or traditional diagnostic pipeline, there is a certain area where web-based tools are used by physicians to aid them in a diagnostic decision, such as kidney donor risk or cardiovascular risk. These tools are already web-based. With the advances of deep learning, we can now do radiology tasks such as chest X-ray diagnostics, and put them in the browser. Can you imagine the use cases where this is useful? In an emergency room, where you have a time-limited human. In a rural hospital, where radiologists are not available or are very far away. The ability for a non-expert to triage cases for an expert, saving time and money. And where we'd like to go is towards rare diseases, but we're a little data-starved in this area to be able to do that. This project has been called "nice" by Yann LeCun. What we need to do to achieve this is run a state-of-the-art chest X-ray diagnostic DenseNet in the browser. For one thing, this preserves the privacy of the data, while at the same time allowing us to scale to millions of users with zero computational cost on our side. How do we achieve this? With TensorFlow.js, which gives us a one-second feed-forward pass through this DenseNet model with a 12-second initial load time. We also need to deal with processing out-of-distribution samples, where we don't want to process images of cats or images that are not properly formatted X-rays. To do this, we're going to use an autoencoder with an SSIM score, and we're going to look at the reconstruction. And then finally, we need to compute gradients in the browser to show a saliency [INAUDIBLE] of why we made such a prediction. So we could ship two models, one computing the feed-forward and the other one computing the gradient. Or we can use TensorFlow.js to compute the actual gradient graph and then compute it right in the browser, given whatever model we have already shipped.
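To illustrate the gradient computation just described, here is a hedged TypeScript sketch, not Mila's actual code. It assumes a model in the TensorFlow.js layers format at a placeholder URL (graph-format models do not support gradients) and uses tf.grad to take the gradient of the top class score with respect to the input as a crude saliency map.

```typescript
// Hedged sketch: compute a saliency map in the browser with tf.grad,
// instead of shipping a second, hand-exported gradient model.
import * as tf from '@tensorflow/tfjs';

async function saliencyMap(modelUrl: string, xray: tf.Tensor4D): Promise<tf.Tensor> {
  // Placeholder URL: a layers-format model whose predict() output is differentiable.
  const model = await tf.loadLayersModel(modelUrl);

  // Scalar score for this input: the maximum class probability/logit.
  const topScore = (x: tf.Tensor) => (model.predict(x) as tf.Tensor).max();

  // tf.grad builds d(score)/d(input) and evaluates it right in the browser.
  const gradFn = tf.grad(topScore);
  return gradFn(xray).abs(); // large values mark pixels that drove the prediction
}
```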
So this makes development really easy. And it's also pretty fast. Thank you. [APPLAUSE] TATIANA SHPEISMAN: Hi. I'm Tatiana. I'm going to talk today about MLIR. Before we talk about MLIR, let's start from the basics. We are here because artificial intelligence is experiencing tremendous growth. All three components, algorithms, data, and compute, have come together to change the world. Compute is really, really important, because that's what enables machine learning researchers to build better algorithms and to build new models. And you can see the models are becoming much, much more complex. To train a model today, we need several orders of magnitude more compute than we needed several years ago. How do we build hardware which makes that possible? For those of you who are versed in hardware details, Moore's law is ending. This is also the end of Dennard scaling. We can no longer simply say that the next CPU is going to run at a higher frequency, and that this will power machine learning. What is happening in the industry is the explosion of custom hardware. And there is a lot of innovation, which is driving this compute which makes artificial intelligence possible. So if we look at what is happening, look in your pocket. You probably have a cell phone. Inside that cell phone, most likely there is a little chip which makes artificial intelligence possible. And it's not just one chip. There is a CPU, there is a GPU, there is a DSP, there is a neural processing unit. All of that is sitting inside a little phone and seamlessly working together to make a great user experience possible. In the data center, we see the explosion of specialized hardware also. Habana, specialized accelerators in CPUs, in GPUs, many different chips. We have TPUs. All of this is powering the tremendous growth of specialized compute in data centers. Once you have more specialized accelerators, that brings more complexity. And as we all know, hardware doesn't work by itself. It is powered by software.
And so there is also tremendous growth in software ecosystems for machine learning. In addition to TensorFlow, there are many other different frameworks which are trying to solve this problem. And actually, we've got a problem with the explosive growth of hardware and software. CHRIS LATTNER: So the big problem here is that none of this scales. Too much hardware, too much complexity, too much software, too many different systems that are not working together. And what's the fundamental problem? The fundamental problem is that we as a technology industry, across the board, are re-inventing the same kinds of tools, the same kinds of technologies, and we're not working together. And this is why you see the consequences of this. You see systems that don't interoperate because they're built by different people on different teams that solve different problems. Vendor X is working on their chip, which makes perfect sense. It doesn't really integrate with all the different software. And likewise for the software people, who can't know or work with all the hardware people. This is why you see things like: you bring up your model, you try to get it to work on a new piece of hardware, and it doesn't work right the first time. You see this in the cracks that form between these systems, and that manifests as usability problems, or performance problems, or debuggability problems. And as a user, this is not something you should have to deal with. So what do we want? What we'd really love to do is take this big problem, which has many different pieces, and make it simpler by getting people to work together. And so we've thought a lot about this. And the way we think that we can move the world forward is not by saying that there is one right way to do things. I don't think that works in a field that is growing as explosively as machine learning. Instead, what we think the right way to do this is, is to introduce building blocks. And instead of standardizing the user experience or standardizing the one right way to do machine learning, we think that we as a technology industry can standardize some of the underlying building blocks that go into these tools, that can go into the compiler for a specific chip, that can go into a translator that works between one system or the other. And if we build building blocks, we know and we can think about what we want from them. We want, of course, the best-in-class graph technology. That's a given. We want the best compiler technology. Compilers are really important. We want to solve not just training but also inference, mobile, and servers, including all permutations. So training on the edge: super important, growing in popularity. We don't want this to be a new kind of technology island solution. We want this to be part of a continuous ecosystem that spans the whole problem. And so this is what MLIR is all about. MLIR is a new system that we at Google have been building, that we are bringing to the industry to help solve some of these common problems that manifest in different ways. One of the things that we're really excited about is that MLIR is not just a Google technology. We are collaborating extensively with hardware makers across the industry. We're seeing a lot of excitement and a lot of adoption by people who are building the world's biggest and most popular hardware. But what is MLIR? MLIR is a compiler infrastructure.
And if you're not familiar with compilers, what that really means is that it provides the bottom-level, low-level technology that underpins building the individual tools and individual systems that then get used to help with graphs, and help with chips, and things like that. And so how does this work? What MLIR provides, if you look at it in contrast to other systems, is that it is not, again, a one-size-fits-none kind of solution. It is trying to be technology, technology that powers these systems. Like we said before, it of course contains state-of-the-art compiler technology. Within Google, we have dozens of years of compiler experience on the team. But we probably have hundreds of years of compiler experience across the industry, all collaborating together on this common platform. It is designed to be modular and extensible, because requirements continue to change in our field. It's not designed to tell you the right way to do things as a system integrator. It's designed to provide tools so that you can solve your problems. If you dive into the compiler, there's a whole bunch of different pieces. And so there are things like low-level graph transformation systems. There are things for code generation, so that if you're building a chip, you can handle picking the right kernel. But the point of this is that MLIR does not force you to use one common pipeline. It turns out that, while compilers for code generation are really great, so are handwritten kernels. And if you have handwritten kernels that are tuned and optimized for your application, of course they should slot into the same framework and should work with existing runtimes. And we really see MLIR as providing useful value that then can be used to solve problems. It's not trying to force everything into one box. So you may be wondering, though, for you, if you're not a compiler person or a system integrator or a chip person, what does this mean to you? So let's talk about what it means for TensorFlow. TATIANA SHPEISMAN: What it means for TensorFlow is it allows us to build a better system, because integrating TensorFlow with the myriad of specialized hardware is really a hard problem. And with MLIR, we can build a unified infrastructure layer, which will make it much simpler for TensorFlow to seamlessly work with any hardware chip which comes out. For you as a Python developer, it simply means a better development experience. A lot of things that today might not work as smoothly as we would like them to can be resolved by MLIR. This is just one example. You write a model. You try to run it through the TensorFlow Lite converter. You get an error. You have no clue what it is. And right now, we see issues on GitHub and try to help you. With MLIR, you will get an error message that says, this is the line of Python code which caused the problem. You can look at it and fix the problem yourself. And just to summarize, the reason we are building MLIR is because we want to move faster, and we want the industry to move faster with us. One of the keys to making an industry work well together is neutral governance. And that's why we submitted MLIR as a project to LLVM. Now it is part of the LLVM project. The code is moving there soon. This is very important, because LLVM has a 20-year history of neutral governance and of building infrastructure which is used by everybody in the world. And this is just the beginning. Please stay tuned. We are building a global community around MLIR.
Once we are done, ML will be better for everybody. And we will see a much faster advance of artificial intelligence in the world. ANKUR NARANG: I'm Ankur. I work at Hike. And I lead AI innovations there in various areas, which I'm going to talk about today. Formerly, I worked with IBM Research in New Delhi, and also at some research labs here in Menlo Park. Here are various use cases that we address using AI, the fundamental one being Hike as a platform for messaging. And now we are driving a new social future. We are looking at a more visual way of expressing interactions between users. So instead of typing messages in a laborious way, if one could get recommended stickers which express the same thing in a more efficient, more expressive fashion, then it would be a more interesting and engaging conversation. So the first use case is essentially around multilingual sticker recommendations, where we address around eight to nine languages currently in India. And as we expand internationally, we will be supporting more languages. So we want to go hyperlocal as well as hyperpersonal. From a hyperlocal perspective, we want to address the needs of a person from his or her own personal language perspective. When you type, you would automatically get stickers recommended in the corresponding native language of the person. The second one is friend recommendation using social network analysis and deep learning, where we use graph embeddings and deep learning to recommend friends. The next one is essentially around fraud analytics. We have lots of click farms, where people try to misuse the rewards that are given on the platform in a B2C setting. And therefore, you need interesting deep learning techniques and anomaly detection to address the known knowns, the known unknowns, and the unknown unknowns. Another one is essentially around campaign tuning, hyperpersonalization, and optimization, to be able to address the needs of every user and make the experience engaging and extremely interactive. And finally, we have interesting sticker processing using vision models and graphics, which will be coming soon in later releases. Going further, we have a strong AI research focus. So we are passionate about research. We have multiple publications in ECIR this year and an IJCAI demo. And we have an arXiv publication. And we have [INAUDIBLE] to areas not directly related to messaging. But we had an ICML workshop paper as well. Fundamentally, the kinds of problems we address need to look at extensions of, and basically address the limitations of, supervised learning. We need to address cases where there's a long tail of data, very few labels available, a limited number of labels available, and it is very costly to get those labels. And the same problems occur in NLP, vision, reinforcement learning, and so on. We are looking at meta-learning formulations to address this. At Hike, we are looking at 4 billion events per day across millions of users. We collect terabytes of data, essentially using Google Cloud with various tools on Google Cloud, including Kubeflow, BigQuery, Dataproc, and Dataflow. We use it for some of the use cases which I mentioned earlier. Essentially, I will look into one particular use case right now. It is on stickers. Stickers, as I mentioned, are powerful expressions of emotion and context, with various visual expressions. The key challenge there is discovery.
If you have tens of thousands of stickers, now going into millions and further into billions of stickers, how do you discover these stickers and exchange them in real time, with a few milliseconds of latency while you are typing, in a way that matches your personal interests? What we want to solve, essentially, is this: given a chat context with time, event of the day, situation, recent messages, gender, and language, we want to predict the sticker that's most relevant to it. Building this, one essentially needs to look at all the different ways a particular text is typed. One needs to aggregate the semantically similar phrases to have the right encoding across and between these various languages, so that it does not affect the typing experience. And we need to deliver within the limited memory of the device as well as within a few milliseconds of response time. So here in [INAUDIBLE] is the sticker recommendation flow, where basically, given a chat context and what the user is currently typing, we use a message model, which predicts using a classification model. It predicts the message, and those messages are mapped to the corresponding stickers. For prediction, essentially, we use a combination of TensorFlow on the server and TensorFlow Lite on the device. And with this combination, we want to deliver, basically, a few milliseconds of latency for getting accurate stickers recommended. And here we use a combination of a neural network and a trie. Obviously, we quantize the neural network on the device using TensorFlow Lite. And we are able to get the desired amount of performance. So once the messages are predicted, the stickers are naturally mapped based on the tags of the stickers, on what intent they are meant to deliver. And corresponding to the predicted message, those stickers are delivered to the user. This is the complete flow. Basically, given a chat context, one predicts the message that the person is trying to express. Then one adds the user context from a hyperpersonalization perspective, considering sticker preferences, age, and gender, and then goes to the relevant stickers. For the stickers, we basically score using reinforcement learning algorithms, simple ones to begin with, then more complex going forward, so that as the way people behave on the platform changes, the corresponding stickers also adapt to it in real time. Thank you. [APPLAUSE]
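To make the message-to-sticker mapping described above more concrete, here is a toy TypeScript sketch of the general idea only; the intent names, sticker tags, and scoring are entirely made up, and this is not Hike's implementation. Given classifier probabilities over message intents, it looks up stickers whose tags match the predicted intents and returns the highest-scoring ones.

```typescript
// Toy sketch: map predicted message intents to stickers via tags.
// All names and data here are hypothetical.
type IntentScore = { intent: string; prob: number };
type Sticker = { id: string; tags: string[] };

const stickers: Sticker[] = [
  { id: 'hi_wave', tags: ['greeting'] },
  { id: 'lol_cat', tags: ['laugh', 'greeting'] },
  { id: 'good_night_moon', tags: ['goodnight'] },
];

function recommendStickers(intents: IntentScore[], topK = 3): Sticker[] {
  // Score each sticker by the summed probability of the intents its tags match.
  const scored = stickers.map(sticker => {
    const score = intents
      .filter(i => sticker.tags.includes(i.intent))
      .reduce((sum, i) => sum + i.prob, 0);
    return { sticker, score };
  });
  return scored
    .filter(s => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.sticker);
}

// Example: the on-device message model predicted "greeting" with high confidence.
console.log(recommendStickers([{ intent: 'greeting', prob: 0.9 }, { intent: 'laugh', prob: 0.05 }]));
```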