Real-time communication with WebRTC: Google I/O 2013

  • JUSTIN UBERTI: Hi everyone.

  • Thanks for coming to the session on WebRTC for

  • plugin-free realtime communication.

  • I'm Justin Uberti, tech lead for WebRTC at Google.

  • And with me today is-- hey, has anyone seen Sam?

  • SAM DUTTON: Hey.

  • JUSTIN UBERTI: Sam Dutton, coming to you live from WebRTC

  • on Chrome for Android.

  • [APPLAUSE]

  • SAM DUTTON: On a beautiful Nexus 7.

  • We got this low-res to cope with the Wi-Fi here.

  • That seems to be working pretty well.

  • JUSTIN UBERTI: That was quite an entrance.

  • Why don't you come up here and introduce yourself?

  • SAM DUTTON: Yeah.

  • Hey.

  • I'm Sam Dutton.

  • I'm a developer advocate for Chrome.

  • JUSTIN UBERTI: So we're here to talk to you today about the

  • great things that WebRTC's been working on and how you

  • can use them.

  • So what is WebRTC?

  • In a nutshell, it's what we call realtime communication--

  • RTC--

  • the ability to communicate live with somebody or

  • something as if you were right there next to them.

  • And this can mean audio, video, or even just

  • peer-to-peer data.

  • And we think WebRTC is really cool.

  • But there's a lot of other people who are really excited

  • about WebRTC as well.

  • And one of the reasons is that WebRTC fills a critical gap in

  • the web platform, where previously, a native

  • proprietary app like Skype could do something the web

  • just couldn't.

  • But now we've turned that around and changed that so we

  • have a web of connected WebRTC devices that can communicate

  • in realtime just by loading a web page.

  • So here's what we're trying to do with WebRTC, to build the

  • key APIs for realtime communication into the web, to

  • make an amazing media stack in Chrome so that developers can

  • build great experiences, and to use this network of

  • connected WebRTC devices to create a new

  • communications ecosystem.

  • And these kind of seem like lofty goals.

  • But take this quote from the current CTO of the FCC who

  • said he sees traditional telephony fading away as voice

  • just becomes another web app.

  • So we're trying to live up to that promise.

  • And right now, you can build a single app with WebRTC that

  • connects Chrome, Chrome for Android, Firefox, and very

  • soon, Opera.

  • I'm especially excited to announce that as of this week,

  • Firefox 22 is going to beta, which is the very first

  • WebRTC-enabled version of Firefox.

  • So within a matter of weeks, we will have over one billion

  • users using a WebRTC-enabled browser.

  • [APPLAUSE]

  • JUSTIN UBERTI: And I think that just gives a good idea of

  • the size of the opportunity here.

  • And we expect that number to grow very significantly as

  • both Chrome and Firefox get increased adoption.

  • For places where we don't have WebRTC-enabled browsers, we're

  • providing native, supported, official tool kits on both

  • Android, and very soon, iOS, that can interoperate with

  • WebRTC in the browser.

  • [APPLAUSE]

  • JUSTIN UBERTI: So here are just a handful of the

  • companies that see the opportunity in WebRTC and are

  • building their business around it.

  • So that's the vision for WebRTC.

  • Now let's dig into the APIs.

  • There are three main categories of API in WebRTC.

  • First, getting access to input devices--

  • accessing the microphone, accessing the webcam, getting

  • a stream of media from either of them.

  • Secondly, being able to connect to another WebRTC

  • endpoint across the internet, and to send this audio and

  • video in realtime.

  • And third, the ability to do this not just for audio and

  • video, but for arbitrary application data.

  • And we think this one is especially interesting.

  • So because there's three categories,

  • we have three objects.

  • Three primary objects in WebRTC to access this stuff.

  • The first one, MediaStream, for getting access to media,

  • then RTCPeerConnection and RTCDataChannel.

  • And we'll get into each one of these individually.

  • Sam, why don't you tell us about MediaStream?

  • SAM DUTTON: Yeah, sure.

  • So MediaStream represents a single source of synchronized

  • audio or video or both.

  • Each MediaStream contains one or more MediaStreamTracks.

  • For example, on your laptop, you've got a webcam and a

  • microphone providing video and audio streams, and they're

  • synchronized.

  • We get access to these local devices using the getUserMedia

  • method of Navigator.

  • So let's just look at the code for that-- just highlight that.

  • And you can see that getUserMedia there, it takes

  • three parameters, three arguments there.

  • And the first one, if we look at the constraints argument

  • I've got, you can see I'm just specifying I want video.

  • That's all I'm saying.

  • Just give me video and nothing else.

  • And then in the success callback, we're setting the

  • source of a video using the stream that's returned by

  • getUserMedia.
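
A minimal sketch of the kind of call being described, assuming the prefixed navigator.webkitGetUserMedia that Chrome shipped at the time (adapter.js, covered later, hides the prefixes):

```javascript
// Constraints: ask for video and nothing else.
var constraints = { video: true };

function successCallback(stream) {
  // Attach the returned MediaStream to a <video> element on the page.
  var video = document.querySelector('video');
  video.src = window.URL.createObjectURL(stream);
}

function errorCallback(error) {
  console.log('getUserMedia error:', error);
}

// Three arguments: constraints, success callback, error callback.
navigator.webkitGetUserMedia(constraints, successCallback, errorCallback);
```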

  • Let's see that in action, really simple example here.

  • And you can see when we fire the getUserMedia method, we

  • get the allow permissions bar at the top there.

  • Now, this means that users have to explicitly opt in to

  • allowing access to their microphone and camera.

  • And yeah, there we have it.

  • Using that code, we've got video

  • displayed in a video element.

  • Great.

  • What really excites me about these APIs is when they come

  • together, like in this example.

  • What's happening is that we've got getUserMedia being

  • piped into a canvas element, and then the canvas element

  • being analyzed, and then producing ASCII, just like

  • that, which could make a good codec, I think.

  • JUSTIN UBERTI: It would be a good codec.

  • You could compress it using just gzip.

  • SAM DUTTON: Yeah, smaller font sizes, high resolution.

  • Also, another example of this from Facekat.

  • Now what's happening here is that it's using the head

  • tracker JavaScript library to track the position of my head.

  • And when I move around, you can see I'm moving through the

  • game and trying to stay alive, which is quite difficult.

  • God, this is painful.

  • Anyway--

  • whoa.

  • OK, I think I've flipped into hyperspace there.

  • And an old favorite-- you may well have seen Webcam

  • Toy, which gives us access to the camera, a kind of photo booth

  • app that uses WebGL to create a bunch of slightly psychedelic

  • effects there.

  • I quite like this old movie one, so I'll take

  • that and get a snapshot.

  • And I can share that with my friends, so beautiful work

  • from Paul Neave there.

  • Now you might remember I said that we can use the

  • constraints object.

  • The simple example there was just saying, use the video,

  • nothing else.

  • Well, we can do more interesting things with

  • constraints than that.

  • We can do stuff like specify the resolution or the frame

  • rate, a whole stack of things that we want

  • from our local devices.

  • A little example from that, if we go over here.

  • Now, let's look at the code, actually.

  • If we go to the dev tools there, you can see that I've

  • got three different constraints objects, one for

  • each resolution.

  • So when I press the buttons, I use the QVGA constraints,

  • getUserMedia, and then with the VGA one, I'm getting higher

  • resolution.

  • And for HD, I'm getting the full 1280 by 720.
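
A sketch of what those three constraints objects might look like, using the pre-standard mandatory min/max syntax Chrome accepted at the time; the handler wiring is illustrative:

```javascript
// One constraints object per resolution.
var qvgaConstraints = { video: { mandatory: { maxWidth: 320,  maxHeight: 240 } } };
var vgaConstraints  = { video: { mandatory: { maxWidth: 640,  maxHeight: 480 } } };
var hdConstraints   = { video: { mandatory: { minWidth: 1280, minHeight: 720 } } };

// Each button re-calls getUserMedia with the matching constraints.
function getMedia(constraints) {
  navigator.webkitGetUserMedia(constraints, function (stream) {
    document.querySelector('video').src = window.URL.createObjectURL(stream);
  }, function (error) {
    console.log('getUserMedia error:', error);
  });
}
```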

  • We can also use getUserMedia now for input from our

  • microphone.

  • In other words, we can use getUserMedia to provide a

  • source node for web audio.

  • And there's a huge amount of interesting stuff we can do

  • with that processing audio using web audio, from the mic

  • or wherever.

  • A little example of that here--

  • I'll just allow access to the mic, and you can see, I'm

  • getting a nice little visualization there in the

  • canvas element.

  • And I can start to record this, blah

  • blah blah blah blah--

  • [AUDIO PLAYBACK]

  • -To record this, blah blah blah blah blah--

  • [END AUDIO PLAYBACK]

  • SAM DUTTON: And yeah, you can see that's used recorder.js to

  • save that locally to disk.
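
A minimal sketch of wiring getUserMedia into Web Audio as described, assuming the prefixed webkitAudioContext constructor of the era:

```javascript
var audioContext = new webkitAudioContext();

navigator.webkitGetUserMedia({ audio: true }, function (stream) {
  // The mic becomes an ordinary Web Audio source node.
  var source = audioContext.createMediaStreamSource(stream);
  var analyser = audioContext.createAnalyser();
  source.connect(analyser);  // e.g. drive a canvas visualization from this
}, function (error) {
  console.log('getUserMedia error:', error);
});
```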

  • getUserMedia also now-- this is kind of experimental, but

  • we can use getUserMedia to get a screen capture, in other

  • words, data coming directly from what we see on screen,

  • not audio or video from the mic and the camera.

  • Probably simplest if I show you an example of this,

  • so yeah, a little application here.

  • And when I click to make the call, allow, and you can see

  • there that I get this kind of crazy hall of mirrors effect,

  • because I'm capturing the screen that I'm capturing, and

  • so on and so on.

  • Now that's quite nice.

  • But it would be really useful if we could take that screen

  • capture and then transmit that to another computer.

  • And for that, we have RTCPeerConnection.

  • JUSTIN UBERTI: Thanks, Sam.

  • So as the name implies, RTCPeerConnection is all about

  • making a connection to another peer and over this peer

  • connection, we can actually then go and

  • send audio and video.

  • And the way we do this is we take the media streams that

  • we've got from getUserMedia, and we plug them into the peer

  • connection, and send them off to the other side.

  • When the other side receives them, they'll pop out as a new

  • media stream on their peer connection.

  • And they can then plug that into a video element to

  • display on the page.

  • And so both sides of a peer connection, they both get

  • streams from getUserMedia, they plug them in, and then

  • those media streams pop out magically encoded and decoded

  • on the other side.

  • Now under the hood, peer connection is

  • doing a ton of stuff--

  • signal processing to remove noise from audio and video;

  • codec selection and compression and decompression

  • of the actual audio and video; finding the actual

  • peer-to-peer route through firewalls, through NATs,

  • through relays; encrypting the data so that a user's data is

  • fully protected at all times; and then actually managing the

  • bandwidth so that if you have two megabits, we use it.

  • If you have 200 kilobits, that's all we use.

  • But we do everything we can to hide this complexity from the

  • web developer.

  • And so the main thing is that you get your media streams,

  • you plug them in via addStream to the peer

  • connection, and off you go.

  • And here's a little example of this.

  • SAM DUTTON: Yeah, so you can see here that we've created a

  • new RTCPeerConnection.

  • And when the stream is received, the callback for

  • that, gotRemoteStream there, attaches the stream we're

  • receiving to a video element.

  • Now, at the same time, we're also creating what's called an

  • offer, giving information about media, and we're setting

  • that as the local description, and then sending that to the

  • callee, so that they can set the remote description.

  • You can see that in the gotAnswer function there.
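
A hedged reconstruction of that slide's code, using the callback-style API Chrome shipped then; localStream comes from getUserMedia, remoteVideo is a video element on the page, and sendToCallee is a hypothetical signaling helper:

```javascript
var pc = new webkitRTCPeerConnection(null);  // prefixed in Chrome then

// When the remote side's stream arrives, show it in the video element.
pc.onaddstream = function (event) {
  remoteVideo.src = window.URL.createObjectURL(event.stream);
};

// Plug in our local stream and create the offer.
pc.addStream(localStream);
pc.createOffer(function (offer) {
  pc.setLocalDescription(offer);
  sendToCallee(offer);  // over whatever signaling channel you use
});

// When the callee's answer comes back via signaling:
function gotAnswer(answer) {
  pc.setRemoteDescription(new RTCSessionDescription(answer));
}
```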

  • Let's have a little look at RTCPeerConnection on one page,

  • a very simple example here.

  • So what we've got here is getUserMedia here,

  • just start that up.

  • So it's getting video from the local camera here, displaying

  • it on the left there.

  • Now when I press call, it's using RTCPeerConnection to

  • communicate that video to the other--

  • yeah, the other video element on the page there.

  • This is a great place to start to get your head around

  • RTCPeerConnection.

  • And if we look in the code there, you can see that it's

  • really simple.

  • There's not a lot of code there to do that, to transmit

  • video from one peer to another.

  • JUSTIN UBERTI: So that's really cool stuff.

  • A full video chat client in a single web page, and just

  • about 15 lines of JavaScript.

  • And we went a bit quickly through the whole thing around

  • how we set up the parameters of the call, the offers and

  • answers, but I'll come back to that later.

  • The next thing I want to talk about is RTCDataChannel.

  • And the idea is, if we have a peer connection which already

  • creates our peer-to-peer link for us, can we send arbitrary

  • application data over it?

  • And this is the mechanism that we use to do so.

  • Now one example where we would do this would be in a game.

  • Like, take this game.

  • I think it's called Jank Wars or something.

  • And we have all these ships floating around onscreen.

  • Now, when a ship moves, we want to make sure that's

  • communicated to the other player as quickly as possible.

  • And so we have this little JSON object that contains the

  • parameters and the position and the velocity of the ships.

  • And we can just take that object and stuff it into the

  • send method, and it will shoot it across the other side where

  • it pops out as onMessage.

  • And the other side can do the same thing.

  • It can call send on its data channel, and it works pretty

  • much just like a WebSocket.

  • That's not an accident.

  • And we tried to design it that way, so that people familiar

  • with using WebSockets could also use a similar API for

  • RTCDataChannel.

  • And the benefit is that here, we have a peer-to-peer

  • connection with the lowest possible latency for doing

  • this communication.

  • In addition, RTCDataChannel

  • can be either unreliable or reliable.

  • And we can think about this kind of like UDP versus TCP.

  • If you're doing a game, it's more important that your

  • packets get there quickly than that they're

  • guaranteed to get there.

  • Whereas if you're doing a file transfer, the file is only

  • any good if the entire thing is delivered.

  • So you can choose this as the app developer, which mode you

  • want to use, either unreliable or reliable.
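
A sketch of choosing the mode, using the { reliable: ... } option as exposed at the time (the standardized API later expressed this through options like ordered and maxRetransmits):

```javascript
// pc: an existing RTCPeerConnection.

// Unreliable, UDP-like: favor low latency over guaranteed delivery.
var gameChannel = pc.createDataChannel('game', { reliable: false });

// Reliable, TCP-like: everything arrives, in order.
var fileChannel = pc.createDataChannel('file', { reliable: true });

// Either way, shipping a game-state update looks the same
// (in practice, wait for the channel's onopen before sending):
gameChannel.send(JSON.stringify({ x: 120, y: 45, vx: -2, vy: 0 }));
```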

  • And lastly, everything is fully secure.

  • We use standard DTLS encryption to make sure that

  • the packets you send across the data channel are fully

  • encrypted on their way to the destination.

  • And you can do this either with audio and video, or if

  • you want to make a peer connection for just data, you

  • can do that as well.

  • So Sam's going to show us how this actually works.

  • SAM DUTTON: Yeah, so again, another really simple example.

  • We're creating a peer connection here, and once the

  • data channel is received, in the callback to that, we're

  • setting the receive channel using the

  • event.channel object.

  • Now, when the receive channel gets a message, kind of like

  • WebSocket really, we're just putting some text in a local

  • div there, using event.data.

  • Now, the send channel was created with

  • createDataChannel.

  • And then we got a send button.

  • When that's clicked, we get the data from a text area, and

  • we use the send channel to send that to the other peer.
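
A rough reconstruction of that demo code; the element IDs are hypothetical stand-ins for the page's text area, send button, and output div:

```javascript
// Caller side: create the channel up front.
var sendChannel = pc.createDataChannel('sendDataChannel');

// Callee side: the channel shows up in the ondatachannel callback.
pc.ondatachannel = function (event) {
  var receiveChannel = event.channel;
  receiveChannel.onmessage = function (e) {
    // Just like a WebSocket: the payload arrives in e.data.
    document.querySelector('div#received').textContent = e.data;
  };
};

// Send button: ship the textarea contents to the other peer.
document.querySelector('button#send').onclick = function () {
  sendChannel.send(document.querySelector('textarea#data').value);
};
```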

  • Again, let's see this in action.

  • This is, again, a good place to start-- one page demo, with

  • all the code for RTCDataChannel, so type in

  • some text, and we hit send, and it's transmitting it to

  • the other text area.

  • A great place to start if you're looking at

  • RTCDataChannel.

  • Something a little more useful here, a

  • great app from Sharefest.

  • Now, Sharefest is using RTCDataChannel

  • to enable us to do file sharing.

  • I think I'm going to select a nice photo here I've got of

  • some cherries.

  • And the URL is popeye.

  • And now Justin is going to try and get that up on screen on

  • his side, just to check that that's gone through.

  • So like I say, this is doing file sharing using

  • RTCDataChannel, and there's a huge amount

  • of potential there.

  • There we go.

  • Those are the cherries.

  • JUSTIN UBERTI: I love cherries.

  • SAM DUTTON: These are beautiful Mountain View

  • cherries, actually.

  • They were really, really nice.

  • JUSTIN UBERTI: All this data is being sent peer-to-peer,

  • and anybody else who connects to the same URL will download

  • that data peer-to-peer from Sam's machine.

  • And so none of this has to touch Sharefest servers.

  • And I think that's pretty interesting if you think about

  • things like file transfer and bulk video distribution.

  • OK, so we talked a lot about how we can do really clever

  • peer-to-peer stuff with RTCPeerConnection.

  • But it turns out we need servers to kind of get the

  • process kicked off.

  • And the first part of it is actually making sure that both

  • sides can agree to actually conduct the session.

  • And this is the process that we call signaling.

  • The signaling in WebRTC is abstract, which means that

  • there's no fully-defined protocol on

  • exactly how you do it.

  • The key part is that you just have to exchange session

  • description objects.

  • And if you think about this kind of like a telephone call,

  • when you make a call to someone, the telephone network

  • sends a message to the person you're calling, telling them

  • there's an incoming call and the phone should ring.

  • Then, when they answer the call, they send a message back

  • that says, the call is now active.

  • Now, these messages also contain parameters around what

  • media format to use, where the person is on the network, and

  • the same is true for WebRTC.

  • And these things, these session description objects,

  • contain parameters like, what codecs to use, what security

  • keys to use, the network information for setting up the

  • peer-to-peer route.

  • And the only important thing is that you just send it from

  • your side to the other side, and vice versa.

  • You can use any mechanism you want--

  • WebSockets, Google Cloud Messaging, XHR.

  • You can use any protocol, even just send it as JSON, or you

  • can use standard protocols like SIP or XMPP.
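
For example, a plain WebSocket carrying the session descriptions as JSON; the URL and message shape here are purely illustrative:

```javascript
// pc: an existing RTCPeerConnection.
var signaling = new WebSocket('wss://example.com/signaling');  // placeholder

pc.createOffer(function (offer) {
  pc.setLocalDescription(offer);
  signaling.send(JSON.stringify({ sdp: offer }));
});

signaling.onmessage = function (event) {
  var message = JSON.parse(event.data);
  if (message.sdp) {
    pc.setRemoteDescription(new RTCSessionDescription(message.sdp));
  }
};
```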

  • Here's a picture of how this all works.

  • The app gets a session description from the browser

  • and sends it across through the cloud to the other side.

  • Once it gets the message back from the other side with the

  • other side's session description, and both session descriptions

  • are passed down to WebRTC in the browser, WebRTC can

  • then set up and conduct the media link peer-to-peer.

  • So we do a lot to try to hide the details of what's inside

  • the RTCSessionDescription, because this includes a whole

  • bunch of parameters--

  • as I said, codecs, network information,

  • all sorts of stuff--

  • this is just a snippet of what's contained inside a

  • session description right now.

  • Really advanced apps can do complex behaviors by modifying

  • this, but we designed the API so that regular apps just don't

  • have to think about it.

  • The other thing that we need servers for is to actually get

  • the peer-to-peer session fully routed.

  • And in the old days, this wouldn't be a problem.

  • A long time ago, each side had a public IP address.

  • They'd send their IP addresses to each other through

  • the cloud, and we'd make the link

  • directly between the peers.

  • Well, in the age of NAT, things are more complicated.

  • NATs hand out what's called a private IP address, and these

  • IP addresses are not useful for communication.

  • There's no way we can make the link actually peer-to-peer

  • unless we have a public address.

  • So this is where we bring in a technology called STUN.

  • We can contact a STUN server from WebRTC and say,

  • what's my public IP address?

  • And basically, the request comes into the STUN server, it

  • sees the address that that request came from, puts the

  • address into the packet, and sends it back.

  • So now WebRTC knows its public IP address, and the STUN

  • server doesn't have to be in the path anymore, doesn't

  • have to have media flowing through it.

  • So here, if you look at this example, each side has

  • contacted that STUN server to find out what its public IP

  • address is.

  • And then it's sent the traffic to the other IP address

  • through its NAT, and the data still flows peer-to-peer.

  • So this is kind of magic stuff, and it usually works.

  • Usually we can make sure that the data all flows properly

  • peer-to-peer, but not in every case.

  • And for that, we have a technology called TURN built

  • into WebRTC.

  • This turns things around and provides a cloud fallback when

  • a peer-to-peer link is impossible, basically asks for

  • a relay in the cloud, saying, give me a public address.

  • And because this public address is in the cloud,

  • anybody can contact it, which means the call always sets up,

  • even if you're behind a restrictive NAT, or

  • even behind a proxy.

  • The downside is that since the data actually is being relayed

  • through the server, there is an operational cost to it.

  • But it does mean the call works in almost all

  • environments.

  • Now, on one hand, we have STUN, which is super cheap,

  • but doesn't always work.

  • And we have TURN, which always works, but has

  • some cost to it.

  • How do we make sure we get the best of both worlds?

  • Here's TURN in action, where we tried to use STUN and STUN

  • didn't work.

  • And we couldn't get the things to actually

  • penetrate the NATs.

  • So instead, we fell back.

  • Only then did we use TURN, and sent the media from our one

  • peer, through the NAT, through the TURN server, and to the

  • other side.

  • And this is all done by a technology called ICE.

  • ICE knows about STUN and TURN, and tries all the things in

  • parallel to figure out the best path for the call.

  • If it can do STUN, it does STUN.

  • If it has to do TURN, well, then it falls back to TURN, but

  • it does so quickly.
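
Configuring this amounts to handing the peer connection your STUN and TURN servers; a sketch using the field names of the time (url rather than the later urls), with a placeholder TURN server and credentials:

```javascript
var config = {
  iceServers: [
    { url: 'stun:stun.l.google.com:19302' },    // Google's test STUN server
    { url: 'turn:turn.example.com:3478',        // your own TURN deployment
      username: 'user', credential: 'secret' }  // placeholder credentials
  ]
};
// ICE probes all of these in parallel and picks the best working route.
var pc = new webkitRTCPeerConnection(config);
```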

  • And we have stats from a deployed WebRTC application

  • that says 86% of the time, we can make things

  • work with just STUN.

  • So only one out of seven calls actually has to run through a

  • TURN server.

  • So how do you deploy TURN for your application?

  • Well, we have some testing servers, a testing STUN server

  • that you can use, plus we make source code available for our

  • own STUN and TURN server as part of

  • the WebRTC code package.

  • But the thing I would really recommend is the long name,

  • but really good product--

  • rfc5766-turn-server--

  • which has Amazon VM images that you can just take,

  • download, and deploy into the cloud, and you've got your

  • TURN server provisioned for all your users right there.

  • I also recommend restund, another TURN server that we've

  • used with excellent results.

  • One question that comes up around WebRTC is, how is

  • security handled?

  • And the great thing is that security has been built into

  • WebRTC from the very beginning, and so this means

  • several different things.

  • It means we have mandatory encryption for

  • both media and data.

  • So all the data that's being sent by WebRTC is being

  • encrypted using standard AES encryption.

  • We also have secure UI, meaning the user's camera

  • and microphone can only be accessed if they've explicitly

  • opted in to making that functionality available.

  • And last, WebRTC runs inside the Chrome sandbox.

  • So even if somebody tries to attack WebRTC inside of

  • Chrome, the browser and the user will be fully protected.

  • So here's what you need to do to take advantage of the

  • security in WebRTC, and it's really simple.

  • Your app just needs to use HTTPS for

  • actually doing the signaling.

  • As long as the signaling goes over a secure conduit, the

  • data will be fully secured as well using the standard

  • protocols of SRTP for media or Datagram TLS

  • for the data channel.

  • One more question that comes up is around making a

  • multi-party call, a conference call.

  • How should I architect my application?

  • In the simple two-party case, it's easy.

  • We just have a peer-to-peer link.

  • But as you start adding more peers into the mix, things get

  • a bit more complicated.

  • And one approach that people use is a mesh, where basically

  • every peer connects to every other peer.

  • And this is really simple, because there's no servers or

  • anything involved, other than the signaling stuff.

  • But every peer has to send a copy of this data

  • to every other peer.

  • So this has a corresponding CPU and bandwidth cost.

  • So depending on the media you're trying to send--

  • for audio, the number can be somewhat higher; for

  • video, it's going to be less-- the number of peers you

  • can support in this topology is fairly limited, especially

  • if one of the peers is on a mobile device.

  • To deal with that, another architecture that can be used

  • is the star architecture.

  • And here, you can pick the most capable device to be what

  • we call the focus for the call.

  • And the focus is the part that's actually responsible

  • for taking the data and sending a copy to each of the

  • other endpoints.

  • But as we get to handling multiple HD video streams, the

  • job for a focus becomes pretty difficult.

  • And so for the most robust conferencing architecture, we

  • recommend an MCU, or multipoint control unit.

  • And this is a server that's custom made for relaying large

  • amounts of audio and video.

  • And it can do various things.

  • It can do selective stream forwarding.

  • It can actually mix the audio or video data.

  • It can also do things like recording.

  • And so if one peer drops out, it doesn't interrupt the whole

  • conference, because the MCU is taking care of everything.

  • So WebRTC is made with standards in mind.

  • And so you can connect things that

  • aren't even WebRTC devices.

  • And one thing that people want to talk to from WebRTC is phones.

  • And there's a bunch of easy things that can be dropped

  • into your web page to make this happen.

  • There's sipML5, which is a way to talk to various

  • standard SIP devices, Phono, and what we're going to show

  • you now, a widget from Zingaya to make a phone call.

  • SAM DUTTON: OK, so we've got a special guest joining us a

  • little bit later in the presentation.

  • I just wanted to give him a call to see if he's available.

  • So let's use the Zingaya WebRTC phone app now.

  • And you can see, it's accessing my microphone.

  • [PHONE DIALING AND RINGING]

  • SAM DUTTON: Calling someone.

  • I hope it's the person I want.

  • [PHONE RINGING]

  • SAM DUTTON: See if he's there.

  • CHRIS WILSON: Hello?

  • SAM DUTTON: Hey.

  • Is that you, Chris?

  • CHRIS WILSON: Hey, Sam.

  • How's it going?

  • It is.

  • SAM DUTTON: Hey.

  • Fantastic.

  • I just want to check you're ready for your gig later on.

  • CHRIS WILSON: I'm ready whenever you are.

  • SAM DUTTON: That's fantastic.

  • OK, speak to you soon, Chris.

  • Thanks.

  • Bye bye.

  • CHRIS WILSON: Talk to you soon.

  • Bye.

  • SAM DUTTON: Cheers.

  • JUSTIN UBERTI: It's great-- no plugins, realtime

  • communication.

  • SAM DUTTON: Yeah, in that situation, we had

  • a guy with a telephone.

  • Something we were thinking about is situations where

  • there is no telephone network.

  • Now, Voxeo demonstrated this with something called Tethr,

  • which is kind of disaster communications in a box.

  • It uses the OpenBTS cell framework-- you can see, it's

  • that little box there-- to enable calls between feature

  • phones via the OpenBTS cell through WebRTC to computers.

  • You can imagine this is kind of fun to get a license for

  • this in downtown San Francisco, but this is

  • incredibly useful in situations where there is no

  • infrastructure.

  • Yeah, this is like telephony without a

  • carrier, which is amazing.

  • JUSTIN UBERTI: So we have a code lab this afternoon that I

  • hope you can come to, where I'll really go into the

  • details of exactly how to build a WebRTC application.

  • But now we're going to talk about some resources that I

  • think are really useful.

  • The first one is something called WebRTC Internals.

  • And this is a page you can open up just by going to this

  • URL while you're in a WebRTC call.

  • And it'll show all sorts of great statistics about what's

  • actually happening inside your call.

  • This would be things like packet loss, bandwidth, video

  • resolution and sizes.

  • And there's also a full log of all the calls made to the

  • WebRTC API that you can download and export.

  • So if a customer's reporting problems with their call, you

  • can easily get this debugging information from them.

  • Another thing is, the WebRTC spec has been

  • updating fairly rapidly.

  • And so in a given browser, the API might not always match the

  • latest spec.

  • Well, adapter.js is something that's there to insulate the

  • web developer from the differences between browsers

  • and the differences between versions.

  • And so we make sure that adapter.js always implements

  • the latest spec, and then thunks down to whatever the

  • version supports.

  • So as new APIs are added, we polyfill them to make sure

  • that you don't have to write custom version code or custom

  • browser code for each browser.

  • And we use this in our own applications.
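
The flavor of normalization adapter.js performs, greatly simplified:

```javascript
// One unprefixed name for each browser's variant.
navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||  // Chrome
                         navigator.mozGetUserMedia;       // Firefox

window.RTCPeerConnection = window.RTCPeerConnection ||
                           window.webkitRTCPeerConnection ||
                           window.mozRTCPeerConnection;
```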

  • SAM DUTTON: OK, if all this is too much for you, the good news

  • is, we've got some fantastic JavaScript frameworks that have come up

  • in the last few months, really great abstraction libraries to

  • make it really, really simple to build WebRTC apps just with

  • a few lines of code.

  • Example here from SimpleWebRTC, a little bit of

  • JavaScript there to specify a video element that represents

  • local video, and one that represents the remote video

  • stream coming in.

  • And then join a room just by calling the joinRoom method

  • with a room name--

  • really, really simple.
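
A sketch following SimpleWebRTC's published pattern; the element IDs and room name are placeholders:

```javascript
var webrtc = new SimpleWebRTC({
  localVideoEl: 'localVideo',      // element for our own camera
  remoteVideosEl: 'remoteVideos',  // container for incoming streams
  autoRequestMedia: true           // fire getUserMedia right away
});

// Join a named room once media is ready.
webrtc.on('readyToCall', function () {
  webrtc.joinRoom('my-room-name');
});
```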

  • PeerJS does something similar for RTCDataChannel--

  • create a peer, and then on connection, you can send

  • messages, receive messages, so really, really easy to use.
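
And a sketch following PeerJS's published pattern; the peer IDs and API key are placeholders:

```javascript
// One peer connects to another by ID...
var peer = new Peer('alice', { key: 'your-peerjs-key' });
var conn = peer.connect('bob');
conn.on('open', function () {
  conn.send('Hello over RTCDataChannel!');
});

// ...and the other side listens for the connection.
peer.on('connection', function (conn) {
  conn.on('data', function (data) {
    console.log('Received:', data);
  });
});
```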

  • JUSTIN UBERTI: So JavaScript frameworks go a long way, but

  • they don't cover the production

  • aspects of the service--

  • the signaling, the STUN and TURN servers we talked about.

  • But fortunately, we have things from both OpenTok and

  • vLine that are basically turnkey WebRTC services that

  • handle all this stuff for you.

  • You basically sign up for the service, get an API key, and

  • then you can make calls using their production

  • infrastructure, which is spread

  • throughout the entire globe.

  • They also make UI widgets that can be easily dropped into

  • your WebRTC app.

  • So you get up and running with WebRTC super fast.

  • Now, we've got a special treat for you today.

  • Chris Wilson, a colleague of ours, a developer on the

  • original Mosaic browser, and an occasional musician as

  • well, is going to be joining us courtesy of WebRTC to show

  • off the HD video quality and full-band audio quality that

  • we're now able to offer in the latest version of Chrome.

  • Take it away, Chris.

  • CHRIS WILSON: Hey, guys.

  • SAM DUTTON: Hey, Chris.

  • How's it going?

  • CHRIS WILSON: I'm good.

  • How are you?

  • SAM DUTTON: Yeah, good.

  • Have you got some kind of musical instrument with you?

  • CHRIS WILSON: I do.

  • You know, originally you asked me for a

  • face-melting guitar solo.

  • But I'm a little more relaxed now.

  • I/O is starting to wind down.

  • You can tell I've already got my Hawaiian shirt on.

  • I'm now ready for some vacation.

  • So I figured I'd bring my ukulele and hook it up through

  • a nice microphone here, so we can listen to how that sounds.

  • SAM DUTTON: Take it away.

  • Melt my face, Chris.

  • [PLAYING UKULELE]

  • SAM DUTTON: That's pretty good.

  • JUSTIN UBERTI: He's pretty good.

  • All right.

  • SAM DUTTON: That was beautiful.

  • Thank you, Chris.

  • [APPLAUSE]

  • CHRIS WILSON: All right, guys.

  • JUSTIN UBERTI: Chris Wilson, everybody.

  • SAM DUTTON: The audience has gone crazy, Chris.

  • Thank you very much.

  • JUSTIN UBERTI: You want to finish up?

  • SAM DUTTON: Yeah.

  • So, we've had-- well, a fraction over 30 minutes to

  • cover a really big topic.

  • There's a lot more information out there online,

  • some good stuff on HTML5 Rocks, and a really good

  • e-book too, if you want to take a look at that.

  • There are several ways to contact us.

  • There's a great Google group--

  • discuss-webrtc--

  • where you can post your technical questions.

  • All the news for WebRTC comes through on our

  • Google+ and Twitter streams.

  • And we're really grateful to all the people, all of you

  • who've submitted feature requests and bugs.

  • And please keep them coming, and the URL for that is

  • crbug.com/new.

  • So thank you for that.

  • [APPLAUSE]

  • JUSTIN UBERTI: And so we've built this stuff into the web

  • platform to make realtime communication

  • accessible to everyone.

  • And we're super excited because we can't wait to see

  • what you all are going to build.

  • So thank you for coming.

  • Once again, the link.

  • And now, if you have any questions, we'll be happy to

  • try to answer them.

  • Thank you very much.

  • SAM DUTTON: Yeah.

  • Thank you.

  • [APPLAUSE]

  • AUDIENCE: Hi.

  • My name is Mark.

  • I'd like to know, because I'm using Linux and Ubuntu, when I can

  • finally get rid of the Talk plugin for using Hangouts

  • in Google+?

  • JUSTIN UBERTI: The question is, when can we get rid of

  • that Hangouts plug-in?

  • And so unfortunately, we can only talk about

  • WebRTC matters today.

  • That's handled by another team.

  • But let's say that there are many of us who

  • have the same feeling.

  • AUDIENCE: OK.

  • Great.

  • [LAUGHTER]

  • AUDIENCE: Can you make any comments on Microsoft's

  • competing standard, considering they kind of hold

  • the cards with Skype, and how maybe we can go forward

  • supporting both or maybe converge the two, or just your

  • thoughts on that?

  • JUSTIN UBERTI: So Microsoft has actually been a great

  • participant in standards.

  • They have several people they sent from their team.

  • And although they don't see things exactly the same way

  • that we do, I think that the API differences are sort of,

  • theirs is a lot more low-level, geared for expert

  • developers.

  • Ours is a little more high-level, geared for web

  • developers.

  • And I think that really what you can do is you can

  • implement the high-level one on top of the low-level one,

  • maybe even vice versa.

  • So Microsoft is a little more secretive about what they do.

  • So we don't know exactly what their timeframe

  • is relative to IE.

  • But they're fully participating.

  • And obviously, they're very interested in Skype.

  • So I'm very optimistic that we'll see a version of IE that

  • supports this technology in the not-too-distant future.

  • AUDIENCE: Very good to hear.

  • Thank you.

  • AUDIENCE: My question would be, I think you mentioned it

  • quickly in the beginning.

  • So if I wanted to communicate with WebRTC, but one, I'm

  • using a different environment than the browser.

  • Let's say I want a web application to speak to a

  • native Android app.

  • So what would be the approach to integrate that with WebRTC?

  • JUSTIN UBERTI: As I mentioned earlier, we have a fully

  • supported, official native version of PeerConnection--

  • PeerConnection.java-- which is open source, and you can

  • download, and you can build that into your native

  • application.

  • And it interoperates.

  • We have a demo app that interoperates with

  • our AppRTC demo app.

  • So I think that using Chrome for Android in a web view is

  • one thing you can think about.

  • But if that doesn't work for you, we have a native version

  • that works great.

  • AUDIENCE: OK.

  • Thank you.

  • AUDIENCE: Hi.

  • My question would be, are there any things that need to be

  • taken care of for cross-browser compatibility

  • between Firefox and Chrome?

  • Anything specific that needs to be taken

  • care of, or does it just work?

  • JUSTIN UBERTI: There are some minor differences.

  • I mentioned adapter.js covers some of the things where the

  • API isn't quite in sync in both places.

  • One specific thing is that Firefox only supports the Opus

  • codec, and they only support DTLS encryption.

  • They don't support something called SDES,

  • which we also support.

  • So for right now, you have to set one parameter in the API,

  • and you can see that in our AppRTC source code, to make

  • sure that communication actually uses

  • those compatible protocols.

  • We actually have a document, though, on our web page, that

  • documents exactly what you have to do, which is really

  • setting a single constraint parameter when you're creating

  • your peer connection object.

  • SAM DUTTON: Yeah.

  • If you go to webrtc.org/interop.

  • JUSTIN UBERTI: Yeah.

  • That's right, webrtc.org/interop.

  • AUDIENCE: OK.

  • Thank you.

  • AUDIENCE: When a peer connection is made and it

  • falls back to TURN, does the TURN server, is it capable of

  • decrypting the messages that go between the two endpoints?

  • JUSTIN UBERTI: No.

  • The TURN server is just a packet relay.

  • So this stuff is fully encrypted.

  • It doesn't have the keying information to

  • do anything to it.

  • So the TURN server just takes a byte, sends a byte, takes a

  • packet, sends a packet.

  • AUDIENCE: So for keeping data in sync with low latency

  • between, say, an Android application and the server,

  • how would both the native and the Android Chrome

  • implementations of WebRTC fare in terms of battery life?

  • JUSTIN UBERTI: I don't really have a good answer for that.

  • I wouldn't think there would be much difference.

  • I mean, the key things that are going to be driving

  • battery consumption in this case--

  • are you talking about data, or are you talking

  • about audio and video?

  • AUDIENCE: Data.

  • JUSTIN UBERTI: For data, the key drivers of your power

  • consumption are going to be the screen and the network.

  • And so I think those should be comparable between Chrome for

  • Android and the native application.

  • AUDIENCE: OK, cool.

  • Thanks.

  • AUDIENCE: With two computers running Chrome, what have

  • you seen for glass-to-glass latency?

  • JUSTIN UBERTI: Repeat?

  • AUDIENCE: Glass-to-glass, so from the camera to the LCD.

  • JUSTIN UBERTI: Oh, yeah.

  • So it depends on the platform, because the camera can have a

  • large delay built into it.

  • Also, some of the audio things have higher

  • latencies than others.

  • But the overall target is 150 milliseconds end-to-end.

  • And we've seen lower than 100 milliseconds in best case

  • solutions for glass-to-glass type latency.

  • AUDIENCE: OK.

  • And how are you ensuring priority of your data across

  • the network?

  • JUSTIN UBERTI: That's a complex

  • question with a long answer.

  • But the basic thing, are you saying, how do we compete with

  • cat videos?

  • AUDIENCE: No, just within the WebRTC, are you just--

  • how are you tagging your packets?

  • JUSTIN UBERTI: Right, so there is something called DSCP where

  • we can mark QoS bits-- and this isn't yet implemented in

  • WebRTC, but it's on the roadmap, to be able to tag

  • things like audio as higher priority than, say, video, and

  • that as a higher priority than cat videos.

  • AUDIENCE: So it's not today, but will be done?

  • JUSTIN UBERTI: It will be done.

  • We also have things for doing FEC type mechanisms to protect

  • things at the application layer.

  • But the expectation is that as WebRTC becomes more pervasive,

  • carriers will support DSCP at least on the bit from coming

  • off the computer and going onto their network.

  • And we've found that DSCP does help going through Wi-Fi

  • access points, because Wi-Fi access points give priority

  • to DSCP-marked traffic.

  • AUDIENCE: Thank you.

  • AUDIENCE: So with Chrome for iOS being limited to UIWebView

  • and with other restrictions, how much of WebRTC will you be

  • able to implement?

  • JUSTIN UBERTI: So that's a really interesting question.

  • They haven't made it easy for us, but the Chrome for iOS

  • team has already done some amazing things to deliver the

  • Chrome experience that exists there now.

  • And so we're pretty optimistic that one way or another, we

  • can find some way to make that work.

  • No commitment to the time frame, though.

  • AUDIENCE: What are the mechanisms for a saving video

  • and audio that's broadcast with WebRTC, like making video

  • recordings from it?

  • JUSTIN UBERTI: So if you have the media stream, you can then

  • take the media stream and plug it into things like the Web

  • Audio API, where you can actually get the raw samples,

  • and then make a wave file and save that out.

  • On the video side, you can go into a canvas, and then

  • extract the frames from a canvas, and you can save that.
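
The canvas approach in miniature, assuming video is a video element already playing the stream:

```javascript
// Grab the current frame into a canvas, then read it back out.
var canvas = document.createElement('canvas');
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
canvas.getContext('2d').drawImage(video, 0, 0);
var frame = canvas.toDataURL('image/png');  // save or upload this
```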

  • There isn't really any way to sort of save it as a .MP4,

  • .WEBM file yet.

  • But if you want to make a thing that just captures audio

  • from the computer then it stores on a server, you could

  • basically make a custom server that could do that recording.

  • That's one option.

  • AUDIENCE: So the TURN server is open--

  • but you said the TURN server doesn't capture.

  • JUSTIN UBERTI: No.

  • AUDIENCE: It can't act as an endpoint.

  • Do you have server technology that acts as an endpoint?

  • JUSTIN UBERTI: There are people building

  • this sort of stuff.

  • vLine might be one particular vendor who does this, but

  • there's something where you can basically have an MCU, and

  • the MCU that receives the media could then do things

  • like compositing or recording of that media.

  • AUDIENCE: So presumably, the libraries for Java or

  • Objective C could be used to create a server

  • implementation?

  • JUSTIN UBERTI: Exactly.

  • That's what they're doing.

  • AUDIENCE: Hi, kind of two-part question that has to do around

  • codecs, specifically on the video side,

  • currently VP8, WebM.

  • Are there plans for H.264, and also what's the

  • timeline for VP9?

  • JUSTIN UBERTI: Our plans are around the VP family of

  • codecs, so we support VP8.

  • And VP9, you may have heard that we're sort of trying to

  • finalize the bitstream right now.

  • So we are very much looking forward to taking advantage of

  • VP9 with all its new coding techniques, once it's both

  • finished and also optimized for realtime.

  • AUDIENCE: And H.264, not really on the plan?

  • JUSTIN UBERTI: We think that VP9 provides much better

  • compression and overall performance than H.264, so we

  • have no plans as far as H.264 at this time.

  • AUDIENCE: OK.

  • AUDIENCE: Running WebRTC on Chrome or Android for mobile

  • and tablets, how does it compare with native

  • performance, like Hangouts on Android?

  • JUSTIN UBERTI: We think that we provide a comparable

  • performance to any native application right now.

  • We're always trying to make things better.

  • On Chrome for Android, WebRTC is still behind a

  • flag because we still have work to do around improving

  • audio, improving some of the performance.

  • But we think we can deliver equivalent performance on the

  • web browser.

  • And we're also working on taking advantage of hardware

  • acceleration, in cases where there's hardware decoders like

  • there is on Nexus 10, and making that so we can get the

  • same sort of down-to-the-metal performance that you could get

  • from a native app.

  • AUDIENCE: So the Google Talk plugin is using not just

  • H.264, but H.264 SVC optimized for the needs of

  • videoconferencing.

  • Is VP8 and VP9 going to be similarly optimized

  • specifically in an SVC-like fashion for video conferencing

  • versus just the versions for file encoding?

  • JUSTIN UBERTI: So VP8 already supports temporal scalability

  • -- the S part of SVC.

  • VP9 supports additional scalability modes as well.

  • So we're very excited about the new coding techniques that

  • are coming in VP9.

  • AUDIENCE: So we want to use WebRTC to do live streaming

  • from, let's say, cameras, hardware cameras.

  • And what are the things that we should take care of such

  • kind of an application?

  • And when you mentioned VP8 and VP9 support,

  • H.264 is not supported.

  • Assuming your hardware supports only H.264, WebRTC

  • can be used with Chrome in that case?

  • JUSTIN UBERTI: We are building up support for hardware VP8,

  • and later, VP9 encoders.

  • So you can make a media streaming application like you

  • described, but we're expecting that all the major SoC vendors

  • are now shipping hardware with built-in VP8

  • encoders and decoders.

  • So as this stuff gets into market, you're going to see

  • this stuff become the most efficient way to record and

  • compress data.

  • AUDIENCE: So the only way is to support VP8 in hardware

  • right now, right?

  • JUSTIN UBERTI: If you want hardware compression, the only

  • things that we support right now will be VP8 encoders.

  • AUDIENCE: That's on the device side, you know, the camera

  • which is on--

  • JUSTIN UBERTI: Right.

  • If you're encoding from a device and you want it to be

  • decoded within the browser, I'd advise you to do it in VP8.

  • AUDIENCE: Thank you.

  • JUSTIN UBERTI: Thank you all for coming.

  • SAM DUTTON: Yeah, thank you.

  • [APPLAUSE]
