Meet the Packets: How audio travels into your browser - Sara Fecadu - JSConf US 2019

Meet the Packets: How audio travels into your browser
Sara Fecadu
KATIE: Hello. Welcome back. So, I keep forgetting
to do this and I apologize. But the big announcement right now is that the swag is ready. But do
not go get swag now because we're about to have a really awesome talk by Sara Fecadu.
I asked Sara for a fun fact and her fun fact was that she bakes a mean cookie, which
unfortunately we can't all indulge in. So, as a follow up question, I said, what prompted
you to write this talk about an audio API? And she said, well, I had spent a year building
a checkout form and I just couldn't stand to look at it or think about it anymore and
I had to do something different. Which I think is something that literally all of us
can probably identify really strongly with. So, anyways, Sara is gonna come up and talk
to us about the audio API. So, give it up for Sara.
[ Applause ] SARA: Hello. See if I can get my computer
started here. Okay. Welcome to my talk, Meet the Packets. If not everyone has realized,
it's a play on Meet the Parents. I spent a lot of time working on that.
[ Laughter ] Let's see here. One second. Gonna progress?
No. Okay. We're gonna do it without the clicker. So, this will be interesting. As Katie said,
my name... oh. My whole slide deck isn't progressing. Okay. One second. There we go. Okay. Thank
you for coming to my talk. As Katie said, my name is Sara Fecadu. I am from Seattle, Washington.
And I don't have a ton of hobbies besides making cookies and listening to a lot of podcasts.
And by day I'm a software developer at Nordstrom. And Nordstrom is a clothing retailer founded
in 1901. While people don't usually associate 100 year old companies with tech, we have
a thriving tech org working on innovative ways to get you what you need and feel your
best. And a year ago I was hired on to do a rewrite of Nordstrom.com's checkout. And as
of last May, we have been taking 100% of customer orders. Now, why am I talking about audio
streaming? Katie may have taken my joke here, but the answer is: Form fields. Our checkout
UI has 22 form fields. And they come in different groupings for different reasons. But many
of my waking moments over the past year have been spent thinking about these form fields.
And I just wanted to do anything else. So, I was sitting on my couch one night reading
a book on packet analysis, like one does, and watching a YouTube video. And I thought
to myself, how does that work? Like, on the packet level, how does audio and video streaming
work? So, to answer the larger question, I started small with: What is audio streaming?
And audio streaming is the act of sending audio files over the network. And this talk
will be about on demand audio streaming. Now, the major difference between on demand streaming
and live streaming, is with on demand streaming we need all of the packets to get across the
wire. Whereas with live streaming, you may be more interested in keeping up with
the event and a certain amount of packet loss is acceptable. Over the past few months, I
learned that audio streaming, even when limited to on demand, is as wide a subject as it is
deep. I have picked three topics that exemplify what audio streaming is, why it's hard, and
how to get started yourself. We will talk about audio streaming protocols, TCP congestion
control, and client players. Audio streaming protocols give us a standard for how to encode,
segment, and ship your audio to the client. TCP congestion control handles congestion on the
TCP layer of the stack. It is relevant to on demand audio streaming because we're shipping larger
audio files and we need every single packet to make its way to the client to play audio.
A client player is any network connected device with a play and pause button. So, this could
be your phone, your TV, your laptop, et cetera. And client players not only allow us to play
our audio, but when paired with modern audio streaming protocols, they hold a lot of decision
making power. Well, audio streaming protocols are the heart of audio streaming. And today
we'll talk about adaptive bitrate streaming, its benefits, and how to convert your
own audio files to work with two popular audio streaming protocols. Before we get started,
I wanted to go over some terms that will come up. A codec encodes data and uses compression
techniques to get the highest quality for the smallest footprint. Encoding and transcoding
both mean converting data from one type to another. Transcoding converts from digital to
digital, and encoding converts from analog to digital. Bitrate is how many bits it takes to
encode a second of audio. And this number usually refers to the quality of the audio
file. When I think of playing music on the Internet, I think of an HTML5 audio tag with
a source attribute set to the path of my audio file. And this is a perfectly reasonable way
to do it. You can request and receive a single file containing an entire song. And it would
be referred to as progressive streaming and the major benefit here is you only have one
file to deal with. But let's say, for instance, you have a user and they have a slow network
connection and they can't download your one file. They're stuck. So, adaptive bitrate
streaming aims to solve this problem by encoding your audio in multiple bitrates and allowing
the client player to decide which quality is best for the user to listen to your audio
uninterrupted. This allows more users to access your audio. But it does add a layer of operational
complexity because now you've got a lot more moving parts. The audio streaming
protocols we'll talk about not only leverage adaptive bitrate streaming, but also use HTTP
web servers. They do this by encoding the files, segmenting them, and placing them on
a web server. Then, once requested, partial audio files are sent to the client one at
a time. The secret to our modern audio streaming protocols is that it's more of a series
of downloads than it really is a stream. But we'll refer to it as streaming anyway. The
two most popular audio streaming protocols today are HTTP Live Streaming, or HLS, and
Dynamic Adaptive Streaming over HTTP, or MPEG DASH. HLS was created by Apple to support streaming
to mobile devices and it is the default on all macOS and Apple devices. And MPEG DASH was
a direct alternative to HLS. It was created by the forum that wants to make MPEG DASH the
international streaming standard. Let's look at them side by side. HLS takes MP3, AAC, AC-3,
or EC-3 audio and encodes it into fragmented MP4 files. Those segmented files are listed in a
playlist. If you have multiple bitrate streams, each stream will be in a media playlist and
all of your media playlists will be in a master playlist. MPEG DASH is codec agnostic, so
in theory you can convert any audio file into MPEG DASH. It will be segmented into a fragmented
MP4 file. That will be described in an XML manifest file called a media presentation description.
Okay. We've talked about what files will be used and what they'll be segmented into, but
how do you get it there? You've got this audio file. What tools allow you to convert the
audio file? Well, you've got options. But most of these options are paid options. Except
for FFmpeg, which is an open source command line tool that, among other things, allows you
to convert audio files to HLS or MPEG DASH. However, I found the learning curve for FFmpeg
to be pretty steep. And a lot of the documentation for HLS and MPEG DASH was for video streams.
Instead I used Amazon Elastic Transcoder. It's an AWS offering that converts files of
one type to another. In our case, we're taking an audio file and converting it to be used
with HLS and MPEG DASH. It's pretty much plug and play. You tell Amazon Elastic Transcoder
what type of files you have and what type of files you want and it outputs the stream
for you. And even though it's easy to use, it's not a free service. So, if you were going
to be converting a lot of files, it may be worth your time to learn more about an open
source alternative like FFmpeg. My workflow when working with Amazon Elastic Transcoder
was to upload my audio file to an S3 bucket, AWS's object store. I told Amazon Elastic
Transcoder where my audio file
was and what settings I needed it to convert my audio files to. And Amazon Elastic Transcoder
output my streams into that same S3 bucket. And I downloaded them for us to explore. This
is the basic set of files you would get with an HLS stream. And it kind of looks like a
lot. But we're going to break it down into four groups. In the top left, the master
playlist. In our case, we have two bitrate streams represented and they will be linked out from
the master playlist. And then in the top right you'll see those media playlists which
have each bitrate stream. And those will contain all of our links to our transport stream files
which are the fragmented audio files represented in both the bottom left and the bottom right.
On the bottom right we have our 64K bitrate stream segmented audio files. And in the bottom,
oh. Did I get that backwards? I'm not really good at right and left. But in the bottom
section you'll have your fragmented audio files. We'll take a closer look at those so
you can see really what's in it. This is the entirety of the HLS master playlist. It contains
information about the specific bitrate streams and links out to those media playlists that
represent the streams themselves. Let's look at the 64K bitrate stream media playlist.
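
Since the slide itself is not part of this transcript, here is a minimal TypeScript sketch of what reading a master playlist can look like. The playlist text, bandwidth values, and file names are hypothetical stand-ins (they are not the files from the talk); only the `#EXTM3U` and `#EXT-X-STREAM-INF` tags and the `BANDWIDTH` attribute come from the HLS specification. The helper names `parseMasterPlaylist` and `pickVariant` are made up for illustration, and `pickVariant` is one simple way a client player could choose a stream, not how any particular player does it.

```typescript
// Hypothetical master playlist with two bitrate streams (values are illustrative only).
const masterPlaylist = `#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.2"
audio_64k.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=128000,CODECS="mp4a.40.2"
audio_128k.m3u8`;

interface Variant {
  bandwidth: number; // bits per second declared for this bitrate stream
  uri: string;       // link out to the media playlist for this stream
}

// Pair each #EXT-X-STREAM-INF tag with the URI on the line that follows it.
function parseMasterPlaylist(text: string): Variant[] {
  const lines = text.split("\n").map((line) => line.trim());
  const variants: Variant[] = [];
  for (let i = 0; i < lines.length - 1; i++) {
    const match = /^#EXT-X-STREAM-INF:(?:.*,)?BANDWIDTH=(\d+)/.exec(lines[i]);
    if (match) {
      variants.push({ bandwidth: Number(match[1]), uri: lines[i + 1] });
    }
  }
  return variants;
}

// Choose the highest bitrate that fits the measured throughput,
// falling back to the lowest bitrate when nothing fits.
function pickVariant(variants: Variant[], throughputBps: number): Variant {
  const sorted = [...variants].sort((a, b) => a.bandwidth - b.bandwidth);
  const affordable = sorted.filter((v) => v.bandwidth <= throughputBps);
  return affordable.length > 0 ? affordable[affordable.length - 1] : sorted[0];
}

const variants = parseMasterPlaylist(masterPlaylist);
const slowConnectionChoice = pickVariant(variants, 100000);
```

With a measured throughput of 100,000 bits per second, this sketch settles on the 64K stream. Each `uri` points at a media playlist like the 64K one.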
It has even more information about the stream including caching information, the target
duration of each segmented audio file, and most importantly, links out to our transport
streams. This is what one of those fragmented audio files looks like. And there's something
a little interesting going on here. If you'll notice, it's color coded and I kept trying
to figure out why. But then I realized a transport stream has the file extension .ts. And something
else has the file extension .ts, TypeScript. Ignore the colors. It's just a binary coded
file. Now our MPEG DASH audio stream has fewer files and looks more manageable. But it's
similar. We have our media presentation description, which is an XML manifest file which contains
all of our information about the stream. Then below we have our two segmented audio files.
All of the segments are encapsulated in a single file, but within them there are segments.
That's why there are fewer files in the MPEG DASH audio stream than in the HLS audio
stream. Looking at the media presentation description, you'll see a lot of stuff here. But there
are three important elements. Each bitrate stream is represented in a Representation tag. And
then all bitrate streams are enclosed in an AdaptationSet. Within the Representation tag, we have
the URL to our audio file. And taking a look at one of those audio files, we'll see it looks
fairly similar to the segmented audio file we saw with HLS, minus the color coding, because
it's a .mp4 versus a .ts. Visual Studio is not confused in this case.
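
To make the three elements concrete, here is a small TypeScript sketch that pulls each Representation and its URL out of a media presentation description. The XML below is a hypothetical, trimmed-down manifest (ids, bandwidths, and file names are invented, and a real one from a transcoder carries many more attributes). The regular expressions are a shortcut that only suits this simple shape; in a browser, `DOMParser` would be the more robust choice. `listRepresentations` is an illustrative name, not part of any library.

```typescript
// Hypothetical, heavily trimmed MPD with two bitrate streams.
const mpd = `<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="audio/mp4">
      <Representation id="audio-64k" bandwidth="64000">
        <BaseURL>audio_64k.mp4</BaseURL>
      </Representation>
      <Representation id="audio-128k" bandwidth="128000">
        <BaseURL>audio_128k.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>`;

interface Representation {
  id: string;
  bandwidth: number; // bits per second
  baseUrl: string;   // where the fragmented MP4 for this stream lives
}

// Find every <Representation> element and read its id, bandwidth, and <BaseURL>.
function listRepresentations(xml: string): Representation[] {
  const representationPattern = /<Representation\b([^>]*)>([\s\S]*?)<\/Representation>/g;
  const found: Representation[] = [];
  let match: RegExpExecArray | null;
  while ((match = representationPattern.exec(xml)) !== null) {
    const attributes = match[1];
    const body = match[2];
    const id = /\bid="([^"]*)"/.exec(attributes);
    const bandwidth = /\bbandwidth="(\d+)"/.exec(attributes);
    const baseUrl = /<BaseURL>([^<]*)<\/BaseURL>/.exec(body);
    if (id && bandwidth && baseUrl) {
      found.push({ id: id[1], bandwidth: Number(bandwidth[1]), baseUrl: baseUrl[1].trim() });
    }
  }
  return found;
}

const representations = listRepresentations(mpd);
```

Each entry in `representations` corresponds to one bitrate stream, which is the same information a client player reads before deciding which quality to request.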
Earlier we talked about progressive streaming, which is streaming an entire audio file in
one go. We used an audio element and a source attribute with the path of our audio file.
With MPEG DASH and HLS, it's very similar. But instead of having the path to our audio
file, we have the path to the master playlist for HLS or the media presentation description
for MPEG DASH. We're going to take a hard left here and we're gonna talk about the second
topic in my talk, which is TCP congestion control. TCP is a transport layer protocol,
and it has mechanisms in both its sender and receiver, which are defined by the operating
systems of each, to react to and hopefully avoid congestion when sending packets over
the wire. These are called TCP congestion control. Today we'll talk about packet loss based
congestion control and why it isn't so great, and more specifically, the congestion window and
duplicate acknowledgments in packet loss based congestion control. Before we get started,
some more terms. Bandwidth is the rate at which data can be sent. And throughput is
the rate at which data can be received. The congestion window is a TCP variable that defines
the amount of data that can be sent before an acknowledgment is received by the sender.
Let's say you have a user who has requested your audio file from the server. Your audio
packets travel down the network stack, across the physical layer, up the data link layer
and the network layer, and arrive at the transport layer. And unfortunately there's congestion
right before we reach our destination. Now, traffic congestion and network congestion
have very similar beginnings. Either too many cars or too many packets have entered the
roadway and there's nowhere for them to go. With traffic, you have to wait it out. Luckily
for us, TCP congestion control allows packets to flow over the wire, even during congestion.
And before we get to the specifics of these TCP congestion control algorithms, let's talk
about the TCP happy path. We're going to start with a single packet sent from the sender
to the receiver, flowing through the receiver's buffer, being acknowledged by the receiver,
and having an acknowledgment packet sent back to the sender. We talked about the congestion
window, the amount of data that can be sent before a sender receives an acknowledgment. Another
way of thinking about the congestion window is as a sending rate. As the sender receives
acknowledgments, the congestion window grows. And as the receiver's buffers fill and it drops
all excess packets, the sender responds by shrinking the congestion window. A second way of
thinking about the congestion window is as a bucket. As packet loss occurs, the bucket shrinks.
And as acknowledgments are received by the sender, the bucket grows. There's a slight oversight
in the bucket explanation
in that the receiver has no way of telling the sender that it is dropping packets due
to congestion. But one option the receiver does have is to send a duplicate acknowledgment.
A duplicate acknowledgment is sent when packets arrive out of order. Say the sender sends
packets one, two, and three. For the purposes of our example, the receiver's not going to
process them right away. So when we send packet four, the buffer is full and it has nowhere
to go. So, packet four is dropped due to congestion. The receiver moves on to process packet
one and sends an acknowledgment, then does the same for packet two and for packet three. However,
when it looks at packet five, it says, I can't
process you because this would be an out of order packet. It drops packet five and sends
back another acknowledgment for packet three. The sender is tipped off that it needs to send
packets four and five again.
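
The walk-through above can be replayed in a few lines of TypeScript. This is a toy model under loud assumptions: packets are numbered the way the talk numbers them (real TCP acknowledges byte offsets), the receiver buffer holds three packets, and the sender reacts to a single duplicate acknowledgment (real stacks typically wait for several). The window rule, grow by one per new acknowledgment and halve on a duplicate, is a simplification in the spirit of packet loss based algorithms, not a faithful implementation of any of them. The names `simulateReceiver` and `traceCongestionWindow` are invented for this sketch.

```typescript
// Events the receiver sees, in order: a packet arriving, or the receiver
// processing the oldest packet in its buffer.
type NetworkEvent = { kind: "arrive"; seq: number } | { kind: "process" };

// Returns the acknowledgment numbers the receiver sends back, in order.
function simulateReceiver(events: NetworkEvent[], bufferCapacity: number): number[] {
  const buffer: number[] = [];
  const acks: number[] = [];
  let nextExpected = 1; // next packet number the receiver will accept
  let lastAcked = 0;    // highest packet number acknowledged so far

  for (const event of events) {
    if (event.kind === "process") {
      const seq = buffer.shift();
      if (seq !== undefined) {
        lastAcked = seq;
        acks.push(seq); // new acknowledgment
      }
    } else if (event.seq !== nextExpected) {
      acks.push(lastAcked); // out of order: drop it and repeat the last acknowledgment
    } else if (buffer.length < bufferCapacity) {
      buffer.push(event.seq);
      nextExpected += 1;
    }
    // Otherwise the buffer is full and the in-order packet is dropped silently.
  }
  return acks;
}

// Sender side: grow the window by one per new acknowledgment, halve it on a duplicate.
function traceCongestionWindow(acks: number[], initialWindow: number): number[] {
  const trace: number[] = [];
  let windowSize = initialWindow;
  let previousAck = 0;
  for (const ack of acks) {
    if (ack > previousAck) {
      windowSize += 1;
      previousAck = ack;
    } else {
      windowSize = Math.max(1, Math.floor(windowSize / 2));
    }
    trace.push(windowSize);
  }
  return trace;
}

const talkScenario: NetworkEvent[] = [
  { kind: "arrive", seq: 1 },
  { kind: "arrive", seq: 2 },
  { kind: "arrive", seq: 3 },
  { kind: "arrive", seq: 4 }, // buffer of three is full, so packet four is dropped
  { kind: "process" },
  { kind: "process" },
  { kind: "process" },
  { kind: "arrive", seq: 5 }, // out of order, so packet three is acknowledged again
];

const acks = simulateReceiver(talkScenario, 3);
const windowTrace = traceCongestionWindow(acks, 3);
```

In this run `acks` is `[1, 2, 3, 3]`: the repeated 3 is the duplicate acknowledgment, and it is the only signal the sender gets that packets four and five need to be sent again.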
So, a more truthful version of the bucket metaphor would be that the congestion window
shrinks as duplicate acknowledgments are received by the sender. And the congestion window grows
as new acknowledgments are received by the sender. The first congestion control algorithms
were written in the 1980s and the most recent were written a couple of years ago. We will talk
about TCP Reno and BBR. TCP Reno is the classic. And BBR was created by Google engineers a
few years ago to address issues that they saw when using packet loss based algorithms. TCP
Reno starts with a congestion window where it's set at some rate increasing by. It's
set at some value, excuse me, increasing by some rate. As the sender receives acknowledgments,
the congestion window grows by one. And as packets are dropped, it is divided by
some rate. I have chosen half. So, it's divided by two. The main issue with TCP Reno is
that it assumes that small amounts of packet loss are congestion. And in a world where
the sender doesn't know the state of the receiver's buffer and the receiver is unable to tell
the sender that it has room left to process packets, you have an Internet moving at a
fraction of its capacity. In 2016, BBR was created to help you get the most out of your
Internet connection. It looks for the place where sending rate is equal to bandwidth.
In theory, you should be able to send to the receiver and move on to the application without
any queuing. Some companies have reported positive outcomes when using BBR in their
production systems. Firstly, it only has to be implemented on the sender's side, and it is
available in Linux operating systems with kernel 4.9 or higher. And they found BBR increased
bandwidth for the low bandwidth users by 10 to 15%, and the bandwidth for their median group by 5 to 7%.
Additionally, users in Latin America and Asia saw additional increases. But is it a fair
algorithm? Fairness, or using your fair share of bandwidth, is the goal of every TCP congestion
control algorithm. And in experiments at Google and Spotify, they found that BBR was able to
coexist with congestion control algorithms like TCP Reno or CUBIC. However, some researchers
found that BBR's initial start algorithm pushed CUBIC senders back to where they couldn't
reestablish their fair share of bandwidth. And this is an issue currently being looked
at both in and outside of Google. We've reached the final section in this talk. And
so far we've talked about how audio files are processed to be streamed and issues that
may occur as they travel to devices. We'll wrap up by talking about the role of the client
player and how to create your own audio streams. Now, I'm a pretty big fan of Spotify and I
use it regularly. But have you ever looked at what's being sent back from the web server
to create those audio streams? This should look pretty familiar to what we were looking
at with our segmented audio files with HLS and MPEG DASH. But when I first saw these,
I did not have this context. And I kept thinking, do I need to write some client side JavaScript
to get this to play on the Internet? Is there an NPM package I can use? Or is there something
simple and obvious going on here that's going right over my head. And luckily for me and
hopefully everyone who writes JavaScript for the web, there is. Because HLS and MPEG DASH
handed over a lot of responsibility to the clients that process their streams. And this
not only includes picking the correct quality of audio to play, but it also includes allowing
elements like the audio element to process segmented audio files without any modification.
And most browsers do this by leveraging the Media Source Extensions API and the Encrypted
Media Extensions API. Additionally, libraries like hls.js and dash.js are available while
cross browser support is low. As a side note, if you need to support iOS Safari, you need
HLS. But with most other browsers, you have options. So, it would have been really fun
to reverse engineer Spotify's audio player. But I got tired of reading their minified
code. So, I decided to make my own audio player. And I started with a cassette that I found
from a box of cassettes. And I chose it because it has the words "Map squad" written on it.
And I used my iPhone's voice memo application to record the audio so the quality is so-so
at best. But it works. And you can try it right now. But maybe wait until the end of
the talk because I want to show you how it's made. The entire application is a single
index.html file with an audio element in the body. When loaded into the browser, the immediately
invoked function runs the init function. And at the top, we define an audio variable that's
equal to our audio element. Next we see if the Media Source Extensions API is supported
in our browser. If it is, we will assume we can use dash.js to enable MPEG DASH in most
browsers. We pass it to the dash.js media player. And when the player is initialized, our audio
will be loaded with it. If the Media Source Extensions API is not available, we're going
to assume we're using iOS Safari and we need to have an HLS stream. We will do this by
setting the source attribute of our audio element to the path
to our master playlist. And this file is all you need to stream audio to most browsers
in 2019. If you want to try it out in the browser for yourself, or you want to create
your own audio streams, please feel free to fork the repo. Thank you.
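
The speaker's actual file is not reproduced in this transcript, so the following is only a sketch of the decision it describes, written in TypeScript so the branching can be run outside a browser. The stream URLs and the names `choosePlayback`, `StreamUrls`, and `Playback` are hypothetical. The browser wiring is left as comments because it depends on the DOM and on dash.js being loaded on the page; the `dashjs.MediaPlayer().create()` and `initialize(element, url, autoPlay)` calls shown there are from the dash.js API as commonly documented, but treat the exact wiring as an assumption rather than the code from the talk.

```typescript
// The two ways the talk's player can end up playing audio.
type Playback =
  | { kind: "dash"; manifestUrl: string }  // hand the MPD to dash.js
  | { kind: "hls"; playlistUrl: string };  // set the audio element's src directly

interface StreamUrls {
  dashManifest: string;      // path to the media presentation description
  hlsMasterPlaylist: string; // path to the HLS master playlist
}

// Feature detection as described: if Media Source Extensions is available,
// assume dash.js can drive MPEG DASH; otherwise assume native HLS (iOS Safari).
function choosePlayback(hasMediaSource: boolean, urls: StreamUrls): Playback {
  if (hasMediaSource) {
    return { kind: "dash", manifestUrl: urls.dashManifest };
  }
  return { kind: "hls", playlistUrl: urls.hlsMasterPlaylist };
}

// Hypothetical paths, for illustration only.
const exampleUrls: StreamUrls = {
  dashManifest: "stream/audio.mpd",
  hlsMasterPlaylist: "stream/master.m3u8",
};

// In a browser, the wiring could look roughly like this:
//
//   const audio = document.querySelector("audio");
//   const playback = choosePlayback("MediaSource" in window, exampleUrls);
//   if (playback.kind === "dash") {
//     dashjs.MediaPlayer().create().initialize(audio, playback.manifestUrl, false);
//   } else {
//     audio.src = playback.playlistUrl;
//   }

const desktopChoice = choosePlayback(true, exampleUrls);
const iosSafariChoice = choosePlayback(false, exampleUrls);
```

Keeping the decision in a small pure function is a design choice made for this sketch: it lets the fallback logic be checked without a browser, while the DOM and dash.js calls stay at the edges.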
[ Applause ] KATIE: I'm sorry. I think that scared me more
than it scared you. Thank you so much, Sara. Can you believe that is the first talk she
has ever given at a conference? Yes. Amazing. All right. So, we have about a 15 minute
break right now. So, go out and pick up your swag bags. And we'll see you back here at
3:00. Patricia Ruiz Realini is talking about the importance of your local library. Which
is pretty cool because I hang out at the library. We'll see you back here at 3:00. No, wait.
3:00, yeah.