Creating an AI Musician with JavaScript
Thomas Drach
KATIE: Hello? There we go. That's me. Good morning. Whoo! Is everybody ready? Yeah. We have a really, really cool talk to start the day with. Thomas Drach is here. He gave me some fun facts, but actually we had a really fascinating conversation just now that I'm gonna share with you. So, back in the '90s, the first two CDs that I ever bought: the first was DJ Jazzy Jeff and the Fresh Prince, "Parents Just Don't Understand," which was amazing. And I bought a Rick Astley CD with a famous song on it that people use to Rickroll each other. And Thomas said his first tape was NSYNC. Clearly his musical taste is much better than mine. Let's give it up for Thomas Drach. [ Applause ]
THOMAS: Good morning. Thanks for being here early for this talk. I want to give a huge
thanks to JSConf for having me. They have been awesome. So grateful to be here. I also
want to thank any open source contributors, because I feel like I'm using cheat codes sometimes, just using someone else's code. If you're anything like me, you may have squirmed a little bit at this AI acronym. And that's good. We're going through that today. But just your willingness to be here shows you're open to the ideas and to pushing the boundaries. I think that's
commendable. Thanks for being here. Okay. My name's Thomas Drach like we talked about.
Thomasdrach on Twitter if you want to bug me there, please do. I consider myself a designer
and a bit of a hacker. Not in the sense of this awesome movie. But in the writing bad
code sense. I'm really good at that. I have a little design studio called Subtract where
I make what I hope are useful products. One of which we'll go through today. And I have
a product called Cleverstack too if you want to check that out. So, I want to start with
a man called Paul Thomas. He had a little garage in Phoenix called the Thomas Brothers
Garage. Today people might call him an entrepreneur or a founder. But he was actually an inventor.
He filed for over a hundred patents and I'm still trying to track most of them down. I
found about a dozen of them. Most reminded me of the Rube Goldberg machines. I don't
know if you have seen these before. They're simple machines that create this weird trap-like mechanism. All the patents I found from him were complex machines like that. Like
nothing you would actually use. It's kind of hard to see here, but the name of this one is "Panel Manufacturing Method." It's basically a patent for this giant machine that creates these brick-and-concrete slabs, and then we're supposed to ship them to job sites for people to build houses with them. It kind of made me sad a little bit
because it just seemed like a crazy person was documenting and patenting all of these
things. But the machines did get built. This is the one that we were just talking about,
The manufacturing machine to make the giant slabs. And they actually had to design and
patent these, like, semitrucks to ship them to job sites. They had these weird little
trucks that would transport these pallets of bricks everywhere. And clearly they didn't invent the screw or the lever or the rack and pinion or any of those things, but they combined them to make something new and useful. And it's especially interesting
to me because this man was my great grandfather and my namesake. Some of you might be familiar
with this Henry Ford quote. The funny thing is, Henry Ford never said this. I did research
and the first time it was attributed to him was in 1999, in the Cruise Industry News Quarterly. And other people started using it, and now people say it on stages like this. Sometimes it's paired with, like, the Steve Jobs quote, and it kind of creates this genius complex of, "they don't know what they want, we have to show them," or whatever. But I think there's
something that people miss about this quote in particular. I think it resonates for a
reason. But my interpretation of this quote is, big progress isn't necessarily just like
an iteration of the last thing, but it's like a mutation of something that happened before.
Maybe a little bit like this. We could accidentally combine a few unrelated things to find something
new. This is Tim Berners-Lee talking about inventing the web. He said, "I just had to take the hypertext idea and connect it to the TCP and DNS ideas and, ta-da, the World Wide Web." There's an old LinkedIn profile page of his that just said, like, "web developer." But he goes on in this interview, and I recommend listening to or reading interviews with him, to attribute all these other inventions, and says if those hadn't happened, the web, at least, I wouldn't have created at that time. I don't know what would have
happened. So, the definition of mutation in the dictionary is the changing of structure,
resulting in a variant form that may be transmitted to subsequent generations. Hendrix famously
took right-handed guitars, flipped them upside down, and eventually changed music. And before that, Les Paul invented the electric guitar. He didn't invent it just to invent it, but because he wanted the acoustic guitar to be louder. And Grace Hopper is one of the inventors of what we now call programming, a big reason we're
here today. And she started with knobs and switches on the Mark I. All of these were
mutations. Like, each was different enough. Hopper's was a mutation. And I think AI is a bit of a mutation, at least how we talk about it today. There's much more data, advances
in machine learning, compute power thanks to Moore's Law. And it kind of created the
opportunity for something like AI to work. This is what I get when I search "AI" on Google.
I don't know about you. But this isn't very helpful for me. So, I'll ask a little bit
different question today. I want to ask, what are intelligent machines? We might be able
to define this. Just intelligence + machines. So, let's define intelligence. This is a quote;
I'm just going to read it really quick: "People generally distrust the concept of machines that approach (and thus, why not pass?) our own human intelligence." I think a lot of people feel like this today. And this quote was actually written in 1970, in a book called The Architecture Machine, by the person who founded the MIT Media Lab. And it goes on to say that
machines must be aware of their context in order to be intelligent. So, you can't have
like a machine without it using context, interacting with the world. It's not intelligent in that case. There's no lack of context in the new Tesla Roadsters. So, for our purposes, I'm just gonna say intelligence means using context. So, now we can define machines.
This should be pretty easy. We go to the dictionary and find a mechanically, electronically, or
electrically operated device for a task. Sounds good to me. Okay. So, with intelligence and
machines defined, I would like to introduce you to the concept of somewhat intelligent
machines. And this is what we're gonna build today. And this is just something that uses
context and rapidly completes something that a human could not. And we're gonna do all
of it in JavaScript. So, this is the actual machine instrument, musician, AI, whatever
you want to call it. This is what we're gonna build today. I'm going to walk through how
to generate drumbeats using pre-trained machine learning models, APIs, libraries, stuff like
that. We're going to piece it together. And I find it a little bit hard to follow tiny code, so it's gonna be a little pseudo-codey. The first thing we need is a couple of libraries. We're going to use Magenta. If you haven't heard of Magenta already,
please check it out. It's incredible. A couple of people have talked about it already here
at JSConf. And then we're going to use Tone, which is actually a dependency of Magenta, and which gives us an easier-to-code interface for musical stuff. All right. Let's play some drums. This
is what the data structure for the drums will look like. This is a note sequence in Magenta. There's a pitch, there's an attribute that says it's a sample-based pitch, not a tonal, keyboard-like thing, and there's quantization info. There's a method that does that for you, so you don't have to worry about it. Okay. So,
all we need to play that note sequence is two lines of code. We're going to create a new instance of the Magenta music Player, and I'm going to call player.start on that. And we're gonna get something like this. [ Drumbeats ]
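In case the slide is hard to read, a drum note sequence in Magenta is shaped roughly like this. This is a hedged sketch, not the exact code from the talk: the pattern and pitches are my own (following the General MIDI drum map), and the playback lines at the end assume `@magenta/music` is loaded in a browser as `mm`.

```javascript
// A drum pattern as a Magenta-style NoteSequence (plain data).
// Pitches are General MIDI drums: 36 = kick, 38 = snare, 42 = closed hi-hat.
// isDrum marks a sample-based pitch rather than a tonal, keyboard-like one.
const drumPattern = {
  notes: [
    { pitch: 36, quantizedStartStep: 0,  quantizedEndStep: 1,  isDrum: true },
    { pitch: 42, quantizedStartStep: 2,  quantizedEndStep: 3,  isDrum: true },
    { pitch: 38, quantizedStartStep: 4,  quantizedEndStep: 5,  isDrum: true },
    { pitch: 42, quantizedStartStep: 6,  quantizedEndStep: 7,  isDrum: true },
    { pitch: 36, quantizedStartStep: 8,  quantizedEndStep: 9,  isDrum: true },
    { pitch: 42, quantizedStartStep: 10, quantizedEndStep: 11, isDrum: true },
    { pitch: 38, quantizedStartStep: 12, quantizedEndStep: 13, isDrum: true },
    { pitch: 42, quantizedStartStep: 14, quantizedEndStep: 15, isDrum: true },
  ],
  quantizationInfo: { stepsPerQuarter: 4 }, // a 16th-note grid
  totalQuantizedSteps: 16,                  // one bar of 4/4
};

// In the browser, with @magenta/music loaded as `mm`, playback is two lines:
//   const player = new mm.Player();
//   player.start(drumPattern);
```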
This is just our basic pattern that we plugged in, right? It's not that exciting. We kind of want something a little bit more like this: feed it in and get something better in return. All right. But in order to do that, let's do a super quick ML crash course. I am not the one to go in depth about this, but let's all get on the same page. Okay.
So, usually we write functions something like this: we want something, we put something in, we get something back. Machine learning is a little bit more abstract, right? We don't know exactly how we would get there. Like, is this image a dog?
Here's the image. I don't know. Some of you might see
[ Laughter ] Memes like this. This is one of my favorites.
Chicken fingers and the goldendoodles, I think. I don't know how dogs and cats became like
the hello world of machine learning. But I'm not mad about it. So, here's what we call
training data. Since it's training data, it's probably labeled. So, this is a dog. I
probably should have said like fried chicken. That's not an actual chicken. So, you feed
all that to the machine. The machine says all these are dogs. They have this weird odd
thing on their face. We call that a feature. That feature to us looks like a nose. The
machine goes, okay, there's a nose. It's probably a dog. So, we feed them the image and it's
gonna guess, dog. All of these are just like probabilities. For our purposes, we want to
give it some drums and we want some better drums in turn. So, that's where Magenta comes
in. Magenta has a couple different models available. All of these are super cool and
it seems like they're coming out with more every week, every month. So, there's a MusicRNN
model, a MusicVAE, and a Piano Genie. Right now the Piano Genie is a VAE as well. Just quick: RNN stands for recurrent neural network. It's a bunch of nodes, like a regular neural network, but it loops through itself. And a VAE is a variational autoencoder. If you're familiar
with encoding and decoding, it works similarly to that. For our purposes, we're going to
use the MusicRNN models just in the context of Magenta. They have a little bit better
support for like individual instruments like drums. And this is kind of what that might
look like. So, if you have nodes on the network, you have it looping through itself and you
have an in and an out. For us, we're going to put in our initial drum beat and we're
going to expect a generated drum beat in return. Okay. So, we picked our MusicRNN models. This
is what the actual checkpoint is. So, this is like a pre-trained model, trained with
millions of drumbeats and it has a sense of what drumbeats are. There's a kick on one,
there's a snare on two or something like that. So, these are the three lines of code that
we need to generate a new drumbeat. So, you just create a new instance of our MusicRNN
model with the checkpoint that we had. We initialize the model, and it loads itself up. And then we call this method, continueSequence. We feed it our note sequence, we feed it the number of steps, which is kind of arbitrary, could be 16, 32, or whatever. And then we feed it a number from zero to two. We'll go over the temperature a little bit later. So, after we do that, we just get a sample in return, and we play it the same way we played the other one. So, this is what that looked like. This is gonna be a generated beat with a temperature of 1.5. [ Drumbeat ]
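Those three steps, sketched out. This is a hedged approximation rather than the talk's exact code: the checkpoint URL is Magenta's publicly hosted drum_kit_rnn model, and the `mm` library object is passed in as a parameter here just so the sketch stays testable without the real library (in the browser you would hand it the `@magenta/music` global).

```javascript
// Generate a continuation of a seed drumbeat with a pre-trained MusicRNN model.
// `mm` is the Magenta music library (dependency-injected in this sketch).
async function generateDrums(mm, seedSequence, steps = 16, temperature = 1.5) {
  // 1. Create a model instance pointing at the pre-trained drum checkpoint.
  const model = new mm.MusicRNN(
    'https://storage.googleapis.com/magentadata/js/checkpoints/music_rnn/drum_kit_rnn'
  );
  // 2. Initialize: the model downloads its weights and loads itself up.
  await model.initialize();
  // 3. Continue the sequence: `steps` is how long to generate (16, 32, whatever),
  //    `temperature` is a number from zero to two controlling predictability.
  return model.continueSequence(seedSequence, steps, temperature);
}

// Browser usage with the real library (names here are placeholders):
//   const sample = await generateDrums(mm, mySeedSequence, 16, 1.5);
//   new mm.Player().start(sample);
```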
And if you generate it again, it's going to come up with new beats that we've never heard. Cool. So, yeah. [ Applause ]
All right. That was cool. But
it was a little bit of a black box. So, I want to go through what happens when we call continueSequence. We call it here in the three lines of code. What's happening behind the scenes is we're gonna convert the note sequence, which is that drum thing, to a tensor. And then we're going to encode the tensor to match the model, the checkpoint that we have. If you're wondering what a
tensor is, you probably already know. If you remember math class, there are scalars and vectors. A tensor has more dimensions. That's why you hear the word "shape" when talking about machine learning, and especially TensorFlow. These are all tensors, but the last one is what we usually picture. And
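As a quick illustration (my own, not from the talk), here is the scalar/vector/matrix/tensor progression as plain JavaScript arrays, with the "shape" of each one noted:

```javascript
// The same idea at increasing rank; "shape" lists the size of each dimension.
const scalar = 7;                        // shape: []        (rank 0)
const vector = [1, 2, 3];                // shape: [3]       (rank 1)
const matrix = [[1, 2, 3], [4, 5, 6]];   // shape: [2, 3]    (rank 2)
const tensor = [[[1], [2]], [[3], [4]]]; // shape: [2, 2, 1] (rank 3)

// A tiny helper that reads the shape off a (rectangular) nested array.
function shapeOf(x) {
  return Array.isArray(x) ? [x.length, ...shapeOf(x[0])] : [];
}
```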
then there's an internal method called sampleRNN. The inputs go into the TensorFlow library, and it generates the next notes. If you want to get into the nitty gritty, TensorFlow.js is a great place to actually get your hands dirty. It helps me to visualize it like this. Once more: we call continueSequence, convert the noteSequence to match the model, call sampleRNN, and get the new drums. I told you we were gonna talk about temperature. It's interesting to me because it's one of the few inputs we have available. We could restructure it, train it with different drumbeats, which is kind of cool. But temperature is
like the level of entropy in the system. So, the lower the temperature, the more predictable
result we're gonna get. The higher, the less predictable it will be. So, just as an example,
let's drop it down here to 0.2. Sounds really similar to the original drumbeat. And if we keep generating it, it's pretty much the same, right? So, now
we're gonna try cranking it up to 1.5. [Drumbeats ]
So, a little bit more exciting, for sure. This is the temperature I like. More fun.
And after we do that, we just have like a little demo button here. It will generate
a new file. And then sometimes what I will do is I will drop it into GarageBand and use it as a musician for my band. If you're wondering why there's no audio right now,
it's because we're not judging my music skills today. We're talking about JavaScript. Okay.
So, that was cool. It was like almost somewhat intelligent. I wanted to take it one step
further. So, I wanted to give the machine a little bit of motivation with applause.
So, depending on how much you applaud it, the machine would then generate a new temperature.
Here we go. The machine would generate a new temperature based on the average amplitude over a couple-second period of time. I wanted more context, to better fit our definition of a somewhat intelligent machine. So, I literally injected more context into it. This is pretty simple. I'm just getting the user's microphone, and I have this little method here called analyzeSound. I'm going to use createScriptProcessor and just take the average volume over a couple seconds. Okay. And against my better judgment, we're gonna
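A hedged sketch of that applause-to-temperature idea. The averaging helper and the linear zero-to-two mapping are my own assumptions, not the talk's exact code, and the microphone wiring is browser-only, so it's shown in comments; names like `amplitudeToTemperature` are made up for illustration.

```javascript
// Average absolute amplitude of one buffer of samples (values in [-1, 1]).
function averageAmplitude(samples) {
  let sum = 0;
  for (const s of samples) sum += Math.abs(s);
  return samples.length ? sum / samples.length : 0;
}

// Map a normalized amplitude in [0, 1] onto the 0-2 temperature range that
// continueSequence accepts: louder applause means a higher temperature.
function amplitudeToTemperature(amplitude, maxTemperature = 2) {
  const clamped = Math.min(Math.max(amplitude, 0), 1);
  return clamped * maxTemperature;
}

// Browser-only wiring (sketch): grab the mic and feed buffers through the
// helpers above, collecting averages over a couple of seconds.
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const ctx = new AudioContext();
//   const processor = ctx.createScriptProcessor(4096, 1, 1);
//   ctx.createMediaStreamSource(stream).connect(processor);
//   processor.connect(ctx.destination);
//   processor.onaudioprocess = (e) =>
//     amplitudes.push(averageAmplitude(e.inputBuffer.getChannelData(0)));
```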
do a live demo. Okay. So, this is the drumbeat. That's the normal drum beat we programmed
in. Then we can generate one. Drop it down a little bit. So, this is like pretty cool.
Generating a new one every time. Okay. Now I need your help. So, I've created
this little perform feature. When I click the button, it's going to wait for applause
for a couple seconds and then it's gonna take that average amplitude over that period of
time, decide on what temperature to play, and then generate the beat based on that.
I promise I'm not trying to manufacture applause for myself. Maybe a little bit. Okay. So,
let's try this. On the count of three, be, like, nice, but loud. On the count of three, start applauding. I'm going to hit the button right after you start applauding, and then we'll see what happens. Live demos always work, so this should be great.
On the count of three, one, two, three. [ Applause ]
Yeah! All right. So, that's that. It actually goes from, like, zero to two. It went up pretty high. It still is morning, but you're being considerate. I'm fine with that. Cool. So, that's that.
That is our somewhat intelligent machine. So, did we use context? I think so. We put
in our drumbeat. We took applause. We told it the steps we wanted. It definitely rapidly
completed something that we couldn't do on our own, right? We can generate like a dozen
or so drumbeats just in a couple seconds. So, I think we did it. Other people have created
some really cool things. This is called a neural computer. You usually play a couple notes, an arpeggio, and it bounces back and forth. But this takes the temperature into effect. It uses the ImprovRNN model from Magenta. I really like it. The Magenta team created kind of like what we just did, but inside of Ableton. If you use it, you can do what we just did and generate right inside. And the Flaming Lips and Magenta actually created this thing called Fruit Genie.
And it was fruit, but you would touch, like, an orange, and it would feed the model. And then they created these giant pool-toy-type things that had sensors on them, and threw them into the audience and asked people to feed into the same model and create this, like,
melody. This is a little clip of what that looked like.
>> Written this song especially for tonight's occasion.
THOMAS: So, they threw these things out into the audience, and you could hear it in the melody, like, cycle back.
>> Apple.
THOMAS: So, all of these things, all the stuff
we just talked about. All of it was just Tone.js and Magenta and we created our own as well.
We used a couple other previous inventions, sure. But that was kind of the point, right?
Combining these simple machines to kind of create something more complex. We didn't reinvent
the wheel, by any means. We didn't have to. We just created something a little bit smarter
than it was before with the tools that we had at our disposal. I think we can keep doing
this. We can keep like flipping our tools and creating things that are new and useful
for people and helpful and interesting. And hopefully the inventions that we piece together,
the sum will be greater than its parts. This is such an exciting time to be building stuff.
And I can't wait to see what we all build next. So, thank you.
[ Applause ] KATIE: Wow. Oh, my gosh, all right, I'm gonna
gush for a second about the Flaming Lips. They're one of my favorite bands. I've seen
them live four or five times. If you haven't seen them, even if you don't particularly
love their music, it's an amazing experience. You should go and do it. I'm going to stop
gushing about the Flaming Lips and now I'm going to gush about Thomas. That was really
cool and I really love his message that, you know, like he's not some kind of crazy genius.
He's just like a person who is really into music and really wanted to try something cool.
And that we all could do this with JavaScript. It's like amazing, right? Anyway, so, coming
up next we have Sophia Shoemaker, who is going to talk about building a PWA that had to work off the grid in an African country, I can't remember which one, but we need to be
back here at 10:30 for that. So, you have a couple minutes to go out and switch rooms
if you want. But you shouldn't. You should stay here. All right. Thanks, everybody.
[ Applause ]