(bell dinging)

- Hello, and welcome to another Beginner's Guide to Machine Learning with ml5.js in JavaScript. So I'm here. It's been a while since I added a video to this playlist, and a bunch of things about the ml5 library itself have changed. There's a new release, 0.3.1, and there is a brand new website, which you can find right here at ml5js.org. So to some extent, this video is really an update about the library, but I'm also going to look at one particular new feature of the library: sound classification.

The machine learning model that I'm gonna use in this video is the Speech Command Recognizer, and this is a model available from Google as part of TensorFlow.js models. Now, this is a really important distinction: I am not here to train a sound classifier. I might do that in a future video and show you how to apply transfer learning, which is something I did with images, to sounds as well. I'm just gonna make use of a freely available, pre-trained machine learning model.

Anytime you use one of those things, even in just a playful and experimental way, which is what I'm doing, it's good to do a little bit of research and ask: well, how was this trained? What was the data? What are the considerations around how the data was collected? So I encourage you to read through the README here on GitHub and, in particular, to click over and read the original paper about this speech commands model. There you'll see it talks about some of the datasets, like Mozilla's Common Voice dataset, 500 hours from 20,000 different people, and LibriSpeech, 1,000 hours of read English speech. And this one, I don't know how to say it, TIDY DIGITS, TIDIGITS, T DIGITS? 25,000 digit sequences, which apparently is just, like, hours and hours of people reading random numbers over and over again. So I encourage you to check out this paper, and you can also find code for how to use this model in the tfjs-models GitHub repo itself.

I also want to interrupt this video for a second to talk about how the sound classifier actually works. This is kind of a surprising little tidbit, and I'll come back to this more if at some point I create a video about training your own sound classifier. Now, there are different ways you could do this. This isn't the only way to make a sound classifier, but this is the way that this particular model works. It's actually, shockingly, amazingly, doing image classification.

So imagine we have this thing that's called a convolutional neural network. This is the underlying architecture, the structure of the machine learning model that does the classification. Typically this kind of model is something that we would put images into. Like we might have images of cats. We might have an image of a turtle. That's not really a turtle, but whatever. So the idea is that we're sending these images in and getting back a label and maybe a confidence score. It's the same idea here. The only thing is now we wanna send in audio and get back a label, like up or one, and a confidence score. So how would we convert sound into an image?
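(Before answering that, it may help to see the image half of the analogy in code. This is a minimal sketch, not anything built in this video: it assumes ml5's pre-trained MobileNet image classifier and a placeholder image file, 'cat.jpg', just to show the "images in, label and confidence out" pattern.)

```javascript
// Minimal image-classification sketch in p5 + ml5.
// 'cat.jpg' is a hypothetical placeholder image in the sketch folder.
let classifier;
let img;

function preload() {
  classifier = ml5.imageClassifier('MobileNet'); // pre-trained model
  img = loadImage('cat.jpg');
}

function setup() {
  createCanvas(400, 400);
  image(img, 0, 0, width, height);
  // Images go in, a label and a confidence score come back.
  classifier.classify(img, gotResult);
}

function gotResult(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  // results is an array like [{ label: 'tabby cat', confidence: 0.92 }, ...]
  console.log(results[0].label, results[0].confidence);
}
```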
Now, again, there are other neural network architectures which could receive sound data in a more direct fashion, but if you have ever looked at a graphic equalizer or some type of sound visualization system, and I've made examples like this in p5, you can draw something that's often referred to as a spectrogram, which is basically a graph of the amplitudes of the various frequencies, the wave patterns of the sound itself. So if we took a one-second spectrogram and made that into an image, we could then send that image into a convolutional neural network, saying: that's the image produced from the spectrogram of somebody saying the word, up. So underneath the hood, this machine learning system, even though it's designed to work with audio data, first takes that audio data, converts it into an image, and then sends it through a very similar type of neural network architecture to standard image classification models. You can read more about that in the paper itself.

However, I'm gonna show you how to access this model in a quick way with the ml5 library, as it stands today, which is, I dunno, what's today's date? June 13th, 2019 (laughing). So I'm gonna click here under Reference. One thing you should see: a lot of new features have been added to the ml5 library. I'm gonna come back and do videos about more of those, but the one I wanna highlight is the sound classifier. So I'm gonna click on this, and for all of the different functions available in ml5, you'll find a documentation page with some narrative documentation, a little bit of a code snippet, and then some written documentation about what the function names are and the various parameters and things like that. And by the way, I'm noticing now (laughing), this is like a mistake (laughing): this is documentation that's actually for either BodyPix or maybe the U-Net model, which does something called image segmentation. So we gotta get that fixed. I'm sure many GitHub issues and fixes will be out and done by the time you see this.

So in case you've forgotten how to use the ml5 library, I'm just gonna show you as it's documented on the ml5 webpage. First of all, you can go here to this Quickstart. You can actually just click on "open p5 web editor sketch with ml5js added." You know what, that's the way I'm gonna do it. But you also could just put a script tag in your HTML page referencing the current version of the library, which, as I said, is 0.3.1 as of today, but probably while you're watching this, it will be a higher number. So lemme go and just open up this link here, and now I'm in the p5 web editor. You can see the name of the sketch is ml5js boilerplate. Thank you, Joey Lee, who's a contributor to ml5. He's done a ton of work on the website and all of the different features. And oh, this should actually be 0.3.1. I'm gonna fix that, uh-huh. I'm gonna hit save, and then I'm gonna rename it to sound classifier. And then I am going to go over here to sketch.js, and I'm gonna run this, and we should see. There we go. So now we know it's working, because there's a little console.log to log ml5.version. If I hadn't imported the ml5 library, I wouldn't see that, and we see it here. So, what are we gonna do? Let's load the sound classifier.
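(As a quick aside, here's a rough sketch of the spectrogram idea itself, using the browser's built-in Web Audio API. This is only to illustrate the concept of turning sound into a grid of numbers; it is not how the Speech Commands model does it internally, and the frame and bin counts here are arbitrary choices.)

```javascript
// Conceptual sketch: capture ~1 second of microphone audio as a 2D array of
// frequency amplitudes (a spectrogram), which could be treated like an image.
async function captureSpectrogram() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(stream);
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 512; // gives 256 frequency bins per frame
  source.connect(analyser);

  const frames = [];
  const bins = new Uint8Array(analyser.frequencyBinCount);

  // Grab ~40 frames over one second; each frame is one column of the "image".
  for (let i = 0; i < 40; i++) {
    analyser.getByteFrequencyData(bins);
    frames.push(Array.from(bins));
    await new Promise((resolve) => setTimeout(resolve, 25));
  }
  // A 40 x 256 grid: time on one axis, frequency on the other.
  return frames;
}
```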
Now, most of the models in ml5, and I haven't been using this in my previous videos, are actually available to you in preload now, meaning you don't need a callback function. You can just load the model in preload, and it'll be ready by the time you get to setup. So I'm gonna make a variable called soundClassifier. In preload, I'm gonna say soundClassifier equals ml5.soundClassifier. Now, I need to tell it what model I want to load. So I need to, in here, put the name of the model I wanna load, and in theory, in the future, there might be a bunch of different options, different kinds of sound classifiers, or maybe a sound classifier you've trained yourself that you wanna put in there, and I'll come back eventually and show you videos about how to do that. But for right now, I'm just gonna say SpeechCommands, and then, I already forgot what it was called. So I'm gonna go back to the ml5 website, which is here. I'm gonna go to Reference. I'm gonna go to soundClassifier, and I'm looking for it here. So it's SpeechCommands18w. This is a particular model that's been trained on 18 specific words, and you can see what those are: the ten digits from zero to nine, plus up, down, left, right, go, stop, yes, and no. That's 18: ten digits, eight other words.

All right, so it was 18w, and then, once that model is loaded, I need a callback. So I could just say soundClassifier.classify, and I might just call the callback gotResults. Oh, it's not defined, right? So I'm telling the sound classifier to classify. Now, by default, it's just going to listen to the microphone's audio. Maybe in the future, ml5 will offer hooks to connect it to a different audio source, but right now it's basically just gonna work with the microphone's audio. Then I can write a function called gotResults, and I'm gonna get rid of the draw loop 'cause I don't need that right now. Lemme just turn off auto-refresh so that it doesn't keep refreshing.

And now, if you remember, ml5 employs error-first callbacks, meaning the callback function requires two arguments: an error argument, in case something went wrong, and a data or results or some other argument where the actual stuff is. So I'm gonna say error, and then I'm gonna say results. And then I can do a little basic error handling. I'm just gonna say console.log, something went wrong, and then I can also actually log the error itself, all right. And then I'm gonna say console.log(results). So let's see if we get anything. Oh, I have to run it again. And you can ignore this error. Oh, (gasping) something came in! Ready? Up.

I just wanna stop and mention that if you're following along, hopefully your browser is asking for permission to use the microphone. The reason why that didn't happen here in this video is because I've already set my browser to allow use of the microphone on the p5 Web Editor pages, but for security, you can't just access anybody's microphone from a webpage without the user giving permission. So hopefully you saw that happen, and if you didn't, that might be why you run into an error, if you haven't given that permission. This is getting a little hard to debug, just because so much stuff is happening here on the console, and these huge arrays, but there's actually something that I missed that I could add here, which is an options variable.
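(Before adding that options variable, here's roughly what the sketch looks like at this point. The model name and the error-first callback shape come straight from the ml5 documentation; the rest is minimal p5 scaffolding.)

```javascript
let soundClassifier;

function preload() {
  // Load the pre-trained Speech Commands model (18 words) via ml5.
  soundClassifier = ml5.soundClassifier('SpeechCommands18w');
}

function setup() {
  createCanvas(400, 400);
  // By default this listens to the microphone and fires the callback
  // whenever the model thinks it heard one of the 18 words.
  soundClassifier.classify(gotResults);
}

// ml5 uses error-first callbacks: error is null unless something went wrong.
function gotResults(error, results) {
  if (error) {
    console.log('Something went wrong!');
    console.error(error);
    return;
  }
  console.log(results);
}
```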
So there are a lot of things I can set as properties or parameters for how the sound classifier should work, but there's a very simple one, which I'm gonna look up in the documentation 'cause I don't remember. It's called probabilityThreshold. I'm actually just gonna copy-paste this here. What this means is, basically, the sound classifier is going to trigger an event. Right now I'm console logging all of this information about what it thinks it heard, based on a confidence level for how sure it is it heard one of those keywords. And right now, a lot of those events are triggering, because I don't know what the default probability threshold is. Maybe it's .7, maybe it's .5, but I'm gonna make that really high. I'm gonna say .95, so the machine learning model has to calculate a 95% confidence score before it gives the event back to me in ml5. Once I've created that options variable with .95, I need to pass it into the constructor as the second argument. So now we pass it in there. I'm gonna run the sketch. I'm gonna say the keyword, up, and then I'm gonna try to look into the console to see if that's what came in. Up. And there we go. Look at that! Now other stuff is coming in, but you saw it there!

So rather than kind of debug with the console, let me actually put what I said onto the webpage itself. Also, to make this easier to see, rather than have this big array logging in the console, let me console.log just results index zero's label and results index zero's, I believe it's called, confidence. All right, we need to have a 95% confidence, and I'm gonna run this. Up. Three, four, five, six, seven, eight.

I'm quickly adding a white background color to the HTML body, because what I wanna do then, to finish this off, is just add a DOM element using the p5 DOM library. I'm gonna say resultP, for results paragraph. I'm gonna say resultP equals createP, waiting, and then I'm gonna say resultP.html. Then I can turn these results into a string by using a template literal. So: backtick, then dollar sign and curly brackets for the label, put a colon here, dollar sign and curly brackets for the confidence, and a closing backtick, okay. And let me also say resultP.style, is it font size? font-size, just 32 point, so we'll be able to see it. All right, here we go. Ready for this? One, two, five, up, down, left, right.

Okay, so (clapping), you can imagine now what you could do with this. For example, you could control a game with your voice. And in fact, I'm gonna do that in one of my coding challenge videos. So take a look in this video's description. I'm gonna do a coding challenge where I program the Google Dinosaur game, and then I'm gonna add this sound classifier to have the dinosaur jump, except it won't be a dinosaur, it'll be a unicorn, to have the unicorn jump when I say the keyword, up. All right, thanks for watching this additional ml5 tutorial video about sound classification in the browser. The complete sketch, as best it can be reconstructed from this video, appears below.

(bell dinging) (energetic dance music) (bell dinging)
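(For reference, here is roughly where the finished sketch ends up. This is a reconstruction from the steps described above, so treat details like the exact paragraph text as approximate.)

```javascript
let soundClassifier;
let resultP; // paragraph element that displays the latest result

function preload() {
  // Only report a result when the model is at least 95% confident;
  // options go in as the second argument to the constructor.
  const options = { probabilityThreshold: 0.95 };
  soundClassifier = ml5.soundClassifier('SpeechCommands18w', options);
}

function setup() {
  noCanvas(); // nothing is drawn; the output lives in a DOM element
  soundClassifier.classify(gotResults);
  resultP = createP('waiting...');
  resultP.style('font-size', '32pt');
}

function gotResults(error, results) {
  if (error) {
    console.log('Something went wrong!');
    console.error(error);
    return;
  }
  // Template literal: label and confidence as one string on the page.
  resultP.html(`${results[0].label}: ${results[0].confidence}`);
}
```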