Deep Q Learning w/ DQN - Reinforcement Learning p.5


  • What's going on everybody, and welcome to part five of the reinforcement learning series. In this video and subsequent videos, we're going to be talking about deep Q-learning, or DQNs: deep Q-networks.

  • To start, the prerequisites: if you don't know deep learning, stop, pump the brakes, you've got to learn deep learning first.

  • If you want, go to the home page of pythonprogramming.net, click on "Machine Learning", and then do the "Deep Learning basics with Python, TensorFlow and Keras" series. Do at least the first two tutorials, I want to say, through getting started and loading in your own data; probably the first three, especially since we're using convnets.

  • Do the first three and then come back to this, because otherwise you're going to be really lost.

  • Okay. Deep Q-learning basically got its start with the following paper, "Human-level control through deep reinforcement learning." If you've ever looked up reinforcement learning, first you found that all the tutorials suck (hopefully not this one), and then you've seen the following image.

  • So this is how the deep Q-network is going to learn.

  • What's going to happen is you've got this input; in this case, it's an image.

  • You've got some convolutional layers. They don't have to be convolutional layers or fully connected layers; you don't have to have those, but you've got some sort of deep neural network. Then you've got your output layer, and your output layer is going to map directly to the various actions you could take, and it's going to do so with a linear output.

  • So it's a regression model with many outputs, not just one.

  • Now, some people did try to do just one output each, so basically a model per possible action.

  • That doesn't really work well, and it's going to take a really long time to train.

  • So anyway, here's another example. It's beautiful, really beautiful.

  • You've got input values. Again, it doesn't have to be an image; this could be the delta x, delta y to the food and the delta x, delta y to the enemy, for example.

  • That could be your input. Boom.

  • Then you've got your hidden layers. Again, they could be convolutional, they could be dense, they could be recurrent, whatever you want there.

  • And then here you've got your output, again with the linear activation, so it's just going to output these scalar values.

  • Well, these are your Q values. They map directly to the various actions you could take.

  • So in this case, let's say this was your output. You would take the argmax. Well, the max value here is 9.5, and if we were to map these outputs to index values we'd get 0, 1, 2, 3, so the argmax would be 1.

  • So whatever action 1 is, that would be the move we would make.
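
As a quick illustration of that action selection, here is a minimal sketch; the Q values are just the hypothetical ones from the example above.

```python
import numpy as np

# hypothetical Q values output by the network, one per action
qs = np.array([3.1, 9.5, -1.2, 0.7])

action = np.argmax(qs)  # index 1, since 9.5 is the largest value, so we take "action 1"
```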

  • Okay, so we've replaced the Q-table with a deep neural network.

  • The benefit here is we can learn more complex environments. First of all, we can learn more complex environments just because a deep neural network is capable of actually mapping them.

  • Also, a deep neural network can handle states it's never seen before. With Q-learning, if a certain scenario presented itself that was outside of any of the discrete combinations we'd ever seen, well, it's going to take a random action, whatever got initialized randomly. A deep neural network, on the other hand, is not stuck like that: it can recognize things that are similar, even though it's never seen this exact thing before, and it can act accordingly.

  • So first of all, a deep neural network is going to do better in that case, and in that way it can solve far more complex environments.

  • But also, as we saw, as you even barely increase the discrete size of your Q-table, the amount of memory that's required to maintain that table just explodes, right?

  • And that's both in terms of your observation space size (or your discrete observation space size) and in your action space.

  • So in our case, up to this point, we've only had four actions.

  • I'd like to introduce a few more actions moving forward.

  • So what we're going to do is keep the diagonal moves, but also allow the straight cardinal moves (up, down, left, right), as well as not moving at all.

  • So we're going to introduce all of those as well, anyway.

  • So for those two reasons, neural networks are way better.

  • The downside is neural networks are kind of finicky, so we're going to have to handle a lot of things that are finicky about neural networks.

  • Also, it's going to take a lot longer to train.

  • So on an identical environment, like the blob env, where it took our Q-table minutes to fully populate: that's basically just a brute force operation, so if the space is small enough your CPU can handle it. Where it took minutes for a Q-table, it's going to take hours for deep Q-learning.

  • But the benefit here is that for certain environments where it takes a long time, like weeks, for deep Q-learning to learn, it would require, you know, petabytes of memory for a Q-table to figure it out.

  • Okay, so that's that. There's your difference.

  • So really, they're going to solve different types of environments, and honestly, plain Q-learning is pretty much useless.

  • You can use it for cool, novel little niche things for sure, but you're not going to find too much use for it in the real world, I don't think. Whereas with deep Q-learning, you can actually start to apply it to really cool things.

  • So anyway, enough jibber jabber. The last concept I want to cover before we get into actually writing code is right here: this learned value change.

  • So before, the new Q function was basically this whole thing, whereas the neural network kind of solves for this, like the learning rate and all that, and just updating values; that part is handled through backpropagation and fancy things.

  • But we still want to retain the discounted future value term, because neural networks don't care about the future. They don't give a heck about the future; they care about right now: what is this thing, what is this exact value? They don't care about what this chain of events does (other than a recurrent neural network, I suppose, but really a recurrent neural network cares about the history, not the future).

  • So anyway, in this case, we still want to retain this term, so we are still going to use it. Basically, the way we're going to do this is: every step the agent takes, we still need to update a Q value.

  • So what we're going to do is query for the Q values, take that action (or a random one, depending on epsilon), then re-sample the environment, figure out what the next reward would be, and then we can calculate this new Q value and do a fit operation.
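
To make that concrete, here is a minimal sketch of the per-step target calculation just described. DISCOUNT is a hypothetical constant standing in for the gamma from the Q-learning formula; the learning-rate part of the old formula is handled by backpropagation during the fit.

```python
DISCOUNT = 0.99  # hypothetical value for gamma

def new_q_value(reward, future_qs, done):
    # keep the discounted max-future-Q term from the Q-learning formula;
    # if the episode ended, there is no future, so the target is just the reward
    if done:
        return reward
    return reward + DISCOUNT * max(future_qs)
```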

  • So people who are familiar with neural networks are already thinking, "Wow, that's a lot of fits."

  • Yep, it sure is.

  • Also, that's one fit at a time.

  • So as you're going to see when we go to write this code, we actually have to handle for that as well, because that would make for a very unstable neural network.

  • So there are two ways that we're handling for that. But with that, I think we're just going to jump into the code. I think it'll make more sense coding it and covering it as we get to those points.

  • Okay, so hopefully that was enough information.

  • Like I said, all the tutorials I have ever seen on deep Q-learning have been terrible, and there's, like, so much information that's left out.

  • To get the overarching concept, honestly, this picture is enough. But then when it comes time to actually look at code and how it really works: can you sit down and code it after you've read someone's tutorial? My hope is that you really can after this one. Otherwise, I don't think a tutorial exists for doing it.

  • So anyway, let's get started.

  • The first thing we're going to do is create, or at least hopefully code, our agent, or at least the model, and talk about some of the parameters.

  • And then probably in the next tutorial we'll do the fit method and training, basically, and all that.

  • So anyway: class DQNAgent. And for this agent, let's just do def create_model first.

  • That should be fairly basic. It's just going to be a convnet. We don't have to do a convnet; I'm just doing one so you can more easily translate this to something else.

  • The first thing you should do any time you learn something is go try to apply it to something else. So try it, because it might all make sense to you as you're listening to me talk about it, but then you go to try to apply it, and suddenly it doesn't work, you're confused, or you realize, "Oh, I don't really actually know how to do this."

  • So anyway, first thing you should do: try to apply it.

  • Someone complained recently about my drinking on camera, saying they don't want to hear me gulp. I'm drinking because my mouth is dry, and the most annoying thing ever is listening to someone talk with a dry mouth.

  • So, uh, you're welcome, jerk.

  • So anyway, create_model. Okay, so we'll start with model equals a Sequential model.

  • And now let's go ahead and pick some imports.

  • So the first thing I need to bring in is: from keras.models import Sequential.

  • And then we're going to say: from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Activation, Flatten.

  • I think that's everything. Sorry, that ran over my face, my bad.

  • Anyway, there we go: Dense, Dropout, Conv2D, MaxPool2D (and actually it's MaxPooling2D), Activation, and Flatten.

  • Okay, those are all of the imports.

  • Also, you can always go to the text-based version of the tutorial if code runs over my ugly mug.

  • You can check those out at... actually, I don't even know; maybe at the very end it'll be there. By the time you need it, it'll be there.

  • Anyway, okay, so we've got that stuff.

  • The other thing I'll go ahead and import is: from keras.callbacks import TensorBoard.

  • We need other things, but I want to cover them when we get there.

  • So model is equal to a Sequential model, and then we're going to say model.add, and we're going to start adding. Conv2D needs a capital D; let me fix that as well. Cool.

  • Conv2D, which will give us 256 convolutions, and the convolution window will be 3 by 3, and then input_shape, and this is going to be equal to env.OBSERVATION_SPACE_VALUES, and then close this off.

  • This doesn't quite exist yet; we have to create the environment.

  • I'm a little uncertain if I'm actually going to rewrite the environment on camera. It's converted to object-oriented programming now, so we might just copy and paste the updated environment and I'll talk through it, because otherwise it'll take, like, an hour to go through that.

  • So anyway, this will exist at some point in the near future.

  • Then, once we've done that, let me zoom out a little bit since we're running out of space here. The next thing we're going to do is model.add; that sets our conv layer. We're going to add an Activation, and the activation here will be rectified linear (relu), and then model.add MaxPooling2D, and we'll use a 2 by 2 window. Again, if you don't know what max pooling is or convolutions are, check out that basics tutorial, because I cover it there, and I also have beautifully drawn pictures.

  • If you really liked my other picture, you'll love those. Then after the max pooling, we'll just do model.add and add a Dropout layer, and we'll drop out 20%, and then we're just going to do the same thing again.

  • So this will be a 2 by 256 convnet. So copy-paste; we don't need to include the environment shape, the input_shape rather, the second time through.

  • Okay, so 2 by 256. Then we're going to say model.add, and we're going to do a Flatten here so we can pass it through dense layers.

  • And then we'll say model.add and throw in a Dense 64 layer, and then finally model.add a Dense layer, and it'll be env.ACTION_SPACE_SIZE, and the activation will be linear.

  • And then model.compile: we'll say the loss is mse for mean squared error, the optimizer will be the Adam optimizer with a learning rate of 0.001, and for metrics we will track accuracy.

  • Okay, so that is our model. Again, there will be a link in the description to sample code, so if you missed anything, you can check that out.

  • Okay, so then Adam — we don't actually have Adam imported, so let's go ahead and grab that as well: from keras.optimizers import Adam.

  • Just for the record, too, if anybody's watching this in the future: this is still TensorFlow, like, 1.15 or so (I don't actually know the exact version I'm on), but it's not TensorFlow 2 yet, so keep that in mind; something might change by then.

  • And if it has changed, check the comments.

  • And then when it finally, actually, truly matters what version of TensorFlow I'm on, I will let you guys know.

  • It's also kind of expected that you guys know how to install TensorFlow and Keras; I'm not going to cover that again. For that, go to the basics tutorial.

  • So anyway, model.compile. Okay, and then we will return that model.
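
Putting the dictated pieces together, here is a minimal sketch of the create_model method as described. It assumes an env object exposing OBSERVATION_SPACE_VALUES and ACTION_SPACE_SIZE, which comes from the blob environment that hasn't been shown yet in this part.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Activation, Flatten
from keras.optimizers import Adam

def create_model(self):
    model = Sequential()

    # 256-filter conv layer with a 3x3 window; input shape comes from the environment
    model.add(Conv2D(256, (3, 3), input_shape=env.OBSERVATION_SPACE_VALUES))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    # second identical block; no input_shape needed this time
    model.add(Conv2D(256, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    model.add(Flatten())
    model.add(Dense(64))

    # linear output layer: one Q value per possible action
    model.add(Dense(env.ACTION_SPACE_SIZE, activation='linear'))
    model.compile(loss="mse", optimizer=Adam(lr=0.001), metrics=['accuracy'])
    return model
```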

  • Okay, so that's our model. Then: def __init__. So now we're going to do the __init__ method for this agent.

  • So we're going to say self.model equals create_model()... and then, what is your problem? Why are you unhappy? What the heck am I missing? Oh, self.create_model. More coffee, definitely necessary.

  • Okay. self.model is self.create_model().

  • So that's going to be our main model. But then we need what's going to be called our target model. Let's write that first.

  • So self.target_model equals self.create_model().

  • But then we're going to say self.target_model.set_weights, and we want to set the weights to be exactly the same as self.model.get_weights().

  • Okay, so what's going on here? We're going to wind up having two models here. There are a few reasons why we want to do that, but mostly it's because this model is going to be going crazy.

  • So first of all, the model itself is initialized randomly, as all neural networks are. But we're also going to initialize with an epsilon likely of one, so the agent is also going to be taking random, meaningless actions.

  • So initially, this model is going to be trying to fit to a whole bunch of randomness, and that's going to be useless. But eventually, as it's explored and as it's gotten rewards, it's going to start to, hopefully, figure something out.

  • But the problem is, we're doing a .predict for every single step this agent takes, and what we want to have is some sort of consistency in those .predicts that we're doing, because besides doing a .predict every single step, we're also doing a .fit every single step.

  • So this model, especially initially, is just going to be all over the place as it's attempting to figure things out randomly.

  • So the way we're going to compensate for that is we're going to have two models. We've got self.model: this is the model that we're .fitting every step. And then we have self.target_model: this is the one that we do .predict against every step, while self.model is the one that gets trained every step. Make note of that, because you'll forget.

  • So then what happens is, after every some number of steps or episodes or whatever, you'll re-update your target model: you'll just set its weights to be equal to the main model's weights again.

  • So this just keeps a little bit of sanity in your predictions, right? In the predictions that you're making, this is how you'll have a little bit of stability and consistency, so your model can actually learn something, because it's just going to be so random. It's going to be very challenging initially, so long as you're doing things randomly.

  • Okay, so that's one way we are handling for the chaos that is about to ensue.
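
A minimal sketch of that two-model setup inside the agent's __init__ (the rest of __init__, replay memory and TensorBoard, gets added below); it assumes the create_model method sketched earlier.

```python
class DQNAgent:
    def __init__(self):
        # main model: this is the one that gets .fit() every single step
        self.model = self.create_model()

        # target model: this is the one we .predict() against every step;
        # its weights are only re-synced from the main model every so often
        self.target_model = self.create_model()
        self.target_model.set_weights(self.model.get_weights())
```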

  • Next, we're going to have self.replay_memory, and that is going to be a deque (I always forget how to pronounce that). To use that, we're going to say: from collections import deque.

  • If you don't know what a deque is, it's a set-length collection; think of it as, like, an array or a list where you can say, "I want this list to have a max size."

  • So we're going to say maxlen equals REPLAY_MEMORY_SIZE. Let's go ahead and just set that real quick. We'll just say, boom, and we'll set it to 50,000.

  • Also, a cool trick I recently learned: you can use underscores in place of commas, so Python sees this as 50000, but it's a little more human-readable; a plain run of digits is a little harder to read. Especially once you have a really long number, the underscores suddenly become very useful.

  • So, cool. That's our replay memory size.
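
A quick sketch of the deque and the underscore digit-grouping trick just described:

```python
from collections import deque

REPLAY_MEMORY_SIZE = 50_000  # underscores are only for readability; Python reads this as 50000

replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)
replay_memory.append("some transition")
# once the deque is full, appending a new item silently drops the oldest one
```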

  • So what the heck is this? The replay memory.

  • So we talked about how we keep the predictions — how we keep the agent taking consistent-ish steps, so that at least the agent is taking steps consistent with the model over time.

  • Okay, I'm trying to find a great way to word that, but we've got the prediction consistency sort of under control.

  • It's still going to be crazy at the start, for a period, but let's say we've got that settled.

  • Now we need to handle for the fitting craziness that is going to ensue, because, like we said, we've got two models, so we've slowed down one way of working with the model, but the other one is still going to be crazy: self.model is still going to get a .fit every single step.

  • But not only is it getting a .fit every single step, it's getting a .fit of one value every single step.

  • So again, assuming you're not new to neural networks, you know that we train neural networks with a batch, and part of the reason why we do that is, one, it's quicker, but it also tends to have better results.

  • Not necessarily "the bigger the batch the better"; at some point the batch size gets too big. But having some sort of batch size generally results in a more stable model and better learning over time.

  • So if you just fit with one sample, it's going to adjust all the weights in accordance with that one sample, then it's going to get the next sample, do another fit, and adjust all the weights in accordance with that one. Whereas if you throw in a batch of 64 samples, it's going to adjust all the weights in accordance with all 64 samples, so it's not going to overfit to one sample, then go to the next sample and overfit to that one, and so on. You know what I'm saying. So we want to handle for that, too, and the way we do that is we have this replay memory.

  • So we take this replay memory, and that can be, let's say, 50,000 steps. Okay, and then we take a random sampling of those 50,000 steps, and that's the batch that we're going to feed off of, so train the neural network off of.

  • So that's how we're getting stability in the training network that's getting updated every single step — hopefully we're smoothing that out a little bit. And then we're smoothing out the predictions, because we're also not allowing the prediction model to be updated every step. Instead, it's being updated, let's say, every five episodes, or whatever we intend to go with. These are all hyperparameters, or really constants, that we're going to set at the top of our program.

  • Okay, replay memory.
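
A minimal sketch of that sampling idea; MINIBATCH_SIZE and MIN_REPLAY_MEMORY_SIZE are hypothetical constants of the kind set at the top of the program.

```python
import random

MIN_REPLAY_MEMORY_SIZE = 1_000  # don't start training until we have some history to sample from
MINIBATCH_SIZE = 64

# instead of fitting on the single latest step, fit on a random batch of past transitions
if len(replay_memory) >= MIN_REPLAY_MEMORY_SIZE:
    minibatch = random.sample(replay_memory, MINIBATCH_SIZE)
```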

  • Okay, so the next thing is going to be self.tensorboard, and that's going to be equal to ModifiedTensorBoard, where we're going to set log_dir to be "logs/" followed by MODEL_NAME, a dash, and int(time.time()). We're going to make this an f-string, and we'll need to import time.

  • If you're on, say, Python 3.5 or older, you can't do f-strings; you'll have to use .format or whatever.

  • The timestamp is just so we can kind of keep track of what's what.

  • And then the next thing I want to do — I think the editor was getting pissy at me about this — anyway, let's go ahead and set MODEL_NAME before I forget. For now, set it to whatever you want; I'm going to set it to "256x2". Later you might try 32x2 or 64x2.

  • Anyway, by 256x2 I mean the neural network: it's a 2 by 256 convnet.

  • So we've got TensorBoard done.
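
A sketch of those two lines (inside the agent they are assigned to self.tensorboard), assuming the ModifiedTensorBoard class discussed next; f-strings need Python 3.6+.

```python
import time

MODEL_NAME = "256x2"  # 2 x 256 convnet; swap in "32x2", "64x2", etc. for other runs

# one uniquely named log directory per run, assuming ModifiedTensorBoard is defined below
tensorboard = ModifiedTensorBoard(log_dir=f"logs/{MODEL_NAME}-{int(time.time())}")
```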

  • Obviously, ModifiedTensorBoard doesn't exist yet; I'll talk about that in a moment. And then, finally, self.target_update_counter equals zero.

  • So we're going to use this to track, internally, when we're ready to update that target model — you know, this thing here that we've explained why we want to do.

  • So the next thing we want — I'm not going to write this out, so go to the text-based version of the tutorial, and we're going to grab this updated tensorboard class. Did we already pull in TensorBoard? We did, but what I really want is this right here, this whole class — oops, I want that comment too. I'm going to copy that.

  • I don't really see any benefit in us writing that out, but I'll explain why we want it here in a moment.

  • Paste that in. Cool.

  • So what this is doing is just modifying the TensorBoard functionality from TensorFlow and Keras.

  • By default, Keras wants to update this TensorBoard log file every time we do a .fit. Well, we're doing a .fit, you know, up to about 200 times an episode, and then we're going to do, like, 25,000 episodes.

  • So, first of all — I'm sorry, I think I used the words "update the log file," but it actually wants to create a new log file per .fit. Well, we don't want that; we want just one log file.

  • So having this class is going to solve quite a few things. It was written by Daniel, and basically, one, the I/O was taking a lot of time — just writing the new file was sometimes taking longer than the .fit itself — but also we'd end up with all these extra files, each one 200 kilobytes. Not necessary. It was just not constructive to do this.

  • I guess that's the default behavior because most neural networks are just going to fit once on some sort of cycle, and it's not a big deal. But for deep Q-learning, where you're fitting constantly, something needed to be done.

  • Okay, so that is our modified tensorboard.
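
For reference (rather than authoritative code), the copied class looks roughly like the sketch below. It assumes TensorFlow 1.x-era standalone Keras, where the stock TensorBoard callback exposes a writer and a _write_logs helper; the point is simply to keep one writer for the whole run and stop Keras from opening a new log file on every .fit.

```python
import tensorflow as tf
from keras.callbacks import TensorBoard

class ModifiedTensorBoard(TensorBoard):

    # keep our own step counter and a single writer for the whole run
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.step = 1
        self.writer = tf.summary.FileWriter(self.log_dir)

    # stop Keras from creating a default writer per model
    def set_model(self, model):
        pass

    # log stats against our own step counter instead of the epoch number
    def on_epoch_end(self, epoch, logs=None):
        self.update_stats(**logs)

    # don't write anything on every batch
    def on_batch_end(self, batch, logs=None):
        pass

    # don't close the writer; "training" ends after every single .fit call
    def on_train_end(self, _):
        pass

    def update_stats(self, **stats):
        self._write_logs(stats, self.step)
```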

  • Now, what we're going to do is go back to our DQNAgent. So that's our __init__ method.

  • Let me just kind of clean this up a little bit; I want these kind of separated out. Yeah, is that really what I want? I'll just leave that. So that's create_model.

  • Let's go ahead.

  • Let me see if I can — hopefully — well, we're not going to finish this, but let's add in one more method. So it's def update_replay_memory, taking self and then a transition. Can we type? Wow. I really want to call that "transition": transition.

  • So then we're going to say self.replay_memory.append(transition).

  • The transition is just going to be your observation space, action, reward, new observation space, and then whether or not it was done. We need that so we can do that new Q formula.

  • So that's all we're doing there. Hopefully it'll make more sense when we actually get to use self.update_replay_memory, or agent.update_replay_memory.
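
As dictated, the method is just a one-line append; a sketch:

```python
def update_replay_memory(self, transition):
    # transition = (current_state, action, reward, new_state, done)
    self.replay_memory.append(transition)
```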

  • Finally, we'll do one last method, and then we'll save train for the next tutorial; that's going to take a while. So: def get_qs.

  • So get_qs will take self, terminal_state, and then step... and then this will return self.model.predict of a numpy array (I think we've imported numpy; I'll grab that soon) of the state, .reshape(-1, and then *state.shape).

  • All that reshape is doing is unpacking the state's shape with a batch dimension in front. And, oh, did I type terminal_state? I sure did. terminal_state is for train; right now we're just passing state here.

  • So it's *state.shape, and then we're going to divide by 255, then predict, and then take index zero. model.predict always returns a list: even though in this case we're really only predicting against one element, it's still going to be a list (an array) of one element, so we still want the zeroth element there.

  • The divide by 255 is because we're passing in RGB image data, so you can very easily normalize that data by simply dividing by 255. Okay, so that's it.

  • Let's go ahead and go to the very top here and import numpy as np: so, import numpy as np. Let me go ahead and save that.
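
Putting that together, a minimal sketch of the get_qs method as dictated; it assumes the state is an RGB image array coming from the environment.

```python
import numpy as np

def get_qs(self, state):
    # add a batch dimension, scale RGB values into [0, 1],
    # and pull the single prediction out of the returned batch
    return self.model.predict(np.array(state).reshape(-1, *state.shape) / 255)[0]
```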

  • And I think I'm going to save the train method for the next tutorial, because it's going to take a while. So, a quick shoutout — not to the recent members, actually; these are my long-term channel members, people who have been sticking around for a while.

  • Reginal Robert, seven months; Louise Fernando, seven months; Stefan, Scoot Harwood, eight months; Ampara Hammer, eight months; William Sands, nine months; and Average Eat, ten months.

  • That's quite a while. Thank you all for your support.

  • And if you guys have any questions, comments, concerns, whatever, feel free to leave them below.

  • If anything up to this point is confusing or you want a little more information, again, comments below, or come join us in the Discord at discord.gg/sentdex.

  • If you think I've done something wrong or said something wrong (I'm sure I have), feel free to let me know.

  • And yeah, I'm hoping that you guys can get through this tutorial series and actually be able to use this stuff. Like I said, I could not find a tutorial or information on how to actually do this kind of stuff that actually made sense or explained everything; there's just a lot of copying and pasting going on.

  • So my hope is that you guys can actually get through this and know how to do this yourself. That's my goal. So if you're having trouble getting to that point, let me know, because I'd like to fix that.

  • So anyway, I will see you guys in the next video, where we will do the train method. We might only do the train method, but hopefully we can also do the environment.

  • I'm thinking I might just copy the environment over: we made some tweaks to the environment and made it object-oriented, so I might just copy that, because the environment honestly has not changed. We just made it object-oriented to make it easier, so there are little tweaks, and I think I'm just going to talk about the tweaks that were made.

  • That way, hopefully, in the next tutorial we can actually do the train method, talk about that deep Q-learning stuff, and hopefully get a model training by the next video.

  • So I'll think about that. You can leave your opinion below, but probably by the time you've seen this, I will have already made my decision anyway. Till next time.

