Reinforcement Learning Agent - Self-driving cars with Carla and Python p.4

  • What's going on everybody, and welcome to self-driving cars with Python, Carla, and, hopefully, some reinforcement learning.

  • We'll see.

  • Ah, where we left off: we actually built the environment code that we're going to use, sort of the environment layer on top of the Carla server/client code.

  • Basically, this is what translates information to our agent, which is what's going to train our actual model.

  • Now what we want to do today is actually code that agent.

  • But doing that brings up a different problem that we have.

  • And that is: we want to be able to predict and train at the same time, right?

  • That's pretty inherent to reinforcement learning.

  • The problem is, this is a relatively complex environment that requires a decent amount of processing just to display the environment.

  • But then the actual model that we're gonna wind up with is taking in a large amount of data, because it's taking in this image data here.

  • So the model's gonna be large.

  • Okay, there are a lot of weights and a lot of calculations going on.

  • So we've got that as a problem.

  • The other thing is: do we really want to do this in real time?

  • We could do this in synchronous mode with Carla, so it's not running at full simulation speed, right?

  • It's running and waiting for everyone to get their update.

  • And you could say, I want this to run at exactly 60 frames per second, or 42, or whatever.

  • But for what we want, that's not good.

  • So instead, what we have is this challenge where we want to train and predict at the same time on something that isn't going to be super fast.

  • But at the same time, we want this to happen at, ideally, 90 frames a second or 60 frames a second.

  • But we're probably not gonna see anything even remotely close to that.

  • Um, but we want it as good as we can get.

  • So we're gonna end up using threads.

  • So, uh, basically that way we can train and predict at the same time and hopefully not have so much delay in between doing these things.

  • So anyway, that's what we're going to be doing today, or at least starting to do.

  • So what I'm gonna do is.

  • I'm just gonna come down here, uh, and then we're gonna begin some new code.

  • So there is a class DQNAgent, um, and let me just bring this over.

  • So if you haven't done that reinforcement learning series, I strongly recommend you pause now and do that series.

  • If you go to pythonprogramming.net, go to Machine Learning, then click on Reinforcement Learning.

  • Um, and then I would honestly just do the whole thing.

  • It's not really a long series.

  • There are only six parts, but you definitely want to do deep Q-learning and get an idea for how that works, because everything we do here is going to be very similar.

  • Um, just as it's been even up to this point.

  • So definitely check that out, because you're gonna be pretty confused otherwise, but well, anyway, here we go.

  • So, def: we're gonna have our __init__ method, taking self.

  • And we're gonna start off by saying self.model = self.create_model().

  • So we're obviously gonna have a method that creates the model.

  • For now, we're not gonna worry about that.

  • Then we're gonna say self.target_model, and that's gonna be the exact same thing: self.create_model().

  • Then we're gonna say self.target_model.set_weights(), and we're gonna set the weights to be self.model.get_weights().

  • So again, if this line here is confusing to you, go back to that reinforcement learning series.

  • Okay, so we want this every now and then.

  • Basically, the target model's gonna update to the model's weights.

  • So, basically, with reinforcement learning as a reminder, even if you have seen it, you have two models, right?

  • One model is the one that's actually being constantly trained.

  • The other one's the one that we predict against, and we want the one that we predict against to be relatively stable.

  • If we're always updating that model, we're going to get really volatile results, and it's gonna be very hard for this model to actually get, like, coherent results.

  • So instead, what we do is we kind of hold that model steady to predict against.

  • And then we're constantly training the other model, and then after n number of episodes, or even steps or whatever you want to go with (usually it's episodes, though), we're going to update the other model, the one that we're predicting against.

  • So anyway, cool. Target model, done.

  • Okay, cool.

  • So now what we're going to say is self.replay_memory is equal to a deque (or "deck", or "d-q", I don't really know how you pronounce it).

  • And then we're gonna say maxlen equals, oops, and then that's REPLAY_MEMORY_SIZE.

  • So So again, if you don't know what that is, go back to that course.

  • Basically, this is the memory of previous actions.

  • And again, we pick a random sample of those to help with just crazy volatility and to actually keep things relatively sane, or at least attempt to.

  • So, um, let's go ahead and make these two things since we're using them.

  • So let's go to the top.

  • Let's say from collections import deque. Nice.

  • Got it.

  • And then we're gonna say REPLAY_MEMORY_SIZE.

  • We're gonna set that to be 5_000.

  • I know I've said it before, but yeah, the underscore is like a comma, right?

  • So that means 5000, and it's super useful, like if you wanted to do five million, right?

  • 5_000_000 is so much easier to read than 5000000, right?

  • Without the underscores I can't just quickly glance at it and be like, oh, that's five million.

  • Especially once you start getting into more astronomical numbers, the underscore is super helpful.
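
  • Just to illustrate with a quick sketch (these are the same constants we're about to define):

```python
# Underscores in numeric literals (Python 3.6+) are purely visual separators;
# they don't change the value at all.
REPLAY_MEMORY_SIZE = 5_000     # exactly the same value as 5000
BIG_NUMBER = 5_000_000         # far easier to scan than 5000000

print(REPLAY_MEMORY_SIZE == 5000)   # True
print(BIG_NUMBER == 5000000)        # True
```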

  • Okay, uh, REPLAY_MEMORY_SIZE.

  • That's fine.

  • Um, let's go ahead and make a few more of these constants.

  • While we're here, we're gonna need REPLAY_MEMORY_SIZE.

  • We're also going to have, um, we can set MIN_REPLAY_MEMORY_SIZE; we'll set that to be 1_000.

  • We're gonna set MINIBATCH_SIZE; we'll say that's 16.

  • Uh, we're gonna say PREDICTION_BATCH_SIZE; we'll set that to be 1. TRAINING_BATCH_SIZE will be MINIBATCH_SIZE divided by four, no remainder.

  • That's what the double div (//) is there for.

  • Then, UPDATE_TARGET_EVERY.

  • So this is, basically, at the end of how many episodes we do the update.

  • So basically, every five episodes we'll update that target model. And we're gonna end up using MODEL_NAME at some point.

  • For now, I'm gonna call this Xception.

  • That's the model that I'm gonna use.

  • You can feel free to change it.

  • You can make your own custom model or use a different one.

  • I don't really care.

  • Uh, and then a couple of other things: MEMORY_FRACTION equals, and I'm gonna say 0.8.

  • This is how much of your GPU you're gonna want to use.

  • So I'm using an RTX Titan card, and for some reason that card, and apparently all the RTX cards, have this weird issue where they attempt to allocate more memory than they have.

  • And the only way I've found to overcome this is to use a memory fraction that is less than 1.0, basically.

  • So don't let the card attempt to allocate all the memory.

  • I don't know why that is.

  • I don't know if that's TensorFlow's fault specifically, or the CUDA toolkit's, or cuDNN's; I don't know whose fault it is, but it's just a thing that's happening right now, so I have to do this.

  • Most people maybe don't even have to do that at all.

  • I don't really know.

  • So, um, let's see.

  • So the other thing: we can say MIN_REWARD is -200, for the model saving.

  • Um, actually, we won't even use MIN_REWARD here anyway.

  • Uh, I think that's all the stuff I'm gonna use for now. We could also do our discount, epsilon, and all that stuff as well.

  • So let's get to that too.

  • So we're gonna set DISCOUNT; we'll set that to 0.99. Then we're gonna set, uh, let's do EPISODES.

  • So how many episodes do we want to do?

  • Ah, 100 is way too few, but I'm just going to set it to something reasonable for now. So we've got DISCOUNT.

  • So then, uh, eps... eps, not caps: epsilon.

  • This is gonna change.

  • Hopefully. Uh, EPSILON_DECAY, 0.95, just to see it decay.

  • Probably later on, we're gonna wind up with something more like 0.9975 or 0.99975, something like that.

  • Honestly, to check that, I usually determine how many episodes I really want to go.

  • And then, uh, it depends, either episodes or steps, depending on when you're gonna decide to decay.

  • Then I just write a for loop to determine the number.

  • There's got to be some sort of function out there where you could just say, hey, I want to do this many steps, give me a reasonable decay number.

  • Um, but for now, we'll go with that. And then we're gonna say MIN_EPSILON equals 0.001.

  • So a tenth of a percent.

  • Then we'll have AGGREGATE_STATS_EVERY; we'll say 10 episodes.

  • And then we'll have... actually, do we? We've already got SHOW_PREVIEW.
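
  • So, as a rough sketch, the constants block described so far would look something like this (SHOW_PREVIEW and the image-size constants come from the previous part; treat the exact values as starting points):

```python
REPLAY_MEMORY_SIZE = 5_000         # max transitions kept in the deque
MIN_REPLAY_MEMORY_SIZE = 1_000     # don't start training until we have this many
MINIBATCH_SIZE = 16
PREDICTION_BATCH_SIZE = 1
TRAINING_BATCH_SIZE = MINIBATCH_SIZE // 4   # integer (floor) division
UPDATE_TARGET_EVERY = 5            # episodes between target-model weight copies
MODEL_NAME = "Xception"
MEMORY_FRACTION = 0.8              # fraction of GPU memory TensorFlow may allocate
MIN_REWARD = -200                  # threshold used later for model saving

EPISODES = 100                     # way too few for real training
DISCOUNT = 0.99
epsilon = 1                        # lowercase because it changes during training
EPSILON_DECAY = 0.95               # later, something closer to 0.9975 or 0.99975
MIN_EPSILON = 0.001
AGGREGATE_STATS_EVERY = 10         # episodes
```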

  • Okay, so now that we've added, you know, 6000 lines of constants and starting variables, let's go back to our actual agent code.

  • So, replay memory: cool.

  • The next thing we're gonna do is self.tensorboard.

  • And that's going to be a ModifiedTensorBoard, I believe.

  • Oh, I closed it.

  • I believe that, um, this will come up; I'll put this in the code as well: uh, log_dir equals...

  • Um, I think I used this as well in, uh, the Q-learning stuff.

  • So if you've done that series, you should have this code somewhere, but I'll put it in the text-based version too, this ModifiedTensorBoard class.

  • At some point I'll show where to get it, and until I have it pasted somewhere reasonable,

  • I'm not gonna show it.

  • So right now, I'm not gonna add this code.

  • Um, but anyway, it just modifies TensorBoard to be a little more reasonable for the task of, I'm pretty confident, Q-learning.

  • Because the problem is, we don't need as many updates to TensorBoard as TensorBoard wants to do.

  • So yeah, because it's basically gonna want to create a custom log file per training session, or training loop, I want to say, or something.

  • Or it might be per .fit(), I think, is the problem.

  • I can't remember.

  • It's been too long.

  • But anyway, regardless, we want to modify it so it stops doing that nonsense.

  • And both for speed purposes and storage purposes.

  • Anyway, let's at least specify the log_dir.

  • Uh, and it will be MODEL_NAME; it's an f-string, in case you didn't see that I threw that f in there. Uh, MODEL_NAME, and then a dash, and then the int value of time.time().

  • Wait, did we import time? We did. Great.

  • Okay, so that specifies our TensorBoard.

  • Um, then there's a self.target_update_counter; we'll set that equal to zero for now, and then that basically gets updated at every end of episode.

  • Then we're going to say self.graph = tf.get_default_graph().

  • And we're gonna do this in a separate thread; again, we have to...

  • Well, we don't have to use threading, but we want to use threading.

  • And we need to specify this because we want to use this in a separate thread.

  • Ah, and we want to do that because we want to predict and we want to train in different threads.

  • self.graph: okay.

  • So now what we want to do is self.terminate = False.

  • This is one of the flags that we're gonna use for our threads.

  • And then we're going to say self.last_logged_episode = 0.

  • Uh, this is just to help us keep track of TensorBoard as we're doing things relatively, uh, asynchronously.

  • And then self.training_initialized = False.

  • This will be another flag.

  • Just because we're gonna be doing stuff in separate threads.

  • We wanna have these flags to sort of keep track of what we're doing.

  • So this will be when we start running simulations.
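
  • Putting that together, the __init__ we've built so far would look roughly like this (ModifiedTensorBoard is the helper class mentioned above but not shown here, and tf.get_default_graph() assumes the TF 1.x-style setup used in this series):

```python
import time
from collections import deque

import tensorflow as tf


class DQNAgent:
    def __init__(self):
        self.model = self.create_model()

        # Target model: a stable copy that we predict against.
        self.target_model = self.create_model()
        self.target_model.set_weights(self.model.get_weights())

        # Memory of previous (current_state, action, reward, new_state, done) transitions.
        self.replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)

        # ModifiedTensorBoard keeps Keras from writing a new log file per .fit() call.
        self.tensorboard = ModifiedTensorBoard(
            log_dir=f"logs/{MODEL_NAME}-{int(time.time())}")
        self.target_update_counter = 0

        # Needed because we train and predict from different threads.
        self.graph = tf.get_default_graph()

        self.terminate = False              # tells the training thread to stop
        self.last_logged_episode = 0        # keeps TensorBoard logging to once per episode
        self.training_initialized = False   # set once the warm-up fit has run
```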

  • So now what I want to do is create our actual model, so def create_model(self).

  • And in this case, we could do a bunch of things, right.

  • You could begin by saying model equals, you know, do the Sequential model stuff, and then model.add(), and then you start adding convolutional layers or dense layers or recurrent layers or whatever you want to do.

  • But what I'd like to do is just straight up use Xception.

  • So we'll say base_model = Xception().

  • We need to import Xception, but for now I'm gonna say weights=None.

  • We don't want to start with any of their weights. We're going to say include_top=False, input_shape=...

  • So top is that final classification layer; since we're gonna be using a different output layer, we don't want to include that top layer.

  • The input shape will be our height by width.

  • So input_shape will be (IM_HEIGHT, IM_WIDTH), and then it's by 3.

  • And then before I forget, let me go to the top; we want to actually import Xception.

  • So, uh, let's just do that.

  • Um, I don't really know if I want this at the end of all my imports, or to separate Carla from Keras.

  • I think I'll do it at the end, to be honest.

  • Ah, from keras.applications.xception we want to import Xception.

  • Done.

  • Easy.

  • So Okay.

  • So any time you take one of these pre-built models from Keras, what we usually need to do is change the input layer as well as the output layer.

  • So the way that we're going to do that is we want to specify what we want to use as input and what we want to use as output, basically.

  • So pretty much, this is the input.

  • Um, and then what we want to change is: we're gonna say x = base_model.output, then x = GlobalAveragePooling2D()(x).

  • We need to import GlobalAveragePooling2D; we'll do that in a moment.

  • Um, and then what we're going to say is predictions equals, and it's a Dense layer; again, we'll need to import that as well.

  • It's 3 neurons, the activation will be linear, and that's applied to x.

  • This is three because we have three options, right?

  • We can go left, straight, or right.

  • Basically: turn left, go straight, or turn right.

  • Okay, so those are our possible predictions. Then what we're gonna say is: the actual model that we want to use is Model, which again we'll also have to import, where we say inputs is equal to base_model.input and outputs will be equal to predictions.

  • So basically, it could have been base_model.output, but we want to change a few things.

  • We want to make it a dense layer.

  • We could have made it pretty much just a dense layer, but we wanted to add the global average pooling here.

  • Uh, okay.

  • Predictions done.

  • model.compile(): so, model.compile.

  • Uh, the loss is going to be "mse", for mean squared error.

  • The optimizer we're going to use is going to be the Adam optimizer; again, we'll have to import that. Learning rate: 0.001 here.

  • The metrics we wish to track: accuracy.

  • Okay. Uh, return model, and we're good.

  • So, uh, we need to make all these imports so we'll go up to the top.

  • We'll say from keras.layers we're going to import Dense and GlobalAveragePooling2D.

  • Uh, and then from keras.optimizers, I think, import Adam. Let me double-check and make sure that's correct.

  • Optimizers: yes.

  • Uh, and then the other thing we want to grab is from keras.models; we want to import Model, and I think that's everything. Again,

  • We're gonna find out if we're missing imports and stuff.

  • That's not the most important or most challenging thing to figure out.

  • Ah, but I think that's good.

  • Let me just look real quick here.

  • Oops, let me fix this.

  • This should also alert me if I'm missing anything. Looks like we're good.

  • So okay, that's done.
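
  • So, as a sketch, create_model as just described looks something like this (assuming the Keras 2.x import paths used in this series, and the IM_HEIGHT/IM_WIDTH constants from the previous part):

```python
from keras.applications.xception import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import Adam
from keras.models import Model


class DQNAgent:
    # ... __init__ as above ...

    def create_model(self):
        # Xception backbone: random weights, no built-in classification head.
        base_model = Xception(weights=None, include_top=False,
                              input_shape=(IM_HEIGHT, IM_WIDTH, 3))

        x = base_model.output
        x = GlobalAveragePooling2D()(x)

        # Three linear outputs: Q values for turn left, go straight, turn right.
        predictions = Dense(3, activation="linear")(x)

        model = Model(inputs=base_model.input, outputs=predictions)
        model.compile(loss="mse", optimizer=Adam(lr=0.001), metrics=["accuracy"])
        return model
```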

  • Now we're going to define update_replay_memory(self, transition), and I'll explain that in a moment.

  • So a transition is basically going to contain all of the information that we need to train the model, or specifically, a deep Q-learning model.

  • So that is going to be the current... let me just throw in: transition equals, and it's going to be like a list, right?

  • Uh, well, actually, we'll call it a tuple, because it's not going to change.

  • You're going to have a current_state, the reward... nope, sorry, I have notes: action, reward, ah, new_state, and then whether or not we're done.

  • So this has all the information that we need when we're going to actually train models and stuff like that, update the Q values, so to speak.

  • So that's what the transition is, and we'll need to know that even more so in a moment.

  • But for now, we're going to say self.replay_memory.append(transition).
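
  • As a quick sketch, that method is just one line, with the transition layout noted in a comment:

```python
class DQNAgent:
    ...

    def update_replay_memory(self, transition):
        # transition = (current_state, action, reward, new_state, done)
        self.replay_memory.append(transition)
```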

  • So we don't really need to know what a transition is right here, but very soon we will, because: def train(self). And this is where, basically, the magic happens.

  • So the first thing we need to do is make sure we have enough replay memory to actually train.

  • If we don't have enough replay memory, for one, epsilon is likely very high, so we're just doing random things anyways.

  • It's not essential that we train on this yet, and again, we wouldn't have enough information for this model to actually start learning anything useful at all.

  • So the first thing we're gonna ask is: if the len of self.replay_memory is less than MIN_REPLAY_MEMORY_SIZE, then we're just gonna return.

  • We don't really want to do anything.

  • Otherwise, we want to train, and the first thing we want to do is grab our mini batch to train on.

  • So we'll say random.sample() of self.replay_memory, and the sample size that we want is MINIBATCH_SIZE, so that gets us our random minibatch.

  • Now what we need to do is calculate our current states and then the future states, so that we can actually do the training operation.

  • So current_states (not caps) equals a numpy array of, and we're gonna say transition[0]; again, that's our current state, right, the zeroth element.

  • And when we get our future states it'll be 0, 1, 2, 3, so transition[3].

  • So, uh, transition[0] for transition in whatever that minibatch was.

  • Now, that state that's your observation.

  • What's our observation?

  • It's an image.

  • What do we want to do?

  • Every time we use images with neural networks, we want to divide by 255.

  • So again, this is just our way of scaling the information.

  • Before, we were doing this scale operation in process_img.

  • But now we're gonna add it here.

  • So, current_states: cool.

  • So then what we're gonna say is: with self.graph.as_default(), we're going to specify our current_qs_list is self.model (pay attention to model versus target_model), self.model.predict().

  • And then we want to predict based on the current states, and the batch size that we want to use is whatever PREDICTION_BATCH_SIZE we set; I think we actually have that set to 1 right now.

  • Yeah, uh, later we could try something bigger. At least for me personally, this is a super heavy GPU, CPU, and RAM task.

  • It is all of the things, so it's very challenging to run.

  • So anyway, current_qs_list; let's go and spell that correctly while we're at it.

  • Okay, so now that we've done that, we want to do the future Qs.

  • So future_qs_list is going to be very similar to this.

  • So, in fact, I'm gonna copy-paste. This could bite me in the butt.

  • So I guess new_current_states; that will be transition[3] for transition in minibatch. With self.graph.as_default(): that's fine.

  • Change this to be future_qs_list, and not self.model but self.target_model.

  • And it's not current_states, it's new_current_states.

  • I think that's good.

  • Let me know if you spot issues. Uh, capital X equals an empty list.

  • Lowercase y equals also an empty list.

  • Now what we want to do is: we're gonna enumerate over, um, all of our batches and do our training.

  • And basically, we need to create the new Q values, and we only want to include a future Q if we're not done, because otherwise there is no future state.

  • So we only want to update the Qs with a future Q when we have a future state.

  • So if it's a terminal state, i.e. we crashed, then we actually set the Q to be whatever the reward was just now.

  • But if we didn't crash, we actually want to set the Q based on that Q formula.

  • So what we're gonna say here is: for index, comma, and then again, this will be the entire tuple, this right here, the entire transition.

  • So: for index, all this stuff, in enumerate() of whatever minibatch is. Oh, don't forget our colon here.

  • Okay for that, then what we want to say.

  • We want to make sure we're not actually done, so: if not done, max_future_q equals np.max() of future_qs_list for the specific index.

  • new_q equals reward plus DISCOUNT times max_future_q.

  • If we are done, though, the new Q is simply equal to that of reward.

  • Again.

  • This concept has been explained already in the reinforcement learning series, but we all, including myself, sometimes need the refreshers.

  • So the next thing is, uh, current_qs equals current_qs_list for this index, and then we're gonna say current_qs for that action that we took equals new_q.

  • So this is how we're gonna update that actual Q value.

  • And then once we've done that: X.append(current_state), y.append(current_qs).

  • Okay, so: for, if, not... okay, this needs to be tabbed over.

  • So that would have been a "me in tears" moment in the future, had that gone wrong.

  • Okay, so, all right, now that we've done that, um, we're going to throw in some code for the TensorBoard, so: log_this_step = False.

  • So we want to make sure we're logging only once per episode.

  • If self.tensorboard.step is greater than self.last_logged_episode, then log_this_step we'll set equal to True, and self.last_logged_episode = self.tensorboard.step. Uh, okay, so now that we've updated these values and we've made X and y and all that stuff, what we need to do is actually fit the X and y, with the current states divided by 255. Before we go and do that, let me pull down to...

  • Here: current_state. I just have to check.

  • Okay, so current states... I just want to make 100% sure.

  • So current_state is always going to be from here, so it has not been divided by 255. I had to make sure.

  • So what we want to do now is: with self.graph.as_default(), we're going to say self.model.fit() of the numpy array of X divided by 255, np.array(y), and the batch_size will be TRAINING_BATCH_SIZE.

  • We're going to set verbose to 0, shuffle to False (it would be unnecessary), and then callbacks.

  • We're going to set that to be [self.tensorboard] if log_this_step else None. So callbacks will be self.tensorboard if we're trying to log that step, otherwise no callback.

  • Okay, so now what we want to do is, uh, update that log_this_step stuff.

  • So, if log_this_step: self.target_update_counter += 1. Then, if self.target_update_counter is greater than UPDATE_TARGET_EVERY, what we want to say is self.target_model.set_weights(), actually, set weights to self.model.get_weights(). So again, it just becomes a copy of the actual model. Uh, then self.target_update_counter = 0.

  • Cool.

  • OK, so that's our train method.

  • Hopefully, that's good.

  • I suspect we're gonna have a lot of issues, but that's okay.
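
  • For reference, here's a rough sketch of that whole train method as described (again assuming the TF 1.x-style self.graph handling from __init__):

```python
import random
import numpy as np


class DQNAgent:
    ...

    def train(self):
        # Don't train until we have enough stored experience.
        if len(self.replay_memory) < MIN_REPLAY_MEMORY_SIZE:
            return

        minibatch = random.sample(self.replay_memory, MINIBATCH_SIZE)

        # Each transition is (current_state, action, reward, new_state, done).
        current_states = np.array([transition[0] for transition in minibatch]) / 255
        with self.graph.as_default():
            current_qs_list = self.model.predict(current_states, PREDICTION_BATCH_SIZE)

        new_current_states = np.array([transition[3] for transition in minibatch]) / 255
        with self.graph.as_default():
            future_qs_list = self.target_model.predict(new_current_states,
                                                       PREDICTION_BATCH_SIZE)

        X = []
        y = []

        for index, (current_state, action, reward, new_state, done) in enumerate(minibatch):
            if not done:
                # Bellman-style update: reward now plus discounted best future Q.
                max_future_q = np.max(future_qs_list[index])
                new_q = reward + DISCOUNT * max_future_q
            else:
                # Terminal state (e.g. we crashed): no future, so just the reward.
                new_q = reward

            current_qs = current_qs_list[index]
            current_qs[action] = new_q

            X.append(current_state)
            y.append(current_qs)

        # Only send stats to TensorBoard once per episode.
        log_this_step = False
        if self.tensorboard.step > self.last_logged_episode:
            log_this_step = True
            self.last_logged_episode = self.tensorboard.step

        with self.graph.as_default():
            self.model.fit(np.array(X) / 255, np.array(y),
                           batch_size=TRAINING_BATCH_SIZE, verbose=0, shuffle=False,
                           callbacks=[self.tensorboard] if log_this_step else None)

        if log_this_step:
            self.target_update_counter += 1

        # Every UPDATE_TARGET_EVERY episodes, copy trained weights into the target model.
        if self.target_update_counter > UPDATE_TARGET_EVERY:
            self.target_model.set_weights(self.model.get_weights())
            self.target_update_counter = 0
```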

  • Um let me decide.

  • I think we'll finish, we can finish the agent code here.

  • Well, we don't have that much more to do. So, first of all: def get_qs(self, state), so we want to pass the state.

  • get_qs is just gonna return self.model.predict() of the numpy array of whatever state happens to be passed, and then we're gonna do a reshape(-1, *state.shape), divided by 255, and we're expecting that to be a single thing.

  • So again, as always with predict: you pass a list, you get a list back; in this case, when we get Qs, we're just getting one at a time.

  • So we're gonna grab the zeroth element. state.shape:

  • This will just be our image shape.

  • This has a little bit of Daniel sprinkled into it, and I guess he likes making it ugly. Hope you're watching.

  • Uh, what we're gonna do next is def train_in_loop.

  • Thanks, Daniel.

  • Doing great.

  • So, uh, in case you don't know what this means up here, this is just unpacking the shape of state, which is our observation, which is height, width, 3.

  • So, uh, 480 by 640; this line was getting too long anyway.

  • It's the height, width, by three.

  • So anyways, it's just unpacking it.

  • It's a nice, succinct way he coded it in there; I wouldn't have thought of it that quickly. Anyway: train_in_loop.
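
  • So, a sketch of get_qs; the * in reshape(-1, *state.shape) is that unpacking, turning a single image into a batch of one:

```python
import numpy as np


class DQNAgent:
    ...

    def get_qs(self, state):
        # Reshape a single (height, width, 3) observation into a batch of one,
        # scale pixels to 0-1, predict, and return the single row of Q values.
        return self.model.predict(
            np.array(state).reshape(-1, *state.shape) / 255)[0]
```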

  • Um, basically, so far we actually haven't seen this threading that I was referring to, that we're gonna wind up needing to do.

  • But this will be the first time we really see what's truly going on here.

  • But recall we want to train and predict in different threads because we don't want training to slow down predictions.

  • We want to be able to make predictions as quickly as possible, and we want to train also as quickly as possible.

  • So the way that we're gonna do that is, um, a couple of ways. First of all, initially, for whatever reason, TensorFlow... again, I don't know.

  • Is it TensorFlow's fault? Is it Nvidia's fault? Is it CUDA's fault? I don't know.

  • But the very first time you actually start training, or the first time you try to do a prediction, it's really slow compared to subsequent predictions.

  • That's why, if you wanted to put something in the cloud, for example, that is actually like some sort of API that does predictions, you want to make sure you're predicting in some sort of infinite loop; you would never want to, like, re-initialize everything all over again.

  • Because you're not just initializing, and you're not just, like, loading in the weights.

  • For some reason, you're also doing some other thing that I don't know what it is, but that first prediction is slow.

  • So what we're gonna say is, uh, we're just gonna randomly do, um, a fitment... and then what we're going to do from there is... sorry, I said prediction, but I meant, well, it's prediction or train; it's just that the first one takes forever.

  • Anyway, um, the first thing we're gonna do is a quick fitment on basically nothing, and then we're gonna fit for real from there.

  • So we're gonna say X = np.random.uniform(), and we're going to say size equals, uh, 1, and then, what is it, IM_HEIGHT, IM_WIDTH, and then 3.

  • Um, how come you didn't unpack there, Daniel? .astype(np.float32).

  • Uh, then what we're gonna say is y = np.random.uniform(), size equals (1, 3), and then .astype(np.float32).

  • So again, we're just gonna make it up.

  • We just want to do a quick fitment on something.

  • So: with self.graph.as_default(): self.model.fit(X, y, verbose=False, batch_size=1).

  • Great.

  • So once we've done that, we will say self.training_initialized = True.

  • And then we're going to just initiate this while True loop that goes on and on and on.

  • So, while True: if self.terminate, then we will return, and we're done.

  • Otherwise, we're gonna self.train(), and then we're gonna throw in a quick sleep, 0.01, and we're done.
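
  • So, a sketch of train_in_loop, which will run in its own thread; the throwaway fit on random data is just to get that slow first TensorFlow call out of the way:

```python
import time
import numpy as np


class DQNAgent:
    ...

    def train_in_loop(self):
        # Warm-up: the very first fit/predict is slow, so do a dummy fit on random data.
        X = np.random.uniform(size=(1, IM_HEIGHT, IM_WIDTH, 3)).astype(np.float32)
        y = np.random.uniform(size=(1, 3)).astype(np.float32)
        with self.graph.as_default():
            self.model.fit(X, y, verbose=False, batch_size=1)

        self.training_initialized = True

        # Keep training until the main thread sets self.terminate.
        while True:
            if self.terminate:
                return
            self.train()
            time.sleep(0.01)   # brief pause between training passes
```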

  • So later, we'll get into how we call train_in_loop and how all of that's gonna work.

  • I think I'll save that for the next tutorial.

  • This tutorial is quite long.

  • I think I might have breached 40 minutes on that one.

  • So we're done.

  • Uh, okay.

  • So if you have questions, comments, or concerns over the code up to this point, feel free to leave them below.

  • You can also come chat with us and hang out in the Discord: that's discord.gg/sentdex.

  • Also, shoutout to recent channel members, or not-so-recent ones who have been with us for many months.

  • Um, it was Jean get Michael Severson and Ravi Krishnan nine months you d you Day men.

  • Shonda Bingley Meta Neue Reginald Roberts Bill Fall ins be Erica Bow tip Veeru Goat shoe.

  • Kessie ueki.

  • Uh, if I've mispronounced your name, feel free to let me know in the comments section, and I'll try to get it right next time.

  • Anyways, in the next video, what we're gonna do is... um, well, I haven't fully decided. We should be able to finish all of the code.

  • And then maybe what I'll do is show you at least where I'm at now, um, with this code, and then maybe I'll continue tweaking things or working on things and kind of share updates as we go, because I hate to be the bearer of bad news, but this is probably not good enough.

  • For example, our reward function is probably no good.

  • We're gonna need something a little more complicated than what we've got.

  • Um, we've got a lot of variables going on here.

  • It's also highly likely there's some sort of bug in the code, or, you know, some sort of assumption that I've made that is incorrect.

  • There's just so much going on here that it's highly likely this does not work out of the gate, and instead it's gonna take a lot of tweaking and coming back to it to really figure out what's best.

  • But you can feel free to play with everything.

  • Um, maybe you can come up with something better than even I can or me and Daniel can.

  • So anyways, um, that's it for now.

  • In the next tutorial, we'll wrap all this up and at least show it, hopefully, starting to run, and hopefully train to some degree.

  • And at least make sure that everything is working, like epsilon decay.

  • Hopefully loss is doing its thing, and we're not getting, like, perfect accuracy or any of the other red flags.

  • And we should hopefully see that, after training for a few hours, the average reward is going up.

  • That sort of thing.

  • We want to know that something is happening.

  • Okay, so anyways, I will see you guys in the next video.
