

  • What's going on?

  • Everybody, welcome to part 6 of the ML and deep learning with the Halite III challenge tutorials. In this video, we're just gonna continue building on the last one, where I believe I left off fixing a slight bug that meant we weren't able to train for multiple epochs.

  • So let me quickly bring up TensorBoard: `tensorboard --logdir=logs/`.

  • And for some reason I get a bunch of this.

  • I think it's because we're kind of overriding the steps with my crappy way of doing things, but we'll get through it.

  • So then, um, I found that the best way to get there is with localhost:6006.

  • If I use, like, 127.0.0.1, that doesn't work.

  • And if I use the machine name with :6006, that doesn't work either.

  • So anyways, this is what's working.

  • So let me just smooth these quite a bit.

  • Um, so the red line is three whole epochs.

  • Uh, the orange and blue are just one epoch.

  • I'm gonna switch these to relative so you can kind of see those compared. But really, after three epochs — really?

  • I mean, maybe it slightly comes up, or at least it levels out in-sample.

  • But then out-of-sample? Not too, too impressive.

  • I have to switch it to relative to show you the epoch validation accuracy, again because of the way we're doing things.

  • So again, the orange line is actually only one epoch.

  • The red line is three epochs and the blue line is one epoch.

  • The only difference is the orange line was the run where we actually had to load the data and prepare it.

  • Whereas these two down here, we didn't.

  • So, as you can see, preparing the data takes, like, three times as long.

  • So anyway, accuracy is somewhere between 24.5 and 25%.

  • That's pretty good, considering, you know, random choice would be 20%.

  • And in this case, you know, we've trained a model using a threshold, sure.

  • But we're training off of models that were moving randomly.

  • So the fact that we got 4 to 5% over random is actually pretty impressive.

  • That tells me our threshold probably worked, but now it's time to find out.

  • So now we begin the massively iterative process where we've trained a model.

  • Now we want to run a bunch of games with that model and some threshold.

  • And then the hope is that our average or mean Halite collected has gone up a little bit.

  • Any little bit will help, especially these earlier games.

  • So, uh, what I'd like to do is, first of all, let's go ahead and just copy training_data.

  • In fact — I figure that'll take too long.

  • But what I'd like to do — let me just delete this.

  • We'll just take this training_data and I'm gonna call it training_data_1, just for the first set, and then we'll make a new folder.

  • Call it training_data.

  • Now that's empty.

  • And then we can run those stats that we did, where we do the histogram, but also just, like, the mean and all that — we can run some stats and see: has the model actually improved at all?

  • We also could do the model versus the random and see which one wins more often or something like that.

  • But anyways, what we shall do now is implement the AI, rather than random moves, in sentdebot.

  • So I'm gonna copy-paste sentdebot.py, and now that's just a copy of sentdebot.

  • And then it'll be dash — no, it'll be:

  • dash ml — sentdebot-ml.py.

  • Okay, let's pull that up.

  • And with that, I'll just close everything else.

  • Cool.

  • All right.

  • So we have to make a few imports also, because of the way, like, Halite uses standard error and output and stuff for, you know, the game and Python to talk to each other.

  • So we have to make sure we silence all that stuff.

  • So I'm gonna make a few imports here: import os, and then import sys.

  • Then stderr = sys.stderr.

  • And then we say sys.stderr = open(os.devnull, "w") — with the intention to write.

  • And then we're going to import tensorflow as tf.

  • I hate that Sublime does that to me.

  • That space shouldn't be there. Where is it?

  • Oh, it's 'cause you hit enter for the new line.

  • I don't know how you get around that, actually.

  • Anyway: sys.stderr = stderr, to restore it.
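Put together, the silencing pattern described here looks roughly like this — a minimal sketch, with the tensorflow import left as a comment since it's the noisy import being guarded:

```python
import os
import sys

# Halite and the Python bot talk over stdout/stderr, so stray library
# chatter during import can corrupt the game protocol. Park stderr at the
# null device while importing, then restore it.
stderr = sys.stderr                 # keep a handle so we can restore it
devnull = open(os.devnull, "w")
sys.stderr = devnull                # swallow anything written to stderr

# import tensorflow as tf          # the noisy import goes here

sys.stderr = stderr                 # restore normal error output
devnull.close()
```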

  • So the issue is, when you first import tensorflow, it's gonna tell you things.

  • Like what backend you're using — that's what Keras does — and it's gonna tell you things about, like, your GPU and stuff like that.

  • And we need to silence that.

  • So this is how we're doing it.

  • Basically, we need to silence all that output, because if you have it, the game is gonna error out.

  • So now, ah, sentdebot-ml can remain the name.

  • That's totally fine.

  • Uh, next thing we need to do is continue to silence any of the tensorflow outputs.

  • So, um, I guess we can still write this.

  • And I thought about just copying and pasting it — this is just kind of cookie-cutter code that I used last year to silence stuff from Halite. Anyway: os.environ — and then what we want to set in here is "TF_CPP_MIN_LOG_LEVEL", and we're gonna set that equal to "3", as a string.

  • And then now we want to set the GPU options: that's gonna be gpu_options = tf.GPUOptions — all caps GPU, capital-O Options — and this will be per_process_gpu_memory_fraction, and we set that to 0.05. This is a small model.

  • We don't need much memory.

  • Why do we want to do this?

  • Well, by default, the TENSORFLOW is gonna allocate as much as it can and then keep stuff just floating in memory to attempt to make things as fast as possible for you.

  • The problem is, this is going to exhaust a lot of memory.

  • Now, the next thing is, I'd like to run three or four simultaneous instances.

  • So you have to keep in mind that at any given time, up to, like, say, 16 AIs could be trying to run and make predictions.

  • So it's really important that we set this number very low. Now, depending on how many you can run at any given time, given your particular GPU and all that, you can adjust this fraction, but I've actually found it runs perfectly fine — and actually it seems to run faster — on a low GPU memory fraction.

  • And I'm unsure why, but it seems much faster — like, 10 times faster.

  • It's very strange.

  • Anyway, I think that must be some sort of bug or something.

  • That is very unintended, but definitely something I would look into in the future with a larger model, even.

  • So anyways, now we're gonna set our session: sess = tf.Session(config=config).

  • And then we're doing this so that when we load our model, it's gonna follow this session for us and use the GPU options that we set.

  • So config = tf.ConfigProto(), and then gpu_options=gpu_options.
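Assembled, the TensorFlow 1.x-style session setup described here might look like this — a sketch, not the exact file; the 0.05 fraction and the "3" log level follow the discussion above:

```python
import os
import tensorflow as tf

# Silence TensorFlow's C++ info/warning log lines so they can't leak into
# the Halite engine's communication channel.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

# Cap how much GPU memory this process grabs, since many bot instances
# may be loaded and predicting at the same time.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.05)

# Build a session honoring the GPU options; the Keras model loaded later
# with tf.keras.models.load_model() will run inside it.
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```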

  • Great.

  • Now we're ready to load in our model: model = tf.keras.models.load_model().

  • And then we give the model name.

  • I forgot the name. And the other thing we want is random_chance.

  • And I'm gonna set this with secrets.choice.

  • And then in here, I'm gonna say 0.15, 0.25, 0.35. So this will be a random chance that the AI will choose a random move over the model's choice.

  • So we still wanna have some exploration — like, we can't just train models and then... because the model isn't random anymore.

  • So we want to still have some random kind of permutations.

  • To allow the model to learn from new moves over time.

  • So, um, anyways: 0.15 would be, you know, a 15% chance on every move, 0.25 would be a fourth of all the moves, and 0.35 means 35% of the time it will be a random move.

  • And then, every time we initialize this bot, some bots will be 35% random, some will be 25, some will be 15.

  • You can play with those numbers as you want.

  • We just know that we want some random exploration involved here.
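As a sketch of that exploration setup — `should_explore` is a helper name added here for illustration, and note the int() truncation means the per-move probability only approximates the chosen rate (e.g. 0.35 effectively becomes 1/2, and 0.3 becomes 1/3):

```python
import secrets

# Each bot instance commits to one exploration rate at startup, so across
# a population of games some bots are 15% random, some 25%, some 35%.
random_chance = secrets.choice([0.15, 0.25, 0.35])

def should_explore(chance):
    # One draw from range(int(1 / chance)); drawing exactly 1 means "make
    # a random move this turn". The effective probability is
    # 1 / int(1 / chance), which is only roughly `chance`.
    return secrets.choice(range(int(1 / chance))) == 1
```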

  • The next thing I'd like to do is go into models, and let's pick —

  • Oh. Oh, no.

  • We've made a horrible mistake.

  • That's a huge bummer to see.

  • We can train models pretty quickly.

  • That stinks.

  • Uh, luckily, I think that model should be fine, but let me fix that.

  • In the training code, we must have forgotten it after the .fit — uh, shoot.

  • Where is sentdetrain?

  • Um. Hmm.

  • It would have been, I guess, at the model.save.

  • Yeah.

  • Dang, what an idiot.

  • So, this one — it will be the three-epoch model, which is fine.

  • I'm gonna, uh I'm not sure I would have used that one.

  • The other thing that might be wise is to save, like, the score of the model.

  • So maybe, like, run the model and get that score right before the save.

  • Something like that.

  • Uh, whatever.

  • Uh, I'm gonna come into here, and I'm gonna rename this one, uh, in models.

  • Hopefully — uh, man, I just feel pretty bad, because I wonder...

  • I hope that's a good model.

  • Probably.

  • What I'll do is I'll finish the code here, make sure it works, and then I'm probably gonna retrain that model, 'cause I don't have faith that I didn't accidentally run train and then stop it or whatever.

  • After a quick cut segment.

  • So, anyway, I'm gonna call this, for now, phase1. Anyway.

  • Come back up to sentdetrain — er, or not sentdetrain; close that.

  • Yeah, there.

  • If we run into trouble, we'll come back to this script.

  • Come to the top, to the model we wanna load.

  • In sentdebot-ml.

  • Where? Here.

  • phase1.

  • All right.

  • Save the progress so far.

  • Okay, Now what we want to do is I think I actually kinda want I just want to leave the threshold the same for now, because I want to see if the mean has gone up.

  • Like, I just want to see if it's actually gotten better or not.

  • So I'm actually gonna leave the threshold the same, with the hope that it actually improves. Same thing with total turns.

  • I'm gonna leave everything the same, make the same amount of ships, all that.

  • And instead, what I want to do is, I want to come down to where we actually make our, uh, our choice.

  • So: choice = secrets.choice(range(len(direction_order))).

  • So what we're gonna say is: if secrets.choice(range(int(1 / random_chance))) == 1, then we want to say choice = secrets.choice, as before.

  • Um, well, in fact, basically this — I think that's what I want.

  • So basically, if it's one out of, you know, the 15% or whatever range you make — uh, the question is, if that number is equal to one, then we'll make a random movement.

  • Otherwise — what do we want to say? So in the else, we're gonna say model_out = model.predict(). And then we want to predict, uh — predict always takes a list, even if it's one thing we want to predict.

  • We need to predict the numpy array conversion of surroundings, with a .reshape(-1, ...).

  • And then it's negative one, by the size, by the size, by three channels.

  • So in this case, we can say len(surroundings), comma, len(surroundings), comma, three.

  • Let me zoom out a little bit so we can see the full line: model_out = model.predict(np.array(surroundings).reshape(-1, 33, 33, 3)).

  • Now, because you pass a list, this returns a list as well.

  • So we want the zeroth element.

  • So that's the actual prediction.

  • Now, even though we were using scalar labels, it is still gonna output to us a one-hot vector — a one-hot array.

  • So we want to say prediction = np.argmax(model_out).

  • Now, logging.info — we want to save this.

  • We're gonna log an f-string of the prediction, and that is gonna be direction_order[prediction].

  • And then choice = prediction.
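The whole predict-and-argmax path can be sketched like this, with a stub standing in for the loaded Keras model (the stub's fixed output row is hypothetical; the reshape and zeroth-element handling match the description above):

```python
import numpy as np

def predict_stub(batch):
    # Stand-in for model.predict(): one 5-way output row per input frame,
    # mirroring Keras's batch-in/batch-out behavior.
    return np.tile([0.1, 0.1, 0.5, 0.2, 0.1], (batch.shape[0], 1))

surroundings = np.zeros((33, 33, 3))  # the 33x33x3 map view around the ship

# predict() expects a batch, so reshape the single frame to (-1, 33, 33, 3)
# and take the zeroth row of the returned batch.
model_out = predict_stub(np.asarray(surroundings).reshape(-1, 33, 33, 3))[0]

prediction = int(np.argmax(model_out))  # index into direction_order
```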

  • So then we leave all of this other stuff.

  • And so as we build the data, the data is built with either our random movement or not.

  • Um — just because my brain is going slow today.

  • I don't know about you guys.

  • Uh, let's import — oops — import secrets, paste this, for i in range:

  • Let's say 100.

  • I'm gonna tab this over.

  • I'm gonna print — or actually: if that equals one, print.

  • Yes, I apologize for this.

  • I gotta I gotta make sure I'm doing this right.

  • So we're just gonna iterate over this 100 times.

  • If that is equal to one — and then let's say random_chance equals 0.15 — it should be about 15 of those.

  • Uh, let's do five — whoops — 0.05. Should be about five.

  • Obviously, we'd have to run just a ton, and it's random. Five.

  • And then this should be about a 10% chance. About 10.

  • Should be the average we see. Seems a little low, but it's probably correct. Let's do 0.3.

  • So this should be an average of about 30 — looks to be about right.

  • Okay.

  • Just want to make sure that logic was sound.

  • So in this case, the question is: if this is equal to one — great, that's our 35% chance or whatever.

  • Um, okay — then we'll let the model make the choice; run the game.

  • Everything else stays the same.

  • We're not changing any of that.

  • So hot diggity.

  • Let's run it.

  • So, as it happens, we need to change run_game — our Python version.

  • No longer is it the bot that we had; it's sentdebot-ml now — dash ml.

  • And, uh, right before I make the mistake, let's just highlight one of these.

  • Replace all.

  • Great.

  • Um, are we ready?

  • Well, I think we are ready.

  • I also think — oh, actually, as we run the game, we'll see it.

  • So this won't obfuscate what comes out.

  • So hey, let's run it.

  • I forget if python straight up runs what we want — so: python run_game.py.

  • Let's see what happens.

  • So on initial — uh, shoot.

  • I know one thing that's probably gonna cause a problem, I think.

  • Let me just bring this along.

  • What did that say?

  • Communication failed.

  • Uh, okay, let's read one of these, okay, that Okay?

  • There's nothing in there.

  • And then what we'll do is we'll go into the replays.

  • We'll check it all in a second.

  • Surely we're not timing out there right now.

  • This is gonna be a — heh.

  • If it doesn't give me any error, this is gonna be a huge pain in my tuchus.

  • Okay.

  • Okay, so we are timing out, okay?

  • I feel a sneeze coming on — apologies.

  • Fought it.

  • Okay, so now, uh: no-timeout.

  • So: --no-timeout.

  • Now, here's a little trick.

  • Um, what I have found is: the first time the model is loaded and the first time the model makes a prediction, it takes a little long, and then all the subsequent ones are quite fast.

  • So for now, I'm gonna I'm gonna run this with a no time out because I don't care.

  • I'm trying to build data right now.

  • I don't care about that.

  • But in the future, probably what I would suggest you do to avoid that timeout is just make a BS model.predict before you do game.ready.

  • So once you've called game.ready, everything needs to happen within two seconds, so keep that in mind.
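That warm-up trick can be sketched like so — a stub stands in for the loaded model, and game.ready is shown only as a comment since it needs the Halite SDK:

```python
import time
import numpy as np

class StubModel:
    # Stand-in for the loaded Keras model; the real model.predict() has
    # the same batch-in/batch-out shape behavior.
    def predict(self, batch):
        return np.zeros((batch.shape[0], 5))

model = StubModel()

# Throwaway prediction BEFORE game.ready(): the first predict() call pays
# one-time setup costs, and once ready() is called every turn has to
# finish inside the engine's turn limit.
start = time.time()
model.predict(np.zeros((1, 33, 33, 3)))
warmup = time.time() - start
# game.ready("SentdeBotML")  # only after the slow first call is done
```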

  • But for now, uh, let me make sure I did that.

  • OK — well, we'll set --no-timeout, and we should be fine.

  • So let me just let's go again.

  • I don't know why it's hanging.

  • Okay.

  • Got it.

  • Um, stupid error, but also strange.

  • I don't get why it didn't throw an error at me.

  • So I know we're silencing errors to some degree, but then I copied sentdebot-ml and I commented out — to my understanding — all of the things that we were doing that were silencing the output.

  • And I just tried to run just this and see if it would load.

  • Uh, and it would not, so yeah.

  • Anyway, the issue was: we put — we wrote "phase one" in here.

  • Well, "phase one" — unless that's a directory maybe somewhere.

  • "Phase one"? No, it just doesn't exist.

  • "Phase one" doesn't exist, so we should have gotten an error, but we weren't getting the error.

  • So cool.

  • Anyway.

  • Regardless, um, the AI is running.

  • We can already see here some of the outputs. I would wager it started almost immediately, whenever it started running.

  • It looks like we're actually getting quite a bit of Halite per game.

  • We've already started building some training data.

  • Let me pull up the new training data here.

  • Um, so immediately, these — I think these were the first two games, but we might have just gotten lucky. But it will be very interesting.

  • Like, right now, the, uh, distribution isn't quite so heavily weighted towards the 4100s, but I'll let it run for quite a while, actually.

  • I'll open up a couple more and run quite a bit.

  • I'd like to get at least another — you want at least probably somewhere between 5,000 and 10,000 games if you want to make a fair assessment.

  • But these take so much longer to create, so we'll see if I have the patience to do that.

  • But I might let it run all weekend or something.

  • But the real question is, is it actually better than the previous one?

  • But as you can see here, all of these games saved because they were all great games.

  • So anyways, once I have maybe about 1,000 games or something, I'll check the distribution and see. And if it's pretty darn good, we can figure out what threshold we want to use and so on.

  • So anyways, that's it.

  • It was a path issue — stupid on my behalf.

  • But also, I'm curious why it didn't — why that didn't come up.

  • Even when I created the copy script and ran the copy script — still, like, why didn't I get an error that said, "Hey, this doesn't exist"?

  • Very curious.

  • But whatever, uh, that's it for now.

  • Questions, comments, concerns, whatever.

  • Feel free to leave them below.

  • Otherwise, I'm gonna see you guys in the next video, where, um — basically, we'll check the distribution here and then continue that iterative process. I don't know exactly what I'll do in the next video.

  • Maybe I'll already do the iterative process, I don't know.

  • Or maybe in the next video we'll talk about the changes, and then from there do the iterative process — because basically, at this point, nothing new happens, really. At some point we want to add more ships and we want to change the threshold and all that.

  • This — in my opinion, this AI is way better.

  • I mean — I mean, we don't have 5,000 games yet.

  • But look how few of them are at the 4100s.

  • Almost instantly we're in the 4200s and up, so I would hazard to guess this is, ah, significantly improved.

  • But we'll see just by how much in a while, anyways.

  • That's all for now.

  • See in the next video.



Running with Trained Model - Deep Learning in Halite AI competition p.6

Published by 林宜悉 on January 14, 2021