
  • What is going on, everybody, and welcome back to the Unconventional Neural Networks series. In this video, all we're doing is running over some of the results from the neural network that was trained just to do addition. We're going to talk about some statistics.

  • We're going to see how it did, and then what we're looking to do moving forward.

  • So with that, let's hop in.

  • First of all, how did we do?

  • So one thing I did was I wrote a few different kinds of scripts just for comparing how we're doing as time went on.

  • So one thing we can do is, inside the model directory, we should have the train_log directory.

  • So one option we have is to just open a cmd here, in the addition project, and run: tensorboard --logdir=train_log.

  • Run that.

  • And then we want to open this.

  • So let me just pull that up over here.

  • So these are the results.

  • So we can obviously see how the loss was doing.

  • So actually, this model could have continued to train.

  • Well, actually, I'm sorry, this probably isn't fully loaded yet.

  • We'll probably need to wait a minute.

  • Yes, you can see it's slowly starting to level out.

  • We'll just need to wait for that to finish processing.

  • So one way that we can see how the model is doing is to look at the train log, but part of the issue is with loss.

  • The loss is really all about the weights.

  • And when we're looking at something like a math problem, or in our case, even when we were doing the NMT chatbots, there's a reason why we used other metrics like BLEU and PPL: because loss could mean, sure, we're learning something.

  • But are we learning the thing we were hoping to learn?

  • That might be a different question, so you often want to come up with a separate metric for that.

  • So what I did was, manually, as this model trained, I just took the output_dev files as they came through, and I renamed them quickly.

  • So as they were coming in, I would just note at what step count that dev output was produced.

  • So at 5K steps, 10K steps, and so on, all the way to 257K when it finished training. After about 100K, I just went to bed.

  • So that's why I didn't get anything in between there.

  • Hopefully, our TensorBoard is done now, so we can pull that up, and I'm just going to refresh it.

  • So let's just open up loss.

  • Okay, so we can actually see, after about 120K steps...

  • It looks like, I'm going to guess, the learning rate was taken down there as well. Let's see, at 120K, actually, yes.

  • So right before I went to bed, I recognized that the learning rate was 1e-5, or should have been 1e-5.

  • Yeah.

  • I wonder why that says 1e-4. Weird.

  • Oh, this is 1e-4, so the middle ground here is around 2e-5.

  • Anyway, before I went to bed, that was going to be the last epoch, and I thought it was still only about 50% accurate.

  • So I decided, you know what, let's add some more epochs.

  • And when I did that, the learning rate jumped back up.

  • So interestingly enough, we were actually kind of done learning.

  • Then I allowed the learning rate to jump back up, and look how we just, boom, jumped way down in loss. So if anything, looking back (this is actually the first time I'm looking at this), that jump up in learning rate...

  • Basically, what that tells me is we dropped the learning rate way too early.

  • We shouldn't have dropped it that early, because putting it back up, it's like we were fixing a plateau right there, basically. Allowing it to jump back up, it comes back down here, and that's probably what sealed the deal.

  • So cool.

  • I'll keep that in mind, because we're going to make things a little more complex here. So, cool.

  • So that's the TensorBoard.

  • But that doesn't tell us, you know, the loss and learning rate don't tell us:

  • Did it actually learn math?

  • So instead, what I used was the output_dev file.

  • So let's just open one of those up in Sublime Text, and we can see, okay, that's basically the output from the model.

  • And because it's a character-level model, the spaces aren't actually spaces; they're just the separators between the output characters.

  • Now, the input to these was basically, it would be like new_data, and then the "from" file.

  • So from this input, we should get this output.

  • Is that correct?

  • That's right.

  • Anyway, um... six. Yeah. All right.

  • So yeah, from here, we should get this, obviously minus the spaces.

  • And the question is, is this correct or is it incorrect? Now, even at 5,000 steps...

  • I just know that this is probably about correct.

  • I know it's about correct because they're at least close; they're within the same range, the same ballpark, right?

  • But not all of these are correct.

  • So what I did was I just manually saved those output_devs as they came through.

  • So those are just, that's like our way of validating things.

  • So every 5,000 steps, about every 5,000 steps, it would run against these testing files, which contain out-of-sample data, so they're out-of-sample with respect to the training data itself.

  • So these should be samples that the model has never seen, and this is how it does on those.

  • So what I did was, I at least started with this file.

  • So let me open that up and we'll talk about that.

  • So this one opens the dev output at 10K, but we can start with 5K even.

  • And what it's going to do is open up the test file, so test.from; it's going to open up the output file; and it's going to open up test.to.

  • So: what was it supposed to be?

  • And then what we're going to do is just create a couple of counters here.

  • We're going to follow PEP 8; we should put our spaces here. Okay?

  • Anyway, what we're going to do is count, basically, how often we were correct versus how many total samples we had in the set, and then we can get an actual accuracy rating. So that was, like, the first thing I started doing.
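
As a rough illustration of what that comparison script is doing, here is a minimal sketch in Python. The test.from and test.to names come from the video; the output_dev_5k naming and the exact logic are assumptions, so the real script in the series may differ.

```python
# Minimal sketch of the accuracy comparison described above; file names for
# the renamed dev outputs (output_dev_5k, etc.) are assumptions.
STEP = "5k"

with open("test.from") as f:
    inputs = [line.strip() for line in f]
with open("test.to") as f:
    targets = [line.replace(" ", "").strip() for line in f]
with open(f"output_dev_{STEP}") as f:
    # character-level model: the spaces are token separators, so drop them
    outputs = [line.replace(" ", "").strip() for line in f]

correct = 0
for inp, target, out in zip(inputs, targets, outputs):
    print(f"input: {inp} | desired: {target} | model: {out}")
    if out == target:
        correct += 1

total = len(targets)
print(f"accuracy: {correct / total:.3f} ({correct}/{total} samples)")
```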

  • Man, I've got a lot of PEP 8 issues here.

  • Every now and then, someone says my code is ugly, and I say you're banned from the channel. So let's go ahead and run this.

  • So as we can see here, basically, the output of this is: what the input was, what our desired output was, and what the model said. Hey, this is the output of something like 43,800-something plus 98,800-something. Basically, it should have been 142,000 and some change, and we said 144,000-something. So actually, almost immediately, we can see, hey, this model is at least on the right track.

  • But at this stage in the game, to me, I'm starting to think, you know, is it only getting close because it's basically brute-forcing this, and it's finding another example where it was close to 43,800-something plus something close to 98,000? We've got millions of samples.

  • Is it just getting close because it's basically regurgitating the output of that similar example?

  • And so at least my line of thinking here is: one, that could be the case so long as accuracy, you know, actual accuracy, doesn't improve.

  • So that's one symptom of "maybe it's just brute force."

  • The other check for brute force is doing something like we're going to do later, which is tracking the total absolute difference between the model output and the desired output.

  • So not only should accuracy be rising over time, but the total difference between what we got and what we intended (kind of like loss, only much more exact to our task) should be going down over time as well.

  • And if that starts going down over time, we can be relatively comfortable that, no, it actually does seem to be learning some sort of math here. And then, of course, at the final stage of the game, if it's able to get any accuracy (the 0.0 means it got none of them right), and that accuracy starts going up, then we're left to assume that it's learned math.

  • And my argument further is that we had 0 to 100,000 plus 0 to 100,000. So that's 100,000 combos here and 100,000 combos here; to get the total number of combinations on both sides of that plus sign, that would be 100,000 times 100,000, which should come out to 10 billion possible combinations.

  • Now, the output isn't 10 billion possible combinations, but as far as the input is concerned, that's 10 billion.

  • And we only fed it, I think, three million, so someone can feel free to correct me.

  • Correct me.

  • Sometimes I get those these kinds of scales wrong and stuff like that.

  • But to my understanding, like, you know, you could have 0 to 100,000.

  • Here's a 100,000 and then even here, you've got five and they've got 100,000 examples of five.

  • Plus what?

  • So 100,000 times 100,000, 10 billion?
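
Just to sanity-check that back-of-the-envelope figure, here is the arithmetic spelled out; the 100,000 range and the roughly three million training samples are the figures mentioned above.

```python
# Back-of-envelope check of the input-space size argued above.
values_per_operand = 100_000      # operands roughly in the range 0..100,000
input_combinations = values_per_operand ** 2
training_samples = 3_000_000      # approximate count mentioned in the video

print(f"{input_combinations:,} possible a+b inputs")            # 10,000,000,000
print(f"{training_samples / input_combinations:.2%} seen in training")  # ~0.03%
```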

  • So it can't, in my mind, be brute force there either.

  • But it is simple math; it's just addition.

  • Okay, let's see how we did later on.

  • So that was 5K.

  • How about at, I don't remember, I think I had a 15K. Let's just test it real quick.

  • So even at 15K, we got some; there are some "yes"es in there, and it's 1.2%.

  • You might be asking how it's at exactly 1.2: we have 500 samples, so that's why. So anyway, we have some accuracy.

  • Suddenly we're getting some accuracy. Now, like I was saying, the other thing that you might want to start doing is tracking the total difference.

  • Let's see if we're... yeah, I'm doing that in this one.

  • So in this one, we start tracking not only the total accuracy, but also the total difference.
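
Here is a minimal sketch of what that total-difference metric could look like, reusing the same assumed file names as before; it illustrates the idea rather than reproducing the exact script from the series.

```python
# Total absolute difference between desired and predicted answers; model
# outputs that don't parse as numbers are skipped rather than crashing.
def total_abs_difference(target_path, output_path):
    total_diff = 0
    failures = 0
    with open(target_path) as t, open(output_path) as o:
        for target_line, output_line in zip(t, o):
            target = int(target_line.replace(" ", "").strip())
            try:
                predicted = int(output_line.replace(" ", "").strip())
            except ValueError:
                failures += 1
                continue
            total_diff += abs(predicted - target)
    return total_diff, failures

# In the video these came out around 400,000 at 5K steps and 21,000 at 25K.
print(total_abs_difference("test.to", "output_dev_5k"))
print(total_abs_difference("test.to", "output_dev_25k"))
```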

  • So, starting from, let's just do 5K, and then we'll jump to 25K.

  • At 5K, the total difference, so, you know, think of it again, it's just like the loss: the total is almost 400,000.

  • But by the time we jump to the model after 25K steps, the total loss, or rather the total difference (I keep calling it loss because that's kind of what it is in my mind for this exact problem, but it's not loss), anyway, the total difference that it was off by was only 21,000.

  • That's a massive improvement after just 20,000 more steps.

  • Now, I know everybody's waiting for the final result.

  • So let me open up comparing number three.

  • And basically what I did here is, rather than printing the actual desired output and the model output, I'm just running through... well, where did it go?

  • Oh, I opened it in IDLE. I'm so used to using IDLE; let's open it in Sublime.

  • So in this one, I just iterate (wow, the formatting is really mad at me) over all of the checkpoints that we have, so we can kind of see how it does over time.
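
A sketch of that checkpoint sweep might look like the following; the output_dev_*k naming is an assumption based on how the files were renamed earlier, and the real comparing script may differ.

```python
# Sweep every saved dev output and print accuracy plus total difference.
import glob
import re

def step_key(path):
    return int(re.search(r"(\d+)k", path).group(1))   # sort numerically by step

with open("test.to") as f:
    targets = [line.replace(" ", "").strip() for line in f]

for path in sorted(glob.glob("output_dev_*k"), key=step_key):
    with open(path) as f:
        outputs = [line.replace(" ", "").strip() for line in f]
    correct = sum(o == t for o, t in zip(outputs, targets))
    diff = 0
    for o, t in zip(outputs, targets):
        try:
            diff += abs(int(o) - int(t))
        except ValueError:
            pass   # model produced something that is not a number
    print(f"{path}: accuracy {correct / len(targets):.3f}, total difference {diff}")
```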

  • So my apologies, I just have to fix that.

  • And pretty soon it's going to get angry at me over violating PEP 8. Anyway, let's go ahead and run it.

  • And as we can see, the correct rate was zero here, and the sum of all the differences was about 400K.

  • Very quickly, it jumps way down, all the way down until, after the full 257K steps (although there's quite the gap between that one and 257K, since I went to sleep), we got 100% accuracy.

  • So this model, again, there are 10 billion plausible input combinations, and the model got 100% accuracy on the out-of-sample testing, with no difference, obviously, because of the 100% accuracy.

  • So we can definitely train a model to do math. This was actually really surprising to me, to get a perfect score on that. And the other thing we can do is actually run inference.

  • I haven't actually done this yet.

  • So if we find that it's actually not accurate, I'm going to be pretty angry on video.

  • But let's go ahead and run it.

  • So, what's the worst that could go wrong?

  • So this is just running inference.py.

  • Now, let's just do, like, a 567 plus something; I don't even know what the answer should be.

  • Let's pull up the old trusty calculator. Where is the old calculator?

  • Ah, so what did we even put in?

  • 567 plus 4,214, which is 4,781.

  • So the model's top result was 4,781.

  • Now, the way this works is, it has a thing called the beam width, and that's how many candidate predictions it's going to make.

  • And then from there you score them.

  • Basically, this is the model's top choice, but you can see the model was also considering these other things.

  • But it chose the top result as 4,781.
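
Purely as an illustration of the beam-width idea (the actual NMT framework's inference code isn't shown here, and these candidate strings and scores are made up): the decoder produces several scored candidates, and the reported answer is simply the best-scoring one.

```python
# Illustrative only: pick the highest-scoring candidate out of the beam.
candidates = [("4781", -0.21), ("4771", -1.90), ("4881", -2.35)]

best_answer, best_score = max(candidates, key=lambda pair: pair[1])
print(best_answer)   # "4781", the top beam result, is the one we keep
```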

  • That's the one we go with. Now, we could do something super simple.

  • Oh, so this one it actually gets wrong: five plus two. It believed it was two sevens, actually, all the way down here.

  • I'm going to go ahead and guess it's probably not that great at simple math, or, like, small numbers, because I bet small numbers were just super rare in the training data. That should work.

  • Yeah.

  • Interesting.

  • So with large ones, it works.

  • What about 5,000-something plus seven?

  • Ooh, it gets that one wrong too.

  • This is really interesting, that it's getting so many wrong.

  • Yet in the out-of-sample testing, it got a perfect score.

  • I just wonder if the things that I'm thinking to type in are just, you know, not likely in the training data.

  • Let's try; I'm curious.

  • Does it just never get single digits right?

  • Okay, it got the nine right.

  • So, this one should go over 100,000?

  • Yeah.

  • I don't even know if that's the correct answer.

  • Let's check it.

  • Let's see... actually, I can't even tell. I don't know if you guys can see this. Yes, you can.

  • Okay, so that would be 95,421 plus 24,211, which is 119,632. So it appears to be pretty good at large-number addition, but I bet small numbers are just so rare in the training data that it gets them wrong.

  • Let's try seven plus 91,230, blah blah blah. So it should be 91,237.

  • Yeah, it got that one.

  • Huh, that's interesting.

  • So it's, like, the little numbers that it gets wrong.

  • Let's try two digits plus two digits.

  • Wrong.

  • Did it get 72 in any of the beams? No, none of the outputs even contained 72.

  • So this one has learned addition with large numbers; let's try multiple additions with large numbers... and that's also wrong.

  • Wow.

  • How did we get 100% accuracy on 500 samples and yet fail so many times through inference?

  • I wonder if we have a difference in how the inference runs versus how the dev evaluation ran. Hey, 76, that's right, huh?

  • I don't know.

  • Maybe because, like... I don't know; I can only guess why it's acting so differently.

  • It's only for the small numbers, though, for the most part. I think we missed one up here too, though, right?

  • This one? What about this one?

  • So this should have been 852; is that even in the beams? It's got 851. That's not even in there anyway.

  • Regardless, probably with more training data and more data manipulation, I suspect I could get a model to be perfect.

  • This one was actually just meant to be a simple test.

  • I never actually intended to even fully train this model, but I was curious.

  • I'm going to go ahead and say this is mostly solved.

  • It's definitely solvable; I think with some tweaking, we could get it to be perfect, even on the little numbers.

  • But now what I want to talk about is moving forward into the next challenge.

  • So there are kind of two challenges I'd like to consider.

  • One: this was just addition, which is relatively simple.

  • I think the next task would be something a little more complex, which would be doing all of the operators.

  • So let me open up the math data script here.

  • Here we go.

  • Sublime; pull this over.

  • So in this case, I've just slightly modified it to make a random choice, both in the training data and the testing data, and it just chooses one of add, subtract, multiply, or divide.

  • And then what it does is create the training data.

  • And I'll just show you a quick example of the "from" and "to" files; I don't know where they went... here.

  • Yeah, so it's got some subtraction and multiplication and division and addition; it's got all of them, and then obviously all the answers.
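
A rough sketch of that kind of generator is below; the file names, the 10-million sample count, and the answer formatting are assumptions, so the actual script used in the series may differ.

```python
# Generate "from"/"to" pairs covering all four operators for a
# character-level NMT model (characters are space-separated).
import random

OPERATORS = ["+", "-", "*", "/"]

def make_pair(max_value=100_000):
    a = random.randint(0, max_value)
    b = random.randint(0, max_value)
    op = random.choice(OPERATORS)
    if op == "/" and b == 0:          # avoid division by zero
        b = 1
    question = f"{a}{op}{b}"
    answer = str(eval(question))      # safe enough: we built the string ourselves
    return " ".join(question), " ".join(answer)

with open("train.from", "w") as src, open("train.to", "w") as tgt:
    for _ in range(10_000_000):
        q, a = make_pair()
        src.write(q + "\n")
        tgt.write(a + "\n")
```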

  • So with that, the question becomes... now, in this case, I think I did 10 million samples.

  • Let me pull it up again real quick; I'm just not used to this setup yet.

  • Yep, 10 million.

  • So that's the one I'm going to start running right now.

  • There's also an even more complex one, and I'm probably going to compare the results on both in the next video, all in the same video.

  • Also, the sample code I'll post; if I forget, remind me.

  • I plan to upload the models.

  • So the model that I just showed you guys, the pure-addition one, I'm going to upload that model.

  • It shouldn't be too gigantic.

  • Let's see.

  • 20 megabytes.

  • Yeah.

  • So I'll upload that one, so you guys can play around with it if you want, without having to train it yourselves.

  • And then these ones I'll probably upload if there's anything good about them.

  • But anyway, getting back into these.

  • So actually, this one has already started training, and what I'm going to do is show you what it looks like now.

  • So go to the model directory and come in here.

  • Then: tensorboard --logdir=train_log.

  • Perfect.

  • Okay, so this one is simply one number, followed by one of the four possible operators, followed by another number, again with a max of 100,000.

  • And what I've got going on here is basically the graph of not only the total, so here this would be the total difference, right? That total difference that I was showing you before, where it started off at 400,000.

  • That got lower.

  • This is the total difference.

  • And then what I decided was probably smartest was to track the difference for each of these operators separately.

  • So the difference of the multiplications, right?

  • The multiplications are going to produce much grander numbers than, say, division or subtraction, especially subtraction, and plausibly division.

  • Like, we can even see division is already dropping into something reasonable, whereas these are still, you know, up at large exponents.
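
A sketch of splitting the difference metric out per operator, under the same file-name assumptions as before, could look like this; each question in this single-operator dataset contains exactly one operator.

```python
# Per-operator absolute difference, the idea behind the separate
# multiplication/division/add/subtract curves in TensorBoard.
from collections import defaultdict

def per_operator_difference(question_path, target_path, output_path):
    diffs = defaultdict(float)
    with open(question_path) as q_f, open(target_path) as t_f, open(output_path) as o_f:
        for q, t, o in zip(q_f, t_f, o_f):
            question = q.replace(" ", "").strip()
            op = next(c for c in "+-*/" if c in question)   # single-operator case
            try:
                diffs[op] += abs(float(o.replace(" ", "")) - float(t.replace(" ", "")))
            except ValueError:
                pass   # skip outputs that don't parse as numbers
    return dict(diffs)

print(per_operator_difference("test.from", "test.to", "output_dev_5k"))
```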

  • So anyway, that's how this one's going.

  • I'm only 6,000 steps in, so we still have quite a bit of training to go.

  • But that's how that model's doing.

  • And now I'm going to show you the most complex version, the most complex model that we've got to date.

  • And this one is basically the last step before I start mucking around with encryption.

  • And by encryption, I mean both trying to break encryption, but also even just produce encryption, so actually encrypting things and decrypting things with a neural network rather than some sort of strict rules.

  • Yes, that could go wrong very quickly.

  • Anyway, it's just for fun.

  • So this is a far more complex model, as we can see; the numbers are much, much higher.

  • What's the difference? So let's bring up the difference graphs.

  • So this one creates much, much more complex mathematical equations.

  • So let me pull up... our loss? Actually, I think I want to pull up new_data, and then, well, we can check it out.

  • But then again, I'll probably post this code; if I forget, someone remind me, because we're running through so many things right now.

  • It's just that if we were to type this all out by hand, one, I'm not sure there's any value to it, but also it would just take, like, 10 videos before we could actually get to the fun part, and this is really just more about having fun.

  • So test.to; let's open that one up.

  • So these are the results that we're attempting to produce, and, I don't know if I can... I can't zoom with just a plus, but this is what we're coming from.

  • So these are the equations that we're generating; let me use a different editor so I can zoom in.

  • Okay, so as you can see, these are far more complex; it's not just one number and another number, it's many numbers.

  • So, one, it's got to learn PEMDAS, friend.

  • And then it's also going to have to learn, so, not just the order of operations without parentheses; we've also thrown parentheses into the equations.

  • And as you can see, some of these equations get quite long.

  • So let me also zoom in a little bit. Cool.

  • And so there's a variable number of numbers, a variable number of operators and types of operators, and then we've got parentheses and all that.

  • So the question is, can the model learn something like this?
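
Here is a hedged sketch of how equations like these (a variable number of terms, mixed operators, optional parentheses) could be generated and evaluated; the actual generator used in the series may differ in its details.

```python
# Generate one complex expression with mixed operators and an optional
# parenthesized group, then evaluate it to get the target answer.
import random

def make_expression(max_value=100_000, max_terms=5):
    n = random.randint(2, max_terms)
    terms = [str(random.randint(0, max_value)) for _ in range(n)]
    ops = [random.choice(["+", "-", "*", "/"]) for _ in range(n - 1)]
    # sometimes parenthesize a contiguous group of terms, so the model has to
    # learn grouping as well as plain operator precedence (PEMDAS)
    if n > 2 and random.random() < 0.5:
        i = random.randint(0, n - 2)
        j = random.randint(i + 1, n - 1)
        terms[i] = "(" + terms[i]
        terms[j] = terms[j] + ")"
    expr = terms[0]
    for op, term in zip(ops, terms[1:]):
        expr += op + term
    return expr

expr = make_expression()
try:
    answer = eval(expr)   # Python applies standard precedence and parentheses
except ZeroDivisionError:
    answer = None
print(expr, "=", answer)
```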

  • It's far, far more complex. Now, from what I can tell, we do see some improvement here, but until I start seeing total accuracy doing something (right now we're missing everything), until I start seeing accuracy actually ticking up, I'm still a little skeptical that it's not just brute force, that it's not just finding similar examples.

  • So we really need to see all of the above: we need to see totals dropping, like the totals of the multiplications dropping. These are all differences, by the way, so the absolute difference between output and desired output.

  • Multiplication dropping, adds dropping, subs dropping.

  • It seems to be leveling out, though, unfortunately. But anyway, this is still very early; we haven't even dropped the learning rate yet.

  • We're still at the first learning rate anyway.

  • I want to see that happening.

  • Of course, I want to see the loss continuing to drop down.

  • Let me see if I can pull it up.

  • I think it was probably on another graph, actually.

  • Not learning rate, I'm sorry, loss.

  • Where is loss? I'm lost.

  • It must be on the second page of graphs.

  • Next page, please.

  • There's our loss.

  • Oh, it's super massively smoothed at that; let me smooth that out.

  • Okay, so actually loss is doing quite well, smoothed out.

  • Anyway, it is still dropping.

  • So anyway, these are the two things you guys have to look forward to.

  • Two different models are being trained simultaneously, and we will check the results on both of them in the next video.

  • This video is clearly getting way too long.

  • So anyway, I'm going to cut it here. If you have questions, comments, concerns, if I forgot to put up one of these models or you're missing some code or whatever, if you want to see a script and it's not in the text-based version of the tutorial, let me know and I'll put it there.

  • If you've got questions, comments, concerns, suggestions for improvement, or you feel the desire to support, head to pythonprogramming.net/support.

  • Otherwise, I'll see you in the next tutorial, where hopefully we have a model that does these calculations, and then we can go on to encryption, taking over the world, that sort of thing.

  • See you next time.

