
  • What's going on?

  • Everybody, welcome to part nine of our chatbot with Python and TensorFlow tutorial series.

  • In this tutorial, what we're really talking about is how you can interact with your chatbot. There are quite a few ways for us to interact, and a few different reasons why we might be interacting with our chatbot.

  • So let's go ahead and kind of cover all of those ways.

  • So first of all, the first way you're going to see how your chatbot is doing is, I guess, in the stats in TensorBoard or something like that.

  • But generally, I mean, our goal here is to create a chatbot, so we actually care most about the output of our chatbot.

  • The first way you'll see output from your chatbot is in the console as it's training.

  • Every 1,000 steps it's going to show you a sample, where src is the actual input text, ref is the expected testing output, and nmt is your chatbot's actual response.

  • That is one way.

  • But that's only one at a time.

  • You really kind of want to see a lot of output.

  • To really gauge how your chatbot is actually doing, one output is just not enough.

  • So that's kind of the first way you're going to interact with your chatbot, in my opinion, or at least see how your chatbot is doing.

  • So I'm going to open up the nmt-chatbot directory and head into the model directory.

  • So inside here is where everything goes.

  • This is where all your checkpoint files go and all that.

  • So just in case I forget to mention it later: let's say you want to go into production, um, and you want to move this chatbot somewhere else. All you need to run this model live is the NMT code, right?

  • You need to have your hparams.

  • Well, you'll want to bring hparams if you don't have the same exact settings file.

  • But you could go either way there, actually.

  • Um, but anyway, you're going to want the checkpoint file, probably hparams, and then the three files that correspond to whichever checkpoint you want.

  • So in this case that would be one step's checkpoint, so you want all three of its files. And then you'd also need to modify the checkpoint file to contain that step number.

  • Right?

  • So if we actually wanted a different step, we would need to change this number to match.

  • Okay, and you might as well change all of these.

  • Or you could just remove them.

  • Basically, normally this would be like 496, 495, 494, and so on, all the way down.

  • But I just threw in 496 because I was actually kind of happy with model 496.

  • So I just copied and pasted it into all of them.
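The checkpoint edit described above can be sketched in a few lines of Python. This is only a sketch: it assumes the standard TensorFlow checkpoint-state file format and a "translate.ckpt" prefix, which is what NMT uses by default; check your own model directory for the actual names.

```python
import re

def point_checkpoint_at_step(checkpoint_text, step, prefix="translate.ckpt"):
    """Rewrite every checkpoint path in the text to point at one step.

    A TensorFlow checkpoint-state file normally looks like:
        model_checkpoint_path: "translate.ckpt-496000"
        all_model_checkpoint_paths: "translate.ckpt-495000"
        ...
    Copying one step number into every entry is the same manual
    copy-and-paste described above.
    """
    pattern = re.escape(prefix) + r"-\d+"
    return re.sub(pattern, "%s-%d" % (prefix, step), checkpoint_text)
```

You would read the model/checkpoint file, run it through this, and write it back.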

  • Anyway, the first form of output is going to come in these output_dev files.

  • So output_dev is the result of your chatbot.

  • Every 5,000 steps it's going to take your tst2013.from file.

  • That's located in your data directory.

  • So here.

  • Okay.

  • Uh, right there.

  • So then, whoops, I tried to hit the back button on my mouse, and I'm in Paperspace right now, so it actually went back in my browser, not here.

  • Anyways, as I was saying, yeah.

  • So your first form of output is going to be those output_dev and output_test files, which we'll look at real quick.

  • But anyways, um, this is output from our actual model.

  • So you can see some examples really quickly of how our model is doing right now. So that's pretty cool.

  • And you can either manually compare, you know, the input to the output.

  • Or, I just wrote a simple kind of pairing script real quick, so I'm going to drag this over.

  • Um, and this is it.

  • So you point it at the output file, output_dev, and the testing file it was built from.

  • So give the full paths to wherever these are located.

  • Then you can run it, and it's going to output the pairs line by line.

  • And it's all just going to look like this, so you can more clearly see: hey, here's the input.

  • Here's the output.
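A minimal version of such a pairing script just zips the two files line by line. This is a sketch of the idea, not the exact script from the video; the paths in the usage note are placeholders you'd point at your own data and model directories.

```python
def pair_lines(test_path, output_path):
    """Yield (input, response) pairs by reading the test file and the
    model's output file in lockstep, one line each."""
    with open(test_path, encoding="utf-8") as f_in, \
         open(output_path, encoding="utf-8") as f_out:
        for question, answer in zip(f_in, f_out):
            yield question.strip(), answer.strip()
```

Usage would look something like: for q, a in pair_lines("data/tst2013.from", "model/output_dev"): print("IN:", q); print("OUT:", a).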

  • There's really nothing interesting here, so I just don't see a point in writing it out in the tutorial video for you guys.

  • You can go to the text-based version of the tutorial.

  • It's all there. So yeah, now we're going to talk about the inference script, so I'm going to close this out. So that's one way.

  • You know, you can sort of interactively, really, see how your chatbot's doing en masse.

  • But then there's going to be a time where, say, you like the way that's looking.

  • You might want to start interacting with your chatbot.

  • And chances are you're going to come up with some questions that you tend to ask.

  • You tend to see, oh, these are problematic questions.

  • You want to ask them from time to time to see how your chatbot is doing.

  • Um, if it's the same questions you're going to ask every chatbot, what you should do is just add them to your test file.

  • You would have needed to have done that before you ran the data preparation and all that, but just keep that in mind.

  • Like, if you're going to keep asking the same stuff, you could just throw it into the test file.

  • That way, the output every 5,000 steps is going to do that for you.

  • But chances are you're going to come up with new stuff, or you're going to see new ways that your chatbot is kind of weak, and each one is going to be a little different.

  • Some chatbots like to ramble all over the place.

  • Some chatbots like to not finish their thoughts.

  • Some of my chatbots have liked to give way too many links to things that don't really need links, stuff like that.

  • Each one has been a little different.

  • So, yeah.

  • Anyway, uh, the next way is with inference, so inference.py, and things here are highly likely to change.

  • So I'm just going to throw that caveat out there. Right this moment, if I go to the GitHub.

  • Uh, yeah, this is what it looks like.

  • And in here, this is kind of the way things are right now.

  • And then there's sentdex_lab.

  • And in there, I've got a few things that have changed, including a modded inference, a modded bulk inference, and then some scoring information.

  • Now I'm going to go over all these in a moment.

  • But I just want to say that, as time goes on, chances are the scoring and the modifications to inference and picking the right response (and again, I'll explain that in a moment) are probably going to be implemented in a much better way by Daniel later.

  • Oops, yeah, this guy. Anyway, I meant to just go here, to the main project directory.

  • So chances are, if enough time has gone on, there's probably a better way than using my code.

  • So I'm just going to say that for now.

  • Uh, anyway, the default one is just inference.py; that's the one that doesn't do any scoring or anything.

  • So first, let's go ahead and run that one.

  • So I'm going to go ahead and open up a terminal, change directory into Desktop and nmt-chatbot, and let's see if I can't make this bigger. Good, cool.

  • So I'll run python train.py, and this is going to open up, based on the checkpoint directory, the checkpoint files.

  • So again, if you wanted to test a different checkpoint, you could go into the model directory.

  • You could just edit this checkpoint file to contain any checkpoint you want, and that will be the model it loads.

  • Okay, so just keep that in mind if you want to check a different one. And I highly suggest you do, because, let's say, 496,000 compared to 495,000 might be a surprisingly different chatbot.

  • It might be significantly different.

  • It might be a lot better.

  • And then suddenly, at 497,000, it's bad again.

  • It's bad again.

  • You're not happy with it.

  • So I strongly suggest you check different ones.

  • Also, it's not always gonna be the case that the one that was trained on more data is necessarily better.

  • So, for example, I'm comparing this chatbot, which is one that I created with 70 million pairs, to the other one currently running on Twitter right now, which is trained on only about three million pairs.

  • And I honestly think the one that's on Twitter right now is better than this one.

  • So that just goes to show you that, you know, it's not necessarily the case that the one that's been trained more is going to be better.

  • So anyway, there are a lot of other considerations that go into that, but just keep it in mind.

  • You want to test a few of them. So, uh, yeah.

  • Anyway, um, coming down here, what have I done?

  • Did I run train.py?

  • I must have run train.py.

  • I'll break this.

  • I'm so used to doing python train.py.

  • I am positive.

  • That's what I did.

  • Let's scroll up and see.

  • That was dumb.

  • I sure did.

  • What an idiot.

  • Anyway, let's see; it's not going to break until this operation is done.

  • So I'm just going to start over: open a terminal, make it bigger, change directory into Desktop and nmt-chatbot, and run python inference.py.

  • Cool.

  • So now it's going to start the inference, and then it starts this interactive mode.

  • The first response will take a while; it's got to load everything.

  • But let's type: "Hey there, how you doing?"

  • Okay, so this one, like I said, is going to take a little longer.

  • And, like the program says, the other ones will usually come a little quicker.

  • Um, okay, so here you can see all of the outputs.

  • There's quite a few here.

  • What's going on here? Well, we're using beam search.

  • So that was one of the parameters in the hyperparameters.

  • And this is one of the benefits of doing it: you can get multiple outputs.

  • Now the default is 10.

  • If you want to change that, you can. In the model directory, come to the hparams file.

  • And what you're going to look for is, uh, beam_width, I think is what it was.

  • Yeah, beam_width, which is probably set to 10 right now.

  • I set it to 30 for the production model, and then there's num_translations_per_input.

  • Yeah.

  • Yeah.

  • There it is: 30 again.

  • I'm pretty sure the default would be 10.

  • But you could make that 30.

  • So anyway, you can change those two things.

  • Then you'll get up to 30 outputs.
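Since the saved hparams file is plain JSON, a small helper can make that edit. This is a sketch under assumptions: the key names beam_width and num_translations_per_input match what's shown in the video, but verify them against your own hparams file before trusting it.

```python
import json

def widen_beam(hparams_path, width=30):
    """Set beam_width and num_translations_per_input in the saved
    hparams JSON so inference returns more candidate responses."""
    with open(hparams_path, encoding="utf-8") as f:
        hparams = json.load(f)
    hparams["beam_width"] = width
    hparams["num_translations_per_input"] = width
    with open(hparams_path, "w", encoding="utf-8") as f:
        json.dump(hparams, f, indent=2)
    return hparams
```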

  • So now you can see, though, for "Hey there. How you doing?", the first response was "Hey."

  • Okay, um, which is a little worse than "Hey, how you doing?"

  • Right, or "How is it going?" or "How was your day?"

  • But really, probably the best one would be to say something like "I'm good" or "I'm good, thanks."

  • Right.

  • Um, yeah.

  • And you can see there are ones like "I'm good, thanks for asking," you know, that kind of stuff.

  • And obviously a lot of these are just "Hey"s, like how this one has a random "Hey," then back to the regular "Hey"s. Anyway.

  • So this is a slightly modified inference over the default one that came with NMT.

  • So the default one would just return, like, the number one response, your chatbot's normal output in training, you know; or if you were to push it, it would be the first return, whatever that happens to be.

  • Now, this one has a few different scoring mechanisms built in.

  • One is for unks.

  • There are none here, so you don't get to see it.

  • But anything that has an unknown token is really not user-friendly, so you would almost never want to return it.

  • Really, honestly, I think you never want to return something that has an unknown token in it.

  • That just looks really weird.

  • But anyway, you can see there are a lot of responses to choose from, and the first one isn't actually the best one. We could argue about which of the other ones is the best, but it's probably going to be, you know, one of these.

  • "I'm good."

  • "I'm good, thanks."

  • Um, "Hey, how's your day?", I suppose, could be an okay one; or whatever, or just "I'm good."

  • So, enter my added layer on top of that: the modded inference.

  • Oh yeah, I guess I don't really want to close this.

  • The modded inference, now, um, I just wanted to show this because what I did was, I wanted this to go into actual production, so mine's not going to return all of these.

  • It's only going to return one response.

  • Um, so, trying to think if there's anything else I want to bring up here.

  • So basically, what it's going to do is read those responses from NMT, and it's going to use some really rudimentary natural language processing.

  • Basically, it's going to look for instances where we don't end on proper punctuation.

  • Like, if we end on a quote, that's bad.

  • If we don't end with a period, we're going to say something with a period is far superior.

  • Also, he had a propensity to, like, not finish the links.

  • And the links are formatted like Reddit links.

  • So a lot of times, he just wouldn't finish and give that closing parenthesis.

  • So I used a simple regular expression to look for brackets followed by an opening parenthesis but never a closing one.

  • That's always going to be a problem.
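A regular expression along those lines (my reconstruction of the idea, not necessarily the exact pattern in the repo) just has to spot a bracketed [text]( that never gets its closing parenthesis before the end of the response:

```python
import re

# A Markdown/Reddit-style link looks like [text](http://example.com).
# An unfinished one has the bracketed text and the opening parenthesis,
# but no closing parenthesis anywhere before the end of the string.
UNCLOSED_LINK = re.compile(r"\[[^\]]*\]\([^)]*$")

def has_unclosed_link(response):
    """True if the response starts a [text](url link but never closes it."""
    return UNCLOSED_LINK.search(response) is not None
```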

  • So anyway, I'm looking for stuff like that. And, I guess I could pull up the scoring mechanism, actually.

  • So, in sentdex_lab, if you do want to use this stuff:

  • basically, you'll take all of it, the comparison script, the two modded files, and the scoring, and you just drag them into the root directory, if you want to use them.

  • Again, though, as time goes on, I'm very confident that Daniel will probably write something better than what I've written.

  • But, uh, anyway, you can look at the scoring; it's just a bunch of functions that kind of score, um, bad responses.

  • So, Charles v2 liked to link very frequently to, uh, this list of burn centers in the United States, and it just got really annoying.

  • So we stopped letting that happen. And then also, some of the responses would continue the beginning of a link but never finish it.

  • So we call that a bad response. Anyway, continuing on: messed-up link, which is what I was talking about.

  • It doesn't close a link.

  • Um, this is just a quick function to remove punctuation for later evaluation.

  • Ends in equals: yeah, basically, a lot of links were ending in an equals sign.

  • So it'd be like youtube.com, and then the v=, and then we'd just stop.

  • Well, I don't really want that.

  • So if the thing ended in an equals sign, we lower the score. Then the unk checker: if there's any unknown token, remove from the score.

  • If the answer, well, a lot of times he would just repeat the question, which we've been seeing in the output already.

  • If he does that, remove from the score. Um, ends in punctuation: good, we add to the score.

  • If he's just echoing, if he's very similar to the input question, um, we want to penalize. And there's more stuff on answer echoing a little further down.

  • Anyway, eventually we get to the point where we return the scores, and then we're going to score basically all the outputs and then go with the best score.

  • If there are a bunch of them that share the best score, let's say the best score is six.

  • Okay, so let's say there are five answers that have a score of six.

  • We just randomly pick one.
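The whole score-then-pick loop described above can be sketched like this. The rules and weights below are illustrative stand-ins, not the actual values from the sentdex_lab scoring file:

```python
import random

def score(response):
    """Toy scoring: add for good signs, subtract for bad ones.
    The specific rules and weights here are made up for illustration."""
    s = 0
    if "<unk>" in response:
        s -= 100                      # never show unknown tokens
    if response.rstrip().endswith((".", "!", "?")):
        s += 2                        # finished thoughts read better
    if response.count("(") != response.count(")"):
        s -= 10                       # unclosed link/parenthesis
    return s

def pick_response(responses):
    """Score every candidate, then randomly pick among the top scorers."""
    scores = [score(r) for r in responses]
    best = max(scores)
    top = [r for r, s in zip(responses, scores) if s == best]
    return random.choice(top)
```

The random choice among ties is what keeps the bot from always giving the same answer to the same question.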

  • Okay, so that is scoring.

  • And now we'll do my modded inference.

  • So, for example, let me just run the modded inference script with "hey there, how you doing?"

  • I think it was that. Hopefully he doesn't respond "Hey."

  • Um, we'll see. "How's it going?"

  • So unfortunately, he responded with a relatively similar question.

  • But obviously, "how's it going", I mean, a lot of people, when you say "What's up?", they say "What's up?" back.

  • "Hey, how you doing?" "Hey." Like, they pretty much just respond.

  • They never actually say how they're doing, so that might not be totally out of line.

  • Uh, "How many cats do you own?" "All of it."

  • Well, that's a lot of cats.

  • "That's a lot of cats." Let's see if he responds to this. "I can do that."

  • Okay.

  • It's going to be empty if we just hit enter here.

  • Now, if you don't give it any input, it's going to get pretty angry. Anyway.

  • "Wow, that's a lot of..." Let's just try it again and see if it gives the exact same output.

  • "That's a lot of cat."

  • Um, "Are you better than version two on Twitter?"

  • He says he's not.

  • I'm not sure if he's better than v2 on Twitter anyway.

  • Um, yeah.

  • So that is my modded inference, kind of, where it picks one decent answer.

  • But I'm sure if we were to kind of compare these: for example, I could break out of here, and then let's ask him how many cats he owns.

  • So let's just go back to inference.py.

  • And let's say, uh, "How many cats do you own?"

  • "Rude."

  • Okay, so as you can see, there are a lot of answers to this, right?

  • He just repeats the question as the first one: "How many do you own?"

  • Then "all of them."

  • "I don't own a cat."

  • So I don't know whether it's a question of how many he owns, or how many the questioner owns.

  • And then here's one where he doesn't finish his thought: "I don't think I've ever owned a cat, but".

  • Right. Um, and then there are weird, binary-like answers, and so on.

  • So anyway, as you can see, even if Daniel doesn't come up with better scoring mechanisms, I'm sure other people will come up with better scoring.

  • Also, you know, each of the scores is fairly arbitrary as far as the number I'm subtracting, or adding if things were good. This is really kind of catered to making Charles v2 good.

  • I imagine that when I go through with a v3 model, I'll probably have to tweak it again to get the results I'm after. Anyway, um, I think that's all for now; there's just really a lot of trial and error and research and development going on at this stage for me.

  • Like I said, I trained a model with 70 million pairs.

  • I did a full epoch on that data, and I just really wasn't all that impressed.

  • I even went back a little further to see if I could get things to be a little better the second time around.

  • Um, and I just wasn't impressed.

  • And that was with a 1024-by-6 bidirectional model with about a 70,000-word vocab.

  • If I recall right. I just didn't like it.

  • So now what I'm doing is going back to the 512-by-2 bidirectional, which is what Charles v2 on Twitter is right now, and which obviously is going to change in time.

  • Um, but now I'm doing a 512-by-2 with a 500,000-word vocab on a machine with 24 gigabytes of VRAM.

  • I shrank, uh, the batch size to 32, which right now we don't have as an option in the settings.

  • At least I don't think so.

  • In fact, I really should just add it right now, um, because it should be there, because that's one way you can keep a larger model

  • but not, uh, you know, run out of VRAM.

  • Okay, so it is there.

  • It's just commented out.

  • So someone else must have added it.

  • I guess that settles that.

  • Okay.

  • So he added that in, or someone did. Anyways.

  • Anyways, um, so the default is 128. But a good idea, if you're running out of memory, would be to lower it.

  • Uh, I'm doing 32, and I'm just barely fitting it into that 24 gigs of VRAM.

  • So anyway, I'll see how good that model is.

  • Um, I have high hopes, but who knows?

  • But a lot of this is going to be trial and error. And a lot of people were suggesting, hey, you should do like a rule-based bot on top of it all.

  • And I mean, to an extent, we are. I mean, I am applying rules to the output, pretty much immediately.

  • For one, I'm not going with the first choice, and then we have various rules being applied to the output.

  • You could come up with other things. Like, for example, if you're hitting unknown tokens.

  • Well, that would be a really good time to use a Markov chain, right?

  • So you could figure out, okay, what's the best word to throw in here?

  • You could throw in a Markov chain or something like that to fill in any unk kind of spot.
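To make the Markov-chain idea concrete, here is a minimal sketch (my own illustration, not code from the repo): build a table of which words follow which in some corpus, then swap each unknown token for a word that has actually been seen following its left-hand neighbor.

```python
import random
from collections import defaultdict

def build_chain(corpus_lines):
    """Map each word to the list of words observed following it."""
    chain = defaultdict(list)
    for line in corpus_lines:
        words = line.split()
        for a, b in zip(words, words[1:]):
            chain[a].append(b)
    return chain

def fill_unks(response, chain, unk="<unk>"):
    """Replace each unknown token with a word the chain has seen
    following the previous word, when such a word exists."""
    words = response.split()
    for i, w in enumerate(words):
        if w == unk and i > 0 and chain[words[i - 1]]:
            words[i] = random.choice(chain[words[i - 1]])
    return " ".join(words)
```

A real version would want a bigger corpus and probably weighting by frequency, but this is the shape of it.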

  • So, um, yeah, there are definitely a lot of things that we can do, changes that we could make, going forward.

  • But this is probably going to be the last chatbot tutorial,

  • besides, you know, just some filler updates as time goes on, as far as what I'm figuring out.

  • But, for example, the last model took about a week, actually a little more than a week.

  • I think it was about eight days to do even one epoch.

  • So these things are just going to take a long time.

  • And now the 500k-vocab model at a 32 batch size is stepping about just as fast, so that's probably going to take a month to do an entire epoch.

  • So I doubt I'm going to do an entire epoch with that model. But yeah, these things are just going to take a long time.

  • So anyways, I think that's all for now.

  • If you have questions, comments, concerns, or you find you've done something and made some cool change that you want to share,

  • leave it below.

  • You can also come join us on the GitHub: make a pull request, or even fork it if you want.

  • Like I said, some people thought they had some good ideas.

  • And my answer is always going to be the same:

  • fork it, or make a pull request.

  • And, uh, yeah, so contribute.

  • Um, anyways, that's all for now.

  • Uh, questions, comments below; otherwise, I'll see you in another video.


Interacting with our Chatbot - Creating a Chatbot with Deep Learning, Python, and TensorFlow p.9
