What's going on, everybody?
Welcome to part five of our chatbot with Python and TensorFlow tutorial series.
In this tutorial, what we're going to be doing is hopefully actually putting things into our database.
So, let's go ahead and get started.
So basically, where we left off was here: the final bit of code we wrote was finding out whether or not the data was acceptable.
The data here, in this case, is going to be the body, basically the text of the comment: should we even consider it?
Initially, I actually put this check right before all the insertions, because that's the order of operations in my head, but the score check is super cheap, so we might as well do that first.
So if the score is at least two, sure.
But then what we're going to do is say: if acceptable(body), then let's do these other operations.
I think that's a better way to do it, rather than go through all of these operations only to find that the comment isn't acceptable.
Although most comments are probably acceptable, so to be honest, I'm not really sure what the best way to go about it is.
Anyway, we'll do that.
It should be pretty cheap to do anyway, so that's all right.
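The ordering point above relies on Python's short-circuiting `and`: if the cheap score check fails, the body check never runs. Here's a minimal sketch of that idea. `acceptable()` is a hypothetical stand-in for the body filter from the previous part (its rules here are assumptions), `should_consider()` is a made-up helper name, the threshold of 2 is an assumption, and the call counter only exists to show when the body check actually runs:

```python
# Counts how often the (comparatively expensive) body check actually runs.
calls = {"acceptable": 0}


def acceptable(body):
    """Hypothetical stand-in for the body filter; the rules here are assumptions."""
    calls["acceptable"] += 1
    if body in ("[deleted]", "[removed]"):
        return False
    # Reject empty or very long bodies.
    return 1 <= len(body) <= 32000 and len(body.split(" ")) <= 1000


def should_consider(score, body, min_score=2):
    # `and` short-circuits: acceptable() is only called if the score check passes.
    return score >= min_score and acceptable(body)


print(should_consider(1, "hello"))  # False, and acceptable() never ran
print(should_consider(5, "hello"))  # True
print(calls["acceptable"])          # 1
```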
And really, after the score is high enough, we're going to insert.
The question is, are we going to insert this as a new row, or are we going to insert it as an update?
No matter what, if we're inserting, we're going to ask that question.
So I suppose this could, in theory, save us a little bit of processing in the long run.
Well, I guess in this case it would, if this check was not true.
Whatever.
Anyways, the show must go on.
So if the new score is greater than the existing score, and the comment itself happens to be acceptable, then what do we actually want to do?
Well, if all of that is the case, then we're going to go ahead and do a sql_insert_replace_comment.
Okay.
For now, I'm just going to leave that there.
I'm just going to put this here so I can quickly find it.
Now, if there isn't an existing comment score, what we're going to want to do is come down here and say else.
We've already checked whether it's acceptable, so then we're going to ask: if parent_data, so if there is a parent that we have data for, then what we want to do is sql_insert_has_parent; otherwise, else, sql_insert_no_parent.
Okay, so those are basically the three different functions we're going to work with now.
Chances are there's a better way to do it, but that's what I'm going to do.
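Putting the branches above together, here's a hedged sketch of the routing logic. `route_comment` is a hypothetical helper that returns the name of the insert function that would be called, rather than calling it, so the decision can be tested on its own. The minimum score of 2 and the parameter names are assumptions:

```python
def route_comment(score, body, existing_score, parent_data, acceptable, min_score=2):
    """Return the name of the insert path a comment would take, or None to skip it.

    existing_score: score of the reply already stored for this parent (None if none).
    parent_data:    body text of the parent comment, if we have it (None otherwise).
    acceptable:     callable that filters comment bodies.
    """
    if score < min_score:
        return None  # cheap check first
    if existing_score:
        # A reply to this parent is already stored: only replace it with a better one.
        if score > existing_score and acceptable(body):
            return "sql_insert_replace_comment"
        return None
    if not acceptable(body):
        return None
    # New row: with parent text it forms a pair; without it, we still keep the comment.
    return "sql_insert_has_parent" if parent_data else "sql_insert_no_parent"


def always_ok(body):
    """Hypothetical filter that accepts every body, just for the demo."""
    return True


print(route_comment(5, "hi", existing_score=3, parent_data="p", acceptable=always_ok))
# sql_insert_replace_comment
print(route_comment(5, "hi", existing_score=None, parent_data=None, acceptable=always_ok))
# sql_insert_no_parent
```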
So now we're going to pass the information to these.
Basically, if we're going to be replacing something, that means we're going to take comment_id and parent_id.
We're probably going to take the parent data too, because we've got it, and then the body.
So the body is going to be the new comment: the parent data will be the parent, and the body will be the reply.
And then subreddit, created_utc, and then the score.
So that's if we have an existing comment already in place and we're going to update it because our new score is higher.
Now, if we do have that parent in our database, so we have information on it, we want to go ahead and insert with the information that we do have.
So in this case, it's again going to be comment_id, and in fact, I'm trying to think if it should be the exact same.
I'm pretty sure, yes: it should be exactly the same data here, because we have all that information.
It's really only this one here, with no parent, where we don't have any parent body information to throw in.
So this one would be comment_id, and we do have a parent_id, because everything has a parent ID.
If it's a top-level comment, it has no parent comment; the actual parent is the thread itself, the Reddit thread.
So anyway, parent_id.
But we don't have parent data; we do have body, subreddit, created_utc, and score.
So why would we insert these if we don't have parent information to go with them?
Well, because this comment might still be some other comment's parent, and we want to get the data on that.
So that's why we still want to store that information.
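To see why the parentless rows matter, here's a small sketch using Python's built-in `sqlite3` with an in-memory database. The `parent_reply` table name comes from the query used below, but the column layout and the `find_parent` helper are assumptions for illustration, and the IDs and text are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# Assumed schema: one row per stored reply, keyed by the parent it answers.
c.execute("""CREATE TABLE parent_reply (
    parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT,
    comment TEXT, subreddit TEXT, unix INT, score INT)""")


def find_parent(pid):
    """Return the stored body of the comment whose ID is `pid`, or None."""
    c.execute("SELECT comment FROM parent_reply WHERE comment_id = ? LIMIT 1", (pid,))
    result = c.fetchone()
    return result[0] if result else None


# A top-level comment: its parent is the thread (t3_...), so there is no parent text.
c.execute(
    "INSERT INTO parent_reply (parent_id, comment_id, comment, subreddit, unix, score) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("t3_thread1", "t1_aaa", "example top-level comment", "test", 1430438400, 5),
)

# Later in the dump, a reply whose parent_id is t1_aaa can now find its parent's text.
print(find_parent("t1_aaa"))  # example top-level comment
print(find_parent("t1_zzz"))  # None
```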
So now what we have to do is actually build all three of these inserts.
They're all pretty similar, so I guess we'll just build them all together.
Let's just throw them down here.
Like I said, there's probably a better way to do this than creating them like this, but I'm going to go this way.
So: def sql_insert_replace_comment.
We've got comment_id and parent_id.
We've got parent, comment, subreddit, time, and score.
And then, just in case we hit an issue, we'll do try and except Exception as e.
In that case, we'll print 'replace comment' and then whatever e was.
I don't think we have any other e's besides the stuff we commented out.
I've been playing with Go too much recently; I can't remember if you have to wrap e in str() or not.
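On the `str()` question: `print()` already converts each argument with `str()`, so printing the label and `e` as separate arguments works as-is. An explicit `str(e)` is only needed when building a string with `+`. A quick check, where `risky()` is just a hypothetical function that fails:

```python
def risky():
    """Hypothetical operation that fails, standing in for a bad insert."""
    raise ValueError("bad row")


try:
    risky()
except Exception as e:
    # print() calls str() on each argument for us.
    print("replace comment", e)
    # Concatenation does not, so here str() is required.
    message = "replace comment: " + str(e)

# Without str(), concatenating a str and an exception raises TypeError.
concat_failed = False
try:
    _ = "replace comment: " + ValueError("bad row")
except TypeError:
    concat_failed = True

print(message)  # replace comment: bad row
```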
Anyways, we want to have that information there.
So now we're going to say sql = and again we'll just use triple quotes here.
And then we're going to say UPDATE parent_reply SET...
I'm undecided whether I really want to write all this out or just paste it, and I think I'm just going to copy and paste this.
I'm not sure what we gain by writing out all these queries.
So I'll copy and paste them, and I'll put a link in the description to the text-based version of this tutorial; if I forget, someone remind me.
But even if I do forget, it'll be live on pythonprogramming.net, so you should be able to find it.
I just don't see any benefit to writing all this out.
So anyway, here we have the three functions.
So basically, what's happening here: first, I want to rename this.
This one should be, I don't know, 'update'.
And then we'll call this one 'parent', and this one 'no parent'.
Okay?
So basically, this one is just going to overwrite.
What we want to do is overwrite all of this information where the parent_id was whatever this comment's parent_id is.
So any time we've got that parent_id, meaning any reply to that parent comment, we want to make sure the stored reply is the new comment with the better score.
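The overwrite described above boils down to an `UPDATE ... WHERE parent_id = ?`. Here's a minimal sketch of `sql_insert_replace_comment` against an in-memory SQLite table. It's a variation rather than the exact tutorial code: it executes immediately with `?` placeholders instead of handing a formatted string to the transaction builder, and the column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# Assumed schema: one stored reply per parent_id.
c.execute("""CREATE TABLE parent_reply (
    parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT,
    comment TEXT, subreddit TEXT, unix INT, score INT)""")
# An existing, lower-scoring reply to parent t1_p (made-up data).
c.execute("INSERT INTO parent_reply VALUES (?, ?, ?, ?, ?, ?, ?)",
          ("t1_p", "t1_old", "parent body", "old reply", "test", 1430438400, 3))


def sql_insert_replace_comment(commentid, parentid, parent, comment, subreddit, time, score):
    # Overwrite whichever row currently holds the reply to this parent.
    try:
        c.execute(
            """UPDATE parent_reply SET parent_id = ?, comment_id = ?, parent = ?,
               comment = ?, subreddit = ?, unix = ?, score = ? WHERE parent_id = ?""",
            (parentid, commentid, parent, comment, subreddit, int(time), score, parentid),
        )
    except Exception as e:
        print("replace comment", e)


sql_insert_replace_comment("t1_new", "t1_p", "parent body", "better reply", "test", 1430438500, 10)
print(c.execute("SELECT comment_id, comment, score FROM parent_reply").fetchall())
# [('t1_new', 'better reply', 10)]
```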
Then sql_insert_has_parent.
Basically, what this one's doing is inserting a new row where we have the parent_id, but we also happen to have the data for that parent.
So we're inserting information about that parent body as well.
And then with this one, we're inserting where there was no parent data.
We still want to have the parent_id, just in case, maybe, things were out of order, but mainly we're inserting this one so we have parent information for another comment whose parent might be this comment.
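The two plain inserts differ only in whether a `parent` value is written. Here's a sketch under the same assumptions as before: in-memory SQLite, assumed column names, and direct execution with `?` placeholders rather than going through the transaction builder. The IDs and text are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""CREATE TABLE parent_reply (
    parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT,
    comment TEXT, subreddit TEXT, unix INT, score INT)""")


def sql_insert_has_parent(commentid, parentid, parent, comment, subreddit, time, score):
    # We know the parent's text, so this row is a complete parent/reply pair.
    try:
        c.execute(
            "INSERT INTO parent_reply (parent_id, comment_id, parent, comment, subreddit, unix, score) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (parentid, commentid, parent, comment, subreddit, int(time), score),
        )
    except Exception as e:
        print("parent insertion", e)


def sql_insert_no_parent(commentid, parentid, comment, subreddit, time, score):
    # No parent text: `parent` stays NULL, but this comment can still be
    # looked up later as the parent of another reply.
    try:
        c.execute(
            "INSERT INTO parent_reply (parent_id, comment_id, comment, subreddit, unix, score) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (parentid, commentid, comment, subreddit, int(time), score),
        )
    except Exception as e:
        print("no parent insertion", e)


sql_insert_no_parent("t1_aaa", "t3_thread1", "top-level comment", "test", 1430438400, 5)
sql_insert_has_parent("t1_bbb", "t1_aaa", "top-level comment", "a reply", "test", 1430438500, 3)
print(c.execute("SELECT COUNT(*), COUNT(parent) FROM parent_reply").fetchone())  # (2, 1)
```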
Okay, we probably just saved 15 minutes doing it that way.
Anyways, if you have any questions or whatever, feel free to ask, but these are all pretty simple SQL queries.
So now, the last thing I want us to go ahead and do, so we can actually press go on this script and make sure it works, is define the transaction builder.
Up to this point, we've been building these queries, but as you can see here, we're using transaction_bldr and passing it the SQL.
So now what we're going to do is add a final little helper function, and that's going to be def transaction_bldr.
Not in all caps, though.
In fact, let's just copy that: def transaction_bldr.
It takes in sql, and then, and this is going to piss some people off, it's going to declare global sql_transaction, making this variable here global.
We're making it global because we said we'd be stuffing things into sql_transaction, but eventually we want to actually clear it out.
So that's why we're using global.
So now, let's come down here.
We're going to take in some SQL, and basically we're just going to keep building this transaction until it's over a certain size.
So what we're going to do is say sql_transaction.append(sql).
We just keep appending these SQL statements to the transaction.
And then we'll say if the length of sql_transaction is greater than 1000.
You could choose different numbers; 1,000 is not going to be all that much slower than 10,000, but 1,000 is going to be a whole hell of a lot faster than 1 or 10 or something like that.
Anyway, c.execute: to do a bulk insert like this, you need to begin a transaction, so 'BEGIN TRANSACTION'.
Here we go.
So we start the transaction that way, and then we're going to say for s in sql_transaction.
So for each of those little SQL statements, what do we want to do?
Well, we're going to try c.execute(s).
Otherwise, we're just going to except and commit the unholy sin of pass.
Once we've executed all of this, we're going to commit.
You could also execute COMMIT, but I'm just going to do connection.commit(), because there's a method for that.
And then once we've done all that, we set sql_transaction back to an empty list.
We want to empty it out.
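Here's a sketch of the transaction builder as described: buffer statements, and once there are more than 1,000, run them all inside one explicit transaction, skip any that fail, commit, and reset. Two assumptions differ from the video: it buffers `(sql, params)` tuples so `?` placeholders can be used, and it opens the connection with `isolation_level=None` so Python's `sqlite3` module doesn't open transactions implicitly and the explicit `BEGIN TRANSACTION` is in control. The schema is trimmed down and the rows are made up:

```python
import sqlite3

# Autocommit mode: transactions are only opened by our explicit BEGIN below.
connection = sqlite3.connect(":memory:", isolation_level=None)
c = connection.cursor()
c.execute("CREATE TABLE parent_reply (parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, score INT)")

sql_transaction = []  # pending (sql, params) pairs
BATCH_SIZE = 1000     # flush once we have more than this many statements


def transaction_bldr(sql, params=()):
    global sql_transaction  # needed because we rebind the list after flushing
    sql_transaction.append((sql, params))
    if len(sql_transaction) > BATCH_SIZE:
        c.execute("BEGIN TRANSACTION")
        for s, p in sql_transaction:
            try:
                c.execute(s, p)
            except sqlite3.Error:
                # e.g. a duplicate parent_id violating the PRIMARY KEY: skip it
                pass
        connection.commit()
        sql_transaction = []


insert = "INSERT INTO parent_reply VALUES (?, ?, ?)"
for i in range(BATCH_SIZE):
    transaction_bldr(insert, (f"t1_p{i}", f"t1_c{i}", 2))
print(len(sql_transaction))  # 1000, nothing written yet
transaction_bldr(insert, ("t1_p0", "t1_dup", 2))  # 1001st statement triggers the flush
print(c.execute("SELECT COUNT(*) FROM parent_reply").fetchone()[0])  # 1000, duplicate skipped
```

One design note: calling `sql_transaction.clear()` instead of rebinding would make the `global` statement unnecessary, since the list would only be mutated. Also, whatever is still in the buffer when the loop finishes is never written unless you add one final flush after the loop.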
Okay, looks good.
So now we should be ready to actually run this code and see what we have.
Let me save this, and then I'll pull up where the database should be.
All right, if I did everything right, which I'm sure I didn't, this database here should start to be populated, so let's go ahead.
I'm just going to press F5 to run this; we've done everything else we really wanted to do.
Actually, there's one thing I want to add before we get too deep, because I'm not going to run this whole thing on video.
I just really want to make sure it works.
Basically, in this for loop, we'll just come down here.
I think that's right: one, two, three tabs over.
Then what we'll say is if row_counter % 100000 == 0.
So every 100,000 rows, we're going to print total rows read, paired rows, and time, and then we'll go ahead and .format() the row_counter and paired_rows.
We also didn't do the code for paired_rows yet; we'll add that in a moment.
And then str(datetime.now()).
Now, a paired row only happens in one case.
If we're updating an existing comment, that's not a new pair, so we don't need to count it there.
Also, are we incrementing the row counter? We are.
Anyway, if there is parent data, that means we're inserting, and this will be the first reply we've got for that parent.
So it's after this one that we definitely want to say paired_rows += 1.
But here, when there's no parent, that's not a pair, so we don't need anything there either.
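The progress report and the pair counter can be sketched on their own. This is a hypothetical stand-in for the main loop: instead of reading the dump, it pretends every 30th row has parent data (a made-up rate), and the exact format string is an assumption based on the spoken description. The modulo check and incrementing `paired_rows` only on the has-parent path follow what's described above:

```python
from datetime import datetime

row_counter = 0
paired_rows = 0
reports = []

for i in range(300_000):
    row_counter += 1
    # Made-up stand-in for "we found parent data for this comment".
    if i % 30 == 0:
        # Only the has-parent insert creates a new pair; replacements and
        # no-parent inserts don't change the count.
        paired_rows += 1
    if row_counter % 100_000 == 0:
        report = "Total Rows Read: {}, Paired Rows: {}, Time: {}".format(
            row_counter, paired_rows, str(datetime.now()))
        reports.append(report)
        print(report)
```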
So we really just need to add it in that one place.
Okay, let's run this and see what errors we get.
We've got an invalid syntax.
Not a surprise.
Here, that's not something we can say.
It should be a double equals: assignment versus comparison.
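For reference, here's that mistake in isolation, assuming the typo was in the new `row_counter` check. Python rejects `=` inside an `if` condition at compile time, which is why the script stops with invalid syntax before processing a single row. This uses the built-in `compile()` so the error can be caught instead of crashing:

```python
bad_source = "if row_counter % 100000 = 0:\n    print('report')\n"
good_source = "if row_counter % 100000 == 0:\n    print('report')\n"

try:
    compile(bad_source, "<example>", "exec")
    bad_compiles = True
except SyntaxError:
    bad_compiles = False  # assignment where a comparison was meant

good_code = compile(good_source, "<example>", "exec")  # comparison compiles fine
print(bad_compiles)  # False
```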
Let's go again.
Wow, the same exact mistake.
Okay.
comment_id is not defined.
So I got this error here in sql_insert_has_parent: comment_id.
Did we not do an underscore? Possibly.
Oh, we simply never defined comment_id. Interesting.
So let's just say comment_id = row['name'].
This one was kind of funny: I think the field really is just called 'name'.
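In the Reddit comment dumps, a comment's own ID lives in the `name` field as a fullname (`t1_` marks a comment, `t3_` a thread), and that's what the `parent_id` of a reply points to. A minimal sketch with a made-up record: only the fields used here are included, and the values are invented:

```python
import json

# Made-up record shaped like one line of a comment dump.
line = ('{"name": "t1_c0abc12", "parent_id": "t1_c0xyz99", "body": "sample reply", '
        '"created_utc": 1430438400, "score": 4, "subreddit": "test"}')
row = json.loads(line)

comment_id = row["name"]      # this comment's own fullname
parent_id = row["parent_id"]  # t1_... for a reply to a comment, t3_... for a top-level comment
is_top_level = parent_id.startswith("t3_")
print(comment_id, parent_id, is_top_level)  # t1_c0abc12 t1_c0xyz99 False
```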
So let's try this again.
And it does appear that the database is growing in size.
I think I'll just wait until the very first 100,000 rows.
I forget what it's called, sorry, this viewer; I forget the name of this thing for viewing the database. Anyway.
Okay, so for the first 100,000 rows, we've paired 3,220 comments.
That sounds about right, especially because that first batch is not going to have too many parents, historically.
But as we continue to build, we should get a few more pairs per 100,000 rows.
Anyways, I'm going to go ahead and stop this and make sure our database looks appropriate.
Open that up, and let's browse the data.
Okay, that's a correct comment_id.
That's good.
So, as you can see here, this is just our database.
'Handheld.' 'What's a vape, huh?' 'Why not reduce the volume of the tank?'
This one should make a little more sense: 'Hot.' 'Have you seen her pic? She's supposedly 31.' 'Probably a meth head.'
Okay: 'Don't tell me when to sleep.' 'You're not real.'
Okay.
Anyway, clearly this is working: subreddit, and then score and all that.
And sure enough, all of these scores are higher than one.
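Instead of eyeballing rows in a viewer, the same sanity checks can be run as queries. Here's a sketch against a tiny in-memory table with made-up rows and an assumed schema; on the real file you'd connect to your database path instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""CREATE TABLE parent_reply (
    parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT,
    comment TEXT, subreddit TEXT, unix INT, score INT)""")
c.executemany(
    "INSERT INTO parent_reply VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("t3_thread1", "t1_aaa", None, "top-level comment", "test", 1, 3),
        ("t1_aaa", "t1_bbb", "top-level comment", "a reply", "test", 2, 5),
    ],
)

# Every stored comment should have passed the score filter.
min_score = c.execute("SELECT MIN(score) FROM parent_reply").fetchone()[0]
# Rows with parent text are usable parent/reply pairs.
pairs = c.execute("SELECT COUNT(*) FROM parent_reply WHERE parent IS NOT NULL").fetchone()[0]
print(min_score, pairs)  # 3 1
```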
Anyway, we could keep running that and keep building a massive database of parent-reply information.
Basically, what you should do is at least run that for the entire May 2015 data set.
But if you really want a good chatbot, you'd probably want to continue to build it even larger than that; for example, the current chatbot was built off of roughly 20 million pairs.
So that's quite a bit larger of a data set than what you're going to get off of just this one dump.
The newer stuff, the other model that I'm going to be working on, might require a little less data.
I really don't know at this point, but we'll find out in time.
Anyways, go ahead and build it out.
Like I said, I'd do at least May 2015 if you can, or that single dump; I can't remember which month that single dump was.
It was sometime in 2015, I can't remember if it was January or May, so just do at least one entire month in 2015.
Anyway, if you have questions, comments, concerns, or if I made a mistake, which I probably did somewhere, leave them below.
Otherwise, I will see you guys in the next tutorial, which is where we start talking about models and all that fun stuff: how we can actually take this data we've been building and convert it into training data.
At this point, we're just pairing things together; we don't really have a proper training set yet.
So the next thing we have to do is actually build the training data and then stuff it through some model somewhere.
So that's what you guys have to look forward to.
Again, questions, comments, concerns, whatever, feel free to leave them below.
Otherwise, I'll see you in another tutorial.