  • What's going on?

  • Everybody, welcome to part three of our chatbot with Python and TensorFlow tutorial series.

  • In the last tutorial, we basically just kind of created our table and set the scene for what we're expecting to be doing.

  • And now in this tutorial, what we're gonna do is start actually iterating through that file, start cleaning up the data at least a little bit.

  • So with that, let's go ahead and get started.

  • So the code that we're gonna be writing here, at least for a little bit, will be here.

  • So I'm just gonna make some space here.

  • So now what we're gonna do is I'm gonna go ahead and start a counter, row_counter, and then also, just out of curiosity, paired_rows.

  • So both of these things are gonna be counters.

  • row_counter is just gonna tell us how many rows we've gone through as we're iterating through these files with, like, 50 to 80-plus million rows.

  • And then paired_rows tells us how many parent-and-child pairs we've come up with, basically, because a lot of comments will go without a reply, because they don't need a reply or they're never seen or whatever.

  • So anyways, we've got that. Now what we're gonna do is with open.

  • And now we want to open one of those files.

  • Now, wherever you've stored your files is probably different from where I've stored the file.

  • So, um, mine are in J:/chatdata/reddit_data, and then one of the files, for example, like, let's say we're in 2015.

  • So we'll say that's... [unintelligible] ... okay, 2015.

  • [unintelligible]

  • And then finally RC, and that will be underscore, and then again some sort of variable.

  • So then what we'll do is .format().

  • Um, and then inside format, this will be timeframe.split(), and we want to split by... was it a dash or an underscore?

  • Let's see. For the timeframe, we're going to need to split... oh, you know what? No, it will be timeframe split on the dash, index zero.

  • There we go.

  • So that's the split, index zero, and then finally RC underscore, and then it will be the full timeframe.
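As a rough sketch of the path being assembled here: the piece before the dash in the timeframe is used as a folder name, and the full timeframe follows the RC_ prefix. The month value "2015-05" and the exact spelling of the base directory are assumptions for illustration; the captions only give the spoken path.

```python
# Assumed values: the captions mention 2015 but not the month,
# and the directory spelling is a best reading of the spoken path.
timeframe = "2015-05"
base_dir = "J:/chatdata/reddit_data"

# timeframe.split("-")[0] is the part before the dash (the year),
# and the full timeframe goes after the "RC_" prefix in the file name.
path = "{}/{}/RC_{}".format(base_dir, timeframe.split("-")[0], timeframe)
print(path)  # J:/chatdata/reddit_data/2015/RC_2015-05
```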

  • So we wanna open this file, right?

  • We want to open that file, and then we're gonna set buffering, we'll just say, uh... and then as f, and then we're ready to actually start iterating through f.
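A minimal sketch of opening a file with an explicit buffer and reading it line by line, so a very large dump never has to sit in memory all at once. The buffer size of 1000 is an assumption (the captions drop the value), and the temporary file is only a stand-in so the example runs anywhere.

```python
import os
import tempfile

# Write a tiny stand-in file; the real input would be one of the large dump files.
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("first line\nsecond line\nthird line\n")
tmp.close()

lines_seen = 0
# buffering=1000 is an assumed buffer size in bytes, not a value from the captions.
with open(tmp.name, buffering=1000) as f:
    for line in f:  # the file object yields one line per iteration
        lines_seen += 1

os.remove(tmp.name)
print(lines_seen)  # 3
```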

  • So, for example, what we could say is for row in f, and then what we can say here is, first of all, let's go ahead: row_counter += 1.

  • And then what we're gonna do is row = json.loads(row).

  • And then we can say parent_id = row['parent_id'].

  • And then, uh, body equals... so the body is probably gonna have some issues.

  • So we're gonna do format_data, so we can create a new function for this, and then it'll be row['body'].

  • We need something to kind of sanitize and clean up that data.

  • So that'll be body, uh, and then created_utc = row['created_utc'], and then score = row['score'],

  • whatever the score was. And finally subreddit will equal row['subreddit'].
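The loop body described above can be sketched roughly like this. The sample line is made up and only carries the fields named in the transcript; real dump lines have many more. The body is left raw here, whereas the transcript passes it through format_data.

```python
import json

# A made-up line with only the fields the transcript reads.
sample_line = '{"parent_id": "t1_abc", "body": "Hello\\nworld", "created_utc": "1430438400", "score": 3, "subreddit": "python"}'

row_counter = 0
for raw in [sample_line]:  # stands in for "for row in f"
    row_counter += 1
    row = json.loads(raw)  # each line is one JSON object
    parent_id = row["parent_id"]
    body = row["body"]
    created_utc = row["created_utc"]
    score = row["score"]
    subreddit = row["subreddit"]

print(row_counter, parent_id, score, subreddit)
```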

  • Okay, so now what we want to go ahead and do is come up here and create that format_data function.

  • So I'm just going to define format_data.

  • It's gonna take in data, and basically what we're gonna do is data = data.replace(), and at this stage we're really just trying to replace a few key things.

  • Um, the first thing is new lines.

  • So we want to get rid of any new lines.

  • So that's data.replace on the newline character.

  • Um, we're gonna replace that straight up with, um, and I'm just gonna put spaces around it,

  • newlinechar.

  • Okay, uh, that way, first of all, it doesn't get appended to a word, because oftentimes at the very end of something you get a newline character.

  • It would be touching.

  • And when we go to actually tokenize this information, um, that would be, like, tokenized together, and we definitely don't want that.

  • We also simultaneously don't really want to leave it raw,

  • because if we tokenize this correctly, we're gonna tokenize, most likely, a backslash and an n as separate entities, not as one single entity.

  • I mean, basically, when we tokenize, we tokenize entire words.

  • In theory, you can actually tokenize chunks of words and stuff like that, like syllables, basically.

  • But we're not gonna do that either.

  • So basically, long story short, we want to make sure that newline characters stay together as one token.

  • So we're just gonna make up a word, basically.

  • Hopefully no one on Reddit actually uses newlinechar.

  • But anyway, that's what we're doing there.
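To illustrate why the spaces around the placeholder matter, here is a small sketch that uses a plain whitespace split as a stand-in for a real tokenizer (an assumption; the transcript does not say which tokenizer is used later).

```python
text = "first line\nsecond line"

# Without padding, the placeholder fuses with the words on either side.
glued = text.replace("\n", "newlinechar").split(" ")

# With padding, the placeholder stands alone as a single token.
padded = text.replace("\n", " newlinechar ").split()

print(glued)   # ['first', 'linenewlinecharsecond', 'line']
print(padded)  # ['first', 'line', 'newlinechar', 'second', 'line']
```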

  • And then we're gonna do the exact same thing with return.

  • I'm pretty sure Reddit will combine them.

  • It's like backslash r, backslash n, or something like that.

  • So we're gonna make sure that gets replaced.

  • And then finally, I'm just gonna replace one more thing, and that's going to be the double quote. We're just gonna say all double quotes are actually single quotes, just to kind of normalize the data there.

  • Because there's really no reason for the AI to think, um, that the two are different. They really mean the same thing.

  • So anyway, we're gonna do that.

  • So, uh, that's how we're going to format data.

  • So then we just return data there.
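Put together, the cleanup function described above might look like this sketch. It follows the three replacements named in the transcript; the sample strings are invented.

```python
def format_data(data):
    # Newlines and carriage returns become a standalone placeholder word,
    # and double quotes are normalised to single quotes.
    data = (
        data.replace("\n", " newlinechar ")
            .replace("\r", " newlinechar ")
            .replace('"', "'")
    )
    return data

print(format_data('She said "hi"\nbye'))  # She said 'hi' newlinechar bye
```

Note that a Windows-style "\r\n" pair turns into two placeholders in a row with this approach, since each character is replaced on its own.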

  • And, um, now what I want to go ahead and do is, right here,

  • we'd like to do something like this:

  • parent_data = find_parent(parent_id).

  • There's gonna be times where we're gonna maybe want to find information from that actual parent comment.

  • So, well, let's write this right now.

  • So define find_parent.

  • We're gonna find the parent by the parent ID.

  • And then what we're gonna do is we're just gonna say sql equals SELECT comment FROM parent_reply.

  • This is the name of the table.

  • So if you named it something different, use that. WHERE comment_id... so actually, let's see, parent_id... comment_id.

  • Oh, in this case, never mind. We'll talk about this more in the next one, but anyways, the parent ID is gonna get passed into here, and then basically we're looking for anywhere where the comment ID is the parent.

  • So this is how we're gonna actually find the initial parent comment that a reply belongs to. So, for example, every comment,

  • the string has a parent_id, but it doesn't have the parent's text, right?

  • The parent body is not there.

  • Right?

  • So when we insert this comment into our database, we're actually gonna want the parent body.

  • So this is how we're gonna do that.

  • So anyways, select the comment from parent_reply where the comment ID

  • is equal to that new comment's parent ID.

  • Okay, so WHERE comment_id equals...

  • And then, um, let's do this.

  • I think we're gonna need some single quotes around that, and then we'll say LIMIT 1.

  • We're hoping that doesn't get violated, but anyways, .format() with the parent ID.

  • So then why won't you let me do this?

  • Thank you, sir.

  • Now we're gonna go ahead and execute that: c.execute(sql).

  • Then result will be c.fetchone(). If result does not equal None, then we're gonna return result.

  • Even though that's fetchone, we still need to say the zeroth... and then, um, oh, actually, I'm sorry.

  • So you actually would need the zero, I'm pretty sure.

  • That's just because we're only selecting one column there, comment.

  • So normally you might say comment, score, or something like that.

  • And so it's always gonna return a sequence.

  • So actually, a fetchall normally would be a list of rows.

  • In this case, with fetchone, it's just a single row, and it only happens to have one element. Anyways, return result[0], and then else, let's just return False.
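A quick sketch of what fetchone hands back, using an in-memory SQLite database. The table here is a trimmed, hypothetical version of parent_reply with only three columns, and the inserted row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Trimmed, hypothetical table: only the columns this sketch needs.
cur.execute("CREATE TABLE parent_reply (comment_id TEXT, comment TEXT, score INT)")
cur.execute("INSERT INTO parent_reply VALUES ('t1_abc', 'hello', 5)")

cur.execute("SELECT comment FROM parent_reply WHERE comment_id = 't1_abc' LIMIT 1")
one_col = cur.fetchone()   # a row is a tuple with one item per selected column

cur.execute("SELECT comment, score FROM parent_reply WHERE comment_id = 't1_abc' LIMIT 1")
two_col = cur.fetchone()

cur.execute("SELECT comment FROM parent_reply WHERE comment_id = 't1_zzz' LIMIT 1")
miss = cur.fetchone()      # None when no row matches

print(one_col, one_col[0], two_col, miss)  # ('hello',) hello ('hello', 5) None
conn.close()
```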

  • You could make that a new line if you want.

  • I'm not gonna do it.

  • And then, just in case we do something wrong, let's just do try, except Exception as e, print find_parent and e.

  • And if we hit an exception, let's just return False.

  • I'm gonna comment this out, actually, for now. If we're having issues, I'll let that print out. Anyway, I think this is a good spot to stop.
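The whole lookup, as described, could be sketched as follows. The in-memory database, the two-column table, and the sample row are stand-ins so the example runs on its own; the query text follows the transcript (string formatting with single quotes and LIMIT 1).

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # stand-in for the real database file
c = connection.cursor()
c.execute("CREATE TABLE parent_reply (comment_id TEXT, comment TEXT)")
c.execute("INSERT INTO parent_reply VALUES ('t1_abc', 'parent text')")

def find_parent(pid):
    try:
        sql = "SELECT comment FROM parent_reply WHERE comment_id = '{}' LIMIT 1".format(pid)
        c.execute(sql)
        result = c.fetchone()
        if result is not None:
            return result[0]  # first (and only) selected column
        return False
    except Exception as e:
        # print("find_parent", e)  # uncomment when debugging
        return False

print(find_parent("t1_abc"))      # parent text
print(find_parent("t1_missing"))  # False
```

One design note: sqlite3 also accepts a "?" placeholder with the value passed separately, as in c.execute("... WHERE comment_id = ? LIMIT 1", (pid,)), which sidesteps quoting problems if an ID ever contains a quote character. That is an alternative to what the transcript types, not what it shows.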

  • We've done a lot of coding here, so, um, obviously we still have a little bit of ways to go.

  • We still need to actually insert this data into the database and all that.

  • Um, but then we also have a few more constraints on the data that we're gonna put in, like, is the string maybe too long?

  • Is it empty?

  • Also, there's all kinds of things.

  • Like sometimes sometimes comments get deleted so they might have been deleted before we actually get to them.

  • So it'll have, like, deleted or removed in, like, brackets and stuff.

  • So we don't really want those, either.
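The checks mentioned here (too long, empty, deleted or removed) could be sketched as a small filter. The function name and both limits are hypothetical; the transcript only names the kinds of checks and leaves the details for a later part.

```python
def acceptable(data, max_words=50, max_chars=1000):
    # max_words and max_chars are illustrative limits, not values from the transcript.
    if not data:
        return False  # empty body
    if len(data) > max_chars or len(data.split(" ")) > max_words:
        return False  # too long
    if data in ("[deleted]", "[removed]"):
        return False  # comment was deleted or removed before collection
    return True

print(acceptable("hello there"), acceptable("[deleted]"), acceptable(""))  # True False False
```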

  • So anyways, uh, we still have quite a few things to build out.

  • Also, we might want to add in some logic about, like, under what constraints do we want to insert something?

  • So going to score, for example, I'm only really gonna want comments that have scores where at least one person voted, right? Because there's a lot of comments that are pretty much useless.
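A score filter along those lines might look like this sketch. The threshold of 2 is an assumption: it reads "at least one person voted" as one upvote on top of the single point a Reddit comment starts with. The sample rows are invented.

```python
min_score = 2  # assumed threshold, not a value stated in the transcript

rows = [
    {"body": "useful answer", "score": 5},
    {"body": "never seen", "score": 1},
]

# Keep only comments that someone other than the author voted up.
kept = [r for r in rows if r["score"] >= min_score]
print([r["body"] for r in kept])  # ['useful answer']
```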

  • So anyways, that's it for now.

  • Questions, comments, concerns, whatever.

  • You're free to leave them below.

  • Otherwise I will see you in the next tutorial.

Buffering dataset - Creating a Chatbot with Deep Learning, Python, and TensorFlow p.3

Posted by 林宜悉 on January 14, 2021