  • What's going on, everybody, and welcome to a tutorial on creating a Reddit bot in Python.

  • What we're going to do specifically is create a Reddit bot that goes through and detects Reddit spam and affiliate links, in this case specifically to Udemy.

  • But we could very easily modify this to detect quite a bit of other spam.

  • Over the last year I've noticed a crazy sharp incline in the amount of Udemy affiliate-link spam coming in to Reddit.

  • And it's hard, I think, to catch because they're not necessarily linking directly to Udemy.

  • They're linking to some other tracker site that then links to Udemy.

  • So a lot of times they link to a Twitter post that links to Udemy, or to a tracker site that links to Udemy, or they link to a Medium post, and so on.

  • Now that seems to be on the decline.

  • And now they're not even doing that extra step.

  • They're actually just linking directly to the tracker site, which then goes to Udemy, and most of the time they're not even being hidden about it.

  • They're saying Udemy and all this kind of stuff.

  • So anyway, I thought about writing a bot a long time ago to detect this kind of spam.

  • But there are just so many other things I want to spend my time on.

  • I didn't really feel like doing it.

  • That was until a few days ago, when I found some Reddit spam linking to a Udemy course that was actually a pirated copy of my course, a direct rip from YouTube.

  • Even the author's bio on Udemy was a rip from hkinsley.com.

  • So just a totally pirated course.

  • And this was being spammed on Reddit.

  • So I found some time, I found some motivation, and I wrote the bot. So here it is.

  • I'm gonna run through it with you guys.

  • We're not gonna be writing it line by line.

  • I'm gonna kind of copy the method I used in one of the latest tutorials.

  • If you want to learn more about working with PRAW specifically, go to the PRAW tutorial series; this is just going to be part four of that series. So if you want to learn more about interacting with the Reddit objects and stuff like that, check that out.

  • This is really just an application, me kind of showing you how it's used.

  • So the first thing we're going to want to do, obviously, is have a Reddit account and set up the API. Again, go through the earlier parts of the series if you don't know how to do that, but basically: make an account, go to your account preferences, go to apps, create the app, and make sure it's a script app.

  • That's about it, really.

  • Fill out the form.

  • So once you've done that, you'll have your credentials: client ID, client secret, password, user agent, username.

  • I'm putting these into a praw_creds.py.

  • This is a copy of that, because the real one actually has my real values and I don't want to show those to you.

  • So anyway, then I've got a "to catch a scammer" .py script.

  • So with that, let's go ahead and get started.
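
A credentials file along the lines of the one described might look like the sketch below. The five variable names follow the credentials listed in the video; every value is a made-up placeholder and would need to be replaced with the values from your own Reddit script app.

```python
# praw_creds.py: placeholder values only (hypothetical); keep the real file private.
client_id = "YOUR_CLIENT_ID"
client_secret = "YOUR_CLIENT_SECRET"
password = "YOUR_REDDIT_PASSWORD"
user_agent = "spam detector bot sketch by u/YOUR_USERNAME"
username = "YOUR_USERNAME"
```

Keeping these in a separate module means the main script can be shared without exposing the real values.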

  • The first thing we need to do is import those PRAW creds, the client ID... oh, by the way,

  • you don't have to write all this out.

  • I'll have a link in the description to this tutorial.

  • Also, the code is on GitHub.

  • So if you want to contribute or check out the code or whatever, it's up there.

  • I'll put a link in the description, but it's just the Reddit spam detector bot.

  • Okay, so once we've got that, the next thing we do is create the Reddit object itself, passing through all the credentials, and then we have to figure out: how are we going to catch these people?

  • How are we going to figure out who's a spammer and who's not?

  • Because just because someone's posting a link that goes to Udemy doesn't necessarily mean that they're a spammer.

  • I mean, they're probably a spammer, but we can't just assume at that point that they're a spammer.
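
The step of creating the Reddit object from the credentials could be sketched roughly as follows. The `make_reddit` helper and the `CREDS` dictionary are hypothetical names introduced for illustration (the video imports the values from its credentials file instead); `praw.Reddit` and its keyword arguments are PRAW's own.

```python
def make_reddit(creds):
    """Build a PRAW Reddit instance for a script app from a dict of credentials."""
    # Imported inside the function so this sketch loads even without PRAW installed.
    import praw

    return praw.Reddit(
        client_id=creds["client_id"],
        client_secret=creds["client_secret"],
        password=creds["password"],
        user_agent=creds["user_agent"],
        username=creds["username"],
    )


# Hypothetical placeholder credentials; replace with real values before use.
CREDS = {
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "password": "YOUR_REDDIT_PASSWORD",
    "user_agent": "spam detector bot sketch by u/YOUR_USERNAME",
    "username": "YOUR_USERNAME",
}
```

With real credentials in place, `reddit = make_reddit(CREDS)` would give the object that the later steps search and reply with.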

  • So what we actually need to do is go through the posts that are linking to Udemy stuff, but then also go to that username and check: are all of their posts Udemy junk, or is it just some of them, or whatever.

  • Because I thought it was probably just course creators who created a course and then went to Reddit to kind of spam it out. But no, these are definitely massive spam rings.

  • They're huge. I didn't realize how big this problem was until I really started looking into it. It's pretty nuts.

  • Anyway, for example, let me pull up my little bot guy.

  • So this is my spam detector bot.

  • And basically what he does is go into these threads and post this message.

  • So in this case, this guy, I don't even know his name; it's just a spam name.

  • We're just saying, hey, this fraction of the 33 submissions from this user appear to be Udemy affiliate links.

  • And the reason we can know that is, if we actually go to this user account, we just click on it and we can see: look at all of these courses.

  • Really 100% of these are courses. In my case, we're just looking for the term Udemy.

  • But in time I plan to expand that out, because obviously, especially once somebody figures out that I'm looking for the term Udemy, they'll just remove the term Udemy.

  • So that'll change for sure.

  • We'll come up with better words and ways to match courses.

  • There are lots of little things.

  • I'm not sure I'm gonna go over all of them.

  • I don't want to give away the tricks of the trade.

  • But anyway, yeah.

  • So it's pretty quick and easy to go to these people's pages and figure out whether this is a spam bot or a spammer.

  • I'm actually not confident these are bots.

  • I think there might actually be a human behind them.

  • Anyway, we're gonna move these aside now. So that's what we need to do.

  • We need to get to this user and look through his content.

  • And we can do all of that with the Python Reddit API Wrapper.

  • So what we're gonna do first is find spam by username.

  • So the function I wrote for that finds the spam and then returns the usernames.

  • So basically what we're gonna do here is go through a search query and log all of the authors.

  • And then let's just iterate through that real quick to show you an example of it running. So I'll paste, and then I'll run that.
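
The search-and-collect-authors function described here could look something like the sketch below. The function name, the choice to search the "all" subreddit sorted by new, and the default `limit` are assumptions made for illustration; `subreddit(...).search(...)` is PRAW's search call. The `reddit` object is passed in as a parameter (also an assumption) so the function can be exercised without a live connection.

```python
def find_spam_by_name(reddit, search_query, limit=11):
    """Return the distinct authors of recent posts that match search_query."""
    authors = []
    results = reddit.subreddit("all").search(search_query, sort="new", limit=limit)
    for submission in results:
        if submission.author is None:
            # Posts from deleted accounts have no author to inspect.
            continue
        print(submission.title, submission.author, submission.url)
        if submission.author not in authors:
            authors.append(submission.author)
    return authors
```

The returned authors are only suspects at this point: they posted something matching the query, nothing more.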

  • So here we have the list; it basically just ran through the posts.

  • And then here are the users.

  • So all of these people are suspicious, at least. But again, these are just people who have posted about Udemy.

  • We don't know anything more about them.

  • So we really want to dig a little deeper into these users to determine whether they're a spammer or just a regular person sharing a link to their course.

  • So now what I'm going to go ahead and do is overwrite this loop here. Instead, we're going to pick a search query.

  • And right now it's just a random choice that can only come up with Udemy.

  • Later I would add many, many more terms to that.

  • Because again, if we're just looking for Udemy, and these spammers were to find out that that's the one thing that's the crux of this entire bot, they'd just stop using that word, because it's really not essential.

  • So anyway, once we have that, we're gonna have spam content, trashy users, and then the smelly authors. We don't know if these authors are trashy, but they smell bad.

  • So we're gonna figure out a little bit more about them.

  • Also, we're using random here, so at this stage we need to import random.
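
As a rough illustration of this setup step, the sketch below picks a query at random from a list of terms and creates the containers mentioned above. The variable names and the container types (a list for spam content, a dictionary for trashy users) are assumptions; only the single term "udemy" comes from the video.

```python
import random

# Only one term for now; more terms could be appended to this list later.
search_terms = ["udemy"]

# Pick one term for this pass of the bot.
current_search_query = random.choice(search_terms)

spam_content = []    # submissions judged to be spam
trashy_users = {}    # users whose score crosses the threshold
smelly_authors = []  # would be filled by the search step with suspect authors
```

Using `random.choice` over a list means that adding more search terms later needs no other code changes.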

  • Okay, so once we have these smelly authors, we want to start iterating through them to figure out what their deal is.

  • So come down here, paste.

  • Make sure we're not having a tab issue.

  • Good.

  • Make sure this is set right. So what we're gonna do now is start iterating over those authors.

  • For each author in smelly authors, we're going to start counting.

  • How many trashy URLs do they have?

  • How many submissions did they make, and how many of those submissions are dirty?

  • So then what we want to have is something like common spammy words.

  • So I'm gonna come up to the top here and just paste that in. Common spammy words: udemy, course, save, coupon, free, discount.

  • As a matter of fact, really, a lot of these, like for sure course, coupon, discount...

  • Chances are, if someone has these in there, it's probably spam.
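
The word list read out above, together with a simple title check, might be written as in this sketch. The six words are the ones listed in the video; the helper name `title_looks_spammy` is hypothetical.

```python
common_spammy_words = ["udemy", "course", "save", "coupon", "free", "discount"]


def title_looks_spammy(title):
    """Return True if any spammy word appears anywhere in the lowercased title."""
    lowered = title.lower()
    return any(word in lowered for word in common_spammy_words)
```

This is plain substring matching, so a title containing "freedom" would also match "free"; a stricter version could compare whole words instead.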

  • But anyway, if we search for the spammy words and only match on those, and more than 50% of a user's content is always this junk, they're a spammer.

  • So that's just to point out again that Udemy should not be the only word you use.

  • I'm just gonna do it for this tutorial.

  • I don't want to give away all the great words.

  • But we'll see.

  • I'm gonna have it up on GitHub for a little bit.

  • If people contribute, then we can probably leave it open source, and it won't be a big deal if people can stay ahead of it. But this isn't exactly the most exciting task in the world, so I'm not really sure how many people are gonna help. It's kind of fun to combat these guys, though.

  • Anyway, the next thing we're gonna do is run through their submissions.

  • So we found the author. Now I want to iterate over that author's submissions, so we want to visit their profile.

  • So I'm gonna paste in that chunk of code and zoom out a little bit there, so it all fits.

  • So we want to throw a try/except in here, mostly because the post might have been made by that author, but these authors are spam authors; they're eventually going to be removed from Reddit.

  • When that happens, sometimes these users will return error codes if you try to access them.

  • Otherwise, what we're gonna say is: for submission in that redditor, and then string author, because at this point author is actually an object.

  • It's an author object from the Reddit instance.

  • So from whatever that author's username is, their submissions ordered by new, we're gonna start iterating over them.
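
Iterating over an author's newest submissions inside a try/except could be sketched like this. The helper name and the choice to collect only titles are assumptions for illustration; `reddit.redditor(...).submissions.new()` is the PRAW call for a user's submissions ordered by new, and `reddit` is assumed to be a `praw.Reddit` instance passed in by the caller.

```python
def recent_submission_titles(reddit, author):
    """Collect the titles of an author's submissions, newest first.

    If the account cannot be read (for example because it was removed),
    the error is printed and whatever was collected so far is returned.
    """
    titles = []
    try:
        # str(author) turns the author object into a plain username.
        for submission in reddit.redditor(str(author)).submissions.new():
            titles.append(submission.title)
    except Exception as error:
        print("Could not read submissions for {}: {}".format(author, error))
    return titles
```

Catching the broad `Exception` mirrors the "throw a try/except in here" approach; a stricter version could catch only the specific errors that removed accounts raise.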

  • So we're going to see where that submission links to.

  • At the moment I don't actually think I'm using this, but it's another great thing to track to figure out if someone's a spammer, because right now they're using the same kinds of websites.

  • So, one, you could figure out they're a spammer, but, two, you could also link them to other spammers.

  • And this is what I started to do to identify the networks themselves.

  • So the people that are spamming to the same kinds of trackers are pretty much all connected, and given the fact that they all have very similar usernames, I'm gonna go ahead and say it's probably the same group that's doing this. But anyway, moving right along; I'm not actually using that right now,

  • I don't think, but it's still something good to track.

  • Otherwise, we take the ID so we can quickly get to it.

  • Actually, I'm not even sure we're tracking that anymore, because I think we just get the URL to it.

  • But anyway, we also want to track the subreddit, again, just in case. There's, like, a Udemy freebies subreddit that is basically all affiliate spam.

  • I think everybody in that subreddit probably knows exactly what that subreddit is.

  • It's literally probably for spammers.

  • So I'm not really sure what to say about that.

  • Honestly, I feel like that sub should just be deleted.

  • But anyway, whatever.

  • But then we're gonna say dirty equals False, and then we're gonna iterate through the spammy words.

  • And then for each word, if the word is in the submit title dot lower (that way we lowercase everything),

  • we're gonna say dirty equals True. We probably should break at this point.

  • I would break at this point, but we're gonna continue on. Anyway, if that junk is not already in the user's trashy URLs, let's go ahead and add some information:

  • the submission ID, the title, and the author's name.

  • And then if this is dirty, we're gonna add one to the dirty count, and we add to the sub count.

  • It's just basically a counter for that.
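
The per-submission bookkeeping just described can be sketched on plain data. The function name `scan_submissions` and the input shape (pairs of submission ID and title) are assumptions chosen so the example runs without Reddit access. Unlike the video's version, this sketch does break out of the word loop after the first match, which the narration suggests would be reasonable.

```python
common_spammy_words = ["udemy", "course", "save", "coupon", "free", "discount"]


def scan_submissions(author, submissions):
    """Count dirty submissions for one author.

    `submissions` is an iterable of (submission_id, title) pairs.
    Returns (user_trashy_urls, dirty_count, sub_count).
    """
    user_trashy_urls = []
    dirty_count = 0
    sub_count = 0
    for submit_id, submit_title in submissions:
        dirty = False
        for word in common_spammy_words:
            if word in submit_title.lower():
                dirty = True
                break  # one matching word is enough to flag the title
        if dirty:
            junk = [submit_id, submit_title, author]
            if junk not in user_trashy_urls:
                user_trashy_urls.append(junk)
            dirty_count += 1
        sub_count += 1
    return user_trashy_urls, dirty_count, sub_count
```

Each flagged entry keeps the submission ID, title, and author name, which is the information the later posting step needs.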

  • Now what we want to do is check whether the trashy score is above something. So we want to generate this trashy score; we don't have a trashy score yet.

  • So let me copy and paste this new block, because actually this one's gonna change.

  • Really we're just adding a little bit under here. What we did up to this point is right here, right?

  • That's what we just covered here.

  • What we're going to say is: try to set the trashy score equal to however many dirty posts this person made over the total that they have.

  • If we fail for whatever reason (for example, sometimes the sub count will be zero, again some sort of weird error),

  • you'll get a divide by zero.

  • So we're just gonna say the trashy score is zero. Otherwise, it's just the fraction of trashy posts.
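
The score calculation with its divide-by-zero fallback is small enough to show directly. This is a minimal sketch; the function name `trashy_score` is an assumption, and it catches only `ZeroDivisionError` rather than every exception.

```python
def trashy_score(dirty_count, sub_count):
    """Fraction of a user's submissions that were flagged as dirty."""
    try:
        return dirty_count / sub_count
    except ZeroDivisionError:
        # No readable submissions at all: treat the user as not trashy.
        return 0.0


print("Trashy score:", round(trashy_score(11, 12), 3))
```

For example, 11 dirty posts out of 12 gives a score of about 0.917, and a user with no submissions gets 0.0 instead of raising an error.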

  • So we'll just print this out.

  • You could leave this or get rid of it.

  • Honestly, this won't take too much of your time, so you could just leave it.

  • Anyway, if the trashy score, right now, is above 50%, we're gonna add this user to the trashy users, and then for each trash in the user's trashy URLs,

  • we're gonna add that to the spam content in general.
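
The threshold step might be sketched as follows. The helper name, the constant name, and the shape of what is stored per user (score plus submission count) are assumptions; the 50% cut-off is the one stated above.

```python
TRASHY_THRESHOLD = 0.5  # flag users when more than half of their posts are dirty


def record_if_trashy(author, score, sub_count, user_trashy_urls,
                     trashy_users, spam_content):
    """Record a user and their flagged posts when the score exceeds the threshold."""
    if score > TRASHY_THRESHOLD:
        trashy_users[author] = [score, sub_count]
        for trash in user_trashy_urls:
            spam_content.append(trash)
        return True
    return False
```

Keeping the threshold in one named constant makes it easy to experiment with other cut-offs.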

  • So once we get to this point, let's go ahead and run this and make sure it actually works.

  • At this point we're just returning the users.

  • Let's run this real quick and make sure we don't have errors.

  • Okay, so we're already running through it.

  • We've found online free courses; their trashy score is basically 92%.

  • We've already run through it, so let's break that.

  • Anyway, clearly this guy's a spammer. This guy's a spammer. This guy's probably a spammer.

  • For some reason his ratio is at 44.

  • This is actually the guy I showed you initially, I think.

  • So anyway, probably we should lower it, maybe to 30%.

  • I mean, if more than 30% of a user's submissions are spam, maybe that's worth pointing out.

  • I don't know.

  • I can't decide on what a good number is, but I feel like definitely with more than 50% they're a piece of junk.

  • So once we've identified who's a piece of junk, what we want to do is iterate over that and post about it, because now we know it's truly junk.

  • Every time we make a post, at least on a brand-new account, we have a timer.

  • It's like 10 minutes or 8 minutes or something like that.

  • So every time we make a post, we also want to sleep.

  • So we're gonna import time here, and then I'm gonna scroll on down.

  • And now what we want to do is basically just iterate over whatever we have now in spam content.

  • So I'm gonna come down here. I'm gonna paste this in. I'm gonna scroll out. Make sure we did that right. Let me confirm that real quick. Yeah.

  • So once we get to this point, pull this back down, and then we're gonna start iterating over the spam content.

  • So we identify the ID, the user, the submission itself.

  • When was that submission created?

  • If that was less than a day ago (that's just 60 times 60 times 24),

  • then the full link would be reddit.com slash the submission.
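
The age check and link construction can be illustrated with two small helpers. Their names are hypothetical. The sketch assumes the creation time is a Unix timestamp in seconds, as PRAW's `created_utc` attribute is, and that the link is built from a path beginning with `/r/`, as PRAW's `permalink` attribute does.

```python
import time

ONE_DAY_SECONDS = 60 * 60 * 24


def is_recent(created_utc, now=None, max_age=ONE_DAY_SECONDS):
    """Return True if the submission was created less than max_age seconds ago."""
    if now is None:
        now = time.time()
    return now - created_utc < max_age


def full_link(permalink):
    """Turn a submission's relative permalink into a full reddit.com URL."""
    return "https://reddit.com" + permalink
```

Passing `now` explicitly is only there to make the check easy to test; in normal use it defaults to the current time.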

  • I just want to save this for myself.

  • At least initially, when I was writing this, I really wanted to make sure I wasn't accusing someone of spam who wasn't spamming.

  • So I've been saving all of the links, basically.

  • Then this is our message.

  • Just: beep boop, we're a bot.

  • We're identifying this idiot. They're a spammer. Get rid of them. That kind of stuff.

  • And we're just using string formatting, passing the values on in there, to let them know.
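
Building the reply with string formatting might look like the sketch below. The exact wording of the bot's message is not given in the transcript, so the template text here is invented for illustration, as are the names `MESSAGE_TEMPLATE` and `build_message`.

```python
# Hypothetical wording; the real bot's message text is not shown in the transcript.
MESSAGE_TEMPLATE = (
    "*Beep boop*\n\n"
    "I am a bot that looks for spam. {percent}% of the {count} submissions "
    "from /u/{user} appear to be Udemy affiliate links."
)


def build_message(user, score, sub_count):
    """Fill the template with the user's name, score (0 to 1), and post count."""
    return MESSAGE_TEMPLATE.format(
        percent=round(score * 100, 1), count=sub_count, user=user
    )
```

Named placeholders keep the template readable and make it harder to pass the values in the wrong order.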

  • So once we've got that message, first we're gonna check the posted-URLs text file.

  • This is just a quick file that we check to see whether we've already posted to that link.

  • If we have, then we're gonna skip it.

  • But if we haven't, so if that link's not already in the already-posted list,

  • let's print out the message.

  • Let's go ahead and make a reply to that Reddit post.

  • Then we print out that we've posted to whatever the link is, we write that link to the posted-URLs text file, we sleep for 12 minutes, and then we break, because we don't necessarily want to iterate through everything; we'd rather start clean and go with the newest submissions again.

  • So we'll break there.
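
The check-post-record-sleep sequence could be sketched as one function. The function name, the file name `posted_urls.txt`, and the injected `reply` and `sleep` callables are assumptions made so the example can run without Reddit; in the real bot the reply would be something like PRAW's `submission.reply`.

```python
import os
import time


def post_once(link, message, reply, posted_path="posted_urls.txt",
              sleep=time.sleep, wait_seconds=12 * 60):
    """Reply to a link unless it was already handled; return True if we posted."""
    already_posted = []
    if os.path.exists(posted_path):
        with open(posted_path, "r") as f:
            already_posted = f.read().split("\n")
    if link in already_posted:
        return False  # skip links we have replied to before
    print(message)
    reply(message)
    print("Posted to {}; sleeping for {} minutes".format(link, wait_seconds // 60))
    with open(posted_path, "a") as f:
        f.write(link + "\n")
    sleep(wait_seconds)
    return True
```

A caller iterating over the spam content would break out of its loop as soon as this returns True, so that the next pass starts fresh with the newest submissions.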

  • Other than that, that's basically it.

  • So we can run this now.

  • I'm not sure we'll actually find spam this time around, because I've been running this, but let's just see.

  • We'll see what we get.

  • And, see, we returned a 404 for some username; I don't know who it was. But anyway, if we don't post, what we can do is

  • go back further in history

  • when we search for posts, and we can also go back further in history on the actual spammers.

  • Like, right now we say we're not gonna post on anything older than a day.

  • We could; at least when I was writing this, I didn't see any point in doing that.

  • Especially if this is running live, there's no reason to go back

  • super far historically to do this.

  • But yeah, it doesn't look like we're gonna find anything on this one.

  • But what I could do, I guess, is

  • pull this one over at least.

  • So at least here, we figured it out.

  • This one was for online courses.

  • We found him.

  • This just says we've posted, and then we're going to sleep for 12 minutes.

  • What I'll do real quick, just to see if we can get somebody else:

  • let me say, I don't know, 50.

  • And then at the point where we look at the age, here,

  • let's just say, I don't know, 10.

  • I'll try to rerun that, and we'll see if we find anything.

  • Not really sure if we will, but we'll try.

  • Anyway, while we're waiting on that, I'll just draw your attention one more time

  • to the fact that this is on GitHub.

  • If you want to make any contributions or support it or whatever, feel free to take part.

  • Other than that, the code's also on pythonprogramming.net, running through it

  • kind of like I did here, so if you missed anything or whatever, you can go over there.

  • Man, look at all these users we're finding.

  • Ah, probably shouldn't have set that limit to 50.

  • Probably should have just done the historical to, like, 10 days or whatever. It's taking forever.

  • Okay, so we found somebody. Again, it was online courses.

  • And it's on Udemy freebies, which is just a total spam subreddit.

  • But let's go check him out anyway. So we'll just copy this here.

  • We'll head to that.

  • And here we are.

  • So there it is,

  • in action, running and, in general, being slightly annoying.

  • So far it looks like people have been relatively appreciative of our bot.

  • But we'll see.

  • I wasn't really sure what people would think of it, especially because, like I said, on Udemy freebies I'm pretty sure that's all affiliate spam.

  • And I'm not sure if people actually like that subreddit and find it useful because they get cheap courses.

  • Anyway, that's it for now.

  • If you guys have questions, comments, concerns, whatever, feel free to leave them below; otherwise I'll see you in another video.


Creating a Reddit Bot to Detect Spam - Python Reddit API Wrapper (PRAW) tutorial p.4

  • Posted by 林宜悉 on January 14, 2021