  • What's going on?

  • Everybody.

  • And welcome to a new tutorial.

  • Slash some coverage of a new package called requests-html.

  • It's basically just a way for you to really quickly, easily parse, um, HTML.

  • So it's written by the same person who wrote the requests library, so my expectations are high, but this is a very new package.

  • It's been on GitHub for about a month, maybe a little over, um, so I would expect some rough edges, but let's check it out.

  • So, first of all, to install it, let's scroll on down to the bottom to install it.

  • You just pip install requests-html.

  • You will need Python 3.6 or, I'm assuming, later.

  • But right now, the latest official release of Python is 3.6. So, uh, let's make sure you install it and then let's go ahead and check it out.

  • So check it out.

  • I'm gonna be using the following Web page, at least to start.

  • It's just pythonprogramming.net/parsememcparseface, and basically it's got some text, a list, a table.

  • It's got imagery.

  • It's also inside of div tags, you see, if you kind of poke around. It's got some JavaScript tests.

  • So if the JavaScript loads, it says "look at you shinin'", and before it loads, its default is like "y u bad tho". Then we've got some pre tag information in here with the Zen of Python, and we've got a link down here.

  • And then we've got some goofy-looking characters here that could throw you off.

  • So with that, uh, this is what we're gonna start off by parsing.

  • So coming back over here, it looks pretty simple.

  • I'm just gonna copy and paste this and just remove the interactive interpreter prompt things.

  • So it should be that simple.

  • So you make the request with .get; I'm gonna pass in my own URL.

  • You can feel free to use something else if you want.

  • Um, I'm just gonna be exemplifying some of the things. I use this page because, for this one page, I can control what happens to it.

  • It seems like every time I do a tutorial on literally anybody else's pages, or with anybody else's API, they all get deprecated.

  • They all change.

  • So this one should be a rock. Now, so that starts your session, we get some stuff, and then we can start actually interacting with it.
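A minimal sketch of the request step just described, assuming requests-html is installed (`pip install requests-html`). The helper name `fetch_page` is made up for illustration, and the URL is the author's own test page as best it can be read from the captions.

```python
# Hypothetical helper; the URL is an assumption reconstructed from the captions.
PAGE_URL = "https://pythonprogramming.net/parsememcparseface/"


def fetch_page(url=PAGE_URL):
    """Start a session and request a single page."""
    # Imported inside the function so this file still loads on machines
    # where requests-html is not installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    return session.get(url)
```

Calling `fetch_page()` returns a response object; the parsed document discussed in the rest of the video hangs off its `.html` attribute.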

  • So again, I don't really know everything with this package, But generally when I check out a package, I mean, they might I'm not sure if he has, like, a documentation page.

  • It doesn't look like he does.

  • Let's see.

  • Oh, maybe he does.

  • I see what this looks like.

  • It's pretty much the same thing.

  • Um mmm.

  • for html in r.html... this is interesting.

  • Pagination.

  • That's Ah, that's more magical than I would have expected.

  • That's incredible.

  • If that really is that simple to do pagination.

  • Wow, that's not written up here.

  • Okay, so anyway, that's interesting to say the least.

  • Um, so then you request it... oh my.

  • And then you could just iterate over it.

  • Oh, uh huh.

  • Um, that might work really well with Reddit.

  • I'm curious to know if that works well on other websites.

  • We'll have to find a way to test that, maybe at the very end. Um, you can use it without requests.

  • So if you downloaded maybe some HTML documents, you could then parse them.

  • Ah, yeah.

  • This has a lot more explanation.

  • Well, okay.

  • Cool.

  • Um, I'm really... that pagination support.

  • That's just icing on the cake, man.

  • I would think you'd have to find the link to it and then go to the link.

  • The fact that that pagination... I'm sorry, that's just, that's just interesting.

  • So back to the simplified version of the docs, though: all we need to do is, once we get, we can just start referencing.

  • And I'm pretty sure, as long as it's not JavaScript.

  • So yeah, immediately, we could just say, like, for example, let's just do r dot... well, first of all, let's print dir(r) first.

  • Let's just see all the things that we could do with it.

  • Let me just pull this up. So immediately, on the request,

  • um, we could check against the encoding.

  • I guess apparent_encoding.

  • We can close the request, uh, cookies, encoding.

  • I'm not sure what the difference between encoding and apparent_encoding is.

  • Check the headers, history. We can reference the HTML.

  • We can check if it was a redirect.

  • That's interesting.

  • So just automatically does redirect you, but you can find out if you've been redirected.

  • That's interesting.

  • Uh, json. I wonder if it would just pull it.

  • I'm not sure, if you say .json, like if it's some sort of API or something and the response is JSON, if you could say .json and get that JSON object. Now, I'm not sure, I'm guessing that's all the links.

  • That's if it's paginated, apparently, which is fascinating.

  • It'd also be cool if it wasn't paginated. Like, I wonder if you could make some sort of crawler that just, like, automatically goes to any link and just keeps linking around.

  • If anybody that's trying to build just like a Web crawler, that's kind of an annoying thing to have to build.

  • Um, okay, cool.

  • So we've got text.

  • url.

  • And then the html.

  • So let's check out, uh, let's just check out .html. So first of all, I guess we'll print out dir(r.html) as well, but let's see what we've got.

  • text, raw_html, search, so we could search the HTML. Um, I'm assuming links finds all of the links.

  • So, like, for example, let's print r.html.links and see what that looks like.

  • Okay, so it looks like it gives us, ah, a set.

  • I guess that makes sense to give it a set.

  • Just in case there was multiple of the same.

  • I'm guessing that's a set.

  • I don't know.

  • We can, uh, type.

  • Just looks like a set.

  • Yeah.

  • So it gives us a set of the links.
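To illustrate why a set is a sensible return type for links, here is a small standard-library stand-in. It is not how requests-html implements `r.html.links`; the class and function names are invented for this sketch, which only shows that repeated hrefs collapse to one entry.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects every href found on an <a> tag into a set."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.add(value)  # a set drops duplicates for free


def collect_links(document):
    """Return the unique hrefs in an HTML string."""
    parser = LinkCollector()
    parser.feed(document)
    return parser.links
```

With a page that links to the same address twice, the result contains that address once, which matches what the author sees printed.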

  • Um, what else could we print out?

  • r.html.links.

  • We've got r.html.html.

  • Wasn't that in the dir of r.html? Ooh, it gets all formatted and beautiful for us.

  • So what's the difference between html and raw_html?

  • Oh, yeah.

  • Okay.

  • That gives us, like, the new lines in, like, all the tabs and spacing and stuff like that.

  • Okay, Okay.

  • I'm not sure exactly why you would need to do that, but, um Okay, so that's interesting.

  • The other things.

  • Let's see.

  • Let's check out.

  • find: you can find, it looks like, uh, by the ID.

  • Yes or no?

  • I'm not sure.

  • Let me see if we have IDs here.

  • I know at least we probably have an ID in the... yeah, yesnojs.

  • I wonder if we can find that one.

  • I'm just gonna copy-paste r.html.find, with yesnojs, and print about, even though that's kind of a bad name for it.

  • Ah, darn. It was, uh, probably .text.

  • I'm just guessing.

  • Nice.

  • Okay, so that's how we could find it.
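The lookup above can be sketched offline, assuming requests-html is installed. The `SAMPLE_HTML` markup is a made-up stand-in for the test page (only the `yesnojs` id and the placeholder text come from the video), and `text_by_id` is a hypothetical helper name.

```python
# Made-up markup standing in for the author's test page.
SAMPLE_HTML = """
<div class="body">
  <p class="jstest" id="yesnojs">y u bad tho?</p>
</div>
"""


def text_by_id(document, element_id):
    """Return the text of the first element with the given id, or None."""
    # Imported inside the function so this file still loads on machines
    # where requests-html is not installed.
    from requests_html import HTML

    page = HTML(html=document)
    # "#name" is a CSS id selector; first=True asks for one element, not a list.
    element = page.find("#" + element_id, first=True)
    return element.text if element is not None else None
```

Parsing a string with `HTML(html=...)` is also the "use it without requests" path mentioned earlier, so the same `find` call works on downloaded documents.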

  • I don't like what it's saying to me, but that's how we can find it, which then brings me to the next curiosity I have, which is with the JavaScript rendering.

  • So what's happening here... first of all, let's just call it, um, I don't know, js_test.

  • Uh, copy that.

  • Paste that here.

  • Uh, save.

  • Okay, so at least, um, the whole point of this was to test if we could actually load JavaScript when we parsed things.

  • Um, that was part of an older tutorial, but now with requests-html, apparently, it's super simple, so let us check that out.

  • So if we scroll down in the docs, it looks like basically all you gotta do is just use .render.

  • So s.

  • So let's go ahead.

  • Basically, you're gonna want to render it right after the request.

  • So let's just paste that in there.

  • So with JavaScript, the immediate return will be the base source, including, like, the actual JavaScript code. In order to render JavaScript,

  • you actually have to have a browser that runs it. Like, that's how JavaScript works.

  • Like, you request the server and the server just responds in just text, obviously, and then your browser's going to see:

  • okay, here's some script.

  • We want to run this script.

  • So what you have to do in order to read elements inside of tags that are updated by JavaScript is you need to render it, and you also have to kind of wait a moment.

  • You have to wait until it's done rendering.

  • So that's kind of a tedious thing to build yourself.

  • But apparently it's in here, ready to go.
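A sketch of the render-then-read flow described above, assuming requests-html is installed. `rendered_text` is a hypothetical helper name. As the video goes on to show, the first call to `render()` downloads Chromium, so expect a delay the first time.

```python
def rendered_text(url, selector):
    """Fetch a page, run its JavaScript, then read one element's text."""
    # Imported inside the function so this file still loads on machines
    # where requests-html is not installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # Runs the page's scripts in a headless Chromium and replaces the
    # parsed HTML with the rendered result.
    response.html.render()
    element = response.html.find(selector, first=True)
    return element.text if element is not None else None
```

Without the `render()` call, the same `find` would return whatever placeholder text the server sent before any script ran.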

  • Let's go and run it and let's see if it updates it.

  • So on the first run, it looks like you have to install chromium.

  • So I guess it's gonna download. The download may take a few minutes.

  • I hope it doesn't.

  • I don't really want to wait that long.

  • Maybe I'll pause it while we go. Um, but I think it's done.

  • Okay, it's done, okay.

  • And it's just telling you where it extracted to. Um, only, an error: unable to remove temporary user data.

  • Interesting.

  • It did work, though.

  • That's the That's this is the text right here that we're looking for.

  • So, like, let me just kind of make some space here, since the error's in our way.

  • Um, probably a permissions error.

  • Honestly, uh, anyway, yeah, that's what we were looking for, and that's indeed, you know, what we see if we come over here. So it definitely works, but for some reason, we're getting an error when we attempt to remove the user data.

  • So let's open up this file, the launcher.

  • So at least in Sublime, it's really nice.

  • You could just double click on the thing that errors, and we could check it out.

  • Okay, so in this internal method, we're trying to clean up some data, and they're using shutil.rmtree.

  • So is trying to remove a directory.

  • Basically, that's how you're gonna remove directory with contents.

  • Um, hmm.

  • So we already we saw where it went.

  • It goes into users h and then that, uh, let's see me see if I can find it.

  • Ah, yeah.

  • So it goes in, um, you know, here it is.

  • And then if we go into there, we can see the contents.

  • But for some reason, we're unable to delete them.

  • I'm gonna guess it's a permissions error.

  • One way we can confirm that is to not ignore errors.

  • We'll set that to false, just in case anybody's like, Why would you ever ignore the errors?

  • I forget that.

  • I think if you don't ignore the error, I think one of the errors is that the file directory has contents, and that'll stop it from working.

  • If you ignore errors, it does work.

  • But we also don't get to see the error.

  • So let's go ahead and see, because things aren't working.

  • But once it is working, once shutil does work for you, you actually kind of wanna... "Access is denied."

  • You actually do want to set ignore_errors to True.
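For reference, a standard-library sketch of what the `ignore_errors` flag on `shutil.rmtree` changes. The helper names are invented here. On a typical setup `rmtree` removes a directory that still has contents either way; the flag only decides whether a failure is raised or silently swallowed.

```python
import os
import shutil
import tempfile


def make_tree():
    """Create a temporary directory containing one file and return its path."""
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "data.txt"), "w") as handle:
        handle.write("temporary user data")
    return root


def remove_tree(path, ignore_errors):
    """Remove a directory tree; report whether the path is gone afterwards."""
    # With ignore_errors=True a failed removal is silent; with False it raises.
    shutil.rmtree(path, ignore_errors=ignore_errors)
    return not os.path.exists(path)
```

Setting the flag to False, as the author does temporarily, is a quick way to surface the underlying error message.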

  • So let me just set this to true again.

  • So we're seeing here "Access is denied."

  • We just don't have the permissions.

  • Um, I'm kind of curious.

  • I kind of want to open this up as administrator to see if I can run that.

  • Let me pause for a moment.

  • Let me open it up and head to where we're working and see if we can use, uh, at least command prompt as administrator and see if this works.

  • Okay, let's give it a shot.

  • Where?

  • Administrator control panel.

  • Wow.

  • It still can't remove it.

  • We did get the info here, though.

  • Um Mm hm.

  • That's gonna break any script that's running, too.

  • Though that's really quite the bummer.

  • Uh, least right now the show must go on.

  • So I'm gonna continue, uh, at least a short term fix that I can think of.

  • Um, why wouldn't that work?

  • I'm pretty sure if you used, like, Linux and you used sudo, though, this wouldn't cause any problems for you.

  • It's He beats the location.

  • I'm not really sure.

  • I really can't decide why.

  • That won't delete.

  • If somebody has a better idea, go for it.

  • Otherwise, temporarily.

  • One option we have is to just not raise the IOError.

  • Don't raise the IOError.

  • Instead, we could just print.

  • Um, we can just print unable to remove, like we don't actually have to.

  • We don't need to delete that file.

  • We just would have liked to to keep things clean.
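The workaround described here, printing instead of raising, looks roughly like this in isolation. This is a standalone sketch with a made-up function name, not the actual code in the launcher file the author is editing.

```python
import shutil


def cleanup(path):
    """Try to remove a directory tree; warn instead of raising on failure."""
    try:
        shutil.rmtree(path)
    except OSError as error:  # IOError is an alias of OSError in Python 3
        print(f"Unable to remove {path}: {error}")
        return False
    return True
```

The caller keeps running either way, and the leftover directory can be deleted by hand later.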

  • Um, all right, strike in.

  • Cool.

  • Okay, so now it'll still at least print out to us that something went wrong.

  • But we would be able to continue.

  • So, like, if if, like, for example?

  • Um, actually, I'm not even sure moving already moving along was that.

  • That's kind of weird, because the other print, actually, I think was working.

  • Uh, we are moving along.

  • Maybe.

  • Maybe it wasn't.

  • I thought there was space being made.

  • That doesn't make any sense to me that if it would raise that error, why it would continue working.

  • Wow, it actually does.

  • Okay, Magic.

  • I guess we could leave the IOError there, to be honest.

  • Because it does.

  • We're able to continue moving along.

  • If that's bothering you, though, you could get rid of it.

  • Um, let's see what else we could do.

  • So I'm curious to parse that table. Also, let's try to parse... um, let's parse finance.yahoo.com.

  • That sounds like a fun one.

  • Uh, I'm pretty sure they block automated requests now.

  • Okay, so what we can do is, like, if we go to Amazon.com, one of the things I used to do is parse from here.

  • And if we go to, like, price statistics, right, there's all these statistics, and for a while, they were just... I'm not even sure how they did it.

  • To be honest, it must have not...

  • It just wasn't with JavaScript.

  • Um, but now it is with JavaScript, and these values are not, like, static.

  • They're updated via JavaScript.

  • So and now I'm kind of curious.

  • I wonder how they did do it.

  • Like, I guess if you used, like, Jinja formatting, let's say, um, you wouldn't necessarily use JavaScript to update those values.

  • So maybe that's it.

  • Anyway, let's head to this page.

  • Um, Booth.

  • And now what I'm curious about is like, can we find?

  • I wonder, with first not True,

  • if we don't do that, if it'll find us a list. Like, so for example, what I'm gonna be looking for is, like... let me just do it.

  • I'm just going to go into the source code, and I'm gonna search for Forward P/E.

  • So it looks like we could look for any table data.

  • For example.

  • Looks like... what if we wanted to just find all table data? So probably we have to use a tag, is my guess.

  • Let's see if they have any tags here.

  • So about.text, links...

  • Uh, okay.

  • find('a'). So this would be any... So I wonder if we could just do find td, like, find table data. about.html, about.attrs.

  • I think you can just use .find on any bit of text.

  • So let's try this.

  • Let's say r.html.render.

  • Let's do, um, print r.find.

  • Let's find all the table data.

  • Let me just see what we get, because I think we might have to do table data dot text.

  • Probably.

  • I'm not really sure We're still gonna see that stupid message, too.

  • HTMLResponse object...

  • Oh, maybe it needs to be r.html.

  • That's weird.

  • Yeah.

  • So it needs to be r.html.find.

  • Probably it's hard because they don't have just like searching purely html.

  • But I'm pretty sure you need to do the HTML.

  • Yeah, yeah.

  • Cool.

  • Wow, that's super neat.

  • So you can get all the table data. And then, in this case, what's super annoying with their table data on, uh, Yahoo is each one has a unique, I'm pretty sure... or does it have a unique class?

  • They are actually all the same. Yeah, they're all the same class of table data.

  • And I wonder if we can search for the class specifically, too, in, uh, requests-html with find.

  • It would be interesting to know if we can do that.

  • I'm sure you can see.

  • Yeah.

  • Oh, well, you have class.

  • At least there.

  • Where's the, uh...

  • Where's that link to the full documentation?

  • Ah.

  • Oh, right here.

  • Cool.

  • Let's see if we can do it.

  • find, first equals True, about... class... attributes. Like, I wonder if you can find specific classes.

  • I'm sure there's a way I'm just trying to figure out what the parameter would be.

  • Wait, Bo.

  • So could we say td dot... let's just try.

  • But what about the spaces, though?

  • I guess you?

  • Yeah, you.

  • Mmm.

  • I don't think this is gonna work.

  • I don't have high hopes for this one.

  • I wonder.

  • I think it would find him if it didn't have spaces.

  • But let's just see what happens.

  • I can't.

  • No.

  • And now I don't even know what the other error was because we ran out of space, apparently.

  • Oh, Maybe this is it.

  • There just wasn't anything.

  • Mmm.

  • Yeah.

  • It looks like the parentheses threw it off.

  • Bummer.

  • Not really sure.

  • I'm gonna go ahead and move along from that one, but that's interesting.

  • This is We're probably going beyond what it intended us to use this for.

  • Uh, let me see if any of these have a class real quick.

  • Like I bet I can do.

  • And maybe that dot was for the i D.

  • No.

  • Sure.

  • Let's do... let's go back to parsememcparseface.

  • Just curious.

  • Find.

  • I want to find all the active tags.

  • Give motile class.

  • We really don't need to render it anymore.

  • That way we won't have to see that error anymore.

  • Yeah, cool.

  • Anyway, let's do that.

  • So if you wanted to find all the elements of a specific class, you would use the dot. Cool.
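The spaces and parentheses that caused trouble earlier come down to CSS selector syntax: a space-separated class attribute is several classes, each written with its own dot, and characters such as parentheses have to be backslash-escaped. Below is a small standard-library sketch with a hypothetical helper name; the class names in the usage note are invented, and whether a given parser accepts the escaped form is an assumption to verify.

```python
import re


def class_selector(tag, class_attribute):
    """Build a CSS selector matching a tag that carries all given classes.

    Simplification: class names that start with a digit need a different
    escape and are not handled here.
    """
    parts = []
    for name in class_attribute.split():
        # Backslash-escape anything that is not a word character or hyphen.
        parts.append(re.sub(r"([^\w-])", r"\\\1", name))
    return tag + "".join("." + part for part in parts)
```

For example, `class_selector("p", "jstest")` gives `p.jstest`, and a class attribute like `Ta(end) Fw(b)` on a `td` becomes `td.Ta\(end\).Fw\(b\)`. The resulting string is what would be passed to `find`.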

  • Okay.

  • Anyway, I think that's enough for now. That's probably pretty long, and that's just me kind of poking around with the library.

  • Um, if you have any questions or something like that, feel free to share.

  • If you got some cool things that you found about it, you can share that too.

  • Um, I'm also pretty curious with the reddit thing.

  • I take its word for it.

  • I'm trying.

  • Think of something real quick that we could search for the pagination.

  • I want to see that work on everything.

  • I don't understand how you could know that it was pagination.

  • Um I just can't think of something that I could search real quick that has pagination to it.

  • But I wouldn't mind poking around with that.

  • I'm going to guess it works, at least on Reddit.

  • But how would they know?

  • How does he know about the pagination?

  • Intelligent pagination support Always improving.

  • So I'm guessing it doesn't always work.

  • Um, what if we did?

  • Um, what is it?

  • Uh, let's do, uh, to hacker news.

  • That's got some pagination to it.

  • So what if we go to news.ycombinator.com, and then let's see how they did it.

  • So they're just saying r.html.next.

  • So what if I say this, r.html.next, uh, print?

  • Nope.

  • Really has no attribute.

  • Next.

  • Let's check Reddit real quick.

  • Let's see if that one worked.

  • That's kind of weird.

  • Why wouldn't... no, that one doesn't work either.

  • Well, then, I'm not sure. for html in r.html.

  • We'll see if this one works.

  • Look, I'm just trying to do the example that they've got posted now.

  • for html in r.html.

  • Let's just study this.

  • Go away clips.

  • What did I do wrong?

  • r equals session.get... r equals session.get, like, Reddit, and then these two lines. Don't forget that. There it goes.

  • So it gets the initial one and it definitely is getting the pages.

  • And I guess each time it goes to that page to get the next one, which is interesting.
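The iteration being tried here can be sketched as follows, assuming requests-html is installed. `first_pages` is a hypothetical helper, and the page limit is added only so the loop cannot run indefinitely. The library describes its pagination as best effort, so the "next" page it picks may not be the one you expect.

```python
from itertools import islice


def first_pages(url, limit=3):
    """Return up to `limit` parsed pages, following detected pagination."""
    # Imported inside the function so this file still loads on machines
    # where requests-html is not installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # Iterating over response.html yields the current page, then whatever
    # the library believes is the next page, and so on.
    return list(islice(response.html, limit))
```

Printing each returned page shows which URL the library actually followed.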

  • So it definitely knows that pagination.

  • Let's try again on hacker news because the next thing, the next operator thing doesn't appear to be working.

  • It started to work, then it failed.

  • So we got to page two, and then it went to CNBC.

  • So I think it just picked the wrong page.

  • So it started here, and then it probably found page two, which was smart, and somehow it got stuck on CNBC.

  • Not sure why. So, okay, the pagination... okay, it sounded too good to be true, and I think it is, but hopefully over time it will improve.

  • And that would be like, that's pretty cool.

  • Anyway, I think that was a good time to stop.

  • Um, I mostly ran into the edges of this package, but for simple HTML parsing and stuff like that, this clearly is super useful.

  • You won't need to use Beautiful Soup and stuff like that, but, um, at least we found some issues with the JavaScript thing.

  • Uh, it just can't seem to delete that, uh, your temporary directory.

  • So eventually you'd probably have to go in there and delete it just manually.

  • Otherwise, I'm trying to think of what else?

  • Obviously, like, the deletion didn't really work, but still, the rendering actually worked.

  • And I'm not sure you would hit that issue on Linux.

  • So just the fact that I'm on Windows could be responsible for that issue.

  • And then, um, the pagination.

  • I mean, come on, that's really hard.

  • I'm surprised that they're even trying to do that, but cool.

  • Um, at least it works on Reddit. Too bad Reddit has an API; I'd probably rather use that than paginate through them.

  • Uh, other than that, I think that's it.

  • Like, the search is super useful.

  • Super quick.

  • Super simple.

  • Um, yeah.

  • So pretty cool.

  • Package if you need to parse HTML.

  • I think this would be the way that I would suggest you go.

  • Um, but yeah, if you've got questions, comments, concerns, whatever.

  • Feel free to leave it below.

  • Otherwise I will see you guys in another video.


Requests-HTML - Checking out a new HTML parsing library for Python

Posted by 林宜悉 on January 14, 2021