  • What's going on, everybody, and welcome to a video where we're gonna be going through some of our Google Takeout data.

  • If you're not familiar with Google Takeout, it is a way you can go to Google, like your Google profile, and download all of the data that Google has on you, at least through Google services.

  • So if you want to do that, it can take a while to actually get the archive.

  • But if you log in, there's like a blue, you know, account icon at the top right. Click on that.

  • Then on the left hand side there's data and personalization.

  • You can click on that and then from there, scroll down just a little bit.

  • And then you can download your data and then you're gonna pick the things that you want.

  • Now you might go through this list and you might think you know what those things mean.

  • But you probably don't so just get it all.

  • So you get a good idea of all the things that Google has on you because you would actually be kind of surprised.

  • Or at least I was like there were some things that I thought I understood, like I thought I knew, like what kind of data Google had on me, but I didn't.

  • And so it's just a good idea.

  • Just grab it all.

  • Just so you have a full idea of all the information that they have.

  • So it's even little stuff. Like, they've got purchase history, for example. Even though you might not shop on Google, Google parses your emails, and from parsing your emails they've got purchase history from, like, confirmation emails and hotel booking emails, all that kind of stuff.

  • They're just extracting that by parsing your emails.

  • So stuff like that, um, you might not realize they have. So anyway.

  • Definitely, I think just get it all.

  • But mainly we're going to be focusing on the My Activity section, uh, at least to start.

  • But later on, if you guys wanna suggest stuff that you want to see us go over, feel free so we might do other stuff.

  • So finally you'll create your archive, and this will take, like... at least for me, it took a few hours. And you'll get, like, a confirmation email and stuff like that.

  • Like, for security: I can't think of a much more scary email to receive than "You've requested your Google Takeout data" if you weren't expecting that email. So anyway, cool.

  • So that's how you get the data.

  • And then once you get it, you'll take it, like, you extract it, and you'll have a Takeout directory kind of like this.

  • This is not the full one for me, but just some of the stuff I wanted to point out.

  • So there's that purchases and reservations thing I was talking about.

  • Location history: this is you, tracked everywhere you go based on your phone, but not just everywhere you go.

  • It's not just coordinates.

  • It's Are you walking?

  • Are you staying still?

  • Are you in a car?

  • I didn't think they were doing that, but they are, and based on that, you can.

  • You can then extrapolate information.

  • So based on that, you know where the person... because, like, where are they sleeping at night?

  • Basically, that's probably home.

  • Where do they go every day?

  • Monday through Friday, similar times?

  • It's probably work, you know, or school or something like that. Uh, and then based on the coordinates, they can pretty easily find out what's at those coordinates and stuff like that.

  • So anyway, like, I remember one of the times I moved, I went to the same store, like, twice in a row or something, and Google just assumed that's where I worked.

  • It was really weird, but my phone was like, "You need to go to work soon" one day, and I was like, "What are you talking about?"

  • I have a job.

  • So anyway, um, yeah, so we're interested in My Activity.

  • Inside here, there's stuff like Android.

  • Again, one of the things I didn't think they tracked was, like, every app you open. Look, they just track it.

  • Um, all right, cool.

  • And as time goes on, like all this stuff, like on its own or alone doesn't really sound that nefarious.

  • But like then you start realizing like all this stuff adds together to basically be your whole life, and it's kind of creepy.

  • So anyway, on that note, we're gonna be going through our entire search history and, um, you know, first of all, just as a forewarning, you guys don't get to comment on my search history unless you post your search history.

  • So no making fun of me.

  • So anyway, this is what it looks like.

  • It's an HTML file, which is kind of weird because other stuff, like, for example, the location data...

  • Let's just pull that up. Um, that's a straight-up JSON file, so I'm not really sure why this one is an HTML file and, like, some stuff is JSON. I don't really know. But anyway, in this case we're gonna have to parse this HTML file, which is kind of a pain, versus, like, the location history: getting through that is a breeze.

  • You just import json and you go. In this case, you could use Beautiful Soup or something, but we're just gonna do some stupid splits.

  • I think I think that's the way to go.

  • Um, so the first thing we're gonna do is just something really, really basic.

  • We're just gonna run through all our search queries, split by word, and then just start looking at what the most frequent words are.

  • And then what we can do is, you know, you could do that overall, but there's like, five years of search history.

  • So then you could do like a daily moving one year window.

  • So, like every day along the way, what was the previous year's worth of search history?

  • And what are the most common top 10 words, for example?

  • And then And that should probably give us, like a good overall, like macro understanding of major interests of ours and major life changes and stuff like that.

  • And then we could go smaller, like a month window or a week window, and figure out, you know, micro things that were going on just at that time.

  • So anyways, pretty cool.

  • So that's the plan.
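
The moving-window idea described above can be sketched roughly as follows. This is only an illustration, not the code from the video: the function name `top_words_in_window`, the `(unix_time, word)` row layout, and the sample rows are all made up for the example.

```python
from collections import Counter

DAY = 24 * 60 * 60  # seconds in a day


def top_words_in_window(rows, end_unix, window_seconds, n=10):
    """Return the n most common words whose timestamp falls in
    (end_unix - window_seconds, end_unix]."""
    start_unix = end_unix - window_seconds
    counts = Counter(word for t, word in rows if start_unix < t <= end_unix)
    return counts.most_common(n)


# Hypothetical rows: (unix_time, word). Real rows would come from the database.
rows = [
    (0 * DAY, "python"),
    (100 * DAY, "python"),
    (200 * DAY, "sqlite"),
    (300 * DAY, "python"),
    (400 * DAY, "python"),
    (450 * DAY, "tensorflow"),
    (460 * DAY, "tensorflow"),
]

# Top 2 words in the one-year window ending on day 460.
top = top_words_in_window(rows, end_unix=460 * DAY, window_seconds=365 * DAY, n=2)
print(top)
```

Calling the function once per day with a sliding `end_unix` gives the daily one-year window; shrinking `window_seconds` to a month or a week gives the smaller views.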

  • Later we could do more advanced stuff, like word vectors, and get general concepts that, you know, I'm interested in, or you are, whatever. We could do lots of really cool stuff.

  • One of the other things that I saw that I absolutely have to try is they also have all of your Google Assistant, uh, data.

  • So your actual transcription is mapped to the audio, which is basically a text-to-speech dataset waiting to happen, and also speech-to-text. So, pretty cool, but mainly so you can create a, uh, text-to-speech...

  • ...uh, that sounds like you. Uh, also take note:

  • Google could do the same.

  • Great.

  • So anyway, let's jump in.

  • So first of all, I'm just gonna make a new Ah, a new folder.

  • I'm gonna call this G data.

  • I'm gonna put Takeout into G data, and then I am going to figure out why my mouse isn't showing up.

  • Hopefully. Well, I can't see my mouse.

  • Okay.

  • Cool.

  • Um, all right.

  • And then what I'm gonna do is File, Save, and I am gonna put that in Desktop, G data.

  • And then for now, I'm just gonna call this search_database.py.

  • Awesome.

  • So that's all we're gonna do.

  • We're gonna parse through this, save the times.

  • We're gonna convert those times to Unix time because the date stamp is gonna be kind of a pain to work with.

  • Um, we're gonna split by word and then save that into our database and scratch all of our itches.

  • Okay, let's get started.

  • So, first of all, we're gonna need imports.

  • sqlite3 for the database.

  • We're going to... well, actually, in this file we probably don't need collections. I was gonna bring that in.

  • And, um, let's go ahead and bring in... we're gonna use from tqdm import tqdm, uh, and then we're gonna do import dateutil.parser, and we're not gonna need json.

  • I'm sorry, guys.

  • I got, like, allergies or something.

  • Um, I don't know.

  • We'll just get started.

  • So first of all, let's get the search activity, uh, location.

  • So I'll just do Takeout, My Activity, Search. Take that, copy, paste, slash MyActivity.html. I don't know if these backslashes will actually create a problem in this case, but I'm gonna just fix those manually really quick.

  • Then after this, we're gonna create our table.

  • Or actually, first we need to make the database, which is actually super simple with SQLite.

  • You just connect, and if it doesn't exist, boom, it's created.

  • So let's do that.

  • So we're gonna say, uh, conn equals sqlite3.connect, and we're gonna save this as mylife.db, and then c equals conn.cursor().

  • Awesome.

  • Define make_table.

  • Now what we're gonna do is c.execute, and we're gonna CREATE TABLE IF NOT EXISTS. The table will be called words, and basically words is gonna contain...

  • I think we'll still have a primary key. Why not?

  • id INTEGER; that will be a primary key.

  • Then we're gonna have unix; that will be a REAL. We could also go with INTEGER, but I'll just say REAL.

  • Uh, and then the actual word itself, which will be a TEXT type.

  • Um, awesome.

  • OK, so that's our table.
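
As a rough sketch of the connect-and-create-table step just described: the table and column names (`words`, `id`, `unix`, `word`) follow the narration, while the in-memory database is an assumption made here so the example leaves no file behind (the video connects to a file on disk instead).

```python
import sqlite3

# ":memory:" is used only for this sketch; passing a filename would create
# the database file on first connect if it does not exist yet.
conn = sqlite3.connect(":memory:")
c = conn.cursor()


def make_table():
    # IF NOT EXISTS makes the call safe to repeat on later runs.
    c.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "id INTEGER PRIMARY KEY, unix REAL, word TEXT)"
    )


make_table()
make_table()  # second call is a no-op rather than an error

# List the column names SQLite actually created.
columns = [row[1] for row in c.execute("PRAGMA table_info(words)")]
print(columns)
```

Declaring `id` as `INTEGER PRIMARY KEY` lets SQLite assign the id automatically, which is why the later insert only needs to supply `unix` and `word`.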

  • Can I please listen?

  • Uh, what's... mixed indentation?

  • Canape?

  • I'm so glad that this thing is like, "That's violating PEP 8."

  • No tabs allowed.

  • Let's fix that really quick and then using spaces.

  • Fix this stupid idiot.

  • Awesome.

  • Okay, so make table.

  • So let's go ahead and we'll run.

  • Uh, make table.

  • We don't actually need to run that now, though.

  • And now what we want to do is work on search data.

  • So defined search data.

  • Uh, what we wanna do here?

  • The first thing is we're gonna split by word, but we don't want all the words. Like "a", "the", "and": these are words we don't care about. Those, in the NLP community, are called stop words.

  • These are just words we just want to kind of toss out.

  • So the first thing I'm gonna try is "stop word list"... and, let's see, NLTK's list of English stop words.

  • Heck, yeah.

  • That is not the format I wanted, but, oh, here we go.

  • Here's a list: "And if you want the above list as an array, here you go."

  • Here you go.

  • Yeah.

  • Thank you.

  • Because the other guy obviously isn't a programmer.

  • Union, huh?

  • This looks like a great list.

  • I'm taking it.

  • Boom.

  • Copy.

  • Nice.

  • Big, long list.

  • So we're going to say stop_words equals, bang.

  • Um, it's kind of... it's not the greatest list of stop words.

  • Throw has meaning.

  • Wonder also has meaning.

  • Seriously Has meaning.

  • Can it get?

  • I think I'll use this list, but it depends on what you're trying to do.

  • I mean, some of these are actually meaningful words.

  • I'm not sure you'd want to toss them out.

  • Um, in our case, I think it will be fine, but it's interesting.

  • Some of these.

  • I'm just not sure I agree, but the show must go on.
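
To make the stop-word step concrete, here is a minimal sketch. The tiny `STOP_WORDS` set is invented for the example and is not the long list pasted in the video; the helper name `meaningful_words` and the lowercasing are also assumptions added here.

```python
# A deliberately tiny, made-up stop-word set; a real list is far longer.
STOP_WORDS = {"a", "the", "and", "of", "to", "in", "for", "how"}


def meaningful_words(query):
    """Split a search query on spaces and drop empty strings and stop words."""
    return [w for w in query.split(" ") if w and w.lower() not in STOP_WORDS]


kept = meaningful_words("how to parse the html in python")
print(kept)
```

As the narration points out, whether a word like "throw" or "seriously" belongs in the list depends on what you are trying to measure, so the set is worth reviewing before trusting the counts.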

  • So now what we want to do is open... we've already closed it, but we want to open that HTML file.

  • We want to split by "Searched for"... I gotta open it because I gotta figure out what it is.

  • Um, we gotta split by that, uh, because we don't care about this.

  • We care about "Searched for", so we want to split by that.

  • And then we want to parse out each word in that link.

  • So, pretty easy task.

  • So now what we want to do is open that file.

  • So what we're gonna say is with open search_activity, with the intention to read, as f: contents will be equal to f.read().

  • Then we want to split those contents and iterate over them.

  • So we're going to say for item in contents.split, um, and we want to split by "Searched for" with a capital S.

  • Okay, cool.

  • Well, let's just print item, and then we'll break, because we're gonna do some development here.

  • We don't want to... well, we made the table, but we probably didn't run search_data.

  • We did not run search_data. Great.

  • One more time here.

  • charmap, blah blah blah, blah, blah.

  • Okay, fine.

  • We'll open it with some encoding.

  • I wish this was somehow easier to get back over here.

  • Uh, here: encoding equals, and we'll make that utf8. Beautiful.

  • Not that beautiful, but this is what we wanted.

  • Um, oh, so when we split by "Searched for", the zeroth element we don't actually care for.

  • So let's do one, colon.

  • Try one more time.

  • Beautiful.

  • Exactly what we were hoping for.
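
The read-and-split step above might look roughly like this. The HTML below is a made-up stand-in written to a temporary file, since the real export's markup is not shown in the transcript; only the explicit `encoding`, the split on "Searched for", and the `[1:]` slice come from the narration.

```python
import os
import tempfile

# Hypothetical, heavily simplified stand-in for the exported HTML file.
sample_html = (
    "<html><body>page header"
    'Searched for <a href="https://example.com/?q=one">first query</a>'
    'Searched for <a href="https://example.com/?q=two">café near me</a>'
    "</body></html>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "MyActivity.html")
    with open(path, "w", encoding="utf8") as f:
        f.write(sample_html)

    # Passing the encoding explicitly avoids depending on the platform default
    # codec, which is where "charmap" decode errors tend to come from on Windows.
    with open(path, "r", encoding="utf8") as f:
        contents = f.read()

# Element 0 is everything before the first "Searched for", so skip it.
items = contents.split("Searched for")[1:]
print(len(items))
```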

  • So now we're gonna do some pretty ugly splitting here, so I'm going to say search_string equals item.split by that, and escape that there.

  • Um, and then it'll be the first, then .split by the closing link tag here.

  • Split by, boom.

  • And in that case, we want the zeroth.

  • So now let me print search_string.

  • Beautiful.

  • Just beautiful.

  • Now we want to parse out the date.

  • Where is the date?

  • Here.

  • So it should be after the first break.

  • Someone's gonna be like, "This is so ugly," in the comments.

  • Whatever, bro.

  • Uh, date equals item.split, break.

  • You did this so stupid.

  • Look, you could have done it this way.

  • Okay, split, um, and then the zeroth.

  • Here's our date string.

  • That would have worked so much better. Uh, print the date. Let's stop printing the item.

  • And let's see how that looks.

  • Beautiful.

  • Beautiful.
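
The two "ugly splits" can be illustrated on a single item. The markup in `item` is an assumption (a link followed by a line break, the date, and a closing div); the real export may differ, so treat the delimiters as placeholders to adjust against your own file.

```python
# Hypothetical fragment of what one chunk might look like after splitting
# the file on "Searched for". The exact tags are assumed, not confirmed.
item = (
    ' <a href="https://www.google.com/search?q=python+sqlite">python sqlite</a>'
    "<br>Aug 1, 2018, 10:00:00 AM EDT</div>"
)

# Text between the end of the opening <a ...> tag and the closing </a>.
search_string = item.split('">')[1].split("</a>")[0]

# Text between the first <br> and the closing </div>.
date = item.split("<br>")[1].split("</div>")[0]

print(search_string)
print(date)
```

Chained splits like this are brittle if the markup changes, which is the trade-off against using an HTML parser such as Beautiful Soup.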

  • Now we want to convert that date to a Unix time so we can use it as the programmers we are.

  • So, dateutil.parser.parse the date, .timestamp().

  • That's not gonna do anything for us yet.

  • d equals, uh, rather than date.

  • We'll just print d. Looks good.

  • "tzname EDT identified but not understood. Pass tzinfos argument..."

  • What if I don't want it?

  • I don't really care about EDT.

  • What? How do I... what's tzinfos?

  • Can I just, uh... tzinfos, dateutil.

  • I know that it's about tzinfos. "tzinfos: additional time zone names."

  • Okay, because EDT...

  • I don't even know.

  • Is that the same?

  • That's probably the same as EST, like Eastern.

  • I don't know what the D stands for. Eastern Dang Time?

  • I don't know.

  • Um, so this is probably the delta from, uh, what do you call it? UTC.

  • I don't know.

  • Uh, I'm just not worried about it right now.

  • This is working, so I'm just gonna skip it.

  • Um, a real programmer.

  • Uh, yeah.

  • The other thing.

  • You could just parse up to, like, the last four.

  • Or you could pass tzinfos, or replace. Like, what if I did this: replace EDT with EST?

  • What happens then?

  • Okay.

  • You don't even understand EST.

  • Okay, well, I don't care to cater to you, bro.
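
For anyone who does want to resolve the warning rather than skip it, one option is to pass `tzinfos` to `dateutil.parser.parse`. The sketch below is not what the video does (the video ignores the warning). It assumes the abbreviations mean US Eastern time, where EDT is Eastern Daylight Time (UTC-4) and EST is Eastern Standard Time (UTC-5); the sample date string is made up.

```python
from datetime import datetime, timezone

from dateutil import parser, tz

# Map the abbreviations to fixed UTC offsets (assumes US Eastern time).
TZINFOS = {
    "EDT": tz.tzoffset("EDT", -4 * 3600),
    "EST": tz.tzoffset("EST", -5 * 3600),
}

date = "Aug 1, 2018, 10:00:00 AM EDT"  # hypothetical date string
d = parser.parse(date, tzinfos=TZINFOS).timestamp()

# 10:00 EDT is 14:00 UTC, so both should give the same Unix time.
utc_equivalent = datetime(2018, 8, 1, 14, 0, 0, tzinfo=timezone.utc).timestamp()
print(d, utc_equivalent)
```

Without `tzinfos`, the parsed datetime is naive and `.timestamp()` interprets it in the machine's local time zone, so the stored Unix times can be off by a few hours. For year-long windows of word counts that may not matter much.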

  • Okay, so we've got the search string, we've got the time.

  • Now we can insert the words into a database.

  • So now what we're gonna say is for w in search_string.split, and we're going to split by a space.

  • Cool. If w not in stop_words...

  • What do we want to do?

  • We want to c.execute: INSERT INTO words.

  • Oh, man.

  • I'm gonna flop here.

  • I think it's like this: words, and I'm gonna try this, unix, word. But I can't remember if it's like this, and then it's like, two...

  • I'm gonna check.

  • Don't worry.

  • That, and then I want to say it will be a comma, tuple.

  • And then, uh, w... no, no, no.

  • unix is first, so, um... this is going south fast. So d, w. And now I'm beginning to wonder, maybe you don't have a comma there.

  • That's looking good.

  • You guys think... I think I made it.

  • Let's see.

  • Um, let me check. Googling...

  • This is the best programming resource.

  • I've never heard of it.

  • It's phenomenal.

  • Uh, dynamically inserting into a database.

  • Dang, Who is that guy?

  • He's good looking.

  • I didn't do values.

  • Dang it.

  • I was close.

  • I was close.

  • I deserve partial credit values.

  • So, like that, and then question mark, question mark, comma, tuple.

  • Is that about right?

  • Close.

  • Comma, tuple.

  • Yep.

  • I was so close.
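
The insert that the narration arrives at (column list, `VALUES`, two question marks, then a tuple) can be sketched like this. The in-memory database, the timestamp, and the sample words are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory only for this sketch
c = conn.cursor()
c.execute(
    "CREATE TABLE IF NOT EXISTS words ("
    "id INTEGER PRIMARY KEY, unix REAL, word TEXT)"
)

d = 1533132000.0  # hypothetical Unix timestamp for one search
search_string = "python sqlite insert"  # hypothetical search query

for w in search_string.split(" "):
    # One "?" placeholder per value; the values go in a tuple, in column order.
    c.execute("INSERT INTO words (unix, word) VALUES (?, ?)", (d, w))

conn.commit()
rows = c.execute("SELECT id, unix, word FROM words ORDER BY id").fetchall()
print(rows)
```

Using placeholders instead of string formatting lets sqlite3 handle quoting, which matters here because search queries can contain quotes and other awkward characters.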

  • Everybody okay?

  • Um cool break.

  • I think that's it.

  • Y'all.

  • So let me get rid of that break.

  • And before we forget, we need two things.

  • We need a conn.commit() and we need a conn.close() at the very end.

  • Now, this is not too much data. Like, our full dataset is 80 megabytes.

  • So there's no way this is gonna be bigger than 80.

  • It's probably, like, less than 10.

  • So, um, committing at the very end is totally fine.

  • Uh, many times you might not want to commit once at the very end; like, you probably wanna commit in batches.

  • But in this case, the entirety is smaller than a batch that I would commit.

  • So we'll just commit at the very end.

  • No big deal.
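
For larger datasets, the "commit in batches" alternative mentioned above could look something like the following. The helper name `insert_in_batches`, the batch size, and the sample rows are hypothetical; the video itself commits once at the end.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size):
    """Insert (unix, word) rows, committing every batch_size rows.
    Returns the number of commits performed."""
    c = conn.cursor()
    commits = 0
    for i, (unix, word) in enumerate(rows, start=1):
        c.execute("INSERT INTO words (unix, word) VALUES (?, ?)", (unix, word))
        if i % batch_size == 0:
            conn.commit()
            commits += 1
    conn.commit()  # flush whatever is left in the final partial batch
    commits += 1
    return commits


conn = sqlite3.connect(":memory:")  # in-memory only for this sketch
conn.execute(
    "CREATE TABLE IF NOT EXISTS words ("
    "id INTEGER PRIMARY KEY, unix REAL, word TEXT)"
)

sample_rows = [(float(i), "word%d" % i) for i in range(5)]  # made-up data
commits = insert_in_batches(conn, sample_rows, batch_size=2)
count = conn.execute("SELECT COUNT(*) FROM words").fetchone()[0]
print(commits, count)
conn.close()
```

Batching bounds how much work is lost if the script dies midway, at the cost of more disk writes than a single commit.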

  • Uh, we're gonna comment that out, and we are gonna tqdm this operation here.

  • So we see where we are, percentage-wise, along the way.

  • And finally, we're going to try and except, because we're probably gonna hit an error somewhere.

  • And I don't want this whole thing to flop because of one stupid error.

  • There's probably gonna be one because, like, some of this stuff, you've got, like, "Visited" and then "Searched for", so that's kind of awkward, and that'll probably cause an error.

  • And there's probably other things too.

  • So we're gonna try, except Exception as e, print...

  • What?

  • What?

  • No, no, no.

  • What's going on?

  • What is going on?

  • Oh, now you want to fix?

  • Okay, Whatever.

  • Uh, print str(e). Look at it.

  • This thing is just throwing a fit.

  • What is your problem, linter?

  • Are we good?

  • I think we're good.
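
The progress bar plus try/except pattern can be sketched as below. The sample `items`, the helper name `parse_items`, and the assumed link markup are made up; the fallback definition of `tqdm` is an addition here so the sketch still runs where the third-party tqdm package is not installed.

```python
try:
    from tqdm import tqdm  # third-party progress bar used in the video
except ImportError:
    def tqdm(iterable):
        # Fallback for this sketch only: iterate without a progress bar.
        return iterable


def parse_items(items):
    """Extract the link text from each item, skipping items that fail."""
    parsed, errors = [], []
    for item in tqdm(items):
        try:
            parsed.append(item.split('">')[1].split("</a>")[0])
        except Exception as e:
            # One malformed entry should not abort the whole run.
            errors.append(str(e))
            print(str(e))
    return parsed, errors


# Made-up items: one well-formed, one with no link at all.
items = [
    ' <a href="https://example.com/?q=ok">ok query</a>',
    " fragment with no link",
]
parsed, errors = parse_items(items)
print(parsed, len(errors))
```

Catching the broad `Exception` is fine for a one-off exploration like this, though printing the error (as the narration does) is what keeps silent data loss visible.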

  • Okay, so let's run this. Um, G data...

  • Looks like we actually already ran it.

  • So let me just delete that. It shouldn't be a problem, but I forget why we had to rerun anyway.

  • python search_database.py.

  • We can do it.

  • Here we go.

  • We got a progress bar is looking great.

  • We're, um We're going quick.

  • Just for the record, if you want to look at your database like I'm going to when this is done, I can't really see anything yet because we haven't committed anything.

  • But I'm using, like, it's like SQLite database viewer or something.

  • Um, open with DB Browser for SQLite.

  • Just for the record. We're gonna wait for the... it's done. 3.4 megabytes.

  • That's really small.

  • OK, browse data.

  • This could be risky.

  • About to show the world... I could always blur it in editing, I guess.

  • Then I'd have to edit.

  • Okay, so, uh, cool.

  • Looks like it works.

  • We get the unique ID, a Unix timestamp, and then the actual words. I'm not going to scroll too far; that could be dangerous.

  • Uh, all right, so we are ready for, uh, the next tutorial, which is where we will begin to build the images that will represent the frames in our video of our search histories over time. That's gonna be pretty cool.

  • Shout out to my most recent channel members, Rodrigo Claw.

  • And why you, Ian Christensen and Kunal?

  • Thank you guys very much.

  • Without support from people like you, I wouldn't be able to play with Google data like this.

  • I mean, it's pretty awesome.

  • So thank you guys very much for your support.

  • And I will see the rest of you guys.

  • I love you guys, too.

Building Search Word Database - Data Analysis of Google Takeout Tracking p.1

Posted by 林宜悉 on January 14, 2021