Hello, and welcome back to R for Statistics and Data Science.
-
In the next few lessons, we will dive deep into the Star Wars data and will learn how
-
to transform data sets in various creative and not-so-creative ways.
-
Let’s get to it!
-
This is the first real lesson in which we will use the dplyr package.
-
For the distracted souls out there, dplyr is part of the tidyverse and we got it when
-
we installed the tidyverse ecosystem of packages.
-
It specializes in data manipulation tools that deal with filtering, mutating, and summarizing
-
data.
-
First things first, let’s fire up the Star Wars data frame that comes with dplyr.
-
This time, I will save it as “star”.
-
Notice that the data are saved as a tibble instead of a base R data frame.
-
Let’s keep it this way and use some of the tibble properties.
-
Tibbles come in handy here because this is a relatively big dataset and we don’t want
-
to see the entire thing every time we do an operation and print to see our results.
-
Tibbles limit the printing to just a few rows.
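-
The on-screen code is not part of the transcript, so here is a minimal sketch of the steps just described. It assumes the dplyr package is installed; the name star comes from the lesson, while the n = 5 in the print call is only an illustrative choice.

```r
library(dplyr)

# Save the built-in Star Wars data under the name used in this lesson
star <- starwars

# The object is a tibble ("tbl_df"), not just a plain data frame
class(star)

# Printing a tibble shows only the first few rows; n controls how many
print(star, n = 5)
```

To browse every row interactively, View(star) opens the data viewer in RStudio.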
-
Okay, although we’ve already looked at it before, if you want to see the data in all
-
its glory, run View(star).
-
This will open the viewer and you can scroll through the values to your heart’s content.
-
Right!
-
Transforming data!
-
The filter() function does what you would expect: it subsets data according to a set of criteria.
-
It works like this: we pass the data, and then the expression according to which we
-
want our data filtered.
-
There can be more than one criterion, of course.
-
For instance, I can select all the droids in the data frame.
-
And now I can call only the ones from Tatooine.
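-
A sketch of what these two calls might look like, assuming the species and homeworld columns of the starwars data (the variable names droids and tatooine_droids are made up for illustration):

```r
library(dplyr)

star <- starwars

# First argument: the data; second: the condition each row must satisfy
droids <- filter(star, species == "Droid")

# Several conditions separated by commas must all hold at once
tatooine_droids <- filter(star, species == "Droid", homeworld == "Tatooine")

tatooine_droids$name
```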
-
Right.
-
Yes, that makes sense, it was young Anakin Skywalker who re-built C-3PO while still on
-
Tatooine.
-
And R5-D4…
-
I am not sure I know anything about that little R5-unit!
-
Okay.
-
filter() also works with logical operators, so, for example, I can call every character
-
that has red, orange or yellow as an eye colour.
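-
One way to write this condition, as a sketch: the | operator means "or". The exact spelling used on screen is not in the transcript, and the name warm_eyes is hypothetical.

```r
library(dplyr)

star <- starwars

# Keep a row if its eye colour is red OR orange OR yellow
warm_eyes <- filter(
  star,
  eye_color == "red" | eye_color == "orange" | eye_color == "yellow"
)

# The same subset, written more compactly with %in%
warm_eyes_alt <- filter(star, eye_color %in% c("red", "orange", "yellow"))

select(warm_eyes, name, species, eye_color)
```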
-
Okay, the majority of these aren’t human…
-
I wonder if there are any more humans with weird eyes apart from Darth Vader and Palpatine.
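-
To check, the species condition can be combined with the eye-colour condition. This is a sketch under the same assumptions as before; odd_eyed_humans is a made-up name.

```r
library(dplyr)

star <- starwars

# Both conditions must hold: human AND one of the three eye colours
odd_eyed_humans <- filter(
  star,
  species == "Human",
  eye_color %in% c("red", "orange", "yellow")
)

select(odd_eyed_humans, name, eye_color)
```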
-
No?
-
Yikes.
-
Alright, next we have the select() function.
-
Now, our data set may not have hundreds of variables, but looking at the column names,
-
it does feel like I genuinely don’t need to know about some of these things.
-
To narrow down the data to the information I want, I can use select().
-
This selects specific individual columns, by name.
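-
For example (the three columns chosen here are an assumption, since the transcript does not say which ones were picked):

```r
library(dplyr)

star <- starwars

# List the columns to keep, by name, after the data
basics <- select(star, name, height, mass)

names(basics)
```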
-
If I want to select a column and then everything between two other columns, I can do this...
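-
A sketch of that pattern: the colon picks every column from the first name through the second. The particular columns are an illustrative choice, not necessarily the ones used in the video.

```r
library(dplyr)

star <- starwars

# name, plus every column from hair_color through eye_color
looks <- select(star, name, hair_color:eye_color)

names(looks)
```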
-
Isn’t this already a lot easier to do than with the base R functions we learned earlier?
-
But check this out, too: select() works nicely with a couple of nifty functions like starts_with(),
-
or ends_with(), which let us subset data in a super intuitive way.
-
So, if I wanted to get all the columns that have to do with coloration, I can run this...
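-
A plausible version of that call, assuming the colour columns all end in "color" as they do in the starwars data:

```r
library(dplyr)

star <- starwars

# ends_with() matches column names by their suffix
colours <- select(star, ends_with("color"))

names(colours)
```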
-
Okay, new scenario: there are a bunch of interesting variables you want to look at but you also
-
don’t want to ignore the rest of the data… what do you do?
-
Well, you can use the everything() function with select, to move the variables you want
-
to the beginning of the table, and then show everything else.
-
Like this.
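-
As a sketch, with name, homeworld and species standing in for the "interesting" variables (that choice is an assumption):

```r
library(dplyr)

star <- starwars

# Named columns come first; everything() appends all remaining columns
reordered <- select(star, name, homeworld, species, everything())

names(reordered)
```

No column is dropped; only the order changes.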
-
Sweet, right?
-
Finally, let’s look at the mutate() function.
-
mutate() is dplyr’s easy way of creating new variables from variables that already
-
exist in the data set.
-
For example, I can calculate the BMI for our characters because the Star Wars data has
-
recorded both height and mass information.
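-
A minimal sketch of the BMI calculation, assuming height is recorded in centimetres and mass in kilograms, as in the starwars data. The column name bmi is an illustrative choice.

```r
library(dplyr)

star <- starwars

# BMI = mass (kg) / height (m)^2; height is stored in cm, so divide by 100
star_bmi <- mutate(star, bmi = mass / (height / 100)^2)

select(star_bmi, name, height, mass, bmi)
```

Characters with a missing height or mass get NA for bmi.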
-
Of course, this is largely uninformative, because the BMI scale is extremely human-centred,
-
but you know – anything to get the point across!
-
Now, if mutate() is the function to use when you want to add a column to your data while
-
also retaining all the other columns in your data frame, then transmute() is what you will
-
opt for if you only want to keep the new variable you create.
-
Let me show you what I mean...
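-
The same BMI calculation with transmute(), as a sketch under the same assumptions about units (bmi_only is a hypothetical name):

```r
library(dplyr)

star <- starwars

# Same formula as with mutate(), but only the new column is returned
bmi_only <- transmute(star, bmi = mass / (height / 100)^2)

names(bmi_only)
```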
-
See?
-
Effectively, transmute() created my new variable and allowed me to extract it without
-
everything else tagging along as well.
-
Great.
-
Okay!
-
I will end this lesson here because otherwise I am at risk of going into way too much detail
-
about side comments I make.
-
Thanks for watching, everyone!
-
In the next lesson we will pick it up right where we left off.
-
See you there!