Hello, and welcome back to R for Statistics and Data Science. In the next few lessons, we will dive deep into the Star Wars data and learn how to transform data sets in various creative and not-so-creative ways. Let’s get to it!
This is the first real lesson in which we will use the dplyr package. For the distracted souls out there, dplyr is part of the tidyverse, and we got it when we installed the tidyverse ecosystem of packages. It specializes in data manipulation tools that deal with filtering, mutating, and summarizing data.
First things first, let’s fire up the Star Wars data frame that comes with dplyr. This time, I will save it as “star”. Notice that the data are saved as a tibble instead of a base R data frame. Let’s keep it this way and use some of the tibble properties. Tibbles come in handy here because this is a relatively big dataset and we don’t want to see the entire thing every time we do an operation and print to see our results. Tibbles limit the printing to just a few rows. Okay, although we’ve already looked at it before, if you want to see the data in all its glory, run View(star). This will open the viewer and you can scroll through the values to your heart’s content.
Right! Transforming data! The filter() function does what we think it does: it subsets data according to a set of criteria. It works like this: we pass the data, and then the expression according to which we want our data filtered. There can be more than one criterion, of course. For instance, I can select all the droids in the data frame. And now I can call only the ones from Tatooine. Right. Yes, that makes sense, it was young Anakin Skywalker who rebuilt C-3PO while still on Tatooine. And R5-D4… I am not sure I know anything about that little R5 unit!
Okay. The filter() function also works with logical operators, so, for example, I can call every character that has red, orange or yellow as an eye colour.
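For readers following along without the video, the steps described so far might look roughly like this. It is a minimal sketch, not necessarily the presenter's exact code: it assumes the data set is dplyr's built-in starwars tibble, and the names droids, tatooine_droids and warm_eyes are made up here for illustration.

```r
library(dplyr)

# Save dplyr's built-in Star Wars tibble under a shorter name
star <- starwars

# Printing a tibble shows only its first few rows
star

# View(star)  # opens the full data viewer in an interactive session

# One criterion: every character whose species is "Droid"
droids <- filter(star, species == "Droid")

# Two criteria separated by a comma: both must hold
tatooine_droids <- filter(star, species == "Droid", homeworld == "Tatooine")
tatooine_droids$name

# Logical OR: red, orange or yellow eyes
warm_eyes <- filter(
  star,
  eye_color == "red" | eye_color == "orange" | eye_color == "yellow"
)
warm_eyes
```

The three-way OR could also be written as eye_color %in% c("red", "orange", "yellow"), which is easier to extend when the list of values grows.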
Okay, the majority of these aren’t human… I wonder if there are any more humans with weird eyes apart from Darth Vader and Palpatine. No? Yikes.
Alright, next we have the select() function. Now, our data set may not have hundreds of variables, but looking at the column names, it does feel like I genuinely don’t need to know about some of these things. To narrow down the data to the information I want, I can use select(). This selects specific individual columns, by name. If I want to select a column and then everything between two other columns, I can do this... Isn’t this already a lot easier to do than with the base R functions we learned earlier? But check this out, too: select() works nicely with a couple of nifty functions like starts_with() or ends_with(), which let us subset data in a super intuitive way. So, if I wanted to get all the columns that have to do with coloration, I can run this...
Okay, new scenario: there are a bunch of interesting variables you want to look at, but you also don’t want to ignore the rest of the data… what do you do? Well, you can use the everything() function with select() to move the variables you want to the beginning of the table, and then show everything else. Like this. Sweet, right?
Finally, let’s look at the mutate() function. The mutate() function is dplyr’s easy way of creating new variables from variables that already exist in the data set. For example, I can calculate the BMI for our characters because the Star Wars data has recorded both height and mass information. Of course, this is largely uninformative, because the BMI scale is extremely human-centred, but you know, anything to get the point across!
Now, if mutate() is the function to use when you want to add a column to your data while also retaining all the other columns in your data frame, then transmute() is what you will opt for if you only want to keep the new variable you create. Let me show you what I mean... See?
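The select(), mutate() and transmute() calls described here could be sketched as follows. The transcript does not say which columns were picked on screen, so the column choices below are assumptions, as are the result names. The BMI formula also assumes that mass is recorded in kilograms and height in centimetres, which is how the starwars data set documents them.

```r
library(dplyr)

star <- starwars

# Specific individual columns, by name
basics <- select(star, name, height, mass)

# One column, then everything between two other columns (inclusive)
range_cols <- select(star, name, hair_color:eye_color)

# Helper functions: every column whose name ends with "color"
colour_cols <- select(star, name, ends_with("color"))

# Move a few columns to the front and keep all the others after them
reordered <- select(star, name, homeworld, species, everything())

# mutate() adds the new column and keeps every existing one
with_bmi <- mutate(star, bmi = mass / (height / 100)^2)

# transmute() keeps only the newly created column
only_bmi <- transmute(star, bmi = mass / (height / 100)^2)
only_bmi
```

In recent dplyr releases transmute() is marked as superseded in favour of mutate(.keep = "none"), but it still behaves as described in the lesson.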
Effectively, transmute() created my new variable and allowed me to extract it without everything else tagging along as well. Great.
Okay! I will end this lesson here because otherwise I am at risk of going into way too much detail about side comments I make. Thanks for watching, everyone! In the next lesson we will pick it up right where we left off. See you there!
Data frames in R - Transforming data PART I (level: B1, intermediate). Posted by 林宜悉 on 14 January 2021.