Name: Fitting Models Is like Tetris: Crash Course Statistics #35
Uploaded: 2021-01-14T10:27:46.000Z
Duration: 11 min 9 s

Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics.

General Linear Models -- like Regression and ANOVA -- let us create a statistical analysis

Fitting the right model to our experiments is kind of like Tetris…

Sometimes you need the long, skinny brick, called the straight; sometimes you need the square

In stats, it's similar: sometimes you need regression, sometimes ANOVA, but there's also ANCOVA

Today we'll look at the shape of those models.

As a quick review, in a few of our past episodes we covered the fact that ANOVAs and regressions

ANOVAs allow us to analyze the effect of variables with two or more groups on continuous variables.

And regressions allow us to analyze two continuous variables.

General Linear Models explain the data we observe by building a model to predict that

data, and then keeping track of how close the prediction is.
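
To make "predict the data, then track how close the prediction is" concrete, here is a minimal sketch in Python. The five data points are invented for illustration and are not from the episode; the function name `fit_line` is also just a label chosen here.

```python
# Tiny made-up dataset (hypothetical numbers, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(x, y):
    """Least-squares intercept and slope for the model y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = fit_line(x, y)

# The model's predictions, and how far each one is from the observed value.
predictions = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predictions)]

# One common way to "keep track of how close": the sum of squared errors.
sse = sum(r ** 2 for r in residuals)
print(f"intercept={b0:.2f}, slope={b1:.2f}, SSE={sse:.2f}")
```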

And both regressions and ANOVAs use a similar model setup.

It looks just like the equation for a line that you may have seen if you've taken Algebra.

The fact that they're set up the exact same way is helpful for two reasons.

One, it means we only have to remember one general model, and two, it allows us to combine

these two powerful models to give us the even more flexible ANCOVA.
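
One way to write down that shared setup, as a sketch. The symbols are notation chosen here, not taken from the episode: $y_i$ is the outcome, $x_i$ a continuous predictor, $g_i$ a 0/1 group indicator, and $e_i$ the error.

```latex
% Regression: the equation for a line, plus error
y_i = b_0 + b_1 x_i + e_i

% ANOVA with two groups: same form, with a 0/1 group indicator as the predictor
y_i = b_0 + b_1 g_i + e_i

% ANCOVA: both kinds of predictor in one model
y_i = b_0 + b_1 g_i + b_2 x_i + e_i
```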

For example, we might want to look at the amount of general anesthesia needed to put

There have been studies that suggest that redheads require more anesthesia than non-redheads

because the gene mutation that causes red hair also affects pain receptors.

So we have two groups: redheads and non-redheads.

But, we also think that weight will have a meaningful impact on the amount of this specific

To make sure things are relatively equal, we look at only one kind of simple, routine

Working with a hospital, we collect data on 100 randomly selected patients.

We record their weight, natural hair color, and the amount of anesthesia needed during

We can now build a model to predict milliliters of anesthesia based on hair color and weight.
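
A minimal sketch of such a model in plain Python, assuming a 0/1 indicator for red hair and weight in kilograms. The eight patients below are invented for illustration (the episode's 100-patient dataset is not reproduced here), and `solve` and `fit_ols` are hypothetical helper names.

```python
# Hypothetical patients: (weight in kg, is_redhead 0/1, anesthesia in mL).
# These numbers are invented; they are not the data from the episode.
patients = [
    (60.0, 0, 40.0), (72.0, 0, 47.5), (85.0, 0, 55.0), (95.0, 0, 61.0),
    (58.0, 1, 42.0), (70.0, 1, 49.0), (82.0, 1, 56.5), (98.0, 1, 65.0),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Each row: intercept, hair color indicator (categorical), weight (continuous).
X = [[1.0, float(red), weight] for weight, red, _ in patients]
y = [ml for _, _, ml in patients]

b0, b_red, b_weight = fit_ols(X, y)
predicted = [b0 + b_red * row[1] + b_weight * row[2] for row in X]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
print(f"mL = {b0:.2f} + {b_red:.2f} * redhead + {b_weight:.3f} * weight_kg")
```

In practice a statistics package would do this fit and also produce the ANOVA table; the hand-rolled solver is only there to keep the sketch self-contained.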

Just like its friends, regression and ANOVA, the ANCOVA looks at the overall variation

in the data, and uses different variables, like hair color and weight, to explain it.

The overall variation is, as always, measured by the sum of the squared distances between

the overall mean amount of anesthesia used, and each dose of anesthesia that was administered.

This variation is called the Sums of Squares total.
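
As a small worked example, the Sums of Squares total can be computed directly from that definition. The doses below are invented for illustration, not the episode's data.

```python
# Hypothetical anesthesia doses in mL (invented numbers).
doses = [40.0, 47.5, 55.0, 61.0, 42.0, 49.0, 56.5, 65.0]

# Overall mean dose across every patient, ignoring hair color and weight.
mean_dose = sum(doses) / len(doses)

# Sums of Squares total: squared distance of each dose from the overall mean.
ss_total = sum((dose - mean_dose) ** 2 for dose in doses)

print(mean_dose, ss_total)
```

Note that this number depends only on the outcome variable, so it is the same no matter which predictors the model later includes.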

So now we can calculate an ANOVA table that shows us the sums of squares and F-tests for

Even though this is an ANCOVA model, we still usually refer to these as ANOVA tables.

And even though this table has both continuous regression factors and categorical ANOVA factors,

we read it just like it's a regular ANOVA table.

Here we can see that weight is a significant predictor of how much anesthesia you'll

need, but hair color isn't. It's really tempting to call hair color “nearly significant”

We now have a tool that allows us to combine categorical and continuous variables into

We can predict all kinds of things with all kinds of variables.

We can also use our new ANCOVA models to make stronger inferences.

In our example, we were interested, mainly, in whether being a redhead significantly increased

But we also included weight in the model, since we knew that weight plays a pretty big

Weight accounted for a lot of the variation in the model.

Its eta squared is 0.353, which means that it accounts for about 35% of the variation

And since it “soaked up” all of that variation, our Sums of Squares Error is now smaller.
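
Eta squared is the share of the total variation attributed to one effect, that is, the effect's sum of squares divided by the Sums of Squares total. A minimal sketch follows; the two sums of squares are hypothetical values picked only so that the ratio matches the 0.353 quoted above, since the episode's actual table is not reproduced here.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total variation attributed to one effect."""
    return ss_effect / ss_total

# Hypothetical sums of squares, chosen so the ratio equals 0.353.
ss_weight = 353.0
ss_total = 1000.0

print(f"eta squared for weight: {eta_squared(ss_weight, ss_total):.3f}")
```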

If we had run a simple ANOVA with JUST hair color, the differences between anesthetic

doses due to weight would have just been chalked up to “random variation”, or error, because

its source--weight--wasn't in our model.

For both of these models, the simple case where we ONLY look at hair color, and the

more complex case where we look at both hair color and weight, the total variation in the

Total variation looks only at our outcome variable--like milliliters of anesthetic.

So, when we build our models, we're partitioning the same amount of variation into groups.

Our simple ANOVA model JUST looks at how much of this total variation is due to being or

The rest is counted as error, just because “error” refers to variation that our model

When we use the bigger model that includes both hair color and weight, we take some of

that variation that was attributed to error, and attribute it to weight instead.

This makes our pile of error variation smaller.
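
The bookkeeping can be sketched in a few lines of Python. The data are invented, and the bigger model is assumed to use one common weight slope for both hair color groups (a parallel-slopes ANCOVA), which may differ from the exact model used in the episode.

```python
# Hypothetical (weight_kg, dose_mL) pairs per group; invented numbers.
groups = {
    "redhead": [(58.0, 42.0), (70.0, 49.0), (82.0, 56.5), (98.0, 65.0)],
    "non_redhead": [(60.0, 40.0), (72.0, 47.5), (85.0, 55.0), (95.0, 61.0)],
}

def mean(values):
    return sum(values) / len(values)

# Total variation looks only at the outcome, so it is the same for both models.
all_doses = [dose for rows in groups.values() for _, dose in rows]
grand_mean = mean(all_doses)
ss_total = sum((dose - grand_mean) ** 2 for dose in all_doses)

# Pooled within-group sums of squares and cross-products.
sxx_within = sxy_within = syy_within = 0.0
for rows in groups.values():
    mean_w = mean([w for w, _ in rows])
    mean_d = mean([d for _, d in rows])
    sxx_within += sum((w - mean_w) ** 2 for w, _ in rows)
    sxy_within += sum((w - mean_w) * (d - mean_d) for w, d in rows)
    syy_within += sum((d - mean_d) ** 2 for _, d in rows)

# Simple ANOVA: each dose is predicted by its group mean,
# so all within-group variation is counted as error.
sse_anova = syy_within

# ANCOVA with a common slope: weight explains part of that within-group
# variation, and only the remainder stays in the error pile.
ss_moved_to_weight = sxy_within ** 2 / sxx_within
sse_ancova = sse_anova - ss_moved_to_weight

print(f"SS total:          {ss_total:.2f}")
print(f"SS error (ANOVA):  {sse_anova:.2f}")
print(f"SS error (ANCOVA): {sse_ancova:.2f}")
```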

For this reason, many researchers will add covariates--continuous variables that are

used to explain our outcome variable--not only for inference, but also to reduce the

Say we want to look at the effect of a new brand of formula on the weight of infants.

We have two randomly assigned groups of infants: those with our new formula and those who get

But infants grow very quickly, so we want to account for any variation due to age, so

If we just ran a model that included formula type, our Sums of Squares for Error is pretty big.

And formula doesn't have a significant effect on infants' weight.

But we know that infants' weights are strongly correlated with how old they are, so when

we include that in a new ANCOVA model, it takes some of the variation that was error

variation in our simple model, and accounts for it using age in days.
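
Why a smaller error pile matters for the test can be sketched with the F-statistic, which divides the effect's mean square by the error's mean square. All numbers below are hypothetical (50 infants, two formula groups), and for simplicity the sum of squares for formula is held fixed across the two models, which will not be exactly true in a real analysis.

```python
def f_statistic(ss_effect, df_effect, ss_error, df_error):
    """F = mean square for the effect / mean square for error."""
    return (ss_effect / df_effect) / (ss_error / df_error)

# Hypothetical values, invented for illustration.
ss_formula, df_formula = 20.0, 1

# Formula-only model: age-related variation is still sitting in error.
f_without_age = f_statistic(ss_formula, df_formula, ss_error=400.0, df_error=48)

# Model with age in days as a covariate: error shrinks,
# at the cost of one error degree of freedom.
f_with_age = f_statistic(ss_formula, df_formula, ss_error=100.0, df_error=47)

print(f"F without age: {f_without_age:.2f}")
print(f"F with age:    {f_with_age:.2f}")
```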

As you can see from this ANOVA table, adding age as a covariate allowed us to explain some
