In the last couple of videos we first figured out the TOTAL variation in these 9 data points right here, and we got 30; that's our Total Sum of Squares. Then we asked ourselves, how much of that variation is due to variation WITHIN each of these groups, versus variation BETWEEN the groups themselves? So, for the variation within the groups we have our Sum of Squares within, and there we got 6. And then the balance of this 30, the balance of this variation, came from variation between the groups, and when we calculated it, we got 24.

What I want to do in this video is actually use this type of information, essentially these statistics we've calculated, to do some inferential statistics, to come to some type of conclusion, or maybe not to come to some type of conclusion. What I want to do is put some context around these groups. We've been dealing with them abstractly right now, but you can imagine these are the results of some type of experiment. Let's say that I gave 3 different types of pills, or 3 different types of food, to people taking a test, and these are the scores on the test. So this is food 1, this is food 2, and then this over here is food 3. And I want to figure out if the type of food people take going into the test really affects their scores. If you look at these means, it looks like people performed better in group 3 than in group 2 or group 1. But is that difference purely random, random chance? Or can I be pretty confident that it's due to actual differences in the population means, of all of the people who would ever take food 3 versus food 2 versus food 1?

So my question here is, are the true population means the same? Each of these is a sample mean based on 3 samples. But if I knew the true population means-- So my question is: is the mean of the population of people taking food 1 equal to the mean for food 2? Obviously I'll never be able to give that food to every human being that could ever live and then make them all take an exam. But there is some true mean there; it's just not really measurable. So my question is, is "this" equal to "this", equal to mean 3, the true population mean of group 3? Are these equal? Because if they're not equal, that means that the type of food given does have some type of impact on how people perform on a test.

So let's do a little hypothesis test here. Let's say that my null hypothesis is that the means are the same: food doesn't make a difference. And my alternative hypothesis is that it does. The way of thinking about this quantitatively is that if food doesn't make a difference, the true population means of the groups will be the same. The true population mean of the group that took food 1 will be the same as that of the group that took food 2, which will be the same as that of the group that took food 3. If our alternative hypothesis is correct, then these means will not all be the same.

How can we test this hypothesis? We're going to assume the null hypothesis, which is what we always do when we are hypothesis testing, and then essentially figure out, what are the chances of getting a statistic this extreme? And I haven't even defined what that statistic is. So we're going to assume our null hypothesis, and then we're going to come up with a statistic called the F statistic. Our F statistic has an F distribution, and we won't go real deep into the details of the F distribution.
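To make the setup concrete, here is a minimal Python sketch that computes the three sums of squares. The transcript never lists the 9 individual scores shown on screen, only the totals, so the group values below are an assumption chosen to reproduce exactly the numbers quoted (SST = 30, SSW = 6, SSB = 24).

```python
import numpy as np

# Assumed data: the transcript quotes only the totals, so these three
# groups of scores are a guess that reproduces exactly SST = 30,
# SSW = 6, SSB = 24. Think of them as test scores under food 1, 2, 3.
groups = [np.array([3.0, 2.0, 1.0]),
          np.array([5.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]

grand_mean = np.mean(np.concatenate(groups))  # 4.0

# Total Sum of Squares: squared distance of every point from the grand mean.
sst = sum(((g - grand_mean) ** 2).sum() for g in groups)

# Sum of Squares within: squared distance of each point from its own group mean.
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Sum of Squares between: squared distance of each group mean from the
# grand mean, counted once per data point in the group.
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(sst, ssw, ssb)  # 30.0 6.0 24.0, and SST = SSW + SSB
```

The identity SST = SSW + SSB is what lets the video speak of the "balance" of the variation: whatever isn't within the groups must be between them.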
But you can already start to think of it as the ratio of two Chi-squared distributions that may or may not have different degrees of freedom. Our F statistic is going to be the ratio of our Sum of Squares between the samples divided by its degrees of freedom (this quantity is sometimes called the mean square between, MSB) to our Sum of Squares within, the SSW I had calculated up here in blue, divided by its degrees of freedom, which was m(n-1).

Now let's just think about what this is doing right here. If this number, the numerator, is much larger than the denominator, then that tells us that the variation in this data is due mostly to the differences between the actual means, and due less to the variation within the groups. So that should make us believe that there is a difference in the true population means, and if this number is really big, it should tell us that there is a lower probability that our null hypothesis is correct. If this number is really small, and our denominator is larger, that means that the variation within each sample makes up a bigger percentage of the total variation than the variation between the samples. So that would make us believe that, hey, you know, any difference we see between the means is probably just random, and that would make it a little harder to reject the null.

So let's actually calculate it. In this case, our SSbetween, which we calculated over here, was 24, and we had 2 degrees of freedom. And our SSwithin was 6, and how many degrees of freedom did we have? Also 6. So this is going to be 24/2, which is 12, divided by 6/6, which is 1. So the F statistic that we've calculated is going to be 12. F stands for Fisher, the biologist and statistician who came up with this. We're going to see that this is a pretty high number; there's a sketch of the arithmetic right after this paragraph.

Now, one thing I forgot to mention: with any hypothesis test, we're going to need some type of significance level. So let's say the significance level that we care about for our hypothesis test is 10%, or 0.10, which means that if, assuming the null hypothesis, there is less than a 10% chance of getting the result we got, of getting this F statistic, then we will reject the null hypothesis. So what we want to do is figure out a critical F value such that the probability of getting a value that extreme or greater is 10%. If our F statistic is bigger than that critical F value, then we're going to reject the null hypothesis; if it's less, we can't reject the null.

I'm not going to go into a lot of the guts of the F statistic, but we can already appreciate that each of these Sums of Squares has a Chi-squared distribution. "This" has a Chi-squared distribution, and "this" has a different Chi-squared distribution. This one is a Chi-squared distribution with 2 degrees of freedom, and this one (we haven't normalized it and all of that) is roughly a Chi-squared distribution with 6 degrees of freedom. So the F distribution is actually the ratio of two Chi-squared distributions. And I got this screenshot from a professor's course at UCLA; I hope they don't mind, but I needed to find us an F table for us to look into. This is what an F distribution looks like.
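Continuing the sketch above, the F statistic is just the ratio of the two mean squares; the variable names here are mine, not from the video:

```python
# Reusing ssb and ssw from the previous snippet: m = 3 groups, n = 3 per group.
m, n = 3, 3
df_between = m - 1         # 2
df_within = m * (n - 1)    # 6

msb = ssb / df_between     # 24 / 2 = 12, the mean square between
msw = ssw / df_within      # 6 / 6 = 1, the mean square within

f_stat = msb / msw
print(f_stat)              # 12.0
```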
And obviously the F distribution is going to look different depending on the degrees of freedom of the numerator and the denominator. There are two degrees of freedom to think about: the numerator degrees of freedom and the denominator degrees of freedom. With that said, let's calculate the critical F value for alpha equal to 0.10 (you're actually going to see a different F table for each alpha), where our numerator degrees of freedom is 2 and our denominator degrees of freedom is 6. So this table that I got, this whole table, is for an alpha of 10%, or 0.10, and our numerator degrees of freedom was 2 and our denominator was 6. So our critical F value is 3.46; this value right over here is 3.46. The value that we got based on our data is much larger than this, WAY above our critical F value at a 10% significance level. It's going to have a very, very small p value: the probability of getting something this extreme, just by chance, assuming the null hypothesis, is very low. So because of that, we can reject the null hypothesis, which leads us to believe, you know what, there probably IS a difference in the population means. Which tells us there probably is a difference in performance on an exam if you give people the different foods.
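Rather than reading the UCLA table, you can get the same critical value, and the p value the video alludes to, from SciPy's F distribution; a minimal sketch, reusing the numbers above:

```python
from scipy import stats

alpha = 0.10
# Critical value: the point with 10% of the F(2, 6) distribution above it.
crit = stats.f.ppf(1 - alpha, dfn=2, dfd=6)

# p value: probability of an F statistic at least as extreme as ours
# under the null hypothesis, P(F >= 12).
p_value = stats.f.sf(12.0, dfn=2, dfd=6)

print(round(crit, 2))     # 3.46, matching the table value in the video
print(round(p_value, 3))  # 0.008, far below alpha = 0.10, so reject the null
```

SciPy also packages the whole test as `stats.f_oneway(*groups)`, which, on the assumed data from the first snippet, returns the same F statistic of 12 and the same p value.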