  • In the last video, we were able to calculate

  • the total sum of squares for these nine data points

  • right here.

  • And these nine data points are grouped

  • into three different groups, or if we want to speak generally,

  • into m different groups.

  • What I want to do in this video is

  • to figure out how much of this total sum of squares

  • is due to variation within each group versus variation

  • between the actual groups.

  • So first, let's figure out the total variation

  • within the group.

  • So let's call that the sum of squares within.

  • So let's calculate the sum of squares within.

  • I'll do that in yellow.

  • Actually, I already used yellow, so let me do blue.

  • So the sum of squares within.

  • Let me make it clear.

  • That stands for within.

  • So we want to see how much of the variation

  • is due to how far each of these data points

  • is from its central tendency, from its respective mean.

  • So this is going to be equal to-- let's

  • start with these guys.

  • So instead of taking the distance between each data

  • point and the mean of means, I'm going

  • to find the distance between each data

  • point and that group's mean, because we

  • want the sum of squared distances between each data

  • point and its respective mean.

  • So let's do that.

  • So it's 3 minus-- the mean here is 2-- squared, plus 2 minus 2

  • squared, plus 1 minus 2 squared.

  • Plus-- I'm going

  • to do this for all of the groups,

  • but for each group, the distance between each data point

  • and its mean.

  • So plus 5 minus 4 squared,

  • plus-- the next point is

  • 3-- 3 minus 4 squared, plus 4 minus 4 squared.

  • So we're finding the sum of squares

  • from each point to its central tendency within its own group,

  • and we're going to add them all up.

  • And then finally, we have the third group.

  • So we have 5 minus-- oh, its mean is 6-- 5 minus 6 squared,

  • plus 6 minus 6 squared, plus 7 minus 6 squared.

  • And what is this going to equal?

  • So this is going to be equal to-- up here,

  • it's going to be 1 plus 0 plus 1.

  • So that's going to be equal to 2.

  • And then the second group is going to be 1 plus 1 plus 0--

  • so another 2-- and the third group is going

  • to be equal to 1 plus 0 plus 1.

  • 7 minus 6 is 1, squared is 1.

  • So that's another 2 over here.

  • So our sum of squares

  • within is 2 plus 2 plus 2, which is 6.
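The within-group calculation above can be sketched in a few lines of Python. This is only an illustration: the nine data points and group means are the ones read out in the video, while the function name `sum_of_squares_within` is made up for this sketch.

```python
# The nine data points from the video, split into their three groups.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]


def sum_of_squares_within(groups):
    """Add up the squared distance from each point to its own group's mean."""
    total = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)  # 2, 4 and 6 for this data
        total += sum((x - group_mean) ** 2 for x in group)
    return total


ssw = sum_of_squares_within(groups)
print(ssw)  # 6.0
```

Each group contributes 2, which is where the 2 plus 2 plus 2 comes from.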

  • So one way to think about it-- our total variation was 30.

  • And based on this calculation, 6 of that 30

  • comes from variation within these samples.

  • Now, the next thing I want to think about

  • is how many degrees of freedom do we have in this calculation?

  • How many independent data points do we actually have?

  • Well, for each of these-- so over here,

  • we have n data points in each one.

  • In particular, n is 3 here.

  • But if you know n minus 1 of them,

  • you can always figure out the nth one

  • if you know the actual sample mean.

  • So in this case, for any of these groups,

  • if you know two of these data points,

  • you can always figure out the third.

  • If you know these two, you can always figure out the third

  • if you know the sample mean.

  • So in general, let's figure out the degrees of freedom here.

  • For each group, when you did this,

  • you had n minus 1 degrees of freedom.

  • Remember, n is the number of data points

  • you had in each group.

  • So you have n minus 1 degrees of freedom

  • for each of these groups.

  • So it's n minus 1, n minus 1, n minus 1.

  • Or let me put it this way-- you have n minus 1

  • for each of these groups, and there are m groups.

  • So there are m times (n minus 1) degrees of freedom.

  • And in this case in particular, each group-- n minus 1 is 2.

  • Or in each case, you had 2 degrees of freedom,

  • and there are three groups like that.

  • So there are 6 degrees of freedom.

  • And in the future, we might do a more detailed discussion

  • of what degrees of freedom mean, and how

  • to mathematically think about it.

  • But the simplest way to think about it

  • is as the number of truly independent data

  • points, assuming you knew, in this case,

  • the central statistic that we used

  • to calculate the squared distance in each of them.

  • If you know that mean already, the third data point

  • could actually be calculated from the other two.

  • So we have 6 degrees of freedom over here.
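As a rough sketch of the degrees-of-freedom argument, the Python below counts m times (n minus 1) for the video's data and shows how the last point in a group is forced once the group mean and the other points are known. The helper `recover_last_point` is a hypothetical name, and the sketch assumes every group has the same size n.

```python
# The nine data points from the video, split into their three groups.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

m = len(groups)      # number of groups
n = len(groups[0])   # data points per group (equal group sizes assumed)

df_within = m * (n - 1)
print(df_within)  # 6


def recover_last_point(known_points, group_mean, n):
    """Given n - 1 points and the group mean, the n-th point is determined."""
    return n * group_mean - sum(known_points)


# First group: knowing 3 and 2, plus the mean of 2, pins down the third point.
print(recover_last_point([3, 2], 2, n))  # 1
```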

  • Now, that was how much of the total variation

  • is due to variation within each sample.

  • Now let's think about how much of the variation

  • is due to variation between the samples.

  • And to do that, we're going to calculate.

  • Let me get a nice color here.

  • I think I've run out of all the colors.

  • We'll call this sum of squares between.

  • The B stands for between.

  • So another way to think about it--

  • how much of this total variation is

  • due to the variation between the means,

  • between the central tendency-- that's

  • what we're going to calculate right now--

  • and how much is due to variation from each data

  • point to its mean?

  • So let's figure out how much is due to variation

  • between these guys over here.

  • Actually, let's think about just this first group.

  • For this first group, how much variation

  • for each of these guys is due to the variation between this mean

  • and the mean of means?

  • Well, so for this first guy up here--

  • I'll just write it all out explicitly--

  • the variation is going to use its sample mean.

  • So it's going to be 2 minus the mean of means, squared.

  • And then for this guy, it's going

  • to be the same thing-- his sample mean,

  • 2 minus the mean of means, squared.

  • Plus same thing for this guy, 2 minus the mean of means, squared.

  • Or another way to think about it--

  • this is equal to-- I'll write it over here--

  • this is equal to 3 times (2 minus 4) squared,

  • which is the same thing as 3 times 4.

  • And 3 times 4 is equal to 12.

  • And then we could do it for each of them.

  • And actually, I want to find the total sum.

  • So let me just write it all out, actually.

  • I think that might be an easier thing

  • to do, because I want to find, for all of these guys combined,

  • the sum of squares due to the differences

  • between the samples.

  • So that's the contribution from the first sample.

  • And then from the second sample, you have this guy over here.

  • Oh, sorry.

  • You don't want to use the data point itself.

  • For this data point, the amount of variation

  • due to the difference between the means

  • is going to be 4 minus 4 squared.

  • Same thing for this guy.

  • It's going to be 4 minus 4 squared.

  • And we're not taking the data point itself into consideration.

  • We're only taking its sample mean into consideration.

  • And then finally, plus 4 minus 4 squared.

  • We're taking this minus this squared

  • for each of these data points.

  • And then finally, we'll do that with the last group.

  • With the last group, sample mean is 6.

  • So it's going to be 6 minus 4 squared,

  • plus 6 minus 4 squared, plus 6 minus 4

  • squared.

  • Now, let's think about how many degrees of freedom

  • we had in this calculation right over here.

  • Well, in general, I guess the easiest way

  • to think about it is, how much information did we

  • have, assuming that we knew the mean of means?

  • If we know the mean of means, how much

  • here is new information?

  • Well, if you know the mean of the means,

  • and you know two of these sample means,

  • you can always figure out the third.

  • If you know this one and this one,

  • you can figure out that one.

  • And if you know that one and that one,

  • you can figure out that one.

  • And that's because this is the mean of these means over here.

  • So in general, if you have m groups, or if you have m means,

  • there are m minus 1 degrees of freedom here.

  • Let me write that.

  • And in this case, m is 3.

  • So we could say there are 2 degrees of freedom

  • for this exact example.
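The same idea can be checked for the group means. In this small Python sketch, knowing the mean of means and two of the sample means is enough to recover the third. The sample means 2, 4 and 6 come from the video; `recover_last_mean` is a hypothetical helper name, and the sketch relies on the groups being equal in size.

```python
# Sample means of the three groups in the video.
group_means = [2, 4, 6]

m = len(group_means)
mean_of_means = sum(group_means) / m  # 4.0

df_between = m - 1
print(df_between)  # 2


def recover_last_mean(known_means, mean_of_means, m):
    """Given m - 1 group means and the mean of means, the last one is fixed."""
    return m * mean_of_means - sum(known_means)


print(recover_last_mean([2, 4], mean_of_means, m))  # 6.0
```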

  • Now let's actually calculate the sum of squares between.

  • So what is this going to be?

  • I'll just scroll down.

  • Running out of space.

  • This is going to be equal to-- this right here is 2 minus 4

  • is negative 2 squared is 4.

  • And then we have three 4's over here.

  • So it's 3 times 4, plus-- what is this?

  • 3 times 0, plus-- what is this?

  • The difference between each of these-- 6 minus 4

  • is 2, squared is 4-- so that means we have another 3 times

  • 4.

  • And we get 3 times 4 is 12, plus 0, plus 12 is equal to 24.

  • So the sum of squares between, or we could

  • say, the variation due to the difference

  • between the groups, between the means, is 24.
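Here is a minimal Python sketch of the between-group calculation, using the data from the video. The function name `sum_of_squares_between` is made up for illustration. It computes the grand mean over all nine points, which equals the mean of means here only because the groups are the same size.

```python
# The nine data points from the video, split into their three groups.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]


def sum_of_squares_between(groups):
    """For every data point, square (its group mean - grand mean) and add up."""
    all_points = [x for group in groups for x in group]
    grand_mean = sum(all_points) / len(all_points)  # 4 for this data
    total = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Every point in the group contributes the same squared difference,
        # so multiply by the group size instead of looping over the points.
        total += len(group) * (group_mean - grand_mean) ** 2
    return total


ssb = sum_of_squares_between(groups)
print(ssb)  # 24.0
```

The three groups contribute 12, 0 and 12, matching the 3 times 4, 3 times 0 and 3 times 4 worked out above.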

  • Now, let's put it all together.

  • We said that the total variation,

  • that if you looked at all 9 data points, is 30.

  • Let me write that over here.

  • So the total sum of squares is equal to 30.

  • We figured out the sum of squares

  • between each data point and its central tendency,

  • its sample mean-- we figured out,

  • and when you totaled it all up, we got 6.

  • So the sum of squares within was equal to 6.

  • And in this case, it was 6 degrees of freedom.

  • Or if we wanted to write it generally,

  • there were m times (n minus 1) degrees of freedom.

  • And actually, for the total, we figured out

  • we have mn minus 1 degrees of freedom.

  • Actually, let me just write degrees of freedom

  • in this column right over here.

  • In this case, the number turned out to be 8.

  • And then just now we calculated the sum

  • of squares between the samples.

  • The sum of squares between the samples is equal to 24.

  • And we figured out that it had m minus 1 degrees of freedom,

  • which ended up being 2.

  • Now, the interesting thing here-- and this

  • is why this analysis of variance all fits nicely together,

  • and in future videos we'll think about how we can actually

  • test hypotheses using some of the tools

  • that we're thinking about right now--

  • is that the sum of squares within,

  • plus the sum of squares between, is

  • equal to the total sum of squares.
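To see the decomposition numerically, this Python sketch computes all three sums of squares for the video's nine data points and checks that they add up. The variable names are chosen for this illustration only.

```python
import math

# The nine data points from the video, split into their three groups.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

all_points = [x for group in groups for x in group]
grand_mean = sum(all_points) / len(all_points)

# Total: each point against the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_points)

# Within: each point against its own group's mean.
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# Between: each group's mean against the grand mean, once per data point.
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

print(sst, ssw, ssb)  # 30.0 6.0 24.0
print(math.isclose(ssw + ssb, sst))  # True
```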

  • So a way to think about it is that the total variation

  • in this data right here can be described

  • as the sum of the variation within each

  • of these groups, when you take that total,

  • plus the sum of the variation between the groups.

  • And even the degrees of freedom work out.

  • The sum of squares between had 2 degrees of freedom.

  • The sum of squares within each of the groups

  • had 6 degrees of freedom.

  • 2 plus 6 is 8.

  • That's the total degrees of freedom

  • we had for all of the data combined.

  • It even works if you look at the more general case.

  • So our sum of squares between had

  • m minus 1 degrees of freedom.

  • Our sum of squares within had m times (n

  • minus 1) degrees of freedom.

  • So the sum is equal to m minus 1, plus mn minus m.

  • The m and the minus m cancel out.

  • This is equal to mn minus 1 degrees of freedom, which

  • is exactly the total degrees of freedom we

  • had for the total sum of squares.
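Written out as a short derivation, the degrees-of-freedom bookkeeping looks like this. The subscripted labels are notation chosen for this sketch, not symbols from the video, and the `aligned` environment and `\text` assume the amsmath package.

```latex
\begin{aligned}
df_{\text{between}} + df_{\text{within}}
  &= (m - 1) + m(n - 1) \\
  &= m - 1 + mn - m \\
  &= mn - 1 = df_{\text{total}}
\end{aligned}
```

With m = 3 and n = 3, that is 2 + 6 = 8.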

  • So the whole point of the calculations

  • that we did in the last video and in this video

  • is just to appreciate that this total variation over here,

  • this total variation that we first calculated,

  • can be viewed as the sum of these two component

  • variations-- how much variation is there

  • within each of the samples plus how much variation is there

  • between the means of the samples?

  • Hopefully that's not too confusing.

ANOVA 2: Calculating SSW and SSB (total sum of squares within and between) | Khan Academy

Posted by Jack on January 14, 2021