  • Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics. Sometimes random

  • variation can make it tricky to tell when there are true differences or if it's just random.

  • Like whether a sample difference of $20 a

  • month represents a real difference between the average rates of two car insurance companies.

  • Or whether a 1 point increase in your AP Stats grade for every hour you study represents

  • a real relationship between the two.

  • These situations seem pretty different, but when we get down to it, they share a similar

  • pattern. There's actually one idea, which--with a few tweaks--can help us answer ALL of our

  • “is it random...or is it real” questions.

  • That's what test statistics do. Test statistics allow us to quantify how close things are

  • to our expectations or theories. Something that's not always easy for us to do based

  • on our gut feelings. And test statistics allow us to add a little more mathematical rigor

  • to the process, so that we can make decisions about these questions.

  • INTRO

  • In previous episodes, z-scores helped us understand the idea that differences are relative.

  • A difference of 1 second is meaningful when you are looking at the differences in

  • the average time it takes two groups of elite Olympic athletes to complete a 100 meter freestyle swim.

  • It's less meaningful when you're looking at the differences in the average

  • time it takes two groups of recreational swimmers.

  • The amount of variance in a group is really important in judging a difference. Elite Olympic

  • athletes vary only a little. Their 100 meter times are relatively close together, and a

  • tenth of a second can mean the difference between a gold and a bronze medal. Whereas non-professionals

  • have more variation; the fastest swimmers could finish a whole minute before the slower

  • swimmers.

  • A difference of 1 second isn't a big deal between two groups of recreational swimmers

  • because the difference is small compared to the natural variation we'd expect to see.

  • Two groups of casual swimmers may differ by 10 or more seconds, even if their true underlying

  • times were the same, just because of random variation.

  • That's why test statistics look at the difference between data and what we'd expect to see

  • if the null hypothesis is true. But they also include some very important context: a measure

  • of “average” variation we'd expect to see, like how much novice or pro swimmers

  • differ. Test statistics help us quantify whether data fits our null hypothesis well.

  • A z-score is a test statistic. Let's look at a simple example. Say your IQ is 130. You're

  • so smart. And the population mean is 100.

  • On average we expect someone to be about 15 points from the mean. So the difference we

  • observed, 30, is twice the amount that we'd expect to see on average. Your z score would be 2.

  • And you can z-score any normal distribution--like a population distribution. But also a sampling

  • distribution which is the distribution of all possible group means for a certain sample size.

  • You might remember we first learned about sampling distributions in episode 19.

  • We often have questions about groups of people. Finding out that you're two standard deviations

  • above the mean for IQ is pretty ego boosting, but it won't really help further science.

  • We could look at whether children with more than 100 books in their home have higher

  • than average IQs. Let's say we take a random sample of 25 children with over 100 books.

  • Then we measure their IQs. The average IQ is 110.

  • We can calculate a z-score for our particular group mean. The steps are exactly the same,

  • we're just now looking at the sampling distribution of sample means rather than the population distribution.

  • Instead of taking an individual score and subtracting the population mean, we take a

  • group mean and subtract the mean of our sampling distribution under the null hypothesis. Then

  • we divide by the standard error, which is the standard deviation of the sampling distribution.

  • So, the z-score--also called the z-statistic--tells us how many standard errors away from the

  • sampling distribution mean our group mean is.

  • Z-statistics around 1 or -1 tell us that the sample mean is about the typical distance we'd

  • expect a sample mean to be from the mean under the null hypothesis.

  • Z-statistics that are a lot bigger in magnitude than 1 or -1 mean that this sample mean is

  • more extreme.

  • Which matches the general form of a test statistic:
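The formula shown on screen at this point is not part of the transcript. A reconstruction consistent with the description given earlier (the difference between the data and what the null hypothesis predicts, scaled by the variation expected by chance) might look like the following. The symbols $\bar{x}$ (group mean), $\mu_0$ (mean under the null hypothesis), $\sigma$ (population standard deviation), and $n$ (sample size) are notation assumed here.

```latex
\[
\text{test statistic}
  = \frac{\text{observed data} - \text{what we expect if the null is true}}
         {\text{average variation}}
\]

\[
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
\]
```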

  • The p-value will tell us how rare or extreme our data is so that we can figure out whether

  • we think there's an effect. Like whether children with more than 100 books in their

  • home have a higher than average IQ. Historically we've done this with tables, but most statistical

  • programs, even Excel, can calculate this.
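As a rough sketch of the calculation described above, the snippet below computes the z-statistic for the sample of 25 children and a p-value using only Python's standard library. It assumes a population standard deviation of 15 (the "about 15 points from the mean" mentioned earlier) and a one-sided alternative ("higher than average"); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example. pop_sd = 15 is an assumption taken from the
# earlier "about 15 points from the mean" remark.
pop_mean = 100      # mean IQ under the null hypothesis
pop_sd = 15         # assumed population standard deviation
n = 25              # number of children sampled
sample_mean = 110   # observed group mean

# Standard error: the standard deviation of the sampling distribution of the mean.
standard_error = pop_sd / sqrt(n)

# z-statistic: how many standard errors the group mean is from the null mean.
z = (sample_mean - pop_mean) / standard_error

# One-sided p-value: chance of a group mean at least this high if the null is true.
p_value = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"one-sided p-value = {p_value:.5f}")
```

Under these assumptions the group mean sits a little more than 3 standard errors above 100, which is well into the "more extreme" range described earlier.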

  • We can use z-tests to do hypothesis tests about means, differences between means, proportions,

  • or even differences between proportions.

  • A researcher may want to know whether people in a certain region who got this year's

  • flu vaccine were less likely to get the flu. They randomly sampled 1000 people and found

  • that 600 people got the flu vaccine, and 400 didn't.

  • Out of the 600 people who got the vaccine, 20% still got the flu. Out of the 400 people

  • who did not get the vaccine, 26% got the flu.

  • It seems like you're more likely to get the flu if you didn't get a flu shot, but

  • we're not sure if this difference is pretty small compared to random variation, or pretty large.

  • To calculate our z-statistic for this question,

  • we first have to remember our general form:

  • There's a 6% difference between the proportion of the vaccinated and unvaccinated groups,

  • and we want to know how “different” 6% is from 0%.

  • A difference of 0% would mean there's no difference in flu rates between the two groups.

  • So our observed difference is 6 minus 0 percent, or 6%.

  • For this question, the “average variation” of what percent of people get the flu is the

  • standard error from our sampling distribution. We calculate it using the average proportion

  • of people who got the flu, and didn't get the flu:
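The formula shown on screen here is missing from the transcript. One standard form for the standard error of a difference between two proportions uses the pooled ("average") proportion, and it reproduces the z-statistic of about 2.2295 quoted for this example. The symbols are notation assumed here, not taken from the episode: $x_1, x_2$ are the numbers of flu cases in each group, $n_1, n_2$ the group sizes, and $\hat{p}_1, \hat{p}_2$ the observed flu proportions.

```latex
\[
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2},
\qquad
SE = \sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},
\qquad
z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE}
\]
```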

  • If our observed difference of 6% is large compared to the standard error--which is the

  • amount of variation we expect by chance--we consider the difference to be “statistically

  • significant”. We've found evidence suggesting the null might not be accurate.

  • There are two main ways of telling whether this z-statistic, which is about 2.2295 in

  • our case, represents a statistically significant result.
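The z-statistic for the flu example can be checked with a few lines of Python. This is a minimal sketch that assumes the standard error is built from the pooled proportion of flu cases across both groups; the variable names are illustrative.

```python
from math import sqrt

# From the example: 20% of 600 vaccinated and 26% of 400 unvaccinated people got the flu.
n_vax, n_unvax = 600, 400
p_vax, p_unvax = 0.20, 0.26

# Pooled ("average") proportion of people who got the flu across both groups.
flu_cases = p_vax * n_vax + p_unvax * n_unvax
p_pooled = flu_cases / (n_vax + n_unvax)

# Standard error of the difference between proportions if the null is true.
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n_vax + 1 / n_unvax))

# Observed difference (6 percentage points) minus the difference expected under the null (0).
z = ((p_unvax - p_vax) - 0) / standard_error

print(f"pooled proportion = {p_pooled:.3f}")
print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.4f}")
```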

  • The first way is to calculate a “critical” value. A critical value is a value of our

  • test statistic that marks the limits of our “extreme” values. A test statistic that

  • is more extreme than these critical values (that is, it's toward the tails) causes

  • us to reject the null.

  • We calculate our critical value by finding out which test-statistic value corresponds

  • to the top 0.5, 1, or 5% most extreme values. For a z-test with alpha = 0.05, the critical

  • values are 1.96 and -1.96.

  • If your z-statistic is more extreme than the critical value, you call it “statistically

  • significant”. So, we found evidence...in this case...that the flu shot is working.
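The critical-value comparison can be sketched as follows, assuming a two-sided test at alpha = 0.05 and taking the z-statistic of about 2.2295 from the flu example as given.

```python
from statistics import NormalDist

alpha = 0.05
z_observed = 2.2295  # z-statistic quoted for the flu example

# Two-sided test: split alpha between the two tails of the standard normal.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

# Reject the null when the statistic lies beyond either critical value.
reject_null = abs(z_observed) > critical_value

print(f"critical values = ±{critical_value:.2f}")
print(f"reject the null: {reject_null}")
```

Changing `alpha` to 0.01 or 0.005 moves the cutoffs further into the tails, so a statistic has to be more extreme before the null is rejected.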

  • But sometimes, a z-test won't apply. And when that happens, we can use the t-distribution

  • and corresponding t-statistic to conduct a hypothesis test.

  • The t-test is just like our z-test. It uses the same general formula for its t-statistic.

  • But we use a t-test if we don't know the true population standard deviation.

  • As you can see, it looks like our z-statistic, except that we're using our sample standard

  • deviation instead of the population standard deviation in the denominator.
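The on-screen formula is not part of the transcript. Based on the description (the z-statistic with the sample standard deviation in the denominator), a one-sample version would look roughly like this, where $\bar{x}$, $\mu_0$, $s$, $\sigma$, and $n$ are notation assumed here:

```latex
\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
\qquad \text{compared with} \qquad
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
\]
```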

  • The t-distribution looks like the z-distribution, but with thicker tails. The tails are thicker

  • because we're estimating the true population standard deviation.

  • Estimation adds a little more uncertainty ...which means thicker tails, since extreme

  • values are a little more common. But as we get more and more data, the t-distribution

  • converges to the z-distribution, so with really large samples, the z and t-tests should give

  • us similar p-values.
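A small simulation can illustrate the thicker tails. The sketch below is not from the episode: it draws samples from a standard normal population where the null is true, computes a t-statistic from each using the sample standard deviation, and counts how often the statistic lands beyond ±1.96. The function name, trial count, and seed are arbitrary choices.

```python
import random
from math import sqrt

def extreme_rate(n, trials=5000, cutoff=1.96, seed=0):
    """Fraction of simulated t-statistics beyond +/-cutoff when the null is true."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation (n - 1 in the denominator): an estimate of sigma.
        s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t = (mean - 0.0) / (s / sqrt(n))
        if abs(t) > cutoff:
            extreme += 1
    return extreme / trials

small = extreme_rate(n=5)
large = extreme_rate(n=100)
print(f"n=5:   {small:.3f} of t-statistics beyond ±1.96")
print(f"n=100: {large:.3f} of t-statistics beyond ±1.96")
```

If the statistic followed the z-distribution exactly, about 5% of values would fall beyond ±1.96. With small samples the simulated share is noticeably larger, and it moves toward 5% as the sample size grows.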

  • If we're ever in a situation where we have the population standard deviation, a z-test

  • is the way to go. But a t-test is useful when we don't have that information.

  • For example, we can use a t-test to ask whether the average wait time at a car repair shop

  • across the street is different from the time you'll wait at a larger shop 10 minutes away.

  • We collect data from 50 customers who need to take their cars in for major repairs. 25

  • are randomly assigned to go to the smaller repair shop, and the other 25 are sent to

  • the larger shop.

  • After measuring the amount of time it took for repairs to be completed, we find that

  • people who went to the smaller shop had an average wait time of 14 days. People who went

  • to the larger shop had an average wait time of 13.25 days, which means there was a difference

  • of 0.75 days in wait time.

  • But we don't know whether it's likely that this 0.75 day difference is just due

  • to random variation between customers....at least not until we conduct a t-test on the

  • difference between the means of the two groups.

  • Before we do our test, we need to decide on an alpha level. We set our alpha at 0.01,

  • because we want to be a bit more cautious about rejecting the null hypothesis than we

  • would be if we used the standard of 0.05.

  • Now we can calculate the t-statistic for our two-sample t-test. If the null hypothesis

  • was true, then there would be no real difference between the mean wait times of the two groups.

  • And the alternative hypothesis is that the two means are not equal.

  • The two sample t-statistic again follows the general form:

  • We observed a 0.75 day difference in wait times between groups. We'd expect to see

  • a difference of 0 if the null were true. Our measure of average variation is the standard error.

  • The standard error is the typical distance that a sample mean will be from the population mean.

  • This time, we're looking at the sampling distribution of differences between means--all

  • the possible differences between two groups-- which is why the standard error formula may

  • look a little different.

  • Putting it all together we get a t-statistic of about 2.65.
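The transcript does not give the sample standard deviations, so the calculation cannot be reproduced exactly. The sketch below uses a hypothetical standard deviation of 1 day in each group, chosen only for illustration, together with a common form of the standard error for a difference between two means (the square root of the sum of each group's variance divided by its size).

```python
from math import sqrt

# From the example: 25 customers per shop, mean waits of 14 and 13.25 days.
n_small, n_large = 25, 25
mean_small, mean_large = 14.0, 13.25

# HYPOTHETICAL: the transcript does not report sample standard deviations.
sd_small, sd_large = 1.0, 1.0

# Standard error of the difference between two sample means.
standard_error = sqrt(sd_small**2 / n_small + sd_large**2 / n_large)

# Observed difference minus the difference expected under the null (0).
t = ((mean_small - mean_large) - 0) / standard_error

print(f"difference = {mean_small - mean_large:.2f} days")
print(f"standard error = {standard_error:.3f}")
print(f"t = {t:.2f}")
```

With these made-up standard deviations the result happens to land near the 2.65 quoted above, but the actual values used in the episode are not known.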

  • If we plug that into our computer, we can see that this test statistic has a p-value

  • of about .0108. Since we set our alpha at 0.01, a p-value needs to be smaller than 0.01

  • to reject the null hypothesis. Ours isn't. Barely, but it isn't.

  • So it might have seemed like the larger repair shop was definitely going to be faster but

  • it's actually not so clear. And this doesn't mean that there isn't a difference, we just

  • couldn't find any evidence that there was one.

  • So if you're trying to decide which shop to take your car to, maybe consider something

  • other than speed. And we could do similar test experiments for cost or reliability or friendliness.

  • You might notice that throughout the examples in this episode, we used two methods of deciding

  • whether something was significant: critical values and p-values.

  • These two methods are equivalent. Large test statistics and small p-values both refer to

  • samples that are extreme. A test statistic that's bigger than our

  • critical value would allow us to reject the null hypothesis. And any test-statistic that's

  • larger than the critical value will have a p-value less than 0.05. So, the two methods

  • will lead us to the same conclusion.
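To see the equivalence concretely, the sketch below applies both decision rules to a handful of z-statistics, assuming a two-sided z-test at alpha = 0.05. The list of z-values is arbitrary, apart from 2.2295 from the flu example.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()
critical_value = std_normal.inv_cdf(1 - alpha / 2)

def two_sided_p(z):
    """Probability of a z-statistic at least this extreme in either tail."""
    return 2 * (1 - std_normal.cdf(abs(z)))

results = []
for z in [0.5, 1.5, 1.95, 1.97, 2.2295, -3.0]:
    by_critical_value = abs(z) > critical_value   # method 1: compare to the cutoff
    by_p_value = two_sided_p(z) < alpha           # method 2: compare p to alpha
    results.append((z, by_critical_value, by_p_value))
    print(f"z = {z:>7}: critical value -> {by_critical_value}, p-value -> {by_p_value}")
```

For every value in the list the two columns agree, because the critical value is defined as the statistic whose p-value equals alpha.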

  • If you have trouble remembering it, this rhyme may help: “Reject H-Oh if the p is too low.”

  • These two methods are equivalent. But we often use p-values instead of critical values. This

  • is because each test statistic, like the z or t statistic, has different critical values,

  • but a p-value of less than 0.05 means that your sample is in the top 5% of extreme samples

  • no matter if you use a z or t test statistic - or some of the other test statistics we haven't

  • discussed like F or chi-square.

  • Test statistics form the basis of how we can test if things are actually different or what

  • we're seeing is just normal variation. They help us know how likely it is that our results

  • are normal, or if something interesting is going on.

  • Like whether drinking that water upside down is actually stopping your hiccups faster

  • than doing nothing. Then you can test drinking pickle juice to stop hiccups. Or really slowly

  • eating a spoonful of creamy peanut butter. Let the testing commence! Thanks for watching.

  • I'll see you next time.


Test Statistics: Crash Course Statistics #26

Posted by Lun on January 14, 2021