## Subtitle list

• Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics.

• Last week I ordered a pair of gold lamé pants with DFTBAQ embroidered on them.

• The delivery guy said they could come by the next day at exactly 11am on the dot!

• Just kidding. That never happens.

• Instead of an exact time, the pants guy gave me a range of times...he said they'd be

• there sometime between 8am and 2pm.

• A lot of anticipation

• We've focused a lot on point estimates, like the mean, which are our best guesses,

• but we can give ourselves a little more wiggle room.

• Let's talk about Confidence Intervals.

• INTRO

• It's useful to give pregnant mothers a “due date” when their children will most likely be born.

• But it might be more accurate to say that doctors expect the baby to come around the

• due date, not exactly on it.

• And...when pollsters claim that a candidate will get around 30% of the vote, plus or minus 2%.

• We can represent the “around” part with a confidence interval.

• You may have seen the term “confidence interval” paired with a percentage like 95%.

• A “confidence interval” is an estimated range of values that seem reasonable based

• on what we've observed.

• Its center is still the sample mean, but we've got some room on either side for our uncertainty.

• So when the delivery guy says my pants are coming between 8 and 2--he's reflecting

• his uncertainty...the very LARGE frustrating uncertainty, about when he'll be there.

• For example, a dentist thinks the mean number of cavities the average person has in a 5

• year span is greater than 1 and wants to calculate a 95% CI to see if there's evidence that he's right.

• He rounds up a random sample of 100 patients from around the country, and finds that this

• group has a mean of 3 cavities with a standard deviation of 0.5 cavities.

• The way we choose that confidence range is related to the distribution of sample means.

• The dentist's estimate of the sampling distribution looks like this:

• And instead of grabbing just the mean, the dentist can include a range of the most common

• 95% of the sample means that we expect from this estimate of the distribution of sample means.

• So now we have a 95% confidence interval from 2.902 to 3.098 cavities.
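
The arithmetic behind that range can be sketched with the z-score approach the video gets to a little later. This assumes the usual 95% critical value of 1.96 and a standard error of $s/\sqrt{n}$; the symbols $\bar{x}$, $s$, $n$, and $z^{*}$ are labels introduced here, not ones used in the transcript.

$$
\bar{x} \pm z^{*}\,\frac{s}{\sqrt{n}} \;=\; 3 \pm 1.96 \times \frac{0.5}{\sqrt{100}} \;=\; 3 \pm 0.098 \;\Rightarrow\; (2.902,\ 3.098)
$$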

• Giving a range of numbers instead of just an estimate for the mean better represents

• the fact that there's some uncertainty and variation when we estimate population parameters--like

• the mean, proportion, or regression slope--from a sample.

• The interpretation of this confidence interval is a bit more complex.

• To understand what a confidence interval really is, we have to ask ourselves “what if?”.

• If the dentist's sample was taken again, we wouldn't expect that the mean and standard

• deviation of cavities would be exactly 3 and 0.5.

• They'd probably be a little different.

• Which means that our 95% confidence interval would be different than the one we got before.

• And if we did it 100 more times with the same sample size, we'd get 100 slightly different

• confidence intervals.

• The 95% in a 95% confidence interval tells us that if we calculated a confidence interval

• from 100 different samples, about 95 of them would contain the true population mean.

• Our “confidence” is in the fact that the procedure of calculating this confidence interval

• will only exclude the population mean 5% of the time.

• That definition implies that it's possible that the confidence interval that we created

• doesn't include the true population mean.

• We have no way of knowing for sure.

• But the confidence intervals usually contain the true population mean.
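
The "what if we sampled again?" idea can be tried out in a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a true mean of 3 and standard deviation of 0.5 (borrowed from the dentist example; the real population is unknown), and `coverage_rate` is a hypothetical helper name.

```python
import random
from statistics import NormalDist, mean, stdev

def coverage_rate(true_mean=3.0, true_sd=0.5, n=100, trials=2000,
                  level=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)                    # fixed seed so runs repeat
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for 95%
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample from the assumed (normal) population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = mean(sample)
        se = stdev(sample) / n ** 0.5            # estimated standard error
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"{rate:.1%} of intervals contained the true mean")
```

The printed share should land near 95%, though any single run will wobble a little around that figure.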

• Now that we know what a confidence interval is, it might be useful to calculate it.

• A 95% CI is the range that contains the middle 95% of the values of our estimated sampling distribution.

• And to get that range, we can use a z-score.

• A z-score tells us the distance between the mean of a distribution and a data point in

• standard deviations.

• Previously, we've used z-scores to help us find percentiles.

• And we want the middle 95% of the data.

• So we want our cutoffs to be at the 2.5th percentile and the 97.5th percentile so that

• 95% of the values are within our range, and 5%--2.5% on either side--are not.

• To calculate the 95% confidence interval for a sample of 49 chocolate cakes with a mean

• of 3,000 calories and a standard deviation of 500 calories, we can use a z-score of 1.96

• (which we got from a table) to calculate the 97.5th percentile, and a z-score of -1.96

• to calculate the 2.5th percentile.

• But we need to turn our z-scores back into calorie values.

• To do so, we multiply by the standard error, 71.4 calories and add the mean of 3,000 calories

• to get the 95% confidence interval for our sample.
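
The cake calculation can be sketched in a few lines. The function name `z_interval` is made up for illustration, and the critical value comes from Python's standard-library `NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sample_sd, n, level=0.95):
    """Confidence interval for a mean using z-scores."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # 97.5th percentile, about 1.96
    standard_error = sample_sd / sqrt(n)        # 500 / 7, about 71.4 calories
    margin = z * standard_error                 # z-score turned into calories
    return sample_mean - margin, sample_mean + margin

low, high = z_interval(3000, 500, 49)
print(f"{low:.0f} to {high:.0f} calories")      # roughly 2860 to 3140
```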

• We think it's likely that the real population mean for number of calories in a chocolate

• cake is in that range, though we're not sure.

• What we can have confidence in, is that if we're in a situation where we're constantly

• taking samples like this and we assume that the true mean is inside of every Confidence

• Interval, we'll only be wrong 5% of the time.

• For example, a gummy worm factory periodically checks whether their bagging machines are

• calibrated correctly.

• So each week, they take a sample of 100 bags of gummy worms, measure the mean weight and

• standard deviation, and calculate a 95% confidence interval.

• They use the Confidence interval to make a decision about whether to pay an expensive

• repair man to come repair the gummy worm bagging machine.

• They want their bags of gummy worms to have around 10oz of gummy treats, and decide that

• as long as the confidence interval contains 10oz--their ideal weight--they'll assume

• their machine is fine.

• Decisions based on their confidence intervals will lead them to call an unnecessary repairman

• only 5% of the time.
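
The factory's weekly rule could look something like the sketch below. The sample means and the standard deviation of 0.4 oz are invented for illustration, and `needs_repair` is a hypothetical name; only the 10 oz target, the sample of 100 bags, and the 95% level come from the example.

```python
from math import sqrt
from statistics import NormalDist

TARGET_OZ = 10.0  # ideal bag weight from the example

def needs_repair(mean_weight, sd_weight, n=100, level=0.95, target=TARGET_OZ):
    """True when the confidence interval does NOT contain the target weight."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sd_weight / sqrt(n)
    return not (mean_weight - margin <= target <= mean_weight + margin)

# Hypothetical weekly measurements (sd of 0.4 oz is an assumption).
print(needs_repair(10.05, 0.4))  # False: interval still covers 10 oz
print(needs_repair(10.20, 0.4))  # True: interval misses 10 oz
```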

• Many researchers use confidence intervals to see if they contain a certain value of interest.

• A researcher may want to know if, say, a certain number of calories in cake is plausible.

• If the sampled value were to fall within their CI it would seem possible, but it's not

• possible to rule out even if it's outside the interval.

• Because you don't know if you got the 95% of CI's that contain the true mean or the

• 5% that don't.

• You don't always need to use a confidence interval of 95%, we can calculate other confidence intervals too.

• You can calculate a 99% confidence interval, or really any percentage confidence interval.

• But if you try to calculate a 100% confidence interval, it'll always be negative infinity

• to positive infinity, which just shows that the larger you want your confidence percentage

• to be, the wider your interval will be.
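
To see the widening in numbers, here is a small sketch that reuses the chocolate cake figures (standard deviation 500, sample of 49) and varies only the confidence level. `interval_width` is a hypothetical helper, and the particular levels chosen are just examples.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sample_sd, n, level):
    """Total width of a z-based confidence interval at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sample_sd / sqrt(n)

# Same cake sample each time; only the confidence level changes.
widths = {level: interval_width(500, 49, level)
          for level in (0.90, 0.95, 0.99, 0.999)}
for level, width in widths.items():
    print(f"{level:.1%} interval is about {width:.0f} calories wide")
```

Each step up in confidence buys a wider interval, and the width keeps growing without limit as the level approaches 100%.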

• You can be more hopeful that your confidence interval contains the true population mean,

• but it's not going to be that helpful.

• So there's a balancing act going on.

• You want a confidence interval that's narrow enough to be useful, but wide enough that

• the true population mean will usually be inside a confidence interval of that percent.

• We can't always have large samples.

• It's often the case that there's not enough time or money to collect 100s of data points

• to calculate a confidence interval.

• With small sample sizes, the distribution of sample means isn't always exactly normal,

• so we often use a t-distribution instead of a z-distribution to find out where the middle

• 95% of our data is.

• The t-distribution, like the z-distribution, is a continuous probability distribution that's unimodal.

• It's a useful way to represent sampling distributions.

• The t-distribution changes its shape according to how much information there is.

• With small sample sizes there's less information so the t-distribution has thicker tails to

• represent that our estimates are more uncertain when there's not much data.

• However as we get more and more data, the t-distribution becomes identical to the z-distribution.

• Generally, sample sizes that are greater than 30 are considered “large enough” because

• scientists generally believe that sampling distributions where the sample is 30+ are

• close enough to normal...though 30 is an arbitrary cutoff just like 0.05.

• However, when we're estimating population proportions, like the proportion of people

• who are color blind, the general rule is that your sample size needs to be big enough so

• that on average, you'd expect to get at least 10 colorblind, and at least 10 non-colorblind people.

• For similar reasons, most people consider that “close enough”.

• Since about 8% of males are colorblind, if I only had a sample of 50 males, on average

• I'd expect around 4 males per group to be color blind, so my sample size wouldn't

• be quite big enough to assume it's normal.
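
The "at least 10 of each" rule of thumb is easy to check by hand, or with a sketch like this one. The function name and the second sample size of 200 are made up for illustration.

```python
def large_enough_for_normal(n, p, minimum=10):
    """Rule of thumb: expect at least `minimum` in each of the two groups."""
    expected_yes = n * p          # e.g. expected colorblind people
    expected_no = n * (1 - p)     # e.g. expected non-colorblind people
    return expected_yes >= minimum and expected_no >= minimum

print(large_enough_for_normal(50, 0.08))   # False: only 4 expected colorblind
print(large_enough_for_normal(200, 0.08))  # True: 16 and 184 expected
```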

• Instead I'd use the almost normal t-distribution.

• If a drug that's being developed claimed to reduce the proportion of colorblind males

• born to mothers who took it, we could take a sample of 50 male infants to see if the

• proportion of colorblindness is different from 8%.

• Though colorblindness isn't usually life threatening, it can be inconvenient, so you

• decide to calculate a confidence interval to see if it's likely to be effective.

• After randomly selecting 50 male infants from mothers who took the drug, you calculate the

• sample proportion of colorblind infants, which is 6%, and calculate the distribution of sample

• proportions which has a mean of 6%--the same as the sample mean--and a standard error of 0.033.

• Since our sample size isn't big enough to assume that the distribution of sample proportions

• is shaped like the z-distribution, we can use the t-distribution to calculate the range

• of our 95% confidence interval.

• I mentioned before that the t-distribution's shape changes with how much data we have.

• We'll talk more in detail later as to how to choose the right t-distribution, but for

• now, we'll use this one:

• While t-score tables do exist, it's often easier to have a statistical program calculate

• the t-values that correspond to the 2.5th and 97.5th percentiles, since there are many

• different t-distributions.

• Your computer tells you that the t-values corresponding to those percentiles are 2.01 and -2.01.

• And to convert to a raw score from a t-score, we again use this formula, just with a t-score

• Our confidence interval for proportion of colorblind males is -0.6% to 12.63%.

• 8% is inside our confidence interval, so it's not too much of a stretch to think that 8%

• could be the true population proportion, even though we only observed a sample proportion of 6%.
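
The interval quoted above can be reproduced from the numbers in the transcript: a sample proportion of 6%, a standard error of 0.033, and t-values of plus or minus 2.01. This sketch takes the 2.01 as given rather than computing it, and `interval_from_critical_value` is a hypothetical name.

```python
def interval_from_critical_value(estimate, standard_error, critical_value):
    """Turn a critical value (z or t) back into raw units around an estimate."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin

# Values as stated in the transcript; 2.01 is the reported t-value.
low, high = interval_from_critical_value(0.06, 0.033, 2.01)
print(f"{low:.2%} to {high:.2%}")  # -0.63% to 12.63%
print(low < 0.08 < high)           # True: 8% sits inside the interval
```

The lower bound comes out at about -0.63%, which the transcript states as -0.6%. A negative proportion is not possible in practice, so the usable part of the range starts at 0%.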

• Based on this confidence interval we don't have any evidence to conclude whether this

• medicine is effective or not.

• So since the company researching the drug is pretty cautious, they decide not to go

• One place you may have seen confidence intervals “in the wild” is in the news during election season.

• When newscasters report results from exit polls they'll usually say something like

• “Candidate A is tracking at 64%, with a margin of error of 3%” or you may see a

• chart like this:

• The margin of error is usually telling you how far the bounds of the confidence interval

• are from the mean, and is represented by this part of the confidence interval formula:
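
The on-screen formula is not part of the transcript. Assuming it matches the z-based interval used earlier in the video, the margin of error would be the piece added to and subtracted from the sample estimate (the symbols are labels introduced here):

$$
\text{CI} = \bar{x} \pm \underbrace{z^{*} \times \frac{s}{\sqrt{n}}}_{\text{margin of error}}
$$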

• The margin of error, just like a confidence interval, reflects the uncertainty that surrounds

• sample estimates of parameters like the mean or a proportion.

• If a poll shows that a Presidential candidate is tracking at 64% of the vote, plus or

• minus 3%, we shouldn't be surprised if it turns out that the true vote was 61%, since

• that's within the margin of error.

• You can think of values inside the margin of error or confidence interval as values

• that might be reasonable estimates of the true population parameter.

• Confidence intervals quantify our uncertainty.

• They also demonstrate the tradeoff of accuracy for precision.

• A 100% confidence interval will always contain the true population mean, but it's useless.

• We have to sacrifice a little bit of accuracy in order to gain more precision.

• A 99% confidence interval will give us a more useful range since it won't be infinitely

• long..., but it's now possible that our confidence interval won't contain the true mean.

• Say you're running a marathon (like everybody does) and you want to load up your iPhone

• with music, but you don't know how long you're going to take, you could buy 150

• songs on iTunes, which is expensive, or you could buy only 70 and have a chance of running

• out of music.

• You increase your risk of not having enough, but then again you're saving yourself from

• having to buy 80 extra songs.

• Maybe it's time for a streaming service?

• Confidence intervals demonstrate this delicate balancing act... and help us understand how

• to hit the sweet spot of information vs. accuracy.

• Thanks for watching, I'll see you next time in my gold lamé pants.


# Confidence Intervals: Crash Course Statistics #20

Published by 林宜悉 on January 14, 2021