When Predictions Fail: Crash Course Statistics #43

  • Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics.

  • We've learned a lot about how statistics can help us understand the world better, make

  • better decisions, and guess what will happen in the future.

  • Prediction is a huge part of how modern statistical analysis is used, and it's helped us make

  • improvements to our lives.

  • Big AND small.

  • But predictions are just educated guesses.

  • We use the information that we have to build up a model of how the world works.

  • A lot of the examples we talked about earlier in the series were making predictions about

  • the present.

  • Things like “which coffee shop has better coffee?” or “how much does an increase

  • in cigarette smoking decrease heart health?”

  • But in this episode, we're going to focus more on using statistics to make predictions

  • about the future.

  • Like who will win the next World Series, or what stock will do well next month.

  • Looking back at times when we've failed to make accurate predictions can help us understand

  • more about how to get it right or whether we just don't have enough information.

  • Today, we're going to talk about three areas of prediction: markets, earthquakes, and elections.

  • We'll look at why predicting these events can be tricky and why we get it wrong.

  • INTRO

  • Banks were influential in creating the perfect storm that led to the 2008 financial crisis.

  • If you've seen The Big Short, or read the book it's based on, you know that.

  • You also know that Steve Carell should never go blonde again.

  • The financial crisis is really complicated, and we're about to simplify a lot… but if

  • you are interested, you can check out Episode 12 of our Economics series.

  • For now, we're going to focus on two prediction issues related to the crisis:

  • 1. overestimating the independence of loan failures, and

  • 2. economists who didn't see the crisis coming.

  • So before the crisis, banks were giving out mortgages to pretty much anyone.

  • Normally, banks--and lenders in general--are choosy about who they lend to.

  • If you give someone a million-dollar loan, and they can't pay it back, you lose out.

  • But banks weren't hanging on to the debt; they were selling it to others.

  • They combined mortgages into groups and sold shares of the loans as mortgage-backed securities.

  • The banks knew some people wouldn't pay their loan in full, but when the mortgages

  • were packaged together, the risk was supposedly mitigated.

  • Say that there's a 10% chance that each borrower will default on--or fail to repay--their

  • loan.

  • While a 10% default risk isn't catastrophic, it's not ideal for investors.

  • But if you packaged even 5 similar loans together, the probability that all of them will default

  • is now only 0.001%.

  • Because the probability of all of them failing--if each loan failing is independent of another

  • loan failing--is 0.1 to the 5th power.

  • But we just made a prediction mistake.

  • Many investors overestimated the independence of loan failures.

  • They didn't take into account that if the then-overvalued housing market and subsequently

  • the economy began to crumble, the probability of loans going unpaid would shoot way up.
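
A quick sketch of that failure mode, with the independence assumption made explicit (the 10% default rate and the 5-loan bundle are the hypothetical numbers from above):

```python
# Probability that ALL loans in a bundle default, ASSUMING each
# default is independent of the others -- the flawed assumption.
p_default = 0.10   # hypothetical 10% chance a single borrower defaults
n_loans = 5

p_all_independent = p_default ** n_loans
print(f"Independent defaults: {p_all_independent:.7f}")  # 0.0000100, i.e. 0.001%

# If a crashing housing market drags all borrowers down together,
# independence breaks. In the extreme of perfectly correlated
# defaults, the bundle is no safer than a single loan:
p_all_correlated = p_default
print(f"Perfectly correlated: {p_all_correlated:.2f}")   # 0.10, i.e. 10%
```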

  • They also had bad estimates for just how risky some of these loans were.

  • Families were losing homes, and the unemployment rate in the U.S. steadily increased from around

  • 5% to as high as 10% in just a couple years.

  • There was a global recession that most economists' models hadn't predicted.

  • To this day, they're still debating exactly why. Economist John T. Harvey claims, “Economics

  • is skewed towards rewarding people for building complex mathematical models, not for explaining

  • how the actual economy works.”

  • Others theorize that we need to focus more on people and their sometimes irrational behavior.

  • Wharton Finance professor Franklin Allen partly attributes our inability to predict the financial

  • crisis to models that underplayed the impact of banks.

  • The same banks that were involved in the lending practices that helped create--and then deflate--the

  • housing bubble.

  • He claims, “That's a large part of the issue.

  • They simply didn't believe the banks were important.”

  • But they were.

  • Prediction depends a lot on whether or not you have enough data available.

  • But it also depends on what your model deems “important”.

  • You can collect a HUGE amount of data to predict the rates of diabetes in each country.

  • But if your model only considers hair color, whether or not a person drives a hybrid, and

  • the number of raccoons they think they can fight, it probably won't be a good model.

  • When we create a model to predict things, we're trying to use data, math, and statistics

  • in order to approximate how the world works.

  • We're never going to get it perfect, but if we include most of the important things,

  • we can usually get pretty close.

  • Even if we can tell what features will be important, it might be hard to get enough

  • data.

  • Earthquakes are particularly difficult to predict.

  • The United States Geological Survey even has a webpage dedicated to telling the public

  • that currently, earthquakes just aren't predictable.

  • Clusters of smaller earthquakes often happen before larger ones.

  • But these pre-quakes aren't that helpful in predicting when a big earthquake will hit,

  • because they're almost just as often followed by NOTHING.

  • In order to accurately predict an earthquake, you would need three pieces of information:

  • its location, magnitude, and time.

  • It can be relatively easy to get two out of three of those.

  • For example, I predict that there will be an earthquake in the future in Los Angeles,

  • California.

  • And I'd be right.

  • But unless I can also specify an exact time, no one's going to be handing me any honorary

  • degrees in seismology.

  • We're not bad at earthquake forecasting even if we struggle with accurate earthquake

  • prediction.

  • Earthquake forecasting focuses on the probabilities of earthquakes, usually over longer periods

  • of time.

  • It can also help predict likely effects and damage.

  • This forecasting work is incredibly important for mitigating the sometimes devastating effects

  • of larger earthquakes.

  • For example, scientists might look at the likelihood of severe earthquakes along the

  • San Andreas fault.

  • Their estimates can help inform building codes, disaster plans for the area, and even earthquake

  • insurance rates.
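
As a rough illustration of that kind of long-run forecast, here's a toy Poisson model. The one-quake-per-150-years rate is made up, and this is not the USGS's actual methodology, just the simplest version of a probability-over-a-window forecast:

```python
import math

# Toy forecast: if large quakes on a fault arrive at an average rate
# of one per 150 years (a made-up number), what's the chance of at
# least one in the next 30 years, assuming a simple Poisson process?
rate_per_year = 1 / 150
years = 30
lam = rate_per_year * years          # expected number of quakes

p_at_least_one = 1 - math.exp(-lam)  # P(N >= 1) = 1 - e^(-lambda)
print(f"P(at least one large quake in {years} years) = {p_at_least_one:.1%}")
# Prints about 18.1% -- a probability over a long window,
# not a prediction of when.
```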

  • And earthquakes are not without some kind of pattern.

  • They do tend to occur in clusters, with aftershocks following quakes in a pretty predictable pattern.

  • But in his book The Signal and the Noise, Nate Silver warns about looking so hard at

  • the data, that we see noise--random variation with no pattern--as a signal.

  • The causes of earthquakes are incredibly complex.

  • And the truth is, we're not in a place where we can accurately predict when, where, and

  • how they'll occur.

  • Especially the larger, particularly destructive earthquakes.

  • To predict a magnitude 9 earthquake, we'd need to look at data on other similar earthquakes.

  • But there just isn't that much out there.

  • Realistically it could be centuries before we have enough to make solid predictions.

  • Even for earthquakes of more common magnitudes, it could take a lot of data before we have enough

  • to see the pattern amidst all the randomness.

  • Some experts have written off the possibility of accurate earthquake prediction almost entirely,

  • but others hold on to the hope that with enough data and time we'll figure it out.

  • Speaking of earthquakes, the 2016 US presidential election results have been described as a

  • political earthquake.

  • Many experts didn't predict the election of President Donald Trump.

  • It's easy to forget that predictions are not certain.

  • If we could be 100% certain about anything, we wouldn't really need predictions.

  • In the past, we've talked about the fact that when predicting percentages, like how

  • many people will vote for one candidate vs. the other, there are margins of error.

  • If candidate A is predicted to get 54 +/- 2% of the vote, that means that experts predict

  • that candidate A will get 54% of the vote, but wouldn't be surprised by 52 or 56%.

  • These margins help communicate uncertainty.
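
For reference, here's how a 95% margin of error is often computed for a simple poll percentage. This is the textbook normal approximation, not necessarily any particular pollster's method, and the sample size is made up:

```python
import math

# 95% margin of error for a polled proportion:
# MOE = z * sqrt(p * (1 - p) / n)
p = 0.54   # observed share supporting candidate A
n = 1000   # hypothetical sample size
z = 1.96   # z-score for a 95% confidence level

moe = z * math.sqrt(p * (1 - p) / n)
print(f"{p:.0%} +/- {moe:.1%}")  # prints "54% +/- 3.1%"
```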

  • But when predictions are discrete--like “will win” or “won't win”--it can be easier

  • to misunderstand this uncertainty.

  • It's possible for predictions to fail without models being bad.

  • Nate Silver discusses the fact that many predictions put Trump's chance of winning the 2016 presidential

  • election at about 1 in 100.

  • Silver's prediction on his website, FiveThirtyEight, put Trump at a much higher chance of about

  • 3 in 10.

  • If you had forced statisticians to predict a winner, the smart choice according to these

  • numbers would have been Hillary Clinton.

  • But here's the problem: many people see 1 in 100 odds against an event, and take it

  • to mean that the event is essentially impossible.

  • By the numbers, a 1 in 100 chance--even though low--still says the event will happen about

  • once every 100 times.
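
You can see that directly with a quick simulation -- a toy coin-flip model of “1 in 100” odds, not a model of the actual election:

```python
import random

random.seed(43)
p_upset = 0.01   # the underdog's 1-in-100 chance
trials = 10_000  # simulate many hypothetical elections

upsets = sum(random.random() < p_upset for _ in range(trials))
print(f"Upsets: {upsets} in {trials:,} trials ({upsets / trials:.1%})")
# Expect roughly 100 upsets: "unlikely" is not "impossible".
```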

  • There's been a lot of debate about how these polls and predictions “got it wrong”.

  • But one thing that we should take away from the election prediction is that low probabilities

  • don't equal impossible events.

  • If a meticulously curated prediction gives a 1 in 100 chance for a candidate to win,

  • and that candidate wins, it doesn't mean that the prediction was wrong.

  • Unlikely things do happen, and we need to take that into account.

  • But we still should keep striving to make our polls better.

  • Many who have done post-mortems on the 2016 election polls and predictions attribute some

  • blame to biases in the polls themselves.

  • According to the New York Times, “Well-educated voters are much likelier to take surveys than

  • less educated ones.”

  • That means we had a non-response bias from those less educated voters.

  • Because of that, Nate Silver argues that pollsters put too much emphasis on the responses of

  • college-educated voters, who were more likely to vote for Clinton.

  • By improperly weighting them, they overestimated her chance of winning.
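
Here's a minimal sketch of the kind of reweighting at issue. All numbers are made up, purely to show how over-representing one group skews the estimate:

```python
# Toy post-stratification: reweight survey respondents so each education
# group counts in proportion to its (hypothetical) share of the electorate.

# group: (respondents in sample, share supporting Clinton in that group)
sample = {
    "college":    (700, 0.58),  # over-represented among survey takers
    "no_college": (300, 0.45),  # under-represented (non-response bias)
}
electorate_share = {"college": 0.40, "no_college": 0.60}

n_total = sum(n for n, _ in sample.values())

# Unweighted estimate takes the sample at face value:
unweighted = sum(n * p for n, p in sample.values()) / n_total

# Weighted estimate rescales each group to its population share:
weighted = sum(electorate_share[g] * p for g, (_, p) in sample.items())

print(f"Unweighted: {unweighted:.1%}")  # 54.1% -- overstates Clinton's support
print(f"Weighted:   {weighted:.1%}")    # 50.2% -- closer to the electorate
```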

  • Prediction isn't easy.

  • Well, making bad predictions is easy.

  • I predict that at the end of this episode, Brandon will bring me 10 German Chocolate

  • Cakes and I will eat them with my raccoons.

  • But making good predictions is hard.

  • And even good predictions can be hard to interpret.

  • In order to make accurate predictions a lot of things need to go right.

  • First, we need good, accurate, and unbiased data.

  • And lots of it.

  • And second, we need a good model.

  • One that takes into account all the important variables.

  • There's a quote attributed to Confucius--that I'm not really sure he said--that goes something

  • like: “To know what you know and what you do not know, that is true knowledge.”

  • For example, I know that I don't know that he said that, so I am quite knowledgeable.

  • There's great value in knowing what we can and can't predict.

  • While we shouldn't stop trying to make good predictions, there's wisdom in recognizing

  • that we won't always be able to get it right.

  • Knowing what we can't accurately predict may be just as important as making accurate predictions.

  • Thanks for watching. I'll see you next time.
