It is the hot topic for data journalists in this election.
Some call it MRP, some Mr P. But the full name
is multi-level regression with post-stratification.
So what is it?
In short, it's a way of using a big national poll
to estimate how people will vote at constituency level.
National polls of 1,000 people are
good at telling us what share of the national vote
each party will get, but not so good at predicting who
will win each of the 650 seats.
And in the UK system it's seats, not votes, that count.
For example, in 2015, the Conservatives
won 37 per cent of the vote but 51 per cent of the seats.
Ukip won 13 per cent of the vote but less than 1
per cent of the seats.
Traditionally, pollsters have tried
to get around this using a method
like uniform national swing.
If Labour were down 11 percentage points nationally on the last general
election, the pollsters would subtract 11
percentage points from its vote share in every seat.
But that can't capture local electoral nuance: for example,
the strength of the Leave or Remain vote in particular areas,
or large student votes in university towns.
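As a rough illustration of the uniform national swing method described above, here is a minimal Python sketch. The seat names, vote shares and national changes are all invented, and flooring a negative share at zero is an assumption of this sketch rather than a standard rule. Seat C shows one of the method's blind spots: Labour's projected share would go negative, and after flooring the shares no longer add up to 100.

```python
# Invented previous-election vote shares (per cent) in three hypothetical seats.
previous_shares = {
    "Seat A": {"Labour": 45.0, "Conservative": 40.0, "Lib Dem": 15.0},
    "Seat B": {"Labour": 60.0, "Conservative": 25.0, "Lib Dem": 15.0},
    "Seat C": {"Labour": 9.0, "Conservative": 48.0, "Lib Dem": 43.0},
}

# Invented national change in each party's share, in percentage points.
national_change = {"Labour": -11.0, "Conservative": 6.0, "Lib Dem": 5.0}


def uniform_swing(previous, change):
    """Add the same national change to each party's share in every seat.

    Negative results are floored at zero (an assumption of this sketch).
    """
    projected = {}
    for seat, shares in previous.items():
        projected[seat] = {
            party: max(0.0, share + change.get(party, 0.0))
            for party, share in shares.items()
        }
    return projected


def winners(projected):
    """Party with the largest projected share in each seat."""
    return {seat: max(shares, key=shares.get) for seat, shares in projected.items()}


projected = uniform_swing(previous_shares, national_change)
print(winners(projected))
# {'Seat A': 'Conservative', 'Seat B': 'Labour', 'Seat C': 'Conservative'}
print(sum(projected["Seat C"].values()))  # 102.0: shares no longer add to 100
```

Every seat moves by exactly the same amount, which is why a Leave-heavy seat and a university town are treated identically.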
So in comes MRP.
Step one: a large poll sample of tens of thousands
of people across the country.
That's because you want dozens of people in each constituency,
the more the better, to pick up on what
makes that seat different from the rest.
Step two:
don't just ask them who they're voting for,
but who they are.
Age, sex, ethnicity, education level,
housing, occupation, how they voted in the EU referendum.
You'll also have gathered lots of local information
about their constituency, from which parties have historically
done well or poorly there to what's
happened to house prices.
So you have data at the individual level,
but also the context of the wider geographical area.
That's why it's called multi-level.
You then run a regression on that data.
That's a statistical technique that
estimates the probability of someone
with a given combination of personal and local
characteristics (a) voting at all and (b)
voting for a particular party.
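To make the two probabilities concrete, here is a minimal Python sketch of how an already-fitted model might turn one person's characteristics into (a) a turnout probability and (b) party-choice probabilities. Every coefficient, seat and category below is invented, and the sketch skips the fitting step entirely; real MRP models are estimated from the poll sample, typically with dedicated statistical software, and report uncertainty as well. The per-seat "offset" is a hypothetical stand-in for the constituency-level effect that makes the model multi-level, alongside a seat-level predictor (the party's previous vote share).

```python
import math

# Invented turnout coefficients (log-odds scale).
TURNOUT_INTERCEPT = 0.2
TURNOUT_AGE = {"18-34": -0.7, "35-64": 0.3, "65+": 0.9}
TURNOUT_DEGREE = 0.4

# Invented party-choice coefficients: individual-level terms (leave, degree)
# and a constituency-level term (party's previous share in the seat, 0-1).
PARTY_COEF = {
    "Conservative": {"intercept": -0.5, "leave": 1.0, "degree": -0.3, "prev_share": 2.0},
    "Labour": {"intercept": 0.0, "leave": -0.3, "degree": 0.1, "prev_share": 2.0},
    "Lib Dem": {"intercept": -0.8, "leave": -1.0, "degree": 0.6, "prev_share": 2.0},
}


def logistic(x):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn one score per party into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def p_turnout(person):
    """(a) Probability that this person votes at all."""
    x = TURNOUT_INTERCEPT + TURNOUT_AGE[person["age"]]
    if person["degree"]:
        x += TURNOUT_DEGREE
    return logistic(x)


def p_party(person, seat):
    """(b) Probability of backing each party, given that the person votes."""
    scores = {}
    for party, c in PARTY_COEF.items():
        scores[party] = (
            c["intercept"]
            + c["leave"] * person["leave"]  # True/False act as 1/0
            + c["degree"] * person["degree"]
            + c["prev_share"] * seat["prev_share"][party]  # seat-level predictor
            + seat["offset"][party]  # hypothetical seat-specific effect
        )
    return softmax(scores)


# Two hypothetical seats with different histories.
seat_a = {
    "prev_share": {"Conservative": 0.30, "Labour": 0.55, "Lib Dem": 0.15},
    "offset": {"Conservative": -0.1, "Labour": 0.2, "Lib Dem": -0.1},
}
seat_b = {
    "prev_share": {"Conservative": 0.50, "Labour": 0.25, "Lib Dem": 0.25},
    "offset": {"Conservative": 0.2, "Labour": -0.2, "Lib Dem": 0.0},
}
person = {"age": "65+", "degree": False, "leave": True}

print(round(p_turnout(person), 2))  # 0.75
print(round(p_party(person, seat_a)["Conservative"], 2))  # 0.48
print(round(p_party(person, seat_b)["Conservative"], 2))  # 0.81
```

The same 65-plus, non-graduate Leave voter gets a different Conservative probability in each seat, purely because of the constituency-level terms.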
So we've done MR. Then comes P, post-stratification.
This is where the modellers use data
from sources like the census and the Annual Population Survey.
They can tally up the number of people
with each combination of these demographic and socio-economic
characteristics in every constituency
and then apply the voting probabilities from the MR step
to the population data.
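The post-stratification step can be sketched as a weighted sum. For one constituency, each demographic cell's population count is multiplied by that cell's turnout probability and party-choice probabilities from the MR step, and the results are added up. The cell definitions, counts and probabilities below are all invented for illustration.

```python
# Invented post-stratification frame for ONE constituency: adult population
# per demographic cell (in practice tallied from census-type sources).
frame = [
    {"cell": "18-34, degree, Remain", "n": 10_000},
    {"cell": "35-64, no degree, Leave", "n": 20_000},
    {"cell": "65+, no degree, Leave", "n": 10_000},
]

# Invented MR-step outputs for each cell in this constituency.
estimates = {
    "18-34, degree, Remain": {
        "turnout": 0.5,
        "party": {"Conservative": 0.2, "Labour": 0.5, "Lib Dem": 0.3},
    },
    "35-64, no degree, Leave": {
        "turnout": 0.7,
        "party": {"Conservative": 0.6, "Labour": 0.3, "Lib Dem": 0.1},
    },
    "65+, no degree, Leave": {
        "turnout": 0.8,
        "party": {"Conservative": 0.7, "Labour": 0.2, "Lib Dem": 0.1},
    },
}


def poststratify(frame, estimates):
    """Expected votes per party: sum over cells of n * P(vote) * P(party | vote)."""
    votes = {}
    for row in frame:
        est = estimates[row["cell"]]
        expected_voters = row["n"] * est["turnout"]
        for party, p in est["party"].items():
            votes[party] = votes.get(party, 0.0) + expected_voters * p
    return votes


votes = poststratify(frame, estimates)
print({party: round(v) for party, v in votes.items()})
# {'Conservative': 15000, 'Labour': 8300, 'Lib Dem': 3700}
```

Running the same calculation with each constituency's own counts and its own cell probabilities is what lets the estimates differ from seat to seat.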
So you have an estimate for how a white British male who
left school at 16 is likely to vote, for example.
And that estimate will differ between Great Grimsby,
North West Durham and Glasgow Central.
In fact, you have a series of estimates
for different demographic combinations
in different places.
So you then combine them to give you
the total number of votes each party is likely to secure
in every one of the 650 constituencies,
and that can be used to calculate which party
is most likely to win each seat, which is most likely to be
its closest challenger, how big the margin
between first and second place is likely to be, and so on.
This allows parties to better target their campaigning
resources on seats that are going to be close.
And it can help people like you make a more informed decision
on who to vote for tactically.
It gives the public a more nuanced picture
of how the election is likely to play out.
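The final step, going from each seat's estimated vote totals to a likely winner, closest challenger and margin, might look like the minimal Python sketch below. The seat names and totals are invented, and the 5-point cut-off for a "close" seat is an arbitrary illustrative choice. The sketch also uses point estimates only, whereas real MRP outputs usually come with uncertainty ranges.

```python
# Invented expected vote totals per constituency, e.g. from post-stratification.
seat_votes = {
    "Seat X": {"Conservative": 15000, "Labour": 8300, "Lib Dem": 3700},
    "Seat Y": {"Conservative": 12000, "Labour": 12500, "Lib Dem": 5500},
}


def seat_summary(votes):
    """Winner, closest challenger and winning margin in percentage points."""
    total = sum(votes.values())
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    (winner, first), (challenger, second) = ranked[0], ranked[1]
    return {
        "winner": winner,
        "challenger": challenger,
        "margin_pts": round(100 * (first - second) / total, 1),
    }


def close_seats(all_votes, threshold_pts=5.0):
    """Seats whose projected margin is below an (arbitrary) threshold."""
    return [
        seat
        for seat, votes in all_votes.items()
        if seat_summary(votes)["margin_pts"] < threshold_pts
    ]


for seat, votes in seat_votes.items():
    print(seat, seat_summary(votes))
# Seat X {'winner': 'Conservative', 'challenger': 'Labour', 'margin_pts': 24.8}
# Seat Y {'winner': 'Labour', 'challenger': 'Conservative', 'margin_pts': 1.7}
print(close_seats(seat_votes))  # ['Seat Y']
```

A list like `close_seats` is the kind of output a campaign might use to decide where to concentrate resources.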
Now, it's not a perfect system.
This type of modelling is complex,
and there are many variables.
And the choices the modellers make mean one model
will produce different outputs from the next.
But MRP is the most refined tool we have
until the votes are actually counted.