Monday, April 2, 2012

Is polling accuracy getting worse this primary season?

My friend and longtime reader Ken commented on my post from a week ago about the margin of error, known technically as the 95% confidence interval. The short version of that post: the margin of error is nowhere near 95% accurate during the primary season, but it wasn't so bad during the 2008 general election.
Ken asked whether the 2012 primary predictions could be compared to the 2008 primary predictions. I found data online that makes an apt comparison.
Here are the records for the 2012 election predictions so far. The last column shows what we expect every column to look like. As you can see, none of the other columns has nearly enough yellow, which represents the percentage of time the true result falls inside the margin of error. Red means the poll's prediction was markedly higher than the actual result, and Ron Paul gets over-predicted a lot. Purple means the poll guessed too low, and Rick Santorum is the king of exceeding expectations this year.
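The three colors amount to sorting each poll into one of three buckets. The Python sketch below shows one way to do that tally. It assumes a poll counts as "correct" when the actual result lands within the poll's number plus or minus its margin of error. The names `classify` and `tally` are hypothetical, and the poll records in the example are made-up illustrations, not real data.

```python
def classify(poll, result, margin):
    """Sort one poll into a bucket (assumed rule: result vs. poll +/- margin).

    All three arguments are percentages, e.g. 30.0, 24.0, 4.0.
    """
    if result < poll - margin:
        return "high"     # poll over-predicted the candidate (red)
    if result > poll + margin:
        return "low"      # poll under-predicted the candidate (purple)
    return "correct"      # result inside the margin of error (yellow)


def tally(records):
    """Return the percentage of polls in each bucket."""
    counts = {"low": 0, "correct": 0, "high": 0}
    for poll, result, margin in records:
        counts[classify(poll, result, margin)] += 1
    total = len(records)
    return {bucket: 100 * n / total for bucket, n in counts.items()}


# Hypothetical (poll, actual result, margin of error) records, not real polls.
example = [(30, 24, 4), (20, 21, 4), (10, 16, 4), (15, 14, 3)]
print(tally(example))
```

For a well-behaved set of polls, the output should come out near 2.5% low, 95% correct and 2.5% high.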

The 2008 race has a data set with many features in common with 2012, though the time frame is different. In 2008, Super Tuesday was in early February and had over 20 contests, much earlier and much larger than Super Tuesday this time around. Four candidates were still in the hunt at that point in the race: John McCain, Mike Huckabee, Mitt Romney and Ron Paul. As in this year's race, the primaries showed regional differences, and the top three candidates had all won contests outright. Here is how I match up the candidates in the two races.

McCain 2008 = Romney 2012. The front-runner and eventual nominee.
Huckabee 2008 = Santorum 2012. The preferred candidate in the South and, in terms of polling, the most often underestimated.
Romney 2008 = Gingrich 2012. This is slightly unfair to Mitt, because he was a stronger candidate than Newt has been. Also, even though he won multiple contests on Super Tuesday, he was clearly in third place, and he dropped out of the race later that week.
Ron Paul 2008 = Ron Paul 2012.  He's polling better this year, but he is still the man on a crusade with no chance at the nomination and no outright wins in contests with enough polling data.

I'm going to turn all this data into a single number for each candidate, much the same way I give single numbers to my predictions and to Nate Silver's. The percentages of low guesses, correct guesses and high guesses should be 2.5%, 95% and 2.5% respectively. Let's say the polls for some candidate were low 20% of the time, correct 68% of the time and high 12% of the time.

low: 20% - 2.5% = 17.5% off
correct: 95% - 68% = 27% off
high: 12% - 2.5% = 9.5% off
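Here is the same arithmetic as a small Python sketch. It assumes each bucket's distance is the absolute difference from the ideal 2.5% / 95% / 2.5% split, which matches the three subtractions above. The name `percent_correct` is a hypothetical label.

```python
# Ideal split if the margin of error really were a 95% confidence interval.
EXPECTED = {"low": 2.5, "correct": 95.0, "high": 2.5}


def percent_correct(low, correct, high):
    """Score a polling record: 100 minus its total distance from the ideal split.

    Each argument is the percentage of polls that guessed too low, landed
    inside the margin of error, or guessed too high.
    """
    observed = {"low": low, "correct": correct, "high": high}
    # Sum of absolute differences, bucket by bucket.
    off = sum(abs(observed[k] - EXPECTED[k]) for k in EXPECTED)
    return 100 - off


print(percent_correct(20, 68, 12))  # the example candidate above
```

Using absolute values means it doesn't matter whether a bucket is too big or too small. Only the size of each miss counts.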

Adding up all the percentages gives us 54% off, which means only 46% correct. That's a pathetic result in just about anything other than hitting in baseball. (Batting .460 would be awesome.) So here is how close each of our eight candidates, four from 2008 and four from 2012, came to the correct distribution.

Romney '08: 70% correct
Huckabee '08: 57% correct
McCain '08: 46% correct
Paul '08: 46% correct
Romney '12: 36% correct
Gingrich '12: 31% correct
Santorum '12: 17% correct
Paul '12: 12% correct

As Charles Barkley might say, these numbers are tuuuurrrible.  And to make matters worse, every 2008 candidate was easier to predict than any 2012 candidate.
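The claim that every 2008 candidate beat every 2012 candidate comes down to one comparison: the worst 2008 score must be higher than the best 2012 score. This Python sketch checks that comparison using only the scores listed above. The averages are included just for perspective.

```python
# Scores from the list above (% correct).
scores_2008 = {"Romney": 70, "Huckabee": 57, "McCain": 46, "Paul": 46}
scores_2012 = {"Romney": 36, "Gingrich": 31, "Santorum": 17, "Paul": 12}

# Every 2008 candidate beats every 2012 candidate exactly when
# the worst 2008 score is higher than the best 2012 score.
worst_2008 = min(scores_2008.values())
best_2012 = max(scores_2012.values())

mean_2008 = sum(scores_2008.values()) / len(scores_2008)
mean_2012 = sum(scores_2012.values()) / len(scores_2012)

print(worst_2008, best_2012, worst_2008 > best_2012)
print(mean_2008, mean_2012)
```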

The much less reliable results may be caused by the increased use of caller ID. More and more people are opting out of being polled, and they do it without ever being counted as a hang-up or a "refused to be polled." How the pool of people who will talk to pollsters differs demographically from the general population is hard to say. The best description I have heard is that it skews toward the less technically savvy and the more politically committed, regardless of position on the political spectrum.

The level of commitment probably explains the difference between the regular overestimation of Ron Paul and the regular underestimation of Rick Santorum. Paul's base numbers have stayed more consistent than any other candidate's. That tells us that if someone is a Ron Paul supporter today, there's a good chance that person was a Ron Paul booster in December as well. The story is different for Senator Rick. He was a third-tier candidate when there were more people in the race. While he polls at over 25% today, he was getting well under 5% until Christmas of last year. Since 5% is one fifth of 25%, at least 4 out of 5 Santorum voters were not on board last December, and the number could be higher still, possibly closer to 9 out of 10.
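The "4 out of 5" figure is a lower bound. It assumes the best case for continuity, where every December supporter is still on board. In that case at most old/new of today's support carried over. Here is that bound as a Python sketch. The function name is hypothetical, and 2.5% is just one illustrative reading of "well under 5%."

```python
def min_share_new(old_pct, new_pct):
    """Lowest possible fraction of today's supporters who weren't on board before.

    Best case for continuity: every earlier supporter stayed, so at most
    old_pct / new_pct of today's support is carried over.
    """
    return 1 - old_pct / new_pct


print(min_share_new(5, 25))    # 5% in December, 25% today
print(min_share_new(2.5, 25))  # illustrative "well under 5%" value
```

The two calls return 0.8 and 0.9, which are the 4-in-5 and 9-in-10 figures in the paragraph above.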

What effect has this less reliable polling data had on the prediction contests between my method and Nate Silver's? Since we both rely on poll numbers, it has often dragged our accuracy down. If all the poll numbers are within the margins of error, both Nate and I can hope for combined error rates of less than 10%, but if one candidate does much better or worse than we could have predicted, our combined error rates can be closer to 20%. In terms of the races so far, I was at the top of the class in Iowa with an 83.9% accuracy, but everyone who relied on the polls had to miss Santorum's excellent result by a bunch. On the other hand, polling data was very accurate in Georgia for all the candidates, and my completely respectable 94.6% accuracy finished second to Nate Silver's 94.8%.

There are primaries tomorrow in Wisconsin, Maryland and Washington DC, but so far there is only enough polling data for our algorithms to make predictions in Wisconsin. I'll post my numbers and Nate's once he has factored in the latest poll from PPP. I'm currently leading 8-6.

1 comment:

Matty Boy said...

A quick note on my method of grading accuracy. As bad as these numbers are, it is possible to do even worse and get less than 0% correct. I only noticed this fact this morning. I'm going to think about how to recalibrate the system, but I won't change it during the competition with Nate Silver. A recalibration would never change who did better and who did worse, but it would result in what could be called "grade inflation".
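Here is a small Python sketch showing how a score below 0% can happen. It assumes the grading is 100 minus the sum of absolute differences from the ideal 2.5% / 95% / 2.5% split, as in the post. The function name is a hypothetical label. The worst case is every poll landing in the same wrong bucket.

```python
# Ideal split for a true 95% confidence interval.
EXPECTED = {"low": 2.5, "correct": 95.0, "high": 2.5}


def percent_correct(low, correct, high):
    """100 minus the total absolute distance from the ideal split."""
    observed = {"low": low, "correct": correct, "high": high}
    off = sum(abs(observed[k] - EXPECTED[k]) for k in EXPECTED)
    return 100 - off


# Worst cases: every poll guessed too low, or every poll guessed too high.
print(percent_correct(100, 0, 0))
print(percent_correct(0, 0, 100))
```

Both calls come out at -95.0, because the total distance can reach 195 points (97.5 + 95 + 2.5). Under this grading, the possible scores run from -95% to 100%.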