In media reports of polls, the margin of error is always given, usually as an afterthought. Very few news organizations bother telling their readers what it actually means (in the U.S., only the New York Times makes an honest effort at this on a regular basis), and even if every “i” is dotted and every “t” crossed, the number can be rendered meaningless by two common factors, time and apathy.
If Candidate A has 42% of the vote in a poll and the “margin of error” is reported as ±4.0%, what this means is that we are 95% confident the true number for Candidate A will be between 38% and 46%. (Why 95% confident, you ask? It’s the industry standard and it has been forever.)
Since we “catch the fish” about 95% of the time, we expect to be wrong about 5% of the time, almost equally divided between too high and too low.
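For the curious, here is a minimal sketch of where a number like ±4.0% comes from. It assumes the usual textbook recipe: a simple random sample, the normal approximation, and the worst-case proportion of 50% that most pollsters use when they report a single margin of error. The sample size of 600 is a hypothetical chosen because it produces ±4.0%; the function names are mine, not anything from a polling library.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    n: sample size (assumed to be a simple random sample)
    p: proportion used in the variance; 0.5 is the worst case
    z: 1.96 corresponds to 95% confidence
    """
    return z * math.sqrt(p * (1 - p) / n)

def confidence_interval(observed, n, z=1.96):
    """Interval built from the observed share and the worst-case margin."""
    moe = margin_of_error(n, 0.5, z)
    return observed - moe, observed + moe

# Hypothetical poll: 600 respondents, Candidate A at 42%
moe = margin_of_error(600)
low, high = confidence_interval(0.42, 600)
print(f"margin of error: ±{moe:.1%}")
print(f"95% interval: {low:.1%} to {high:.1%}")
```

With these assumed inputs the interval runs from roughly 38% to 46%, matching the Candidate A example.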
Here’s the thing. Since the Republican nomination race turned into a four-person field, even the freshest polls for this cycle’s contests have been awful.
When we look at the data from 43 polls from 11 different contests, none taken more than a week before the election date, we see the 95% confidence interval is more like a poorly observed guideline than a solid mathematical concept based on the binomial distribution. Each candidate’s numbers should be captured in the margin of error around 41 of 43 times. Instead, the most easily predicted candidate, Mitt Romney, ends up in the 95% interval only 62% of the time, and Ron Paul gets captured in the interval only 51% of the time.
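To get a feel for how far off those capture rates are, here is a rough sketch that asks: if each poll really had a 95% chance of catching the true number, how likely would it be to catch it this rarely in 43 tries? It treats the 43 polls as independent, which is an assumption, and the hit counts are back-calculated from the rounded percentages, so they are approximations rather than the actual tallies.

```python
from math import comb

def prob_at_most(hits, n, p=0.95):
    """P(X <= hits) for X ~ Binomial(n, p): the chance of this few captures or fewer."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(hits + 1))

n_polls = 43
expected_hits = round(0.95 * n_polls)  # about 41 of 43

# Assumed counts, reconstructed from the rounded 62% and 51% figures
romney_hits = round(0.62 * n_polls)
paul_hits = round(0.51 * n_polls)

print(f"expected captures: about {expected_hits} of {n_polls}")
print(f"P(at most {romney_hits} captures) = {prob_at_most(romney_hits, n_polls):.2e}")
print(f"P(at most {paul_hits} captures) = {prob_at_most(paul_hits, n_polls):.2e}")
```

Under these assumptions both probabilities come out vanishingly small, which is the point: bad luck alone does not plausibly explain misses on this scale.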
Mathematicians are loath to scrap the binomial distribution (really, it’s as pretty as a hummingbird, it can’t be wrong), so we blame the messy real world for not meeting the standards our lovely calculations promise. “Data collection bias” is the culprit, the First Great Lesson of the poll-taking business, the one that made George Gallup famous and helped kill The Literary Digest way back in the FDR era.
In 1936, The Literary Digest conducted a survey involving 10,000,000 questionnaires, a number no polling company today would dream of trying due to the expense. They got 2,400,000 replies, still a stunning sample size. Their prediction said Alf Landon would win in a landslide. The margin of error should have been microscopic. Instead, Roosevelt crushed Landon, who carried only Vermont and Maine.
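Just how microscopic? A quick sketch, again assuming the standard worst-case formula for a simple random sample (which is exactly the assumption the Digest’s mailing violated). The 1,000-person poll is a hypothetical included only for comparison.

```python
import math

def worst_case_moe(n, z=1.96):
    """95% margin of error for a proportion, using the worst case p = 0.5."""
    return z * math.sqrt(0.25 / n)

samples = [
    ("Literary Digest replies", 2_400_000),
    ("hypothetical modern poll", 1_000),  # assumed size, for comparison only
]
for label, n in samples:
    print(f"{label}: n = {n:,}, margin of error ±{worst_case_moe(n):.2%}")
```

On paper the Digest’s margin works out to well under a tenth of a percentage point. The formula only measures random sampling error, though, and says nothing about who ended up in the sample.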
There were two problems. One was the people who were sent the questionnaires in the first place. The mailing lists were made up of the Digest’s readers, automobile owners and people with telephones. Cars and phones may not seem like big-ticket items now, but in 1936 they were, and the group skewed towards the wealthy end.
The other problem was how few people returned the questionnaire, only 24%. Polling companies today have a similar problem, though not on as grand a scale.
George Gallup polled a much smaller sample and predicted Roosevelt’s victory. For good measure, he came up with a sample that would mimic The Literary Digest’s snooty demographic and got numbers within a percent of theirs. Gallup’s name was made, and the publishing company Funk & Wagnalls closed the doors on The Literary Digest in 1938.
How does this relate to polling today? The important question is “Who actually answers the phone and stays on the line for an opinion poll?” How many people with caller ID refuse to answer calls from strange numbers? How many answer, then hang up when they realize it’s an opinion poll? The difference in the numbers for Rick Santorum and Ron Paul is illuminating.
Ron Paul’s numbers get overestimated a lot, as shown by the green column above his name. Rick Santorum has not been markedly overestimated by a single poll yet, but as you can see by the blue column above his name, he has been underestimated a stunning 44% of the time.
It’s as though Ron Paul voters are waiting by the phone, wishing to be asked for their opinion the way a Tennessee Williams heroine prays for gentleman callers. On the other hand, Rick Santorum voters might as well be at prayer meetings 24/7, politely making sure their cell phones are off.
The freshest polls in the 2008 general election did much better. If we sift the undecided voters out of the polls and assume 100% of the vote goes to the two main candidates (not actually true, but let’s go with it for this conjecture), 94 of 102 results were in the 95% confidence interval. This amounts to 92%, a little short of 95% but well within expected values.
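One rough way to check the “within expected values” claim is to see how many standard deviations 94 of 102 sits from what a true 95% capture rate would produce. This sketch uses the normal approximation and treats all 102 results as independent, which is an assumption on my part: the two candidates’ numbers in a single poll mirror each other once the undecided are removed, so the real picture is a bit murkier.

```python
import math

def coverage_z_score(hits, n, p=0.95):
    """How many standard deviations the observed hit count is from n * p."""
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (hits - expected) / sd

hits, n = 94, 102
z = coverage_z_score(hits, n)
print(f"observed coverage: {hits / n:.0%}")
print(f"expected hits: {n * 0.95:.1f}")
print(f"z-score: {z:.2f}")
```

A z-score between -2 and 2 is the conventional “nothing unusual” zone, and under these assumptions the 2008 general election polls land inside it.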
Have polls become this much worse in just four years, or is it a function of primary vs. general election? My educated guess is that these primaries in particular are tough to gauge because of their nature, with so many voters not paying attention until the last minute. While the general election feels like a lingering illness for many voters, polling companies do a better job when public opinion has a longer time to fall into place.
UPDATE: A look at 2008 primaries vs. 2012 primaries. Super Tuesday was much bigger and much earlier four years ago, and there were four candidates, three of them capable of winning contests and a fourth guy named Ron Paul. Quite the coincidence!