Wednesday, April 24, 2013

Tainted Love, Part 2:
Caught in a bad romance.

Using a new earworm to get rid of yesterday's earworm.

Sometimes the magic works and sometimes it doesn't.

Yesterday, I discussed several ways I have been collecting data this century, sometimes just with the intention of understanding a situation I thought was under-reported like the prices of gold, silver and crude oil. There were also data collection methods where I hoped to make a prediction at the end, most notably the 2008 and 2012 elections where my prognostications were closer to perfect than Nate Silver's were. I also looked at other people's predictions, most notably the supermarket checkout stand predictions of deaths, pregnancies, divorces and marriages. They pretty much sucked.

Now I have two new blogs. The first, This Day In Science Fiction, has a daily review of the predictions made in science fiction and other sources that have dates attached.

Some of the predictions are very good but many are not. I'm interested in dates that have already passed or are just a few years from now, and some writers like Arthur C. Clarke and Robert A. Heinlein are great sources for predictions that were within their lifetimes or just a few years later, though not all of them were accurate. (Following my rule for just a few years away, The Jetsons are supposed to live in 2062 and first contact in Star Trek is 2063. Neither of them meets my criteria.)

Heinlein has so many predictions both in fiction and in predictive essays that I have two pictures of him. When he's right I use the Sensible Bob picture. This picture is the Ridiculous Bob.

It gets used a lot.

Another set of predictors are the futurists from the Victorian era. They tend to believe the future will be a socialist utopia of equal opportunity and freedom from want. My personal favorite is John Elfreth Watkins, a fellow who worked for the railroads and was asked in 1900 to make a set of predictions about life in the year 2000. These were published in that famous source of speculative fiction The Ladies' Home Journal.

You might think he would just stick to the women's issues of the day, but he made predictions about transportation, communications, education, agriculture, entertainment, warfare, you name it. He's not perfect, but he does a much better job than the sci-fi writers do generally and he's spotting them fifty or sixty years.

More than that, I love that his well-groomed facial hair has no trace of irony. A handsome fella, no doubt about it.

And then there's my second blog, Math Year 2013. A lot of my posts are just about math, but I decided in the off seasons between elections to gather climate data to see if I could make heads or tails of it. Starting a few years back, the Berkeley Earth Surface Temperature project produced the world's most complete set of temperature data, and it is being updated regularly. The data set is huge, but the enormous text files are tailor-made for a computer program written in C to parse. I wrote several programs and started to look at the weather season by season in different regions of the world.

The squiggly red line represents the average temperatures each year from 1955 to 2010. I split this time period into four eras based on the temperature fluctuations across the Indian and Pacific Oceans known as La Niña and El Niño. The dotted red line in the middle tracks the median temperatures in each era. The black lines that frame the squiggly line at top and bottom are the record high and record low for each era.
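For readers who want to try this kind of summary themselves, here is a minimal sketch in Python of the per-era bookkeeping described above: the median plus the record high and record low inside each era. It is not the C program I actually wrote. The function name, the era boundaries and the temperatures are all placeholders invented for illustration, not Berkeley Earth numbers.

```python
from statistics import median

def summarize_eras(yearly_temps, era_bounds):
    """Return {(start, end): (record_low, median, record_high)} per era.

    yearly_temps maps year -> average temperature for that year.
    era_bounds is a list of (start_year, end_year) pairs, both inclusive.
    """
    summary = {}
    for start, end in era_bounds:
        temps = [t for year, t in yearly_temps.items() if start <= year <= end]
        summary[(start, end)] = (min(temps), median(temps), max(temps))
    return summary

# Placeholder values only, chosen to make the arithmetic easy to follow.
sample = {1955: 10.0, 1956: 10.4, 1957: 9.8, 1958: 10.9, 1959: 11.1, 1960: 10.7}
eras = [(1955, 1957), (1958, 1960)]  # hypothetical era boundaries

for era, (low, mid, high) in summarize_eras(sample, eras).items():
    print(era, "low:", low, "median:", mid, "high:", high)
```

The median line and the two record lines on the graph are just these three numbers drawn flat across each era.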

I chose a season and a region that shows obvious warming. Not every season in every region is this convincing. But overall, take any serious statistical measurement that doesn't involve cherry picking and the numbers say the planet is warming, in some regions like the Sahara at an alarming rate.

Past performance is not an indicator of future trends. I don't do my stuff to predict climate. My model isn't sophisticated enough by a long shot. More than that, as a mathematician I have my doubts about many statistical methods.

Here's my view of prediction. I'm not a genius and Nate Silver's not a genius. If you are honest and diligent - and both of us are - it's easy to get almost everything right with your last snapshot of the race on the morning of Election Day. The median of the recent polls (my method) does extremely well and the average of recent polls mixed in with some trendspotting (Silver's method) does very well also. If we disagree, my method has a better track record so far.
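To show the difference between the two summaries, here is a small Python sketch, assuming a made-up list of recent poll margins for one state. The numbers are invented and the function name is hypothetical. It only compares the median with the plain average, so it leaves out the trendspotting part of Silver's method entirely.

```python
from statistics import mean, median

def snapshot(margins):
    """Summarize recent poll margins (candidate A minus candidate B, in points)."""
    return {"median": median(margins), "mean": mean(margins)}

# Invented margins: four polls that roughly agree and one outlier.
recent = [2.0, 3.0, 3.0, 4.0, 13.0]
result = snapshot(recent)
print(result)
```

With these numbers the median is 3.0 and the mean is 5.0. One odd poll drags the average two points but leaves the median where the other four polls put it, which is the main reason I lean on the median.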

On the other hand, something like predicting every winner in a sixteen-team knockout tournament is very hard. Here, for example, are the opening round pairings for last year's Stanley Cup playoffs. The numbers next to the team names give the seedings. The #1 seed had the best record in their conference and gets the advantage of playing the #8 seed, the team with the worst record that still made the playoffs. #2 plays #7, #3 plays #6 and #4 plays #5. Doing well in the regular season gives you an allegedly easier path to make it to the Finals.

Sometimes the magic works and sometimes it doesn't.

Last year, the final was between the #8 seeded Los Angeles Kings in the West and the #6 seeded New Jersey Devils in the East. For both teams, every series they won on the way to the Final was against a higher-seeded team. The Kings won the Cup, winning 16 games and losing only 4 over a two-month span.

In general, seeded tournaments are very random indeed, some more than others.
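A rough way to see why a perfect bracket is so hard: a sixteen-team knockout has 15 series (8 + 4 + 2 + 1), and you have to get every one right. The Python sketch below assumes, purely for illustration, that each pick is right with the same probability and that the picks are independent. Neither assumption is really true, and the 70% figure is a guess, not a measured hit rate.

```python
def chance_of_perfect_bracket(p_correct, n_series=15):
    """Probability of calling every series right, assuming independent
    picks that are each correct with probability p_correct."""
    return p_correct ** n_series

# Hypothetical picker who calls 70% of individual series correctly.
print(f"{chance_of_perfect_bracket(0.7):.4f}")
```

Even a picker who is right seven times out of ten on any single series gets the whole bracket right less than half a percent of the time under these assumptions.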

Nate Silver and I did very well with a data set that was remarkably devoid of wacky randomness. We weren't geniuses, we were just lucky to get such an easy assignment. With no false modesty, my system did a little better than his did, 83 correct picks and one abstention vs. 81 correct, 2 incorrect and 1 abstention.

When systems get very random, like March Madness or the Stanley Cup or picking all the Oscar winners, a strong system or a superior knowledge base can still get smacked around by dumb-ass luck.

As I wrote at the beginning of the last post, I am obsessively fascinated by predictions and on the whole, I do not trust predictions as far as I can throw them. Nate Silver's book The Signal and the Noise assumes we are just a few easy steps away from significant improvements in the field of prediction. As an older man, someone who has played more poker and more backgammon than he has, my best advice is to not celebrate early. We still have a very long way to go and no certain proof things have to get better.

Tuesday, April 23, 2013

Tainted love.

Some of my readers, having read the title of this post, now have an eight-beat synthesizer hook playing on a loop in their brains. It starts with

BOMP BOMP doo doo dee dee dada de

I apologize for this but it couldn't be helped. I could say instead that I am in a love-hate relationship, but that is exactly the kind of over-used cliche my writing hero George Orwell warns against.

I am in a long-term relationship with seriously unhealthy aspects. I am obsessively fascinated by predictions and on the whole, I do not trust predictions as far as I can throw them.

What relationship doesn't have its ups and downs?

Longtime readers will know I have made predictions of the outcomes of the elections in 2008 and 2012 with some success. The last of my snapshots in 2008 had Obama leading comfortably 353 to 174, with 11 electoral votes in the toss-up category. The final result was Obama 365, McCain 173. The toss-up state was Indiana, which Obama won narrowly 49.9% to 48.8%, about 30,000 votes out of 3,000,000. The one extra I missed was that Omaha went for Obama while the rest of the state went for McCain. Back then, I didn't have access to the data to do the district by district predictions in Nebraska and Maine.

I did well predicting that election at the end and I did well in the general election of 2012. The best known predictor now is Nate Silver. I beat him in both 2008 and 2012. In his book The Signal and the Noise, he believes predictions are getting better and wants to determine why some do well and others don't.

My view right now is that good predictions come not from particularly clever prognosticators but from data that isn't very random.

Besides the very accurate predictions both Silver and I made in the general election, we also tried to predict the GOP primaries in late 2011 and early 2012. Our records in these contests were much worse, even though one of the central tenets of statistics says the margin of error of a percentage near zero should be much less than the margin of error of a percentage near 50%.
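The tenet in question is the textbook margin of error for a proportion, roughly z times the square root of p(1 - p)/n. Here is a minimal Python sketch, assuming a simple random sample and the usual 95% multiplier z = 1.96. The sample size of 1,000 and the two percentages are made up for illustration.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a polled proportion p with sample size n,
    using the normal approximation at roughly 95% confidence."""
    return z * sqrt(p * (1 - p) / n)

n = 1000  # hypothetical sample size
for p in (0.05, 0.50):
    print(f"p = {p:.2f}: +/- {100 * margin_of_error(p, n):.1f} points")
```

For a candidate polling at 5% the margin comes out near 1.4 points, against about 3.1 points for a candidate at 50%. On paper the small numbers in a crowded primary should be the easier ones to pin down, which is what made our poor records there so striking.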

My hypothesis for this less valuable data is that the GOP electorate was in a serious state of flux during the primaries.  A ridiculous number of people were shown to be the front runners according to national polls and results of state primaries and caucuses. When Donald Trump made completely implausible noises about joining the race in the summer of 2011, his name shot to the top of the polls. Other people who topped the polls included Rick Perry, Herman Cain, Newt Gingrich, Rick Santorum and finally Mitt Romney. It should also be noted that Michele Bachmann won a straw poll and Ron Paul won caucuses.

Conservative media, most notably Rush Limbaugh, did not love Mitt Romney, and the GOP voters took quite a while to decide he was the one they wanted. Once he was chosen, he never had a lead over Obama. The GOP base hates Obama with a white hot hate, but it did not translate into the kind of love for their candidate that could build a winning campaign.

Trying to figure out the electorate is just one of the topics on which I have collected massive amounts of data this century. Back when Bush was president, I collected data about casualties in Iraq and Afghanistan and filled spreadsheets with the price fluctuations of silver and gold and crude oil, trying to detect trends and correlations. I didn't make predictions based on these things, but I used them as an antidote to news gathering organizations that said everything was peachy keen because their important contacts in Washington said everything was peachy keen.

A few years back, I decided to start a blog to keep track of the headlines of the supermarket checkout magazines. My original idea was to keep track of the things predicted that could be verified or falsified, like predictions of deaths, pregnancies, marriages or divorces. Soon enough, mission creep set in and I was putting up posts about every headline. Soon enough that meant keeping track of every belch from a Kardashian or fart from a Teen Mom. Diminishing returns set in.

I did keep special track of the people they predicted would die. Their track record was awful. They have had some famous successes. The National Enquirer said Michael Jackson had six months to live and within six months he was dead. They had similar good calls with Gary Coleman and Peter Falk. But they also suck a lot. Among the people who should already be dead by now are Bill Clinton, Queen Elizabeth, Loretta Lynn, Rush Limbaugh and Michael Douglas.

I put a picture of Anne Francis here because when I used forty names of people they predicted to die in a deadpool, I only got two hits in 2011, Ms. Francis and Elizabeth Taylor. They were finally right about Miss Taylor, but they had her on death's door off and on since before Butterfield Eight was released.

The latest fancies in my obsession with predictions can be seen in my blog This Day In Science Fiction and my work with climate data on my other new blog Math Year 2013. Tomorrow I will discuss the few successes and many failures I have found there.