Longtime readers will know that I love to collect data. Many blog posts have had a collection of data as their jumping-off point, but there are times when I collect data hoping to see a pattern and none becomes apparent, or I see some trend but I'm of several minds about how to present it.

One type of data set I have been collecting for two and a half years concerns the average temperature in Oakland. The website **Weather Underground** publishes not only the daily temperature highs and lows but also compares each day to the average over the last fifteen years. I have used this data in my statistics classes to show how to enter large data sets into a calculator using frequency tables. Most Texas Instruments calculators will balk at a list of 365 or 366 values, but because many values repeat, a frequency table lets us get the whole set in and compute the important statistics: notably the five-number summary (an old-school way to look at outliers) and the average and standard deviation (the more modern way to decide which numbers on a list are remarkably high or remarkably low).

This is a dot-plot of the 366 days of 2016 in Oakland, each day listed as the number of degrees above or below the average for the previous fifteen years. The tallest stack of dots is at zero degrees; this is the mode of the set. Obviously, there are a lot more dots to the right of the tallest stack than to the left. The other two famous measures of center, the mean and the median, are not so apparent from this graph. The median is 2 and the average is about 2.604, with a standard deviation of 5.659. Simply put, the more commonly used measures of center say 2016 was warmer than the fifteen-year average, which covers most of this century.
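For anyone who wants to reproduce the frequency-table trick without a calculator, here is a minimal Python sketch. The frequency table below is made up for illustration, not the Oakland data, and the function names `expand`, `median` and `summarize` are hypothetical. The quartile rule (median of each half, leaving out the middle value when the count is odd) is an assumption; as far as I know it is what TI-83/84 calculators use.

```python
import math

def expand(freq):
    """Rebuild the sorted list of values from a {value: count} frequency table."""
    values = []
    for value in sorted(freq):
        values.extend([value] * freq[value])
    return values

def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def summarize(freq):
    """Five-number summary, mean, sample SD and mode from a frequency table."""
    vals = expand(freq)
    n = len(vals)
    half = n // 2
    lower = vals[:half]            # values below the median position
    upper = vals[half + n % 2:]    # values above it (middle value skipped if n is odd)
    mean = sum(v * c for v, c in freq.items()) / n
    # Sample standard deviation (divide by n - 1), like the calculator's Sx.
    sd = math.sqrt(sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1))
    return {
        "min": vals[0], "Q1": median(lower), "median": median(vals),
        "Q3": median(upper), "max": vals[-1],
        "mean": mean, "sd": sd,
        "mode": max(freq, key=freq.get),  # most frequent value; first one wins a tie
    }

# Hypothetical table: degrees above (+) or below (-) average -> number of days.
freq = {-3: 1, 0: 4, 1: 2, 2: 3, 5: 2, 9: 1}
summary = summarize(freq)
print(summary)
```

Like the dot-plot, this made-up table has its mode at 0 while the median and mean sit above it, because the long tail is on the warm side.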

You might say this is evidence of climate change in Oakland. I am not 100% convinced. Here are my reasons.

**1. Should I trust the average daily temperatures given by the website?** The averages stay the same for weeks at a time, not even wobbling by a degree. That smells like they are not simply averaging each calendar day's temperature over fifteen years, but maybe averaging several days in a row first and then averaging that over the fifteen years.

**Not sure this is kosher.**
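Here is a rough Python sketch of the kind of averaging I suspect. It is a hypothetical reconstruction, not Weather Underground's documented method: for each day of the year, average a window of neighboring days across all the years, then round to whole degrees. The function names, the seven-day window and the sine-shaped seasonal curve are all assumptions for illustration. The point is that rounding a smoothed curve to whole degrees can leave the "normal" flat for weeks near the top and bottom of the season.

```python
import math

def smoothed_normals(years, window=7):
    """Hypothetical 'normal' for each day: average every year's readings over a
    centered window of `window` days (wrapping around the calendar), then round
    to whole degrees. `years` is a list of equal-length lists of daily temperatures."""
    days = len(years[0])
    k = window // 2
    normals = []
    for d in range(days):
        readings = [year[(d + offset) % days]
                    for year in years
                    for offset in range(-k, k + 1)]
        normals.append(round(sum(readings) / len(readings)))
    return normals

def longest_flat_run(values):
    """Length of the longest stretch of consecutive identical values."""
    best = run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if prev == cur else 1
        best = max(best, run)
    return best

# Made-up seasonal curve: 58 degrees on average, swinging 8 degrees either way.
one_year = [58 + 8 * math.sin(2 * math.pi * (d - 100) / 365) for d in range(365)]
normals = smoothed_normals([one_year])
print(longest_flat_run(normals))
```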

**2. Should I trust the *t*-score method and the *p*-value it produces?** The *t*-score test uses average/(standard deviation) × sqrt(size of set) as the test statistic. In this case, that would be

$$t = \frac{2.604}{5.659} \times \sqrt{366} \approx 8.803$$

This is a crazy big number for a *t*-score, and it produces a *p*-value so small it has to be written in scientific notation: 2.705 × 10^-17. Written in regular notation, this is 0.00000000000000002705, which is crazy close to zero. A paper published with a *p*-value this small is basically saying, "I'm right, so shut the fuck up."
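The *t*-score arithmetic takes one line of Python. This sketch just plugs in the 2016 dot-plot numbers; `t_score` is a name I made up. Turning *t* into a *p*-value needs the *t* distribution with n - 1 degrees of freedom, which the Python standard library doesn't provide (SciPy's `scipy.stats.t.sf` is one option), so the sketch stops at the test statistic.

```python
import math

def t_score(mean, sd, n):
    """One-sample t statistic: average / (standard deviation) x sqrt(sample size)."""
    return mean / sd * math.sqrt(n)

# 2016 dot-plot numbers: average 2.604 degrees F above normal, SD 5.659, 366 days.
t = t_score(2.604, 5.659, 366)
print(round(t, 3))  # 8.803
```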

But let me note here that statistics is math mixed with opinion, and not every statistician loves the *t*-score/*p*-value method used with a data set like this. Most notably, W. Edwards Deming, the famously practical statistician credited with turning the Japanese economy around after World War II, argued that if there was *any* difference between two sets, all you needed was a large enough sample to prove that difference significant. In this case, the large sample size puts a multiplier of sqrt(366), which is about 19, into the formula. Since a *t*-score of 3 already gives a very impressive *p*-value, any average above about 0.16 standard deviations (3/19) clears that bar, so a sample this large all but guarantees an impressive *p*-value.
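To see Deming's point numerically, hold the ratio of average to standard deviation fixed and let the sample size grow: *t* grows like the square root of *n*. The helper names below are hypothetical, the smaller sample sizes are made up for comparison, and the threshold of *t* = 3 is just the rule of thumb from the paragraph above.

```python
import math

def t_score(mean, sd, n):
    """One-sample t statistic: average / (standard deviation) x sqrt(sample size)."""
    return mean / sd * math.sqrt(n)

def ratio_needed(t_target, n):
    """Smallest average/SD ratio that reaches t_target with n observations."""
    return t_target / math.sqrt(n)

# Same average and SD as the 2016 dot-plot, pretending we had fewer days.
for n in (10, 30, 100, 366):
    print(n, round(t_score(2.604, 5.659, n), 2))

# With 366 days, an average only about 0.157 SDs above zero already gives t = 3.
print(round(ratio_needed(3, 366), 3))  # 0.157
```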

**3. How should we think about a year in terms of climate change data?** A hot or cold day is not climate change. I am skeptical about counting a month as a long enough time to have meaning, though Dr Michael E. Mann often tweets about a month being the hottest or second hottest (fill in the month in question) in history. Mann is not an alarmist, as was made clear when he poured cold water on the all-doom-and-gloom *New York* magazine article from earlier this year. While not an alarmist, he does want to keep climate change in the news, and it is a slow-moving process, at least from the standpoint of the 24-hour news cycle.

But I have no problem with thinking of a year as a length of time over which the numbers have meaning when discussing climate change. Personally, I am uncertain whether years should be the basic unit of measure or should be clumped into groups to have clearer meaning. My analogy is this: a year has meaning, but if we compare it to grammar, is a year a sentence, a word, or merely a letter? When I wrote my math blog about climate change, I argued that the basic unit should be the period between strong El Niño years, one that includes a strong La Niña year.

So those are my provisos and quibbles. Here is the data.

**2015:** The temperature in 2015 was 2.605° F warmer than the average of the previous fifteen years, and the standard deviation was 5.659° F. With a sample of 365 days, this data set makes a very convincing argument that things are getting warmer. Using the average and standard deviation method, an unusually cold day would be 9° F or more below average. That happened once. An unusually hot day would be 14° F or more above average. That happened seventeen times, and very unusually hot days would be over 20° F hotter than average, which happened three times.

**2016:** The temperature was 2.242° F warmer than the fifteen-year average, and the standard deviation was 5.447° F. It didn't warm up quite as much as 2015, but the lower standard deviation means the *t*-score/*p*-value result would again be hard to argue against. There were no days that count as unusually cold (again, 9° F colder than average), but eighteen days were 14° F or more hotter than average and six days were more than 19° F hotter than average.

**First seven months of 2017:** So far, the average temperature is 2.321° F warmer than the previous fifteen-year average, with a standard deviation of 5.480° F. No days have been unusually cold so far, twelve have been unusually hot and three have been very unusually hot. The cutoff points for unusually hot and very unusually hot are 14° F above average and 19° F above average, respectively. These thresholds are unchanged from the 2016 numbers, which is not surprising because the averages and standard deviations are so similar.
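The unusual-day cutoffs look like the average plus or minus two standard deviations, with three standard deviations for very unusually hot days. That convention is inferred from the numbers rather than stated above, so treat this Python sketch as an assumption-laden check. The year summaries are the ones reported in this post; `cutoffs` is a hypothetical helper.

```python
# (period, average difference from normal, standard deviation), in degrees F.
YEARS = [
    ("2015", 2.605, 5.659),
    ("2016", 2.242, 5.447),
    ("2017 (Jan-Jul)", 2.321, 5.480),
]

def cutoffs(mean, sd):
    """Assumed convention: 'unusual' is 2 SDs from the mean, 'very unusual' is 3 SDs above."""
    return {
        "unusually cold": mean - 2 * sd,
        "unusually hot": mean + 2 * sd,
        "very unusually hot": mean + 3 * sd,
    }

for label, mean, sd in YEARS:
    print(label, {name: round(value, 1) for name, value in cutoffs(mean, sd).items()})
```

For 2015 the rounded values land on the 9° F, 14° F and 20° F cutoffs used in the text.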

**Conclusion:** Here in Oakland it's getting warmer. 2015 shows the largest change upward, but note that 2015 is part of the fifteen-year average when measuring 2016 and 2017. I'd love to get more raw data from a weather station that has produced data continuously for a few decades, and I have an idea of how to achieve that. I also want to come up with a good way to define a heat wave, and I think I have the start of an idea I need to flesh out.

Tomorrow, another math-y blog post, this time about Trump's approval numbers.
