Last week, blog buddy Namaste Nancy provided a link to a New York Times science blog, usually written by an evolutionary biologist but for a few weeks guest-written by Cornell mathematician Steven Strogatz. His topic was Zipf's Law, which says that if you list the cities of a nation in order from largest to smallest, the second largest city will be about half the size of the largest, the third largest will be about one third the size of the largest, the fourth largest about one fourth the size, and so on. The general statement is that if you divide the population of the n-th largest city in the land by the population of the largest city, the fraction should be about 1/n.
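As a rough illustration of the rule just described, here is a minimal Python sketch that turns the largest city's population into Zipf's predicted populations for the next few ranks, using P_n ≈ P_1 / n. The function name `zipf_predictions` and the 1,200,000 figure are hypothetical, chosen only to show the arithmetic and not taken from any real census.

```python
def zipf_predictions(largest_population: float, count: int) -> list[float]:
    """Return Zipf's Law predicted populations for ranks 1..count.

    Rank n is predicted to have largest_population / n people.
    """
    return [largest_population / n for n in range(1, count + 1)]


if __name__ == "__main__":
    # Hypothetical largest city of 1,200,000 people (illustrative only).
    for rank, pop in enumerate(zipf_predictions(1_200_000, 5), start=1):
        print(f"rank {rank}: {pop:,.0f}")
    # prints rank 1: 1,200,000 / rank 2: 600,000 / rank 3: 400,000
    #        rank 4: 300,000 / rank 5: 240,000 (one per line)
```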
I had never heard of Zipf's Law before. I thought it was kind of cool. Any formula that can be stated so simply is compelling. There's only one problem with Zipf's Law. It isn't actually true.
The first three countries I decided to check were the United States, China and Canada. I didn't pick them at random. My thought process was as follows. First, I'm an American. Second, I knew China was about the size of the continental U.S., but much more populous. Similarly, Canada has about the land mass of the Lower 48, but a much smaller population. I then decided to check some countries with a lot less land mass that still had enough cities to test: Germany, Japan and the Netherlands.
I don't know how well the United States fit Zipf's Law when George Kingsley Zipf stated it back in the 1930s, but right now American cities fit the rule pretty well. On the other hand, China doesn't fit Zipf's Law at all. (Bad fits are marked in yellow.) The second biggest city in China is about three fourths the size of the largest, where the law predicts one half, and the tenth biggest city is well over one third the size of the largest, where the law predicts one tenth. The Netherlands, the smallest country on this limited list, follows China's lead and doesn't fit Zipf's Law very well. Canada and Japan have large cities that aren't quite the right size, and the eighth, ninth and tenth cities on the German list are noticeably larger than Zipf's Law projects.
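The post doesn't say exactly how far off a city had to be before it was marked in yellow. One way to make that kind of check repeatable is sketched below in Python, under an assumed rule: a rank counts as a bad fit when its actual share of the largest city's population differs from 1/n by more than 25% of 1/n. The function name `zipf_fit`, the 25% cutoff, and the sample populations are all hypothetical, not the criterion or data behind the original table.

```python
def zipf_fit(populations: list[float], tolerance: float = 0.25) -> list[tuple[int, float, float, bool]]:
    """Compare each city's share of the largest city's population with Zipf's 1/n.

    Returns (rank, actual_ratio, predicted_ratio, is_bad_fit) for each city.
    A rank is flagged when |actual - 1/n| exceeds `tolerance` times 1/n.
    The 25% default is an arbitrary, hypothetical cutoff.
    """
    if not populations:
        raise ValueError("need at least one city")
    ranked = sorted(populations, reverse=True)  # largest city first
    largest = ranked[0]
    rows = []
    for n, pop in enumerate(ranked, start=1):
        actual = pop / largest
        predicted = 1 / n
        bad = abs(actual - predicted) / predicted > tolerance
        rows.append((n, actual, predicted, bad))
    return rows


if __name__ == "__main__":
    # Made-up populations for illustration only, not real census figures.
    for n, actual, predicted, bad in zipf_fit([1000, 750, 330]):
        print(f"{n:>2}  actual {actual:.3f}  Zipf {predicted:.3f}  {'BAD' if bad else 'ok'}")
```

With the made-up input above, rank 2 is flagged (0.75 against a predicted 0.5) while rank 3 is not (0.33 against about 0.333). Measuring the gap relative to 1/n means a tenth city at 0.15 of the largest still counts as 50% off, even though the absolute gap is only 0.05.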
Rules and definitions in math are very strict, but they are also completely reliable. When we state that the interior angles of a triangle in a plane add up to 180 degrees, we don't mean "more or less, for most triangles". We mean always. That's how mathematicians roll. There might be some exceptions to a rule, but even those exceptions hold 100% of the time, such as the rule that you can't divide by zero. There can be multiple ways to solve a problem, but all the methods arrive at the same answer.
Statistics is much sloppier. There are several jokes about a mathematician, a physicist and a statistician. When a mathematician tells these jokes, the math guy knows what he's doing, the physicist kind of knows what he's doing and the statistician is a dunce. When a statistician tells these jokes, the statistician is a very practical person, the physicist somewhat less so and the mathematician is completely divorced from reality.
In the sciences, there is a phenomenon known as physics envy. For centuries, physics has been the best customer for mathematics, and some of the math formulas in physics are as reliable as the math that describes the abstract world of plane geometry. As physics has dealt with more complex objects and the odd world of what happens at sizes smaller than atoms, some physical laws have become statistical rather than strictly mathematical in nature. Einstein famously hated this; faced with the probabilistic nature of quantum mechanics, he is quoted as saying "God does not play dice with the universe." Allegedly, either Enrico Fermi or Niels Bohr answered "Albert! Stop telling God what to do with his dice!" (My money's on Bohr. He has plenty of other quotes that display the same sense of humor.)
I did some research into statistics textbooks this year, and the disagreements over how to define simple concepts that students in elementary classes should be taught are appalling. While some of the topics in statistics are compelling, I've also gained a greater appreciation for the jokes whose punchlines portray statisticians as people who can't find their own ass with both hands and a flashlight.