The fast and the spurious

Correlations between data sets are magical things: they tell us that two variables move together, and encourage us to claim that the two are linked. Politicians and advocates do this all the time. We hear it in advertisements and read it in promotional copy. Causality claims are so much a part of the chatter around us that we rarely give them a second thought. But a fun little website goads us to give them a third, fourth, and fifth thought.

Spurious Correlations discovers and shares multiple pairs of data sets that track together but make no logical sense together. For example, the divorce rate in Maine correlates almost perfectly with per capita consumption of margarine. Or, the number of honey-producing bee colonies is inversely correlated with juvenile arrests for possession of marijuana. A personal favorite for the arts advocates among you: the number of works of visual art copyrighted in the US inversely correlates with the number of New York females who slipped or tripped to their death.

Granted, the site’s author admits he’s using low-grade correlation calculations and a script he wrote quickly for fun. But the results are still highly effective at nudging our lazy assumptions about data.
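
To see why a quick script can turn up so many matches, here is a minimal Python sketch. It is not the site's code: the 40 series, the 10 "years" of data, and the `random_walk` helper are all invented for illustration. It builds series of pure noise, compares every pair, and reports the best match it finds.

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(7)  # fixed seed so the made-up data is repeatable

def random_walk(length):
    """Hypothetical 'annual statistic': each year drifts randomly from the last."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk

# 40 unrelated series of 10 "years" each (both numbers are arbitrary).
series = [random_walk(10) for _ in range(40)]

# Compare all 780 possible pairs and keep the one that tracks most closely.
best_pair = max(
    itertools.combinations(range(len(series)), 2),
    key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
)
best_r = pearson(series[best_pair[0]], series[best_pair[1]])
print(f"series {best_pair[0]} and {best_pair[1]}: r = {best_r:.2f}")
```

None of these series has anything to do with any other, yet with hundreds of pairs to choose from, the best one tends to look impressively tight. Search enough data and some pair will always move together.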

Imagine an infuriated politician showing one of the charts above and demanding policy action: we MUST produce more works of visual art or more females will trip to their death! We must stop the scourge of marijuana possession by offering bee colony tax credits. And, of course, we must discourage margarine consumption to save marriages in Maine (which may be the National Dairy Council’s new tagline).

We all know (don’t we?) that correlation does not imply causality. Just because data sets move together doesn’t mean that one causes another. Sometimes there’s a “lurking variable,” a separate element not in either data set, that may be in play (for example, research would likely show a high correlation between opening your umbrella and your feet getting wet, with rain driving both). Sometimes, the correlation is just plain spurious. The data sets moved together because they did.
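
The umbrella example can be sketched in a few lines of Python. Everything here is made up for illustration: the rainfall figures, the noise levels, and the variable names are assumptions, not real data. The sketch lets rain drive both umbrella use and wet feet, then checks the correlation twice: once raw, and once as a standard partial correlation that removes the linear effect of rain.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the made-up data is repeatable

# Hypothetical daily rainfall, plus two measures that each depend on rain
# (and on their own independent noise), but not on each other.
rain = [rng.uniform(0, 10) for _ in range(2000)]
umbrella = [r + rng.gauss(0, 1) for r in rain]
wet_feet = [r + rng.gauss(0, 1) for r in rain]

raw = pearson(umbrella, wet_feet)
controlled = partial_corr(umbrella, wet_feet, rain)
print(f"umbrella vs. wet feet: {raw:.2f}")
print(f"same pair, holding rain fixed: {controlled:.2f}")
```

In this toy setup the raw correlation is strong, and it mostly vanishes once rain is accounted for. Real data is rarely this tidy, and the trick only works when you know which lurking variable to look for.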

So, before we embargo all future Nicolas Cage movies to avoid additional drowning deaths, we had best stop and think about the data and its analysis. There are other statistical reasons to limit Nicolas Cage movies, but drowning is probably not among them. I’ll keep looking.