
The First Rule of Regression Analysis

Here is the first thing I was ever taught about regression analysis: never, ever use multi-variable regression analysis to go on a fishing expedition.  In other words, never throw in a bunch of random variables and see what turns out to have the strongest historical relationship.  If you don’t understand the relationship between the variables and why you got the answer that you did, the odds are that the result is spurious.
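To see why fishing expeditions are dangerous, here is a minimal sketch using made-up random numbers (nothing here is real economic or climate data, and the function name and parameters are my own for illustration).  It correlates a random "target" series with 200 equally random candidate series and reports the strongest correlation it finds.  By construction none of the candidates has any relationship to the target, yet the best one usually looks respectable.

```python
import numpy as np

def best_spurious_correlation(n_obs=30, n_candidates=200, seed=0):
    """Correlate a random target with many unrelated random series.

    Returns the largest absolute Pearson correlation found. Every series
    is independent noise, so any correlation is spurious by construction.
    """
    rng = np.random.default_rng(seed)
    target = rng.standard_normal(n_obs)
    candidates = rng.standard_normal((n_candidates, n_obs))
    # Pearson correlation of each candidate series with the target
    corrs = np.array([np.corrcoef(c, target)[0, 1] for c in candidates])
    return float(np.max(np.abs(corrs)))

strongest = best_spurious_correlation()
print(f"strongest |r| among 200 unrelated series: {strongest:.2f}")
```

The more candidate variables you throw in, and the fewer observations you have, the stronger the best accidental correlation tends to be.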

The purpose of a regression analysis is to confirm and quantify a relationship that you have a theoretical basis for believing to exist.  For example, I might think that home ownership rates would drop as interest rates rise, and vice versa, because interest rate increases effectively increase the cost of a house, and therefore should reduce the demand.  This is a perfectly valid proposition to test.  What would not be valid is to throw interest rates, population growth, regulatory levels, skirt lengths, Super Bowl winners, and yogurt prices together into a regression with housing prices and see what pops up as having a correlation.  Another red flag: had we run our original regression between home ownership and interest rates and found the opposite of the result we expected, with home ownership rising with interest rates, we would need to be very suspicious of the correlation.  If we don’t have a good theory to explain it, we should treat the result as spurious, likely the result of mutual correlation of the two variables to a third variable, or the result of time lags we have not considered correctly, etc.
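The "third variable" problem can be sketched the same way.  In this illustrative example (synthetic data, with variable names that are my own assumptions), `x` and `y` never influence each other; both are driven by a hidden common factor `z`.  They still come out strongly correlated, and the apparent effect of `x` on `y` largely disappears once `z` is included in the regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

z = rng.standard_normal(n)             # hidden common driver
x = z + 0.5 * rng.standard_normal(n)   # x depends on z only
y = z + 0.5 * rng.standard_normal(n)   # y depends on z only, not on x

# Naive view: x and y look strongly related
raw_corr = np.corrcoef(x, y)[0, 1]

# Controlled view: regress y on an intercept, x, and z together
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
slope_x_controlled = coef[1]

print(f"raw correlation of x and y:       {raw_corr:.2f}")
print(f"slope on x after controlling z:   {slope_x_controlled:.2f}")
```

Of course, in real life you only get to control for the third variable if you have a theory that tells you to look for it.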

Makes sense?  Well, then, what do we make of this:  Michael Mann builds temperature reconstructions from proxies.  An example is tree rings.  The theory is that warmer temperatures lead to wider tree rings, so one can correlate tree ring growth to temperature.  The same is true for a number of other proxies, such as sediment deposits.

In the particular case of the Tiljander sediments, Steve McIntyre observed that Mann had included the data upside down – meaning he had essentially reversed the sign of the proxy data.  This would be roughly equivalent to our running our interest rate – home ownership regression but plugging in the changes in home ownership with the wrong sign (i.e., decreases shown as increases and vice versa).

You can see that the data was used upside down by comparing Mann’s own graph with the orientation of the original article, as we did last year. In the case of the Tiljander proxies, Tiljander asserted that “a definite sign could be a priori reasoned on physical grounds” – the only problem is that their sign was opposite to the one used by Mann. Mann says that multivariate regression methods don’t care about the orientation of the proxy.

The world is full of statements that are strictly true and totally wrong at the same time.  Mann’s statement that multivariate regression methods don’t care about the orientation of the proxy is such a case.  It is strictly true – the regression does not care if you get the sign right, it will still find a correlation.  But it is totally insane, because it implies that the correlation the regression finds is exactly the opposite of what your physics told you to expect.  It’s like getting a positive correlation between interest rates and home ownership.  Or finding that tree rings got larger when temperatures dropped.
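The "strictly true" part is easy to check.  The sketch below uses a simple one-variable least-squares fit on synthetic data (it is not Mann's actual method or data, and the `fit_line` helper is just my own illustration).  Flipping the sign of the proxy leaves the goodness of fit untouched and merely flips the sign of the fitted slope, so the regression happily reports that temperature falls as the proxy rises.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y on x; returns (slope, r_squared)."""
    slope, _intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
    r_squared = np.corrcoef(x, y)[0, 1] ** 2
    return slope, r_squared

rng = np.random.default_rng(2)
temperature = rng.standard_normal(200)
# Made-up proxy that, by construction, rises with temperature
proxy = 2.0 * temperature + rng.standard_normal(200)

slope_ok, r2_ok = fit_line(proxy, temperature)
slope_flipped, r2_flipped = fit_line(-proxy, temperature)  # proxy used upside down

print(f"correct orientation: slope={slope_ok:+.3f}, R^2={r2_ok:.3f}")
print(f"flipped orientation: slope={slope_flipped:+.3f}, R^2={r2_flipped:.3f}")
```

The fit statistics cannot tell you which orientation is physically right.  Only the theory behind the proxy can do that.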

This is a mistake that Mann seems to make a lot: he gets buried so far down into the numbers that he forgets they have physical meaning.  They are describing physical systems, and what they are saying in this case makes no sense.  He is using a proxy that is essentially behaving exactly the opposite of what his physics tells him it should – in fact behaving exactly opposite to the whole theory of why it should be a proxy for temperature in the first place.  And this does not seem to bother him enough to toss it out.

PS:  These flawed Tiljander sediments matter.  It has been shown that the Tiljander series have an inordinate influence on Mann’s latest proxy results.  Remove them, along with a couple of other flawed proxies (and by flawed, I mean ones with manually made-up data), and much of the hockey stick shape he loves so much goes away.