Thursday, October 22, 2009

Tricking yourself into cherry picking.

15 October, 2009 (11:30) | Climate Sensitivity Written by: lucia

Did you know you can cherry pick without knowing it? It works like this:

You speculate that “some trees” are temperature proxies, but “other trees” aren’t. (So far, you’re actually ok.)
Then, instead of trying to do a real calibration study to discover what sorts of trees are temperature proxies and which aren’t, you just take a bunch of cores and find which correlate “best” with the recent temperature record. You throw away all the rest of the cores as “not temperature proxies”.
This sort of sounds like it makes sense, right? After all, the trees that did not correlate with the current temperature record can’t be temperature proxies. So, the rest may be a little noise, but they “must” be temperature proxies, right?
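The flaw is easy to see even before the full simulation: if you screen many pure-noise series against a target and keep the best correlated, the survivors “match” by construction. Here’s a minimal Python sketch of that (everything here is illustrative — nothing comes from an actual tree-ring dataset):

```python
import random

random.seed(0)

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# a "temperature record" and 100 cores, all independent white noise:
# by construction, no core is actually sensitive to the target
target = [random.gauss(0, 1) for _ in range(40)]
cores = [[random.gauss(0, 1) for _ in range(40)] for _ in range(100)]

# screening: the single best |correlation| among 100 pure-noise cores
best = max(abs(corr(c, target)) for c in cores)
print(f"best |r| among pure noise: {best:.2f}")
```

With 100 independent draws, the best |r| typically lands well above 0.3 even though every core is noise. Select on that statistic and the “surviving” cores are guaranteed to track the target over the screening window — which is exactly the trap described above.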

In comments, Layman Lurker shed some light on part of “the argument” between the Lorax and Hu M. Here’s what Layman relates:

IMO Lorax was a bit different. In one situation he asked something to the effect of why it would not be good practice to reject proxies which did not correlate with the instrumental record. Hu M. responded saying (paraphrasing from memory) that “the method you advocate on the surface this seems attractive” but that it would amount to cherry picking and would bias the analysis. Lorax took great offense and went off on Hu in a couple of follow up comments saying that Hu had stated that he (Lorax) advocated cherry picking. I don’t think anyone with any common sense would have interpreted Hu’s response to Lorax’s question that way.

Well… Hu is right. The method of simply rejecting the trees that fail to correlate does automatically bias a sample. Seems odd, but it’s true. So, even though the method seems reasonable, and the person doing it doesn’t intend to cherry pick, if they don’t do some very sophisticated things, rejecting trees that don’t correlate with the recent record biases an analysis. It encourages spurious results, and in the context of the whole “hockey stick” controversy, effectively imposes hockey sticks on the results.

I’m going to show a little back of the envelope analysis that highlights the point Hu was making.

Method of creating hockey stick reconstructions out of nothing
To create “hockey stick” reconstructions out of nothing, I’m going to do this:

1. Generate roughly 148 years’ worth of monthly “tree-ring” data using rand() in EXCEL, corresponding to 1850-1998. I will impose autocorrelation with r=0.995. I’ll repeat this 154 times. (This number is chosen arbitrarily.) On the one hand, we know these series don’t correlate with Hadley, because they are synthetically generated. However, we are going to pretend we believe “some” are sensitive to temperature and see what sort of reconstruction we get.
2. To screen out the series that prove themselves insensitive to temperature, I’ll compute the correlation, R, between the Hadley monthly temperature data and the tree-ring data for each of the 154 series. To show the problem with this method, I will compute the correlation only over the years 1960-1998. Then, I’ll keep all series whose correlations have absolute values greater than 1.2 times the standard deviation of the 154 correlations R. I’ll assume the other randomly generated monthly series are “not sensitive” to temperature and ignore them. (Note: the series with negative values of R are the equivalent of “upside down” proxies.)
3. I’ll create a reconstruction by simply averaging over the “proxies” that passed the test just described. I’ll rebaseline so the average temperature and trend for the proxy and Hadley match over 1960-1998.
4. I’ll plot the average from the proxies and compare it to Hadley.
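The steps above can be sketched in code. This is a hedged stand-alone reimplementation, not the EXCEL spreadsheet: the real Hadley series isn’t bundled here, so a red-noise-plus-late-uptick stand-in plays its role; the rebaselining matches means only (not trends); and, for simplicity, negative-R survivors are flipped so they reinforce rather than cancel in the average:

```python
import random

random.seed(1)
N_MONTHS = 12 * 149      # monthly values spanning 1850-1998
N_SERIES = 154
CAL = 12 * 110           # index of Jan 1960: calibration window is 1960-1998

def red_noise(n, r=0.995):
    """AR(1) red noise: x[t] = r * x[t-1] + white noise."""
    x = [random.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(r * x[-1] + random.gauss(0, 1))
    return x

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# stand-in "Hadley" record (hypothetical: red noise plus a late uptick)
temps = red_noise(N_MONTHS, 0.9)
temps = [t + 0.002 * max(0, i - CAL) for i, t in enumerate(temps)]

# step 1: 154 synthetic "tree-ring" series -- none is a real proxy
cores = [red_noise(N_MONTHS) for _ in range(N_SERIES)]

# step 2: screen on correlation with "temperature" over 1960-1998 only
rs = [corr(c[CAL:], temps[CAL:]) for c in cores]
mean_r = sum(rs) / len(rs)
sd_r = (sum((r - mean_r) ** 2 for r in rs) / len(rs)) ** 0.5
survivors = [c if r > 0 else [-v for v in c]   # flip "upside down" proxies
             for c, r in zip(cores, rs) if abs(r) > 1.2 * sd_r]

# steps 3-4: average the survivors, then rebaseline (mean only, for brevity)
recon = [sum(vals) / len(vals) for vals in zip(*survivors)]
shift = (sum(temps[CAL:]) - sum(recon[CAL:])) / (N_MONTHS - CAL)
recon = [v + shift for v in recon]
print(f"{len(survivors)} of {N_SERIES} series passed the screen")
```

Because every survivor was selected (and, in this sketch, sign-flipped) for positive correlation with “temperature” over 1960-1998, their average is mathematically guaranteed to correlate with the stand-in record over that window — even though every input is pure noise.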
The comparison from one (typical) case is shown below. The blue curve is the “proxy reconstruction”; the yellow is the Hadley data (all data are 12-month smoothed).

Notice that after 1960, the blue curve based on the average of “noise” that passed the test mimics the yellow observations. It looks good because I screened out all the noise that was “not sensitive to temperature”. (In reality, none is sensitive to temperature. I just picked the series that didn’t happen to fail.)

Because the “proxies” really are not sensitive to temperature, you will notice there is no correspondence between the blue “proxy reconstruction” and the yellow Hadley data prior to 1960. I could do this exercise a bajillion times and I’ll always get the same result. After 1960, there are always some “proxies” that by random chance correlate well with Hadley. If I throw away the other “proxies” and average over the “sensitive” ones, the series looks like Hadley after 1960. But before 1960? No dice.

Also notice that when I do this, the “blue proxy reconstruction” prior to 1960 is quite smooth. In fact, because the proxies are not sensitive, the past history prior to the “calibration” period looks unchanging. If the current period has an uptick, applying this method to red noise will make the current uptick look “unprecedented”. (The same would happen if the current period had a downturn, except we’d have unprecedented cooling.)

The red curve
Are you wondering what the red curve is? Well, after screening once, I screened again. This time, I looked at all the proxies making up the “blue” curve, and checked whether they correlated with Hadley during the period from 1900-1960. If they did not, I threw them away. Then I averaged to get the red line. (I did not rebaseline again.)
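The second screen works the same way. The sketch below is self-contained, so it fabricates its own “first-screen survivors” as fresh red noise alongside a stand-in temperature record (both hypothetical, as before); the 0.2 cutoff is an arbitrary illustrative threshold, since the post doesn’t state one. The point is only the mechanics of re-screening over 1900-1960:

```python
import random

random.seed(2)
N_MONTHS = 12 * 149                 # monthly values spanning 1850-1998
EARLY = slice(12 * 50, 12 * 110)    # 1900-1960 "confirmation" window

def red_noise(n, r=0.995):
    """AR(1) red noise: x[t] = r * x[t-1] + white noise."""
    x = [random.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(r * x[-1] + random.gauss(0, 1))
    return x

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# hypothetical inputs: a stand-in temperature record and 20 series that
# "passed" a first screen (here just fresh red noise, for self-containment)
temps = red_noise(N_MONTHS, 0.9)
blue_pool = [red_noise(N_MONTHS) for _ in range(20)]

# second screen: also demand correlation over 1900-1960, then average
confirmed = [s for s in blue_pool if corr(s[EARLY], temps[EARLY]) > 0.2]
red_curve = ([sum(v) / len(v) for v in zip(*confirmed)] if confirmed else [])
print(f"{len(confirmed)} of {len(blue_pool)} survived the second screen")
```

Because red noise with r near 1 wanders slowly, some pure-noise series will pass this “confirmation” too — which is why the red curve tracks the wiggles from 1900-1960 without containing any signal.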

The purpose of the second step is to “confirm” the temperature dependence.

Having done this, I get a curve that sort of looks like Hadley from 1900-1960. That is: the wiggles sort of match. The “red proxy reconstruction” looks very much like Hadley after 1960: both the “wiggles” and the “absolute values” match. It’s also “noisier” than the blue curve; that’s because it contains fewer “proxies”.

But notice that prior to 1900, the wiggles in the red proxy and the yellow Hadley data don’t match. (Also, the red proxy wants to “revert to the mean.”)

Can I do this again? Sure. Here are the two plots created on the next two “refreshes” of the EXCEL spreadsheet:

I can keep doing this over and over. Some “reconstructions” look better; some look worse. But these don’t look too shabby when you consider that none of the “proxies” are sensitive to temperature at all. This is what you get if you screen red noise.

Naturally, if you use real proxies that contain some signal, you should do better than this. But knowing you can get this close with nothing but noise should make you suspect that screening based on a known temperature record can bias your answers to:

1. make a “proxy reconstruction” based on nothing but noise match the thermometer record, and
2. make the historical temperature variations look flat and unvarying.
So, Hu is correct. If you screen out “bad” proxies based on a match to the current temperature record, you bias your answer. Given the appearance of the thermometer record during the 20th century, you bias toward hockey sticks!

Does this mean it’s impossible to make a reliable reconstruction? No. It means you need to think very carefully about how you select your proxies. Just screening to match the current record is not an appropriate method.