On Quality Control of Critical Data Sets

A few weeks ago, Gavin Schmidt of NASA came out with a fairly petulant response to critics who found an error in NASA's GISS temperature database.  Most of us spent little time criticizing this particular error, but instead criticized Schmidt's unhealthy distaste for criticism and the general sloppiness and lack of transparency in the NOAA and GISS temperature adjustment and averaging process.

I don't want to re-plow old ground, but I can't resist highlighting one irony.  Here is Gavin Schmidt in his recent post on RealClimate:

It is clear that many of the temperature watchers are doing so in order to show that the IPCC-class models are wrong in their projections. However, the direct approach of downloading those models, running them and looking for flaws is clearly either too onerous or too boring.

He is criticizing skeptics for not digging into the code of the individual climate models, and for focusing only on how their output forecasts hold up (a silly criticism I dealt with here).  But this is EXACTLY what folks like Steve McIntyre have been trying to do for years with the NOAA, GHCN, and GISS temperature metric code.  Finding nothing about the output that makes sense given the raw data, they have asked to examine the source code.  And they have met with resistance at every turn by, among others, Gavin Schmidt.  As an example, here is what Steve typically gets when he tries to do exactly as Schmidt asks:

I'd also like to report that over a year ago, I wrote to GHCN asking for a copy of their adjustment code:

I’m interested in experimenting with your Station History Adjustment algorithm and would like to ensure that I can replicate an actual case before thinking about the interesting statistical issues.  Methodological descriptions in academic articles are usually very time-consuming to try to replicate, if indeed they can be replicated at all. Usually it’s a lot faster to look at source code in order to clarify the many little decisions that need to be made in this sort of enterprise. In econometrics, it’s standard practice to archive code at the time of publication of an article – a practice that I’ve (by and large unsuccessfully) tried to encourage in climate science, but which may interest you. Would it be possible to send me the code for the existing and the forthcoming Station History adjustments. I’m interested in both USHCN and GHCN if possible.

To which I received the following reply from a GHCN employee:

You make an interesting point about archiving code, and you might be encouraged to hear that Configuration Management is an increasingly high priority here. Regarding your request — I'm not in a position to distribute any of the code because I have not personally written any homogeneity adjustment software. I also don't know if there are any "rules" about distributing code, simply because it's never come up with me before.

I never did receive any code from them.

Here, by the way, is a statement from the NOAA web site about the GHCN data:

Both historical and near-real-time GHCN data undergo rigorous quality assurance reviews. These reviews include preprocessing checks on source data, time series checks that identify spurious changes in the mean and variance, spatial comparisons that verify the accuracy of the climatological mean and the seasonal cycle, and neighbor checks that identify outliers from both a serial and a spatial perspective.

But we will never know, because they will not share the code developed at taxpayer expense by government employees to produce official data.

A year or so ago, after intense pressure and the revelation of another mistake (again by the McIntyre/Watts online communities), GISS did finally release some of its code.  Here is what was found:

Here are some more notes and scripts in which I've made considerable progress on GISS Step 2. As noted on many occasions, the code is a demented mess – you'd never know that NASA actually has software policies (e.g. here or here). I guess that Hansen and associates regard themselves as being above the law. At this point, I haven't even begun to approach analysis of whether the code accomplishes its underlying objective. There are innumerable decoding issues – John Goetz, an experienced programmer, compared it to descending into the hell described in a Stephen King novel. I compared it to the meaningless toy in the PPM children's song – it goes zip when it moves, bop when it stops and whirr when it's standing still. The endless machinations with binary files may have been necessary with Commodore 64s, but are totally pointless in 2008.

Because of the hapless programming, it takes a long time and considerable patience to figure out what happens when you press any particular button. The frustrating thing is that none of the operations are particularly complicated.

So Schmidt's encouragement that skeptics should go dig into the code was a) obviously not meant to be applied to his code and b) roughly equivalent to a mom answering her kids' complaint that they were bored and had nothing to do with "you can clean your rooms", something that looks good in the paper trail but is not really meant to be taken seriously.  As I said before:

I am sure Schmidt would love us all to go off on some wild goose chase in the innards of a few climate models and relent on comparing the output of those models against actual temperatures.

  • stan

    I am still waiting for the credibility of the AGW case to be seriously damaged by the behavior of the alarmist scientists. Why does Gavin have any credibility? If they won’t share their code, why should we believe they do quality science? When they stonewall on their obligations to share comments per IPCC, why don’t we ignore their views on science? Who would trust the science performed by people who claim the dog ate their homework?

    If the alarmist scientists were trustworthy, they would have responded to the surface station volunteer findings by immediately demanding that the problem be fixed. If they were trustworthy people, they would have rejected Mann when it became clear he’d played fast and loose. If they were trustworthy people, they would insist on transparency and data sharing.

    Bottom line — these aren’t trustworthy people with a commitment to honest science. Just look at their behavior. So why do they have any credibility?!

  • hunter

    Secretive, arrogant, self-referential, defensive, authoritarian: all describe the AGW industry of today.
    I will bet you that every stat and model used is full of these characteristics.
    This so reeks of people who have designed models to prove their assertions that I am astonished.
    I thought as AGW fell apart we would see good will, but wrong.
    Instead we see disorganization being used to hide lying.
    These promoters and charlatans dare to demean and damage those skeptics who stand up and question them?

  • cfdman

    Have patience,

    The CAGW community is wrong, they know it and are becoming quite defensive. We in the skeptic community should see this as the victory it truly is. The more they hide and the more they double talk, the less credibility they have. Stopping this bloated money train will take years.

    Just remember that physics is on our side, the earth is cooling just as it should due to obvious inputs and natural cycles (solar, PDO, etc). The longer they predict silly temperature rises and those increases fail to appear, the more ridiculous they look and sound.

    This site is really quite awesome in bringing out the worst in the CAGW crowd. They show up and end up looking like argumentative ass clowns.

    I am totally optimistic this garbage will be gone in ten years, then we can move on to the next unjustified panic to create careers and sell stories. Can anyone say “giant, earth killing meteor” or maybe “alien invasion”?

  • ErikTheRed

    This story has hit Drudge via a snarky (at Hansen, et al) article in The Telegraph. It’s also now making the rounds in the blogs.

  • hunter

    The pattern has been that as AGW prophecies fall apart, the AGW prophets ratchet up the fear.
    Expect a dramatic and concerted effort from the AGW academic-political-profiteer complex to ‘seal the deal’ before the rest of their fear mongering is found out.