Grading US Temperature Measurement Sites

Anthony Watts has initiated a nationwide effort to photo-document the climate stations in the US Historical Climate Network (USHCN).  His database of documented sites continues to build at  Some of my experiences contributing to his effort are here and here.

Using criteria and a scoring system devised years ago, based on the design specs of the USHCN and used in practice in France, he has scored the documented stations as follows, with 1 being a highly conforming site and 5 being a site with many local biases and issues.  (Full criteria here)


Note that category 3-5 stations can be expected to exhibit errors from 1-5 degrees C, which is huge both because these stations make up 85% of the stations surveyed to date and because this error is so much greater than the "signal."  The signal we are trying to use the USHCN to detect is global warming, which over the last century is currently thought to be about 0.6C.  This means that the potential error may be 2-8 times larger than the signal.  And don’t expect these errors to cancel out.  Because of the nature of these measurement problems and biases, almost all of these errors tend to be in the same direction – biasing temperatures higher – creating a systematic error that does not cancel out.  Also note that though this may look bad, this situation is probably far better than the temperature measurement in the rest of the world, so things will only get worse when Anthony inevitably turns his attention overseas.
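The error-versus-signal comparison above is simple division, and a minimal sketch makes the bounds explicit. It uses only the two figures quoted in this post (a 1-5 degree C error range and a 0.6 C signal); the variable and function names are just illustrative.

```python
# Figures quoted in the post; names are illustrative only.
ERROR_RANGE_C = (1.0, 5.0)  # expected error for category 3-5 stations, degrees C
SIGNAL_C = 0.6              # estimated warming over the last century, degrees C


def error_to_signal(error_c: float, signal_c: float = SIGNAL_C) -> float:
    """Return how many times larger the potential error is than the signal."""
    return error_c / signal_c


low, high = (error_to_signal(e) for e in ERROR_RANGE_C)
print(f"potential error is {low:.1f}x to {high:.1f}x the signal")
# prints "potential error is 1.7x to 8.3x the signal"
```

That is where the rough "2-8 times" figure comes from.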

Yes, scientists try to correct for these errors, but so far they have done so statistically without actually inspecting the individual installations.  And Steve McIntyre is doing a lot of work right now demonstrating just how haphazard these measurement corrections currently are, though there is some recent hope that things may improve.

  • I’m confused here. If the signal to noise ratio is so low, how do we know that there has been any warming at all, outside of normal cyclic patterns? Satellite measurements?

  • markm

    gmee: There are satellite measurements since about 1979. Unlike the ground weather station measurements, these sample the entire world evenly, and sample at several heights in the air. The measurement satellites have been replaced with improved models at least once, and apparently the data needed a small correction due to the differences between the satellites, but the “fudge factor” is much smaller.

    Before correction, the data showed almost no overall trend. With a correction, it shows a warming trend of less than 1 degree over almost 30 years. And remember, the starting point was in a decade when ground stations were showing unusually low temperatures and the alarmists were talking about an impending ice age. That is, if someone had picked a starting point in the 1970’s by choice rather than because the technology just didn’t exist any earlier, I’d suspect them of trying to bias the results by picking an especially cold starting point. If we had satellite data going back further, it’s anyone’s guess as to whether it would show this decade as warmer or colder than the 1930’s.

    Over longer periods and greater temperature changes, there are indicators that do not depend on measuring instruments, although their accuracy is lower. E.g., you can follow local accounts of first and last snowfall, when the pond froze over and when it thawed, and when it was judged safe for ice skating. You can find measurements of glacial advance and retreat, and of ice forming and breaking up off Alaska. By such methods, it’s quite clear that North America warmed substantially between 1850 and 1930. Since then, temperatures probably dropped a little until the 1970’s, and then began a sharp rise for the last 30 years – but it’s not accurate enough to tell whether the world is now warmer or colder than 1937. (Ground temperature readings with the most recent set of fudge factors actually show the warmest years were in the 1930’s.) Another issue with many of these indicators is that they have a long and unknown lag time between a temperature change and the indicator changing; e.g., a glacier might start melting decades before its retreat is clearly measured, permafrost might have been warming up since the 1930’s and just started visibly melting, etc.

    Going further back, you can also use accounts of where cold-sensitive crops were grown (e.g., vineyards in medieval England), and of subsistence farms being established and then failing in marginal areas, such as the south of Greenland. You can also look at the growth rings in the oldest trees and in pieces of wood that have been preserved from centuries past, and try to untangle how much of the variation in growth per year is due to temperatures versus rain, fertilization, disease, etc. Put these all together and it’s quite clear that western Europe, England, and Greenland were pretty warm in 1000 AD, suffered a significant cooling sometime between then and 1700, and have been recovering since then – with the warming trend starting before most of the industrial-age increase in carbon dioxide. What’s impossible to do with such methods is to clearly establish whether the world is warmer overall than it was 1,000 years ago or not. Southern Greenland certainly isn’t – former Norse farmsteads are still covered in ice and snow year round, and no one would need farm implements and cattle barns where the ground never thaws. Some tree ring data seems to show that it’s warmer now – in a few particular locations. There aren’t enough data points for a world-wide average to be meaningful.

  • Thanks for your in-depth reply. It cleared away the confusion quite nicely. 🙂


  • TCO

    Direct question: what is the basis (calibration study) for the claim that quality 5 stations will be 1-5 degC off? What is the basis for thinking that this will convert to a rising temp with time, rather than a constant offset?

    I warn you that this is really crappy analysis. It’s almost like McKinsey style mooncharts for a dotcom. Or Forrester reports with errors carried forward.

    Before trumpeting this stuff, you ought to parse it.

  • markm

    TCO: The rise with time is generally a step-wise increase over what you’d get at a pristine site, as things are built around an originally category 1 site. Most of these things increase the temperature at the measuring instrument at least part of the year: pavement and walls reflect heat towards it, heated buildings radiate heat, and there are even air-conditioning units venting hot air at the instruments at some sites. However, you aren’t going to see a neat graph showing stepwise increases because temperatures also change by tens of degrees on daily and annual cycles, and randomly fluctuate by tens of degrees with weather patterns. If there’s a nearby station that remained category 1 and tracked with the suspect site until development began around it, then by comparing averages over long periods you can tease out the approximate offset.
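The paired-station comparison described in the comment above can be sketched roughly as follows. This is a hypothetical illustration, not a method used by any agency: all readings are synthetic, and the step size, noise levels, and function name are invented assumptions. It is not a calibration study and says nothing about how large real siting offsets are.

```python
import math
import random
from statistics import mean


def estimate_step_offset(suspect, reference, change_day):
    """Mean (suspect - reference) after change_day minus the same mean before it."""
    diffs = [s - r for s, r in zip(suspect, reference)]
    return mean(diffs[change_day:]) - mean(diffs[:change_day])


# Synthetic data only: every number below is invented for illustration.
rng = random.Random(0)
DAYS = 20 * 365          # twenty years of daily readings
CHANGE_DAY = DAYS // 2   # development begins around the suspect site here
TRUE_STEP_C = 1.5        # assumed warm bias introduced at the suspect site

reference, suspect = [], []
for day in range(DAYS):
    seasonal = 10.0 * math.sin(2 * math.pi * day / 365)  # annual cycle
    weather = rng.gauss(0.0, 5.0)                        # weather shared by both sites
    base = 12.0 + seasonal + weather
    step = TRUE_STEP_C if day >= CHANGE_DAY else 0.0
    reference.append(base + rng.gauss(0.0, 0.5))         # independent instrument noise
    suspect.append(base + step + rng.gauss(0.0, 0.5))

estimated = estimate_step_offset(suspect, reference, CHANGE_DAY)
print(f"estimated offset: {estimated:.2f} C (true synthetic step: {TRUE_STEP_C} C)")
```

Because the two synthetic stations share the same weather and seasonal swings, those cancel in the difference, and the long-run averages recover a step far smaller than the day-to-day variation. The sketch only works because the reference station is assumed to stay unbiased, which is exactly the condition that is rare in practice.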

    Cases where such comparisons are available are rare. Weather measurement stations were built to help with predicting the weather, and the value of added stations lies in being far enough away to measure different weather. It also appears that few weather scientists were willing to live way out in the sticks just so the stations they tended would never suffer from urbanization. So they’ve got to take the measured corrections from the few cases where stations were inadvertently placed too close together and a town grew up around only one of them, calculate the offsets there, and then estimate that other stations in similar surroundings get similar offsets.

    Except that this is not what the weather scientists were doing for many years. Instead of looking at the stations, they were trying to compare the long runs of data from each station statistically, and deduce the station condition and the required offset from that. That involved using a complex computer program that has been proved to be buggy, and was kept secret for a long time in defiance of all principles of scientific research – but that’s not the basic flaw. The real flaw is that any such methodology will either average the good stations with the bad, to produce data that’s influenced by the bad measurements only not quite as much – or it will assume some baseline trend and “correct” the data to match. So now, someone is beginning to look at the actual stations and what they’re finding is bad…

  • TCO

    I hope that is not considered a numerate response or a citation of a calibration study. I’m well aware that there are large variations in temp over time, often larger than the size of a POSITED bias.