The CRU climate leak – ouch.

The Times is saying that the “raw” climate data was dumped back in the 1980s.  Ouch.

I assume that the really raw records still exist; the old stuff is still in logbooks in libraries.   Presumably, temperature measurements are archived by the people who recorded them, so they should still be available, even if scattered across the globe.  If the CRU people threw away irreplaceable data, that would be another matter entirely.  If so, and if the loss is substantial, it may be one of science’s disasters, eclipsing Piltdown Man and Cold Fusion.   But, hopefully, it’s just a matter of a few  man-years to collect all the scattered pieces.

Given that

  • we do not know how (in detail) the available processed data (e.g. CRUTEM3, HadCRUT3v)  came about,
  • there is reasonable doubt that it was done correctly (e.g. here or here or here), and
  • this stuff is  important – it is a part of the basis for trillion-dollar changes in the global economy,

I see no alternative except opening all the relevant remaining data and code of the CRU, either to the public or to investigators.   It may not be possible to tell if the published data is correct and trustworthy.  If not, somebody will have to do it all again: collecting the records, entering them, dealing with all the messiness of inconsistent station names, changing data scalings, et cetera.

Of course, HARRY_READ_ME.txt may be a worst-case example, not the normal processing practice.  RealClimate.org says that it was not used in the main project, but rather was “…associated with the legacy CRU TS 2.1 product, which is not the same as the HadCRUT data (see Mitchell and Jones, 2003 for details). The CRU TS 3.0 is available now (via ClimateExplorer for instance), and so presumably the database problems got fixed.”

Also, when one is involved in an analysis task, it is natural to talk about the problems.   That’s because you’re always thinking about the problems, trying to solve them.  You don’t think about the problems you’ve solved; they get forgotten.  Personally, I often find it a challenge, when someone asks “How’s it going?” to avoid giving them a data dump on all my recent bugs, the things I don’t understand, and the problems that I am trying to fix.  I have a lot of sympathy with Harry, in fact, and rather suspect that the analysis wasn’t going nearly as badly as it looks in HARRY_READ_ME.txt; he was focussing on the things to solve and ignoring what he had solved.

But, all this notwithstanding, there is sufficient doubt and this is sufficiently important that we should not be depending upon “presumably.”

Note that there may have been good reasons why the data could not have been made accessible.  For instance, RealClimate.org says this:

“From the date of the first FOI request to CRU (in 2007), it has been made abundantly clear that the main impediment to releasing the whole CRU archive is the small % of it that was given to CRU on the understanding it wouldn’t be passed on to third parties. Those restrictions are in place because of the originating organisations (the various National Met. Services) around the world and are not CRU’s to break.”

While that may be true, it is time for the originating organizations to allow access.

But this excuse is also somewhat of a straw man.  If it is just a small percentage that is restricted, the CRU could have (in my opinion, should have) released the remainder.  A not-quite-global dataset would still have had value.  If CRU has/had their data well organized, this should not have been particularly difficult.  Data should have been tagged with its source, and a sweep through the raw data with an awk script should have removed all the data with restrictions.  Based on some of the leaked scripts, it seems the data may not have been organized to easily allow that, and it may have become impossible when they discarded the raw data.
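
To make the "sweep" idea concrete, here is a minimal sketch, written in Python rather than awk.  Everything in it is an assumption made for illustration: the CSV layout, the column names, the source tags, and the function names are hypothetical, and the sample records are made up.  It says nothing about how CRU actually stored its data; it only shows that, if every record carries a source tag, dropping the restricted ones is a few lines of work.

```python
import csv
import io

# Hypothetical source tags whose data may not be passed to third parties.
RESTRICTED_SOURCES = {"MET_A", "MET_B"}


def filter_unrestricted(rows, restricted=RESTRICTED_SOURCES):
    """Yield only the records whose 'source' tag is not restricted."""
    for row in rows:
        if row["source"] not in restricted:
            yield row


# Made-up sample records, purely illustrative (not real station data).
SAMPLE = """station_id,source,date,temp_c
S001,MET_A,1950-01,3.2
S002,OPEN_X,1950-01,-1.5
S003,MET_B,1950-01,10.0
S004,OPEN_Y,1950-01,7.8
"""


def releasable_ids(text):
    """Return the station ids of the records that could be released."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["station_id"] for row in filter_unrestricted(reader)]


if __name__ == "__main__":
    print(releasable_ids(SAMPLE))
```

The whole approach depends on the source tag being present on every record, which is exactly the organizational point at issue.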

In my opinion, if the data had been managed properly, a request for the unrestricted data should have been no worse than a moderate distraction and annoyance.  A request for the restricted data should have been met by an offer of the unrestricted data.  (Perhaps along with some private grumbling.)

Incidentally, I applaud the RealClimate.org Data Sources page, and hope it prospers.