The CRU Climate Leak – 2


PJP took me to task (in a comment) for not reading enough, and pointed to …/FOIA/documents/HARRY_READ_ME.txt. That’s a 15,000-line stream-of-consciousness (or stream-of-work) log that has been discussed elsewhere [1, 2, 3, 4]. The best discussion I’ve found is [5].

It must be said that this is a rather odd document. I’ve never seen such an extensive log in real life, and very few people are so voluble in their comments when they’re in the midst of an analysis. Almost invariably, the problem is the opposite: people keep notes that are too sparse or too terse.

That file clearly indicates that this part of the data analysis was out of control (at least for the period covered by that log). Now, it could be true that either (a) they didn’t use the databases produced in the HARRY_READ_ME.txt session, or (b) someone later spent several months going over everything with a fine-toothed comb. It is sometimes true that a part of a project goes out of control without affecting the end result.

But Harry (if that’s the right name) is clearly out of his depth, or else he has been given an impossibly complex task. Either way, the results aren’t to be trusted. The log seems to indicate some serious management problems or some serious IT support problems, unless Harry is acting as a loose cannon within the group.

Harry’s not attacking the problem the way I’d do it, or the way I’d tell my postdocs to do it. I’d build lots of little scripts, run by one big script, so that in the end you could just push the “go” button and the processing would be done, as scripted. The scripts then document what you have done, and if you later find an error, you can easily fix the script and re-run it. Doing things manually is bad for a complex analysis because it is hard to keep track of what you have done, and when you find a mistake, you need to repeat a lot of manual work, which is always painful.
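
To make the “lots of little scripts, run by one big script” idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the step names, the sample readings, and the assumption that -999.0 marks a missing value. None of it is taken from the CRU files or from Harry’s log.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Step = Tuple[str, Callable[[State], State]]


def load_readings(state: State) -> State:
    # Hypothetical stand-in for reading a raw station file.
    return {**state, "readings": [12.1, 11.8, -999.0, 12.4]}


def drop_missing(state: State) -> State:
    # Assumes -999.0 is the missing-value marker; purely illustrative.
    kept = [r for r in state["readings"] if r != -999.0]
    return {**state, "readings": kept}


def summarize(state: State) -> State:
    # Adds the mean of whatever readings survived the earlier steps.
    readings = state["readings"]
    return {**state, "mean": sum(readings) / len(readings)}


# The "one big script": an ordered list of the little steps.
PIPELINE: List[Step] = [
    ("load_readings", load_readings),
    ("drop_missing", drop_missing),
    ("summarize", summarize),
]


def run_pipeline(steps: List[Step]) -> Tuple[State, List[str]]:
    """Run every step in order and record which steps ran."""
    state: State = {}
    log: List[str] = []
    for name, step in steps:
        state = step(state)
        log.append(name)
    return state, log


if __name__ == "__main__":
    final_state, log = run_pipeline(PIPELINE)
    print("steps run:", ", ".join(log))
    print("mean reading:", round(final_state["mean"], 2))
```

The point of this layout is that the list of steps is itself the record of what was done. If the missing-value rule turns out to be wrong, you fix that one function and push “go” again, rather than redoing anything by hand.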

Of course, the postdoc might choose not to listen. I suppose this could be the log of someone learning the hard way. I wonder who Harry is. Is this a student project?

Harry seems stuck in a swamp of programs that don’t behave and that he doesn’t quite understand.  He’s clearly trying to deal with a large and difficult data set that doesn’t have sufficient documentation.

It needs to be said that Harry is trying hard, doing basic sanity checks on his results, and doing a lot of little things right.  Harry seems (based on my reading of a few thousand lines) to be honest and trying to get real answers.  What’s amazing is that the log doesn’t contain any obvious indication that anyone was helping Harry.   Search for “talk” or “told”: nothing relevant.  “Discussed”: only once.  It seems to me that this ought to have been a major group crisis.   Why not?   Was it somehow not going to be important?   Was it somehow ignored?   Was Harry someone who hated to talk to people?   I don’t know.

This reminds me of the field of robotics in the early 1980s. One of the important problems in the field at the time was to take a box of mixed-up metallic parts and pick one part out of the box. It was a challenging problem: you needed computer vision and very clever algorithms to recognize a part that was rotated and partially buried under other parts. You needed sophisticated algorithms to choose a way to grip the part by whatever stuck up. Then you needed massive cleverness to figure out how to rotate it and place it wherever it went without crashing your robot’s grippers into something delicate.

Then, around 1985, somebody brilliant decided that the right solution was to fire the guy who mixed up all the little metal parts in the first place.  Around that time, people started arranging for the parts to come from the factory in plastic trays that held the parts in nice arrays, all with the same side pointing up.  At a stroke, the central challenge of the field was eliminated.

Of course, it’s easy to say “Don’t let it get messed up.”   That is the correct answer, but what do you do if it has already gotten messed up?  That’s a harder question.