Script us not into temptation

I’m teaching a statistics course and everyone wants to use GUI tools.  The only trouble is, they’ll give you the wrong answer.

Why?   Not because they are mathematically wrong (well, doubtless they have some errors) but because they make it unpleasant to fix your mistakes, and one of the defining features of humans is that we make mistakes.  So, they give you the wrong answer because you give them the wrong data or the wrong commands.

Consider these scenarios:

  1. You have an analysis that takes 3 hours.  It’s done on a spreadsheet, with a certain amount of cutting and pasting.  You get an answer.
    • Later, walking home, you wonder whether the frammistan tolerance was adjusted properly.
  2. You are working on a spreadsheet.  The phone rings while you’re in the middle of refribbing the data.  Because of this, you miss item 146.
    • When you get back to work, you wonder exactly how far you got.  145?  146?   Somewhere around there.
  3. You’re doing an analysis you’ve never done before.  The software manual focusses on what buttons to push and statistical jargon.  You find yourself in the midst of a multivariate limited-influence test for hypersphericity, and you choose column C for the “indicator variable”.
    • Later, you learn that an indicator variable isn’t quite what you thought it was.
  4. You do an analysis.
    • When you try to publish the paper, months later, a reviewer asks about a detail.  It wasn’t a detail you thought about when you did the analysis.

In all of these scenarios, you have this basic choice: do you (a) repeat the whole analysis, or (b) shrug and publish?  GUI-based statistical software makes (b) tempting because (a) is a lot of extra work.  Oh, sure, in many situations, we’ll do the right thing.  But, what about when you have faint suspicions?

Let’s suppose that a faint suspicion corresponds to a 1% chance that we did something wrong.  (And it could be more than that.)  What does it imply?

  • First, there’s no point in doing any statistical test beyond the 99% confidence level.  If there’s even a 1% chance you made a mistake, then 99.99% confidence (as reported by the statistics package) is really about 99%, 99% reported confidence is about 98%, and 95% reported confidence is about 94%.  Naive readers of your papers will take your conclusions literally.
  • Second, it’s pretty obvious that one shouldn’t trust a large or complex analysis done with GUI tools.   The chance of making a mistake will grow as the analysis gets bigger.  In response, people will use overly simple analyses for complex problems.
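
The arithmetic behind those two points can be sketched in a few lines of Python.  This is only a rough illustration, and it rests on simplifying assumptions of mine: a mistake makes the conclusion worthless, it happens independently of the statistical error the package reports, and every manual step carries the same small chance of going wrong.  The function names are made up for the example.

```python
def effective_confidence(reported, p_mistake):
    """Confidence left once the chance of a human mistake is included.

    Assumes (for illustration) that a mistake voids the result and is
    independent of the statistical error reported by the package.
    """
    return (1.0 - p_mistake) * reported


def chance_of_any_mistake(p_per_step, n_steps):
    """Chance of at least one mistake in n_steps independent manual steps."""
    return 1.0 - (1.0 - p_per_step) ** n_steps


if __name__ == "__main__":
    # The three cases from the text, with a 1% chance of a mistake.
    for reported in (0.9999, 0.99, 0.95):
        eff = effective_confidence(reported, 0.01)
        print(f"reported {reported:.2%} -> effective {eff:.2%}")

    # How the chance of a mistake grows as the analysis gets bigger,
    # assuming a 0.1% chance of error per manual step.
    for n in (10, 100, 1000):
        print(f"{n} steps -> {chance_of_any_mistake(0.001, n):.1%}")
```

With a 1% chance of a mistake, the three reported levels come out at roughly 99%, 98%, and 94%, which is where the figures above come from.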

From a human point of view, it is bad practice to tempt people to shrug and publish.  As a species, we’ve learned that temptations are best avoided because we aren’t nearly perfect at ignoring them.  We’ve learned that governmental power should not be concentrated in a single person.  We’ve learned that financial things need to be audited occasionally, otherwise money just evaporates.  We’ve learned that locks and keys are good ways to reduce the temptation of theft.

So, what we’d really like is a way to do an analysis that has these properties:

  1. You have a record of exactly what you did.  This lets you check your work, and it lets you write the paper accurately, even after months of forgetting.
  2. If you find a mistake, you can repair it and re-do the analysis easily.
  3. If you suspect that you might be doing the wrong analysis, you should be able to run a test analysis on some data where you already know the answer.
  4. There should be no invisible mistakes.
  5. There is no subjective selection of data.
  6. You should be able to put checks into the analysis.

The answer is to do your statistics and data analysis in a script, rather than a spreadsheet.

  1. Your scripts become a log of your analysis (if you don’t recycle your scripts, and you keep track of where the input came from and where the output went to).  You can always look back at your script, to make sure.
  2. If you find a mistake, you just have to run the script again.  Let the computer do the repetitive work.
  3. Design your scripts well, and you can test them on data sets (or simulations) that you understand.  You can confirm that they actually do the right thing.
  4. Mistakes in the script will be visible.  If your script does all the data manipulation, then there will be no hidden accidents or typos: whatever went wrong is written down in the script, where you can find it.
  5. If you must select data, it is best done by rule.  That reduces the possibility that you bias the results, and also makes it easier to explain how you selected the data in the eventual paper.
  6. A lot of languages have a wonderful thing called the “assert” statement.  For instance, if you think the number here should always be positive, you just “assert here > 0”.   If it’s true, nothing happens.   If it turns out to be false, the assert statement crashes your analysis.  That’s good, because it lets you find some kinds of problems that aren’t obvious in the final analysis.
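
To make points 3, 5, and 6 concrete, here is a small sketch in Python.  Everything in it is hypothetical: the function names, the selection rule (keep values between 0 and 100), and the simulated data are invented for illustration, and a real analysis would be more than a mean.  The shape is what matters: data is selected by a rule written in the code, assert statements check what you believe to be true, and the whole thing is first run on simulated data where you already know the answer.

```python
import random
import statistics


def select_by_rule(values, lower, upper):
    """Keep only values inside [lower, upper]; the rule is stated once, in code."""
    return [v for v in values if lower <= v <= upper]


def analyze(values, lower=0.0, upper=100.0):
    """Return the mean of the selected values, with checks along the way."""
    selected = select_by_rule(values, lower, upper)
    # Check: the rule should not have thrown away every data point.
    assert len(selected) > 0, "selection rule rejected every data point"
    mean = statistics.fmean(selected)
    # Check: a mean of values in [lower, upper] must lie in that range.
    assert lower <= mean <= upper, "mean fell outside the selected range"
    return mean


def test_on_known_answer():
    """Run the analysis on simulated data whose true mean we chose ourselves."""
    rng = random.Random(12345)  # fixed seed, so every rerun is identical
    simulated = [rng.gauss(50.0, 5.0) for _ in range(10_000)]
    result = analyze(simulated)
    # The standard error here is about 0.05, so 0.5 is a generous tolerance.
    assert abs(result - 50.0) < 0.5, f"expected about 50, got {result}"


if __name__ == "__main__":
    test_on_known_answer()
    # Only 12.5, 47.0 and 63.2 pass the rule; 250.0 and -4.0 are excluded.
    print(analyze([12.5, 47.0, 250.0, 63.2, -4.0]))
```

If the selection rule turns out to be wrong, you change one line and run the script again; the asserts and the known-answer test run again with it.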