
When not to cite.

What physicist would ever reference Isaac Newton these days? I wanted to, once, just as a lark, so I tried to find a citation for the Principia by looking in the Web of Science. (I admit I wasn’t going to read the book, as I ought to have.) But what did I find? All the recent references for “Newton I” were about birds. Isaac seemingly became an ornithologist in his old age! (The bird papers, of course, are by a modern Newton.) The point is, no one references the original Newton any more, since his work is in all the textbooks. In the end, I gave up on the idea, published my paper without a reference to Isaac, and moved on.

Citation is one of the cornerstones of science, but does that mean you need to cite everyone? This blog post worries about people falsely claiming to have done the first experiment on a given topic, perhaps as a result of an incomplete literature search. That’s not good, but how bad is it?

Except perhaps for the most expensive, life-critical research, there is no real advantage to exhaustive citation. What does it gain society? Sure, you don’t want to waste money repeating experiments that have been done 100 times, but science actually needs a certain amount of replication of prior work. If some modest extra replication happens by accident, no big deal.

Obviously, some people will disagree with me. Some seemingly want to use references to allocate credit in science. (This sounds like a bad idea to me: references should be there to help the reader, not the research manager. Make references financially important and everyone will game the system, leaving us with references that don’t educate the reader.) Others worry about the dangers of clinical trials. (That’s a rather better reason, but there is more to science than clinical trials.)

As another objector, the editors of the Journal of Clinical Investigation clearly put a lot of emphasis on being first: so much so that they would prefer to reject a paper that replicates someone else’s work. But is that the right approach? Even if you believe everyone’s statistics, published results come with a built-in error rate: people typically test at a 5% significance level, which means that even when there is no real effect, roughly one experiment in twenty will produce a “significant” result. And in reality there are other sources of error too: mistakes, misunderstandings, misinterpretations, and miscommunications.
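
To make that one-in-twenty figure concrete, here is a minimal simulation sketch (my own illustration, not from any paper discussed here): run many experiments in which the true effect is exactly zero, and count how often a standard t-test still crosses the p < 0.05 threshold.

```python
# Simulate null experiments: two groups drawn from the SAME distribution,
# so any "significant" difference is a false positive by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(0.0, 1.0, size=30)  # group A: no real effect
    b = rng.normal(0.0, 1.0, size=30)  # group B: same distribution as A
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(false_positives / n_experiments)  # close to 0.05, i.e. about 1 in 20
```

Run it and the false-positive fraction comes out near 0.05, which is exactly the built-in error rate described above.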

For instance, I’ve talked to a researcher who told me a story about some fMRI work of his. fMRI compares brain activity in two different experimental conditions and draws you a map of which parts of the brain are used more in one condition than the other. This guy had done some nice experiments, run the analysis, got the map, and written a careful, well-informed discussion connecting the active brain areas he observed with linguistic theories. It was nearly ready to publish, until one last check showed that he had accidentally gotten the order of the two experimental conditions backwards in the analysis. As a result, the regions he thought were especially active were actually the especially inactive ones. Oops; time to rewrite those conclusions.
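
The bug is easy to picture in code. A contrast map is essentially “condition A minus condition B” at each voxel, so swapping the two labels flips the sign of the entire map. Here is a toy sketch (mine, not the researcher’s actual pipeline):

```python
# Toy contrast computation: swapping the condition labels negates
# every voxel, so "most active" regions become "least active".
import numpy as np

rng = np.random.default_rng(1)
activity_A = rng.normal(1.0, 0.1, size=5)  # per-voxel activity, condition A
activity_B = rng.normal(0.5, 0.1, size=5)  # per-voxel activity, condition B

intended = activity_A - activity_B  # the analysis he meant to run
swapped = activity_B - activity_A   # the analysis with labels reversed

print(np.allclose(intended, -swapped))  # True: the map is exactly inverted
```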

He told this story as a warning of how easy it is to convince yourself that your data fits your favourite theory. But you can also look at it as an error that was only barely caught. So, doubtless there are occasional fMRI papers that miss that final check and publish results that are exactly backwards. This is a source of error that won’t be caught by the statistics.

In some fields (like linguistics and psychology) it is sometimes impossible to design a perfect experiment where everything is identical except for a single factor. One has to assume that the other, uncontrolled factors are relatively unimportant. Maybe they are, but doubtless sometimes they aren’t. Such problems with experimental design are hard or impossible to fold into a statistical analysis, so they are yet another possible source of error.
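
Here is a made-up illustration (not from any study mentioned here) of how an uncontrolled factor slips past the statistics: suppose reading time really depends only on word length, but the stimulus lists for the two conditions accidentally differ in average length. A naive t-test on condition will then report a “significant” effect that is pure confound.

```python
# Simulated confound: the outcome depends only on word length,
# but condition A's stimuli happen to be longer than condition B's,
# so a naive test finds a "condition effect" that isn't real.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
len_A = rng.normal(7.0, 1.0, size=40)  # condition A: longer words on average
len_B = rng.normal(5.0, 1.0, size=40)  # condition B: shorter words on average

# True model: reading time depends on length alone, never on condition.
time_A = 100 + 20 * len_A + rng.normal(0.0, 10.0, size=40)
time_B = 100 + 20 * len_B + rng.normal(0.0, 10.0, size=40)

_, p = stats.ttest_ind(time_A, time_B)
print(p)  # tiny p-value: the uncontrolled factor masquerades as an effect
```

The p-value is tiny, and nothing in the test itself warns you that length, not condition, is doing the work.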

Ultimately, statistics won’t save us from all errors. (Especially if people do their statistics wrong, which doubtless happens occasionally.) In the end, the thing that makes science work reliably is that interesting results get replicated, preferably by different people under different conditions.
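
Some back-of-envelope arithmetic (mine, under the idealised assumption that studies are fully independent) shows why replication works so well: if each study has a 5% false-positive rate, the chance that several independent studies all report the same spurious effect shrinks geometrically.

```python
# Probability that k independent studies are ALL false positives,
# assuming each has a 5% false-positive rate (an idealisation:
# real studies share methods and biases, so are never fully independent).
alpha = 0.05
for k in range(1, 4):
    print(k, alpha ** k)  # 1 -> 0.05, 2 -> 0.0025, 3 -> 0.000125
```

Real replications are never fully independent, so the true gain is smaller than this, but the direction is right.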

So, here’s a more-or-less good reason not to cite a paper. Imagine that there’s a paper out there that claims to answer a research question, but you don’t quite trust it. What do you do? Yes, you could reference it and laboriously try to describe your suspicions. Unfortunately, a description of someone else’s errors is normally speculation, so you’re likely to be wrong in detail. What, then, does the reader gain by your attempt? Referencing a dubious paper without voicing your suspicions would merely give the reader the wrong impression. Consequently, the best strategy, from an overall perspective, may just be to ignore it.

And, of course, sometimes you just don’t know what to say about a paper. Or, you really don’t know if it applies, or…

I talked about this post at lunch today, and (of course) half the people disagreed with me. The best argument against this post, and it’s a pretty good one, is that it might give people license to ignore opposing views. Freedom to pick and choose your references might also lead people to take a complicated situation (where people disagree, or where it is not clear exactly what to do) and ignore the complexity. Of course, I’m not recommending either of those bad behaviors.