When not to cite.

What physicist would ever reference Isaac Newton these days? I wanted to, once, just as a lark, so I tried to find a citation for the Principia by looking in the Web of Science. (I admit I wasn’t going to read the book, like I ought to have.) But, what did I find? All the recent references for “Newton I” were about birds. Isaac seemingly became an ornithologist in his old age! The point is, no one references the original Newton any more, since he’s in all the textbooks. The birds are modern. In the end, I gave up on the idea, published my paper without a reference to Isaac, and went on.

Citation is one of the cores of science, but does that mean you need to cite everyone? This blog post worries about people falsely claiming to be the first experiment on a given topic, perhaps as a result of an incomplete literature search. That’s not good, but how bad is it?

Except perhaps for the most expensive, life-critical research, there is no real advantage to exhaustive citation. What does it gain society? Sure, you don’t want to waste money repeating experiments that have been done 100 times, but science does actually need a certain amount of replication of prior work. If some modest extra amount of replication happens by accident, no big deal.

Obviously, some people will disagree with me. Some people seemingly want to use references to allocate credit in science. (This sounds like a bad idea to me: references should be there to help the reader, not to help the research manager. Make references financially important, and everyone will game the system, leading to a system that doesn’t educate the reader.) Some people are worried about the danger of clinical trials. (Which is rather a better reason, but there is more to science than clinical trials.)

As another objector, the editors of the Journal of Clinical Investigation clearly put a lot of emphasis on being first: so much that they would prefer to reject a paper that replicates someone else’s work. But is that the right approach? Even if you believe everyone’s statistics, most results have a built-in chance of being wrong, because people tend to use a significance level of 5%: a test for an effect that isn’t really there will still come out positive about one time in twenty. In reality, there are other sources of error too: mistakes, misunderstandings, misinterpretations and miscommunications.

For instance, I’ve talked to a researcher who told me a story about some fMRI work of his. fMRI compares brain activity in two different experimental conditions and draws you a map of which parts of the brain are used more in one condition than the other. This guy had done some nice experiments, had run the analysis, had got the map, and had written a careful, well-informed discussion connecting the observed active brain areas with linguistic theories. The paper was nearly ready to publish, until one last check showed that the two experimental conditions had accidentally been entered into the analysis in the wrong order. As a result, the regions that looked especially active were actually the especially inactive regions. Oops; time to rewrite those conclusions.

He told this story as a warning of how easy it is to convince yourself that your data fits your favourite theory. But, you can also look at it as an error that was barely caught. So, doubtless there are occasional fMRI papers that don’t get that final check and publish results that are exactly backwards. This is a source of error that won’t be caught by the statistics.

In some fields (like linguistics and psychology) it is sometimes impossible to design a perfect experiment where everything is identical except for a single factor. One has to assume that the other uncontrolled factors are relatively unimportant. And, maybe they are, but doubtless sometimes they aren’t. Such problems with experimental designs are hard or impossible to include in your statistical analysis, so this is yet another possible source of errors.

Statistics won’t save us from all errors (especially if people do their statistics wrong, which doubtless happens occasionally). Ultimately, the thing that makes science work reliably is that interesting results get replicated, preferably by different people in different conditions.
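To put a rough number on the statistical part of that argument, here is a back-of-the-envelope sketch in Python. It assumes the effect being tested does not exist, that every study uses the same 5% significance level, and that the studies are fully independent; the function name is made up for illustration. It says nothing about the non-statistical errors described above, such as swapped conditions or uncontrolled factors, which is part of why replication by different people matters.

```python
def chance_all_false_positives(alpha: float, n_studies: int) -> float:
    """Chance that n independent studies of a nonexistent effect
    ALL come out "significant" at significance level alpha.

    Assumption (not from any particular paper): the studies share no
    data, methods, or mistakes, so their false alarms are independent.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n_studies < 1:
        raise ValueError("need at least one study")
    # Independent events: multiply the individual probabilities.
    return alpha ** n_studies


if __name__ == "__main__":
    for n in (1, 2, 3):
        p = chance_all_false_positives(0.05, n)
        print(f"{n} study/studies: {p:.6f}")
```

Under those assumptions, a single false alarm happens 5% of the time, two in a row about 0.25% of the time, and three in a row about 0.0125% of the time. Real replications are rarely fully independent, so treat these as best-case figures.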

So, here’s a more-or-less good reason not to cite a paper: Imagine that there’s a paper out there that claims to answer a research question, but you don’t quite trust it. What do you do? Yes, you could reference it and laboriously try to describe your suspicions. Unfortunately, a description of someone else’s errors is normally speculation, so you’re likely to be wrong in detail. What, then, does the reader gain by your attempt? Referencing a dubious paper without giving your suspicions would merely give the reader the wrong impression. Consequently, the best strategy — from an overall perspective — may just be to ignore it.

And, of course, sometimes you just don’t know what to say about a paper. Or, you really don’t know if it applies, or…

I talked about this post at lunch today, and (of course) half the people disagreed with me. The best argument against this post (and it’s pretty good) seems to be that this might give people license to ignore opposing views. Freedom to pick and choose your references might also lead people to take a complicated situation (where people disagree, or where it is not clear exactly what to do) and ignore the complexity. Of course, I’m not recommending either of those bad behaviors.

January 21, 2011

gpk

One response to “When not to cite.”

Kami says:

2011-02-07 at 9:08 am

Hi!
To be honest, I also find it somewhat dangerous to grant license to ignore sources related to your work. Among other reasons, and in addition to the ones you rightly mention, I think it undermines the ‘cumulative’ progress of science. Different views add different nuances to the study of a phenomenon, so arbitrarily leaving certain views out of the bibliography tends to undermine this collective chain of knowledge building.
I also think that not citing a certain position can lead to later errors among those who read your work and want to build on it. Someone who reads your research might decide to investigate precisely the aspect you did not mention, thinking it is an original idea.

I should add that, from a student’s point of view, criticism and discussion among more experienced academics are always appreciated. I remember entering the field of linguistics with relative naivety and somewhat excessive trust in what is “scientific”, only to encounter, over the course of this past year, several controversies, critiques and discussions, which undoubtedly sharpened my critical view of what a scientific paper is. In this sense, I would support a conscientious, though not necessarily exhaustive, critique of the opposing view, for the benefit of those who follow your work.

There is no denying that the issue is complex, since individual egos and so on are also caught up in this whole controversy [e.g. researchers who criticize better-known ones in an attempt to ‘ride on their fame’ :)]. Still, in general, I would choose to cite.

Where I agree with you 100% is your opinion on the replication of experiments.

A really interesting blog!! Reading it has been genuinely valuable to me.

Regards!