Archive for January 21st, 2008

Did insects kill the dinosaurs?

Monday, January 21st, 2008

Here’s an interesting story: Insect Attack May Have Finished Off Dinosaurs.

Apparently, a lot of disease-carrying insects appeared at the time of the dinosaur mass extinction. If the dinosaurs’ immune systems were not up to defending against the diseases these insects carried, that might be what killed the dinosaurs off.

“We can’t say for certain that insects are the smoking gun, but we believe they were an extremely significant force in the decline of the dinosaurs,” Poinar said. “Our research with amber shows that there were evolving, disease-carrying vectors in the Cretaceous, and that at least some of the pathogens they carried infected reptiles. This clearly fills in some gaps regarding dinosaur extinctions.”

Personally, I know nothing about dinos and cannot judge if this is a reasonable theory or not, but I did find it an interesting read.

A study of duplicate citations in Medline

Monday, January 21st, 2008

In the latest issue of Bioinformatics, there’s a paper on duplicated publications:

Déjà vu—A study of duplicate citations in Medline

M. Errami et al.

Motivation: Duplicate publication impacts the quality of the scientific corpus, has been difficult to detect, and studies this far have been limited in scope and size. Using text similarity searches, we were able to identify signatures of duplicate citations among a body of abstracts.

Results: A sample of 62 213 Medline citations was examined and a database of manually verified duplicate citations was created to study author publication behavior. We found that 0.04% of the citations with no shared authors were highly similar and are thus potential cases of plagiarism. 1.35% with shared authors were sufficiently similar to be considered a duplicate. Extrapolating, this would correspond to 3500 and 117 500 duplicate citations in total, respectively.

They used text mining to look for significant (textual) overlap between papers, spotting both cases of plagiarism and duplicated papers from the same authors.
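To make the text-similarity idea concrete, here is a minimal Python sketch. It is not the authors’ actual method. It treats each abstract as a bag of words and scores every pair with cosine similarity. Mirroring the paper’s two categories, it labels a highly similar pair a possible duplicate when the author lists overlap and possible plagiarism when they don’t. The function names, the 0.8 threshold and the toy citations are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors: 1.0 for identical word
    distributions, 0.0 when no words are shared (or a vector is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_similar_pairs(citations, threshold=0.8):
    """Compare every pair of abstracts and return the highly similar ones.

    citations: list of (citation_id, abstract_text, set_of_author_names).
    threshold: a hypothetical cut-off, not the one used in the paper.
    Returns a list of (id1, id2, score, label) tuples.
    """
    vectors = [(cid, bag_of_words(text), authors) for cid, text, authors in citations]
    flagged = []
    for (id1, v1, a1), (id2, v2, a2) in combinations(vectors, 2):
        score = cosine_similarity(v1, v2)
        if score >= threshold:
            # Shared authors suggest a duplicate; no shared authors suggests plagiarism.
            label = "possible duplicate" if a1 & a2 else "possible plagiarism"
            flagged.append((id1, id2, score, label))
    return flagged


# Toy data (hypothetical): A and B share an author and have identical abstracts.
citations = [
    ("A", "Two algorithms for the quartet distance between general trees", {"X", "Y"}),
    ("B", "Two algorithms for the quartet distance between general trees", {"Y"}),
    ("C", "Disease-carrying insects in Cretaceous amber", {"Z"}),
]
for hit in flag_similar_pairs(citations):
    print(hit)  # ('A', 'B', 1.0, 'possible duplicate')
```

Comparing every pair is quadratic, which is fine for a toy list but not for all of Medline. And because it only counts words, a sketch like this catches near-verbatim copies well but says little about a rewritten paper that reports the same results.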

Both situations are unethical. Plagiarism is plain old stealing: a scientist’s ideas are his or her most important contribution, so having someone else steal them is probably the worst thing that can happen. At least there were only a few cases of plagiarism.

Duplicated publications are just annoying. It is frustrating to discover halfway through a paper that I have already read it in a different journal. Of course, if it is an exact duplicate I will notice well before the halfway point, but on several occasions the paper has been somewhat rewritten while the results are exactly the same as in a previous paper. The cited paper finds that 1.35% of the citations with shared authors are duplicates, but how well their text mining spots duplicated results in slightly rewritten papers, I don’t know.

In any case, they only compare abstracts, and I don’t remember ever spotting a duplication from the abstract alone.

If you want to browse their discovered duplications, you can find their database here.

My own duplications

I have a few duplications myself, I must admit, but except for one case (which I’ll get back to below), they are journal special-issue versions of conference contributions. What happens there is that a subset of the conference contributions is selected for journal publication (in most cases in a slightly extended version).

In such cases, where it is blindingly obvious that the paper duplicates a conference paper (the journal makes that very explicit), I don’t see any problem with the duplication. The Bioinformatics paper agrees. Quoting from it:

While some duplications may be justified, arguably to promote wider dissemination or to provide important updates to clinical trials, surreptitious duplications that are covert and do not properly acknowledge the original work are unethical.

The last case of duplicated publications for me is the two papers

Algorithms for Computing the Quartet Distance between Trees of Arbitrary Degree
C. Christiansen, T. Mailund, C.N.S. Pedersen, and M. Randers
Proceedings of Workshop on Algorithms in Bioinformatics (WABI), 2005, LNBI 3692, pp. 77-88 © Springer-Verlag.

Quartet Distance between General Trees (extended abstract)
C. Christiansen, T. Mailund, C.N.S. Pedersen, and M. Randers
Proceedings of International Conference on Numerical Analysis and Applied Mathematics (ICNAAM) 2005, pp. 796-799 © Wiley-VCH Verlag GmbH & Co.

and there is a bit of a story behind this.

We first submitted to WABI, but then discovered an error in the paper that we couldn’t fix. It was in the time analysis of one of the algorithms, where we had claimed $O(n^2)$ but couldn’t get below $O(n^2 d^2)$. So we retracted the paper from WABI, corrected the analysis, and submitted to ICNAAM, where it was accepted as well. The retraction was ignored, however, despite several emails to the PC chairs, so in the end we had to submit a final version to WABI too. Since the ICNAAM version is just an extended abstract and the WABI paper is full length, we figured we could justify this, but it is a bit borderline, I think.

Acceptable duplication?

Determining whether a paper is a duplicate based only on text similarity is a bit unsafe, of course. I tend to describe the problems I am working on, related work, consequences, etc. in similar terms from paper to paper. I try to avoid repeating the same phrasing, but it is hard not to, and I know of several of my papers whose introductions read very similarly.

I personally don’t see a problem with this as long as the results presented are novel, but I guess it is a bit borderline as well.

Smallest publishable increment

Something that annoys me more than duplicated publications, though, is papers describing tiny increments on existing results. These papers mean that you have to read 4-5 papers to get information that could easily be contained in a single paper.

Now, some of this is unavoidable. If the authors get an idea after the original idea is published (this has happened to me a few times), the choice is either never to publish it or to publish a minor increment. But with some authors (I could name names, but I won’t), more than half the papers are tiny increments to previous ideas. This tells me that they either publish way too early, or that they willfully try to get as many publications out of as little thinking as they can.

Why bother?

Why would you publish the same results twice, or publish tiny increments?

It will boost the number of publications, but who cares about that? Even the silliest bureaucrats have figured out that what matters is impact.

If you want to boil impact down to a single number, so you can reduce the quality of research to something that is easily measured, you don’t use the number of publications. You pick something like the h-index. There, the number of publications matters, but only if people cite them: you are better off with 10 papers cited 10 times each (an h-index of 10) than with 50 papers cited 5 times each (an h-index of 5).
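For readers who haven’t met it: a researcher’s h-index is the largest h such that h of their papers have each been cited at least h times. Here is a minimal Python sketch of that definition (the function name is my own), checked against the two publication records above:

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # the rank-th most cited paper still has >= rank citations
        else:
            break  # counts are sorted, so no later paper can qualify
    return h


print(h_index([10] * 10))  # 10 papers cited 10 times each -> 10
print(h_index([5] * 50))   # 50 papers cited 5 times each  -> 5
```

Note that the 50-paper record has 250 citations in total, against 100 for the 10-paper record, yet the h-index still favors the latter.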

Duplicating publications doesn’t lead to greater impact. Citing the Bioinformatics paper again:

In the Duplicate/DA category, however, we observed that duplications were predominantly in journals with no impact factor and that these articles were rarely cited. If the primary value of a publication is to disseminate scientific findings and knowledge, it is not accomplished by publications in this category, so one must question the intent of the author of a Duplicate/DA publication.

In short, duplicated publications will not increase the impact, so why bother?


The citation, for Research Blogger:
Errami, M., Hicks, J.M., Fisher, W., Trusty, D., Wren, J.D., Long, T.C., Garner, H.R. (2007). Deja vu A study of duplicate citations in Medline. Bioinformatics, 24(2), 243-249. DOI: 10.1093/bioinformatics/btm574