Alignment bias in genomics

I have previously written a bit about how optimal alignment algorithms introduce an alignment bias, and I have even done some work on it myself (currently submitted for publication, so I cannot link to it yet). Today I saw a paper in the current issue of Science addressing the same problem.

A summary can be found in

Lining Up to Avoid Bias

Antonis Rokas

Science Vol. 319. no. 5862, pp. 416 – 417

and the full paper (probably requires a subscription) is

Alignment Uncertainty and Genomic Analysis

Karen M. Wong, Marc A. Suchard, and John P. Huelsenbeck

Science Vol. 319. no. 5862, pp. 473 – 476

The problem with alignments

I’ve already described the problem in the previous post, where I used the examples from Gerton Lunter’s paper

Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes

G. A. Lunter

Bioinformatics 2007; DOI: 10.1093/bioinformatics/btm185

although there the focus was on the problems with indels. Of course, without indels there simply isn’t any problem with alignment, so that is not as unreasonable as it might sound.

Essentially, the problem is that we use algorithms to infer optimal alignments and then treat these alignments as absolute truth, ignoring the uncertainty in the inference.
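To get a feeling for how much an "optimal" alignment can hide, here is a minimal Python sketch that computes the optimal global alignment score for two sequences and counts how many distinct alignments reach that score. The function name and the scoring scheme (match +1, mismatch -1, linear gap cost -1) are my own assumptions for illustration and are not taken from any of the papers mentioned here.

```python
def count_optimal_alignments(a, b, match=1, mismatch=-1, gap=-1):
    """Return (optimal score, number of co-optimal global alignments).

    Needleman-Wunsch style dynamic programming with a linear gap cost.
    The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    count = [[0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1

    # Along the borders the only possible alignment is all gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
        count[i][0] = 1
    for j in range(1, m + 1):
        score[0][j] = j * gap
        count[0][j] = 1

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, count[i - 1][j - 1]),  # align a[i-1] with b[j-1]
                (score[i - 1][j] + gap, count[i - 1][j]),          # a[i-1] against a gap
                (score[i][j - 1] + gap, count[i][j - 1]),          # b[j-1] against a gap
            ]
            best = max(s for s, _ in candidates)
            score[i][j] = best
            # Every predecessor that achieves the best score contributes its alignments.
            count[i][j] = sum(c for s, c in candidates if s == best)

    return score[n][m], count[n][m]


print(count_optimal_alignments("AAG", "AG"))  # (1, 2)
```

Even for this tiny example there are two equally good alignments, since the gap can sit opposite either A, and a program that reports a single optimal alignment has to pick one of them arbitrarily.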

Wong et al. compare seven different alignment algorithms and run typical evolutionary analyses (inference of phylogenies and detection of selection) on the inferred alignments. They find that the results of these analyses vary substantially depending on which alignment method was used.

The solution proposed in Wong et al. is the same as the one Gerton proposes: statistical alignment methods. Quoting Wong et al.:

The problem of alignment uncertainty in genomic studies, identified here, is not a problem of sloppy analysis. Many comparative genomics studies are carefully performed and reasonable in design. However, even carefully designed and carried out analyses can suffer from these types of problems because the methods used in the analysis of the genomic data do not properly accommodate alignment uncertainty in the first place.

In a comparative genomics study, we advocate that alignment be treated as a random variable, and inferences of parameters of interest to the genomicist, such as the amount of nonsynonymous divergence or the phylogeny, consider the different possible alignments in proportion to their probability.

Of course, this is what the statistical alignment people in Oxford have been trying to do for years, and it is not quite as easy as it sounds.
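The idea in the quote, considering the different alignments in proportion to their probability, can be sketched in a few lines of Python. This is only a toy under stated assumptions: the three candidate alignments and their probabilities are made up for illustration, the statistic (fraction of mismatching ungapped columns) is a stand-in for something like nonsynonymous divergence, and the function names are my own. In a real analysis the hard part is obtaining the alignment probabilities (or samples) from a statistical alignment model, which this sketch does not attempt.

```python
def mismatch_fraction(row_a, row_b):
    """Fraction of ungapped columns in which the two residues differ."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def best_alignment_statistic(weighted_alignments, statistic):
    """Usual practice: compute the statistic on the single most probable alignment."""
    alignment, _ = max(weighted_alignments, key=lambda item: item[1])
    return statistic(*alignment)


def expected_statistic(weighted_alignments, statistic):
    """Average the statistic over alignments, weighted by their probability."""
    total = sum(p for _, p in weighted_alignments)
    return sum(p * statistic(*aln) for aln, p in weighted_alignments) / total


# Hypothetical alignments of ACGT and ACT with made-up probabilities.
toy_alignments = [
    (("ACGT", "AC-T"), 0.5),
    (("ACGT", "A-CT"), 0.3),
    (("ACGT", "ACT-"), 0.2),
]

point_estimate = best_alignment_statistic(toy_alignments, mismatch_fraction)
averaged_estimate = expected_statistic(toy_alignments, mismatch_fraction)
print(point_estimate, round(averaged_estimate, 3))
```

In this toy, the most probable alignment shows no mismatches at all, while the probability-weighted estimate is roughly 0.17, because the two less probable alignments each put one mismatch among three aligned columns. Treating the best alignment as the truth throws that information away.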


Citations, for Research Blogging:

Rokas, A. (2008). GENOMICS: Lining Up to Avoid Bias. Science, 319(5862), 416-417. DOI: 10.1126/science.1153156

Wong, K.M., Suchard, M.A., Huelsenbeck, J.P. (2008). Alignment Uncertainty and Genomic Analysis. Science, 319(5862), 473-476. DOI: 10.1126/science.1151532

Author: Thomas Mailund

My name is Thomas Mailund and I am a research associate professor at the Bioinformatics Research Center, Uni Aarhus. Before this I did a postdoc at the Dept of Statistics, Uni Oxford, and got my PhD from the Dept of Computer Science, Uni Aarhus.
