Alignment bias in genomics

I have previously written a bit about how optimal alignment algorithms introduce an alignment bias, and I have even done some work on it myself (currently submitted for publication, so I cannot link to it yet). Today I saw a paper in the current issue of Science addressing the same problem.

A summary can be found in

Lining Up to Avoid Bias

Antonis Rokas

Science Vol. 319. no. 5862, pp. 416 – 417

and the full paper (probably requires a subscription) is

Alignment Uncertainty and Genomic Analysis

Karen M. Wong, Marc A. Suchard, and John P. Huelsenbeck

Science Vol. 319. no. 5862, pp. 473 – 476

The problem with alignments

I’ve already described the problem in the previous post, where I used the examples from Gerton Lunter’s paper

Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes

G. A. Lunter

Bioinformatics 2007; DOI: 10.1093/bioinformatics/btm185

although there the focus was on the problems with indels. Of course, without indels there simply isn’t any problem with alignment, so that is not as unreasonable as it might sound.

Essentially, the problem is that we use algorithms to infer optimal alignments and then treat these alignments as absolute truth, ignoring the uncertainty in the inference.

Wong et al. compare seven different alignment algorithms and run typical evolutionary analyses (inferring phylogenies and detecting selection) on the inferred alignments. They find that the results of these analyses vary widely depending on which alignment method was used.

The solution proposed in Wong et al. is the same as the one Gerton proposes: statistical alignment methods. Quoting Wong et al.:

The problem of alignment uncertainty in genomic studies, identified here, is not a problem of sloppy analysis. Many comparative genomics studies are carefully performed and reasonable in design. However, even carefully designed and carried out analyses can suffer from these types of problems because the methods used in the analysis of the genomic data do not properly accommodate alignment uncertainty in the first place.

In a comparative genomics study, we advocate that alignment be treated as a random variable, and inferences of parameters of interest to the genomicist, such as the amount of nonsynonymous divergence or the phylogeny, consider the different possible alignments in proportion to their probability.

Of course, this is what the statistical alignment people in Oxford have been trying to do for years, and it is not quite as easy as it sounds.
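To make the idea of treating the alignment as a random variable a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the three candidate alignments, their log scores, and the helper names (`mismatch_fraction`, `posterior_weights`, `point_estimate`, `averaged_estimate`) are my own assumptions, not anything taken from Wong et al. or from a real statistical alignment package. A real method would sample alignments from a proper evolutionary model rather than enumerate three by hand. The sketch only contrasts the two attitudes: computing a statistic on the single best alignment versus averaging it over alignments in proportion to their probability.

```python
import math


def mismatch_fraction(row_a, row_b):
    """Fraction of gap-free columns in which the two aligned residues differ."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def posterior_weights(log_scores):
    """Turn log scores into normalised probabilities (shifted for stability)."""
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def point_estimate(candidates, statistic):
    """Statistic computed on the single highest-scoring alignment only."""
    rows, _ = max(candidates, key=lambda c: c[1])
    return statistic(*rows)


def averaged_estimate(candidates, statistic):
    """Statistic averaged over all candidates, weighted by their probability."""
    weights = posterior_weights([score for _, score in candidates])
    return sum(w * statistic(*rows) for w, (rows, _) in zip(weights, candidates))


# Hypothetical alignments of ACGTT against ACTT, with invented log scores.
candidates = [
    (("ACGTT", "AC-TT"), -2.0),
    (("ACGTT", "ACT-T"), -2.5),
    (("ACGTT", "A-CTT"), -3.0),
]

best = point_estimate(candidates, mismatch_fraction)
averaged = averaged_estimate(candidates, mismatch_fraction)
print(f"best alignment only: {best:.3f}")
print(f"averaged over alignments: {averaged:.3f}")
```

In this toy case the best-scoring alignment contains no mismatches at all, while the two slightly worse alignments each contain one, so the averaged estimate of divergence is larger than the point estimate. That gap is exactly the information that is thrown away when the optimal alignment is treated as the truth.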


Citations, for Research Blogging:

Rokas, A. (2008). GENOMICS: Lining Up to Avoid Bias. Science, 319(5862), 416-417. DOI: 10.1126/science.1153156

Wong, K.M., Suchard, M.A., Huelsenbeck, J.P. (2008). Alignment Uncertainty and Genomic Analysis. Science, 319(5862), 473-476. DOI: 10.1126/science.1151532

