I have previously written a bit about how optimal alignment algorithms introduce an alignment bias and even done some work on it myself (currently submitted for publication, so I cannot link to it yet). Today I saw a paper in the current issue of Science addressing the same problem.
A summary can be found in
Lining Up to Avoid Bias
Antonis Rokas
Science Vol. 319. no. 5862, pp. 416 – 417
and the full paper (probably requires a subscription) is
Alignment Uncertainty and Genomic Analysis
Karen M. Wong, Marc A. Suchard, and John P. Huelsenbeck
Science Vol. 319. no. 5862, pp. 473 – 476
The problem with alignments
I’ve already described the problem in the previous post, where I used the examples from Gerton Lunter’s paper
Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes
G. A. Lunter
Bioinformatics 2007; DOI: 10.1093/bioinformatics/btm185
although there the focus was on the problems with indels. Of course, without indels there simply isn’t any problem with alignment, so that is not as unreasonable as it might sound.
Essentially, the problem is that we use algorithms to infer optimal alignments and then treat these alignments as absolute truth, ignoring the uncertainty in the inference.
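To make the point concrete, here is a minimal sketch that counts how many distinct global alignments tie for the best score. It uses a plain Needleman-Wunsch style recursion with a linear gap penalty. The function name and the scoring values (match +1, mismatch -1, gap -1) are my own assumptions for illustration, not anything taken from the papers above.

```python
def count_optimal_alignments(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, number_of_co_optimal_alignments) for a global
    alignment of a and b under a simple linear gap penalty.

    The scoring values are illustrative assumptions only.
    """
    n, m = len(a), len(b)
    # score[i][j]: best score for aligning a[:i] with b[:j]
    # count[i][j]: number of distinct alignments achieving that score
    score = [[0] * (m + 1) for _ in range(n + 1)]
    count = [[0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1

    # Borders: only one way to align a prefix against nothing (all gaps).
    for i in range(1, n + 1):
        score[i][0] = i * gap
        count[i][0] = 1
    for j in range(1, m + 1):
        score[0][j] = j * gap
        count[0][j] = 1

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, count[i - 1][j - 1]),  # align a[i-1] with b[j-1]
                (score[i - 1][j] + gap, count[i - 1][j]),          # gap in b
                (score[i][j - 1] + gap, count[i][j - 1]),          # gap in a
            ]
            best = max(s for s, _ in candidates)
            score[i][j] = best
            # Sum the counts over every move that ties for the best score.
            count[i][j] = sum(c for s, c in candidates if s == best)

    return score[n][m], count[n][m]


print(count_optimal_alignments("AAG", "AG"))
```

Even for this tiny example there are two equally good alignments (the gap can sit opposite either A), and a program that reports a single optimal alignment has to pick one of them arbitrarily.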
Wong et al. compare seven different alignment algorithms and run typical evolutionary analyses (inferring phylogenies and detecting selection) on the resulting alignments. They find that the results of these analyses vary widely depending on which alignment method was used.
The solution proposed in Wong et al. is the same one Gerton proposes: statistical alignment methods. Quoting Wong et al.:
The problem of alignment uncertainty in genomic studies, identified here, is not a problem of sloppy analysis. Many comparative genomics studies are carefully performed and reasonable in design. However, even carefully designed and carried out analyses can suffer from these types of problems because the methods used in the analysis of the genomic data do not properly accommodate alignment uncertainty in the first place.
…
In a comparative genomics study, we advocate that alignment be treated as a random variable, and inferences of parameters of interest to the genomicist, such as the amount of nonsynonymous divergence or the phylogeny, consider the different possible alignments in proportion to their probability.
Of course, this is what the statistical alignment people in Oxford have been trying to do for years, and it is not quite as easy as it sounds.
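As a toy illustration of what "consider the different possible alignments in proportion to their probability" means in practice, the sketch below averages a statistic over a handful of candidate alignments, weighted by their probabilities. Everything here is hypothetical: the function names, the two candidate alignments, and especially the weights, which in a real analysis would have to come from a statistical alignment model rather than being typed in by hand.

```python
def mismatch_fraction(row_a, row_b):
    """Fraction of aligned (non-gap) columns where the two residues differ."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def expected_statistic(weighted_alignments, statistic):
    """Average a statistic over alignments, weighted by their probabilities.

    weighted_alignments: list of ((row_a, row_b), weight) pairs.
    Weights are normalised here, so they need not sum to one.
    """
    total = sum(w for _, w in weighted_alignments)
    return sum(w * statistic(*aln) for aln, w in weighted_alignments) / total


# Two hypothetical alignments of ACG against AG, with made-up probabilities.
candidates = [
    (("ACG", "A-G"), 0.75),  # no mismatches among aligned columns
    (("ACG", "AG-"), 0.25),  # C aligned against G: one mismatch in two columns
]

best_only = mismatch_fraction(*candidates[0][0])
averaged = expected_statistic(candidates, mismatch_fraction)
print(best_only, averaged)
```

Using only the most probable alignment gives a mismatch fraction of zero, while the weighted average does not. The hard part, of course, is getting the probabilities and summing over the enormous space of alignments, which this sketch sidesteps entirely.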
Citations, for Research Blogging:
Rokas, A. (2008). GENOMICS: Lining Up to Avoid Bias. Science, 319(5862), 416-417. DOI: 10.1126/science.1153156
Wong, K.M., Suchard, M.A., Huelsenbeck, J.P. (2008). Alignment Uncertainty and Genomic Analysis. Science, 319(5862), 473-476. DOI: 10.1126/science.1151532