Heads or tails and reliable alignments
I have written on several occasions about the uncertainty inherent in inferred alignments and why it is a potential problem. I hadn't thought it would be quite as serious as the results in the paper I just read suggest:
Heads or Tails: A Simple Reliability Check for Multiple Sequence Alignments
Giddy Landan and Dan Graur
Molecular Biology and Evolution 2007 24(6):1380-1383; doi:10.1093/molbev/msm060
The question of multiple sequence alignment quality has received much attention from developers of alignment methods. Less forthcoming, however, are practical measures for addressing alignment quality issues in real life settings. Here, we present a simple methodology to help identify and quantify the uncertainties in multiple sequence alignments and their effects on subsequent analyses. The proposed methodology is based upon the a priori expectation that sequence alignment results should be independent of the orientation of the input sequences. Thus, for totally unambiguous cases, reversing residue order prior to alignment should yield an exact reversed alignment of that obtained by using the unreversed sequences. Such "ideal" alignments, however, are the exception in real life settings, and the two alignments, which we term the heads and tails alignments, are usually different to a greater or lesser degree. The degree of agreement or discrepancy between these two alignments may be used to assess the reliability of the sequence alignment. Furthermore, any alignment dependent sequence analysis protocol can be carried out separately for each of the two alignments, and the two sets of results may be compared with each other, providing us with valuable information regarding the robustness of the whole analytical process. The heads-or-tails (HoT) methodology can be easily implemented for any choice of alignment method and for any subsequent analytical protocol. We demonstrate the utility of HoT for phylogenetic reconstruction for the case of 130 sequences belonging to the chemoreceptor superfamily in Drosophila melanogaster, and by analysis of the BaliBASE alignment database. Surprisingly, Neighbor-Joining methods of phylogenetic reconstruction turned out to be less affected by alignment errors than maximum likelihood and Bayesian methods.
In this paper they analyse the quality of multiple sequence alignments in an extremely simple manner: they first align the sequences as given (left to right), then reverse each sequence and align again, which essentially aligns them right to left. Unless the alignment algorithm depends on the order of the symbols, you'd expect the right-to-left alignment to be the mirror image of the left-to-right one.
Not always, of course: if the algorithm is based on oligonucleotides or such, then the order matters, but in many cases it doesn't.
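To make the procedure concrete, here is a minimal sketch of how I read it, in Python. The names heads_and_tails and pad_right are my own, not from the paper, and pad_right is a deliberately silly stand-in for a real alignment tool (it just pads short sequences with gaps on the right) so that the sketch runs on its own.

```python
from typing import Callable, List, Tuple

# An aligner takes unaligned sequences and returns equal-length rows,
# using '-' for gaps. Any real alignment tool could be wrapped this way.
Aligner = Callable[[List[str]], List[str]]


def heads_and_tails(seqs: List[str], align: Aligner) -> Tuple[List[str], List[str]]:
    """Return the heads and tails alignments, both in left-to-right orientation."""
    # Heads: align the sequences as given.
    heads = align(seqs)
    # Tails: reverse every sequence, align, then mirror the rows back
    # so the two alignments can be compared column by column.
    tails_mirrored = align([s[::-1] for s in seqs])
    tails = [row[::-1] for row in tails_mirrored]
    return heads, tails


def pad_right(seqs: List[str]) -> List[str]:
    """Toy placeholder aligner (an assumption for illustration only)."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]


heads, tails = heads_and_tails(["ACGT", "AGT"], pad_right)
print(heads)  # the padding ends up on the right
print(tails)  # the padding ends up on the left
```

The toy aligner is obviously order dependent, so the two alignments disagree; the interesting question is how much real tools disagree.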
Comparing head and tail alignments
When the order shouldn't matter, the left-to-right and right-to-left alignments (head and tail alignments in the paper) should be similar, so comparing them should give an indication of how much faith you can have in the inferred alignment.
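One simple way to put a number on the agreement is to count how many aligned residue pairs the two alignments share. The sketch below is my own illustration of that idea, not necessarily the exact measure used in the paper; the function names are made up, and it assumes the tails alignment has already been mirrored back into left-to-right orientation.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Collect (row_i, residue_i, row_j, residue_j) for residues sharing a column."""
    next_residue = [0] * len(alignment)  # index of the next residue in each row
    pairs = set()
    for column in zip(*alignment):
        present = []
        for row, symbol in enumerate(column):
            if symbol != "-":
                present.append((row, next_residue[row]))
                next_residue[row] += 1
        for (i, pi), (j, pj) in combinations(present, 2):
            pairs.add((i, pi, j, pj))
    return pairs


def pair_agreement(heads, tails):
    """Fraction of residue pairs found in both alignments (shared / all)."""
    h, t = aligned_pairs(heads), aligned_pairs(tails)
    if not h and not t:
        return 1.0  # nothing aligned in either, so nothing to disagree about
    return len(h & t) / len(h | t)


print(pair_agreement(["ACGT", "A-GT"], ["ACGT", "AG-T"]))
```

A value of 1.0 means the two alignments pair up exactly the same residues, and lower values mean less of the alignment can be trusted.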
They try this out on a family of 130 amino acid sequences of length around 400 using three different alignment tools. This is the result:
What is causing this?
There is uncertainty in inferring alignments, but why would the same algorithm give different results when running left-to-right compared to right-to-left?
As far as I can see, there are two different things going on here: one has to do with there being more than one optimal alignment (also discussed in the paper), and the other with the heuristics used to search for optimal alignments.
When there is more than one optimal alignment (which is often the case), even algorithms guaranteed to find an optimal alignment will give you an arbitrary one (though usually chosen deterministically). That arbitrary choice can easily differ between running left-to-right and right-to-left.
For multiple sequence alignments, it is computationally infeasible to guarantee an optimal alignment, so heuristics are used to search for (near or locally) optimal alignments. This is often some variation on a greedy strategy, and each choice made along the way can potentially lead to a different alignment. It is easy to see how left-to-right and right-to-left alignments can differ with such a strategy.
In any case, the take-home message is, once again: don't trust alignments!
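The tie-breaking effect is easy to reproduce with a textbook pairwise global alignment. The sketch below is a plain Needleman-Wunsch with scores I picked for illustration (match +1, mismatch -1, gap -1) and a fixed traceback preference (diagonal, then gap in the second sequence, then gap in the first); none of this is taken from the paper or from any particular tool.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch with a fixed, deterministic tie-breaking order."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the end; ties are always resolved in the same order.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]


def tails_align(a, b):
    """Align the reversed sequences, then mirror the result back."""
    row_a, row_b, s = global_align(a[::-1], b[::-1])
    return row_a[::-1], row_b[::-1], s


print(global_align("CAAT", "CAT"))  # heads
print(tails_align("CAAT", "CAT"))   # tails
```

For CAAT against CAT, both runs reach the same optimal score, but the gap is placed against a different A in each: the single A in CAT can pair with either A in CAAT at no cost, and the traceback order decides which.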
Landan, G., Graur, D. (2007). Heads or Tails: A Simple Reliability Check for Multiple Sequence Alignments. Molecular Biology and Evolution, 24(6), 1380-1383. DOI: 10.1093/molbev/msm060