StatAlign: a new statistical alignment tool
There’s an application note in the current issue of Bioinformatics that describes a new tool for statistical alignment, StatAlign, developed in my old group in Oxford.
Ádám Novák, István Miklós, Rune Lyngsø and Jotun Hein
Bioinformatics 2008 24(20):2403-2404
Motivation: Bayesian analysis is one of the most popular methods in phylogenetic inference. The most commonly used methods fix a single multiple alignment and consider only substitutions as phylogenetically informative mutations, though alignments and phylogenies should be inferred jointly as insertions and deletions also carry informative signals. Methods addressing these issues have been developed only recently and there has not been so far a user-friendly program with a graphical interface that implements these methods.
Results: We have developed an extendable software package in the Java programming language that samples from the joint posterior distribution of phylogenies, alignments and evolutionary parameters by applying the Markov chain Monte Carlo method. The package also offers tools for efficient on-the-fly summarization of the results. It has a graphical interface to configure, start and supervise the analysis, to track the status of the Markov chain and to save the results. The background model for insertions and deletions can be combined with any substitution model. It is easy to add new substitution models to the software package as plugins. The samples from the Markov chain can be summarized in several ways, and new postprocessing plugins may also be installed.
I am personally a firm believer in statistical alignment. I think it is the way to go, to deal with the uncertainty in inferred alignments and to avoid the artefacts they can create.
For a good introduction to the problems (and how statistical approaches to alignment can help), you should read Lunter et al., "Uncertainty in homology inferences: Assessing and improving genomic sequence alignment", Genome Res. 18:298-309, 2008 (or my summary of it here).
StatAlign, the tool in the application note, looks like a nice way to attack alignments. Unlike previous approaches I’ve blogged about, and unlike my own small work in statistical alignment, it deals with multiple sequences (where you need MCMC in addition to HMMs).
It samples over both alignments and phylogenies, which is nice if there is any uncertainty in the phylogeny inference (which is typically based on alignments in the first place).
I can imagine that integrating over the phylogenies in the MCMC is the main time-killer, though, so it would be nice if you could turn that part of the state space exploration off when you have a reasonable idea about the phylogeny but are uncertain about some parts of the alignment…
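To make that wish a bit more concrete, here is a toy sketch in Python of what "turning off" the tree moves could look like in a Metropolis-style sampler. This is not StatAlign's code or API: the tiny table of log-posterior values is made up, trees and alignments are reduced to plain indices, and the names `LOG_POST`, `propose`, `sample` and `update_tree` are hypothetical. It only illustrates the general idea of updating one block of the state while holding another fixed.

```python
import math
import random

# Toy unnormalised log-posterior over (tree, alignment) index pairs.
# The numbers are invented purely for illustration.
LOG_POST = [
    [0.0, -1.0, -2.0, -3.0],   # tree 0
    [-0.5, -0.2, -2.5, -1.0],  # tree 1
    [-3.0, -2.0, -0.1, -0.7],  # tree 2
]


def propose(current, n_values, rng):
    """Symmetric proposal: pick a different index uniformly at random."""
    step = rng.randrange(1, n_values)
    return (current + step) % n_values


def accept(log_ratio, rng):
    """Metropolis acceptance test for a symmetric proposal."""
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    return math.log(1.0 - rng.random()) < log_ratio


def sample(n_steps, tree=0, alignment=0, update_tree=True, seed=1):
    """Alternate alignment moves and (optionally) tree moves."""
    rng = random.Random(seed)
    n_trees, n_alns = len(LOG_POST), len(LOG_POST[0])
    samples = []
    for _ in range(n_steps):
        # Alignment move: always attempted.
        new_aln = propose(alignment, n_alns, rng)
        if accept(LOG_POST[tree][new_aln] - LOG_POST[tree][alignment], rng):
            alignment = new_aln
        # Tree move: skipped entirely when the tree is held fixed.
        if update_tree:
            new_tree = propose(tree, n_trees, rng)
            if accept(LOG_POST[new_tree][alignment] - LOG_POST[tree][alignment], rng):
                tree = new_tree
        samples.append((tree, alignment))
    return samples
```

Calling `sample(5000, tree=1, update_tree=False)` keeps the chain on tree 1 throughout, so the samples come from the alignment distribution conditional on that tree, while the default `update_tree=True` explores both. In a real tool the tree moves would presumably be far more expensive than in this toy, which is exactly why a switch like this would be handy.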
A. Novak, I. Miklos, R. Lyngso, J. Hein (2008). StatAlign: an extendable software package for joint Bayesian estimation of alignments and evolutionary trees. Bioinformatics, 24 (20), 2403-2404. DOI: 10.1093/bioinformatics/btn457
October 13th, 2008 at 9:23 pm
Hi, I agree with you on statistical alignment. For the record, I’ll mention three other programs: BAli-Phy, POY and BEAST.
BEAST was used for this by Lunter et al. (2005 BMC Bioinf).