Phylogenetic inference under recombination using Bayesian stochastic topology selection

There’s an interesting paper in the current issue of Bioinformatics that I’ve just finished reading:

Phylogenetic inference under recombination using Bayesian stochastic topology selection

Webb et al. Bioinformatics 25(2) 197-203


Motivation: Conventional phylogenetic analysis for characterizing the relatedness between taxa typically assumes that a single relationship exists between species at every site along the genome. This assumption fails to take into account recombination which is a fundamental process for generating diversity and can lead to spurious results. Recombination induces a localized phylogenetic structure which may vary along the genome. Here, we generalize a hidden Markov model (HMM) to infer changes in phylogeny along multiple sequence alignments while accounting for rate heterogeneity; the hidden states refer to the unobserved phylogenic topology underlying the relatedness at a genomic location. The dimensionality of the number of hidden states (topologies) and their structure are random (not known a priori) and are sampled using Markov chain Monte Carlo algorithms. The HMM structure allows us to analytically integrate out over all possible changepoints in topologies as well as all the unknown branch lengths.

Results: We demonstrate our approach on simulated data and also to the genome of a suspected HIV recombinant strain as well as to an investigation of recombination in the sequences of 15 laboratory mouse strains sequenced by Perlegen Sciences. Our findings indicate that our method allows us to distinguish between rate heterogeneity and variation in phylogeny caused by recombination without being restricted to 4-taxa data.

The paper presents a new method for analysing sequences that have undergone recombination.

When sequences have not undergone recombination, a nice methodology for analysing them is the PhyloHMM. With this method, you have a hidden Markov model where the emission probability of each alignment column is determined by a phylogeny, and it is usually computed using Felsenstein’s pruning algorithm.
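To make the emission step concrete, here is a minimal sketch of Felsenstein’s pruning algorithm computing the probability of a single alignment column given a tree. It assumes the Jukes-Cantor (JC69) substitution model and a hypothetical nested-tuple tree encoding; the function names are made up for illustration, and real PhyloHMM implementations typically use richer substitution models and rate categories.

```python
import math

NUCS = "ACGT"


def jc_prob(same: bool, t: float) -> float:
    """Jukes-Cantor probability of ending in a given state after a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(node, column):
    """Conditional likelihoods L[x] = P(leaf data below node | node is in state x).

    Hypothetical encoding: a leaf is a taxon name (str); an internal node is
    ((left_child, left_branch_length), (right_child, right_branch_length)).
    """
    if isinstance(node, str):
        # Leaf: indicator vector on the nucleotide observed for this taxon.
        return [1.0 if c == column[node] else 0.0 for c in NUCS]
    (left, t_left), (right, t_right) = node
    lv = partial_likelihoods(left, column)
    rv = partial_likelihoods(right, column)
    result = []
    for x in range(4):
        # Sum over the unknown state at each child, then combine the two subtrees.
        from_left = sum(jc_prob(x == y, t_left) * lv[y] for y in range(4))
        from_right = sum(jc_prob(x == y, t_right) * rv[y] for y in range(4))
        result.append(from_left * from_right)
    return result


def site_likelihood(tree, column):
    """Emission probability of one column: average over root states (uniform JC69 stationary distribution)."""
    return sum(0.25 * v for v in partial_likelihoods(tree, column))


# Usage: ((A:0.1, B:0.1):0.05, C:0.15) and one alignment column.
example_tree = (((("A", 0.1), ("B", 0.1)), 0.05), ("C", 0.15))
print(site_likelihood(example_tree, {"A": "A", "B": "A", "C": "G"}))
```

The recursion visits each node once per column, which is why pruning is cheap enough to use as the emission model of an HMM over long alignments.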

When there is recombination, the problem is that there is more than one topology for the underlying phylogeny, and if you do not know the topologies you cannot immediately calculate the emission probabilities.

You can instead model the unknown topologies as hidden states. This approach was taken by Husmeier and McGuire (2003) and is also the approach we take in our CoalHMM method (Hobolth et al 2007).
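A minimal sketch of what "topologies as hidden states" means computationally: given a precomputed matrix of per-column likelihoods under each candidate topology (for instance from pruning), the forward-backward algorithm sums over all possible changepoints and gives the posterior probability of each topology at each site. The transition structure (stay with probability 1 - s, otherwise jump uniformly to another topology) and the function names are assumptions for illustration, not the parameterisation used by Husmeier and McGuire or by CoalHMM.

```python
import math


def transition_matrix(n_states, switch_prob):
    """Hypothetical transition structure: stay with 1 - s, else move uniformly to another topology."""
    if n_states == 1:
        return [[1.0]]
    off = switch_prob / (n_states - 1)
    return [[1.0 - switch_prob if i == j else off for j in range(n_states)]
            for i in range(n_states)]


def forward_backward(emissions, switch_prob):
    """emissions[k][i] = P(column i | topology k).

    Returns (log-likelihood, posteriors) where posteriors[i][k] = P(topology k at column i | data).
    Uses per-column scaling to avoid numerical underflow on long alignments.
    """
    K, n = len(emissions), len(emissions[0])
    T = transition_matrix(K, switch_prob)
    alphas, scales = [], []
    prev = [1.0 / K] * K  # uniform initial distribution over topologies
    for i in range(n):
        if i == 0:
            a = [prev[k] * emissions[k][0] for k in range(K)]
        else:
            a = [sum(prev[j] * T[j][k] for j in range(K)) * emissions[k][i] for k in range(K)]
        c = sum(a)
        a = [x / c for x in a]
        alphas.append(a)
        scales.append(c)
        prev = a
    loglik = sum(math.log(c) for c in scales)
    # Backward pass, scaled with the same constants as the forward pass.
    betas = [None] * n
    betas[-1] = [1.0] * K
    for i in range(n - 2, -1, -1):
        betas[i] = [sum(T[j][k] * emissions[k][i + 1] * betas[i + 1][k] for k in range(K)) / scales[i + 1]
                    for j in range(K)]
    posteriors = []
    for i in range(n):
        p = [alphas[i][k] * betas[i][k] for k in range(K)]
        z = sum(p)
        posteriors.append([x / z for x in p])
    return loglik, posteriors
```

The cost is quadratic in the number of hidden states per column, which becomes a problem as soon as the number of candidate topologies grows.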

This approach doesn’t scale, however, since the number of possible topologies grows super-exponentially with the number of species.
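The growth is easy to quantify with the standard combinatorial result that there are (2n-5)!! unrooted and (2n-3)!! rooted bifurcating topologies on n labelled taxa. A small sketch (function names are just illustrative):

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... ; taken to be 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_topologies(n: int) -> int:
    """Number of unrooted, bifurcating, leaf-labelled trees on n taxa: (2n - 5)!!."""
    return double_factorial(2 * n - 5) if n >= 3 else 1


def rooted_topologies(n: int) -> int:
    """Number of rooted, bifurcating, leaf-labelled trees on n taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3) if n >= 2 else 1


for n in (4, 5, 10, 15):
    print(n, unrooted_topologies(n), rooted_topologies(n))
```

With 4 taxa there are only 3 unrooted topologies, but for 15 taxa, the number of mouse strains analysed in the paper, there are 7,905,853,580,625, far too many to use as HMM states.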

In this paper they solve the problem by using only a few topologies as states in the HMM, while using MCMC to sample which topologies to use from the space of all possible topologies. Ideally the number of topologies should also be variable, but that requires reversible jump MCMC and they haven’t implemented that. Still, it seems to work very well.
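The idea of a small HMM whose states are sampled by MCMC can be illustrated with a toy Metropolis sampler. This is a sketch under strong simplifying assumptions and not the authors’ algorithm: the number of topologies k is fixed, per-column likelihoods for a pool of candidate topologies are taken as precomputed (the paper instead integrates out branch lengths analytically), the prior over subsets is uniform, and the names `hmm_loglik` and `sample_topology_sets` are hypothetical.

```python
import math
import random


def hmm_loglik(emissions, states, switch_prob):
    """Log-likelihood of an HMM whose hidden states are the topologies in `states`.

    emissions[t][i] = P(column i | topology t), assumed precomputed.
    Scaled forward algorithm, summing over all changepoints.
    """
    k, n = len(states), len(emissions[0])
    stay = 1.0 - switch_prob if k > 1 else 1.0
    move = switch_prob / (k - 1) if k > 1 else 0.0
    prev = [1.0 / k] * k
    loglik = 0.0
    for i in range(n):
        if i == 0:
            a = [prev[j] * emissions[t][0] for j, t in enumerate(states)]
        else:
            # prev sums to 1 after scaling, so the mass arriving from other states is (1 - prev[j]) * move.
            a = [(prev[j] * stay + (1.0 - prev[j]) * move) * emissions[t][i]
                 for j, t in enumerate(states)]
        c = sum(a)
        loglik += math.log(c)
        prev = [x / c for x in a]
    return loglik


def sample_topology_sets(emissions, k, switch_prob, n_iter, seed=0):
    """Metropolis sampler over sets of k distinct topologies from the pool (rows of `emissions`).

    Returns visit counts per sorted tuple of topology indices.
    """
    pool = len(emissions)
    if not 0 < k < pool:
        raise ValueError("need 0 < k < number of candidate topologies")
    rng = random.Random(seed)
    current = sorted(rng.sample(range(pool), k))
    current_ll = hmm_loglik(emissions, current, switch_prob)
    counts = {}
    for _ in range(n_iter):
        # Symmetric proposal: swap one chosen topology for one not currently in the set.
        out_idx = rng.randrange(k)
        candidates = [t for t in range(pool) if t not in current]
        proposal = sorted(current[:out_idx] + current[out_idx + 1:] + [rng.choice(candidates)])
        prop_ll = hmm_loglik(emissions, proposal, switch_prob)
        delta = prop_ll - current_ll
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_ll = proposal, prop_ll
        key = tuple(current)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Because only k topologies enter the forward algorithm at any time, each likelihood evaluation stays cheap no matter how large the pool of candidates is; letting k itself vary is where reversible jump moves would come in.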

I remember discussing the problem with both Alex and Chris when I was last in Oxford, but back then it didn’t work so well, so I am happy to read that they’ve solved the problems. Properly handling recombination and changing topologies is important for accurate sequence analysis.

A. Webb, J. M. Hancock, C. C. Holmes (2008). Phylogenetic inference under recombination using Bayesian stochastic topology selection Bioinformatics, 25 (2), 197-203 DOI: 10.1093/bioinformatics/btn607

Ancient DNA analysis of the Icelandic settlers

I’ve just finished reading this paper in PLoS Genetics:

Sequences From First Settlers Reveal Rapid Evolution in Icelandic mtDNA Pool Helgason et al. PLoS Genetics, 5 (1) DOI: 10.1371/journal.pgen.1000343


A major task in human genetics is to understand the nature of the evolutionary processes that have shaped the gene pools of contemporary populations. Ancient DNA studies have great potential to shed light on the evolution of populations because they provide the opportunity to sample from the same population at different points in time. Here, we show that a sample of mitochondrial DNA (mtDNA) control region sequences from 68 early medieval Icelandic skeletal remains is more closely related to sequences from contemporary inhabitants of Scotland, Ireland, and Scandinavia than to those from the modern Icelandic population. Due to a faster rate of genetic drift in the Icelandic mtDNA pool during the last 1,100 years, the sequences carried by the first settlers were better preserved in their ancestral gene pools than among their descendants in Iceland. These results demonstrate the inferential power gained in ancient DNA studies through the application of population genetics analyses to relatively large samples.

The paper has already been discussed by MacArthur at Genetic Future and Razib at Gene Expression so I will only give a very short review here.

Iceland was settled in the late first millennium by Vikings (replacing settlements by Irish monks), who brought Irish and Scottish slaves with them.

Iceland has been pretty isolated since the early settlement. Consequently, it is a very homogeneous population — at least genetically speaking — and is one of the best studied populations, not least by deCODE Genetics.

Analysis of contemporary DNA shows that the mitochondrial DNA (mtDNA) is primarily of Scottish/Irish descent while Y chromosomes are primarily Scandinavian. You can probably draw your own conclusions from that.

What is new in the PLoS paper is that they have sequenced ancient mtDNA from medieval skeletons and compared them with contemporary samples from Iceland, Scandinavia and Scotland and Ireland.

The data doesn’t change much with respect to the genetic origin of the Icelanders, but an interesting finding is that the medieval Icelanders are more closely related genetically to present-day samples from the source populations than they are to present-day Icelanders.

In other words, the Icelanders have evolved faster.

No, this doesn’t mean that they are mutating faster or that selection has had a hand in this.  It can quite easily be explained by the isolation of the relatively small population.

Drift, one of the driving forces of evolution, simply works much faster in populations with a small (effective) size than in large ones.

Drift is essentially a question of random sampling: each generation’s gene copies are a random sample of the previous generation’s, and the smaller the effective population size, the larger the random fluctuations in frequencies from one generation to the next. This is very well explained in Razib’s post, so I will direct you there instead of repeating the arguments here.
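A quick way to see the effect of population size is a haploid Wright-Fisher simulation, a standard textbook model that roughly fits maternally inherited mtDNA (this is an illustrative assumption, not the analysis in the paper). Each generation draws N gene copies with replacement from the previous one, and we measure how far a haplotype frequency wanders from its starting value. Under this model the expected squared change after t generations is p0(1 - p0)(1 - (1 - 1/N)^t); the function names and parameter values below are arbitrary choices for illustration.

```python
import random


def wright_fisher(n, p0, generations, rng):
    """Haploid Wright-Fisher: each generation samples n copies with replacement from the last."""
    count = round(p0 * n)
    for _ in range(generations):
        p = count / n
        count = sum(1 for _ in range(n) if rng.random() < p)
    return count / n


def mean_squared_change(n, p0=0.5, generations=30, reps=100, seed=1):
    """Average squared change in frequency over `reps` independent simulated populations."""
    rng = random.Random(seed)
    return sum((wright_fisher(n, p0, generations, rng) - p0) ** 2 for _ in range(reps)) / reps


def expected_squared_change(n, p0, generations):
    """Theoretical expectation under the haploid Wright-Fisher model."""
    return p0 * (1 - p0) * (1 - (1 - 1 / n) ** generations)


for n in (20, 1000):
    print(n, mean_squared_change(n), expected_squared_change(n, 0.5, 30))
```

Over the same 30 generations the frequency in the population of 20 drifts far more than in the population of 1,000, which is the same qualitative pattern as the small Icelandic mtDNA pool changing faster than its large source populations.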

Agnar Helgason, Carles Lalueza-Fox, Shyamali Ghosh, Sigrún Sigurðardóttir, Maria Lourdes Sampietro, Elena Gigli, Adam Baker, Jaume Bertranpetit, Lilja Árnadóttir, Unnur Þorsteinsdottir, Kári Stefánsson (2009). Sequences From First Settlers Reveal Rapid Evolution in Icelandic mtDNA Pool PLoS Genetics, 5 (1) DOI: 10.1371/journal.pgen.1000343


Life on Mars?

Is there life on Mars?


NASA has found methane on Mars, and that could be a sign of life. Of course, it could also just be geology.

Methane — four atoms of hydrogen bound to a carbon atom — is the main component of natural gas on Earth. It’s of interest to astrobiologists because organisms release much of Earth’s methane as they digest nutrients. However, other purely geological processes, like oxidation of iron, also release methane. “Right now, we don’t have enough information to tell if biology or geology — or both — is producing the methane on Mars,” said Mumma. “But it does tell us that the planet is still alive, at least in a geologic sense. It’s as if Mars is challenging us, saying, hey, find out what this means.” Mumma is lead author of a paper on this research appearing in Science Express Jan. 15.

If microscopic Martian life is producing the methane, it likely resides far below the surface, where it’s still warm enough for liquid water to exist. Liquid water, as well as energy sources and a supply of carbon, are necessary for all known forms of life.

It will take future missions, like NASA’s Mars Science Laboratory, to discover the origin of the Martian methane. One way to tell if life is the source of the gas is by measuring isotope ratios. Isotopes are heavier versions of an element; for example, deuterium is a heavier version of hydrogen. In molecules that contain hydrogen, like water and methane, the rare deuterium occasionally replaces a hydrogen atom. Since life prefers to use the lighter isotopes, if the methane has less deuterium than the water released with it on Mars, it’s a sign that life is producing the methane.