If you are interested in phylogenomics and primate evolution -- including human evolution -- this new review in Genome Research is a must-read.
Phylogenomics of primates and their ancestral populations
Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly evolutionary and phylogenetic approach to comparative genomics, called phylogenomics, will be essential in unlocking the valuable information about evolutionary history and genomic function that is contained within these genomes. However, most phylogenomic analyses so far have ignored the effects of variation in ancestral populations on patterns of sequence divergence. These effects can be pronounced in the primates, owing to large ancestral effective population sizes relative to the intervals between speciation events. In particular, local genealogies can vary considerably across loci, which can produce biases and diminished power in many phylogenomic analyses of interest, including phylogeny reconstruction, the identification of functional elements, and the detection of natural selection. At the same time, this variation in genealogies can be exploited to gain insight into the nature of ancestral populations. In this Perspective, I explore this area of intersection between phylogenetics and population genetics, and its implications for primate phylogenomics. I begin by “lifting the hood” on the conventional tree-like representation of the phylogenetic relationships between species, to expose the population-genetic processes that operate along its branches. Next, I briefly review an emerging literature that makes use of the complex relationships among coalescence, recombination, and speciation to produce inferences about evolutionary histories, ancestral populations, and natural selection. Finally, I discuss remaining challenges and future prospects at this nexus of phylogenetics, population genetics, and genomics.
...and if you are wondering why my blog is so quiet these days, it is because I am swamped with four of the genome projects mentioned in the paper: orangutan, bonobo, gorilla and macaque...
Any summary of this paper that I write will not really do justice to it -- you really should read it yourself and you will be happy you did -- so I'll just briefly summarize the topics that Adam covers.
First he covers basic phylogenetics, that is, figuring out species relationships. This is, by now, a well-known field and essentially boils down to modeling sequence evolution as Markov chains so you can estimate divergence times and tree relationships from the substitutions between sequences.
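To make the Markov chain idea concrete, here is a minimal sketch using the Jukes-Cantor (JC69) model, the simplest textbook substitution model. It is my own illustration and not taken from the paper; the function name and the toy sequences are made up. The model assumes all substitutions are equally likely, and corrects the observed fraction of differing sites for multiple hits at the same site.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance in expected substitutions per site.

    Illustrative helper (hypothetical name): assumes two aligned DNA
    sequences and equal rates for all substitution types.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Only compare sites where both sequences have an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")

    # Observed proportion of differing sites.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")

    # JC69 correction for multiple substitutions at the same site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy sequences, not real data: one difference in four sites.
    print(jc69_distance("ACGT", "ACGA"))
```

The corrected distance is always at least as large as the raw fraction of differences, because some substitutions hide earlier ones. Real analyses use richer models, but the principle is the same.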
For closely related species, though, that is only a small part of the picture, and the more interesting part of the paper involves introducing population genetics to phylogenetics. You have to remember that speciation somehow involves populations; two species do not just split up. Rather, groups of individuals diverge, and their genomes start diverging as groups rather than as individuals. That leads to sequence divergence that varies as you scan along the genomes and, under certain conditions, to incomplete lineage sorting, where gene trees differ from the species tree.
This doesn't just cause complications in genomic inference, though. It provides valuable information about ancestral species and about speciation processes, which is the next topic Adam covers. For primates, this is especially important. The time intervals between speciations are short, and the ancestral effective population sizes are large *, so 1) if you ignore this your results will be way off, but 2) if you embrace it you have a lot of information to learn about the ancestry of the primates.
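A rough feel for why short intervals and large populations matter comes from the standard three-species coalescent result: the probability that a gene tree disagrees with the species tree is (2/3) * exp(-T / (2N)), where T is the number of generations between the two speciation events and N is the diploid effective size of the ancestral population in that interval. The sketch below is my own illustration, assuming a constant population size and no gene flow; the function name and all the numbers are hypothetical, not estimates from the paper.

```python
import math


def discordance_probability(interval_years, generation_years, ancestral_ne):
    """Probability that a gene tree differs from the species tree.

    Three-species coalescent approximation. Assumes a constant diploid
    effective population size (ancestral_ne) between the two speciation
    events and no gene flow after the splits.
    """
    if interval_years < 0 or generation_years <= 0 or ancestral_ne <= 0:
        raise ValueError("interval must be >= 0; other arguments must be > 0")

    # Interval length in coalescent units of 2N generations.
    generations = interval_years / generation_years
    t_coal = generations / (2.0 * ancestral_ne)

    # With probability exp(-t) the two lineages fail to coalesce in the
    # interval; then 2 of the 3 equally likely topologies are discordant.
    return (2.0 / 3.0) * math.exp(-t_coal)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend.
    for ne in (10_000, 50_000, 100_000):
        p = discordance_probability(2_000_000, 20, ne)
        print(f"Ne = {ne:>7}: P(discordant gene tree) = {p:.3f}")
```

Shrinking the interval or growing the ancestral population pushes the probability toward its maximum of 2/3, which is the regime where ignoring ancestral variation hurts the most.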
This then leads us to speciation models. There are plenty of those; the simplest (allopatric speciation) just assumes that some barrier appears between two populations, after which they evolve independently to the point where they can no longer interbreed. That is probably a good model for the chimp/bonobo split, where the Congo River got in the way (chimps can't swim), but it is a bit simple, so more complex scenarios are worth considering for most speciation events. The point here is just that different scenarios will leave different signals in the genomes, and we should be able to work this out by looking at the extant genomes.
There's a nice review of the work done so far in the paper, but honestly we are still only at the starting phase of modeling this, and a lot of work remains before we can say anything conclusively about any of the primate speciations.
Next we get to selection. With the neutral theory, we have come to believe that we can explain most of genome evolution with neutral mutations -- well, I have anyway, but that might just be me. Recent results, though, hint at selection being a major force in genome evolution after all. My older colleagues tell me that selection played a much bigger role in the theory years back, but my background gave me the intuition that it could pretty much be ignored when comparing genomes; maybe I was wrong about that.
Perhaps the null model when we look at entire genomes shouldn't be neutrality after all. I don't know... We are seeing signals to that effect in our own work, anyway, but I'll tell you all about that later when those papers are out. For now, let's just read Adam's paper, which is much more interesting anyway!
The last part of the paper is on Future Prospects. Well, most papers are, so no surprise there, but if you are getting into the field there are some interesting areas to start thinking about in this review.
How do we incorporate the ancestral recombination graph (ARG) into phylogenetic analysis? How do we model it without the combinatorial state space explosion? How do we infer anything usable from the weak signals that are in the data for this? How do we combine model sophistication with computational efficiency to alleviate the state space explosion? Which model assumptions are essential and which can we get away with approximating?
Let me add a few of my own: How do we model this complex system without too much complex math, so that when we have results we can actually interpret them? How do we check whether deviations from our model actually show evidence for one model over another, and do not just show that we have the wrong model?
Go read the paper! Seriously, it is a great read!
* Yeah, about ancestral population sizes... Very different methods consistently give estimates of very large ancestral effective population sizes, so it generally seems like the ancestral species were more diverse than the extant species are. That the results agree across methods indicates that this might be true, but it is still somewhat suspicious. I guess we will learn more over the coming years as we get more data and more sophisticated methods.
Siepel, A. (2009). Phylogenomics of primates and their ancestral populations. Genome Research, 19(11), 1929-1941. DOI: 10.1101/gr.084228.108