Posts Tagged ‘speciation’

Estimating parameters of speciation models

Monday, February 18th, 2008

Another paper that addresses the speciation process in apes is:

A new approach to estimate parameters of speciation models with application to apes

Becquet and Przeworski

Genome Research 17:1505-1519

Abstract

How populations diverge and give rise to distinct species remains a fundamental question in evolutionary biology, with important implications for a wide range of fields, from conservation genetics to human evolution. A promising approach is to estimate parameters of simple speciation models using polymorphism data from multiple loci. Existing methods, however, make a number of assumptions that severely limit their applicability, notably, no gene flow after the populations split and no intralocus recombination. To overcome these limitations, we developed a new Markov chain Monte Carlo method to estimate parameters of an isolation-migration model. The approach uses summaries of polymorphism data at multiple loci surveyed in a pair of diverging populations or closely related species and, importantly, allows for intralocus recombination. To illustrate its potential, we applied it to extensive polymorphism data from populations and species of apes, whose demographic histories are largely unknown. The isolation-migration model appears to provide a reasonable fit to the data. It suggests that the two chimpanzee species became reproductively isolated in allopatry ~850 Kya, while Western and Central chimpanzee populations split ~440 Kya but continued to exchange migrants. Similarly, Eastern and Western gorillas and Sumatran and Bornean orangutans appear to have experienced gene flow since their splits ~90 and over 250 Kya, respectively.

[Figure: becquet-przeworski-fig1.png]

In this paper they develop a method to infer the coalescence parameters in a model that is essentially a population split with migration (see the figure for details).

The effective population sizes, the Ns, tell us something about the diversity of the species (where NA tells us about the ancestral species). The split time, T, gives us the speciation time, and the migration parameter, m, tells us something about the way the speciation occurred (an allopatric vs parapatric model).

As usual for coalescence models, the full likelihood of the parameters is computationally demanding to compute, so the authors use summary statistics instead (somewhat like an Approximate Bayesian Computation (ABC) method, if you can call it that when you want to match the summaries exactly) and then develop a Markov Chain Monte Carlo (MCMC) method to sample from the likelihood function over the summary statistics.
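To make the idea of MCMC over summary statistics concrete, here is a minimal sketch of a generic ABC-MCMC sampler on a toy model. It is not the authors' method: the toy has no migration and no recombination, it accepts simulated summaries within a tolerance `eps` rather than matching them exactly, and every name and number in it (`simulate_summary`, `abc_mcmc`, `theta`, the uniform prior on the split time) is an assumption made for illustration.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_summary(rng, split_time, theta, n_loci):
    """Mean number of differences between one sequence from each population.

    Time is measured in units of 2N generations. Without migration, the two
    lineages coalesce split_time + Exp(1) units back, in the ancestor.
    theta is the (hypothetical) scaled mutation rate per locus.
    """
    total = 0
    for _ in range(n_loci):
        coal_time = split_time + rng.expovariate(1.0)
        total += poisson(rng, theta * coal_time)
    return total / n_loci


def abc_mcmc(rng, observed, theta, n_loci, n_steps,
             eps=1.0, t_max=5.0, step=0.3):
    """Sample split times under a Uniform(0, t_max) prior, likelihood-free."""
    # Start from a point that already reproduces the observed summary,
    # found by plain rejection sampling from the prior.
    while True:
        current = rng.uniform(0.0, t_max)
        start_sim = simulate_summary(rng, current, theta, n_loci)
        if abs(start_sim - observed) <= eps:
            break
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        # Flat prior and symmetric proposal: the Metropolis ratio is 1
        # inside the prior bounds and 0 outside them.
        if 0.0 <= proposal <= t_max:
            sim = simulate_summary(rng, proposal, theta, n_loci)
            if abs(sim - observed) <= eps:
                current = proposal
        samples.append(current)
    return samples


rng = random.Random(1)
# Pretend data: 20 loci simulated with a split time of 1.0 (in 2N generations).
observed = simulate_summary(rng, 1.0, 5.0, 20)
samples = abc_mcmc(rng, observed, 5.0, 20, 2000)
print("posterior mean split time:", sum(samples) / len(samples))
```

The only place the data enter is the comparison between simulated and observed summaries, so no likelihood is ever evaluated. Shrinking `eps` moves the sampler closer to exact matching, at the price of a lower acceptance rate.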

Based on this model, they then estimate speciation times for sub-species of chimps, gorillas and orangutans.


Citation for Research Blogging: Becquet, C., Przeworski, M. (2007). A new approach to estimate parameters of speciation models with application to apes. Genome Research, 17(10), 1505-1519. DOI: 10.1101/gr.6409707

Mapping human genetic ancestry

Wednesday, January 30th, 2008

Yesterday I read the paper

Mapping human genetic ancestry. I. Ebersberger et al. Molecular Biology and Evolution 2007 24(10):2266-2276

that addresses the same problem that we addressed in

Genomic relationships and speciation times of human, chimpanzee and gorilla inferred from a coalescent hidden Markov model. A. Hobolth et al. PLoS Genetics 2007 3(2): doi:10.1371/journal.pgen.0030007

although they take a different approach to the problem and use a lot more data.

Tracing the ancestry of the human genome

[Figure: Species trees and gene trees]

Humans’ closest living relatives are the chimps, and the closest relatives of humans and chimps are the gorillas, but the species are so closely related that not all of the genome follows the species genealogy. The figure illustrates this.

The reason this happens is that as we trace the history of a piece of our DNA back in time, we will necessarily find the most recent common ancestor of humans and chimps further back in time than the speciation time of humans and chimps. If this time is so far back that it also precedes the speciation time of the human/chimp ancestor and the gorilla ancestor, then the most recent common ancestor of chimps and gorillas, or humans and gorillas, might be younger than the most recent common ancestor of all the species.

Looking at the DNA of the three species we can infer the average time in the past where the DNA splits into the different species, and using coalescent theory we can then infer the speciation times.

In Hobolth et al. we approximated the coalescent process using a hidden Markov model, which enabled us to efficiently analyse large alignments of DNA sequences and from this extract the parameters needed to infer speciation times and information about the diversity in ancestral species, and to annotate the alignments with the most likely genealogy, e.g. showing us in which parts of our genome we are more closely related to gorillas than to chimps.
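How often the genome disagrees with the species tree can be sketched with the standard three-species coalescent result: if the human and chimp lineages fail to coalesce in the interval between the two speciation events, all three gene trees are equally likely, so two thirds of those cases are incongruent. The snippet below is only a back-of-the-envelope illustration with one sequence per species and a constant ancestral population size; the function name and the input numbers are hypothetical and are not estimates from either paper.

```python
import math


def prob_incongruent(interval_years, ne, generation_years):
    """Probability that a gene tree differs from the species tree.

    interval_years: time between the two speciation events
    ne: effective size of the human/chimp ancestral population
    generation_years: average generation time
    """
    # Length of the interval in coalescent units of 2*Ne generations.
    interval_coalescent = interval_years / generation_years / (2.0 * ne)
    # No coalescence within the interval, then 2 of the 3 trees are "wrong".
    return (2.0 / 3.0) * math.exp(-interval_coalescent)


# Hypothetical inputs: 2 Myr between speciations, Ne = 50,000, 20-year generations.
print(prob_incongruent(2e6, 50_000, 20))
```

A larger ancestral population or a shorter interval between the speciation events pushes the probability towards its upper limit of 2/3.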

CoalHMM

We applied this to five large alignments, but covering only a small fraction of the entire genome.

In Ebersberger et al. they construct a large number of (smaller) alignments covering the entire genome and consider the same problem in analysing this data.

The statistical model they use is slightly less sophisticated than what we did, but that is probably more than compensated for by the much larger data set. What they do is construct a single tree for each alignment by picking the most likely of all the possible phylogenies, discarding alignments when there is no clear winner.

They then use coalescent theory to infer the diversity of the ancestral species, measured as the parameter Ne (effective population size), essentially doing the same as we did. As far as I understand, however, they equate DNA divergence time with speciation time, which strictly speaking is incorrect (I might be wrong here; I didn’t check in detail how they inferred the time interval between human/chimp divergence and their divergence from the gorilla).

Diversity of the human-chimp ancestor along the human genome

A plot of diversity is shown on the bottom half of the figure on the right.

Their estimates of Ne are pretty close to ours (65,000 ± 30,000). This is pretty good news, considering that the results come about using different methods (although based on the same underlying theory).

However, the assumptions we put into the analyses differ. To calibrate the molecular clock we both use the divergence time from the orangutan, but where we used 18 million years (Myr) ago they use 16 Myr ago. The generation time is also very important in estimating the divergence, and where we used 25 years as the average generation time they used 20 years. Our estimate of generation time is a bit on the high side (Ebersberger et al. call it unrealistically high), but we really had no idea what to use here when we did our analysis.

How much have these assumptions affected the results?

With help from Julien Dutheil, who has just re-written the entire CoalHMM software, I got the numbers our analysis would have obtained had we used the assumptions from Ebersberger et al. The human-chimp divergence we estimate is 5.1 Myr (as opposed to their 5.7) and the divergence with the gorilla we estimate at 8.4 Myr (as opposed to their 7.8). This is close enough to be the same. When we then estimate the speciation time, where the generation time assumption is important, we get 3.6 Myr for the human/chimp speciation and 5.7 Myr for the (human/chimp)/gorilla speciation. These look very recent to me, and I don’t fully trust them. I have seen numbers around 4 Myr for the closest distance between human and chimp, but the fossil record just doesn’t match that.

For the Ne estimate, the new assumptions give us a whopping 81,000 for the human/chimp ancestor. I’m not really sure why. Using their assumptions moves us further from their estimates. This is probably worth looking into.


Citations, for Research Blogging:

Ebersberger, I., Galgoczy, P., Taudien, S., Taenzer, S., Platzer, M., von Haeseler, A. (2007). Mapping Human Genetic Ancestry. Molecular Biology and Evolution, 24(10), 2266-2276.

Hobolth, A., Christensen, O.F., Mailund, T., Schierup, M.H. (2007). Genomic Relationships and Speciation Times of Human, Chimpanzee, and Gorilla Inferred from a Coalescent Hidden Markov Model. PLoS Genetics, 3(2), e7. DOI: 10.1371/journal.pgen.0030007

When did humans split from the apes anyway?

Sunday, December 9th, 2007

During some random surfing I stumbled upon these two blog posts:

both by John Hawks.

I found these interesting not least because he refers to a paper that we published earlier this year:

Hobolth A, Christensen OF, Mailund T, Schierup MH. 2007. Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet 3:e7. doi:10.1371/journal.pgen.0030007

That paper was mainly about a new statistical method for analysing speciation, a method that combines comparative genomics with population genetics through a model that joins hidden Markov models with coalescence theory. Of course, that is not really what caught people’s attention. What we did in the paper was to apply our new method to data from human, gorilla, chimp and orangutan, and one result that came out of that was a very recent split between human and chimp; a split only 4.1 million years old.

We get a very recent speciation split between human and apes exactly because of the combined population genetics and genomics. If we only look at the genomic sequences, the distance between these will necessarily be larger than the distance between the species (it takes a while to get from the time a piece of DNA sits in a single individual to the time its copies sit in two individuals in separate species), and our method is able to estimate the speciation split from the genome split.
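The gap between genome split and species split can be illustrated with a small simulation. This is only a sketch of the simplest two-species picture, assuming a constant-size ancestral population and ignoring everything a real analysis has to deal with, such as the deeper split from the gorilla; the function name and all numbers are hypothetical. In that picture two lineages need on average another 2*Ne generations to find their common ancestor once they are back in the ancestral species.

```python
import random


def mean_sequence_divergence(rng, speciation_years, ne, generation_years, n_loci):
    """Average time back to the common ancestor of two sequences, in years."""
    total = 0.0
    for _ in range(n_loci):
        # Waiting time in the ancestral species: exponential with a mean
        # of 2*Ne generations.
        generations_in_ancestor = rng.expovariate(1.0 / (2.0 * ne))
        total += speciation_years + generations_in_ancestor * generation_years
    return total / n_loci


rng = random.Random(0)
# Hypothetical inputs: speciation 4 Myr ago, Ne = 50,000, 20-year generations.
divergence = mean_sequence_divergence(rng, 4e6, 50_000, 20, 20_000)
print("mean sequence divergence (Myr):", divergence / 1e6)
print("simple expectation (Myr):", (4e6 + 2 * 50_000 * 20) / 1e6)
```

With these made-up numbers the sequences are on average about 2 Myr older than the species split, which is why reading a speciation time directly off the sequence divergence overestimates it.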

I’m not sure how well I am explaining this here. I gave a (not too technical) talk in the computer science department some months ago, maybe that explains it better:

(sorry about the quality of the slides here, it looks like slideshare messed up the fonts)

A few other studies of genomic data before our own also reported more recent speciation times of human and chimp than previously believed, moving the time from about 6-8 million years ago down to maybe 4-5 million years ago, so a recent divergence between human and chimp might not be too far-fetched after all. Still, I think our estimate is a bit too recent.

This is also what John Hawks writes.

Why do we get such a recent divergence, then?

It is hard to say. The 4.1 million years is what comes out of applying our method on the (admittedly small) data we had. It is a very new method, however. There is a lot we do not take into account in it and there might be biases in it we haven’t fully understood yet.

We are currently working on improving the method, and once we get more data (the orangutan has already been sequenced and is now being assembled, and the gorilla genome is in the process of being sequenced) we will redo our analysis. It will be interesting to see how that turns out.