Effective population size…

In population genetics we have this thing called the “effective population size”. I’m thinking about it right now because we are discussing it on Twitter.

It’s a parameter of different mathematical models, and essentially a way of translating from one model to another.

In the Wright-Fisher (WF) model, which I’ve mentioned a number of times before, you imagine that you have a fixed population of N individuals. If you have diploid individuals (like you and me) you have 2N genomes in your model, so mostly you will see the model written in terms of 2N genes.

The model describes how these 2N genes change in frequency as some have fewer and some have more descendants.
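To make that concrete, here is a minimal sketch of the WF model for a single gene variant. The function name, the seed, and the way I sample parents are my own choices for illustration; the only thing taken from the model is that each of the 2N genes in the next generation picks its parent at random from the current one.

```python
import random

def wright_fisher_trajectory(two_n, start_count, generations, seed=0):
    """Simulate the count of one gene variant among 2N genes, forward in time."""
    rng = random.Random(seed)
    count = start_count
    trajectory = [count]
    for _ in range(generations):
        p = count / two_n
        # Each of the 2N genes in the next generation picks its parent
        # uniformly at random, so it carries the variant with probability p.
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        trajectory.append(count)
    return trajectory

if __name__ == "__main__":
    # 2N = 20 genes, 10 of them carrying the variant at the start.
    print(wright_fisher_trajectory(20, 10, 30, seed=1))
```

Once the count hits 0 or 2N it stays there, since no gene in the next generation can pick a parent of the lost type.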

It’s not the only model we use, however. There are lots of different models that are useful for different purposes, so we tend to pick the one that fits whatever we are trying to do. One example is the coalescence model, where instead of having a population of 2N genes evolving forward in time, you consider a sample of n genes out of the 2N and look back in time at their ancestry.

In coalescence models, you don’t have a population evolving one generation after another. Instead you have continuous time (moving backward, so time t=0 is the present and t>0 is in the past), and you model how these genes find common ancestors as you go back in time.
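A small sketch of what this looks like in code, assuming the standard time scaling where each pair of lineages coalesces at rate 1. With k lineages there are k(k-1)/2 pairs, so the waiting time to the next merger is exponential with that rate. The function names are made up for this example.

```python
import random

def simulate_tmrca(n, rng):
    """Time (in coalescent units) until n sampled genes share one ancestor."""
    t = 0.0
    k = n
    while k > 1:
        pairs = k * (k - 1) / 2
        # Assumption: each pair coalesces at rate 1, so k lineages wait an
        # exponential time with rate k*(k-1)/2 until the next merger.
        t += rng.expovariate(pairs)
        k -= 1
    return t

def expected_tmrca(n):
    """Mean of the sum of those waiting times: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)

def mean_tmrca(n, replicates=20000, seed=0):
    """Average simulated time to the common ancestor over many replicates."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(replicates)) / replicates

if __name__ == "__main__":
    print(mean_tmrca(10), expected_tmrca(10))
```

Note that nothing in this code mentions a population size. Time here is in the model’s own units, and N only enters when you translate those units into generations.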

In the coalescence model, the rate at which two genes find a common ancestor (the rate at which they coalesce) is 1. One unit of time here corresponds to the number of generations we expect to go back in the WF model before two randomly selected genes last had a common ancestor, which turns out to be 2N generations. So the coalescence rate in the coalescence model is directly related to the size of the WF population.
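The 2N generations can be checked with a quick simulation. This is only a sketch under the plain WF assumptions: two genes each pick a parent uniformly among the 2N genes of the previous generation, and we count generations until they pick the same one. The function names and the number of replicates are arbitrary.

```python
import random

def generations_to_common_ancestor(two_n, rng):
    """Follow two genes back in a WF population until they share a parent."""
    generations = 0
    while True:
        generations += 1
        # Each gene picks its parent uniformly among the 2N genes
        # of the previous generation.
        if rng.randrange(two_n) == rng.randrange(two_n):
            return generations

def mean_generations(two_n, replicates=5000, seed=0):
    """Average number of generations back to the common ancestor."""
    rng = random.Random(seed)
    total = sum(generations_to_common_ancestor(two_n, rng)
                for _ in range(replicates))
    return total / replicates

if __name__ == "__main__":
    # With 2N = 100 the average should land close to 100 generations.
    print(mean_generations(100))
```

The two genes pick the same parent with probability 1/(2N) per generation, so the waiting time is geometric with mean 2N.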

In diffusion models we work forward in time, like the WF model, but in continuous time, like the coalescence models. This time, we model how the frequencies of gene variants change over time as solutions to differential equations, and the speed at which frequencies change is directly related to the 2N in the WF model. If we want to know how fast we expect to move from frequency x1 to x2 we can model this in a diffusion model, and one unit of time in this model (from t=0 to t=1) corresponds to 2N generations in the WF model.
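One way to see the 2N scaling, using a textbook result that is not derived in this post: under pure drift the expected heterozygosity shrinks by a factor (1 - 1/(2N)) per generation in the WF model, and by exp(-t) in continuous time when t is measured in units of 2N generations. The sketch below just compares the two; the function names are mine.

```python
import math

def heterozygosity_wf(h0, two_n, generations):
    """Expected heterozygosity after a number of discrete WF generations."""
    return h0 * (1.0 - 1.0 / two_n) ** generations

def heterozygosity_diffusion(h0, t):
    """Continuous-time counterpart, with t in units of 2N generations."""
    return h0 * math.exp(-t)

if __name__ == "__main__":
    two_n = 2000
    # 2000 generations in a 2N = 2000 population is one diffusion time unit.
    print(heterozygosity_wf(0.5, two_n, 2000))
    print(heterozygosity_diffusion(0.5, 1.0))
```

The two numbers agree to several decimals, and the agreement gets better as 2N grows.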

We can move from one model to another by changing time units. A time unit of 1 in coalescence or diffusion models corresponds to 2N generations in the WF model. So N is important as a parameter because it lets us translate from one time unit to another, and relate results we can derive mathematically in one model to another model where the same thing is true.
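The translation itself is a one-liner in each direction. The helper names here are hypothetical, and n stands for the (effective) number of diploid individuals.

```python
def generations_to_scaled_time(generations, n):
    """Convert WF generations to coalescence/diffusion time units of 2N."""
    return generations / (2 * n)

def scaled_time_to_generations(t, n):
    """Convert coalescence/diffusion time units back to WF generations."""
    return t * 2 * n

if __name__ == "__main__":
    # With N = 10,000, one scaled time unit is 20,000 generations.
    print(scaled_time_to_generations(1.0, 10_000))
    print(generations_to_scaled_time(5_000, 10_000))
```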

Incidentally, the “effective” population size, Ne, is just the N in these models, except that we talk about the breeding part of a population. If only 10% of a population actually contribute to the next generation, we don’t so much care about the full population size, N, but only about the part that actually matters for the genetics, which we then call the “effective” population size, Ne. In the models above, though, N is just Ne.

Things get a little complicated when you don’t just have a single parameter to translate from one population to another. In the models I have mentioned, the N can directly give you the expected number of mutations you see as differences between two randomly chosen individuals and at the same time the total amount of differences within a population. Those two numbers are not exactly the same measure of diversity in a population, however. If you pick a sample of n individuals and these all have their most recent common ancestor (MRCA) in exactly the same individual, then differences between them are independent. If the MRCA depends on which pair you choose, mutations will be shared between some lineages and not others.
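As a sketch of the two measures, here are the standard constant-size expectations: the expected number of pairwise differences is 4·Ne·mu, and the expected number of variable sites in a sample of n genes is 4·Ne·mu times 1 + 1/2 + ... + 1/(n-1). The mutation rate mu (per generation, for whatever stretch of sequence you are counting over) is an assumed input, and all function names are made up for this example.

```python
def harmonic_number(n):
    """1 + 1/2 + ... + 1/(n-1) for a sample of n genes."""
    return sum(1.0 / i for i in range(1, n))

def expected_pairwise_differences(ne, mu):
    """Expected differences between two random genes (constant size)."""
    return 4 * ne * mu

def expected_segregating_sites(ne, mu, n):
    """Expected number of variable sites in a sample of n genes."""
    return 4 * ne * mu * harmonic_number(n)

def ne_from_pairwise(pi, mu):
    """Ne implied by an observed number of pairwise differences."""
    return pi / (4 * mu)

def ne_from_segregating(s, mu, n):
    """Ne implied by an observed number of variable sites."""
    return s / (4 * mu * harmonic_number(n))

if __name__ == "__main__":
    ne, mu, n = 10_000, 1e-8, 20
    pi = expected_pairwise_differences(ne, mu)
    s = expected_segregating_sites(ne, mu, n)
    print(ne_from_pairwise(pi, mu), ne_from_segregating(s, mu, n))
```

In a population that really has had constant size, the two back-calculated Ne values agree. With real data from a population that has changed in size they generally do not, which is the complication described above.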

The probability that the MRCAs are different or the same depends on demographics, and in an expanding population (like humans) you are more likely to have independent lineages than in a population that has had the same size for a long time.

I like to think of Ne as a measure of the time it takes two lineages to go back to their MRCA, which works well for a coalescence model. If you have an expanding population, though, this way of looking at it doesn’t quite let you translate between models. In an expanding population you will find an MRCA faster than in a stable population, but the gene frequencies change more slowly, so a coalescence model will want a smaller Ne and a diffusion model will need a larger Ne.
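A rough way to put numbers on the coalescence side of this, using a common textbook approximation rather than anything derived in this post: when the size changes from generation to generation, the Ne that matters for lineages finding ancestors is roughly the harmonic mean of the sizes they pass through. The sizes below are invented for illustration.

```python
def harmonic_mean_size(sizes):
    """Harmonic mean of per-generation population sizes."""
    return len(sizes) / sum(1.0 / size for size in sizes)

if __name__ == "__main__":
    # Hypothetical expanding population, listed from the present backward.
    sizes_back_in_time = [10_000, 5_000, 2_000, 1_000]
    print(harmonic_mean_size(sizes_back_in_time))  # well below 10,000
```

The harmonic mean is dominated by the small sizes in the past, so the coalescence view gives an Ne well below the current size, while frequency changes right now are governed by the large current population.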

The effective population size is a mathematical parameter, and if you understand the models it is easy enough to translate the Ne from one model to another. But when it comes to modelling populations that are not at equilibrium, you really need to just consider it part of the math and not try to interpret it too much…

Author: Thomas Mailund

My name is Thomas Mailund and I am a research associate professor at the Bioinformatics Research Center, Uni Aarhus. Before this I did a postdoc at the Dept of Statistics, Uni Oxford, and got my PhD from the Dept of Computer Science, Uni Aarhus.
