Recombination and substitution rates
In a paper from PLoS Genetics earlier this month, Laurent Duret and Peter F. Arndt did a genome-wide analysis of the correlation between recombination rate and substitution rate (and bias).
The Impact of Recombination on Nucleotide Substitutions in the Human Genome
Duret, L., Arndt, P.F. PLoS Genetics, 4(5) 2008
Abstract
Unraveling the evolutionary forces responsible for variations of neutral substitution patterns among taxa or along genomes is a major issue for detecting selection within sequences. Mammalian genomes show large-scale regional variations of GC-content (the isochores), but the substitution processes at the origin of this structure are poorly understood. We analyzed the pattern of neutral substitutions in 1 Gb of primate non-coding regions. We show that the GC-content toward which sequences are evolving is strongly negatively correlated to the distance to telomeres and positively correlated to the rate of crossovers (R2 = 47%). This demonstrates that recombination has a major impact on substitution patterns in human, driving the evolution of GC-content. The evolution of GC-content correlates much more strongly with male than with female crossover rate, which rules out selectionist models for the evolution of isochores. This effect of recombination is most probably a consequence of the neutral process of biased gene conversion (BGC) occurring within recombination hotspots. We show that the predictions of this model fit very well with the observed substitution patterns in the human genome. This model notably explains the positive correlation between substitution rate and recombination rate. Theoretical calculations indicate that variations in population size or density in recombination hotspots can have a very strong impact on the evolution of base composition. Furthermore, recombination hotspots can create strong substitution hotspots. This molecular drive affects both coding and non-coding regions. We therefore conclude that along with mutation, selection and drift, BGC is one of the major factors driving genome evolution. Our results also shed light on variations in the rate of crossover relative to non-crossover events, along chromosomes and according to sex, and also on the conservation of hotspot density between human and chimp.
The main point of this paper is the evolution of the GC content of the human genome, which varies significantly between regions of the genome: the so-called isochore structure.
The evolution of isochores
The GC content varies along the genome, with some regions having very high fractions of GC and some having very low, and this variation is not what we would expect if the entire genome were evolving under the same neutral process.
Why the genome has this structure has been debated, at times heatedly, for the last two decades. Different explanations have been suggested, including:
- The mutation rate is biased and varies along the genome.
- Selection prefers high GC content in some regions and not in others.
- Gene conversion is biased, preferring to replace AT alleles with GC alleles.
where the latter is a theory developed, among others, by the authors of this new paper.
A biased mutation rate is of course a possibility, but it doesn’t explain the correlation with the recombination rate, unless recombination is itself mutagenic or causes this bias.
Selection is the explanation of Bernardi, the discoverer of the isochore structure.
Biased gene conversion is a neutral process that looks a lot like selection. The idea is as follows: there is no particular need for a bias in the mutation process — the AT to GC and GC to AT substitutions are not necessarily occurring at different rates in GC rich and GC poor regions — but once a polymorphism exists, gene-conversion between a GC allele and an AT allele will replace the AT allele with the GC allele more often than the other way around.
A consequence of this is that although the mutation rate might not vary along the genome, the substitution rate will, and this substitution rate will be correlated with the recombination rate.
Eyre-Walker and Hurst (2001) give more details on the three theories above.
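To make the biased gene conversion argument concrete, here is a minimal sketch using a standard population-genetic approximation: the conversion bias is treated as if it were weak selection with coefficient b in favour of the GC allele, and the fixation probability of a new mutation is taken from the usual diffusion result. The function names, the scaled bias B = 4·Ne·b, and the example values of B are my own illustrative choices, not numbers from the paper.

```python
import math

def relative_substitution_rate(B):
    """Substitution rate relative to the neutral rate for a new mutation.

    B = 4 * Ne * b is the scaled bias (assumption: the conversion bias b
    acts like a weak selection coefficient). B > 0 favours the new allele.
    """
    if abs(B) < 1e-9:
        return 1.0  # neutral limit: substitution rate equals mutation rate
    return B / (1.0 - math.exp(-B))

def rates_under_bgc(B, mutation_rate=1.0):
    """AT->GC and GC->AT substitution rates with identical mutation rates."""
    at_to_gc = mutation_rate * relative_substitution_rate(B)   # favoured
    gc_to_at = mutation_rate * relative_substitution_rate(-B)  # disfavoured
    return at_to_gc, gc_to_at

# Illustrative values of B only; higher B stands in for more recombination.
for B in (0.0, 0.5, 1.0, 2.0):
    at_to_gc, gc_to_at = rates_under_bgc(B)
    total = at_to_gc + gc_to_at
    print(f"B={B:.1f}  AT->GC={at_to_gc:.3f}  GC->AT={gc_to_at:.3f}  total={total:.3f}")
```

Even though the two mutation rates are identical here, the AT to GC substitution rate goes up with B, the GC to AT rate goes down, and their sum goes up, which is the kind of link between recombination and substitution rate the theory predicts.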
The case for biased gene conversion
In the PLoS Genetics paper they argue for the biased gene conversion explanation (not surprisingly), and reasonably convincingly, in my opinion, but I am not an expert…
First, they construct a model of sequence evolution that assumes neither time-reversibility nor that the current sequences are at stationarity (assumptions that are usually made, but that might not hold).
From this model, they estimate the rates of the various types of substitutions, and the equilibrium GC content (called GC* in the paper). Since stationarity is not assumed, the equilibrium GC content can differ from the current GC content, and in general GC* < GC, meaning that the GC content of our genome is decreasing, especially in GC-rich areas. Very slowly, though.
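The relation between substitution rates and GC* can be illustrated with a deliberately simplified two-state sketch (sites are either AT or GC). This is not the authors' model, and the rates and starting GC content below are made-up numbers in arbitrary units, chosen only to show GC* < GC.

```python
import math

def equilibrium_gc(r_at_to_gc, r_gc_to_at):
    """Stationary GC content of a two-state (AT/GC) substitution process."""
    return r_at_to_gc / (r_at_to_gc + r_gc_to_at)

def gc_after(gc_now, r_at_to_gc, r_gc_to_at, t):
    """GC content after time t, solving dGC/dt = r_at_to_gc*(1-GC) - r_gc_to_at*GC."""
    gc_star = equilibrium_gc(r_at_to_gc, r_gc_to_at)
    decay = math.exp(-(r_at_to_gc + r_gc_to_at) * t)
    return gc_star + (gc_now - gc_star) * decay

# Hypothetical GC-rich region: current GC above its equilibrium value.
gc_now, r_at_to_gc, r_gc_to_at = 0.55, 1.0, 1.5
print("GC* =", equilibrium_gc(r_at_to_gc, r_gc_to_at))
for t in (0.0, 0.1, 0.5, 2.0):
    print(f"t={t:.1f}  GC={gc_after(gc_now, r_at_to_gc, r_gc_to_at, t):.3f}")
```

The current GC content plays no role in computing GC*; it only sets where the decay towards GC* starts, which is why the two quantities can differ when the sequence is not at stationarity.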
This could suggest that whatever mechanism created the GC rich areas of our genome is either no longer in effect, or at least is weaker than it was when the GC rich areas were created.
They then consider the correlation between recombination rates and GC / GC* and notice a significant correlation, with a stronger correlation between recombination rate and GC* than between recombination rate and GC.
This is taken as evidence that it is recombination that drives substitutions toward higher GC content, rather than base pair composition that determines recombination rate; if the recombination rate were determined by the base pair composition, then the present-day GC content should be more strongly correlated with the rate than some far-future stationary GC content.
The biased gene conversion model suggests a preference for AT to GC substitutions in regions with high recombination rates, but where the strength of this preference depends on the effective population size.
The positive correlation between GC* and the recombination rate supports this, and the present-day effective population size (or the present-day recombination rate) can explain why the GC structure in the genome is eroding towards a higher AT content in the present-day GC-rich regions. The GC-rich regions of today could have appeared in an ancestor with either a larger effective population size or regionally higher recombination rates; with the reduced effective population size of present-day humans, the biased gene conversion mechanism is simply not strong enough to keep the GC content at a high level.
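The dependence on effective population size can be sketched with the same kind of approximation as is commonly used for weak selection: if the conversion bias b acts like a selection coefficient, the equilibrium GC content in a two-state model works out to 1 / (1 + λ·exp(-4·Ne·b)), where λ is the ratio of the GC to AT mutation rate over the AT to GC mutation rate. The parameter values below are hypothetical and only meant to show the direction of the effect.

```python
import math

def gc_star_under_bgc(ne, b, mutation_bias):
    """Equilibrium GC content under biased gene conversion (two-state sketch).

    ne            -- effective population size
    b             -- conversion bias, treated like a selection coefficient (assumption)
    mutation_bias -- (GC->AT mutation rate) / (AT->GC mutation rate)
    """
    scaled_bias = 4.0 * ne * b
    return 1.0 / (1.0 + mutation_bias * math.exp(-scaled_bias))

# Hypothetical parameters: same region, same bias, shrinking population.
b, mutation_bias = 1e-5, 2.0
for ne in (100_000, 50_000, 10_000):
    print(f"Ne={ne:>7}  GC*={gc_star_under_bgc(ne, b, mutation_bias):.3f}")
```

With b set to zero the expression reduces to the mutational equilibrium 1 / (1 + λ); a smaller Ne or a weaker bias both pull GC* back towards that value.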
The case against biased mutation and against selection
The biased mutation explanation is argued against based on the frequency patterns of polymorphisms. If the mutations are biased, but the resulting polymorphisms are selectively neutral, then the frequency of GC and AT derived polymorphisms should be the same. However, GC alleles segregate at higher frequencies than AT alleles.
The first argument against selection is less convincing, I feel, but essentially says: it is hard to imagine why selection should prefer the occasional GC in Mbp long regions with plenty of genes under selection, and even if it did, it probably wouldn’t be strong enough to drive the changes in GC content. Well…
The second argument is that selection does not explain why GC content, and especially GC*, should be correlated with the recombination rate. One possible explanation is the Hill-Robertson effect, but then the correlation should be between GC* and the population recombination rate. Instead, GC* is more strongly correlated with the male recombination rate than with the female recombination rate, something Hill-Robertson does not explain.
Conclusion
I read this paper because I was reading up on the correlation between effective population size and recombination rate for a project I’m working on. I knew about the debate about isochores — I’ve chatted with some of the biased gene conversion proponents who have visited BiRC — but I never really read up on it.
It turns out that several of my colleagues at BiRC are interested in this, so we’ve discussed the paper over the last two days, and I’ve had a lot of fun reading my way through some of the references in the paper.
I would recommend it as an introduction to this debate, but of course it is not a neutral discussion of the three theories.
Duret, L., Arndt, P.F. (2008). The Impact of Recombination on Nucleotide Substitutions in the Human Genome. PLoS Genetics, 4(5), e1000071. DOI: 10.1371/journal.pgen.1000071
Eyre-Walker, A., Hurst, L.D. (2001). The evolution of isochores. Nature Reviews Genetics, 2(7), 549-555. DOI: 10.1038/35080577
May 22nd, 2008 at 7:08 pm
Thanks for the report. So it should be possible to use sequence analysis to predict to some extent the recombination rates across a genome. Do you happen to know of similar studies done in fungal species?
May 22nd, 2008 at 7:24 pm
I’m not sure how you would estimate recombination rates from sequence data alone, unless you would include polymorphism data as sequence data (in which case tools such as LDhat, which was used to map the hotspots in humans, can be used).
There is a correlation between substitution rates and recombination rates, but it is not strong enough that I would try to estimate one from the other; at least I wouldn’t expect to be that successful. I’m being intentionally vague here: the guy in the office next to me is looking at this stuff and probably has a good idea about how much of the recombination variation can be explained by the substitution rate and vice versa, but he has left for today and I don’t really know…
Fungi I know absolutely nothing about, so I cannot answer that, sorry.