Selective sweeps across twenty millions years of primate evolution

New paper out

Kasper Munch, Kiwoong Nam, Mikkel Heide Schierup and Thomas Mailund

The contribution from selective sweeps to variation in genetic diversity has proven notoriously difficult to assess, in part because polymorphism data only allows detection of sweeps in the most recent few hundred thousand years. Here we show how linked selection in ancestral species can be quantified across evolutionary timescales by analyzing patterns of incomplete lineage sorting (ILS) along the genomes of closely related species. We show that sweeps in the human-chimpanzee and human-orangutan ancestors can be identified as depletions of ILS in regions in excess of 100 kb in length. Sweeps predicted in each ancestral species, as well as recurrent sweeps predicted in both species, often overlap sweeps predicted in humans. This suggests that many genomic regions experience recurrent selective sweeps. By comparing the ILS patterns along the genomes of the closely related human-chimpanzee and human-orangutan ancestors, we are further able to quantify the impact of selective sweeps relative to that of background selection. Compared to the human-orangutan ancestor, the human-chimpanzee ancestor shows a strong excess of regions depleted of ILS as well as a stronger reduction in ILS around genes. We conclude that sweeps play a strong role in reducing diversity along the genome and that sweeps have reduced diversity in the human-chimpanzee ancestor much more than in the human-orangutan ancestor.

A genomic history of Aboriginal Australia

Out in Nature now

Anna-Sapfo Malaspinas, Michael C. Westaway, Craig Muller, Vitor C. Sousa, Oscar Lao, Isabel Alves, Anders Bergström, Georgios Athanasiadis, Jade Y. Cheng, Jacob E. Crawford, Tim H. Heupink, Enrico Macholdt, Stephan Peischl, Simon Rasmussen, Stephan Schiffels, Sankar Subramanian, Joanne L. Wright, Anders Albrechtsen, Chiara Barbieri, Isabelle Dupanloup, Anders Eriksson, Ashot Margaryan, Ida Moltke, Irina Pugach, Thorfinn S. Korneliussen, Ivan P. Levkivskyi, J. Víctor Moreno-Mayar, Shengyu Ni, Fernando Racimo, Martin Sikora, Yali Xue, Farhang A. Aghakhanian, Nicolas Brucato, Søren Brunak, Paula F. Campos, Warren Clark, Sturla Ellingvåg, Gudjugudju Fourmile, Pascale Gerbault, Darren Injie, George Koki, Matthew Leavesley, Betty Logan, Aubrey Lynch, Elizabeth A. Matisoo-Smith, Peter J. McAllister, Alexander J. Mentzer, Mait Metspalu, Andrea B. Migliano, Les Murgha, Maude E. Phipps, William Pomat, Doc Reynolds, Francois-Xavier Ricaut, Peter Siba, Mark G. Thomas, Thomas Wales, Colleen Ma’run Wall, Stephen J. Oppenheimer, Chris Tyler-Smith, Richard Durbin, Joe Dortch, Andrea Manica, Mikkel H. Schierup, Robert A. Foley, Marta Mirazón Lahr, Claire Bowern, Jeffrey D. Wall, Thomas Mailund, Mark Stoneking, Rasmus Nielsen, Manjinder S. Sandhu, Laurent Excoffier, David M. Lambert & Eske Willerslev

The population history of Aboriginal Australians remains largely uncharacterized. Here we generate high-coverage genomes for 83 Aboriginal Australians (speakers of Pama–Nyungan languages) and 25 Papuans from the New Guinea Highlands. We find that Papuan and Aboriginal Australian ancestors diversified 25–40 thousand years ago (kya), suggesting pre-Holocene population structure in the ancient continent of Sahul (Australia, New Guinea and Tasmania). However, all of the studied Aboriginal Australians descend from a single founding population that differentiated 10–32 kya. We infer a population expansion in northeast Australia during the Holocene epoch (past 10,000 years) associated with limited gene flow from this region to the rest of Australia, consistent with the spread of the Pama–Nyungan languages. We estimate that Aboriginal Australians and Papuans diverged from Eurasians 51–72 kya, following a single out-of-Africa dispersal, and subsequently admixed with archaic populations. Finally, we report evidence of selection in Aboriginal Australians potentially associated with living in the desert.

That’s a bit of a bummer

My Data Science textbook had a very short life as teaching material. I wrote it to have material for when we change from a quarter to a semester structure next year. I already had lecture notes for one of the two classes I teach on data science, and the plan was to merge the two classes and use the same textbook for both. After lots of discussions on how to structure our modified Master’s in Bioinformatics program, however, this is no longer the plan.

The plan now is to merge four quarter classes into two semester classes. Data Science 1 and 2 merged with Learning from Genome Data 1 and 2. All four classes taught R and some basic statistics, just with emphasis on different aspects, so it makes a lot of sense to combine them. The two new classes will probably be named Data Science and Statistical Learning or something to that effect.

They will have more statistical theory than I have in my current data science classes but less R programming. The reason there will be less R programming is simply that in our Master’s program we get people with different backgrounds, typically either biological or computational, and in the “statistical column” where the two new classes will live we don’t want to expect a lot of programming experience. So no fancy functional or object-oriented programming in these classes; that might go into another class in the “computational column”.

With everything changed, I can no longer just use the textbook I just finished, so I guess it is time to get started on new material fitted to the new classes… At least they don’t start until next summer, so there is time enough to get it done.

First draft finished of Functional Programming in R

It took me a while to get the last chapter finished, mostly because I couldn’t think of enough material to put into it. I thought there would be more to say about point-free programming, but I finally gave up and decided to just have a short last chapter.

So now I am done with the draft. There are a few things I want to add to some of the previous chapters that I have put on my TODO list, and of course there is still some editing to do, but at least now all chapters are done.

You can get the book at LeanPub.

On the genetic structure of Denmark

This is a guest post by Yorgos Athanasiadis, our postdoc who did the analysis in the two papers discussed below — Thomas

Scandinavian countries present close linguistic, cultural and historical links with each other. Yet, in our recently published paper (1) we found that they can differ considerably in their genetic fine print. Our study centred primarily on Denmark, but also explored genetic patterns in Sweden and Norway thanks to collaborations with international GWAS consortia.

Our data came from a highly successful citizen science project run in 2013/14 that targeted high school students from across Denmark (2). The resulting sample of about 800 students represents a snapshot of a single generation born in the mid-1990s, allowing us to calibrate all of our historical estimates more accurately.

A total of 36 schools with good coverage of the entire country participated in the project’s outreach activities. Map taken from ref. 2.

Classical PCA showed no geographic structure in a subset of about 400 students with all four of their grandparents born in Denmark, but there was weak correlation between PC1 and grandparental place of birth (measured as averaged geographic coordinates). Moreover, average pairwise FST between six well-defined geographic regions was extremely low (0.0002), ranking in between England and Scotland (as reported in Nature). Finally, Cheng and Nielsen’s Ohana revealed similar mixture profiles in all six geographic regions from Denmark, with populations in the east presenting slightly higher affinity with Poland (inset in the following Figure).
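To make the FST figure above a little more concrete, here is a minimal sketch of how an average pairwise FST between regions could be computed from per-SNP allele frequencies. It uses the Hudson estimator in its ratio-of-averages form, which is an assumption made for illustration and not necessarily the estimator used in the paper. The function names, the region labels and the toy frequencies are all hypothetical.

```python
from itertools import combinations


def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson FST between two populations (ratio of averages over SNPs).

    freqs1, freqs2: per-SNP frequencies of the same allele in each population.
    n1, n2: number of sampled chromosomes in each population (both > 1).
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for finite sample size.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles, one from each population, differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")


def mean_pairwise_fst(freqs_by_region, chroms_by_region):
    """Average Hudson FST over all unordered pairs of regions."""
    pairs = list(combinations(sorted(freqs_by_region), 2))
    values = [
        hudson_fst(freqs_by_region[a], freqs_by_region[b],
                   chroms_by_region[a], chroms_by_region[b])
        for a, b in pairs
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy data only: three made-up regions, four SNPs each.
    freqs = {
        "north": [0.10, 0.52, 0.33, 0.80],
        "east": [0.12, 0.50, 0.35, 0.79],
        "south": [0.11, 0.49, 0.31, 0.82],
    }
    chroms = {"north": 200, "east": 200, "south": 200}
    print(mean_pairwise_fst(freqs, chroms))
```

With real data the sums would run over hundreds of thousands of SNPs. Note that the sample-size correction can push the estimate slightly below zero when populations are nearly identical.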

Results from Ohana for K = 4. The method helped us identify two well-defined geographical clusters (the Iberian in blue and the East European in yellow), as well as two that are more open to interpretation (we call them the Central and Nordic European clusters, in red and green, respectively).

All of the above methods assume, to some extent, independence between the genetic markers used, i.e. they do not model LD explicitly. To gain more power in our investigations, we also considered IBD-based methods that leverage haplotype sharing between individuals.

The first thing we tried was to paint Danish chromosomes according to a set of Western European donors and to use the similarities and differences to identify clusters within Denmark. Interestingly, the method failed to detect any meaningful clustering, lumping all individuals into one big cluster, which further underlines the lack of strong genetic structure in Denmark. This lack of structure was also reflected in the very similar mixture profiles produced by this method across all six regions of Denmark.

Clustering and admixture results for the six geographic regions of Denmark based on chromosome-painting methods. Bar plots are best interpreted as mixture profiles, although in some cases historical insights can also be extracted (e.g. Polish admixture in the Southeast of Denmark). Note that Iberians, contrary to popular belief, do not seem to have left their mark on the genetic makeup of present-day Denmark.

The six Danish regions showed highest affinity with a cluster that we call BRI(tish), because it is mostly made up of British samples, followed by the NOR(wegian) and SWE(dish) clusters. This is not to say that Danes are about 40% made up of British DNA, as some enthusiastic Twitter users have suggested. The BRI cluster also includes German, Belgian and Dutch samples, meaning that it might as well reflect some other ethnic component; for lack of a better name, we called it BRI. Another interesting fact is that, because of the presence of this cluster, haplotype sharing with other Scandinavians was about 40%. Finally, a small Polish component was detected in the South of Denmark, in the regions where history informs us about the presence of Wend settlements from the 10th century on. Co-ancestry curves provided time estimates for an admixing event that involved a Polish-like ancestral population around 1052 AD, a result that is too congruous to ignore!

We used total IBD sharing within Denmark as a proxy for relatedness and found that participants tend to live close to their closest “genomic relatives”. The geographic distance between any two such individuals presented a bimodal distribution enriched for distances up to 50 km, probably representing individuals living in urban regions. There was also a significant negative relationship between genetic closeness and geographic distance.
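The kind of analysis described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it takes hypothetical per-individual coordinates and a hypothetical table of total pairwise IBD sharing, finds each individual's closest genomic relative, and measures the great-circle (haversine) distance between them. It also includes a plain Pearson correlation for relating IBD sharing to geographic distance. None of the names or numbers come from the paper, and the actual study may have used different distance measures and statistics.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def closest_relative_distances(coords, ibd):
    """Distance (km) from each individual to the one they share most IBD with.

    coords: {individual: (lat, lon)}
    ibd: {(individual_a, individual_b): total shared IBD length}
    """
    best = {}  # individual -> (closest relative, shared length)
    for (a, b), shared in ibd.items():
        for me, other in ((a, b), (b, a)):
            if me not in best or shared > best[me][1]:
                best[me] = (other, shared)
    return {me: haversine_km(*coords[me], *coords[other])
            for me, (other, _) in best.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy data only: three made-up individuals near Copenhagen and Aarhus.
    coords = {"s1": (55.68, 12.57), "s2": (55.70, 12.50), "s3": (56.16, 10.20)}
    ibd = {("s1", "s2"): 50.0, ("s1", "s3"): 10.0, ("s2", "s3"): 12.0}
    print(closest_relative_distances(coords, ibd))

    sharing = [ibd[pair] for pair in ibd]
    distance = [haversine_km(*coords[a], *coords[b]) for a, b in ibd]
    print(pearson(sharing, distance))
```

A negative correlation here means that pairs sharing more IBD tend to live closer together, which is the pattern reported above.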

Results from the geographic analysis of IBD sharing patterns. In both plots we see that participants tend to live close to their closest genomic relatives. This observation indicates that Denmark presents weak structure that is undetectable by methods assuming unlinked markers.

Finally, IBD sharing was also used to study Ne in historical terms, in a manner similar to PSMC curves. Interestingly, the three Scandinavian countries presented quite different patterns of historical Ne. Sweden and Norway had more inflated recent Ne compared to Denmark, possibly due to the lack of strong structure in the latter. Indeed, Sweden and Norway are much larger countries and their landscape provides more opportunities for partial genetic isolation, in contrast to Denmark’s flat landscape and good maritime network.

Results from the IBDNe analysis of three Scandinavian countries. We observed quite distinct patterns of Ne change in the last centuries, possibly reflecting different levels of genetic structure, with Denmark presenting the lowest of them all.

Our papers stand as an example of how far one can nowadays go with SNP data in order to answer questions of historical relevance. It is tempting to see our results as proving the obvious: that “Danes are Danes”. However, experience from other genomic projects in European countries has shown that the degree of population structure can be surprisingly high, even for areas of the same size or smaller than Denmark (e.g. the Netherlands and Western France).

References
1. Athanasiadis G, et al.: Nationwide genomic study in Denmark reveals remarkable population homogeneity. Genetics 2016.
2. Athanasiadis G, et al.: Spitting for science: Danish high school students commit to a large-scale self-reported genetic study. PLOS ONE 2016;11:e0161822.