More on worldwide and genomewide variation…

Just to finish the trilogy (the three papers examining genome-wide polymorphism in this week’s Nature and Science), I should mention Li et al.’s Science paper, which covers essentially the same ground as the Jakobsson et al. paper I just reviewed.

Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation

Li et al.


Human genetic diversity is shaped by both demographic and biological factors and has fundamental implications for understanding the genetic basis of diseases. We studied 938 unrelated individuals from 51 populations of the Human Genome Diversity Panel at 650,000 common single-nucleotide polymorphism loci. Individual ancestry and population substructure were detectable with very high resolution. The relationship between haplotype heterozygosity and geography was consistent with the hypothesis of a serial founder effect with a single origin in sub-Saharan Africa. In addition, we observed a pattern of ancestral allele frequency distributions that reflects variation in population dynamics among geographic regions. This data set allows the most comprehensive characterization to date of human genetic variation.

The results do not differ that much from Jakobsson et al. but the analysis is different.

First, they use a maximum likelihood method to cluster the sampled individuals into K unknown “ancestral clusters” and consider the clusterings obtained for different values of K. As K increases, the individuals fall into smaller and smaller groups, indicating how closely related they are relative to the whole sample.

Once K is high enough (K=7), individuals mainly cluster by population, with most populations deriving from a single cluster but with some (Middle Eastern and South/Central Asian populations) being a mix of the ancestral clusters.

They then construct a maximum likelihood phylogeny for the populations and find that it fits nicely with the Out of Africa model.

Considering haplotype heterozygosity, they observe that heterozygosity decreases with distance from East Africa, similar to what Jakobsson et al. report.
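To make “haplotype heterozygosity” a bit more concrete: it is roughly the probability that two haplotypes drawn at random from a population differ. Here is a minimal sketch in Python. The haplotype strings are invented for illustration (they are not data from the paper), the function name is my own, and I have not checked that this is exactly the estimator the authors use.

```python
from collections import Counter

def haplotype_heterozygosity(haplotypes):
    """Unbiased estimate of the chance that two sampled haplotypes differ."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    # The factor n / (n - 1) corrects for the finite sample size.
    return n / (n - 1) * (1.0 - homozygosity)

# Toy samples (invented): many distinct haplotypes versus mostly one.
diverse = ["ACGT", "ACGA", "TCGT", "ACCT", "GCGT", "ACGT"]
reduced = ["ACGT", "ACGT", "ACGT", "ACGA", "ACGT", "ACGT"]
h_diverse = haplotype_heterozygosity(diverse)
h_reduced = haplotype_heterozygosity(reduced)
```

Under a serial founder effect, we would expect samples further from the origin to look more like `reduced` than like `diverse`.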

Li, J.Z., Absher, D.M., Tang, H., Southwick, A.M., Casto, A.M., Ramachandran, S., Cann, H.M., Barsh, G.S., Feldman, M., Cavalli-Sforza, L.L., Myers, R.M. (2008). Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. Science, 319(5866), 1100-1104. DOI: 10.1126/science.1153717

Worldwide, genomewide patterns of variation

Another interesting paper in Wednesday’s Nature concerns the worldwide patterns of variation by Jakobsson et al. Again I refer to John Hawks’ blog for a human evolution perspective. Wired also has a nice discussion of the results (together with the Lohmueller et al. paper I just reviewed and a Science paper that I haven’t read yet).

Genotype, haplotype and copy-number variation in worldwide human populations

Jakobsson et al.

Nature 451, 998-1003


Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected—including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas—the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.

This paper uses ~500K single nucleotide polymorphism (SNP) markers and ~400 copy number variable (CNV) markers in 29 populations. From this, they construct neighbour-joining trees using SNP frequencies, inferred haplotypes or CNVs and compare the trees with the geographical location of the populations.

Considering differentiation (the FST statistic) between populations, they observe the expected increase in differentiation between East Africans and other populations as a function of geographical distance from East Africa (see Fig. 2a in the paper). Given what we know from previous studies, there is very little surprise here.

They then consider linkage disequilibrium (LD) in some detail, based both on individual SNPs and on inferred haplotypes (using an extension of the FastPHASE algorithm, as far as I understand the paper, but I haven’t checked the supplemental material), and show increased LD as a function of geographical distance from Africa, once again confirming the Out of Africa expansion of humans (Fig. 2c in the paper).
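For those who have not met FST before, here is a rough sketch of the idea for a single biallelic SNP in two populations: compare the heterozygosity expected in the pooled population with the average heterozygosity within the populations. This is the textbook definition with equal population weights, not necessarily the estimator used in the paper, and the function names are mine.

```python
def expected_het(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst(p1, p2):
    """F_ST for one biallelic SNP in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = expected_het(p_bar)  # heterozygosity of the pooled population
    if h_t == 0.0:
        return 0.0  # site is monomorphic, so there is no differentiation
    h_s = (expected_het(p1) + expected_het(p2)) / 2.0  # mean within-population
    return (h_t - h_s) / h_t

same = fst(0.5, 0.5)      # identical frequencies
partial = fst(0.2, 0.8)   # diverged frequencies
fixed = fst(0.0, 1.0)     # fixed for different alleles
```

The value runs from 0 (identical allele frequencies) to 1 (the populations are fixed for different alleles).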
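As a reminder of what LD measures, here is a small sketch that computes the classic D and r² statistics for two biallelic loci from phased haplotypes. The input encoding (two-character strings of 0/1 alleles) and the function name are my own assumptions for the example. The paper works with far more elaborate haplotype-based measures.

```python
def ld_stats(haplotypes):
    """Return (D, r^2) for two biallelic loci from phased haplotypes.

    Each haplotype is a 2-character string of 0/1 alleles, e.g. "01".
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n   # allele 1 at locus A
    p_b = sum(h[1] == "1" for h in haplotypes) / n   # allele 1 at locus B
    p_ab = sum(h == "11" for h in haplotypes) / n    # both on one haplotype
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2

linked = ld_stats(["11", "11", "00", "00"])       # alleles always co-occur
unlinked = ld_stats(["11", "10", "01", "00"])     # all combinations equally common
```

Founder events tend to leave only a few haplotypes in the new population, which pushes samples towards the `linked` end of this scale.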

The only really surprising discovery in this paper is that CNV variation is higher in Oceanian and American populations, whereas in general variation decreases with distance from Africa (as the SNP analysis in this paper also confirms). I did not find an explanation for this in the paper, and I cannot think of a good explanation myself. We don’t really know that much about CNV polymorphism yet, at least not compared to SNP variation, so perhaps there are some interesting discoveries waiting for us here?

Jakobsson, M., Scholz, S.W., Scheet, P., Gibbs, J.R., VanLiere, J.M., Fung, H., Szpiech, Z.A., Degnan, J.H., Wang, K., Guerreiro, R., Bras, J.M., Schymick, J.C., Hernandez, D.G., Traynor, B.J., Simon-Sanchez, J., Matarin, M., Britton, A., van de Leemput, J., Rafferty, I., Bucan, M., Cann, H.M., Hardy, J.A., Rosenberg, N.A., Singleton, A.B. (2008). Genotype, haplotype and copy-number variation in worldwide human populations. Nature, 451(7181), 998-1003. DOI: 10.1038/nature06742

Harmful mutations in Europeans and Africans

What I wanted to blog about yesterday, but didn’t get around to as I explained in the previous post, were two letters in the latest issue of Nature on human variation and the distribution of deleterious mutations. I’ll split it into two posts; in this post I’ll discuss Lohmueller et al. Genetic Future beat me to it so I suggest you also read the discussion there. The paper is also covered in the latest Nature Podcast and commented on at Nature. For a human evolution perspective, read John Hawks’ post on the topic.

Proportionally more deleterious genetic variation in European than in African populations

Lohmueller et al.


Quantifying the number of deleterious mutations per diploid human genome is of crucial concern to both evolutionary and medical geneticists. Here we combine genome-wide polymorphism data from PCR-based exon resequencing, comparative genomic data across mammalian species, and protein structure predictions to estimate the number of functionally consequential single-nucleotide polymorphisms (SNPs) carried by each of 15 African American (AA) and 20 European American (EA) individuals. We find that AAs show significantly higher levels of nucleotide heterozygosity than do EAs for all categories of functional SNPs considered, including synonymous, non-synonymous, predicted ‘benign’, predicted ‘possibly damaging’ and predicted ‘probably damaging’ SNPs. This result is wholly consistent with previous work showing higher overall levels of nucleotide variation in African populations than in Europeans. EA individuals, in contrast, have significantly more genotypes homozygous for the derived allele at synonymous and non-synonymous SNPs and for the damaging allele at ‘probably damaging’ SNPs than AAs do. For SNPs segregating only in one population or the other, the proportion of non-synonymous SNPs is significantly higher in the EA sample (55.4%) than in the AA sample (47.0%; P < 2.3 x 10^-37). We observe a similar proportional excess of SNPs that are inferred to be ‘probably damaging’ (15.9% in EA; 12.1% in AA; P < 3.3 x 10^-11). Using extensive simulations, we show that this excess proportion of segregating damaging alleles in Europeans is probably a consequence of a bottleneck that Europeans experienced at about the time of the migration out of Africa.

In this paper, the authors compare the genetic variability in Americans of African and European descent, classify the variants according to their estimated effect on fitness, and examine how the “fitness” of the variants differs between the two populations.

Classifying variations and comparing the populations

Using genome-wide exon re-sequencing, the authors identified SNP variation in the sample and compared with the chimpanzee genome to infer ancestral and derived alleles. Ignoring for a bit the effect of mutations, just from knowing the variations and which alleles are ancestral and derived, we can learn about the history of the populations.
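The polarisation step can be sketched in a few lines. The simplifying assumption (mine, for the example) is that the chimpanzee base is taken as the ancestral state whenever it matches one of the two human alleles. This ignores the possibility of a mutation on the chimpanzee lineage, and the function name is hypothetical.

```python
def polarise(human_alleles, chimp_allele):
    """Split a biallelic SNP into (ancestral, derived) using chimp as outgroup.

    Returns None when the chimp base matches neither human allele,
    in which case the site cannot be polarised this way.
    """
    a, b = human_alleles
    if chimp_allele == a:
        return a, b
    if chimp_allele == b:
        return b, a
    return None

polarised = polarise(("A", "G"), "G")   # chimp carries G, so A is derived
ambiguous = polarise(("C", "T"), "A")   # chimp matches neither allele
```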

First off, we can consider the variation within the populations. Are there more variable sites in one population than in the other? Is there more heterozygosity (meaning, are people more likely to carry two different alleles) in one population or the other?

The results in the paper confirm previous studies that have shown that there is more variability in individuals of African descent than in those of European descent, matching the Out of Africa hypothesis. If humans originated in Africa (which everything indicates, and I doubt anyone disagrees with it any more) and populations outside Africa are relatively recent, then we expect the variability in Africa to be greater than outside Africa. A small population branching off a larger one will only carry some of the variants with it, and it takes time for this to level out.

The SNPs can be classified in two categories: synonymous SNPs — those that do not change the amino acid the gene codes for — and non-synonymous — those that do. Roughly speaking, we expect the non-synonymous mutations to have an effect on fitness but not the synonymous. This is very rough, however, since the synonymous mutations can have major effects on regulation, splicing, etc., but still…
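The synonymous/non-synonymous distinction is easy to sketch with the standard genetic code. The table below is the standard code written in the usual TCAG order. The function name and the zero-based position argument are my own choices for the example, and the sketch assumes the codon is already in the correct reading frame.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_snp(codon, position, new_base):
    """Classify a single-base change at 0-based `position` within `codon`."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "non-synonymous"

silent = classify_snp("GAA", 2, "G")    # GAA -> GAG, both glutamate
changing = classify_snp("GAG", 1, "T")  # GAG -> GTG, glutamate to valine
```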

Using bioinformatics methods, the authors classify non-synonymous mutations into deleterious and non-deleterious mutations based on protein structure and conservation. They then observe that the deleterious mutations are relatively more frequent in individuals of European descent.

Why is this an expected result?

To understand why this is the case, we turn to population genetics.

We expect deleterious mutations to be removed, or at least kept down in frequency, by selection, but there is a certain stochasticity in this. The frequency of an allele varies somewhat randomly in a population. Offspring will inherit one allele or the other with equal probability and pass that allele on to their offspring with equal probability. With no selection acting on the allele, the frequency will shrink or grow randomly until the allele is either fixed in the population or lost completely. When selection is acting on the allele, the number of offspring will depend on the alleles an individual carries. There is still randomness, but the distribution of the number of offspring will change, more or less, depending on the strength of the selection.
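This process is easy to play with in a simulation. Below is a minimal Wright-Fisher style sketch: each generation, selection shifts the expected allele frequency and binomial sampling of the next generation adds the randomness. The haploid-style fitness model, the parameter names and the default values are my own assumptions for illustration. This is not the simulation setup from the paper.

```python
import random

def wright_fisher(n_copies, p0, s=0.0, generations=200, rng=None):
    """Track the frequency of one allele among n_copies gene copies.

    s is the selection coefficient of the allele (negative = deleterious,
    and it must be greater than -1).
    """
    rng = rng or random.Random()
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection shifts the expected frequency ...
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # ... and binomial sampling of the next generation adds the drift.
        count = sum(rng.random() < p_sel for _ in range(n_copies))
        p = count / n_copies
        trajectory.append(p)
        if p in (0.0, 1.0):
            break  # allele lost or fixed
    return trajectory

small = wright_fisher(20, 0.5, rng=random.Random(1))
large = wright_fisher(2000, 0.5, rng=random.Random(1))
```

With only 20 gene copies, the neutral trajectory typically wanders to loss or fixation quite quickly, while with 2,000 copies it tends to stay near the starting frequency for much longer.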

How does this explain that there are more deleterious mutations in Europeans, then? This has to do with how stochastic the process really is.

Generally in stochastic processes, when we consider small numbers, the variance in the process is larger than when we consider larger numbers. For very large numbers, a stochastic process can behave almost deterministically, while for very small numbers the process can appear completely random.

A consequence of this is that weak selection requires a large population to have any observable effect over the background randomness of the process. The weaker the selection, the larger the population needs to be for the selection to have any effect.
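One way to put numbers on this is Kimura’s diffusion approximation for the probability that a new mutation eventually fixes in a diploid population. The sketch below assumes the effective population size equals the census size N and that the mutation starts as a single copy. The function names are mine, and the formula is an approximation that is only meant for small selection coefficients (very large values of N times s will overflow the exponential in this naive version).

```python
import math

def fixation_probability(n, s):
    """Kimura's approximation for a new mutation in a diploid population
    of size n, with selection coefficient s (negative = deleterious)."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: one copy among 2n
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n * s))

def relative_to_neutral(n, s):
    """Fixation probability relative to a neutral mutation in the same population."""
    return fixation_probability(n, s) / fixation_probability(n, 0)

small_pop = relative_to_neutral(100, -0.001)     # 4Ns = -0.4
large_pop = relative_to_neutral(10000, -0.001)   # 4Ns = -40
```

For the same weakly deleterious mutation, the small population gives it about 0.8 times the fixation chance of a neutral mutation, so selection is barely visible, while in the large population the ratio is vanishingly small.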

If a population goes through a bottleneck, as the non-African populations are thought to have done, the selection that would act on the African population would have little effect on the non-African populations. Mutations that are selected against in the African population will not have been selected against in the non-African populations, simply because the selection wasn’t strong enough to have any effect in the smaller populations.

The paper finishes with a simulation study that shows that a bottleneck following the migration out of Africa, followed by a population expansion, gives the observed pattern of variation, nicely confirming this.

Lohmueller, K.E., Indap, A.R., Schmidt, S., Boyko, A.R., Hernandez, R.D., Hubisz, M.J., Sninsky, J.J., White, T.J., Sunyaev, S.R., Nielsen, R., Clark, A.G., Bustamante, C.D. (2008). Proportionally more deleterious genetic variation in European than in African populations. Nature, 451(7181), 994-997. DOI: 10.1038/nature06611


Sorry, no bioinformatics blogging today. I’m just too angry to focus on it.

I had just made it to the office today — after a trip to the gym — when I got a phone call from one of my neighbours informing me that my front door was kicked in.  A few minutes later, the police called and asked me to come home to look at the place.

Nothing has been stolen, at least not as far as I can see.  The two laptops I had sitting on my desk were still there.  My TVs, my B&O and my various gadgets are still here.  They just trashed the place, probably looking for cash and nothing else.

Some of my neighbours were home, and one of them called the police when they broke into her house, with her inside.  So, anyway, the police arrived quickly and scared the thieves away.  The neighbours saw them, so the police got a description — three “ethnic” youths — but even with the police dogs looking for them, they got away.

This is the second time since I moved to my new place only four years ago.  Last time was on Christmas Eve!

I am so pissed off about this.  I had a lot of work planned for today, but I’ve spent the time until now trying to find someone who can fix my door so I can close it before the storm expected tonight (and it is pretty windy already, and pretty cold in my house).

Ok, I’ll stop now before I start cursing again.  I need to calm down…