Worldwide, genomewide patterns of variation
Saturday, February 23rd, 2008
Another interesting paper in Wednesday’s Nature is Jakobsson et al.’s study of worldwide patterns of variation. Again I refer to John Hawks’ blog for a human evolution perspective. Wired also has a nice discussion of the results (together with the Lohmueller et al. paper I just reviewed and a Science paper that I haven’t read yet).
Genotype, haplotype and copy-number variation in worldwide human populations
Jakobsson et al.
Nature 451, 998-1003
Abstract
Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected—including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas—the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.
This paper uses ~500K single nucleotide polymorphism (SNP) markers and ~400 copy number variable (CNV) markers in 29 populations. From this, they construct neighbour-joining trees using SNP frequencies, inferred haplotypes or CNVs and compare the trees with the geographical location of the populations.
Considering differentiation between populations (the F_ST statistic), they observe the expected increase in differentiation between East Africans and other populations with geographical distance from East Africa (see Fig. 2 in the paper). Given what we know from previous studies, there is very little surprise here.
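For readers who have not met F_ST before, here is a minimal sketch of the textbook heterozygosity-based definition for two populations, F_ST = (H_T - H_S) / H_T, summed over loci. This is only an illustration: the function name and the input format (lists of allele frequencies) are my own assumptions, and the paper may well use a different estimator that corrects for sample size.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Wright-style F_ST for two populations at biallelic loci.

    freqs_a, freqs_b: allele frequencies (0..1) of the same allele at the
    same loci in population A and population B (hypothetical input format).
    Uses a ratio of sums over loci and no sample-size correction.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same number of loci")

    numerator = 0.0    # sum over loci of (H_T - H_S)
    denominator = 0.0  # sum over loci of H_T
    for p_a, p_b in zip(freqs_a, freqs_b):
        # Mean expected heterozygosity within the two populations.
        h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        # Expected heterozygosity in the pooled population.
        p_bar = (p_a + p_b) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        numerator += h_t - h_s
        denominator += h_t

    # All loci monomorphic in both populations: no differentiation to measure.
    if denominator == 0:
        return 0.0
    return numerator / denominator
```

Identical frequencies in both populations give 0, and populations fixed for different alleles give 1, so larger values mean more differentiation.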
They then consider linkage disequilibrium (LD) in some detail, based both on individual SNPs and on inferred haplotypes (using an extension of the fastPHASE algorithm, as far as I understand the paper, though I haven’t checked the supplemental material), and show increased LD as a function of geographical distance from Africa, once again confirming the Out of Africa expansion of humans (Fig. 2c in the paper).
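As a reminder of what is being measured, a common pairwise LD statistic is r², computed from phased haplotypes at two biallelic SNPs. The sketch below is a plain textbook calculation under my own assumptions (alleles coded 0/1, a hypothetical function name, haplotypes already phased); it is not the haplotype-based method used in the paper.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) pairs, where a and b are 0 or 1 and give
    the allele carried at SNP 1 and SNP 2 on the same chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    p_a = sum(a for a, _ in haplotypes) / n    # freq. of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n    # freq. of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D is the deviation of the 1-1 haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)

    # If either SNP is monomorphic, r^2 is undefined; report 0 here.
    if denom == 0:
        return 0.0
    return d * d / denom
```

Two SNPs that always travel together give r² = 1, while independent SNPs give r² = 0. A population that has been through more founder events is expected to show higher values at a given physical distance.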
The only really surprising discovery in this paper is that CNV variation is higher in Oceanian and American populations, whereas variation in general decreases with distance from Africa (as the SNP analysis in this paper also confirms). I did not find an explanation for this in the paper, and I cannot think of a good explanation myself. We don’t really know that much about CNV polymorphism yet, at least not compared to SNP variation, so perhaps there are some interesting discoveries waiting for us here?
Jakobsson, M., Scholz, S.W., Scheet, P., Gibbs, J.R., VanLiere, J.M., Fung, H., Szpiech, Z.A., Degnan, J.H., Wang, K., Guerreiro, R., Bras, J.M., Schymick, J.C., Hernandez, D.G., Traynor, B.J., Simon-Sanchez, J., Matarin, M., Britton, A., van de Leemput, J., Rafferty, I., Bucan, M., Cann, H.M., Hardy, J.A., Rosenberg, N.A., Singleton, A.B. (2008). Genotype, haplotype and copy-number variation in worldwide human populations. Nature, 451(7181), 998-1003. DOI: 10.1038/nature06742
Yesterday I visited CLC Bio for a lunch with Roald Forsberg, but afterwards I had a discussion with Bjarne Knudsen about HMM implementations using SIMD instructions. I have a student working on it for his thesis and CLC Bio is using it in their software and (I think) their