Posts Tagged ‘Variation’

More on worldwide and genomewide variation...

Saturday, February 23rd, 2008

Just to finish the trilogy -- the three papers examining genome-wide polymorphism in this week's Nature and Science -- I should mention Li et al.'s Science paper, which covers essentially the same ground as the Jakobsson et al. paper I just reviewed.

Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation

Li et al.

Abstract

Human genetic diversity is shaped by both demographic and biological factors and has fundamental implications for understanding the genetic basis of diseases. We studied 938 unrelated individuals from 51 populations of the Human Genome Diversity Panel at 650,000 common single-nucleotide polymorphism loci. Individual ancestry and population substructure were detectable with very high resolution. The relationship between haplotype heterozygosity and geography was consistent with the hypothesis of a serial founder effect with a single origin in sub-Saharan Africa. In addition, we observed a pattern of ancestral allele frequency distributions that reflects variation in population dynamics among geographic regions. This data set allows the most comprehensive characterization to date of human genetic variation.

The results do not differ much from those of Jakobsson et al., but the analysis is different.

First, they use a maximum likelihood method to cluster the sampled individuals into K unknown "ancestral clusters" and consider the clusterings obtained with different values of K. As K increases, the individuals are split into smaller and smaller groups, revealing how closely related they are relative to the whole sample.
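To make the idea of "ancestral clusters" concrete, here is a minimal expectation-maximisation (EM) sketch of the standard admixture likelihood model: each individual has ancestry proportions Q over K clusters, and each cluster has its own allele frequencies P. This is not the authors' software. The function name `admixture_em`, the fixed iteration count, the clamping constant and the toy data are all assumptions made for illustration.

```python
import random

def admixture_em(genotypes, K, iterations=200, seed=1, eps=1e-6):
    """Fit ancestry proportions Q (individual x cluster) and cluster allele
    frequencies P (cluster x SNP) by EM under the admixture model.

    genotypes[i][j] is the number (0, 1 or 2) of reference-allele copies
    that individual i carries at SNP j.
    """
    rng = random.Random(seed)
    n, m = len(genotypes), len(genotypes[0])
    # Random starting values break the symmetry between clusters.
    Q = []
    for _ in range(n):
        row = [rng.random() for _ in range(K)]
        total = sum(row)
        Q.append([q / total for q in row])
    P = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(K)]

    for _ in range(iterations):
        q_num = [[0.0] * K for _ in range(n)]
        p_ref = [[0.0] * m for _ in range(K)]
        p_all = [[0.0] * m for _ in range(K)]
        for i in range(n):
            for j in range(m):
                g = genotypes[i][j]
                ref = [Q[i][k] * P[k][j] for k in range(K)]
                alt = [Q[i][k] * (1 - P[k][j]) for k in range(K)]
                ref_sum, alt_sum = sum(ref), sum(alt)
                for k in range(K):
                    # E-step: expected number of the g reference copies and
                    # the (2 - g) alternative copies that came from cluster k.
                    a = g * ref[k] / ref_sum
                    b = (2 - g) * alt[k] / alt_sum
                    q_num[i][k] += a + b
                    p_ref[k][j] += a
                    p_all[k][j] += a + b
        # M-step: each individual has 2m allele copies in total.
        Q = [[q_num[i][k] / (2 * m) for k in range(K)] for i in range(n)]
        P = []
        for k in range(K):
            row = []
            for j in range(m):
                if p_all[k][j] > 0:
                    # Keep frequencies away from 0 and 1 to avoid zero likelihoods.
                    row.append(min(1 - eps, max(eps, p_ref[k][j] / p_all[k][j])))
                else:
                    row.append(0.5)  # cluster k carries no weight at SNP j
            P.append(row)
    return Q, P
```

With two clearly separated toy groups and K = 2, each group should end up assigned almost entirely to its own cluster; running the same function for increasing K is the "different values of K" comparison described above, on a much smaller scale.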

Once K is high enough (K=7), individuals from the same population mainly cluster together, with most populations deriving mainly from a single cluster, but with some populations (Middle Eastern and South/Central Asian) being a mix of the ancestral clusters.

They then construct a maximum likelihood phylogeny for the populations and find that it fits nicely with the Out of Africa model.

Considering haplotype heterozygosity, they observe that heterozygosity decreases with distance from East Africa, similar to what Jakobsson et al. report.
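A minimal sketch of the two ingredients behind this kind of result, assuming the haplotypes are already phased and cut into short windows: the (sample-size corrected) expected heterozygosity of a window, treating each distinct haplotype as an allele, and an ordinary least-squares slope of heterozygosity against distance. The function names and the toy numbers are illustrative assumptions, not the paper's data or code.

```python
from collections import Counter

def haplotype_heterozygosity(haplotypes):
    """Unbiased expected heterozygosity of one window, treating each distinct
    haplotype (e.g. a string of alleles over a few adjacent SNPs) as an allele."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)

def least_squares_slope(xs, ys):
    """Slope b of the ordinary least-squares line y = a + b * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

Averaging the heterozygosity over windows for each population and regressing it on that population's distance from East Africa, a negative slope is the pattern described above.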


Li, J.Z., Absher, D.M., Tang, H., Southwick, A.M., Casto, A.M., Ramachandran, S., Cann, H.M., Barsh, G.S., Feldman, M., Cavalli-Sforza, L.L., Myers, R.M. (2008). Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. Science, 319(5866), 1100-1104. DOI: 10.1126/science.1153717

Worldwide, genomewide patterns of variation

Saturday, February 23rd, 2008


Another interesting paper in Wednesday's Nature, by Jakobsson et al., concerns worldwide patterns of variation. Again I refer to John Hawks' blog for a human evolution perspective. Wired also has a nice discussion of the results (together with the Lohmueller et al. paper I just reviewed and a Science paper that I haven't read yet).

Genotype, haplotype and copy-number variation in worldwide human populations

Jakobsson et al.

Nature 451, 998-1003

Abstract

Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected—including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas—the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.

This paper uses ~500K single nucleotide polymorphism (SNP) markers and ~400 copy number variable (CNV) markers in 29 populations. From this, they construct neighbour-joining trees using SNP frequencies, inferred haplotypes or CNVs and compare the trees with the geographical location of the populations.
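For readers who have not seen neighbour-joining (Saitou and Nei's method), here is a minimal sketch that builds a tree from a precomputed distance matrix and records each join with its two branch lengths. How the paper turns SNP frequencies, haplotypes or CNVs into distances is not reproduced here; the function name, the dict-of-dicts input and the tie-breaking by list order are assumptions.

```python
from itertools import combinations

def neighbour_joining(names, dist):
    """Neighbour-joining on a distance matrix.

    dist is a dict of dicts with dist[a][b] == dist[b][a] for a != b.
    Returns (joins, final_edge), where each join is
    (node_a, node_b, branch_a, branch_b, new_node).
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(names)
    joins = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - total(a) - total(b).
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[a][b] - totals[a] - totals[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node.
        la = d[a][b] / 2 + (totals[a] - totals[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        counter += 1
        u = f"node{counter}"
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                # Distance from the new node to every remaining node.
                duk = (d[a][k] + d[b][k] - d[a][b]) / 2
                d[u][k] = duk
                d[k][u] = duk
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, la, lb, u))
    a, b = nodes
    return joins, (a, b, d[a][b])
```

On the usual five-taxon textbook matrix (rows a to e: 0 5 9 9 8 / 5 0 10 10 9 / 9 10 0 8 7 / 9 10 8 0 3 / 8 9 7 3 0) the first join is a with b, with branch lengths 2 and 3.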

Considering differentiation (the FST statistic) between populations, they observe the expected increase in differentiation between East Africans and other populations with geographical distance from East Africa (Fig. 2a in the paper). From what we know from previous studies, there is very little surprise here.
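As a rough illustration of what FST measures, here is a minimal sketch for two populations and biallelic SNPs, using the ratio of averages (H_T - H_S) / H_T, where H_S is the mean within-population heterozygosity and H_T the heterozygosity of the pooled population. It ignores sample-size corrections (estimators such as Weir and Cockerham's include them) and is not necessarily the estimator Jakobsson et al. used; the function name is an assumption.

```python
def fst_two_populations(freqs1, freqs2):
    """Ratio-of-averages F_ST for biallelic SNPs from two populations.

    freqs1[j] and freqs2[j] are the allele frequencies at SNP j.
    Populations are weighted equally when pooling.
    """
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)                         # pooled heterozygosity
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2     # mean within-population
        num += h_t - h_s
        den += h_t
    return num / den
```

Identical frequencies give 0 and fixed differences give 1; real population pairs fall in between, and computing the value for East Africans against each other population gives the kind of distance comparison described above.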

They then consider linkage disequilibrium (LD) in some detail, based both on individual SNPs and on inferred haplotypes (using an extension of the fastPHASE algorithm, as far as I understand the paper -- but I haven't checked the supplementary material), and show that LD increases with geographical distance from Africa, once again confirming the Out of Africa expansion of humans (Fig. 2c in the paper).
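One common way to quantify LD between a pair of SNPs is r², the squared correlation between alleles on the same haplotype. The sketch below assumes phased haplotypes with alleles coded 0/1; the paper's exact LD summaries are not reproduced, and the function name is an assumption.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (allele at SNP 1, allele at SNP 2) coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # the classic LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # undefined for monomorphic SNPs
```

Averaging r² over SNP pairs at a given physical distance, population by population, gives the kind of summary in which LD can be compared against distance from Africa.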

The only really surprising finding in this paper is that CNV variation is higher in Oceanian and American populations, whereas variation in general decreases with distance from Africa (as the SNP analysis in this paper also confirms). I did not find an explanation for this in the paper, and I cannot think of a good explanation myself. We don't really know that much about CNV polymorphism yet, at least not compared to SNP variation, so perhaps there are some interesting discoveries waiting for us here?


Jakobsson, M., Scholz, S.W., Scheet, P., Gibbs, J.R., VanLiere, J.M., Fung, H., Szpiech, Z.A., Degnan, J.H., Wang, K., Guerreiro, R., Bras, J.M., Schymick, J.C., Hernandez, D.G., Traynor, B.J., Simon-Sanchez, J., Matarin, M., Britton, A., van de Leemput, J., Rafferty, I., Bucan, M., Cann, H.M., Hardy, J.A., Rosenberg, N.A., Singleton, A.B. (2008). Genotype, haplotype and copy-number variation in worldwide human populations. Nature, 451(7181), 998-1003. DOI: 10.1038/nature06742

Harmful mutations in Europeans and Africans

Saturday, February 23rd, 2008


What I wanted to blog about yesterday, but didn't get around to (as I explained in the previous post), was two letters in the latest issue of Nature on human variation and the distribution of deleterious mutations. I'll split this into two posts; in this post I'll discuss Lohmueller et al. Genetic Future beat me to it, so I suggest you also read the discussion there. The paper is also covered in the latest Nature Podcast and commented on at Nature. For a human evolution perspective, read John Hawks' post on the topic.

Proportionally more deleterious genetic variation in European than in African populations

Lohmueller et al.

Abstract

Quantifying the number of deleterious mutations per diploid human genome is of crucial concern to both evolutionary and medical geneticists. Here we combine genome-wide polymorphism data from PCR-based exon resequencing, comparative genomic data across mammalian species, and protein structure predictions to estimate the number of functionally consequential single-nucleotide polymorphisms (SNPs) carried by each of 15 African American (AA) and 20 European American (EA) individuals. We find that AAs show significantly higher levels of nucleotide heterozygosity than do EAs for all categories of functional SNPs considered, including synonymous, non-synonymous, predicted 'benign', predicted 'possibly damaging' and predicted 'probably damaging' SNPs. This result is wholly consistent with previous work showing higher overall levels of nucleotide variation in African populations than in Europeans. EA individuals, in contrast, have significantly more genotypes homozygous for the derived allele at synonymous and non-synonymous SNPs and for the damaging allele at 'probably damaging' SNPs than AAs do. For SNPs segregating only in one population or the other, the proportion of non-synonymous SNPs is significantly higher in the EA sample (55.4%) than in the AA sample (47.0%; P < 2.3 × 10⁻³⁷). We observe a similar proportional excess of SNPs that are inferred to be 'probably damaging' (15.9% in EA; 12.1% in AA; P < 3.3 × 10⁻¹¹). Using extensive simulations, we show that this excess proportion of segregating damaging alleles in Europeans is probably a consequence of a bottleneck that Europeans experienced at about the time of the migration out of Africa.

In this paper, the authors compare the genetic variability in Americans of African descent and of European descent, classify the variants according to their estimated effect on fitness, and examine how the "fitness" of the variants differs between the two populations.

Classifying variations and comparing the populations

Using genome-wide exon resequencing, the authors identified SNPs in the sample and compared them with the chimpanzee genome to infer which alleles are ancestral and which are derived. Ignoring for a moment the functional effects of the mutations, just from knowing the variants and which alleles are ancestral and which derived, we can learn about the history of the populations.
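A minimal sketch of the polarisation step, assuming a single aligned chimpanzee base per biallelic human SNP and ignoring recurrent mutation and alignment errors: the allele matching the chimpanzee is taken as ancestral and the other as derived. The function names and the two-letter genotype strings are illustrative assumptions.

```python
def derived_allele(alleles, chimp):
    """Return the derived allele at a biallelic human SNP, using the chimpanzee
    base as the ancestral state, or None if the site cannot be polarised."""
    a, b = alleles
    if chimp == a:
        return b
    if chimp == b:
        return a
    return None  # the chimpanzee base matches neither human allele

def count_derived_homozygotes(genotypes, alleles, chimp):
    """Number of individuals homozygous for the derived allele at one SNP.

    genotypes is a list of two-letter strings such as 'AG'.
    """
    derived = derived_allele(alleles, chimp)
    if derived is None:
        return 0
    return sum(1 for g in genotypes if g == derived * 2)
```

Counts like this, summed over SNPs and compared between the two samples, are the "genotypes homozygous for the derived allele" that the abstract compares.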

First off, we can consider the variation within the populations. Are there more variable sites in one population than in the other? Is there more heterozygosity (meaning, are people more likely to carry two different alleles) in one population than in the other?
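Both questions can be answered with simple counts. Here is a minimal sketch for one population, assuming genotypes are coded as the number of copies (0, 1 or 2) of one allele; the function name and coding are assumptions, not the paper's pipeline.

```python
def diversity_summary(genotypes):
    """Summaries for one population.

    genotypes[i][j] is the number of copies (0, 1 or 2) of the derived
    allele carried by individual i at SNP j.
    Returns (number of segregating sites, observed heterozygosity), where
    observed heterozygosity is the fraction of all genotypes that are 1.
    """
    n, m = len(genotypes), len(genotypes[0])
    segregating = 0
    for j in range(m):
        copies = sum(row[j] for row in genotypes)
        # A site is variable in this sample if both alleles are present.
        if 0 < copies < 2 * n:
            segregating += 1
    het = sum(1 for row in genotypes for g in row if g == 1) / (n * m)
    return segregating, het
```

Running it separately on each sample answers the two questions above; note that the number of segregating sites also grows with sample size, so samples of different sizes (15 versus 20 individuals here) are not directly comparable on that count alone.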

The results in the paper confirm previous studies showing that there is more variability in individuals of African descent than in those of European descent, matching the Out of Africa hypothesis. If humans originated in Africa -- which everything indicates and I doubt anyone disagrees with any more -- and populations outside Africa are relatively recent, then we expect the variability in Africa to be greater than outside Africa. A small population branching off from a larger one will carry only some of the variants with it, and it takes time for this difference to level out.
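The "only some of the variants" argument can be made quantitative with a simple approximation, assuming the founders are drawn at random (with replacement) from a large source population: a variant at frequency p is missed by all 2n founder chromosomes with probability (1 - p)^(2n). The function name and the example numbers are illustrative assumptions.

```python
def expected_retained(freqs, founders):
    """Expected number of variants still present after a founder event.

    freqs: frequencies of the variants in the source population.
    founders: number of diploid founders, i.e. 2 * founders chromosome
    copies drawn at random (with replacement) from the source population.
    A variant at frequency p is missed with probability (1 - p) ** (2 * founders).
    """
    return sum(1 - (1 - p) ** (2 * founders) for p in freqs)
```

For example, with five founders, 100 variants at 1% frequency are expected to leave only about 9.6 behind, whereas a common variant at 50% is almost certain to be carried along; rare variation is what gets lost.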

The SNPs can be classified into two categories: synonymous SNPs -- those that do not change the amino acid the codon codes for -- and non-synonymous SNPs -- those that do. Roughly speaking, we expect non-synonymous mutations to affect fitness but synonymous ones not to. This is very rough, however, since synonymous mutations can have major effects on regulation, splicing, etc., but still...
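The classification itself is mechanical once you know the codon a SNP falls in. A minimal sketch using the standard genetic code, assuming the reference codon and the position of the SNP within it are already known (the function and table names are assumptions); changes to a stop codon are simply lumped in with non-synonymous ones here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_snp(codon, position, new_base):
    """Classify a coding SNP as 'synonymous' or 'non-synonymous'.

    codon: reference codon such as 'GAA'; position: 0, 1 or 2 within the codon.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"
```

For example, GAA to GAG still codes for glutamate (synonymous), while GAA to AAA turns glutamate into lysine (non-synonymous).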

Using bioinformatics methods, the authors classify the non-synonymous SNPs by their predicted effect ('benign', 'possibly damaging' or 'probably damaging') based on protein structure and conservation. They then observe that damaging variants make up a larger proportion of the SNPs in individuals of European descent.

Why is this an expected result?

To understand why this is the case, we turn to population genetics.

We expect deleterious mutations to be removed -- or at least kept at low frequency -- by selection, but there is a certain stochasticity in this. The frequency of an allele varies somewhat randomly from generation to generation. A heterozygous parent passes one allele or the other to each offspring with equal probability, and that offspring in turn passes on each of its alleles with equal probability. With no selection acting on the allele, its frequency will drift up and down randomly until the allele is either fixed in the population or lost completely. When selection is acting on the allele, the number of offspring an individual has depends on the alleles it carries. There is still randomness, but the distribution of the number of offspring changes, more or less, depending on the strength of selection.
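The process described above is usually modelled as a Wright-Fisher population. Here is a minimal sketch (not the simulation used in the paper) that follows a single allele in a diploid population of N individuals until it is lost or fixed. The selection model (genic selection, shifting the expected frequency to p(1 + s)/(1 + sp) before random sampling) and the function name are assumptions.

```python
import random

def wright_fisher(p0, N, s, rng, max_generations=100_000):
    """Follow one allele in a diploid Wright-Fisher population of N individuals
    (2N gene copies) until it is lost or fixed.

    s is the fitness effect per copy of the allele (s < 0: deleterious),
    modelled as genic selection: before sampling, the expected frequency
    becomes p(1 + s) / (1 + s p).
    Returns (final frequency, generations elapsed).
    """
    p = p0
    for generation in range(max_generations):
        if p in (0.0, 1.0):
            return p, generation
        p_sel = p * (1 + s) / (1 + s * p)
        # Binomial sampling of the 2N gene copies in the next generation.
        copies = sum(1 for _ in range(2 * N) if rng.random() < p_sel)
        p = copies / (2 * N)
    return p, max_generations
```

Running many replicates with `random.Random(seed)` and recording how often the allele is fixed shows both effects described above: with s = 0 the outcome is pure chance, and with s < 0 loss becomes more likely, by an amount that depends on N.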

How does this explain that there are more deleterious mutations in Europeans, then? This has to do with how stochastic the process really is.

Generally in stochastic processes, the variance is larger when we consider small numbers than when we consider large numbers. For very large numbers, a stochastic process can behave almost deterministically, while for very small numbers the process can appear completely random.
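For allele frequencies this can be written down exactly in the neutral Wright-Fisher model (assumed here as the textbook model, not necessarily the one used in the paper): the next generation's frequency p' of an allele currently at frequency p is obtained by drawing 2N gene copies at random, so

```latex
p' \sim \frac{1}{2N}\,\mathrm{Binomial}(2N,\, p),
\qquad
\mathbb{E}[p' \mid p] = p,
\qquad
\operatorname{Var}(p' \mid p) = \frac{p(1-p)}{2N}
```

At p = 0.5 the standard deviation of the one-generation change is 0.05 when N = 50 but only 0.005 when N = 5000.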

A consequence of this is that weak selection requires a large population to have any observable effect over the background randomness of the process. The weaker the selection, the larger the population needs to be for the selection to have any effect.
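A standard way to make this precise is Kimura's diffusion approximation for the probability that a single new mutation eventually becomes fixed, u = (1 - e^(-4Nsp)) / (1 - e^(-4Ns)) with p = 1/(2N). Selection only matters when 4N|s| is well above 1. The sketch below assumes additive selection (heterozygote 1 + s, homozygote 1 + 2s) and treats N as the effective population size; the function names and the example values of N and s are hypothetical, not estimates for any human population.

```python
import math

def fixation_probability(N, s):
    """Kimura's approximation for the probability that a single new mutation
    (initial frequency 1/(2N)) eventually becomes fixed.

    N is the effective population size; s is the additive fitness effect
    (heterozygote 1 + s, homozygote 1 + 2s). s < 0 means deleterious.
    """
    p = 1.0 / (2 * N)
    if s == 0:
        return p  # neutral case
    a = 4 * N * s
    if a > 0:
        return math.expm1(-a * p) / math.expm1(-a)
    b = -a
    # Same formula rearranged so that exp() cannot overflow when b is large.
    return math.exp(b * (p - 1)) * math.expm1(-b * p) / math.expm1(-b)

def relative_to_neutral(N, s):
    """Fixation probability divided by the neutral value 1/(2N)."""
    return fixation_probability(N, s) * 2 * N
```

With s = -0.001, a new mutation fixes at roughly 80% of the neutral rate when N = 100, so it behaves almost neutrally, but essentially never when N = 10,000, where the same weak selection is effective.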

If a population goes through a bottleneck, as the non-African populations are thought to have done, selection that is effective in the larger African population can have little effect in the smaller non-African populations. Mildly deleterious mutations that are kept at low frequency in the African population will not have been kept down as efficiently in the non-African populations, simply because the selection against them wasn't strong enough to have any effect in the smaller populations.

The paper finishes with a simulation study showing that a bottleneck at about the time of the migration out of Africa, followed by a population expansion, produces the observed pattern of variation, nicely supporting this explanation.


Lohmueller, K.E., Indap, A.R., Schmidt, S., Boyko, A.R., Hernandez, R.D., Hubisz, M.J., Sninsky, J.J., White, T.J., Sunyaev, S.R., Nielsen, R., Clark, A.G., Bustamante, C.D. (2008). Proportionally more deleterious genetic variation in European than in African populations. Nature, 451(7181), 994-997. DOI: 10.1038/nature06611

"Identical" twins

Saturday, February 16th, 2008

Now there's a study that shows that identical (monozygotic) twins do not have identical genomes (I spotted it here at DNA Direct talk -- I'm getting a lot of science news now that I follow the DNA network).

The genomes are pretty close, but not identical. There seems to be a lot of structural variation between them.

I guess it doesn't surprise me all that much, even if it looks like a major discovery. Considering that the cells within an individual have almost but not quite identical genomes, I would be very surprised if twins' genomes were identical.

For more on the genomic differences between somatic cells, this is an excellent paper:

Genomic Variability within an Organism Exposes Its Cell Lineage Tree

Frumkin D, Wasserstrom A, Kaplan S, Feige U, Shapiro E

Genomic Variability within an Organism Exposes Its Cell Lineage Tree. PLoS Comput Biol 1(5): e50 doi:10.1371/journal.pcbi.0010050

Abstract

What is the lineage relation among the cells of an organism? The answer is sought by developmental biology, immunology, stem cell research, brain research, and cancer research, yet complete cell lineage trees have been reconstructed only for simple organisms such as Caenorhabditis elegans. We discovered that somatic mutations accumulated during normal development of a higher organism implicitly encode its entire cell lineage tree with very high precision. Our mathematical analysis of known mutation rates in microsatellites (MSs) shows that the entire cell lineage tree of a human embryo, or a mouse, in which no cell is a descendent of more than 40 divisions, can be reconstructed from information on somatic MS mutations alone with no errors, with probability greater than 99.95%. Analyzing all ~1.5 million MSs of each cell of an organism may not be practical at present, but we also show that in a genetically unstable organism, analyzing only a few hundred MSs may suffice to reconstruct portions of its cell lineage tree. We demonstrate the utility of the approach by reconstructing cell lineage trees from DNA samples of a human cell line displaying MS instability. Our discovery and its associated procedure, which we have automated, may point the way to a future “Human Cell Lineage Project” that would aim to resolve fundamental open questions in biology and medicine by reconstructing ever larger portions of the human cell lineage tree.

The applications to analysing genetic diseases that the researchers mention still make this an interesting result, if only you can find enough twin pairs with one affected and one unaffected twin...