Archive for April 7th, 2008

What can we learn from genome-wide association studies conducted so far?

Monday, April 7th, 2008

Genome-wide association studies rely on the “common disease / common variant” hypothesis: that the major genetic effect of a common disease (major at the population level, not necessarily for the individual) is caused by a few genetic variants that are common in the population. It is, of course, a parsimonious explanation, accounting for the high disease frequency with a few common variants rather than many rare ones, but there are also population genetics arguments in favour of the hypothesis. Is it true, then? Can we conclude that from the association mapping studies published in the last year and a half? If we had found absolutely nothing, we would probably reject it. But we have found something, just not enough to explain all of the genetic contribution to the diseases we have studied, so we haven’t really answered the question yet.

This paper makes an attempt at answering the question:

What Can Genome-Wide Association Studies Tell Us about the Genetics of Common Disease?
Mark M. Iles. PLoS Genet 4(2): e33. doi:10.1371/journal.pgen.0040033

Abstract

The success of genome-wide association studies relies on much of the risk of common diseases being due to common genetic variants; but evidence for this is inconclusive. The results of published genome-wide association studies are examined to see what can be learnt about the distribution of disease-associated variants and how this might influence future study design. Although replicated disease-associated variants tend to be very common and frequency is inversely correlated with estimated effect size, our simulations suggest that such observations are the result of power. We find that for studies conducted to date, the frequency and effect size of significantly associated alleles are likely to be similar to those of the underlying disease alleles that they represent. Little of the genetic variation of disease has been explained so far, but current studies are only adequately powered to detect very common alleles unless they greatly increase disease risk. Thus, although the truth of the common disease / common variant hypothesis remains undecided, recent successes suggest that there are many more common genetic disease-associated variants, requiring larger studies to be identified.
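To get a feel for the abstract's claim that current studies are only adequately powered to detect very common alleles unless those alleles greatly increase disease risk, here is a rough back-of-the-envelope power calculation. This is not the paper's simulation, just a sketch under my own assumptions: a multiplicative model with per-allele genetic relative risk `grr`, a rare disease (so the control allele frequency is roughly the population frequency `p`), equal numbers of cases and controls, a two-sided allelic test approximated by a normal test of two proportions, and an assumed genome-wide threshold of 5 × 10⁻⁸. The helper names `case_allele_freq` and `allelic_power` are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 5e-8  # assumed genome-wide significance threshold (illustrative only)


def case_allele_freq(p: float, grr: float) -> float:
    """Risk-allele frequency among cases under a multiplicative model.

    Assumes a rare disease, so the control frequency is approximately p.
    """
    return p * grr / (1.0 + p * (grr - 1.0))


def allelic_power(p: float, grr: float, n_per_group: int, alpha: float = ALPHA) -> float:
    """Approximate power of a two-sided allelic test with 2N alleles per group."""
    p_case = case_allele_freq(p, grr)
    se = math.sqrt((p_case * (1 - p_case) + p * (1 - p)) / (2 * n_per_group))
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of reaching significance in the wrong direction.
    return STD_NORMAL.cdf(abs(p_case - p) / se - z_crit)


if __name__ == "__main__":
    freqs = [0.01, 0.05, 0.1, 0.2, 0.4]
    print("risk-allele freq:", "  ".join(f"{p:>5}" for p in freqs))
    for n in (1000, 3000):
        for grr in (1.1, 1.3, 1.5, 2.0):
            row = "  ".join(f"{allelic_power(p, grr, n):5.2f}" for p in freqs)
            print(f"n={n} per group, GRR={grr}: {row}")
```

Under these assumptions, power rises with allele frequency for any fixed GRR, so with modest effect sizes the alleles a study can realistically pick up are the common ones.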

First, the author notes that there is a negative correlation between the strength of the genetic effect and the allele frequency of the increased-risk variants in the published studies. One could argue that selection causes this: if there is selection against the disease, the disease variant will be kept at low frequency. However, there is also a negative correlation between the minor allele frequency of the disease marker and the genetic effect even when the increased-risk allele is the major allele. That instead suggests the correlation is caused by limited statistical power to detect low-frequency disease markers: we have only observed low-frequency markers when the genetic effect is large.

I’m not completely comfortable with this argument myself. At the very least, it should be argued that the effect is not simply a consequence of the at-risk allele being the minor allele more often than expected by chance. Either way, the argument is not essential for the rest of the paper, where the power question is addressed directly.

The big question is: would we see different distributions of discovered disease allele frequencies if the main genetic component were a few common variants (common disease/common variant) than if it were caused by several low-frequency variants?

The question is addressed through a simulation study, where data are simulated with i) mainly low-frequency disease variants, ii) a mix of low-frequency and high-frequency disease variants, and iii) mainly high-frequency disease variants.
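The power explanation for the negative correlation between frequency and effect size can be checked with a small toy simulation of its own. Again, this is a sketch under my own assumptions, not the paper's method: hypothetical disease variants get a risk-allele frequency and a GRR drawn independently of each other, so there is no correlation by construction, and each variant is then "discovered" with probability equal to its approximate power. Power uses a normal approximation to a two-sided allelic test under a multiplicative model with a rare disease, 2000 cases and 2000 controls, and an assumed threshold of 5 × 10⁻⁸. The frequency and GRR ranges are arbitrary, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(1 - 5e-8 / 2)  # assumed two-sided threshold of 5e-8


def allelic_power(p, grr, n_per_group):
    """Approximate allelic-test power, multiplicative model, rare disease."""
    p_case = p * grr / (1.0 + p * (grr - 1.0))
    se = math.sqrt((p_case * (1 - p_case) + p * (1 - p)) / (2 * n_per_group))
    return STD_NORMAL.cdf(abs(p_case - p) / se - Z_CRIT)


def pearson(xs, ys):
    """Pearson correlation coefficient, stdlib only."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_variants=20000, n_per_group=2000, seed=1):
    """Return (all simulated disease variants, the subset a study 'discovers')."""
    rng = random.Random(seed)
    truth, hits = [], []
    for _ in range(n_variants):
        p = rng.uniform(0.01, 0.5)    # risk-allele frequency (risk allele is minor here)
        grr = rng.uniform(1.05, 2.0)  # per-allele GRR, drawn independently of p
        truth.append((p, grr))
        # Discovery happens with probability equal to the approximate power.
        if rng.random() < allelic_power(p, grr, n_per_group):
            hits.append((p, grr))
    return truth, hits


if __name__ == "__main__":
    truth, hits = simulate()
    for label, variants in (("all variants", truth), ("discovered", hits)):
        freqs = [p for p, _ in variants]
        grrs = [g for _, g in variants]
        print(f"{label}: n={len(variants)}, corr(freq, GRR)={pearson(freqs, grrs):+.2f}")
```

In this toy setup frequency and GRR are uncorrelated among all simulated variants but negatively correlated among the discovered ones, which is the pattern the power explanation predicts. It only draws risk alleles that are the minor allele, so it does not speak to the case where the increased-risk allele is the major allele, and it says nothing about how much selection might also contribute.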
For each scenario, an association test is performed and the frequencies of the significant markers are examined. If there is a difference, the scenario whose distribution looks most like the observed distribution from existing studies would be the most likely explanation for the real genetic architecture underlying common diseases.

As it turns out, the frequencies of the detected markers do not differ between the three setups unless either the genetic effect is strong (genetic relative risk, GRR, of 2 or more) or the sample size is large (n = 3000). While several studies have large enough sample sizes, the effect sizes we have seen so far are rather small (GRR of roughly 1.1 to 1.5), so we are in the range where we might be able to see a difference, but not where we are guaranteed to see it.

In other words: with the sample sizes used so far, we do not really have the power to detect the rare variants that would tell us whether the common disease/common variant hypothesis is true. Whether it is true, or low-frequency alleles instead contribute the major part of the genetic component of a disease, we would see the same distribution of frequencies as we have observed so far.

I will conclude with the final paragraph from the paper:

For now, it is unlikely that much can be inferred about the CDCV hypothesis from the results of GWA studies. The successes in finding common variants associated with common diseases are encouraging, but, as our findings show, we cannot yet be sure whether the common disease-associated variants found so far represent the tip of the iceberg or the bottom of the barrel.

which is essentially where the post started…


Iles, M.M. (2008). What Can Genome-Wide Association Studies Tell Us about the Genetics of Common Disease? PLoS Genetics, 4(2), e33. DOI: 10.1371/journal.pgen.0040033