What can we learn from genome-wide association studies conducted so far?
Genome-wide association studies rely on the “common disease / common variant” hypothesis: that the major genetic effect of a common disease (major effect at the population level, not necessarily for the individual) is caused by a few genetic variants that are common in the population. It is of course a parsimonious explanation, explaining the high disease frequency with a few common variants rather than many rare variants, but there are also population genetics arguments in favour of this hypothesis. Is it true, then? Can we conclude that from the association mapping studies published over the last year and a half? If we had found absolutely nothing we would probably reject it, but we have found something, just not enough to explain all of the genetic contribution to the diseases we have studied, so we haven’t really answered the question yet.
This paper makes an attempt at answering the question:
What Can Genome-Wide Association Studies Tell Us about the Genetics of Common Disease
Mark M. Iles. PLoS Genet 4(2): e33. doi:10.1371/journal.pgen.0040033
Abstract
The success of genome-wide association studies relies on much of the risk of common diseases being due to common genetic variants; but evidence for this is inconclusive. The results of published genome-wide association studies are examined to see what can be learnt about the distribution of disease-associated variants and how this might influence future study design. Although replicated disease-associated variants tend to be very common and frequency is inversely correlated with estimated effect size, our simulations suggest that such observations are the result of power. We find that for studies conducted to date, the frequency and effect size of significantly associated alleles are likely to be similar to those of the underlying disease alleles that they represent. Little of the genetic variation of disease has been explained so far, but current studies are only adequately powered to detect very common alleles unless they greatly increase disease risk. Thus, although the truth of the common disease / common variant hypothesis remains undecided, recent successes suggest that there are many more common genetic disease-associated variants, requiring larger studies to be identified.
First, the author notes that, in the published studies, there is a negative correlation between the strength of the genetic effect and the allele frequency of the increased-risk variants. One could argue that selection is the cause of this: if there is selection against the disease, then the disease variant will be kept at a low frequency. But there is also a negative correlation between the minor allele frequency of the disease marker and the genetic effect even when the increased-risk allele is the major allele. This instead suggests that the correlation is a matter of statistical power: we have only observed low-frequency disease markers when the genetic effect is large.
I’m not completely comfortable with this argument myself. At the very least, it should be argued that the effect is not simply a consequence of the at-risk allele being the minor allele more often than expected by chance. In any case, the argument is not essential for the rest of the paper, where the power question is addressed.
The big question is: would we see different distributions of discovered disease allele frequencies if the main genetic component is a few common variants (common disease/common variant) than if it is caused by several low-frequency variants?
The question is addressed through a simulation study, where data is simulated with i) mainly low-frequency disease variants, ii) some low-frequency and some high-frequency disease variants, and iii) mainly high-frequency variants. An association test is performed, and the frequencies of the significant markers are examined. If there is a difference, the distribution that looks the most like the observed distribution from existing studies would point to the most likely genetic architecture underlying common diseases.
As it turns out, the frequencies of the detected markers are not different under the three setups unless either the genetic effect is strong (genetic relative risk, GRR >= 2) or the sample size is large (n=3000). While we have several studies with large enough sample sizes, the effect sizes we have seen so far are rather small (GRR from 1.1 to 1.5 or so), so we are in the range where we might be able to see a difference, but not where we are guaranteed to see it.
In other words: with the sample sizes we have used so far, we do not really have the power to detect the rare variants that would tell us whether the common disease/common variant hypothesis is true or not. Regardless of whether it is true, or whether low-frequency alleles contribute the major part of the genetic component of a disease, we would see the same distribution of frequencies as we have observed so far.
I will conclude with the final paragraph from the paper:
For now, it is unlikely that much can be inferred about the CDCV hypothesis from the results of GWA studies. The successes in finding common variants associated with common diseases are encouraging, but, as our findings show, we cannot yet be sure whether the common disease-associated variants found so far represent the tip of the iceberg or the bottom of the barrel.
which is essentially where the post started…
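To get a feel for the power argument, here is a minimal sketch of how the power of a simple case-control allelic test depends on allele frequency and GRR. This is not the simulation from the paper. It assumes a multiplicative risk model, controls whose allele frequency equals the population frequency, a two-proportion z-test on allele counts with a normal approximation, and a significance threshold of 5e-7 that I picked for illustration. The function names are my own.

```python
from statistics import NormalDist


def case_allele_freq(p, grr):
    """Risk-allele frequency among cases under a multiplicative model (assumption)."""
    return p * grr / (p * grr + (1.0 - p))


def allelic_test_power(p, grr, n_cases, n_controls, alpha=5e-7):
    """Approximate power of a two-sided two-proportion z-test on allele counts.

    p          : risk-allele frequency in controls (assumed equal to the population)
    grr        : per-allele genetic relative risk
    n_cases    : number of case individuals (2 alleles each)
    n_controls : number of control individuals (2 alleles each)
    alpha      : significance threshold (illustrative value, not from the paper)
    """
    q = case_allele_freq(p, grr)
    # Standard error of the difference in allele frequencies.
    se = (q * (1.0 - q) / (2 * n_cases) + p * (1.0 - p) / (2 * n_controls)) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Normal approximation; the negligible opposite tail is ignored.
    return NormalDist().cdf(abs(q - p) / se - z_crit)


if __name__ == "__main__":
    n = 3000  # cases and controls each
    for grr in (1.2, 1.5, 2.0):
        for p in (0.01, 0.05, 0.2, 0.5):
            power = allelic_test_power(p, grr, n, n)
            print(f"GRR={grr:.1f}  freq={p:.2f}  power={power:.3f}")
```

Under these assumptions, a rare allele with a modest GRR has very little power even with thousands of cases and controls, while a common allele with the same GRR is detected most of the time. This is the same pattern the paper attributes to power rather than to the underlying genetic architecture.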
Iles, M.M. (2008). What Can Genome-Wide Association Studies Tell Us about the Genetics of Common Disease. PLoS Genetics, 4(2), e33. DOI: 10.1371/journal.pgen.0040033
April 7th, 2008 at 10:25 am
Could there also be an effect of the effect size on the genetic variance, so that only rare alleles with large effects are going to be seen having an appreciable effect on the genetic variance? IOW selection bias would occur before the association mapping was even commenced. After all, who would bother with association mapping a trait where no genetic variation has been detected?
April 7th, 2008 at 11:51 am
Really rare variants are not going to be represented in a sample at all. If you sample a few thousand individuals, you will not see variants that are one in a million. But that is not the big problem; the big problem is tagging. We only tag markers with a reasonably large minor allele frequency (otherwise we would need a lot more tagSNPs). A low-frequency marker is therefore most likely poorly tagged, and that means that its effect has to be very large to be detected. The whole tagging approach strongly biases the outcome towards common variants. That is why we in general say that the common disease/common variant hypothesis is an underlying assumption of genome-wide association mapping. We just don’t know if the hypothesis is actually true.
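As a rough back-of-the-envelope check of the sampling point, the sketch below computes the chance that at least one copy of a variant shows up in a sample. It assumes that "one in a million" means an allele frequency of 1e-6, that individuals are diploid and unrelated, and that chromosomes are sampled independently. The function name is made up for illustration.

```python
import math


def prob_variant_sampled(allele_freq, n_individuals):
    """Probability that at least one copy of the allele is in the sample.

    Assumes 2 * n_individuals independently sampled chromosomes.
    """
    n_chromosomes = 2 * n_individuals
    # 1 - (1 - f)^n, written with log1p/expm1 to stay accurate for tiny f.
    return -math.expm1(n_chromosomes * math.log1p(-allele_freq))


if __name__ == "__main__":
    for freq in (1e-6, 1e-4, 1e-2):
        print(f"allele freq {freq:g}: P(seen in 3000 people) = "
              f"{prob_variant_sampled(freq, 3000):.4f}")
```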
April 7th, 2008 at 11:53 am
Ah wait, I think I misunderstood the question… you are saying that we might not suspect any genetic component to a disease in the first place if it is only caused by rare variants? I guess so, but in that case the effect is either small — in which case we wouldn’t care much (it might be important for the individual, but not at a population level) — or if it is strong enough to make a difference, a family study would be the way to go… I think.
April 8th, 2008 at 4:57 am
Except you wouldn’t even do the family study, because you wouldn’t see the genetic effect from population studies in the first place.
I haven’t fully thought this through, it might need a good model of how genetic effects are identified (i.e. right from the start) to see the size of these biases.
April 8th, 2008 at 6:06 am
If the effect is rare, but large, you would certainly notice that it clustered in families, so you would suspect a genetic component in it.
June 23rd, 2008 at 2:05 pm
As far as I know, from what I have read and what we observe in our own data, the proportion of the variance explained so far by replicated common SNPs is “massively” lower than the heritability predicted by familial studies. Of course we could expect some other variants to pop up in larger GWAs (or GWAs with additional tags). But take the example of obesity: the most strongly associated finding, FTO, seems to explain around 1% of the variance of BMI, while the lowest heritability estimates are at 30% (and usually higher). I am not sure we’re going to find 30 FTO-like variants (maybe 2 or 3 with improved tagging + pooling of all the GWAs).
Of course, we could observe an overwhelming proportion of Common Variant (very low effect) Common Disease variants, like 300 each explaining 0.1% of the variance. But then we are approaching a polygenic component hypothesis.
Exciting times anyway, and maybe interesting that CVCD does not explain all; otherwise all we’d need would be to collect patients and ask for grants to genotype everybody and then watch the chi-square list :-)
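The arithmetic behind the "30 FTO-like variants" and "300 variants" figures can be written out as a tiny sketch. It assumes that every variant explains the same share of the variance and that the shares simply add up, which is a simplification. The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil


def variants_needed(heritability, variance_per_variant):
    """Number of equal-effect, additive variants needed to reach the heritability.

    Both arguments are proportions given as decimal strings, e.g. "0.30".
    Fractions avoid floating-point rounding in the division.
    """
    return ceil(Fraction(heritability) / Fraction(variance_per_variant))


if __name__ == "__main__":
    print(variants_needed("0.30", "0.01"))   # FTO-like variants, 1% each
    print(variants_needed("0.30", "0.001"))  # very small effects, 0.1% each
```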
September 17th, 2008 at 7:44 am
[...] would say the jury is still out on this one, but it is clear that the CD/CV isn’t as common as it was hyped to be. We can only explain [...]