Posts Tagged ‘statistical power’

Exploiting Hardy-Weinberg Equilibrium for association mapping

Thursday, April 24th, 2008

Testing single SNP markers for disease association is typically done by comparing the genotype frequencies of cases with those of controls, to see if they differ. The genotype frequencies, of course, must be estimated based on the sampled individuals, and there is some uncertainty in this estimate that might reduce the power. If the genotypes are in Hardy-Weinberg equilibrium (HWE), however, there's a constraint on them that makes the estimate more accurate, so exploiting this in association mapping could increase the statistical power. This is the idea presented in this paper:

Exploiting Hardy-Weinberg Equilibrium for Efficient Screening of Single SNP Associations from Case-Control Studies

Chen and Chatterjee

Human Heredity 63: 196-204, 2007

Abstract

In case-control studies, the assessment of the association between a binary disease outcome and a single nucleotide polymorphism (SNP) is often based on comparing the observed genotype distribution for the cases against that for the controls. In this article, we investigate an alternative analytic strategy in which the observed genotype frequencies of cases are compared against the expected genotype frequencies of controls assuming Hardy-Weinberg Equilibrium (HWE). Assuming HWE for controls, we derive closed-form expressions for maximum likelihood estimates of the genotype-specific disease odds ratio (OR) parameters and related variance-covariances. Based on these estimates and their variance-covariance structure, we then propose a two-degree-of-freedom test for disease-SNP association. We show that the proposed test can have substantially higher power than a variety of existing methods, especially when the true effect of the SNP is recessive. We also obtain analytic expressions for the bias of the OR estimates when the underlying HWE assumption is violated. We conclude that the novel test would be particularly useful for analyzing data from the initial 'screening' stages of contemporary multi-stage association studies.

It is actually something we have been playing with ourselves in my group, although for epistasis where the genotype frequencies are much harder to estimate because of very few observations of the rare genotypes. It was suggested to us by Patrick Sulem from DeCODE, but this paper is the first I've seen that describes the underlying statistics of it.

Hardy-Weinberg Equilibrium and exploiting it in association mapping

Hardy-Weinberg Equilibrium, or HWE, is a result from population genetics that says that in a randomly mating population, the proportions of the genotypes AA, Aa and aa are given by p^2, 2pq and q^2, where p is the allele frequency of A and q = 1 - p is the allele frequency of a. This equilibrium can, of course, be off in various ways, but in general these are the proportions in which we expect to observe the three genotypes. Now if the genotypes are in HWE, we need only estimate the allele frequency (one parameter) rather than the genotype frequencies (two parameters). As a rule of thumb, the fewer parameters we need to estimate before we perform our test, the better off we are. (This is of course something that must be checked from case to case, but in this case it is true...).
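To make the one-parameter point concrete, here is a minimal Python sketch that estimates p from genotype counts and then derives the three expected genotype proportions from it. The function names and the counts are made up for illustration; nothing here is taken from the paper.

```python
def allele_frequency(n_AA, n_Aa, n_aa):
    """Estimate the frequency p of allele A from genotype counts."""
    n = n_AA + n_Aa + n_aa
    # Each AA individual carries two copies of A, each Aa individual one.
    return (2 * n_AA + n_Aa) / (2 * n)


def hwe_proportions(p):
    """Expected proportions of AA, Aa and aa under HWE."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


# Hypothetical genotype counts (AA, Aa, aa), chosen only for illustration.
counts = (360, 480, 160)
p = allele_frequency(*counts)
expected = hwe_proportions(p)
print(p, expected)
```

The single number p determines all three genotype proportions, which is exactly the saving in parameters described above.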

Now, if we assume that the population as such is in HWE, and that the genetic effect on the disease is not too severe, then we would expect the controls to be in HWE. So rather than estimating genotype frequencies for the controls, we can instead estimate allele frequencies and get the genotype frequencies from the allele frequencies and the HWE assumption. We can then use these expected genotype frequencies in the association test. For cases we probably cannot assume HWE; at least, it is hard to see how cases can be in HWE if the locus has an effect on disease status...
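A small simulation gives a feeling for why this helps. The sketch below is my own illustration, not the method from the paper: it draws control samples from a population that really is in HWE and compares two estimates of the rare homozygote frequency, the direct one (n_aa / n) and the HWE-based one (the squared estimate of q). The sample size, allele frequency, number of replicates and seed are arbitrary assumptions.

```python
import random


def simulate_controls(n, q, rng):
    """Draw n genotypes under HWE; return counts as [AA, Aa, aa]."""
    counts = [0, 0, 0]
    for _ in range(n):
        # Two independent alleles per individual; count the copies of a.
        copies_a = (rng.random() < q) + (rng.random() < q)
        counts[copies_a] += 1
    return counts


def mean_squared_errors(n=200, q=0.1, reps=1000, seed=1):
    """Mean squared error of the direct and HWE-based estimates of P(aa)."""
    rng = random.Random(seed)
    truth = q * q
    se_direct = se_hwe = 0.0
    for _ in range(reps):
        n_AA, n_Aa, n_aa = simulate_controls(n, q, rng)
        direct = n_aa / n                    # genotype frequency estimated directly
        q_hat = (2 * n_aa + n_Aa) / (2 * n)  # allele frequency estimate
        hwe = q_hat ** 2                     # genotype frequency via HWE
        se_direct += (direct - truth) ** 2
        se_hwe += (hwe - truth) ** 2
    return se_direct / reps, se_hwe / reps


mse_direct, mse_hwe = mean_squared_errors()
print(mse_direct, mse_hwe)
```

When the HWE assumption holds, the HWE-based estimate should have the clearly smaller error for the rare genotype, since it uses all 2n alleles rather than the handful of aa individuals.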

Anyway, the paper shows that using the genotype frequencies expected under the HWE assumption improves the power of the test, quite dramatically for recessive disease effects and less so for dominant and multiplicative effects.

The HWE assumption might be violated, so to trust the test we must know how robust it is to violations of this assumption. The paper shows that deviations from HWE certainly do affect the test, and that they do so by increasing the number of false positives. The authors therefore suggest that the test can be used to screen GWA data in an initial stage, but that it probably shouldn't be used in later stages.

Personally, I am a bit curious about how you could go about detecting the degree of Hardy-Weinberg disequilibrium and perhaps compensate for it in the test. Of course, that would give you another parameter to estimate, so you might end up losing the power gained by assuming HWE, so it might not be the way to go...
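One standard way to quantify the deviation (a textbook approach, not something proposed in the paper) is the disequilibrium coefficient D = P(AA) - p^2, together with the usual one-degree-of-freedom chi-square goodness-of-fit test. A minimal sketch, with hypothetical function names and made-up counts, assuming both alleles are present in the sample:

```python
from math import erfc, sqrt


def hwe_disequilibrium(n_AA, n_Aa, n_aa):
    """Return (D, chi-square statistic, p-value) for a test of HWE.

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    # Disequilibrium coefficient: excess of AA homozygotes over p^2.
    d = n_AA / n - p * p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return d, chi2, p_value


# Made-up counts with an excess of homozygotes.
d, chi2, p_value = hwe_disequilibrium(400, 400, 200)
print(d, chi2, p_value)
```

With D in hand, the genotype proportions can be written as p^2 + D, 2pq - 2D and q^2 + D, which is the extra parameter I was worrying about above.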


Chen, J., Chatterjee, N. (2007). Exploiting Hardy-Weinberg Equilibrium for Efficient Screening of Single SNP Associations from Case-Control Studies. Human Heredity, 63, 196-204.

What can we learn from genome-wide association studies conducted so far?

Monday, April 7th, 2008

Genome-wide association studies rely on the "common disease / common variant" hypothesis: that the major genetic effect of a common disease (major effect at the population level, not necessarily for the individual) is caused by a few genetic variants that are common in the population. It is of course a parsimonious explanation, explaining the high disease frequency with a few common variants rather than many rare variants, but there are also population genetics arguments in favour of this hypothesis. Is it true, then? Can we conclude that from the association mapping studies published over the last year and a half? If we had found absolutely nothing we would probably reject it, but we have found something, just not enough to explain all the genetic contribution to the diseases we have studied, so we haven't really answered the question yet. This paper makes an attempt at answering the question:

What Can Genome-Wide Association Studies Tell Us about the Genetics of Common Disease

Mark M. Iles

PLoS Genet 4(2): e33. doi:10.1371/journal.pgen.0040033

Abstract

The success of genome-wide association studies relies on much of the risk of common diseases being due to common genetic variants; but evidence for this is inconclusive. The results of published genome-wide association studies are examined to see what can be learnt about the distribution of disease-associated variants and how this might influence future study design. Although replicated disease-associated variants tend to be very common and frequency is inversely correlated with estimated effect size, our simulations suggest that such observations are the result of power. We find that for studies conducted to date, the frequency and effect size of significantly associated alleles are likely to be similar to those of the underlying disease alleles that they represent. Little of the genetic variation of disease has been explained so far, but current studies are only adequately powered to detect very common alleles unless they greatly increase disease risk. Thus, although the truth of the common disease / common variant hypothesis remains undecided, recent successes suggest that there are many more common genetic disease-associated variants, requiring larger studies to be identified.

First, the author notices that there is a negative correlation between the strength of the genetic effect and the allele frequency of the increased-risk variants in the published studies. One could argue that selection is the cause of this: if there is selection against the disease, then the disease variant will be kept at low frequencies. But there is also a negative correlation between the minor allele frequency of the disease marker and the genetic effect even when the increased-risk allele is the major allele, which could instead suggest that the correlation is caused by the statistical power to detect low-frequency disease markers: only for high genetic effects have we observed any.

I'm not completely comfortable with this argument myself. At the very least, it should be argued that the effect is not simply a consequence of the at-risk allele being the minor allele more often than expected by chance. But anyway, the argument is not essential for the rest of the paper, where the power question is addressed.

The big question is: would we see different distributions of discovered disease allele frequencies if the main genetic component is a few common variants (common disease/common variant) or if the genetic component was caused by several low-frequency variants?

The question is addressed through a simulation study, where data is simulated with i) mainly low-frequency disease variants, ii) some low-frequency and some high-frequency disease variants, and iii) mainly high-frequency variants. An association test is performed, and the frequencies of the significant markers are examined. If there is a difference, the distribution that looks the most like the observed distribution from existing studies would be the most likely explanation for the real genetic architecture underlying common diseases.

As it turns out, the frequencies of the detected markers are not different under the three setups unless either the genetic effect is strong (genetic relative risk GRR >= 2) or the sample size is large (n=3000). While we have several studies with high enough sample sizes, the effect sizes we have seen so far are rather small (GRR from 1.1 to 1.5 or so), so we are in the range where we might be able to see a difference, but not where we are guaranteed to see it.

In other words: with the sample sizes we have used so far, we do not really have the power to detect the rare variants that would tell us if the common disease/common variant hypothesis is true or not. Regardless of whether it is true, or whether more low-frequency alleles contribute the major part of the genetic component of a disease, we would see the same distribution of frequencies as we have observed so far.

I will conclude with the final paragraph from the paper:

For now, it is unlikely that much can be inferred about the CDCV hypothesis from the results of GWA studies. The successes in finding common variants associated with common diseases are encouraging, but, as our findings show, we cannot yet be sure whether the common disease-associated variants found so far represent the tip of the iceberg or the bottom of the barrel.

which is essentially where the post started...
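To get a rough feeling for the power issue discussed in this post, here is a back-of-the-envelope sketch of my own (it is not the simulation from the paper). It uses a normal approximation to the power of a simple test comparing allele frequencies in cases and controls. The assumptions are all mine: a multiplicative risk model, p taken as the risk-allele frequency in controls, 2000 cases and 2000 controls, and a significance threshold of 5e-7.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_frequency(p, grr):
    """Risk-allele frequency among cases under a multiplicative model."""
    return p * grr / (p * grr + (1 - p))


def allelic_test_power(p, grr, n_cases, n_controls, alpha=5e-7):
    """Approximate power of a two-sided test comparing allele frequencies."""
    p_case = case_allele_frequency(p, grr)
    m_case, m_ctrl = 2 * n_cases, 2 * n_controls  # alleles, not individuals
    p_bar = (m_case * p_case + m_ctrl * p) / (m_case + m_ctrl)
    # Standard errors of the frequency difference under the null and the alternative.
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / m_case + 1 / m_ctrl))
    se_alt = sqrt(p_case * (1 - p_case) / m_case + p * (1 - p) / m_ctrl)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf((abs(p_case - p) - z_crit * se_null) / se_alt)


# Illustrative grid of allele frequencies and effect sizes.
for p in (0.02, 0.1, 0.3):
    for grr in (1.2, 1.5, 2.0):
        print(p, grr, round(allelic_test_power(p, grr, 2000, 2000), 3))
```

Under these assumptions the power drops off sharply for low-frequency alleles unless the GRR is large, which is the pattern the post keeps coming back to.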


Iles, M.M. (2008). What Can Genome-Wide Association Studies Tell Us about the Genetics of Common Disease. PLoS Genetics, 4(2), e33. DOI: 10.1371/journal.pgen.0040033