Exploiting Hardy-Weinberg Equilibrium for association mapping
Testing single SNP markers for disease association is typically done by comparing the genotype frequencies of cases with those of controls, to see if they differ. The genotype frequencies, of course, must be estimated based on the sampled individuals, and there is some uncertainty in this estimate that might reduce the power. If the genotypes are in Hardy-Weinberg equilibrium (HWE), however, there's a constraint on them that makes the estimate more accurate, so exploiting this in association mapping could increase the statistical power. This is the idea presented in this paper:
Chen and Chatterjee
Human Heredity 63: 196-204, 2007
Abstract
In case-control studies, the assessment of the association between a binary disease outcome and a single nucleotide polymorphism (SNP) is often based on comparing the observed genotype distribution for the cases against that for the controls. In this article, we investigate an alternative analytic strategy in which the observed genotype frequencies of cases are compared against the expected genotype frequencies of controls assuming Hardy-Weinberg Equilibrium (HWE). Assuming HWE for controls, we derive closed-form expressions for maximum likelihood estimates of the genotype-specific disease odds ratio (OR) parameters and related variance-covariances. Based on these estimates and their variance-covariance structure, we then propose a two-degree-of-freedom test for disease-SNP association. We show that the proposed test can have substantially higher power than a variety of existing methods, especially when the true effect of the SNP is recessive. We also obtain analytic expressions for the bias of the OR estimates when the underlying HWE assumption is violated. We conclude that the novel test would be particularly useful for analyzing data from the initial 'screening' stages of contemporary multi-stage association studies.
It is actually something we have been playing with ourselves in my group, although for epistasis, where the genotype frequencies are much harder to estimate because the rare genotypes are observed only a few times. It was suggested to us by Patrick Sulem from deCODE, but this paper is the first I've seen that describes the underlying statistics.
Hardy-Weinberg Equilibrium and exploiting it in association mapping
Hardy-Weinberg Equilibrium, or HWE, is a result from population genetics stating that in a randomly mating population the proportions of the genotypes AA, Aa and aa are p², 2pq and q², where p is the frequency of allele A and q = 1 - p is the frequency of allele a. Real populations can deviate from this equilibrium in various ways, but in general these are the proportions we expect to observe for the three genotypes. Now, if the genotypes are in HWE, we only need to estimate the allele frequency (one parameter) rather than the genotype frequencies (two parameters, since the three proportions sum to one). As a rule of thumb, the fewer parameters we need to estimate before we perform our test, the better off we are. (This must of course be checked from case to case, but in this case it holds...)
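To make the parameter count concrete, here is a minimal Python sketch (the function names are hypothetical, chosen for illustration) that estimates the allele frequency p by counting alleles and turns it into HWE-expected genotype proportions, next to the unconstrained observed proportions. The genotype counts in the usage part are made up.

```python
def allele_freq(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Estimate the frequency p of allele A by allele counting (2 alleles per person)."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)


def hwe_genotype_freqs(p: float) -> tuple[float, float, float]:
    """Expected (AA, Aa, aa) proportions under HWE: one free parameter, p."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def observed_genotype_freqs(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Unconstrained estimate: two free parameters (the third is 1 minus the others)."""
    n = n_AA + n_Aa + n_aa
    return n_AA / n, n_Aa / n, n_aa / n


if __name__ == "__main__":
    counts = (30, 240, 730)  # hypothetical control counts (AA, Aa, aa)
    p = allele_freq(*counts)
    print("p =", p)
    print("observed:    ", observed_genotype_freqs(*counts))
    print("HWE-expected:", hwe_genotype_freqs(p))
```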
Now, if we assume that the population as a whole is in HWE, and that the locus's effect on disease risk is not too strong, then we would expect the controls to be in HWE as well. So rather than estimating genotype frequencies for the controls directly, we can estimate the allele frequency and derive the genotype frequencies from it under the HWE assumption. We can then use these expected genotype frequencies in the association test. For cases we probably cannot assume HWE; at least it is hard to see how cases could be in HWE if the locus has an effect on disease status...
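A minimal sketch of how the HWE-based control side could be plugged into a two-degree-of-freedom genotypic test, assuming a logistic model with one odds ratio for Aa vs aa and one for AA vs aa, and leaving the case genotype distribution unconstrained. Under those assumptions the HWE-expected ratios f_Aa/f_aa = 2p/(1-p) and f_AA/f_aa = (p/(1-p))² replace the observed control ratios, and the delta method gives the covariance, with Var(logit p̂) ≈ 1/(2m p(1-p)) for m controls. This is my own reconstruction for illustration, not code or formulas taken from the paper; the function names and counts are hypothetical, and zero counts would need a continuity correction that is omitted here.

```python
import math


def hwe_log_or(case_counts, control_counts):
    """Log odds ratios (Aa vs aa, AA vs aa) with HWE-expected control frequencies.

    Both arguments are genotype counts ordered (aa, Aa, AA).
    Returns (beta, cov): beta = [log OR_Aa, log OR_AA], cov is a 2x2 nested list.
    """
    r0, r1, r2 = case_counts
    s0, s1, s2 = control_counts
    m = s0 + s1 + s2
    p = (s1 + 2 * s2) / (2 * m)  # frequency of allele A among control alleles
    logit_p = math.log(p / (1 - p))
    # Under HWE, f_Aa / f_aa = 2p / (1 - p) and f_AA / f_aa = (p / (1 - p))**2.
    b1 = math.log(r1 / r0) - math.log(2) - logit_p
    b2 = math.log(r2 / r0) - 2 * logit_p
    # Delta method: multinomial log-ratio terms for the cases, plus a control
    # term that depends only on Var(logit p-hat) ~ 1 / (2 m p (1 - p)).
    v = 1 / (2 * m * p * (1 - p))
    v11 = 1 / r1 + 1 / r0 + v
    v22 = 1 / r2 + 1 / r0 + 4 * v
    v12 = 1 / r0 + 2 * v
    return [b1, b2], [[v11, v12], [v12, v22]]


def wald_2df(beta, cov):
    """Wald statistic beta' cov^-1 beta and its chi-square(2) upper-tail p-value."""
    (a, c), (_, d) = cov
    b1, b2 = beta
    det = a * d - c * c
    w = (d * b1 * b1 - 2 * c * b1 * b2 + a * b2 * b2) / det
    return w, math.exp(-w / 2)  # exact survival function for 2 degrees of freedom


if __name__ == "__main__":
    controls = (810, 180, 10)  # hypothetical counts (aa, Aa, AA)
    cases = (780, 180, 40)     # hypothetical: excess of AA among cases
    beta, cov = hwe_log_or(cases, controls)
    w, pval = wald_2df(beta, cov)
    print("log ORs:", beta, "W:", w, "p:", pval)
```

For two degrees of freedom the chi-square upper tail is exactly exp(-W/2), which is why no statistics library is needed in this sketch.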
Anyway, in this paper they show that using the control genotype frequencies expected under the HWE assumption improves the power of the test: quite dramatically for recessive disease effects, and less so for dominant and multiplicative effects.
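One way to build intuition for where the gain comes from is to compare the control-side contribution to the variance of each log odds ratio, using delta-method approximations (my own assumption, not the paper's derivation) for m controls and allele frequency p, q = 1 - p. With observed control counts, the AA-vs-aa contrast pays about 1/(m p²) + 1/(m q²); with HWE-expected frequencies it pays 2/(m p q). The difference is (1/p - 1/q)²/m, which is never negative and grows quickly as the risk allele A gets rarer, and the AA-vs-aa contrast is the one a recessive effect depends on. The function name is hypothetical.

```python
def control_side_variances(p, m):
    """Approximate control-side contribution to Var(log OR) for m controls.

    Returns {"Aa": (standard, hwe), "AA": (standard, hwe)}, where "standard" uses
    observed control genotype counts (expected m * f_g) and "hwe" uses only the
    allele frequency, via Var(logit p-hat) ~ 1 / (2 m p q).
    """
    q = 1.0 - p
    v_logit = 1.0 / (2 * m * p * q)
    ref = 1.0 / (m * q * q)  # 1 / E[number of aa controls], the reference group
    std_Aa = 1.0 / (m * 2 * p * q) + ref
    std_AA = 1.0 / (m * p * p) + ref
    return {"Aa": (std_Aa, v_logit), "AA": (std_AA, 4 * v_logit)}


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2, 0.3, 0.5):
        v = control_side_variances(p, m=1000)
        ratio_Aa = v["Aa"][0] / v["Aa"][1]
        ratio_AA = v["AA"][0] / v["AA"][1]
        print(f"p={p:.2f}  standard/HWE variance: Aa {ratio_Aa:.2f}, AA {ratio_AA:.2f}")
```

The ratios do not depend on m. At p = 0.5 the AA contrast gains nothing, so this variance comparison is only part of the story behind the power results.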
The HWE assumption might be violated, so to trust the test we must know how robust it is to violations of this assumption. The paper shows that deviations from HWE certainly do affect the test, and that they do so by increasing the number of false positives. The authors therefore suggest that the test can be used to screen GWA data in an initial stage, but that it probably shouldn't be used in later stages.
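To illustrate why a deviation from HWE shows up as false positives, the sketch below models disequilibrium with an inbreeding coefficient F (an assumed parametrization; the paper's own bias expressions may be written differently) and computes the large-sample limit of the HWE-based log odds ratios, which compare case genotype ratios against the HWE-expected control ratios 2p/(1-p) and (p/(1-p))², when the SNP has no effect at all. The allele frequency is still p under this model, so the HWE reconstruction of the control frequencies is off by exactly the excess or deficit of homozygotes. Function names are hypothetical.

```python
import math


def hwd_genotype_freqs(p, F):
    """(aa, Aa, AA) frequencies with inbreeding coefficient F; F = 0 gives HWE."""
    q = 1.0 - p
    return q * q + F * p * q, 2 * p * q * (1 - F), p * p + F * p * q


def null_limit_hwe_log_or(p, F):
    """Large-sample limit of the HWE-based log ORs when the SNP has NO effect.

    Cases and controls share the same genotype distribution (possibly out of
    HWE), but the control side is rebuilt from p as if HWE held.
    """
    f0, f1, f2 = hwd_genotype_freqs(p, F)
    logit_p = math.log(p / (1 - p))  # allele frequency is still p under this model
    b1 = math.log(f1 / f0) - math.log(2) - logit_p
    b2 = math.log(f2 / f0) - 2 * logit_p
    return b1, b2


if __name__ == "__main__":
    for F in (-0.05, 0.0, 0.02, 0.05):
        b1, b2 = null_limit_hwe_log_or(p=0.2, F=F)
        print(f"F={F:+.2f}  limiting log OR: Aa {b1:+.3f}, AA {b2:+.3f}")
```

With F > 0 (excess homozygotes) the AA-vs-aa estimate converges to a positive value even though there is no association, so with enough data the test rejects more often than its nominal level.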
Personally, I am a bit curious about how you could go about detecting the degree of Hardy-Weinberg disequilibrium and perhaps compensating for it in the test. Of course, that would add another parameter to estimate, so you might end up losing the power gained by assuming HWE; it might not be the way to go...
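As a first step towards measuring the degree of disequilibrium, one could estimate F from the controls as 1 minus the ratio of observed to expected heterozygosity, and run the usual 1-df Pearson chi-square goodness-of-fit test for HWE. A minimal Python sketch with a hypothetical function name and made-up counts; for two alleles the chi-square statistic works out to exactly n·F̂². How F̂ would be fed back into the association test is left open, as in the text.

```python
import math


def hwe_deviation(n_aa, n_Aa, n_AA):
    """Return (F_hat, chi2, p_value) for one SNP's genotype counts.

    F_hat = 1 - observed / expected heterozygosity (0 under HWE, > 0 for excess
    homozygotes); chi2 is the usual 1-df Pearson goodness-of-fit statistic.
    """
    n = n_aa + n_Aa + n_AA
    p = (n_Aa + 2 * n_AA) / (2 * n)
    q = 1.0 - p
    expected = (n * q * q, n * 2 * p * q, n * p * p)
    observed = (n_aa, n_Aa, n_AA)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f_hat = 1.0 - (n_Aa / n) / (2 * p * q)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return f_hat, chi2, p_value


if __name__ == "__main__":
    f_hat, chi2, p_value = hwe_deviation(700, 250, 50)  # hypothetical control counts
    print(f"F_hat={f_hat:.4f}  chi2={chi2:.3f}  p={p_value:.2e}")
```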
Chen, J., Chatterjee, N. (2007). Exploiting Hardy-Weinberg Equilibrium for Efficient Screening of Single SNP Associations from Case-Control Studies. Human Heredity, 63, 196-204.
April 24th, 2008 at 9:24 am
The power of tests for HWE is notoriously low, so in practice the gain in power is being traded off against model misspecification. I guess I should read the paper to see how bad this is.
April 24th, 2008 at 1:41 pm
This slightly older paper on HWE and disease models may also be of interest:
http://www.ajhg.org/retrieve/pii/S0002929707628948
April 26th, 2008 at 3:29 am
Bob: They are not actually testing for HWE; they just assume it. So yes, in a sense it is a misspecification of the model. In our own experiments with it, though, we do not see much of a problem. We tried analysing the Wellcome Trust data with permuted phenotypes (to get the null distribution), and we are not far from the expected null when we do that. Of course, it only works as a rough filter on the markers, so (again, as they suggest in the paper) any hits should be analysed in more detail.
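The permutation check described above can be sketched as follows: shuffle the case/control labels, recompute the statistic, and see how often it exceeds the chi-square(2) 5% threshold -2 ln 0.05 ≈ 5.99. This sketch uses simulated genotypes, not the Wellcome Trust data, and the statistic is a delta-method Wald statistic with HWE-expected control frequencies that I assume for illustration; neither is the actual pipeline behind the experiments mentioned. All function names are hypothetical.

```python
import math
import random


def hwe_wald_stat(genotypes, is_case):
    """2-df Wald statistic for log ORs (Aa vs aa, AA vs aa), control side under HWE.

    genotypes: copies of allele A per individual (0, 1 or 2); is_case: booleans.
    Returns NaN when a needed count is zero (no continuity correction here).
    """
    r, s = [0, 0, 0], [0, 0, 0]
    for g, case in zip(genotypes, is_case):
        (r if case else s)[g] += 1
    if min(r) == 0 or s[1] + s[2] == 0 or s[0] + s[1] == 0:
        return float("nan")
    m = sum(s)
    p = (s[1] + 2 * s[2]) / (2 * m)
    logit_p = math.log(p / (1 - p))
    b1 = math.log(r[1] / r[0]) - math.log(2) - logit_p
    b2 = math.log(r[2] / r[0]) - 2 * logit_p
    v = 1 / (2 * m * p * (1 - p))  # delta-method Var(logit p-hat)
    a, d, c = 1 / r[1] + 1 / r[0] + v, 1 / r[2] + 1 / r[0] + 4 * v, 1 / r[0] + 2 * v
    return (d * b1 * b1 - 2 * c * b1 * b2 + a * b2 * b2) / (a * d - c * c)


def permutation_null(genotypes, is_case, n_perm=500, seed=1):
    """Statistic recomputed after each random shuffle of the case/control labels."""
    rng = random.Random(seed)
    labels = list(is_case)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(hwe_wald_stat(genotypes, labels))
    return null


def simulate(n=2000, p=0.3, seed=0):
    """Hypothetical data: HWE genotypes, first half cases, no true association."""
    rng = random.Random(seed)
    genotypes = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
    is_case = [i < n // 2 for i in range(n)]
    return genotypes, is_case


if __name__ == "__main__":
    genotypes, is_case = simulate()
    observed = hwe_wald_stat(genotypes, is_case)
    null = [w for w in permutation_null(genotypes, is_case) if not math.isnan(w)]
    threshold = -2 * math.log(0.05)  # chi-square(2) 95% quantile, about 5.99
    frac = sum(w > threshold for w in null) / len(null)
    emp_p = (1 + sum(w >= observed for w in null)) / (len(null) + 1)
    print(f"fraction of permuted W above {threshold:.2f}: {frac:.3f} (near 0.05 if calibrated)")
    print(f"observed W={observed:.2f}, permutation p={emp_p:.3f}")
```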
April 26th, 2008 at 3:31 am
G: thanks for the reference, I'll have a look at the paper.