Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis
This paper is relevant to my previous post:
Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis
Bradley Efron
Abstract
Current scientific techniques in genomics and image processing routinely produce hypothesis testing problems with hundreds or thousands of cases to consider simultaneously. This poses new difficulties for the statistician, but also opens new opportunities. In particular it allows empirical estimation of an appropriate null hypothesis. The empirical null may be considerably more dispersed than the usual theoretical null distribution that would be used for any one case considered separately. An empirical Bayes analysis plan for this situation is developed, using a local version of the false discovery rate to examine the inference issues. Two genomics problems are used as examples to show the importance of correctly choosing the null hypothesis.
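For reference, the "local version of the false discovery rate" in the abstract is usually written in terms of a two-groups model. Each z-value comes either from a null density $f_0$ (with prior probability $\pi_0$) or from a non-null density $f_1$. The notation below is my own summary of the idea, not a quotation from the paper. Choosing the null then comes down to picking $f_0$: either the theoretical $N(0,1)$ or an empirical $N(\delta_0, \sigma_0^2)$ fitted to the data.

```latex
f(z) = \pi_0 f_0(z) + \pi_1 f_1(z), \qquad \pi_0 + \pi_1 = 1,
\qquad
\mathrm{fdr}(z) = \frac{\pi_0 f_0(z)}{f(z)},
\qquad
f_0 = \varphi_{0,1} \ \text{(theoretical)} \quad \text{vs.} \quad f_0 = \varphi_{\delta_0,\sigma_0} \ \text{(empirical)}
```

Here $\varphi_{\mu,\sigma}$ denotes the normal density with mean $\mu$ and standard deviation $\sigma$. A wider empirical null ($\sigma_0 > 1$) raises $f_0$ in the tails. That raises $\mathrm{fdr}(z)$ there, so fewer cases are declared interesting.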
The paper addresses exactly the case where the real data do not follow the theoretical null distribution. Instead they are a mixture of two distributions, and we want to find the observations that come from the “interesting” one of the two. The paper suggests estimating an empirical null distribution from the data and then using that distribution to find the significant observations.
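To make the idea concrete, here is a minimal sketch on simulated data. It fits an empirical null to the bulk of the z-values and compares local-fdr discoveries under that null and under the theoretical N(0, 1). The function names are hypothetical. The null is estimated with the median and interquartile range, a crude robust stand-in and not Efron's own central-matching or maximum-likelihood estimator. f(z) comes from a plain Gaussian kernel density estimate. π0 is set to 1, a conservative assumption.

```python
import numpy as np


def empirical_null(z):
    """Hypothetical helper: estimate centre and spread of the null from the bulk of z.

    Assumption: median and IQR are a crude robust stand-in, NOT Efron's
    central-matching or maximum-likelihood fit.
    """
    delta0 = np.median(z)
    q25, q75 = np.percentile(z, [25, 75])
    sigma0 = (q75 - q25) / 1.349  # IQR of a standard normal is about 1.349
    return delta0, sigma0


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def mixture_density(z):
    """Gaussian kernel density estimate of the marginal f(z) at each observed z."""
    n = len(z)
    h = 1.06 * np.std(z) * n ** (-0.2)  # Silverman's rule-of-thumb bandwidth
    # Average of normal kernels centred at every observation (n x n matrix).
    return normal_pdf(z[:, None], z[None, :], h).mean(axis=1)


def local_fdr(z, delta0, sigma0, pi0=1.0):
    """fdr(z) = pi0 * f0(z) / f(z), capped at 1; pi0 = 1 is a conservative assumption."""
    f0 = normal_pdf(z, delta0, sigma0)
    return np.minimum(1.0, pi0 * f0 / mixture_density(z))


# Made-up data: 1800 uninteresting cases whose null is wider than N(0, 1),
# plus 200 interesting cases shifted to the right.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.5, 1800), rng.normal(4.0, 1.0, 200)])
is_null = np.arange(z.size) < 1800

delta0, sigma0 = empirical_null(z)
fdr_theo = local_fdr(z, 0.0, 1.0)        # theoretical null N(0, 1)
fdr_emp = local_fdr(z, delta0, sigma0)   # empirical null N(delta0, sigma0^2)

# Null cases wrongly flagged as interesting at the fdr < 0.2 threshold.
false_theo = int(np.sum((fdr_theo < 0.2) & is_null))
false_emp = int(np.sum((fdr_emp < 0.2) & is_null))
print(f"empirical null: delta0={delta0:.2f}, sigma0={sigma0:.2f}")
print(f"null cases flagged: theoretical={false_theo}, empirical={false_emp}")
```

In this simulation the theoretical N(0, 1) is too narrow for the null cases. Many of them end up in its tails and get flagged. The wider empirical null flags far fewer null cases, at the price of requiring larger z-values before a case counts as interesting.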