Posts Tagged ‘statistics’

Uncertainty and randomness, part II

Monday, August 10th, 2009

Ok, so in my previous post I considered tossing a (biased) coin and trying to figure out the probability of heads over tails.  The probability of tossing heads, p is a property of the coin and fixed, it is just unknown, so there will be some randomness in estimating it due to the randomness in coin tosses we use to estimate.

In the Bayesian analysis I used, I treated the parameter p as stochastic, because that is how that kind of analysis works, but it never really was.  In this post I will consider a random p.

I am going to consider an experiment where I am picking coins out of a bag, tossing them and recording whether they are heads or tails, and then throwing away the coin to pick another one.  I think that the coins are biased, but I don’t know how.  Can I figure that out?

Formalizing this mathematically, I am assuming that I am drawing coins each with a p probability of coming up head, where the ps are beta distributed with (unknown) parameters \alpha and \beta.  I get to toss each coin once to see if it turns up heads or tails, and then I have to throw the coin away, so for coin i I have a Bernoulli distribution with parameter p_i.

a <- 3 ; b <- 4
n <- 100
outcomes <- rbinom(n,1,rbeta(1,a,b))
s <- sum(outcomes)

The task is to figure out the distribution of ps, which means estimating hyper parameters \alpha and \beta.

Any ideas?

Doing the same as last time?

In the setup in my last post, we modelled the situation very similarly.  We drew a p at random and then tossed coins and got a distribution for p that way.  Could that work again?

We could pick a prior for p, say \mathrm{beta}(2,2) as last time, calculate the posterior based on the number of heads out of n tosses and see if that tends toward the real distribution of ps.

plot.analysis <- function(prior.a,prior.b ,real.a,real.b, n) {
  x <- seq(0,1,length.out=1000)

  # real p distribution
  real <- dbeta(x,real.a,real.b)

  # get prior
  prior <- dbeta(x,prior.a,prior.b)

  # perform experiment
  s <- sum(rbinom(n,1,rbeta(1,real.a,real.b)))
  # get posterior
  post <- dbeta(x,prior.a+s,prior.b+n-s)

  # plot results...
  plot(x, post, main='Prior, likelihood and posterior',
       type='l', xlab='', ylab='')
  lines(x, real, col='green')
  lines(x, prior, col='red')
}

opar <- par(mfrow=c(2,2), mar=c(2,2,2,2)+0.1)
plot.analysis(2,2, 3,4, 5)
plot.analysis(2,2, 3,4, 10)
plot.analysis(2,2, 3,4, 50)
plot.analysis(2,2, 3,4, 100)
par(opar)

(Ups, the title for these plots are wrong … there is the real distribution of ps but no likelihood … but I really cannot be bothered to re-do the plot right now…)

Ok, clearly that didn’t work.  Why?

For one thing, even if p is stochastic in this setup, there is only one p, but in the setup we are modelling there is one for each coin toss.

By the very nature of the beta distribution, the more heads and tails we connect, the “pointy’er” it will become, so if the real distribution of ps is relatively flat, it will look less and less like it.

A simple maximum likelihood solution

Since all we have to do to estimate the distribution of ps is to estimate the hyper parameters \alpha and \beta, we could just take the maximum likelihood approach.  We just need to work out the likelihood of our model and then maximize it with respect to the two.

If \pi(p;\alpha,\beta) is the distribution of ps, then the likelihood of ps and heads and tails is

\mathrm{lhd}(\alpha,\beta,p_1,\ldots,p_n) = \prod_i p_i^{h_i}(1-p_i)^{1-h_i} \pi(p_i)

where h_i is 1 if the ith coin toss is head and 0 otherwise.

We don’t want to maximize with respect to all the p_is – and with one coin toss for each we don’t have a lot of information to do that anyway – but we can just integrate them out, and since they are independent that gives us

\mathrm{lhd}(\alpha,\beta) = \prod_i \int p_i^{h_i}(1-p_i)^{1-h_i} \pi(p_i;\alpha,\beta) \,\mathrm{d}p_i

or

\log\mathrm{lhd}(\alpha,\beta) = s \log \int p\pi(p;\alpha,\beta) \,\mathrm{d}p + (n-s)\log \int (1-p)\pi(p;\alpha,\beta) \,\mathrm{d}p

where n is the number of coin tosses and s the number of heads.

With a beta prior for p

\pi(p;\alpha,\beta) = \frac{1}{B(\alpha,\beta)}p^{\alpha-1}(1-p)^{\beta-1}

where B(\alpha,\beta)=\int p^{\alpha-1}(1-p)^{\beta-1} \,\mathrm{d}p we get

\int p\pi(p;\alpha,\beta) \,\mathrm{d}p = B(\alpha+1,\beta)/B(\alpha,\beta)

and

\int (1-p)\pi(p;\alpha,\beta) \,\mathrm{d}p = B(\alpha,\beta+1)/B(\alpha,\beta)

so the log-likelihood is

\log\mathrm{lhd}(\alpha,\beta) = s\log B(\alpha+1,\beta) + (n-s)\log B(\alpha,\beta+1) - n\log B(\alpha,\beta).

You can probably optimize this with respect to \alpha and \beta analytically, but I’m just going to use a numerical optimisation to try it out:

a <- 3 ; b <- 4
n <- 100
s <- sum(rbinom(n,1,rbeta(1,a,b)))
logL <- function(a,b, n,s) {
  s*lbeta(a+1,b) + (n-s)*lbeta(a,b+1) - n*lbeta(a,b)
}
search <- nlm(function (x) -logL(x[1],x[2],n,s), c(3,4))
mle.a <- search$estimate[1]
mle.b <- search$estimate[2]
opar <- par(mfrow=c(1,2))
xs <- seq(1,10,length.out=500)
ys <- sapply(xs, function(x) logL(x,4,n,s))
plot(xs,ys,type='l',xlab=expression(alpha),main='Likelihood',ylab='logL')
abline(v=mle.a)
mtext(mle.a,at=mle.a)
ys <- sapply(xs, function(x) logL(3,x,n,s))
plot(xs,ys,type='l',xlab=expression(beta),main='Likelihood',ylab='logL')
abline(v=mle.b)
mtext(mle.b,at=mle.b)
par(opar)

and we can plot the true p density to compare it with the inferred:

x <- seq(0,1,length.out=100)
true <- dbeta(x,3,4)
inferred <- dbeta(x,mle.a,mle.b)
plot(x,true,type='l',col='red',
     main='True and inferred distribution',
     xlab='', ylab='',
     ylim=range(true,inferred))
lines(x,inferred, col='blue')

You might get different results here; when I experimented with it I didn’t always get results this close to the true values, but I picked a nice result for the plots here…

222-223=-1

Uncertainty and randomness, part I

Sunday, August 9th, 2009

When I wrote about confidence intervals the other day, I was actually inspired to think about it from two papers from here:

The topic of those two papers is two kinds of uncertainty: epistemic uncertainty and aleatory uncertainty, where the former is uncertainty due to lack of knowledge while the latter is uncertainty due to randomness.

If we’re trying to estimate a parameter as in the confidence interval post, we are dealing with lack of knowledge.  The parameter is assumed fixed and not random at all, but we don’t know what it is so there is some uncertainty in our estimate.  The more data we have available, the better we can estimate the parameter, so it is possible for us to reduce the uncertainty by increasing our knowledge.

If we want to predict if the toss of a fair coin comes up head, we are dealing with randomness.  If it is a fair coin, then there is a 50/50 chance of the next flip coming up head, and no matter how many times we repeat the experiment the uncertainty about the outcome is the same.

I’m not going to write much more about this, since you are better off reading the papers anyway, but it got me thinking about uncertainty in flipping (un-)fair coins and that is what this post and the next is about.

The stuff below doesn’t really relate to epistemic or aleatory uncertainty, so you don’t have to read the papers to read on here.  It is just two different models of coin tossing and having some fun with that…

Unknown bias

First let us consider a scenario where we have a coin that we do not necessarily know if is fair or not, but we want to figure out the probability of tossing head.  So the model of the system is a simple Bernoulli distribution where we have a single parameter p and where we toss head with probability p and tail with probability 1-p.

If we toss the coin n times the likelihood of getting s heads is binomially distributed and the likelihood is proportional to p^s(1-p)^{n-s}.  The maximum likelihood estimate of p is s/n.

You can play around with that yourself with a bit of R code as below:

p <- 0.39
ns <- seq(1,100)
est <- sapply(ns, function(n) rbinom(1,n,p)/n)

opar <- par(mar=c(5,4,2,4)+0.1)
plot(ns, est, ylim=c(0,1),
     ylab='Maximum likelihood estimates (s/n)',
     xlab='Number of tosses')
abline(h=p)
mtext(paste(" p = ", p), at=p, side=4, las=1)
par(opar)

The parameter is fixed – in the example above as 0.39 – and the entire uncertainty is in the estimate.  That uncertainty decreases with the number of tosses – if you want to you could play around with confidence intervals and see this.  I won’t do this here but instead use credible intervals since I used confidence intervals in the previous post.

Anyway, as I said, the parameter is fixed but unknown.  In a “frequentist” analysis we would really consider the parameter fixed and consider the estimate random.  In a “Bayesian” analysis, on the other hand, we would still consider the parameter random.  We would just learn more and more of the distribution of it as we toss the coin.

Whether you can consider an unknown but fixed parameter (uncertainty due to lack of knowledge) as a stochastic variable (uncertainty due to randomness) is a philosophical question more than a mathematical one, and you are welcome to join in on the frequentist vs Bayesianist fight any day, I am sure.

I’m going to do a Bayesian analysis here, since I did a frequentist analysis in the confidence interval post, but I’ll be back to a frequentist analysis in the next post.

Priors and posteriors

Ok, so to analyse this system in a Bayesian way, we need to model the parameter p as a stochastic variable.  We do this by giving it a prior distribution \pi(p).  This should capture the uncertainty we have about p before we start tossing the coin.

In a frequentist analysis we have no way of quantifying our intuition about p, such as we expect it to be closer to 0.5 than to 0 or 1.

On the other hand, because we need to provide a prior, we have no way in a pure Bayesian analysis of claiming complete prior ignorance.  There is a concept of “uninformative priors” or “reference priors”, but claiming ignorance in Bayesian analysis is not as easy as you might think.  Anyway, that is way beyond the scope of this blog post, so I’ll be on with the analysis.

For mathematical convinience we prefer something called conjugate priors, and for Bernoulli distributions that is the beta distribution.  Don’t worry about what a conjugate prior is, it is not essential for the analysis here.

The beta distribution is parameterised with two “hyper parameters” \alpha and \beta and has density \pi(p)\propto p^{\alpha-1}(1-p)^{\beta-1} with normalisation constant 1/B(\alpha,\beta) where of course

B(\alpha,\beta)=\int_0^1 p^{\alpha-1}(1-p)^{\beta-1}\,\mathrm{d}p.

Depending on \alpha and \beta the beta distribution can take various shapes.  Whenever \alpha=\beta the density is symmetric and (when \alpha=\beta\geq 1) with mode at 0.5, so if your prior belief is that coins are likely to be unbiased, then you want equal \alpha and \beta.  The greater the parameters, the more “pointy” the density is around the mode, so the higher \alpha and \beta the stronger you make your prior belief about the distribution of p.  With \alpha=\beta=1 you get a uniform distribution.

You can try it out with this R code

plot.beta <- function(a,b) {
  p <- seq(0,1,length.out=100)
  d <- dbeta(p,a,b)
  plot(p,d,
       main=substitute(paste(alpha==a, " and ", beta==b),
                       list(a=a,b=b)),
       type='l', xlab='', ylab='')
}
opar <- par(mfrow=c(2,2), mar=c(2,2,2,2)+0.1)
plot.beta(1,1)
plot.beta(2,2)
plot.beta(4,3)
plot.beta(2,6)
par(opar)

The prior distribution is capturing the knowledge we have about p (or think we have about it) before we do any experiments by tossing the coin.  Once you’ve picked one that captures your intuition about how you would expect p to be, the trick is to use the experiments to change the distribution of p to now capture both your prior expectation and the knowledge you have gained about p from doing your experiments.

Let \mathrm{lhd}(s,n;p)\propto p^s(1-p)^{n-s} denote the likelihood of p when we observe s heads out of n tosses, i.e. \mathrm{lhd}(s,n;p) is the probability of s heads out of n tosses conditional on p.  The joint probability of s, n and p is then \mathrm{lhd}(s,n;p)\pi(p) and the posterior probability for p, P(p) is proportional to this P(p)\propto \mathrm{lhd}(s,n;p)\pi(p).  This is always the case; the posterior is always proportional to the prior times the likelihood, a result you might know as Bayes’ rule.

The posterior probability is interesting because it gives you a distribution for p that depends on the results of your experiment.

In many way this posterior does what the confidence intervals does in frequentist analysis.  It captures the parameter estimate and the uncertainty in it.  The difference here is that in the Bayesian approach the parameter really is considered a random parameter, distributed as the prior before the experiment and as the posterior after the experiment.

Okay, since \mathrm{lhd}(s,n;p)\propto p^s(1-p)^{n-s} and \pi(p)\propto p^{\alpha-1}(1-p)^{\beta-1} the posterior is P(p) \propto p^s(1-p)^{n-s}p^{\alpha-1}(1-p)^{\beta-1}=p^{s+\alpha-1}(1-p)^{n-s+\beta-1}.

So if we start with a \mathrm{beta}(\alpha,\beta) prior and observe s heads in n trials, we end up with a \mathrm{beta}(\alpha+s,\beta+n-s) posterior.  It is this nice relationship that makes the beta distribution a conjugate prior, in case you are wondering.

When coming up with the prior, you can therefore think of \alpha and \beta as “pseudo counts” of heads and tails.  Results you never actually saw, but they behave just as counts of head and tail.

We can try out the Bayesian analysis like this:

plot.analysis <- function(a,b,p,n) {

  # get prior
  x <- seq(0,1,length.out=1000)
  prior <- dbeta(x,a,b)

  # perform "experiment"
  s <- rbinom(1,n,p)

  # get likelihood
  lik <- dbeta(x,s,n-s)

  # get posterior
  post <- dbeta(x,a+s,b+n-s)

  # plot results...
  plot(x, post, main='Prior, likelihood and posterior',
       type='l', xlab='', ylab='')
  lines(x, lik, col='blue')
  lines(x, prior, col='red')
  abline(v=p, lty='dashed')
}

opar <- par(mfrow=c(2,2), mar=c(2,2,2,2)+0.1)
plot.analysis(2,2,0.39,5)
plot.analysis(2,2,0.39,10)
plot.analysis(2,2,0.39,50)
plot.analysis(2,2,0.39,100)
par(opar)

Here I’m plotting the prior – where I’ve chosen a beta(2,2) prior – in red, the likelihood in blue and the posterior in black.

When we have only done a few experiments, the prior will dominate and we see the posterior as somewhere in between the likelihood and the prior.  With many experiments, the likelihood dominates and the posterior resembles it more and more.  This is as it should be, considering the the more data we have from experimenting, the more we should trust the experiments over our previous intuition.

Credible intervals

With a posterior in hand, we have all the information about our parameter that we can get in a Bayesian framework, but the posterior is not always easy summarise (or visualise in high dimensions) so we can also reduce it to an interval in which the parameter is likely to be.

This is like the confidence intervals in frequentist analysis, but this time the parameter really is stochastic so we have a fixed interval, the credible interval, where there is a, say, 95% chance that the parameter falls in.

To get the 95% credible interval, we just get the 0.025 and 0.975 quantiles of the posterior.

qbeta(c(0.025,0.975),a+s,b+n-s)

If the experimental results were really fixed in stone (so the number of heads and number of tails were fixed) and the coin bias random, this would be a fixed interval where the parameter p would fall within in 95% of the cases.

# data  -- assumed fixed forever...
n <- 100
s <- 47

# prior
a <- 2 ; b <- 2

# credible interval
CI <- qbeta(c(0.025,0.975), a+s, b+n-s)

# samples of p
ps <- rbeta(100, a+s, b+n-s)

plot(ps, ylim=c(0,1), main='Credible interval and samples of p',
     ylab='Samples')
abline(h=CI, lty='dashed')

Of course, in this case this is really only something we pretend.  It is in fact p that is fixed and we can re-do the experiment whenever we want, and in that case we would see a fixed p and a random credible interval, so there really is very little difference between credible intervals and confidence intervals in practice…

# experiments
p <- 0.39
n <- 100
s <-
ns <- seq(1,100, by=5)
heads <- sapply(ns, function(n) rbinom(1,n,p))

# prior
a <- 2 ; b <- 2

# point estimate
modes <- mapply(function(n,s) (a+s)/(n+a+b), ns, heads)

# credible interval
lower <- mapply(function(n,s) qbeta(0.025,a+s,b+n-s), ns, heads)
upper <- mapply(function(n,s) qbeta(0.975,a+s,b+n-s), ns, heads)

plot(ns, modes, ylim=c(0,1),
     main='Point estimates and credible intervals',
     xlab='Number of tosses', ylab='Estimate')
lines(ns, lower)
lines(ns, upper)
abline(h=p, lty='dashed')

Ok, enough about this setup.

In a later post I will consider an actual random p parameter and see how we can estimate its distribution from coin tosses.  Stay tuned!

221-222=-1

The problem with confidence intervals

Friday, August 7th, 2009

I’ve ranted plenty about the problem with p-values and the common misunderstanding that the p-value is the probability of the null model being true.  But what about confidence intervals?

It is a common misunderstanding that if you have a 95% confidence interval, then the parameter you are estimating is within the interval with 95% probability.  This turns out to be just slightly wrong, and where the misunderstanding about p-values simply doesn’t make sense at all considering how p-values are used, you cannot really run into much trouble with this misunderstanding.

Inference uncertainty

Okay, so what is a confidence interval, and why is the above a misunderstanding?

A confidence interval is something we use when estimating parameters that is supposed to capture the uncertainty there is in the inference.

Let’s say I want to know the weight of the water in a small lake close to where I live.  Don’t ask why, what I do in my free time is none of your business, and anyway it is just an example so play along!  So I go down to the lake and get a liter of the water and weigh it. Yeah, it is probably going to be close to 1kg, since that would be the weight of pure water and the lake water presumably is mostly water, but it will not be exactly that because of all kinds of impurities in it.

So the parameter I’m interested in is the weight of the water.  The water is probably reasonably homogeneous so it doesn’t matter where I sample the water.

If I just do this once, I will have a single measurement and that will be my estimate of the parameter, but homogeneous or not there might be a bit more mud or something in the sample I take than usual, or a bit less, so there could be some measurement error so my one measure is not the true value.

You all know how to get around this: you make several samples and use the average as the estimate.

All well and good, but once you start making several samples to get the parameter, you have already admitted that you have some uncertainty in your measurements, and that leaks into uncertainty in your estimate.  The degree of uncertainty, however, cannot be seen from the final estimate.

This uncertainty depends on the variation in your measures, of course – if they vary a lot you are less certain than if they are all roughly the same – but also on the number of samples – with just the first sample there is no variation in measures but that doesn’t mean that you will trust that one measure more than the average of ten measures.

Confidence intervals

It is this uncertainty that is captured by the confidence interval.

Before we can quantify the uncertainty at all we need to set up a model of the data, of course.  There are no “one size fits all” with that, but lots of simple models that often work.

For the water weight example a good first attempt would be to assume that the measurements were stochastic variables X distributed as X\sim N(w,\sigma^2) where w is the true weight – the parameter we wish to infer – and \sigma^2 the variance in measures caused by the measurement uncertainty, a nuisance parameter.

When estimating w we sample n observations and use the average value, so from the model of individual measurements we also get a model of the estimator.  In this particular case from the central limit theorem.

In general, a confidence interval for parameter \theta is two statistics l(X) and u(X) – functions of the stochastic variable X – satisfying that P\left(l\left(X\right) \leq \theta \leq u\left(X\right)\right)=1-\alpha where 1-\alpha is the confidence level, typically 95%.

This might look a little complicated, but it isn’t really.  Just like the mean of a set of observations – we could call it m(X)=\frac{1}{n}\sum_{i=1}^nX_i – is a way of summarising an aspect of the data, so are l(X) and u(X), and just as the mean of n observations is a stochastic variable (until you actually make the observations) so will l(X) and u(X) be.

The trick, of course, is defining them so they satisfy P\left(l\left(X\right) \leq \theta \leq u\left(X\right)\right)=1-\alpha.  In general this is not easy to do, but for our example here it is.

We know that our estimate – the mean of observations – is distributed as \hat{w} \sim N(w,\sigma^2/n) when \hat{w} is the mean of n observations.  That’s what we got from the central limit theorem.

So given w and \sigma we can easily define an interval that \hat{w} will fall into with probability 95%, and the obvious choice here would be the symmetric one around w.  We could call this the 95% interval (but not confidence interval quite yet since it doesn’t refer to any given estimate or observations yet).

It has the properties we would expect from a confidence interval, though, it depends on the underlying uncertainty in the model, \sigma^2, and it gets smaller as we get more data and reduce our uncertainty.

The problem is, of course, that this is completely useless since we don’t know w and \sigma and we need those guys to get this interval.

Well, not quite useless.

Because we know the distribution of \hat{w} we know how much larger or smaller than w we expect it to be (the 95% interval from above), and since \hat{w} is symmetrically distributed around w it is the same distances when \hat{w} is above and when it is below w.

So we can get the 95% confidence interval for \hat{w} by taking the width of the 95% interval (for w) but centering it at \hat{w}.

For the confidence interval defined this way, for w not to be contained in the confidence interval, \hat{w} would have to be either smaller than the lower limit of the 95% interval or larger than the upper limit of the 95% interval, in other words for w not to be contained in the 95% confidence interval, \hat{w} would have to not be contained in the 95% interval which it is with 95% probability, so the confidence interval defined this way satisfy the requirement we made for it.

(Yes, I know that I also need to know \sigma for this to work, and there is some added uncertainty from using an estimate rather than the true value.  The width of the interval, for example, will be a function of the estimate of the variance if you do as above, and if you underestimate \sigma your interval will be too narrow.  Go read a statistics text book if this bothers you, for the example here I’m just going to ignore it.)

What’s with the misunderstanding about confidence intervals?

Ok, so that really does sound like if we have a confidence interval, then the true parameter is within the interval with 95% probability, right?

Yes, almost, which is why I wrote that this misunderstanding doesn’t matter much and won’t get you into any problems (unlike the p-value misunderstanding that leads to craziness and probably cases cancer and global warming as well).

The distinction is almost philosophical and has to do with what we consider stochastic and what we consider fixed.

Here, we consider the parameter w unknown but fixed.  It doesn’t vary and there is no stochasticity to it.  So it simply doesn’t make sense to talk about the probability of it being in any given interval.  It is either in the interval or not in the interval and that is all there is to it.

The confidence interval, on the other hand, is stochastic.  If we did the experiment 10 times, we would get 10 different confidence intervals, all which would contain the true parameter with probability 95%.

The confidence interval will contain the true parameter with 95% probability, but the true parameter does not fall within the interval with 95% probability.

The difference is so subtle that it borders the ridiculous to even talk about it.

Credible intervals

In Bayesian statistics it is the other way around.  There you do get fixed intervals and random parameters.

Here parameters we are uncertain about are not considered fixed but unknown, but are assumed to be stochastic. Based on observations and a prior probability distribution you would infer a credible interval that is then considered fixed, while the parameter is stochastic (but falls within the interval with 95% probability).

Anyway, if you have a confidence interval, don’t say “with 95% probability the parameter lies in the interval [a,b]…” since strictly speaking that is incorrect.  For a credible interval it would be correct.

In practise, it makes no difference what so ever, so don’t worry too much about it.

219-219=0

Are women getting more beautiful

Tuesday, July 28th, 2009

The story is all over the net these days, see e.g. here, but are women really getting more beautiful?

Walking down town in the summer time I wouldn’t necessarily say no.  They do look beautiful in their summer dresses and miniskirts, but still… the genetics would have to be a bit special for it to be true, and the statistics isn’t really supporting it.

My bet would be on a classical multiple testing problem, as described here (PDF).  I’m not saying that multiple testing is all there is to it, but I would like to have that ruled out before I believe the story…

209-208=+1

Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis

Sunday, July 5th, 2009

Relevant for my previous post is this paper:

Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis

Bradley Efron

Abstract

Current scientific techniques in genomics and image processing routinely produce hypothesis testing problems with hundreds or thousands of cases to consider simultaneously. This poses new difficulties for the statistician, but also opens new opportunities. In particular it allows empirical estimation of an appropriate null hypothesis. The empirical null may be considerably more dispersed than the usual theoretical null distribution that would be used for any one case considered separately. An empirical Bayes analysis plan for this situation is developed, using a local version of the false discovery rate to examine the inference issues. Two genomics problems are used as examples to show the importance of correctly choosing the null hypothesis.

The topic is exactly the case when the real data does not follow the theoretical null distribution, but is a mixture of two distributions where we want to find the observations from the “interesting” of the two distribution.  The suggestion in the paper is to estimate an empirical null distribution from the data, and then use that distribution to find the significant observations.

185-187=-2