Uncertainty and randomness, part II

Ok, so in my previous post I considered tossing a (biased) coin and trying to figure out the probability of heads over tails.  The probability of tossing heads, p is a property of the coin and fixed, it is just unknown, so there will be some randomness in estimating it due to the randomness in coin tosses we use to estimate.

In the Bayesian analysis I used, I treated the parameter p as stochastic, because that is how that kind of analysis works, but it never really was.  In this post I will consider a random p.

I am going to consider an experiment where I am picking coins out of a bag, tossing them and recording whether they are heads or tails, and then throwing away the coin to pick another one.  I think that the coins are biased, but I don’t know how.  Can I figure that out?

Formalizing this mathematically, I am assuming that I am drawing coins each with a p probability of coming up head, where the ps are beta distributed with (unknown) parameters \alpha and \beta.  I get to toss each coin once to see if it turns up heads or tails, and then I have to throw the coin away, so for coin i I have a Bernoulli distribution with parameter p_i.

a <- 3 ; b <- 4
n <- 100
outcomes <- rbinom(n,1,rbeta(1,a,b))
s <- sum(outcomes)

The task is to figure out the distribution of ps, which means estimating hyper parameters \alpha and \beta.

Any ideas?

Doing the same as last time?

In the setup in my last post, we modelled the situation very similarly.  We drew a p at random and then tossed coins and got a distribution for p that way.  Could that work again?

We could pick a prior for p, say \mathrm{beta}(2,2) as last time, calculate the posterior based on the number of heads out of n tosses and see if that tends toward the real distribution of ps.

plot.analysis <- function(prior.a,prior.b ,real.a,real.b, n) {
  x <- seq(0,1,length.out=1000)

  # real p distribution
  real <- dbeta(x,real.a,real.b)

  # get prior
  prior <- dbeta(x,prior.a,prior.b)

  # perform experiment
  s <- sum(rbinom(n,1,rbeta(1,real.a,real.b)))
  # get posterior
  post <- dbeta(x,prior.a+s,prior.b+n-s)

  # plot results...
  plot(x, post, main='Prior, likelihood and posterior',
       type='l', xlab='', ylab='')
  lines(x, real, col='green')
  lines(x, prior, col='red')
}

opar <- par(mfrow=c(2,2), mar=c(2,2,2,2)+0.1)
plot.analysis(2,2, 3,4, 5)
plot.analysis(2,2, 3,4, 10)
plot.analysis(2,2, 3,4, 50)
plot.analysis(2,2, 3,4, 100)
par(opar)

(Ups, the title for these plots are wrong … there is the real distribution of ps but no likelihood … but I really cannot be bothered to re-do the plot right now…)

Ok, clearly that didn’t work.  Why?

For one thing, even if p is stochastic in this setup, there is only one p, but in the setup we are modelling there is one for each coin toss.

By the very nature of the beta distribution, the more heads and tails we connect, the “pointy’er” it will become, so if the real distribution of ps is relatively flat, it will look less and less like it.

A simple maximum likelihood solution

Since all we have to do to estimate the distribution of ps is to estimate the hyper parameters \alpha and \beta, we could just take the maximum likelihood approach.  We just need to work out the likelihood of our model and then maximize it with respect to the two.

If \pi(p;\alpha,\beta) is the distribution of ps, then the likelihood of ps and heads and tails is

\mathrm{lhd}(\alpha,\beta,p_1,\ldots,p_n) = \prod_i p_i^{h_i}(1-p_i)^{1-h_i} \pi(p_i)

where h_i is 1 if the ith coin toss is head and 0 otherwise.

We don’t want to maximize with respect to all the p_is – and with one coin toss for each we don’t have a lot of information to do that anyway – but we can just integrate them out, and since they are independent that gives us

\mathrm{lhd}(\alpha,\beta) = \prod_i \int p_i^{h_i}(1-p_i)^{1-h_i} \pi(p_i;\alpha,\beta) \,\mathrm{d}p_i

or

\log\mathrm{lhd}(\alpha,\beta) = s \log \int p\pi(p;\alpha,\beta) \,\mathrm{d}p + (n-s)\log \int (1-p)\pi(p;\alpha,\beta) \,\mathrm{d}p

where n is the number of coin tosses and s the number of heads.

With a beta prior for p

\pi(p;\alpha,\beta) = \frac{1}{B(\alpha,\beta)}p^{\alpha-1}(1-p)^{\beta-1}

where B(\alpha,\beta)=\int p^{\alpha-1}(1-p)^{\beta-1} \,\mathrm{d}p we get

\int p\pi(p;\alpha,\beta) \,\mathrm{d}p = B(\alpha+1,\beta)/B(\alpha,\beta)

and

\int (1-p)\pi(p;\alpha,\beta) \,\mathrm{d}p = B(\alpha,\beta+1)/B(\alpha,\beta)

so the log-likelihood is

\log\mathrm{lhd}(\alpha,\beta) = s\log B(\alpha+1,\beta) + (n-s)\log B(\alpha,\beta+1) - n\log B(\alpha,\beta).

You can probably optimize this with respect to \alpha and \beta analytically, but I’m just going to use a numerical optimisation to try it out:

a <- 3 ; b <- 4
n <- 100
s <- sum(rbinom(n,1,rbeta(1,a,b)))
logL <- function(a,b, n,s) {
  s*lbeta(a+1,b) + (n-s)*lbeta(a,b+1) - n*lbeta(a,b)
}
search <- nlm(function (x) -logL(x[1],x[2],n,s), c(3,4))
mle.a <- search$estimate[1]
mle.b <- search$estimate[2]
opar <- par(mfrow=c(1,2))
xs <- seq(1,10,length.out=500)
ys <- sapply(xs, function(x) logL(x,4,n,s))
plot(xs,ys,type='l',xlab=expression(alpha),main='Likelihood',ylab='logL')
abline(v=mle.a)
mtext(mle.a,at=mle.a)
ys <- sapply(xs, function(x) logL(3,x,n,s))
plot(xs,ys,type='l',xlab=expression(beta),main='Likelihood',ylab='logL')
abline(v=mle.b)
mtext(mle.b,at=mle.b)
par(opar)

and we can plot the true p density to compare it with the inferred:

x <- seq(0,1,length.out=100)
true <- dbeta(x,3,4)
inferred <- dbeta(x,mle.a,mle.b)
plot(x,true,type='l',col='red',
     main='True and inferred distribution',
     xlab='', ylab='',
     ylim=range(true,inferred))
lines(x,inferred, col='blue')

You might get different results here; when I experimented with it I didn’t always get results this close to the true values, but I picked a nice result for the plots here…

222-223=-1

Tags: ,

Leave a Reply