Uncertainty and randomness, part II
Ok, so in my previous post I considered tossing a (biased) coin and trying to figure out the probability of heads over tails. The probability of tossing heads,
is a property of the coin and fixed, it is just unknown, so there will be some randomness in estimating it due to the randomness in coin tosses we use to estimate.
In the Bayesian analysis I used, I treated the parameter
as stochastic, because that is how that kind of analysis works, but it never really was. In this post I will consider a random
.
I am going to consider an experiment where I am picking coins out of a bag, tossing them and recording whether they are heads or tails, and then throwing away the coin to pick another one. I think that the coins are biased, but I don’t know how. Can I figure that out?
Formalizing this mathematically, I am assuming that I am drawing coins each with a
probability of coming up head, where the
s are beta distributed with (unknown) parameters
and
. I get to toss each coin once to see if it turns up heads or tails, and then I have to throw the coin away, so for coin
I have a Bernoulli distribution with parameter
.
a <- 3 ; b <- 4 n <- 100 outcomes <- rbinom(n,1,rbeta(1,a,b)) s <- sum(outcomes)
The task is to figure out the distribution of
s, which means estimating hyper parameters
and
.
Any ideas?
Doing the same as last time?
In the setup in my last post, we modelled the situation very similarly. We drew a
at random and then tossed coins and got a distribution for
that way. Could that work again?
We could pick a prior for
, say
as last time, calculate the posterior based on the number of heads out of
tosses and see if that tends toward the real distribution of
s.
plot.analysis <- function(prior.a,prior.b ,real.a,real.b, n) {
x <- seq(0,1,length.out=1000)
# real p distribution
real <- dbeta(x,real.a,real.b)
# get prior
prior <- dbeta(x,prior.a,prior.b)
# perform experiment
s <- sum(rbinom(n,1,rbeta(1,real.a,real.b)))
# get posterior
post <- dbeta(x,prior.a+s,prior.b+n-s)
# plot results...
plot(x, post, main='Prior, likelihood and posterior',
type='l', xlab='', ylab='')
lines(x, real, col='green')
lines(x, prior, col='red')
}
opar <- par(mfrow=c(2,2), mar=c(2,2,2,2)+0.1)
plot.analysis(2,2, 3,4, 5)
plot.analysis(2,2, 3,4, 10)
plot.analysis(2,2, 3,4, 50)
plot.analysis(2,2, 3,4, 100)
par(opar)
(Ups, the title for these plots are wrong … there is the real distribution of
s but no likelihood … but I really cannot be bothered to re-do the plot right now…)
Ok, clearly that didn’t work. Why?
For one thing, even if
is stochastic in this setup, there is only one
, but in the setup we are modelling there is one for each coin toss.
By the very nature of the beta distribution, the more heads and tails we connect, the “pointy’er” it will become, so if the real distribution of
s is relatively flat, it will look less and less like it.
A simple maximum likelihood solution
Since all we have to do to estimate the distribution of
s is to estimate the hyper parameters
and
, we could just take the maximum likelihood approach. We just need to work out the likelihood of our model and then maximize it with respect to the two.
If
is the distribution of
s, then the likelihood of
s and heads and tails is

where
is 1 if the ith coin toss is head and 0 otherwise.
We don’t want to maximize with respect to all the
s – and with one coin toss for each we don’t have a lot of information to do that anyway – but we can just integrate them out, and since they are independent that gives us

or

where
is the number of coin tosses and
the number of heads.
With a beta prior for 

where
we get

and

so the log-likelihood is

You can probably optimize this with respect to
and
analytically, but I’m just going to use a numerical optimisation to try it out:
a <- 3 ; b <- 4
n <- 100
s <- sum(rbinom(n,1,rbeta(1,a,b)))
logL <- function(a,b, n,s) {
s*lbeta(a+1,b) + (n-s)*lbeta(a,b+1) - n*lbeta(a,b)
}
search <- nlm(function (x) -logL(x[1],x[2],n,s), c(3,4))
mle.a <- search$estimate[1]
mle.b <- search$estimate[2]
opar <- par(mfrow=c(1,2))
xs <- seq(1,10,length.out=500)
ys <- sapply(xs, function(x) logL(x,4,n,s))
plot(xs,ys,type='l',xlab=expression(alpha),main='Likelihood',ylab='logL')
abline(v=mle.a)
mtext(mle.a,at=mle.a)
ys <- sapply(xs, function(x) logL(3,x,n,s))
plot(xs,ys,type='l',xlab=expression(beta),main='Likelihood',ylab='logL')
abline(v=mle.b)
mtext(mle.b,at=mle.b)
par(opar)

and we can plot the true
density to compare it with the inferred:
x <- seq(0,1,length.out=100)
true <- dbeta(x,3,4)
inferred <- dbeta(x,mle.a,mle.b)
plot(x,true,type='l',col='red',
main='True and inferred distribution',
xlab='', ylab='',
ylim=range(true,inferred))
lines(x,inferred, col='blue')

You might get different results here; when I experimented with it I didn’t always get results this close to the true values, but I picked a nice result for the plots here…
–
222-223=-1
