Mixing problems
No, not mixing a lot of different problems, but problems with mixing. In an MCMC.
So, I’ve implemented this quantitative trait model in HapCluster this week, and I am now testing how it works. The first few tests went pretty well, but now I’m running into problems with some other data sets.
HapCluster is a haplotype-based fine-mapping tool that essentially has a single parameter of interest: the position of a trait marker (either a risk marker for a disease or, in the new model, a marker affecting a quantitative trait). The model has a lot of nuisance parameters as well, used for comparing haplotypes and clustering them locally around that position.
Only three of the parameters are continuous real values. These are easy to work with (compared to clustering of haplotypes or inferred ancestral haplotypes and parameters like that), so I use these three to check for convergence and to get an idea about the effective sample size.
A run with HapCluster starts out with two “burn in” runs, used to test convergence and estimate the effective sample size. I do these two runs, and then test whether the distribution of the three parameters is the same in the two runs. If it is, I conclude that the chain has converged; otherwise I do a third run, compare it with the second run, and so forth.
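To make the comparison of two burn-in runs concrete, here is a minimal Python sketch of one way to do it for a single parameter. It uses a two-sample Kolmogorov-Smirnov statistic, which is my choice for illustration and not necessarily the test HapCluster applies; the function names and the 5% critical constant are assumptions. Note also that the KS critical value assumes independent samples, so with auto-correlated MCMC output it will be too eager to call two runs different.

```python
import bisect
import math


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    # The gap can only change at observed values, so check each of them.
    for x in sa + sb:
        cdf_a = bisect.bisect_right(sa, x) / len(sa)
        cdf_b = bisect.bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def same_distribution(a, b, c_alpha=1.358):
    """Approximate two-sample KS test.

    c_alpha = 1.358 is the usual large-sample constant for a 5% level.
    Returns True when the two runs are not significantly different.
    """
    n, m = len(a), len(b)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical
```

With three monitored parameters, one would call `same_distribution` once per parameter on the samples from two consecutive burn-in runs and only accept convergence when all three agree.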
After each run, I increase the thinning – that is the number of parameter points I skip between each sample. If the chain is mixing poorly, I need more data points to get an estimate of the true distribution, and I can either get that by making longer runs, or by increasing the thinning to reduce the auto-correlation. It is the latter I use in the burn-in period.
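The effect of thinning on auto-correlation is easy to see on a toy chain. The sketch below is not HapCluster code: it uses an AR(1) process as a stand-in for a poorly mixing parameter, and `ar1_chain`, `autocorrelation` and `thin` are hypothetical helper names. Here `step` means "keep every `step`-th sample", so it is the number of skipped points plus one.

```python
import random


def autocorrelation(xs, lag=1):
    """Sample auto-correlation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def thin(xs, step):
    """Keep every step-th sample of the chain."""
    return xs[::step]


def ar1_chain(n, rho, seed=0):
    """Toy stand-in for a slowly mixing chain: x[t] = rho * x[t-1] + noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

Calling `autocorrelation(chain)` and `autocorrelation(thin(chain, 10))` on `chain = ar1_chain(20000, 0.9)` should show a clearly lower lag-1 auto-correlation for the thinned samples, at the price of keeping only a tenth of them.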
If, after a number of burn in iterations, I conclude that the chain has converged, I estimate the effective sample size (which is smaller than the actual sample size due to the auto-correlation) and then figure out how many samples I need to make to get the desired effective sample size, and then I run a final chain with that many iterations (using the thinning from the last burn in iteration).
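As a rough sketch of that last step, the effective sample size can be estimated as N / (1 + 2 Σ ρ_k), where ρ_k is the lag-k auto-correlation. The truncation rule below (stop summing at the first non-positive auto-correlation) is a simple assumption of mine, as are the function names; HapCluster may well estimate this differently.

```python
import math


def autocorrelation(xs, lag):
    """Sample auto-correlation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def effective_sample_size(xs):
    """N / (1 + 2 * sum of auto-correlations), truncated at the first
    non-positive auto-correlation (a simple, assumed truncation rule)."""
    n = len(xs)
    tau = 1.0
    for lag in range(1, n):
        rho = autocorrelation(xs, lag)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


def samples_needed(target_ess, n, ess):
    """Samples to draw in the final run, assuming the effective sample
    size grows in proportion to the number of samples."""
    return math.ceil(target_ess * n / ess)
```

For a burn-in run of `n` samples with estimated effective size `ess`, `samples_needed(target_ess, n, ess)` gives the length of the final chain at the same thinning.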
All this works pretty well for case/control data, except that two of the continuous parameters sometimes mix poorly (have a high auto-correlation). Not too badly, though, and usually the increased thinning in a burn in iteration or three solves the problem. In any case, they are nuisance parameters and do not seem to affect the trait position much anyway.
With the quantitative trait model, though, it looks like it is the trait position itself that has problems with mixing, and worse, the chain gets stuck far from the true trait position in some cases.
In the plot here, the dashed red line is the position of the trait marker, and the different colours in the parameter estimates correspond to different burn in iterations.
I’m not sure exactly how to deal with this. Mixing is always causing me problems when I work with MCMCs…