I have this HMM that lets me estimate speciation times and effective population sizes. It is part of the CoalHMM work we are doing at BiRC, but it is a completely new model that Julien and I just implemented over Easter.
I am using it to analyse the two orangutan subspecies, and I am checking whether a slight problem affects the results.
In simulations we have seen that the estimates can be biased, but in general the bias shrinks if we have enough states in the HMM. (We can vary the number of states, which correspond to coalescence times, so the HMM is really a class of HMMs with different numbers of states.)
For my analysis, I have tried running the HMM on chromosome 22 with different numbers of states to see how it affects the parameter estimates:
The estimates are made independently for 1 Mbp "chunks" along the genome, and there is clearly a consistent difference in the estimates when varying the number of states. You can test this with a paired t-test or just look at plots of the differences in the estimates.
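For anyone who wants to try the paired t-test idea, here is a minimal sketch using only the Python standard library. The numbers are made up for illustration (they are not the orangutan estimates), and the names `paired_t`, `est_few_states` and `est_many_states` are hypothetical. The pairing is by chunk: each 1 Mbp chunk contributes one estimate per HMM, and the test looks at the per-chunk differences.

```python
import math
import random
import statistics


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for a paired t-test of a vs b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The paired test works on the per-chunk differences only.
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up data: 30 chunks whose true level varies a lot along the chromosome,
# plus a small, consistent shift between the two HMMs.
rng = random.Random(1)
chunk_level = [rng.gauss(1.0, 0.2) for _ in range(30)]
est_few_states = [c + rng.gauss(0.0, 0.002) for c in chunk_level]
est_many_states = [c - 0.01 + rng.gauss(0.0, 0.002) for c in chunk_level]

t, df = paired_t(est_few_states, est_many_states)
print(f"t = {t:.1f} with {df} degrees of freedom")
```

Because the chunk-to-chunk variation cancels in the differences, even a very small consistent shift gives a large t statistic. If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic and also returns a p-value.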
Compared to the variance in the estimates along the chromosome, though, the differences between using different numbers of states are tiny. Completely insignificant, just not statistically insignificant.
Not sure what to make of that, except that the number of states probably does not have much effect on the final analysis results.
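One way to put a number on "tiny compared to the variance along the chromosome" is to divide the mean per-chunk difference by the standard deviation of the estimates across chunks. The sketch below does that on made-up numbers (again not the real estimates); the variable names are hypothetical, and the choice of this particular ratio is an assumption for illustration, not something taken from the analysis itself.

```python
import random
import statistics

# Made-up data: 30 chunks with large variation along the chromosome and a
# small, consistent shift between the HMM with few states and the one with many.
rng = random.Random(2)
chunk_level = [rng.gauss(1.0, 0.2) for _ in range(30)]
few_states = [c + rng.gauss(0.0, 0.002) for c in chunk_level]
many_states = [c - 0.01 + rng.gauss(0.0, 0.002) for c in chunk_level]

# Mean shift between the two HMMs, measured chunk by chunk.
diffs = [a - b for a, b in zip(few_states, many_states)]
mean_shift = statistics.mean(diffs)

# Spread of the estimates along the chromosome for one of the HMMs.
spread_along_chromosome = statistics.stdev(many_states)

# Shift expressed in units of the along-chromosome standard deviation.
relative_shift = abs(mean_shift) / spread_along_chromosome

print(f"mean shift between state counts: {mean_shift:.4f}")
print(f"spread along the chromosome:     {spread_along_chromosome:.4f}")
print(f"shift / spread:                  {relative_shift:.3f}")
```

A ratio well below one means the choice of state count moves the estimates by much less than they already vary from chunk to chunk, even when a paired test flags the shift as statistically significant.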