Designing a hidden Markov model framework
Wednesday, March 11th, 2009Hidden Markov models (HMMs) are very powerful statistical models for sequence analysis, and are widely used in bioinformatics.
The main benefits of HMMs are that they are 1) usually a pretty easy way of modelling whatever underlying biology you are interested in, and 2) provide a very efficient way of computing the likelihood of your model.
Both are relative, of course, but compared to, say, MCMC methods -- who also provide very powerful ways of modelling -- the simplifying Markov assumption needed for HMMs simplify the modelling (at some cost in model fitting, of course) and inference in HMMs is usually many orders of magnitude faster than inference in MCMCs.
Still, even with very efficient inference algorithms, genome scale data can be a problem. Analysing alignments of giga-base long genomes can still take CPU days.
So it makes sense to speed up the HMMs if we can.
An algorithmic approach to speeding up the basic HMM algorithms is not immediately obvious. The basic algorithms simply fill in table entries in O(1) time per cell, so there is little that can be done here.
A more immediate solution is to exploit the parallelisation available on modern hardware, such as single instruction, multiple data (SIMD) operations and multiple cores.
At BiRC we had a Masters student, Asbjørn Brask, who recently wrote a thesis on this: Optimering af generelle skjulte Markovmodeller på multikerne CPUer og GPUer. The focus was on "dense" HMMs, that is, HMMs where the transition probability matrix contains few if any zero probabilities. These are common enough to be interesting, and relatively easy to deal with for SIMD instructions since it is easy to pack the entries in the transition matrix into machine words for parallel calculations.
We (myself and a student programmer, Andreas Sand) are now working on a HMM framework to make these ideas available in an easy-to use library.
This turns out to be a bit more complicated than we had hoped.
Not so much implementing the HMM algorithms. For that we have Asbjørn's code that we can use with a few changes here and there. The problem is that there is a number of different, orthogonal, ways of representing the data and manipulating it in parallel, which if not handled properly leads to an exponential explosion in the number of classes. And it is not quite as orthogonal after all, since data representation affects how we can manipulate the data with SIMD instructions.
Data representation
Take the simple table type:
template <typename floattype,
typename simd_floattype = floattype,
typename Allocator = BasicAllocator<floattype> >
class HMMTable
{ /* ... */ }
We have a type for the entries in the table. This is important for the precision of the floating point operations, but also has an impact on the speedup we get from SIMD instructions (the smaller the type, the more we can pack into a computer word).
Then we have the SIMD type. For the most basic algorithms -- with no SIMD acceleration -- this would just be the same as the basic type, but for SIMD instructions it would be a larger computer word into which we can back several of the basic types for parallel manipulation.
Then there is allocation issues. For SIMD instructions we need the right alignment of types, so we cannot simply use new and delete (where for a solution without SIMD of course we can). So we need some sort of trait class for dealing with allocation and de-allocation of memory.
On top of this, of course, there's issues such as how the basic type is packed into the SIMD type (which is both important for just accessing the table cells and also for the parallel operations on the table). Yet another trait class might be needed here.
There is some orthogonality and some dependencies here.
Any basic type for the table cells should work with any SIMD type (as long as the SIMD type can contain the basic type, but that is not going to be a problem on any sensible hardware), but the allocation/deallocation of course depends on the SIMD type as does the packing of cell values into SIMD words.
HMM Algorithms
There's only a few HMM algorithms needed for most analyses. The Forward and Backward algorithms for maximizing the likelihood and the Viterbi and Posterior Decoding algorithms for analysing the states afterwards.
Still, with SIMD instructions thrown in, there is a lot of problems.
Ideally, we just want one implementation of each algorithm that should work with any type of table (and transition and emission matrices). This doesn't work, however, since the SIMD instructions and how we can use them in speeding up the algorithms depends on the data representation.
So besides parameterising the algorithms with the table and HMM types, we need a bunch of trait classes for doing all the basic operations, and these depends on the SIMD types in the table and the HMM transition and emission matrices.
Template meta programming?
For putting the different data representations together with the algorithms, we are now thinking about some template meta programming approach.
This way we can weave together the different data representations with the optimal algorithms for working on them, and the user needs only provide the basic type of the matrix cells. The latter is best left to the user, who hopefully knows the precision he needs for the final results.
I have some experience with this from previous projects, but I don't think I've ever gotten it "just right", so it will be interesting to see how it turns out for this project...
--
70-87=-17