HMMoC and HMMConverter

I just want to say a few words about a short paper I read last week, and a paper that is a few years old now but related to it.

The first is out in advanced access in Nucleic Acids Research:

HMMConverter 1.0: a toolbox for hidden Markov models

Lam and Meyer

Hidden Markov models (HMMs) and their variants are widely used in Bioinformatics applications that analyze and compare biological sequences. Designing a novel application requires the insight of a human expert to define the model’s architecture. The implementation of prediction algorithms and algorithms to train the model’s parameters, however, can be a time-consuming and error-prone task. We here present HMMCONVERTER, a software package for setting up probabilistic HMMs, pair-HMMs as well as generalized HMMsand pair-HMMs. The user defines the model itself and the algorithms to be used via an XML file which is then directly translated into efficient C++ code. The software package provides linear-memory prediction algorithms, such as the Hirschberg algorithm, banding and the integration of prior probabilities and is the first to present computationally efficient linear-memory algorithms for automatic parameter training. Users of HMMCONVERTER canthus set up complex applications with a minimum of effort and also perform parameter training and data analyses for large data sets.

the other was published in Bioinformatics in 2007:

HMMoC – a compiler for hidden Markov models

Lunter

Hidden Markov models are widely applied within computational biology. The large data sets and complex models involved demand optimized implementations, while efficient exploration of model space requires rapid prototyping. These requirements are not met by existing solutions, and hand-coding is time-consuming and error-prone. Here, I present a compiler that takes over the mechanical process of implementing HMM algorithms, by translating high-level XML descriptions into efficient C++ implementations. The compiler is highly customizable, produces efficient and bug-free code, and includes several optimizations.

Both papers describe compilers that generate C++ implementations of hidden Markov model algorithms from XML specifications, and really they are very similar.

The basic HMM algorithms are quite straightforward to implement, but if you want more complex models such as pair-HMMs or generalized HMMs there is a tad more complications to deal with, and if you need to optimize the algorithms in either runtime or memory usage there are some more complex algorithms you can use such as “banding” – implemented in both HMMoC and HMMConverter – that risk giving sub-optimal results but at a much reduced running time and memory consumption, or the Hirschberg algorithm – only implemented in HMMConverter as far as I can see – that exchanges a doubling in running time for a much reduced memory consumption.

Implementing such extra algorithms is not conceptually hard, but can be quite tedious and error prone, so it makes good sense to have code generators building the algorithms for you.  That is exactly what these tools do.

At a bird’s eye view, the tools are very similar.  You specify the HMM in an XML file (a specification language that I personally don’t like that much, but that is of course very subjective) and the tools then generate the algorithms you ask them to, output as C++ code.

HMMoC provides a number of handles for you to add your own C++ code to the generated code; I am not sure if HMMConverter does the same, but on the other hand HMMConverter provides handles for various constraints on the parameters so it might be easier to re-parameterize models made with that.

Another cool feature unique to HMMConverter is priors on sequence annotation.  You can provide an annotation to the input sequence(s) that is then incorporated in the emission probabilities.  The prior is really on hidden states, but incorporating them into the emission probabilities has exactly the effect you want from them: they weight the posterior probabilities of the hidden states along the input.

To deal with numerical issues, HMMConverter works in log-space while HMMoC uses something called “extended-exponent real numbers”.  Working in log-space can be really slow for the Forward and Backward algorithms, since you have to switch in and out of log-space to deal with sums of probabilities (the Viterbi algorithm doesn’t have this problem, so there the log-space solution is pretty fast).

Unfortunately, there isn’t any comparison between the execution times of algorithms generated with the two tools in the new paper, so I don’t know how much this matters.  In the HMM library I am developing with Andreas we found that the log-solution was very slow, though, and therefore we use a re-scaling approach instead.

I would love to see a comparison of the runtime efficiency between the approaches, but just not quite enough to go and do it myself right now…

  • Lam, T., & Meyer, I. (2009). HMMCONVERTER 1.0: a toolbox for hidden Markov models Nucleic Acids Research DOI: 10.1093/nar/gkp662
  • Lunter, G. (2007). HMMoC a compiler for hidden Markov models Bioinformatics, 23 (18), 2485-2487 DOI: 10.1093/bioinformatics/btm350

261-289=-28

Tags: ,

Leave a Reply