Ok, so the problem I was working on a few days ago, where I needed interval arithmetic, concerns hidden Markov models. We use these for genome analysis because they are pretty fast and thus scale to analysing whole genomes. We still need to split the genomes into segments to parallelise our analysis, but still, it is pretty fast; at least a lot faster than MCMC methods or such.
Anyway, when we split the genomes into segments there is a tradeoff. The larger segments, the more data we have for each analysis job, but of course each job also takes longer to run.
It is not the only trade off, though. There is also a numerical issue: yes, the longer the segments the more observations we have, but the HMM algorithms have to do more arithmetic operations on the data, and numerical errors tend to accumulate with those.
I have no idea about how fast this happens, but I set out to find out, so I implemented the Forward algorithm with a parameterised type, so I could chuck in Boost’s interval type (with either floats, doubles or long doubles as the underlying type).
I use the “scaling” method for dealing with underflows (if you implement the Forward algorithm directly “from the book” you get an underflow from multiplying numbers between 0 and 1 pretty damn fast). That means that in each column in the forward table, I calculate the column sum, call that sum the “scale” for that column, and then normalise all the values in the column so the column sums to 1.
As far as I understand, that approach is more numerical stable than working in log space, where you have to use various tricks to handle sums. It is certainly a lot faster than translating numbers to and from log space all the time.
Anyway, I ran the forward algorithm on a 5 state HMM with an alphabet size of 4, with a random transition and emission matrix. For each column in the table I can then look at the width of the intervals in the interval representation of my numbers. The intervals are the intervals where the true value, had we infinite size floats to work with, would be. The wider the intervals, the less certain we are about the true type if we just used plain floating point arithmetic.
First, I looked at the “scale” values. The results are show below (notice that the y-axis is in log scale):

Initially the intervals are small, but then they just explode. Not good for trusting the results; if the interval of possible values ranges over orders of magnitude, the true number can be just about anything! Depending on the underlying floating point type, the intervals explode early or late, but even for long doubles the explosion happens before 100 columns. So the numerical accuracy can only be trusted for sequences shorter than 100!
Damn!
I also looked at the smallest and largest interval in each column. The plot is below (shortest intervals in green, largest in red):

We see the explosion again, so it is pretty hard to distinguish between the smallest and largest intervals, so I have also plotted the interval widths for the prefixes before the explosion:

Before the explosion, it seems the nummerical error accumulates linearly (as we probably would expect). There doesn’t seem to be much difference between the widest and the smallest intervals, though.
I expected something like the last plot for the entire range, I must admit. Of course, I know that the accuracy is only good up to a point, because at some point you start adding very large numbers to very small (between 0 and 1) numbers, so after that point you just stick to the large number, but still…
I didn’t expect the explosion in inaccuracy that early. Dealing with sequences of length less than 100 is just not feasible, and certainly not what anyone is doing.
Of course, the intervals here are worst case behaviour. They are inclusive, so they capture all possible values that the true value could be in. They say nothing about how close the values we get from working with finite size floating point values are to the true real value numbers. That is probably what saves us in practise.
Still, I think I need to look into this in more details. Who knows, there might even be a paper in this study somewhere…
–
102-127=-25