This just isn't right
So, I'm working on some genome scale analysis and am a bit worried about numerical precision issues. After all, even relatively stable operations might run into problems after milions of additions and multiplications.
So I figured I could try using interval arithmetic to get a feeling for the uncertainty in the numerical values.
Boost, as so often, is helpful here and has a library for interval arithmetic.
Great, I thought, I'll just chuck in the interval type in my algorithm (that is parameterised with the numerical type anyway so I can use either float, double or long double) and see how it goes.
Well, no luck. All my intervals ends up being a single value, and that just isn't right.
A simple example like this, that that adds 1000 random floats, results in an empty interval:
#include <iostream>
#include <boost/numeric/interval.hpp>
using namespace boost::numeric;
using namespace interval_lib;
#include <ctime>
#include <boost/random.hpp>
int main (int argc, char * const argv[])
{
typedef interval<double> I;
static boost::mt19937 rng(std::time(0));
static boost::uniform_real<> uni_dist(0,1);
static boost::variate_generator<boost::mt19937&, boost::uniform_real<> > random_01(rng, uni_dist);
I x = 0.0;
for (int i = 0; i < 1000; ++i) {
x += random_01();
}
std::cout << '[' << x.lower() << ',' << x.upper()
<< "](" << (x.upper() - x.lower()) << ')'
<< std::endl;
return 0;
}
There is no way there is no precision lost in adding 1000 values!
Clearly I am doing something wrong, but what?
--
100-122=-22
April 10th, 2009 at 4:56 pm
That could be a useful library - especially if it worked :-)
Have you disabled SSE2 instructions ("-mfpmath=387" or something)?
April 10th, 2009 at 6:01 pm
I haven't explicitly enabled or disabled SSE2, but yes I noticed in the documentation that the library doesn't work with SSE2. I will try to see if I can explicitly disable it and see how it works...
Not today or tomorrow, though, I left the code on my iMac and I'm out of town until Sunday :-S
April 10th, 2009 at 8:45 pm
I tried your program on VC C++, and found that that I got single value intervals just like you.
However changing the loop to:
x += random_01()/3.0;
or just:
x += 1/3.0;
leads to the expected result: the interval size increases nicely as i increases.
I think the lowest bits in your random numbers are zeroes. (Actually letting i<10000000 does produce error intervals).
Btw, I got the same results both with and without SSE, so I do not think it matters (even though their docs state it does).
April 10th, 2009 at 8:51 pm
Hmm, why should it matter if the lowest bits are zeros?
They are supposed to be random numbers between 0 and 1, but at this exact moment it strikes me that I never actually tested if they were :-S
Anyway, I don't think that should matter for the numerical accuracy of the final number. The possible values should still get wider and wider...
April 10th, 2009 at 9:14 pm
Hmm, just tried with x += 1.0/3.0; and I still get single point intervals :-(
April 10th, 2009 at 10:26 pm
The interval addition as I understand it, works something like this:
[a,b]+c = [round_down(a+c), round_up(b+c)]
If the lowest bits in the mantissa are zeroes (and the exponent the same), there is no loss (and no rounding) in the addition, and the interval will not be expanded, right?
This is the reason, I choose x+=1/3.0 - which adds rounding errors. If I run the program with x+=1/2.0 there is no accumulated error (because 1/2 can be represented exact as a binary number).
April 10th, 2009 at 10:45 pm
Hmm, I see your point.
Using 1/3 still doesn't have any effect for me, though, but I'll take a look at that tomorrow.
April 11th, 2009 at 7:42 am
On the other hand, even if both numbers have a fine representation, and within the accuracy of the floating point number used, I would still expect the interval to increase when adding a small number to a large number, since this addition is throwing away most of the digits of the small number.
Anyway, since even
gives me a single point interval, it is clear that it isn't working for me. One tenth is not representable in binary for any finite size float.
April 11th, 2009 at 8:46 am
Yes, exactly - if the exponent of the floats differs, you will loose precision when shifting the mantissa. And you do get rounding errors in your original example after running 10 million iterations (when x has grown large relative to the random number).
Btw, your example (1/10.0) gives an interval width of 1.38778e-17 on my PC. (Those Macs...)
April 11th, 2009 at 10:33 am
Hey, setting the flag -mfpmath=387 does work. Now I am getting intervals as well on my Mac.
(I know I said I would try it later and all, but I had some time right now before heading off for the big family Easter lunch... The actual project I am working on will have to wait, though).
Thanks for the hint about the flag, Mikael!
April 12th, 2009 at 7:16 pm
Oh - guess their docs were right after all :-)
Btw, I think you RNG has exactly 32 bit resolution: a double has a mantissa of 53 bits, thus the 21 lowest bits would then be blank. After 10 millions iterations x becomes ~ 2^21, explaining why numerical errors are first introduced at this point.
April 12th, 2009 at 7:24 pm
You are probably right.
Anyway, for the application I actually have in mind, the random numbers are not so important; those were just for testing.
I'll keep you posted on the real problem :)