Calculation of maximum floating point error


pow(x,y) is typically computed as e^(y*log(x)). Math libraries generally compute log(x) in extended (e.g., quad) precision, which is time consuming, in order to avoid loss of precision when forming y*log(x): any error in that product is magnified by the final exponentiation e^(y*log(x)). Now suppose I compute pow(x,y) as follows.

    double pow(double x, double y) {
        return exp(y * log(x)); // without any quad-precision multiplication
    }

What would be the maximum ULP error of this function? I do know that the IEEE-754 standard requires the basic operations to be correctly rounded, i.e., to have at most 0.5 ULP of error (a relative error of 0.5*2^(-52) = 2^(-53) for doubles). So if my operation y*log(x) suffers a 0.5 ULP error, how do I compute the largest possible ULP error of e^(y*log(x))?
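To make this concrete, here is a small C sketch of the comparison (the helper names naive_pow and ulp_distance are hypothetical names of mine; the bit-pattern trick is valid only for finite doubles of the same sign, and it treats the library pow as the reference, which is itself not exact):

    #include <math.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Number of representable doubles between a and b; works because
       bit patterns of same-sign finite doubles are ordered like integers. */
    static uint64_t ulp_distance(double a, double b) {
        uint64_t ia, ib;
        memcpy(&ia, &a, sizeof ia);
        memcpy(&ib, &b, sizeof ib);
        return ia > ib ? ia - ib : ib - ia;
    }

    /* The formulation from the question: no extended precision anywhere. */
    static double naive_pow(double x, double y) {
        return exp(y * log(x));
    }

    int main(void) {
        double x = 1.00000001, y = 7.0e10;   /* y*log(x) lands near 700 */
        double lib = pow(x, y), naive = naive_pow(x, y);
        printf("library pow: %.17g\n", lib);
        printf("naive pow:   %.17g\n", naive);
        printf("ULP distance: %llu\n",
               (unsigned long long)ulp_distance(lib, naive));
        return 0;
    }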


Agreed that the computation of pow(x,y) is fairly complicated: algorithms generally compute log(x) in a higher precision, and the multiplication of y by log(x) is not straightforward either. Since the ULP error depends on y*log(x), the maximum error would occur at the largest value of y*log(x) for which e^(y*log(x)) is not infinity. Right? How do I compute the number of ULP for such a case? At most how many bits of the double-precision mantissa would differ from the actual value in the case of the largest value of y*log(x)?
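Regarding that overflow threshold, a quick sketch (the printed values come from whatever libm is linked, so treat them as approximate):

    #include <math.h>
    #include <float.h>
    #include <stdio.h>

    int main(void) {
        /* exp(z) overflows to infinity once z exceeds log(DBL_MAX). */
        printf("log(DBL_MAX) = %.17g\n", log(DBL_MAX));  /* ~709.78 */
        printf("exp(709.78)  = %.17g\n", exp(709.78));   /* still finite */
        /* 1 ULP in the top binade of double: 2^(1023-52) = 2^971. */
        printf("1 ULP near DBL_MAX = %.17g\n", ldexp(1.0, 971));
        return 0;
    }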


Updated the question. Thanks for all the help!

So a 10-bit difference would result in how much ULP error? I calculated it as

    ULP error = |actual - computed| / 2^(e - (p - 1))

where e is the unbiased exponent of the actual number and p = 53 for double precision. I read that 1 ULP = 2^(e - (p - 1)). Let's assume:

    actual   = 1.79282279439444787915898270592 * 10^308
    computed = 1.79282279439451553814547593293 * 10^308
    error    = |actual - computed| = 6.7659e+294

Now

    1 ULP = 2^(e - (p - 1))
    e = 1023 (exponent of the actual number, after subtracting the bias)
    1 ULP = 2^(1023 - (53 - 1)) = 2^971
    ULP error = error / 1 ULP ≈ 339

Is this correct?
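For what it's worth, the arithmetic can be checked mechanically; a minimal C sketch (strtod rounds each decimal string to the nearest double, which perturbs the result by well under an ULP):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* The two values from the worked example above. */
        double actual   = strtod("1.79282279439444787915898270592e308", NULL);
        double computed = strtod("1.79282279439451553814547593293e308", NULL);
        double one_ulp  = ldexp(1.0, 1023 - 52);   /* 2^971 */
        printf("ULP error = %.4g\n", fabs(actual - computed) / one_ulp);
        return 0;
    }

This should print roughly 339, matching the hand calculation above.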


There are 2 answers below.

Answer 1:

I do not have time for a fuller answer now, but here is something to start:

Suppose the error in each individual operation (exp, *, and log) is at most ½ ULP. This means the computed value of log(x) is exactly log(x)•(1+e0) for some e0 satisfying −ε/2 ≤ e0 ≤ ε/2, where ε is the ULP of 1 (2^−52 in the IEEE-754 basic 64-bit binary format). The computed value of y*log(x) is then exactly y•log(x)•(1+e0)•(1+e1) for some −ε/2 ≤ e1 ≤ ε/2. And the computed value of exp(y*log(x)) is exactly e^(y•log(x)•(1+e0)•(1+e1))•(1+e2) for some −ε/2 ≤ e2 ≤ ε/2. The error is of course e^(y•log(x)•(1+e0)•(1+e1))•(1+e2) − e^(y•log(x)).

Then you can start analyzing that expression for maximum and minimum values over the possible values of e0, e1, and e2. If 0 < y and 1 < x, the greatest error occurs when e0 = e1 = e2 = ε/2.
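A brute-force way to explore that is to evaluate the expression at all eight corners e0, e1, e2 = ±ε/2; a sketch (the sample x and y are arbitrary, and the evaluation is itself done in double, so the printed error is only indicative):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const double eps = 0x1p-52;        /* ε, the ULP of 1 */
        const double x = 1.5, y = 800.0;   /* arbitrary sample with x > 1, y > 0 */
        const double exact = exp(y * log(x));
        double worst = 0;
        for (int i = 0; i < 8; i++) {
            double e0 = (i & 1) ? eps / 2 : -eps / 2;
            double e1 = (i & 2) ? eps / 2 : -eps / 2;
            double e2 = (i & 4) ? eps / 2 : -eps / 2;
            /* The model above: e^(y*log(x)*(1+e0)*(1+e1))*(1+e2) - e^(y*log(x)) */
            double err = exp(y * log(x) * (1 + e0) * (1 + e1)) * (1 + e2) - exact;
            if (fabs(err) > fabs(worst)) worst = err;
        }
        printf("worst model error: %.3g (relative %.3g)\n", worst, worst / exact);
        return 0;
    }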

Note that common implementations of exp and log are not usually correctly rounded. Errors of several ULP may be common. So you should not assume a ½ ULP bound unless the routine is documented to be correctly rounded.

Answer 2:

Regarding the maximum floating point error of exp(y*log(x)): it depends on the value of x.

x == 0: The error is infinite.

x < 0: OP's pow() fails (log(x) is NaN for negative x), yet when y has no fractional part, a proper pow() should complete.

x > 0: Let z = y*log(x). The error in that calculation may be quite reasonable, just a few ULPs or less. Let us assume 1 ULP, or perhaps z_error = z*2^(-53), given the usual double.

Yet consider its impact: exp(z + z_error) = exp(z)*exp(z_error).

With z about 710 (exp(709.78) is about DBL_MAX), let us consider that case, since much larger values simply result in infinity with OP's pow().

exp(some_small_value) is about 1 + some_small_value, so exp(z_error) is about 1 + 710*2^(-53). Thus the final pow() can lose about log2(710), i.e. 9 to 10 bits of precision, plus a few more from the exp and log implementations themselves.
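To see that magnification directly, one can nudge z by a single ULP near the overflow threshold and count how far exp(z) moves; a sketch (ulp_distance is the same hypothetical bit-pattern helper as in the question section, valid for finite positive doubles):

    #include <math.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static uint64_t ulp_distance(double a, double b) {
        uint64_t ia, ib;
        memcpy(&ia, &a, sizeof ia);
        memcpy(&ib, &b, sizeof ib);
        return ia > ib ? ia - ib : ib - ia;
    }

    int main(void) {
        double z  = 709.78;                   /* near log(DBL_MAX) */
        double z2 = nextafter(z, INFINITY);   /* z plus 1 ULP */
        printf("exp(z) moves %llu ULPs when z moves 1 ULP\n",
               (unsigned long long)ulp_distance(exp(z), exp(z2)));
        return 0;
    }

Since 1 ULP of z here is about 2^(-43), the result moves on the order of a thousand ULPs, consistent with the estimate above.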