Discussion:
[Mingw-w64-public] Math library discrepancies that surprised me.
Arthur Norman
2011-04-29 13:54:28 UTC
Permalink
The following short program compares the value of pow(x, 0.5) to that
returned by sqrt(x) for a range of values of x, and prints the first
case where the two disagree. With the mingw family of compilers I had
expected the Microsoft C library to be in use.

What I appear to see is
   linux (ubuntu 11.04 64-bit tested)   no output
   Mac   (snow leopard)                 no output
   gcc   (cygwin)                       no output
   gcc-3 -mno-cygwin         )
   i686-w64-mingw32-gcc      ) all different from each other
   x86_64-w64-mingw32-gcc    ) but all find discrepancies.

It is a totally fair cop if the Microsoft C library gives different
results in the last bit for floating-point elementary functions than the
GNU libraries do, but is there an obvious reason why the three mingw
variants all give different answers? This showed up in regression testing
something much larger across platforms.

Arthur

====================
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    double a, b, c;
    int i;
    for (i = 0, a = 1.0; i < 10000000; i++)
    {
        a *= 1.00000001;
        b = sqrt(a); c = pow(a, 0.5);
        if (b == c) continue;
        printf("%.17g %.17g %.17g\n", a, b, c);
        return 0;
    }
    return 0;
}
====================

***@panamint temp
$ gcc expt.c -o expt
***@panamint temp
$ ./expt
***@panamint temp
$ gcc-3 -mno-cygwin expt.c -o expt
***@panamint temp
$ ./expt
1.0241210164238141 1.011988644414459 1.0119886444144588
***@panamint temp
$ i686-w64-mingw32-gcc expt.c -o expt
***@panamint temp
$ ./expt
1.0004603659307831 1.000230156479389 1.0002301564793892
***@panamint temp
$ x86_64-w64-mingw32-gcc expt.c -o expt
***@panamint temp
$ ./expt
1.0000179101601867 1.0000089550399969 1.0000089550399971
***@panamint temp
==============================================================
K. Frank
2011-04-29 19:55:52 UTC
Permalink
Hello Arthur!

(I've taken the liberty of copying this to the mingw list, as well.)
Post by Arthur Norman
The following short program compares the value of pow(x, 0.5) to that
returned by sqrt(x) for a range of values of x, and prints out any first
case where the two do not agree.
Well, this is unfortunate. Ideally they should agree, and as a
quality-of-implementation issue, I think we should be able to expect
them to.
Post by Arthur Norman
With the mingw family of compilers I had
expected the Microsoft C library to be in use.
What I appear to see is
   linux (ubuntu 11.04 64-bit tested)   no output
   Mac   (snow leopard)                 no output
   gcc   (cygwin)                       no output
   gcc-3 -mno-cygwin         )
   i686-w64-mingw32-gcc      ) all different from each other
   x86_64-w64-mingw32-gcc    ) but all find discrepancies.
Very interesting.
Post by Arthur Norman
It is a totally fair cop if the Microsoft C library gives different
results in the last bit for floating point elementary functions to gnu
libraries,
Well, from the mingw perspective, if the microsoft run-time library is
flawed in this way, then it's a legitimate cop for mingw not to fix it.

By the way, could someone confirm that mingw does use msvcrt for
sqrt and pow?
Post by Arthur Norman
but is there an obvious reason why the three mingw varients
all give different answers.
If I understand your point correctly, if the problem is with msvcrt,
then the three mingw variants should all give the same flawed results.
Post by Arthur Norman
This showed up in regression testing something
much larger across platforms.
         Arthur
...
As I understand it, IEEE-754 compliance requires sqrt to be calculated
"correctly", i.e., to return the floating-point number closest to the exact
mathematical value. (The C++ standard, I believe, does not, however,
require IEEE-754 compliance.)

I think, however, that IEEE-754 does not require all mathematical functions,
and, in particular, pow, to be "correct" in the above sense.

It would be interesting to print out in binary (or hex) the results for a few of
the values that lead to discrepancies with both mingw and a linux-based
gcc, and see whether the problem is in sqrt (for which IEEE-754 specifies
the result) or in pow (for which I think IEEE-754 permits some leeway).

In my opinion:

mingw / msvcrt should be IEEE-754 compliant. Of course, if msvcrt isn't,
then it's not mingw's fault.

sqrt (x) and pow (x, 0.5) ought to give the same result (even if not required
to by IEEE-754).

Browsing some gcc lists, I did see some comments that suggest that
gcc (on linux) does try to transform pow (x, 0.5) to sqrt (x). This would
make sqrt and pow consistent (whether or not technically correct). Maybe
gcc / linux pow and msvcrt pow are both not quite exact, but this is hidden
in your test, because gcc / linux replaces pow (x, 0.5) with sqrt (x), and
msvcrt doesn't. (Just speculation.)

It is odd, however, that the various mingw's don't agree, and this speaks
to a possible bug in mingw.

Thanks.


K. Frank
Post by Arthur Norman
...
====================
#include <stdio.h>
#include <math.h>
int main(int argc, char *argv[])
{
double a, b, c;
int i;
for (i=0, a=1.0; i<10000000; i++)
{ a *= 1.00000001;
b = sqrt(a); c = pow(a, 0.5);
if (b == c) continue;
printf("%.17g %.17g %.17g\n", a, b, c);
return 0;
}
return 0;
}
====================
$ gcc expt.c -o expt
$ ./expt
$ gcc-3 -mno-cygwin expt.c -o expt
$ ./expt
1.0241210164238141 1.011988644414459 1.0119886444144588
$ i686-w64-mingw32-gcc expt.c -o expt
$ ./expt
1.0004603659307831 1.000230156479389 1.0002301564793892
$ x86_64-w64-mingw32-gcc expt.c -o expt
$ ./expt
1.0000179101601867 1.0000089550399969 1.0000089550399971
==============================================================
James K Beard
2011-04-29 21:57:49 UTC
Permalink
The standard process for computing sqrt(x) has been Newton's method.
This means that an approximation to sqrt(x) is found by whatever means,
and the iteration

    result = (result_old^2 + x) / (2*result_old)

is repeated until result == result_old. A check isn't necessary because a
maximum iteration count can be obtained from the worst-case accuracy of the
initial approximation. Thus there isn't any reason that the libraries should
vary in their results.

A bit or so may result from the non-algebraically-reduced form of the
iteration,
result = result_old + (x - result_old^2)/(2*result_old)
which some of little faith use because the second term can be tested for
zero - or near zero. Don't go there. In any case this non-reduced form is
less accurate.

All this works for complex numbers, too. Watch out for the sign convention
for complex numbers, though.

James K Beard

K. Frank
2011-04-30 00:51:00 UTC
Permalink
Hi Greg!
Post by K. Frank
By the way, could someone confirm that mingw does use msvcrt for
sqrt and pow?
http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/mingwex/math/?cvsroot=src
Looks to me like sqrt(double) calls into msvcrt. But pow() is a
different story. Danny implemented it in libmingwex because of
some quality-of-implementation problem with msvcrt; then people
complained that the libmingwex version was slower, and I can't
remember where it wound up, but it's all in the archives.
Thanks for that -- good to know.
James K Beard
2011-04-30 01:37:59 UTC
Permalink
The standard process for computing sqrt(x) has been Newton's method.
This means that an approximation to sqrt(x) is found by whatever means
(though it is important to get good accuracy), and the iteration

    result = (result_old^2 + x) / (2*result_old)

is repeated until result == result_old. A check for equality isn't
necessary because a maximum iteration count can be obtained from the
worst-case accuracy of the initial approximation. Thus there isn't any
reason that the libraries should vary in their results.

A bit or so may result from the non-algebraically-reduced form of the
iteration,
result = result_old + (x - result_old^2)/(2*result_old) which some of
little faith use because the second term can be tested for zero - or near
zero. Don't go there. This form is less accurate than the shorter form.

All this works for complex numbers, too. Watch out for the sign convention
for complex numbers, though.

Regarding the log-and-exponentiation approach, x^a = exp(a*log(x)): you
have rounding error in two operations, as well as inherent accuracy limits,
meaning that if you don't have a few guard bits you aren't going to get
full mantissa accuracy. Newton's method will get you within a count or two
of the LSB, and with any guard bits at all will get you full mantissa
accuracy every time. Why? K digits/bits/whatever in x define about 2K
digits/bits/whatever in sqrt(x), while this isn't true for log() and exp().

James K Beard


Greg Chicares
2011-04-29 23:38:02 UTC
Permalink
Post by K. Frank
By the way, could someone confirm that mingw does use msvcrt for
sqrt and pow?
http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/mingwex/math/?cvsroot=src

Looks to me like sqrt(double) calls into msvcrt. But pow() is a
different story. Danny implemented it in libmingwex because of
some quality-of-implementation problem with msvcrt; then people
complained that the libmingwex version was slower, and I can't
remember where it wound up, but it's all in the archives.
Post by K. Frank
sqrt (x) and pow (x, 0.5) ought to give the same result (even if not required
to by IEEE-754).
If they should give exactly the same result always, then the
first thing that needs to be fixed is any standard that allows
them to differ. The language standards are so much looser than
the numerical standard, even now, that I'd rather see C and C++
catch up to IEEE-754-1985 before asking them to go beyond.
Post by K. Frank
Browsing some gcc lists, I did see some comments that suggest that
gcc (on linux) does try to transform pow (x, 0.5) to sqrt (x). This would
make sqrt and pow consistent (whether or not technically correct).
Not so many years ago (perhaps when msvcrt was written), it was
thought that correctly-rounded transcendental functions weren't
feasible in practice, so library authors had a looser attitude
toward exactness. If you have an ideal implementation of sqrt()
but a pow() that's only approximate, should you special-case
the power of exactly 0.5? If you do, then pow() and sqrt() will
be consistent, but then the powers
x^(0.5*(1-epsilon))
x^(0.5 )
x^(0.5*(1+epsilon))
might not be monotone. Some would call that a bad tradeoff.
Kai Tietz
2011-04-30 08:42:09 UTC
Permalink
Post by K. Frank
By the way, could someone confirm that mingw does use msvcrt for
sqrt and pow?
http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/mingwex/math/?cvsroot=src
Looks to me like sqrt(double) calls into msvcrt. But pow() is a
different story. Danny implemented it in libmingwex because of
some quality-of-implementation problem with msvcrt; then people
complained that the libmingwex version was slower, and I can't
remember where it wound up, but it's all in the archives.
Post by K. Frank
sqrt (x) and pow (x, 0.5) ought to give the same result (even if not required
to by IEEE-754).
If they should give exactly the same result always, then the
first thing that needs to be fixed is any standard that allows
them to differ. The language standards are so much looser than
the numerical standard, even now, that I'd rather see C and C++
catch up to IEEE-754-1985 before asking them to go beyond.
Post by K. Frank
Browsing some gcc lists, I did see some comments that suggest that
gcc (on linux) does try to transform pow (x, 0.5) to sqrt (x).  This would
make sqrt and pow consistent (whether or not technically correct).
Not so many years ago (perhaps when msvcrt was written), it was
thought that correctly-rounded transcendental functions weren't
feasible in practice, so library authors had a looser attitude
toward exactness. If you have an ideal implementation of sqrt()
but a pow() that's only approximate, should you special-case
the power of exactly 0.5? If you do, then pow() and sqrt() will
be consistent, but then the powers
 x^(0.5*(1-epsilon))
 x^(0.5            )
 x^(0.5*(1+epsilon))
might not be monotone. Some would call that a bad tradeoff.
Well, I put some effort into the log, exp, and pow implementations, and
improved some other base math functions in mingw-w64's math library, in
two respects: a) make those functions behave as the ISO C99 specification
says in corner cases, and b) improve accuracy relative to the GMP library
that gcc uses internally, which is where we noticed result differences.

The following example illustrates the issues pretty well:

#include <stdio.h>
#include <math.h>

double abc[3][2] = { { 2.2, 3.1 }, { 2.0, 3.0 }, { 4.0, 0.5 } };
long double ab[3][2] = { { 2.2L, 3.1L }, { 2.0L, 3.0L }, { 4.0L, 0.5L } };
int main()
{
  double r[3];
  long double rl[3];
  int i;
  for (i = 0; i < 3; i++)
    __mingw_printf ("%.20F^%.20F = %.20F\n", abc[i][0], abc[i][1],
                    (r[i] = pow (abc[i][0], abc[i][1])));
  __mingw_printf ("%.20F %.20F %.20F\n",
                  pow (2.2, 3.1), pow (2.0, 3.0), pow (4.0, 0.5));
  r[0] -= pow (2.2, 3.1);
  r[1] -= pow (2.0, 3.0);
  r[2] -= pow (4.0, 0.5);
  __mingw_printf ("%.20F %.20F %.20F\n", r[0], r[1], r[2]);

  for (i = 0; i < 3; i++)
    __mingw_printf ("%.20LF^%.20LF = %.20LF\n", ab[i][0], ab[i][1],
                    (rl[i] = powl (ab[i][0], ab[i][1])));
  __mingw_printf ("%.20LF %.20LF %.20LF\n",
                  powl (2.2L, 3.1L), powl (2.0L, 3.0L), powl (4.0L, 0.5L));
  rl[0] -= powl (2.2L, 3.1L);
  rl[1] -= powl (2.0L, 3.0L);
  rl[2] -= powl (4.0L, 0.5L);
  __mingw_printf ("%.20LF %.20LF %.20LF\n", rl[0], rl[1], rl[2]);
  return 1;
}

The interesting point here is that gcc uses the GMP library to evaluate
constant math calculations at compile time. For more complex cases, it
uses the CRT's math. Because of this, it is important for the IEEE 754
floating point of a gcc-based runtime to be compatible with GMP's
results.

Btw, the new pow, log, and exp variants are slightly faster than the
variants in msvcrt, but this is more or less just a side effect.

I admit that it might not be obvious to users why results differ here,
but for a gcc-based toolchain we need to play nicely with gcc's internal
assumptions.

Regards,
Kai
Kai Tietz
2011-04-30 09:51:32 UTC
Permalink
On closer look, for long double we can have rounding issues at long
double precision. I see that for powl (0.01L, 0.5L) we have a rounding
difference from sqrtl (0.01L). Interestingly, not for the float or double
types. So we might need to special-case y == 0.5 in pow to solve this.

Kai
K. Frank
2011-04-30 15:15:11 UTC
Permalink
Hello Kai!
Post by Kai Tietz
Post by Kai Tietz
Post by K. Frank
By the way, could someone confirm that mingw does use msvcrt for
sqrt and pow?
http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/mingwex/math/?cvsroot=src
Looks to me like sqrt(double) calls into msvcrt. But pow() is a
different story. Danny implemented it in libmingwex because of
some quality-of-implementation problem with msvcrt; then people
complained that the libmingwex version was slower, and I can't
remember where it wound up, but it's all in the archives.
Post by K. Frank
sqrt (x) and pow (x, 0.5) ought to give the same result (even if not required
to by IEEE-754).
...
Not so many years ago (perhaps when msvcrt was written), it was
thought that correctly-rounded transcendental functions weren't
feasible in practice, so library authors had a looser attitude
toward exactness. If you have an ideal implementation of sqrt()
but a pow() that's only approximate, should you special-case
the power of exactly 0.5? If you do, then pow() and sqrt() will
be consistent, but then the powers
 x^(0.5*(1-epsilon))
 x^(0.5            )
 x^(0.5*(1+epsilon))
might not be monotone. Some would call that a bad tradeoff.
Well, I spent some efforts into log, exp, and pow implementation. Also
some other base-math functions I improved in mingw-w64's math library
in two terms. a) Make those functions behave as ISO-C99 specification
tells in corner-cases. and b) Improve accuracy in terms of gcc's
internal used gmp. As we noticed here result differences.
#include <stdio.h>
#include <math.h>
double abc[3][2] = { { 2.2, 3.1 }, { 2.0, 3.0 }, { 4.0, 0.5 } };
long double ab[3][2] = { { 2.2L, 3.1L }, { 2.0L, 3.0L }, { 4.0L, 0.5L } };
int main()
{
 double r[3];
 long double rl[3];
 int i;
 for (i = 0; i < 3; i++)
 __mingw_printf ("%.20F^%.20F = %.20F\n", abc[i][0], abc[i][1],
        (r[i] = pow (abc[i][0], abc[i][1])));
 __mingw_printf ("%.20F %.20F %.20F\n", pow (2.2, 3.1), pow (2.0,
3.0), pow (4.0, 0.5));
 r[0] -= pow (2.2, 3.1);
 r[1] -= pow (2.0, 3.0);
 r[2] -= pow (4.0, 0.5);
 __mingw_printf ("%.20F %.20F %.20F\n", r[0], r[1], r[2]);
 for (i = 0; i < 3; i++)
 __mingw_printf ("%.20LF^%.20LF = %.20LF\n", ab[i][0], ab[i][1],
        (rl[i] = powl (ab[i][0], ab[i][1])));
 __mingw_printf ("%.20LF %.20LF %.20LF\n",
                        powl (2.2L, 3.1L), powl (2.0L, 3.0L) , powl
(4.0L, 0.5L));
 rl[0] -= powl (2.2L, 3.1L);
 rl[1] -= powl (2.0L, 3.0L);
 rl[2] -= powl (4.0L, 0.5L);
 __mingw_printf ("%.20LF %.20LF %.20LF\n", rl[0], rl[1], rl[2]);
 return 1;
}
Do I understand your point correctly that you are comparing gcc's
compile-time and run-time values for pow?

For the record, when I run this on a mingw64 build:

gcc (GCC) 4.5.2 20101002 (prerelease) [svn/rev.164902 - mingw-w64/oz]

I get:

C:\>gcc -o pow_test pow_test.c

C:\>pow_test
2.20000000000000017764^3.10000000000000008882 = 11.52153412678571875460
2.00000000000000000000^3.00000000000000000000 = 8.00000000000000000000
4.00000000000000000000^0.50000000000000000000 = 2.00000000000000000000
11.52153412678571697825 8.00000000000000000000 2.00000000000000000000
0.00000000000000177636 0.00000000000000000000 0.00000000000000000000
2.20000000000000000004^3.09999999999999999991 = 11.52153412678571414718
2.00000000000000000000^3.00000000000000000000 = 8.00000000000000000000
4.00000000000000000000^0.50000000000000000000 = 2.00000000000000000000
11.52153412678571414718 8.00000000000000000000 2.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000

As I understand it, this means I am seeing a small difference between
compile and run-time evaluation only for powl (2.2, 3.1), i.e., the
long-double case.
Post by Kai Tietz
Post by Kai Tietz
The interesting issue is here that gcc uses for constant
math-calculations gmp-library to get result. For more complex
variants, it uses crt's math.  By this it is important for a gcc based
runtime to be in the IEEE 754 floating point compatible to gmp's
results.
Yes, that's a good point. I hadn't considered the issue of compile-time
evaluation, and, ideally, it should agree with run-time evaluation, adding
yet another constraint to our wish list.

Furthermore, if we take the idea of a cross-compiler seriously, we
would also like the compile-time evaluations of a cross-compiler and
native compiler to agree.

Hmm...

So, we would like sqrt (x) and pow (x, 0.5) to agree. We would like
compile-time and run-time evaluations to agree. We would like
cross-compilers and native compilers to agree.

Taken at face value, this would seem to preclude using the platform
library and platform hardware to compute things like sqrt and pow
(unless we know they comply with some agreed-upon standard,
such as IEEE-754).

Just to make the reasoning explicit, if I perform my run-time evaluation
on platform A, e.g., using platform A's FPU, then a B-->A cross-compiler
running on platform B won't have platform A's FPU available to perform
consistent compile-time evaluations.

So you either give up some of the above-wished-for consistency, you
forgo the potential performance benefits of using the platform's library
and / or hardware, or you get everybody to use a common standard.
Post by Kai Tietz
Post by Kai Tietz
...
I admit that this might be not that obvious to users, why results here
are different, but for a gcc based toolchain we need to play nice with
gcc's internal assumptions.
Regards,
Kai
On closer look for long double, we can have rouding issues for long
double precission. I see that powl (0.01L, 0.5L) we have a rounding
difference to sqrtl (0.01L). Interestingly not for float or double
types.  So we we might need to special-case in pow the case for y ==
0.5 to solve this.
Be cognizant, however, of Greg's concern. Forcing pow to agree with
sqrt just for y = 0.5 has the potential to make pow, in a sense, internally
inconsistent. It's a tricky trade-off. (Ideally, both pow and sqrt would
be "exact" and therefore agree with one another automatically.)

By the way, I also see a problem with run-time powl for powl (2.2L, 0.5L).
It disagrees slightly with sqrtl (2.2L) and with the compile-time value
for powl (2.2L, 0.5L).

I modified your test program to compare sqrt with pow (x, 0.5) (and sqrtl
with powl). For the record:

C:\>gcc --version
gcc (GCC) 4.5.2 20101002 (prerelease) [svn/rev.164902 - mingw-w64/oz]

C:\>gcc -o sqrt_pow sqrt_pow.c

C:\>sqrt_pow
1.04880884817015163080, 1.04880884817015163080, 0.00000000000000000000
1.48323969741913264109, 1.48323969741913264109, 0.00000000000000000000
1.81659021245849494619, 1.81659021245849494619, 0.00000000000000000000
1.04880884817015163080 1.04880884817015163080
1.48323969741913264109 1.48323969741913264109
1.81659021245849494619 1.81659021245849494619
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
1.04880884817015154699, 1.04880884817015154699, 0.00000000000000000000
1.48323969741913258980, 1.48323969741913258970, -0.00000000000000000011
1.81659021245849499921, 1.81659021245849499921, 0.00000000000000000000
1.04880884817015154699 1.04880884817015154699
1.48323969741913258980 1.48323969741913258980
1.81659021245849499921 1.81659021245849499921
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 -0.00000000000000000011 0.00000000000000000000

where sqrt_pow.c is:

#include <stdio.h>
#include <math.h>

double abc[3] = { 1.1, 2.2, 3.3 };
long double abcl[3] = { 1.1L, 2.2L, 3.3L };
int main()
{
  double r1[3], r2[3], r3[3];
  long double rl1[3], rl2[3], rl3[3];
  int i;
  for (i = 0; i < 3; i++)
    __mingw_printf ("%.20F, %.20F, %.20F\n",
                    (r1[i] = sqrt (abc[i])),
                    (r2[i] = pow (abc[i], 0.5)),
                    (r3[i] = pow (abc[i], 0.5) - sqrt (abc[i])));
  __mingw_printf ("%.20F %.20F\n", sqrt (1.1), pow (1.1, 0.5));
  __mingw_printf ("%.20F %.20F\n", sqrt (2.2), pow (2.2, 0.5));
  __mingw_printf ("%.20F %.20F\n", sqrt (3.3), pow (3.3, 0.5));
  r1[0] -= sqrt (1.1);
  r1[1] -= sqrt (2.2);
  r1[2] -= sqrt (3.3);
  __mingw_printf ("%.20F %.20F %.20F\n", r1[0], r1[1], r1[2]);
  r2[0] -= pow (1.1, 0.5);
  r2[1] -= pow (2.2, 0.5);
  r2[2] -= pow (3.3, 0.5);
  __mingw_printf ("%.20F %.20F %.20F\n", r2[0], r2[1], r2[2]);

  for (i = 0; i < 3; i++)
    __mingw_printf ("%.20LF, %.20LF, %.20LF\n",
                    (rl1[i] = sqrtl (abcl[i])),
                    (rl2[i] = powl (abcl[i], 0.5L)),
                    (rl3[i] = powl (abcl[i], 0.5L) - sqrtl (abcl[i])));
  __mingw_printf ("%.20LF %.20LF\n", sqrtl (1.1L), powl (1.1L, 0.5L));
  __mingw_printf ("%.20LF %.20LF\n", sqrtl (2.2L), powl (2.2L, 0.5L));
  __mingw_printf ("%.20LF %.20LF\n", sqrtl (3.3L), powl (3.3L, 0.5L));
  rl1[0] -= sqrtl (1.1L);
  rl1[1] -= sqrtl (2.2L);
  rl1[2] -= sqrtl (3.3L);
  __mingw_printf ("%.20LF %.20LF %.20LF\n", rl1[0], rl1[1], rl1[2]);
  rl2[0] -= powl (1.1L, 0.5L);
  rl2[1] -= powl (2.2L, 0.5L);
  rl2[2] -= powl (3.3L, 0.5L);
  __mingw_printf ("%.20LF %.20LF %.20LF\n", rl2[0], rl2[1], rl2[2]);

  return 1;
}
Post by Kai Tietz
Kai
Thanks, Kai, for your insight and explanations.


K. Frank
Kai Tietz
2011-04-30 15:33:52 UTC
Permalink
Post by K. Frank
Hello Kai!
Post by Kai Tietz
Post by Kai Tietz
Post by K. Frank
By the way, could someone confirm that mingw does use msvcrt for
sqrt and pow?
http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/mingwex/math/?cvsroot=src
Looks to me like sqrt(double) calls into msvcrt. But pow() is a
different story. Danny implemented it in libmingwex because of
some quality-of-implementation problem with msvcrt; then people
complained that the libmingwex version was slower, and I can't
remember where it wound up, but it's all in the archives.
Post by K. Frank
sqrt (x) and pow (x, 0.5) ought to give the same result (even if not required
to by IEEE-754).
...
Not so many years ago (perhaps when msvcrt was written), it was
thought that correctly-rounded transcendental functions weren't
feasible in practice, so library authors had a looser attitude
toward exactness. If you have an ideal implementation of sqrt()
but a pow() that's only approximate, should you special-case
the power of exactly 0.5? If you do, then pow() and sqrt() will
be consistent, but then the powers
 x^(0.5*(1-epsilon))
 x^(0.5            )
 x^(0.5*(1+epsilon))
might not be monotone. Some would call that a bad tradeoff.
Well, I put some effort into the log, exp, and pow implementations, and also
improved some other base math functions in mingw-w64's math library,
with two goals: a) make those functions behave as the ISO C99 specification
requires in corner cases, and b) improve accuracy relative to the gmp
library gcc uses internally, since we noticed result differences here.
#include <stdio.h>
#include <math.h>
double abc[3][2] = { { 2.2, 3.1 }, { 2.0, 3.0 }, { 4.0, 0.5 } };
long double ab[3][2] = { { 2.2L, 3.1L }, { 2.0L, 3.0L }, { 4.0L, 0.5L } };
int main()
{
  double r[3];
  long double rl[3];
  int i;

  for (i = 0; i < 3; i++)
    __mingw_printf ("%.20F^%.20F = %.20F\n", abc[i][0], abc[i][1],
                    (r[i] = pow (abc[i][0], abc[i][1])));
  __mingw_printf ("%.20F %.20F %.20F\n",
                  pow (2.2, 3.1), pow (2.0, 3.0), pow (4.0, 0.5));
  r[0] -= pow (2.2, 3.1);
  r[1] -= pow (2.0, 3.0);
  r[2] -= pow (4.0, 0.5);
  __mingw_printf ("%.20F %.20F %.20F\n", r[0], r[1], r[2]);

  for (i = 0; i < 3; i++)
    __mingw_printf ("%.20LF^%.20LF = %.20LF\n", ab[i][0], ab[i][1],
                    (rl[i] = powl (ab[i][0], ab[i][1])));
  __mingw_printf ("%.20LF %.20LF %.20LF\n",
                  powl (2.2L, 3.1L), powl (2.0L, 3.0L), powl (4.0L, 0.5L));
  rl[0] -= powl (2.2L, 3.1L);
  rl[1] -= powl (2.0L, 3.0L);
  rl[2] -= powl (4.0L, 0.5L);
  __mingw_printf ("%.20LF %.20LF %.20LF\n", rl[0], rl[1], rl[2]);
  return 1;
}
Do I understand your point correctly that you are comparing gcc's
compile-time and run-time values for pow?
  gcc (GCC) 4.5.2 20101002 (prerelease) [svn/rev.164902 - mingw-w64/oz]
  C:\>gcc -o pow_test pow_test.c
  C:\>pow_test
  2.20000000000000017764^3.10000000000000008882 = 11.52153412678571875460
  2.00000000000000000000^3.00000000000000000000 = 8.00000000000000000000
  4.00000000000000000000^0.50000000000000000000 = 2.00000000000000000000
  11.52153412678571697825 8.00000000000000000000 2.00000000000000000000
  0.00000000000000177636 0.00000000000000000000 0.00000000000000000000
  2.20000000000000000004^3.09999999999999999991 = 11.52153412678571414718
  2.00000000000000000000^3.00000000000000000000 = 8.00000000000000000000
  4.00000000000000000000^0.50000000000000000000 = 2.00000000000000000000
  11.52153412678571414718 8.00000000000000000000 2.00000000000000000000
  0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
As I understand it, this means I am seeing a small difference between
compile and run-time evaluation only for powl (2.2, 3.1), i.e., the
long-double case.
Post by Kai Tietz
Post by Kai Tietz
The interesting issue here is that gcc uses the gmp library to compute
constant math calculations. For more complex variants, it uses the
crt's math. Because of this it is important for a gcc-based runtime
to be IEEE 754 floating-point compatible with gmp's results.
Yes, that's a good point.  I hadn't considered the issue of compile-time
evaluation, and, ideally, it should agree with run-time evaluation, adding
yet another constraint to our wish list.
Furthermore, if we take the idea of a cross-compiler seriously, we
would also like the compile-time evaluations of a cross-compiler and
native compiler to agree.
Hmm...
So, we would like sqrt (x) and pow (x, 0.5) to agree.  We would like
compile-time and run-time evaluations to agree.  We would like
cross-compilers and native compilers to agree.
Taken at face value, this would seem to preclude using the platform
library and platform hardware to compute things like sqrt and pow
(unless we know they comply with some agreed-upon standard,
such as IEEE-754).
Just to make the reasoning explicit, if I perform my run-time evaluation
on platform A, e.g., using platform A's FPU, then a B-->A cross-compiler
running on platform B won't have platform A's FPU available to perform
consistent compile-time evaluations.
So you either give up some of the above-wished-for consistency, you
forgo the potential performance benefits of using the platform's library
and / or hardware, or you get everybody to use a common standard.
Post by Kai Tietz
Post by Kai Tietz
...
I admit that it might not be that obvious to users why results here
are different, but for a gcc-based toolchain we need to play nice with
gcc's internal assumptions.
Regards,
Kai
On closer look at long double, we can have rounding issues at long
double precision. I see that for powl (0.01L, 0.5L) we have a rounding
difference from sqrtl (0.01L). Interestingly, not for the float or double
types. So we might need to special-case y == 0.5 in pow to solve this.
Be, however, cognizant of Greg's concern. Forcing pow to agree with
sqrt just for y = 0.5 has the potential to make pow, in a sense, internally
inconsistent. It's a tricky trade-off. (Ideally, both pow and sqrt would
be "exact" and therefore agree with one another automatically.)
By the way, I also see a problem with run-time powl for powl (2.2L, 0.5L).
It disagrees slightly from sqrtl (2.2L) and from the compile-time value
for powl (2.2L, 0.5L).
I modified your test program to compare sqrt with pow (x, 0.5) (and sqrtl
  C:\>gcc --version
  gcc (GCC) 4.5.2 20101002 (prerelease) [svn/rev.164902 - mingw-w64/oz]
  C:\>gcc -o sqrt_pow sqrt_pow.c
  C:\>sqrt_pow
  1.04880884817015163080, 1.04880884817015163080, 0.00000000000000000000
  1.48323969741913264109, 1.48323969741913264109, 0.00000000000000000000
  1.81659021245849494619, 1.81659021245849494619, 0.00000000000000000000
  1.04880884817015163080 1.04880884817015163080
  1.48323969741913264109 1.48323969741913264109
  1.81659021245849494619 1.81659021245849494619
  0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
  0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
  1.04880884817015154699, 1.04880884817015154699, 0.00000000000000000000
  1.48323969741913258980, 1.48323969741913258970, -0.00000000000000000011
  1.81659021245849499921, 1.81659021245849499921, 0.00000000000000000000
  1.04880884817015154699 1.04880884817015154699
  1.48323969741913258980 1.48323969741913258980
  1.81659021245849499921 1.81659021245849499921
  0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
  0.00000000000000000000 -0.00000000000000000011 0.00000000000000000000
  #include <stdio.h>
  #include <math.h>
  double abc[3] = { 1.1, 2.2, 3.3 };
  long double abcl[3] = { 1.1L, 2.2L, 3.3L };
  int main()
  {
   double r1[3], r2[3], r3[3];
   long double rl1[3], rl2[3], rl3[3];
   int i;
   for (i = 0; i < 3; i++)
   __mingw_printf ("%.20F, %.20F, %.20F\n",
   (r1[i] = sqrt (abc[i])),
   (r2[i] = pow (abc[i], 0.5)),
   (r3[i] = pow (abc[i], 0.5) - sqrt (abc[i])));
   __mingw_printf ("%.20F %.20F\n", sqrt (1.1), pow (1.1, 0.5));
   __mingw_printf ("%.20F %.20F\n", sqrt (2.2), pow (2.2, 0.5));
   __mingw_printf ("%.20F %.20F\n", sqrt (3.3), pow (3.3, 0.5));
   r1[0] -= sqrt (1.1);
   r1[1] -= sqrt (2.2);
   r1[2] -= sqrt (3.3);
   __mingw_printf ("%.20F %.20F %.20F\n", r1[0], r1[1], r1[2]);
   r2[0] -= pow (1.1, 0.5);
   r2[1] -= pow (2.2, 0.5);
   r2[2] -= pow (3.3, 0.5);
   __mingw_printf ("%.20F %.20F %.20F\n", r2[0], r2[1], r2[2]);
   for (i = 0; i < 3; i++)
   __mingw_printf ("%.20LF, %.20LF, %.20LF\n",
   (rl1[i] = sqrtl (abcl[i])),
   (rl2[i] = powl (abcl[i], 0.5L)),
   (rl3[i] = powl (abcl[i], 0.5L) - sqrtl (abcl[i])));
   __mingw_printf ("%.20LF %.20LF\n", sqrtl (1.1L), powl (1.1L, 0.5L));
   __mingw_printf ("%.20LF %.20LF\n", sqrtl (2.2L), powl (2.2L, 0.5L));
   __mingw_printf ("%.20LF %.20LF\n", sqrtl (3.3L), powl (3.3L, 0.5L));
   rl1[0] -= sqrtl (1.1L);
   rl1[1] -= sqrtl (2.2L);
   rl1[2] -= sqrtl (3.3L);
   __mingw_printf ("%.20LF %.20LF %.20LF\n", rl1[0], rl1[1], rl1[2]);
   rl2[0] -= powl (1.1L, 0.5L);
   rl2[1] -= powl (2.2L, 0.5L);
   rl2[2] -= powl (3.3L, 0.5L);
   __mingw_printf ("%.20LF %.20LF %.20LF\n", rl2[0], rl2[1], rl2[2]);
   return 1;
  }
Post by Kai Tietz
Kai
Thanks, Kai, for your insight and explanations.
K. Frank
I added a special case to mingw-w64's pow(f|l) functions for y = 0.5,
and for your adjusted testcase I now get the following results, which
look much better IMHO.

$ ./tst.exe
1.04880884817015163080, 1.04880884817015163080, 0.00000000000000004142
1.48323969741913264109, 1.48323969741913264109, -0.00000000000000000857
1.81659021245849494619, 1.81659021245849494619, -0.00000000000000000412
1.04880884817015163080 1.04880884817015163080
1.48323969741913264109 1.48323969741913264109
1.81659021245849494619 1.81659021245849494619
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
1.04880884817015154699, 1.04880884817015154699, 0.00000000000000000000
1.48323969741913258980, 1.48323969741913258980, 0.00000000000000000000
1.81659021245849499921, 1.81659021245849499921, 0.00000000000000000000
1.04880884817015154699 1.04880884817015154699
1.48323969741913258980 1.48323969741913258980
1.81659021245849499921 1.81659021245849499921
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000

I used both a 32-bit and a 64-bit variant for this, which made no
difference in the output.
The issue here is that the 1.0 branch still uses the inaccurate variant
from Danny, but nowadays trunk uses new implementations of the pow,
sqrt, log, and exp functions, which treat rounding and short-path
calculation the ISO C way (as gcc internally expects such math to
behave). The complex-math part is also completely rewritten on
mingw-w64's trunk version.

IMHO it might be a good idea to replace the classical runtime functions
step by step, so that they do not use msvcrt in the first place. Of
course the import symbols of msvcrt need to remain in the .def exports
(for compatibility with VC-generated objects), but the actual
functions should come directly from mingwex.
Kai Tietz
2011-04-30 15:47:57 UTC
Permalink
Post by Kai Tietz
PS: The differences shown in the 32-bit results aren't caused by
inaccuracy of the math used directly. Instead, they are caused by the
fact that 32-bit code uses x87 FPU registers (which are always 80-bit).
Because of this, the inline difference, computed without a store, shows
small discrepancies for double. By first storing the values and then
printing those stored values, you see a delta of zero for 32-bit as well.

Interestingly, this behavior is different on 64-bit x86_64, as it
always stores results in registers at the declared precision. Because
of this you really have just a 64-bit result for the double type, and
not indirectly an 80-bit one (as shown in the first case). I altered
your testcase a bit in the following lines

...
for (i = 0; i < 3; i++)
  {
    r1[i] = sqrt (abc[i]);
    r2[i] = pow (abc[i], 0.5);
    r3[i] = pow (abc[i], 0.5) - sqrt (abc[i]);
    __mingw_printf ("%.20F, %.20F, %.20F\n", r1[i], r2[i], r3[i]);
  }
...

and get now
$ ./tst.exe
1.04880884817015163080, 1.04880884817015163080, 0.00000000000000000000
1.48323969741913264109, 1.48323969741913264109, 0.00000000000000000000
1.81659021245849494619, 1.81659021245849494619, 0.00000000000000000000
1.04880884817015163080 1.04880884817015163080
1.48323969741913264109 1.48323969741913264109
1.81659021245849494619 1.81659021245849494619
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
1.04880884817015154699, 1.04880884817015154699, 0.00000000000000000000
1.48323969741913258980, 1.48323969741913258980, 0.00000000000000000000
1.81659021245849499921, 1.81659021245849499921, 0.00000000000000000000
1.04880884817015154699 1.04880884817015154699
1.48323969741913258980 1.48323969741913258980
1.81659021245849499921 1.81659021245849499921
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000

as the result. It is interesting that this happens.

Regards,
Kai
K. Frank
2011-04-30 22:12:00 UTC
Permalink
Hello Bob and Kai!
Post by K. Frank
  C:\>gcc -o pow_test pow_test.c
  C:\>pow_test
  2.20000000000000017764^3.10000000000000008882 = 11.52153412678571875460
2.20000000000000017764^3.10000000000000008882 = 11.52153412678571783832
so your result is accurate to only 16 digits.
This is exactly what I would expect (short of looking at the actual
binary representation). An IEEE double is 64 bits, with a 52-bit
mantissa, which, with the "implicit" bit, gives 53 bits of precision.
This gives just under 16 decimal digits of precision, which is what
you see.
2.20000000000000000004^3.09999999999999999991 = 11.52153412678571414718
2.20000000000000000004^3.09999999999999999991 = 11.52153412678571414732
so your result is accurate to about 20 digits.
As I understand it, gcc defines long double as the x87 80-bit floating point
number. This has a 64-bit mantissa, and I believe that it does not use an
implicit bit, so this gives slightly more than 19 decimal digits of precision,
again consistent with your result.
Bob
Well, that is to be expected, as we used %.20F in the test application.
Specifying %.40F instead, the calculated result is
11.5215341267857141471786519559827866032720, and it is bit-for-bit
identical to what gmp calculates for 80-bit IEEE 754 floating point.
Which means that, for our purposes in gcc, it has the right accuracy.
(Kai's comment seems to confirm that gcc's long double is, indeed,
the 80-bit floating-point number.)
Kai
Thanks to all for the insight.


K. Frank
JonY
2011-05-01 13:28:40 UTC
Permalink
digits for mantissa:64
digit:18, min exp:-16381 (-4931 10th)
max exp: 16384 (4932 10th)
Epsilon:1.0842e-019
Which means that indeed the accuracy of 20 digits after decimal point
are pretty precise here.
I've followed with some unease the way "accuracy" and "precision"
have been used in this thread. To nuance Keith's point below, you can
write down 20 digits after the decimal point, but that does not mean you
have 20-digit accuracy. They are very different things.
You cannot get 20 decimal digits precision after the decimal point, with
80-bit floating point. You cannot even get 20 significant digit
precision -- there is a subtle but extremely important distinction. The
best you can hope for is 19 significant digits.
So long doubles don't do normalisation, and thus have no extra, implicit
mantissa bit? (Have come across something that suggests this as an
historical quirk but it was unclear.) That would set long doubles apart
from singles and doubles in the IEEE754 scheme of things... Interesting.
To add my two pennyworth on pow/sqrt: Although changing the behaviour of
pow() to give the same answer as sqrt() for the special case of 0.5 may
seem like a good idea, this is potentially introducing a discontinuity
into pow() which could be disastrous. Fundamentally, why should you
expect two transcendental quantities calculated in different ways with
finite-precision arithmetic to give exactly the same result? Floats are
not real numbers! In fact some of the comparisons in this thread have
involved differences of a few ulps (units of least precision) - which
looks really good to me.
Good point, does only 0.5 get special treatment? What of 0.25 or 0.499....?

I think those concerned with accuracy and precision over performance
should be using something more specialized like GMP/MPFR arbitrary
precision libraries, with the C library (libmingwex in this case)
providing a "good enough" and "fast enough" implementation.

Slightly OT:
Arbitrary precision square root:
<http://www.embedded-systems.com/98/9802fe2.htm>
JonY
2011-05-03 01:24:07 UTC
Permalink
Post by K. Frank
So, we would like sqrt (x) and pow (x, 0.5) to agree.
The point below about off-topic diversions into precision is valid but
can I reiterate, there is no good reason why x^(0.5) calculated two
different ways should agree since the two different routes will
accumulate different rounding errors. Adding a special case to force
agreement for x^n when n = 0.5 will introduce a discontinuity into the
pow() function, which could be disastrous. The value returned by pow(x,
0.5) might be less accurate compared to sqrt(x), but that is part of the
fundamental issue with using floating-point numbers: they are only ever
approximations. You have to take that fact on board.
Correct, the ways pow (x, 0.5) and sqrt (x) are calculated are different.
Nevertheless it is an issue in gcc. If you write pow (1.1,
0.5) or sqrt (1.1) - which gcc optimizes via gmp to a
precalculated result - you get identical values. But using the
math library results in an - expectable - difference.
So IMHO pow should special-case y = 0.5 here to provide the same
result as sqrt.
I would agree with the others that its a horrible hack to treat "0.5"
special, I don't remember reading anything about requiring pow() results
to agree with sqrt().

Can we at least follow a standard spec like:
<http://pubs.opengroup.org/onlinepubs/009695399/functions/pow.html>

We can remove the special case, right?
Kai Tietz
2011-05-03 09:11:53 UTC
Permalink
Post by JonY
Well, as some people here aren't able to use reply-all, but have wise
suggestions about implementation details they don't implement themselves, I
am a bit annoyed.
Anyway, I see that this special-casing is neither explicitly demanded
nor explicitly forbidden, so I see not much reason for
removing it. It might be that I have missed the place in the spec
where it is forbidden. The sqrt special case has at least one advantage
over the log2/exp2 variant. First, it is faster, as just one FPU
instruction is used to calculate sqrt. Secondly, it provides a better result,
and one compatible with gcc's internally used soft-floating-point library.
As I don't see in the spec that this special-casing shouldn't be done, I
think we will keep it on mingw-w64's side.
Well, it might be a good idea on mingw.org's side to recheck the exp2,
exp2f, exp2l, and expl implementations, as you can easily show
that the calculated results have a rounding issue, which for
combined operations - like exp2 (y * log2 (fabs (x))) - leads to
miscalculations of over two eps. But well, this might be treated as
typical inaccuracy ...

Regards,
Kai
Kai Tietz
2011-05-03 10:58:12 UTC
Permalink
Post by Kai Tietz
Post by JonY
Post by K. Frank
So, we would like sqrt (x) and pow (x, 0.5) to agree.
The point below about off-topic diversions into precision is valid but
can I reiterate, there is no good reason why x^(0.5) calculated two
different ways should agree since the two different routes will
accumulate different rounding errors. Adding a special case to force
agreement for x^n when n = 0.5 will introduce a discontinuity into the
pow() function and is a very bad idea. The value returned by pow(x,
0.5) might be less accurate than sqrt(x), but that is part of the
fundamental issue with using floating-point numbers: they are only ever
approximations. You have to take that fact on board.
Correct, pow (x, 0.5) and sqrt (x) are calculated in different ways.
Nevertheless, it is also an issue in gcc: if you write pow (1.1, 0.5)
or sqrt (1.1) with constant arguments - which gcc optimizes via gmp to
a precalculated result - both have identical values. But using the
math library leads to an - expectable - difference in results.
So IMHO pow should special-case y = 0.5 here so that it provides the
same result as sqrt.
I would agree with the others that it's a horrible hack to treat "0.5"
specially; I don't remember reading anything about requiring pow() results
to agree with sqrt().
<http://pubs.opengroup.org/onlinepubs/009695399/functions/pow.html>
We can remove the special case, right?
Well, as some people here aren't able to use reply-all, but have wise
suggestions about implementation details that they don't satisfy
themselves, I am a bit annoyed.
Anyway, I see that this special-casing is neither explicitly demanded
nor explicitly forbidden, so I see not much reason for removing it.
The reason is not introducing a discontinuity into pow() around 0.5.
Forcing the two functions to agree may seem like a good idea but in
reality, you are creating another much more insidious problem. You are
embedding a potentially disastrous property into pow().
Post by Kai Tietz
It might be that I have missed a prohibition in the spec. The sqrt
special case has at least two advantages over the log2/exp2 variant:
first, it is faster, as just one FPU instruction is used to calculate
sqrt; secondly, it provides a better result, and one compatible with
gcc's internally used soft-floating-point library. As I don't see in
the spec that this special-casing shouldn't be done, I think we will
keep it on mingw-w64's side.
Sorry, but that's a really bad call IMO for the reason stated above. You
are ignoring the consequences to try to (mistakenly) enforce
mathematical correctness. Even basic FP ops like addition and
subtraction deviate from pure mathematical notions of equality. For
example, 1 + a, where a is less than half the machine epsilon, will return
precisely unity! Floating point numbers ain't reals and you can't force
them to behave exactly like reals...
P.
I am aware of what you are telling me, but the pow function already
special-cases x^y for integer values of y in the range
INT_MIN <= y <= INT_MAX, where a more precise calculation via powi is
used. So the results of pow already contain discontinuities,
which means the pow function does not carry a continuity requirement.
Post by Kai Tietz
Well, it might be a good idea on mingw.org's side to recheck the exp2,
exp2f, exp2l, and expl implementations, as here you can easily show
that the calculated results have a rounding issue, which for combined
operations - like exp2 (y * log2 (fabs (x))) - leads to errors of more
than two eps. But well, this might be treated as typical inaccuracy ...
Regards,
Kai
Regards,
Kai
K. Frank
2011-05-03 13:32:29 UTC
Permalink
Hello Kai and Everybody!

A couple of comments about whether to use special cases for
pow (x, y) for things like y = 0.5 or y = integer...
Post by Kai Tietz
...
The reason is not introducing a discontinuity into pow() around 0.5.
Forcing the two functions to agree may seem like a good idea but in
reality, you are creating another much more insidious problem. You are
embedding a potentially disastrous property into pow().
Post by Kai Tietz
It might be that I have missed a prohibition in the spec. The sqrt
special case has at least two advantages over the log2/exp2 variant:
first, it is faster, as just one FPU instruction is used to calculate
sqrt; secondly, it provides a better result, and one compatible with
gcc's internally used soft-floating-point library. As I don't see in
the spec that this special-casing shouldn't be done, I think we will
keep it on mingw-w64's side.
...
I am aware of what you are telling me, but the pow function already
special-cases x^y for integer values of y in the range
INT_MIN <= y <= INT_MAX, where a more precise calculation via powi is
used. So the results of pow already contain discontinuities,
which means the pow function does not carry a continuity requirement.
...
I'm not aware that either the C standard or the IEEE-754 standard specifies
one behavior or the other. (I think that they don't, but I could be wrong.)

I believe that it has been commonplace over the years to use these special
cases for pow. (This is just the sense I have from personal recollection and
oral tradition.) I think speed optimization has been the reason, rather than
improved accuracy or agreement with things like sqrt, but perhaps it's been
a mixture of those. Of course, just because there is precedent, doesn't
mean it's the right way to go.

If I had to choose between pow (x, 0.5) agreeing with sqrt (x) and pow (x, y)
being monotone in y, I think I would choose the latter (i.e., choose not to
use the special cases). But there are good arguments on both sides.

One solution that would be attractive to me would be to implement pow
"exactly" (i.e., bankers' rounding to the nearest representable value), and
then use all the special cases you want as a speed optimization. Now
the special cases won't change the value of pow (assuming sqrt is properly
implemented), so the special cases won't introduce any non-monotonicity
or "discontinuities" into pow.

What I don't know is how hard or expensive this is. I'm pretty sure it's
not too hard to calculate exp "exactly." But I think that calculating log
is harder. Perhaps James Beard knows whether it's practical to calculate
pow "exactly," and maybe even how to do it.

(Another approach, which I very much don't recommend, is to use pow
to calculate sqrt. You get reduced speed and reduced accuracy for
sqrt, but at least you do get sqrt and pow to agree without introducing
non-monotonicity into pow.)

Lastly, I have to believe that this is a well-understood and "solved" problem
in the numerical-analysis community (of which I am not a part). Perhaps
it would make sense to find out how the experts who have been doing this
for fifty-plus years deal with this issue.
Post by Kai Tietz
Regards,
Kai
And thanks to the mingw and mingw64 teams who have invested the
time and effort to actually write the code we're so breezily discussing.
My comments are in no way meant to be a criticism of those efforts.
I'm just offering up one man's opinion from the user community in the
hope that it might be helpful.

Best regards.


K. Frank

James K Beard
2011-04-30 13:36:04 UTC
Permalink
My comments don't seem to be making it into the thread this week, possibly
because of a switch between my reply-to email address and another. So, if
it gets lost, perhaps nightstrike can intervene, or echo the information.
Perhaps not.

The sqrt() function can be implemented with excellent speed to within a bit
of full mantissa accuracy using iterations of Newton's method, as is
classically done in libraries. Not so with ln() or exp(), which are
classically done by extracting the exponent and using the reduced-range
mantissa in a modified Chebyshev approximation. Arithmetic co-processors do
sqrt(), ln(), and exp() internally with 80-bit floating point, then store in
32, 64, or 80 bit floating point format. So, the co-processors can get full
mantissa accuracy with 32-bit and 64-bit sqrt() and pow(x,0.5). So, the
problem is non-existent for Intel/PPC and most other common workstation
processors. Yet we are looking at software transcendental function
libraries.

James K Beard

-----Original Message-----
From: Greg Chicares [mailto:***@sbcglobal.net]
Sent: Friday, April 29, 2011 7:38 PM
To: MinGW Users List
Cc: mingw64
Subject: Re: [Mingw-w64-public] [Mingw-users] Math library discrepancies
that surprised me.
Post by K. Frank
By the way, could someone confirm that mingw does use msvcrt for
sqrt and pow?
http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/mingwex/math/?cvsroot=src

Looks to me like sqrt(double) calls into msvcrt. But pow() is a
different story. Danny implemented it in libmingwex because of
some quality-of-implementation problem with msvcrt; then people
complained that the libmingwex version was slower, and I can't
remember where it wound up, but it's all in the archives.
Post by K. Frank
sqrt (x) and pow (x, 0.5) ought to give the same result (even if not required
to by IEEE-754).
If they should give exactly the same result always, then the
first thing that needs to be fixed is any standard that allows
them to differ. The language standards are so much looser than
the numerical standard, even now, that I'd rather see C and C++
catch up to IEEE-754-1985 before asking them to go beyond.
Post by K. Frank
Browsing some gcc lists, I did see some comments that suggest that
gcc (on linux) does try to transform pow (x, 0.5) to sqrt (x). This would
make sqrt and pow consistent (whether or not technically correct).
Not so many years ago (perhaps when msvcrt was written), it was
thought that correctly-rounded transcendental functions weren't
feasible in practice, so library authors had a looser attitude
toward exactness. If you have an ideal implementation of sqrt()
but a pow() that's only approximate, should you special-case
the power of exactly 0.5? If you do, then pow() and sqrt() will
be consistent, but then the powers
x^(0.5*(1-epsilon))
x^(0.5 )
x^(0.5*(1+epsilon))
might not be monotone. Some would call that a bad tradeoff.

JonY
2011-05-01 14:17:41 UTC
Permalink
Now, about that math library problem -- it seems to me that asking full
mantissa accuracy and agreement for 32-bit and 64-bit floating point is
reasonable. Differences between FPUs makes full mantissa accuracy and
agreement at 80 bits a bit of a question, and I wouldn't demand it so that
the FPU can be used - unless you are using your own algorithms for quad
precision or other situation where the FPU doesn't provide the result.
AFAIK, all currently produced FPUs are IEEE compliant. Are there some out
there that are still bucking the inevitable?
-- Putting mingw lists back --

I heard older AMDs (late 90s-200Xs era) don't have 80-bit FPUs, but they
do have 64-bit FPUs, so it wasn't strange that results differed
slightly between CPUs. Maybe that has changed for newer CPUs.

Does the long double epsilon 1.08420217248550443401e-19L accuracy
absolutely require an 80-bit FPU?

IMHO, we don't need to produce results more accurate than that, but it's
a bonus if it doesn't impact performance too much.
James K Beard
2011-05-01 15:17:26 UTC
Permalink
I would hope that, in the interests of performance and maintainable code,
we simply allow last-bit rounding differences on any targets with a 64-bit
FPU that may still be in service somewhere. I would not expect such
processors to be doing such critical work that this would be a real problem.

One thing that is real is consequences of the alternative. If gcc supports
full mantissa accuracy for the entire FP library, this requirement will
generate a full software scientific function library with extended precision
that is loaded with every program compiled with <math.h> or the equivalent
in other languages, i.e. that uses floating point exponentiation or a sine,
cosine, or logarithm. That amounts to bloat for 99.99% of the users and is
likely frivolous for the two or three users that actually get different
results. And, this appendage will likely need to be enthusiastically
maintained by a generation of volunteers, whose time might be better spent
fleshing out
the C++ feature set to catch up with the standard, etc.

Think the Pareto Principle. Climbing the curve too far past the knee...

James K Beard


-----Original Message-----
From: JonY [mailto:***@users.sourceforge.net]
Sent: Sunday, May 01, 2011 10:18 AM
To: mingw-w64-***@lists.sourceforge.net
Cc: MinGW Users List
Subject: Re: [Mingw-w64-public] [Mingw-users] Math library discrepancies
that surprised me.
Now, about that math library problem -- it seems to me that asking
full mantissa accuracy and agreement for 32-bit and 64-bit floating
point is reasonable. Differences between FPUs makes full mantissa
accuracy and agreement at 80 bits a bit of a question, and I wouldn't
demand it so that the FPU can be used - unless you are using your own
algorithms for quad precision or other situation where the FPU doesn't
provide the result.
AFAIK, all currently produced FPUs are IEEE compliant. Are there some
out there that are still bucking the inevitable?
-- Putting mingw lists back --

I heard older AMDs (late 90s-200Xs era) don't have 80-bit FPUs, but they do
have 64-bit FPUs, so it wasn't strange that results do differ slightly
between CPUs. Maybe that has changed for newer CPUs.

Does the long double epsilon 1.08420217248550443401e-19L accuracy absolutely
require an 80-bit FPU?

IMHO, we don't need to produce results more accurate than that, but it's a
bonus if it doesn't impact performance too much.
K. Frank
2011-05-01 15:28:46 UTC
Permalink
Hello Jon!
Post by JonY
Now, about that math library problem...
...
AFAIK, all currently produced FPUs are IEEE compliant.  Are there some out
there that are still bucking the inevitable?
-- Putting mingw lists back --
I heard older AMDs (late 90s-200Xs era) don't have 80-bit FPUs, but they
do have 64-bit FPUs, so it wasn't strange that results do differ
slightly between CPUs. Maybe that has changed for newer CPUs.
Does the long double epsilon 1.08420217248550443401e-19L accuracy
absolutely require an 80-bit FPU?
If you're willing to do your floating-point math in software, then no.
But if you want to get 19-decimal-digit precision in your FPU, it
pretty much needs to be 80 bits. (You can always trade off exponent
bits for mantissa bits, but it's much better to stick with the IEEE choice.)

The 80-bit floating-point format has a 64-bit mantissa (and, I believe, no
implicit bit): log_base_10 (2^64) = 19.27, so 64 bits gets you a little
over 19 decimal digits of precision. So to get 19 decimal digits, you
need 64 bits plus exponent plus sign, and that gives you something
pretty similar to the 80-bit format, and it would be hard to justify making
a non-standard choice.
Post by JonY
IMHO, we don't need to produce results more accurate than that, but its
a bonus if it doesn't impact performance too much.
Historically the approach (within a compiler / platform family) has
been to support whatever floating-point you support in software,
and use floating-point hardware when available. If you want to do
heavy floating-point computation, it's up to you to make sure you
have the necessary floating-point hardware or eat the (significant)
performance hit. Nowadays I would think that floating-point hardware
is pretty much a given on desktops and servers. Maybe this isn't
true for some embedded systems (that gcc targets) -- I just don't
know.

Best.


K. Frank
James K Beard
2011-05-01 17:45:23 UTC
Permalink
K. Frank: You raise the very important point of embedded processors that
may not have an FPU. I venture that it's the responsibility of the designer
to use processors that have an FPU when the application requires one. The
middle ground is the problem - where some FP is used but software FP is
acceptable. And that is the main practical focus.

There are three software FP implementation scenarios that must be addressed
by gcc:
(1) Software FP for 32-bit results and 64-bit results,
(2) Quad precision, and
(3) Multiple precision.

What has historically been done as a solution to (1) above, from the days
when Intel and Motorola had an optional FPU as a second chip, was to have a
software FP library that was loaded when the target did not have an FPU, and
the FP instructions were trapped and emulated with software FP. Whatever
the details of the implementation, a separate FPU emulator that provides
IEEE-compliant 80-bit arithmetic with loads and stores for IEEE integer and
FP data types strikes me as appropriate.

Quad precision is supported by preservation of the carry bits in the FPU
for 80-bit operations, which supports stringing together more than one
80-bit FPU operation to implement 128-bit and larger software FP. Without
the FPU and architecture-dependent exploitation of the preserved carry bits
(a lot of data for multiplication), you are stuck with the much slower
stringing-together-of-integers extended precision floating point which is
best left to those libraries.

Multiple precision is a separate library for user-specified mantissa length
and (usually) a 16-bit exponent. The von Neumann format of a biased
exponent with one sign bit for the whole FP word may or may not be a good
idea and its principal advantage, monotonic increase in value when the FP
number is interpreted as an integer, is largely lost, so I think that's best
left to the implementer. And, when an FPU is not present, this is where
quad precision must live, so why not put it in the gcc plan and
documentation that way?

James K Beard

-----Original Message-----
From: K. Frank [mailto:***@gmail.com]
Sent: Sunday, May 01, 2011 11:29 AM
To: mingw64; MinGW Users List
Subject: Re: [Mingw-w64-public] [Mingw-users] Math library discrepancies
that surprised me.

Hello Jon!
Post by JonY
Now, about that math library problem...
...
AFAIK, all currently produced FPUs are IEEE compliant.  Are there some out
there that are still bucking the inevitable?
-- Putting mingw lists back --
I heard older AMDs (late 90s-200Xs era) don't have 80-bit FPUs, but they
do have 64-bit FPUs, so it wasn't strange that results do differ
slightly between CPUs. Maybe that has changed for newer CPUs.
Does the long double epsilon 1.08420217248550443401e-19L accuracy
absolutely require an 80-bit FPU?
If you're willing to do your floating-point math in software, then no.
But if you want to get 19-decimal-digit precision in your FPU, it
pretty much needs to be 80 bits. (You can always trade off exponent
bits for mantissa bits, but it's much better to stick with the IEEE choice.)

The 80-bit floating-point format has a 64-bit mantissa (and, I believe, no
implicit bit): log_base_10 (2^64) = 19.27, so 64 bits gets you a little
over 19 decimal digits of precision. So to get 19 decimal digits, you
need 64 bits plus exponent plus sign, and that gives you something
pretty similar to the 80-bit format, and it would be hard to justify making
a non-standard choice.
Post by JonY
IMHO, we don't need to produce results more accurate than that, but it's
a bonus if it doesn't impact performance too much.
Historically the approach (within a compiler / platform family) has
been to support whatever floating-point you support in software,
and use floating-point hardware when available. If you want to do
heavy floating-point computation, it's up to you to make sure you
have the necessary floating-point hardware or eat the (significant)
performance hit. Nowadays I would think that floating-point hardware
is pretty much a given on desktops and servers. Maybe this isn't
true for some embedded systems (that gcc targets) -- I just don't
know.

Best.


K. Frank
