|
List Info
Thread: Implementation of expl()
|
|
| Implementation of expl() |
  United States |
2007-11-02 19:13:05 |
With all the success of developing several other missing
C99 math routines, I decided to tackle expl() (which I
need to tackle tanhl()). As with the other functions,
this implementation is known to work on amd64 and should
be valid on other 64-bit architectures.
As with the previous function, I can only do a limited
amount of statistical testing of expl(). The reference
values in testing are computing using GMP/MPFR with
265 bits of precision. The reference value is then
converted to a long double for comparison to my expl().
Testing shows:
Special cases:
exp(0) = 1e+00
exp(inf) = inf
exp(-inf)= 0e+00
exp(nan) = nan
With x in the non-exceptional range of
[ln(LDBL_MIN):ln(LDBL_MAX)],
I computed 22.7 million values of expl() that span the
range. The
maximum observed ULP was 1 where only 7378 values (0.0325%)
had a
ULP exceeding 0.5.
In the exceptional range leading to gradual underflow,
testing was
much more difficult. Computing a ULP seemed troublesome
because of
the loss of precision as the subnormal numbers approach
zero. So,
I broke the range into 4 segments and compute the relative
maximum
error in expl() compared to a value computed by GMP/MPFR.
x in [-11366.00:-11355.14]
RME > 1e-18 --> (73524/1086300) = 6.77%
RME > 1e-10 --> (69314/1086300) = 6.38%
RME > 1e-8 --> (69314/1086300) = 6.38%
RME > 1e-4 --> (69314/1086300) = 6.38%
x in [-11377.00:-11366.00]
RME > 1e-18 --> (2/1100000) = 0.00%
x in [-11388.00:-11377.00]
RME > 1e-18 --> (0/1100000) = 0.00%
x in [-11399.50:-11388.00]
RME > 1e-18 --> (156152/1149852) = 13.58%
RME > 1e-10 --> (156152/1149852) = 13.58%
RME > 1e-8 --> (156152/1149852) = 13.58%
RME > 1e-4 --> (136328/1149852) = 11.86%
The above suggests that in the middle on exceptional range,
my
expl() gives an accurate anwer. In the last segment, the
RME
significantly increases and is attributed to the loss of
precision.
Although I'm not completely satisfied with the gradual
underflow
behavior, I'm providing the code for comments and possible
inclusion
into src/msun. Note, a diff against cvs is not provided
because of
the entanglement of my other patches to math.h, Symbol.map,
and
Makefile.
Enjoy.
--
Steve
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
| Re: Implementation of expl() |
  United States |
2007-12-09 15:25:05 |
On Fri, Nov 02, 2007, Steve Kargl wrote:
> With all the success of developing several other
missing
> C99 math routines, I decided to tackle expl() (which I
> need to tackle tanhl()).
Hmm, great, but where's the patch? Maybe the
mailing list
software ate it.
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
| Re: Implementation of expl() |
  United States |
2007-12-09 15:34:50 |
On Sun, Dec 09, 2007 at 04:25:05PM -0500, David Schultz
wrote:
> On Fri, Nov 02, 2007, Steve Kargl wrote:
> > With all the success of developing several other
missing
> > C99 math routines, I decided to tackle expl()
(which I
> > need to tackle tanhl()).
>
> Hmm, great, but where's the patch? Maybe the
mailing list
> software ate it.
This is the current version. I need to revise how I
computed
the ploynomial coefficient. In short, I mapped r in
[-ln(2)/2:ln(2)/2] into the range x in [-1,1] for the
Chebyshev
interpolation, but I never scaled x back into r. This is
the
reason why the lines "r = r * TOLN2;" exists.
I don't remember if bde sent me comments on this code. I
sure
he has plenty.
steve
/*-
* Copyright (c) 2007 Steven G. Kargl
* All rights reserved.
*
* Redistribution and use in source and binary forms, with
or without
* modification, are permitted provided that the following
conditions
* are met:
* 1. Redistributions of source code must retain the above
copyright
* notice unmodified, this list of conditions, and the
following
* disclaimer.
* 2. Redistributions in binary form must reproduce the
above copyright
* notice, this list of conditions and the following
disclaimer in the
* documentation and/or other materials provided with the
distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY
EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
*/
/*
* Use argument reduction to compute exp(x). The reduction
writes
* x = k * ln(2) + r with r in the range [0, ln(2)]. This
then
* gives exp(x) = 2**k * exp(r), and exp(r) is evaluated via
a nearly
* minimax polynomial approximation such that r is mapped
into [-1:1].
*/
#include "math.h"
#include "math_private.h"
#include "fpmath.h"
/* ln(LDBL_MAX) = 11356.523406294144 */
#define XMAX 0x2.c5c85fdf473de6ap12L
/* ln(LDBL_MIN) = -11355.137111933024 */
#define XMIN -0x2.c5b2319c4843accp12L
/* | ln(smallest subnormal) | = 11399.498531488861 */
#define GRAD 0x2.c877f9fc278aeaap12L
#define LN2 0xb.17217f7d1cf79acp-4L /* ln(2) */
#define LN2HI 0xb.17217f7d2000000p-4L
#define LN2LO -0x3.08654361c4c67fcp-44L
#define TOLN2 0x2.e2a8eca5705fc2fp0L /* 2/ln(2) */
#define ILN2 0x1.71547652b82fe17p0L /* 1/ln(2) */
#define ILN2HI 0x1.71547652b800000p0L
#define ILN2LO 0x2.fe1777d0ffda0d2p-44L
#define LN2O2 0x5.8b90bfbe8e7bcd6p-4L /* ln(2)/2 */
#define ZERO 0.L
/*
* This set of coefficients is used in the polynomial
approximation
* for exp(r) where r is in [-ln(2)/2:ln(2)/2], and the r
comes from
* the argument reduction of x.
*/
#define C0 0x1.000000000000000p0L
#define C1 0x5.8b90bfbe8e7bcd6p-4L
#define C2 0xf.5fdeffc162c7543p-8L
#define C3 0x1.c6b08d704a0bf8bp-8L
#define C4 0x2.76556df749cee54p-12L
#define C5 0x2.bb0ffcf14ce6221p-16L
#define C6 0x2.861225f0d8f0edfp-20L
#define C7 0x1.ffcbfc588b0c687p-24L
#define C8 0x1.62c0223a5c823fdp-28L
#define C9 0xd.a929e9caf3e1ed2p-36L
#define C10 0x7.933d4562e3b2cd7p-40L
#define C11 0x3.d1958e6a3764b64p-44L
#define C12 0x1.c3bd650fc1e343ap-48L
#define C13 0xc.0b0c98b3649ff26p-56L
#define C14 0x4.c525936609b02cfp-60L
#define C15 0x1.c36e84400493e74p-64L
long double
expl(long double x)
{
union IEEEl2bits z;
int k, s;
long double r;
z.e = x;
s = z.bits.sign;
z.bits.sign = 0;
/* x is either 0 or a subnormal number. */
if (z.bits.exp == 0) {
if ((z.bits.manl | z.bits.manh) == 0)
return (1);
else
return (1 + x);
}
if (XMIN <= x && x <= XMAX) {
/* Argument reduction. */
k = (int) (z.e * ILN2HI + z.e * ILN2LO);
r = z.e - k * LN2HI - k * LN2LO;
if (r > LN2O2) {
r -= LN2;
k++;
}
/* Compute exp(r) via the polynomial approximation. */
r = r * TOLN2;
z.e = C0 + r * (C1 + r * (C2 + r * (C3 + r * (C4 + r *
(C5 + r * (C6 + r * (C7 + r * (C8 + r * (C9 + r *
(C10 + r * (C11 + r * (C12 + r * (C13 + r *
(C14 + r * C15))))))))))))));
if (s) {
z.e = 1 / z.e;
z.bits.exp -= k;
} else
z.bits.exp += k;
return (z.e);
}
/*
* If x = +Inf, then exp(x) = Inf.
* If x = -Inf, then exp(x) = 0.
* If x = NaN, then exp(x) = NaN.
*/
if (z.bits.exp == 32767) {
mask_nbit_l(z);
if (!(z.bits.manh | z.bits.manl))
return (s ? ZERO : 1.L / ZERO);
return ((x - x) / (x - x));
}
/* If x > 0 then, overflow to +Inf. */
if (!s)
return (1.L / ZERO);
/* For x < 0, check if gradual underflow is needed. */
if (z.e > GRAD)
return (ZERO);
/* Argument reduction. */
k = (int) (z.e * ILN2);
r = z.e - k * LN2HI - k * LN2LO;
if (r > LN2O2) {
r -= LN2;
k++;
}
/* Compute exp(r) via the polynomial approximation. */
r *= TOLN2;
z.e = C0 + r * (C1 + r * (C2 + r * (C3 + r * (C4 + r * (C5
+ r *
(C6 + r * (C7 + r * (C8 + r * (C9 + r * (C10 + r * (C11
+ r *
(C12 + r * (C13 + r * (C14 + r * C15))))))))))))));
z.e = 1 / z.e;
/*
* FIXME: There has to be a better way to handle gradual
underflow
* because the relative absolute error is fairly large for
numerous
* vlaues of x as exp(x) goes to zero.
*/
z.e = scalbnl(z.e, -k);
return (z.e);
}
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
| Re: Implementation of expl() |
  United States |
2007-12-09 16:39:18 |
On Sun, Dec 09, 2007, Steve Kargl wrote:
> On Sun, Dec 09, 2007 at 04:25:05PM -0500, David Schultz
wrote:
> > On Fri, Nov 02, 2007, Steve Kargl wrote:
> > > With all the success of developing several
other missing
> > > C99 math routines, I decided to tackle expl()
(which I
> > > need to tackle tanhl()).
> >
> > Hmm, great, but where's the patch? Maybe the
mailing list
> > software ate it.
>
> This is the current version. I need to revise how I
computed
> the ploynomial coefficient. In short, I mapped r in
> [-ln(2)/2:ln(2)/2] into the range x in [-1,1] for the
Chebyshev
> interpolation, but I never scaled x back into r. This
is the
> reason why the lines "r = r * TOLN2;"
exists.
Some minor fixes (mostly whitespace) are below.
Also, don't you lose precision when you do stuff like this?
z.e = 1 / z.e;
In any case, if you can get me the appropriate constants for
the
128-bit format, I'll clean everything up and check it in,
which
will make a lot of ports maintainers happy. That will pave
the way
to other stuff (e.g., MD versions of this) as well. We can
worry
about subnormals later.
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
| Re: Implementation of expl() |
  United States |
2007-12-09 16:41:33 |
On Sun, Dec 09, 2007, David Schultz wrote:
> On Sun, Dec 09, 2007, Steve Kargl wrote:
> > On Sun, Dec 09, 2007 at 04:25:05PM -0500, David
Schultz wrote:
> > > On Fri, Nov 02, 2007, Steve Kargl wrote:
> > > > With all the success of developing
several other missing
> > > > C99 math routines, I decided to tackle
expl() (which I
> > > > need to tackle tanhl()).
> > >
> > > Hmm, great, but where's the patch? Maybe the
mailing list
> > > software ate it.
> >
> > This is the current version. I need to revise how
I computed
> > the ploynomial coefficient. In short, I mapped r
in
> > [-ln(2)/2:ln(2)/2] into the range x in [-1,1] for
the Chebyshev
> > interpolation, but I never scaled x back into r.
This is the
> > reason why the lines "r = r * TOLN2;"
exists.
>
> Some minor fixes (mostly whitespace) are below.
Oops, here it is.
--- /usr/src/lib/msun/src/e_expl.c.bak 2007-12-09
17:30:35.000000000 -0500
+++ /usr/src/lib/msun/src/e_expl.c 2007-12-09
17:34:14.000000000 -0500
 -38,45
+38,46 
#define XMAX 0x2.c5c85fdf473de6ap12L
/* ln(LDBL_MIN) = -11355.137111933024 */
-#define XMIN -0x2.c5b2319c4843accp12L
+#define XMIN -0x2.c5b2319c4843accp12L
/* | ln(smallest subnormal) | = 11399.498531488861 */
#define GRAD 0x2.c877f9fc278aeaap12L
-#define LN2 0xb.17217f7d1cf79acp-4L /* ln(2) */
-#define LN2HI 0xb.17217f7d2000000p-4L
-#define LN2LO -0x3.08654361c4c67fcp-44L
+#define LN2 0xb.17217f7d1cf79acp-4L /* ln(2) */
+#define LN2HI 0xb.17217f7d2000000p-4L
+#define LN2LO -0x3.08654361c4c67fcp-44L
-#define TOLN2 0x2.e2a8eca5705fc2fp0L /* 2/ln(2) */
+#define TOLN2 0x2.e2a8eca5705fc2fp0L /* 2/ln(2) */
-#define ILN2 0x1.71547652b82fe17p0L /* 1/ln(2) */
-#define ILN2HI 0x1.71547652b800000p0L
-#define ILN2LO 0x2.fe1777d0ffda0d2p-44L
+#define ILN2 0x1.71547652b82fe17p0L /* 1/ln(2) */
+#define ILN2HI 0x1.71547652b800000p0L
+#define ILN2LO 0x2.fe1777d0ffda0d2p-44L
-#define LN2O2 0x5.8b90bfbe8e7bcd6p-4L /* ln(2)/2 */
+#define LN2O2 0x5.8b90bfbe8e7bcd6p-4L /* ln(2)/2 */
+
+#define ZERO 0.L
-#define ZERO 0.L
/*
* This set of coefficients is used in the polynomial
approximation
* for exp(r) where r is in [-ln(2)/2:ln(2)/2], and the r
comes from
* the argument reduction of x.
*/
-#define C0 0x1.000000000000000p0L
-#define C1 0x5.8b90bfbe8e7bcd6p-4L
-#define C2 0xf.5fdeffc162c7543p-8L
-#define C3 0x1.c6b08d704a0bf8bp-8L
-#define C4 0x2.76556df749cee54p-12L
-#define C5 0x2.bb0ffcf14ce6221p-16L
-#define C6 0x2.861225f0d8f0edfp-20L
-#define C7 0x1.ffcbfc588b0c687p-24L
-#define C8 0x1.62c0223a5c823fdp-28L
-#define C9 0xd.a929e9caf3e1ed2p-36L
-#define C10 0x7.933d4562e3b2cd7p-40L
-#define C11 0x3.d1958e6a3764b64p-44L
-#define C12 0x1.c3bd650fc1e343ap-48L
-#define C13 0xc.0b0c98b3649ff26p-56L
-#define C14 0x4.c525936609b02cfp-60L
-#define C15 0x1.c36e84400493e74p-64L
+#define C0 0x1.000000000000000p0L
+#define C1 0x5.8b90bfbe8e7bcd6p-4L
+#define C2 0xf.5fdeffc162c7543p-8L
+#define C3 0x1.c6b08d704a0bf8bp-8L
+#define C4 0x2.76556df749cee54p-12L
+#define C5 0x2.bb0ffcf14ce6221p-16L
+#define C6 0x2.861225f0d8f0edfp-20L
+#define C7 0x1.ffcbfc588b0c687p-24L
+#define C8 0x1.62c0223a5c823fdp-28L
+#define C9 0xd.a929e9caf3e1ed2p-36L
+#define C10 0x7.933d4562e3b2cd7p-40L
+#define C11 0x3.d1958e6a3764b64p-44L
+#define C12 0x1.c3bd650fc1e343ap-48L
+#define C13 0xc.0b0c98b3649ff26p-56L
+#define C14 0x4.c525936609b02cfp-60L
+#define C15 0x1.c36e84400493e74p-64L
long double
expl(long double x)
 -90,15
+91,10 
z.bits.sign = 0;
/* x is either 0 or a subnormal number. */
- if (z.bits.exp == 0) {
- if ((z.bits.manl | z.bits.manh) == 0)
- return (1);
- else
- return (1 + x);
- }
+ if (z.bits.exp == 0)
+ return (1 + x);
if (XMIN <= x && x <= XMAX) {
-
/* Argument reduction. */
k = (int) (z.e * ILN2HI + z.e * ILN2LO);
r = z.e - k * LN2HI - k * LN2LO;
 -117,11
+113,13 
if (s) {
z.e = 1 / z.e;
z.bits.exp -= k;
- } else
+ } else {
z.bits.exp += k;
+ }
return (z.e);
}
+
/*
* If x = +Inf, then exp(x) = Inf.
* If x = -Inf, then exp(x) = 0.
 -157,10
+155,11 
(C6 + r * (C7 + r * (C8 + r * (C9 + r * (C10 + r *
(C11 + r *
(C12 + r * (C13 + r * (C14 + r * C15))))))))))))));
z.e = 1 / z.e;
+
/*
* FIXME: There has to be a better way to handle gradual
underflow
* because the relative absolute error is fairly large for
numerous
- * vlaues of x as exp(x) goes to zero.
+ * values of x as exp(x) goes to zero.
*/
z.e = scalbnl(z.e, -k);
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
| Re: Implementation of expl() |
  Australia |
2007-12-09 23:18:39 |
On Sun, 9 Dec 2007, David Schultz wrote:
> Some minor fixes (mostly whitespace) are below.
>
> Also, don't you lose precision when you do stuff like
this?
> z.e = 1 / z.e;
>
> In any case, if you can get me the appropriate
constants for the
> 128-bit format, I'll clean everything up and check it
in, which
> will make a lot of ports maintainers happy. That will
pave the way
> to other stuff (e.g., MD versions of this) as well. We
can worry
> about subnormals later.
Why not convert the fdlibm algorithm for exp() as is done
for expf()?
It is much better (*), doesn't need to be debugged (except
for the
conversion), and would be easier to maintain.
Better algorithms exist, like someone named das used for
exp2(), but
would be harder to debug and maintain.
amd64 should probably use hardware for expl(), since long
doubles are
much harder to handle efficiently than doubles. Thus the
software
algorithm is needed mainly for ia64 and sparc64.
(*) Starting with the transformation of exp(x) to
x*(exp(x)+1)/(exp(x)-1).
The power series for the latter converges much faster and
has better
numeric properties. Polynomial approximations inherit these
properties.
Bruce
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
| Re: Implementation of expl() |
  United States |
2007-12-10 01:16:50 |
On Mon, Dec 10, 2007, Bruce Evans wrote:
> On Sun, 9 Dec 2007, David Schultz wrote:
>
> >Some minor fixes (mostly whitespace) are below.
> >
> >Also, don't you lose precision when you do stuff
like this?
> > z.e = 1 / z.e;
> >
> >In any case, if you can get me the appropriate
constants for the
> >128-bit format, I'll clean everything up and check
it in, which
> >will make a lot of ports maintainers happy. That
will pave the way
> >to other stuff (e.g., MD versions of this) as well.
We can worry
> >about subnormals later.
>
> Why not convert the fdlibm algorithm for exp() as is
done for expf()?
> It is much better (*), doesn't need to be debugged
(except for the
> conversion), and would be easier to maintain.
>
> Better algorithms exist, like someone named das used for
exp2(), but
> would be harder to debug and maintain.
I'm not worried about maintenance, since I don't expect God
to add
any major new features to e^x any time soon. Writing exp2()
took a
lot of reading papers and tinkering, and it's a pain in the
neck
to generate the constants and figure out the resulting error
for
each interval. I seem to recall trying the same tricks for
expl(),
but there were problems with rescaling in base e instead of
base 2
without losing accuracy. In any case, I don't have the kind
of time
needed to fix all of that stuff now.
I don't really care how expl() is implemented; anything that
works
is better than anything that doesn't. If someone comes up
with a
better, cleverer scheme later, that's great, but we've been
talking about a lot of this stuff forever and there's still
nothing in the tree.
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
| Re: Implementation of expl() |
  Australia |
2007-12-10 04:16:27 |
On Mon, 10 Dec 2007, David Schultz wrote:
> On Mon, Dec 10, 2007, Bruce Evans wrote:
>> Better algorithms exist, like someone named das used for
exp2(), but
>> would be harder to debug and maintain.
>
> I'm not worried about maintenance, since I don't expect
God to add
> any major new features to e^x any time soon.
But every different precision requires a significantly
different
implementation (mainly for magic number, but table-driven
methods
give a huge number of those) since we don't have algorithms
or
tools to generate all the magic numbers from the precision
parameter.
> Writing exp2() took a
> lot of reading papers and tinkering, and it's a pain in
the neck
> to generate the constants and figure out the resulting
error for
> each interval. I seem to recall trying the same tricks
for expl(),
> but there were problems with rescaling in base e
instead of base 2
> without losing accuracy. In any case, I don't have the
kind of time
> needed to fix all of that stuff now.
The problems are well understood . I didn't
understand them when
you started working on exp2() but do now. Essentially, they
are due
to the strange phenomenon that linear functions are usually
harder to
approximate (to within 1 ULP) than transcendental functions
expanded
in a Taylor series about 0. The linear scaling step with a
not-exactly
representable factor like log2(e) ends up being the most
difficult point
(unless it is done inaccurately).
> I don't really care how expl() is implemented; anything
that works
> is better than anything that doesn't. If someone comes
up with a
> better, cleverer scheme later, that's great, but we've
been
> talking about a lot of this stuff forever and there's
still
> nothing in the tree.
I only agree if "working" includes no new bugs and
good accuracy at little
cost.
We still don't support 64-bit precision on i386. Fixing
that has higher
priority.
Bruce
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
| Re: Implementation of expl() |
  Australia |
2007-12-10 05:22:11 |
On Sun, 9 Dec 2007, Steve Kargl wrote:
> On Sun, Dec 09, 2007 at 04:25:05PM -0500, David Schultz
wrote:
>> On Fri, Nov 02, 2007, Steve Kargl wrote:
>>> With all the success of developing several
other missing
>>> C99 math routines, I decided to tackle expl()
(which I
>>> need to tackle tanhl()).
>>
>> Hmm, great, but where's the patch? Maybe the
mailing list
>> software ate it.
>
> This is the current version. I need to revise how I
computed
> the ploynomial coefficient. In short, I mapped r in
> [-ln(2)/2:ln(2)/2] into the range x in [-1,1] for the
Chebyshev
> interpolation, but I never scaled x back into r. This
is the
> reason why the lines "r = r * TOLN2;"
exists.
>
> I don't remember if bde sent me comments on this code.
I sure
> he has plenty.
I don't think I commented about expl() before. Sorry, it's
further
from what I would like than the trig functions.
Here are some better polynomial approximations for it.
------------------------------------------------------------
----------------
(1) 64-bit precision, approximating exp(), saves a term and
should be more
accurate. Note the difference in the plots between the
Chebyshev
approximation and the final approximation. The
uglyness and extra
error in the latter are due to rounding the
very-high-precision
coeffs used in the former to 64 bits. The final
approximation
uses a Remez algorithm on rounded coeffs to try to
choose the best
set of rounded coeffs (!= the best set of coeffs,
rounded). Its
plot is ugly because my algorithm is unstable for
functions that
aren't even, so I had to stop it after 1 significant
iteration
(adjusting the P3 term from 0xaaaaaaaaaaaaaaab.0p-66 to
...aa8.0p-66).
If it worked properly then it would push all the
extrema back to the
top and bottom of the plot, and be slightly more
accurate than the
Chebyshev approximation. Remez gains very little over
Chebyshev in
infinite precision (about a factor of 2), but can
compensate for
huge errors (a factor of 100[0]) due to unfortunate
rounding of
coeffs.
------------------------------------------------------------
----------------
--- Simple Lagrange interpolation at Chebyshev points ---
5.975e-24
|'''''x'''''''''x'''''''''''x''''''''''''x''''''''''_x''''''
_''"
| : x "
" :: : :
|_ : : : _ " :
" : : : :: :
|: : x : : : : : :
: : :: :
|: " : : : : "
: " : x : x :
|: : : " : _ : : :
" : : : :
|: : : : : : : " :
: : : : :
|: : : : " : : :
: : : : : :
|: : : : : : x : :
: : _ : :
|: : : : : : : :
" : : : : :
|:: : x _ : _ : : :
: : : ::|
`::`:```:````:`````:`````:``````:`````"`````:`````"
;````_```:`::`
: : : : : _ : : : :
: : : :|
: : " : : : : x :
: : : : :|
: : : : : : : : : _
: : : :|
: : : : : : _ : : :
: : : :|
: : : : _ : : : "
: _ : : :|
: x: " : x : _ :
: : : : :|
: :: : : : : : : : :
_ " :|
: : :: : x : _ x :
: : _|
: _ :_ _ _ _
: |
-5.95e-24
x........."...........x............x...........x.......
.._.....|
-0.34658
0.34658
--- Lagrange/Chebyshev with coeff adjusted to 64-bit
precision ---
adjn n = 64 k = 3 j = 120 toter 0
--- rounded coeff ---
P0 = 1.000000000000000000000000000000000000000, /*
0x8000000000000000.0p-63 */
P1 = 1.000000000000000000000000000000000000000, /*
0x8000000000000000.0p-63 */
P2 = 0.5000000000000000000000000000000000000000, /*
0x8000000000000000.0p-64 */
P3 = 0.1666666666666666666310000000000000000000, /*
0xaaaaaaaaaaaaaaa8.0p-66 */
P4 = 0.04166666666666666666100000000000000000000, /*
0xaaaaaaaaaaaaaaa9.0p-68 */
P5 = 0.008333333333333338066950000000000000000000, /*
0x8888888888889e5d.0p-70 */
P6 = 0.001388888888888889569550000000000000000000, /*
0xb60b60b60b60cf28.0p-73 */
P7 = 0.0001984126984124767297770000000000000000000, /*
0xd00d00d00c013acd.0p-76 */
P8 = 0.00002480158730156209466100000000000000000000, /*
0xd00d00d00c1851e1.0p-79 */
P9 = 0.000002755731927361941412700000000000000000000, /*
0xb8ef1d304cd05f90.0p-82 */
P10 = 0.0000002755731927069719307580000000000000000000, /*
0x93f27dbffa11c00d.0p-85 */
P11 =
0.00000002505205082467052704390000000000000000000, /*
0xd7320ad84885e745.0p-89 */
P12 =
0.000000002087671071533106249630000000000000000000, /*
0x8f76b2a8ea87f9b3.0p-92 */
P13 = 1.60924147227467723196999999999999999999 E-10, /*
0xb0f01edf2f5fac20.0p-96 */
P14 = 1.14941936278242808142000000000000000000 E-11, /*
0xca353f02fa6b7741.0p-100 */
7.252e-24
|''''''''''''''''_''''''''''''''''''''_""_''''''''
'''''''''''''|
| " x x
|
| x _ x
_" |
| " x
" |
--------------_---------------_xxx"--------x------_----
---------
| _x : " _"
_ |
| : : _ x x
_ : |
| : " _ x
: |
| _ : "__x
x _ : x |
| : _ : x_
x : |
| : "
: " : |
|_ : xx
: : x |
|: :
_ : : x
|: x
" : :
|:: :
x : :
|:::
::
: _:
:|
: _
:|
:
:|
:
x|
:
|
-2.79e-23
_...........................................................
...|
-0.34658
0.34658
adjn n = 64 k = 3 toter 0
------------------------------------------------------------
----------------
(2) 113-bit precision, approximating exp().
------------------------------------------------------------
----------------
--- Simple Lagrange interpolation at Chebyshev points ---
3.239e-38
x''''x'''''x''''''_''''''''x''''''''"''''''''_''''''&qu
ot;'''''"''''"
: : :
: : x: : : |
|: : :: :: : _ _ : ::
:: : _|
|" : _ : : : " : :
" : : : _ : :|
|: : : : : : : : : : : : :
: : : : :|
|: : : : x : : : : : : : :
x : : : :|
|: : : : : " : : : : : :
" : : : : :|
| : : _ : : : " : : : :
" : : : x : : |
| : _ : : : : : : : : : : :
: : : _ : |
| : : : : : : : _ " " _
: : : : : : : |
| : : : : : : : : : : : : :
: : : : : |
``x`:`:``:```:``:````:```:```:````:```:```:````:``:```:``:`:
`x``
| : : : : : : : : : : : : :
: : : : : |
| : : : x : : : : : : : : :
: x : : : |
| : : : : x x : : : : : : x
_ : : : : |
| :: : : : : " : : : :
" : : : : :: |
| : : : : : : : x x : : :
: : : : |
| : :: : : : " : :
" : : : :: : |
| : _: :: : : : : : :
:: :: : |
| : :: :: :: :: ::
:: :" : |
| : : :_ _: :: :_
:: : : |
-3.13e-38
|..x....x....."........x.......xx......._......."&
quot;.....x...._..|
-0.34658
0.34658
--- Lagrange/Chebyshev with coeff adjusted to 113-bit
precision ---
adjn n = 113 k = 1 j = 120 toter 0
--- rounded coeff ---
P0 = 1.000000000000000000000000000000000000000, /*
0x10000000000000000000000000000.0p-112 */
P1 = 1.000000000000000000000000000000000000000, /*
0x10000000000000000000000000000.0p-112 */
P2 = 0.4999999999999999999999999999999999500000, /*
0x1ffffffffffffffffffffffffffff.0p-114 */
P3 = 0.1666666666666666666666666666666666600000, /*
0x15555555555555555555555555555.0p-115 */
P4 = 0.04166666666666666666666666666668415500000, /*
0x155555555555555555555555560af.0p-117 */
P5 = 0.008333333333333333333333333333336938300000, /*
0x11111111111111111111111111a6d.0p-119 */
P6 = 0.001388888888888888888888888886437923800000, /*
0x16c16c16c16c16c16c16c15fa9389.0p-122 */
P7 = 0.0001984126984126984126984126980627689100000, /*
0x1a01a01a01a01a01a01a0191e8217.0p-125 */
P8 = 0.00002480158730158730158730175735685579700000, /*
0x1a01a01a01a01a01a01ad93230fd1.0p-128 */
P9 = 0.000002755731922398589065255750604381149600000, /*
0x171de3a556c7338faac28603831fb.0p-131 */
P10 = 0.0000002755731922398589065187949851278033800000, /*
0x127e4fb7789f5c72e69d53c0ac1d3.0p-134 */
P11 =
0.00000002505210838544171877444493611961585000000, /*
0x1ae64567f544e38fdb412a0eb7de0.0p-138 */
P12 =
0.000000002087675698786810064988525091623603400000, /*
0x11eed8eff8d8981cadc22e13e3610.0p-141 */
P13 = 1.60590438368216158655710381729946340000 E-10, /*
0x16124613a86d09fa08f48ca8c4d5e.0p-145 */
P14 = 1.14707455977270929588227069467205860000 E-11, /*
0x193974a8c07640267fec271f370a0.0p-149 */
P15 = 7.6471637318180850420027983995938208999 E-13, /*
0x1ae7f3e733b16c573381f7ce23115.0p-153 */
P16 = 4.77947733504221136080232269675803310000 E-14, /*
0x1ae7f3e773e8b2e2b7a23c9a525ff.0p-157 */
P17 = 2.81145725589067097361019490883767970000 E-15, /*
0x1952c7706c73b79a1577d9ebcf121.0p-161 */
P18 = 1.56191903751316458500805670560905000000 E-16, /*
0x168276d284fb449ca1ff48bfbb891.0p-165 */
P19 = 8.2206265778415557385251561200634072999 E-18, /*
0x12f499f725c9f790053993c3f2b17.0p-169 */
P20 = 4.11616915419431123065482372671396870000 E-19, /*
0x1e5f394471e23048e2a82e8f22b34.0p-174 */
P21 = 1.9600698694144411728713343986 E-20, /*
0x1723f29b62196e8abacb3c5c40000.0p-178 */
9.874e-37
"''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''|
|_
|
|
|
| _
|
| :
|
| :
|
| "x
|
| "
|
| _
|
|
|
| _
|
|
|
| "x_
|
| x
|
| x
|
| _
_x
|
x_ x |
| "_
xx _" x |
| """x
" "x |
| x
x"x___" |
| " __ __
_" |
0
,,,,,,,,,,,,,,,,,,,,,,"xxx",,"x__x",,&qu
ot;x__x,,,,,,,,,,,,,,,,,,,,,,
-0.34658
0.34658
adjn n = 113 k = 1 toter 0
------------------------------------------------------------
----------------
(3) 64-bit precision, approximating nqexp() := x * (exp(x) +
1) / (exp(x) - 1)
exp(x) can be recovered from nqexp(), and the latter
has much better
numerical behaviour. E.g., it needs only 7 coeffs
instead of 15.
The Remez search completed and equalized all the
extrema. It's
interesting that The Chevyshev approximation equalized
the extrema
for exp() but not for nqexp().
------------------------------------------------------------
----------------
--- Simple Lagrange interpolation at Chebyshev points ---
2.327e-21
"''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''"
:
:
: _"
"_ :
: ::
:: :
: ::
:: :
: : : ""
"" : : :
: : : x _ _
x : : :
: : x : : :
: x : :
: : : : : :
: : : :
: : : _ " __ __
" _ : : :
: : : : " "x_
_x" " : : : :
`:_```:```:``````x``````"``````""``````"
``````x``````:```:```_:`
|:: : : x x
: : ::|
|:: _ _ x _ _ x
_ _ ::|
|:: : : _ _ _ _
: : ::|
|:: : : "
" : : ::|
|: : :
: : :|
|: : x
x : :|
|: x
x :|
|: x
x :|
|:
:|
-2.23e-21
|_..........................................................
.._|
-0.34658
0.34658
--- Lagrange/Chebyshev with coeff adjusted to 64-bit
precision ---
adjn n = 64 k = 13 toter 0
C-1 = 2.000000000000000000000000000000000000000, /*
0x8000000000000000.0p-62 */
C0 = 0.1666666666666666660071100000000000000000, /*
0xaaaaaaaaaaaaaa7a.0p-66 */
C1 = -0.002777777777777671015759280000000000000000, /*
-0xb60b60b60b5904a2.0p-72 */
C2 = 0.00006613756613181145420083840000000000000000, /*
0x8ab355dfd4d5cfe3.0p-77 */
C3 = -0.000001653439010798086927524850000000000000000, /*
-0xddebbb58747e58c9.0p-83 */
C4 = 0.00000004175172569487852482302880000000000000000, /*
0xb35282048125b16f.0p-88 */
C5 =
-0.000000001045797728772232364397340000000000000000, /*
-0x8fbbbc85f0e61575.0p-93 */
1.275e-21
"''"''''''''"''''''''''''"x''''''''''x&q
uot;''''''''''''"''''''''"''"
: : x _ _
x : :
: :: " : : "
" : : " :: :
:
: : : : : : x: :
: : : : : : x x :
: : : : :
: : : : x " : :
" x : : : :
: : : _ : : : : :
: _ : : :
: : : : : : " "
: : : : : :
: " : : : : :
: : : " :
: : : : : x x x x :
: : : :
|:: : : _ : __ : _
: : ::|
`::``_```:`````:``````:``````````````````:``````:`````:```_`
`::`
|:: : : : : : :
: : ::|
|:: : x : : : :
x : ::|
|: : : : "
" : : : :|
|: : : : : : :
: : :|
|: : : " : :
" : : :|
|: : : : : : :
: : :|
|: : : : "
" : : : :|
|: " " : :
: : " " :|
|_ " :
: " _|
-1.28e-21
|......_.........._"........................"_....
......_......|
-0.34658
0.34658
adjn n = 64
------------------------------------------------------------
----------------
(4) 113-bit precision, approximating nqexp(). The Remez
search was stable
but made negative progress after the C3 (?) term so I
stopped it early.
It had only almost equalized the extrema when stopped.
------------------------------------------------------------
----------------
--- Simple Lagrange interpolation at Chebyshev points ---
4.567e-37
|''''''"'''''''''''''''''''''''''''''''''''''''''''''''
'"''''''|
| " :
: " |
| : : _
_ : : |
| :: : : :_
_: : : :: |
| :: : : ::
:: : : :: |
| x: : " : : x x
: : " : |
| :: _ : : : x x :
: : _ :: |
| :: : : " : : "
" : : " : : :: |
| :: : : : _ : __ __ : _
: : : :: |
--:-:-:-:---:--:---x---x---_--"xx"--_---x---x---:-
-:---:-:-:-:--
| : : : : : : : : :
: : : : : |
| : : : : : : : x " "
x : : : : : : : |
| : :: : : : : " "
: : : : :: : |
|: :: : : x "
" x : : :: :|
|: :: " "
" " :: :|
|: :: : : "
" : : :: :|
|: _: :
: :_ :|
|: : _
_ : :|
|: "
" :|
|:
:|
|"
"|
-5.88e-37
_...........................................................
..._
-0.34658
0.34658
--- Lagrange/Chebyshev with coeff adjusted to 113-bit
precision ---
adjn n = 113 k = 3 j = 120 toter 0
--- rounded coeff ---
C-1 = 2.000000000000000000000000000000000000000, /*
0x10000000000000000000000000000.0p-111 */
C0 = 0.1666666666666666666666666666666662000000, /*
0x15555555555555555555555555542.0p-115 */
C1 = -0.002777777777777777777777777777553326200000, /*
-0x16c16c16c16c16c16c16c16b85140.0p-121 */
C2 = 0.00006613756613756613756613752850992610100000, /*
0x11566abc011566abc0114a7e047ee.0p-126 */
C3 = -0.000001653439153439153439150292458494169500000, /*
-0x1bbd779334ef0aac6588d54fb28a1.0p-132 */
C4 = 0.00000004175351397573619780544017497215914800000, /*
0x166a8f2bf70ebd9cab06da3dfcbe9.0p-137 */
C5 =
-0.000000001056838027737493956831265550261534200000, /*
-0x122805d6442668558fc0ca2fcc585.0p-142 */
C6 = 2.67650730612754829770358952948658760000 E-11, /*
0x1d6db2c4e01fe52c6eb7674cf86f8.0p-148 */
C7 = -6.7793605801133740547903998874808967000 E-13, /*
-0x17da4e1ebc3556f34306a709a13c6.0p-153 */
C8 = 1.71721130775699965239461146049180330000 E-14, /*
0x1355864cf343fe6e71f6644193ab0.0p-158 */
C9 = -4.34912151080995129543227548203712180000 E-16, /*
-0x1f56b690b4b95dd505b8a8e6666b1.0p-164 */
C10 = 1.08203893915267000322982006799182099999 E-17, /*
0x18f33b03a36ff97329d6990c65e9c.0p-169 */
3.741e-37
|''''''"'''''''''''''''''''''''''''''''''''''''''''''''
'"''''''|
| : _
_ : |
| :: : _ " "
_ : :: |
| :: :_ ":
:" _: : : |
| _ x : : : :: " "
" " :: : : : x _ |
| : : : : : : : : : : : : :
: : : : : |
| :" : : _ : : : : : : : :
: : x _ : ": |
| :: : " : : : x : : : : x
: : : : : :: |
| :: : : : : : : : " " :
: : : : : : :: |
| :: : : : : " : x : : x :
" : : : : :: |
| :: : : : : : : : :: : : :
: :: : :: |
--::-:---::---x---:---:---:----""----:---:---:---x
---::---:-::--
| :: : :: : : : : : : :
: :: : :: |
|: : : :" : : : : : :
: : "_ : : :|
|: :: " : : _ : : _
: : :: :|
|: :: : : : : : : : :
:: :|
|: :: : x : " "
: x : :: :|
|: :_ : : : : : : : :
_: :|
|: :: _: :: :: :_
:: :|
|: : : _: :_ :
: :|
|: : _ x x _
: :|
-3.37e-37
_x.."..................................................
...."..x_
-0.34658
0.34658
adjn n = 113 k = 3 toter 0
---
Attempts to use 64-bit coeffs for the 113-bit approximation
were
unsuccessful -- the error was about 5e-24, but about
2^-(113+5) =
3e-36 is needed. The interval is just too wide for this to
work.
Probably only the first couple of coeffs need to have full
precision.
Bruce
_______________________________________________
freebsd-standards freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sta
ndards
To unsubscribe, send any mail to
"freebsd-standards-unsubscribe freebsd.org"
|
|
[1-9]
|
|