# Issue #97, Colander page 134: floating-point assembly and disassembly

• To: Common-Lisp at sail
• Subject: Issue #97, Colander page 134: floating-point assembly and disassembly
• From: MOON at SCRC-TENEX
• Date: Thu, 30 Sep 1982 09:55:00 -0000

```
I am not completely happy with the FLOAT-FRACTION, FLOAT-EXPONENT, and
SCALE-FLOAT functions in the Colander edition.  At the meeting in August I
was assigned to make a proposal.  I am slow.

A minor issue is that the range of FLOAT-FRACTION fails to include zero
(which of course it must), and is inclusive at both ends, which means that
there are two possible return values for some numbers.  I guess that this ugliness
has to stay because some implementations require this freedom for hardware
reasons, and it doesn't make a big difference from a numerical analysis point
of view.  My proposal is to include zero in the range and to add a note about
two possible values for numbers that are an exact power of the base.

A more serious issue is that some applications that break down a flonum into
a fraction and an exponent, or assemble a flonum from a fraction and an
exponent, are best served by representing the fraction as a flonum, while
others are best served by representing it as an integer.  An example of
the former is a numerical routine that scales its argument into a certain
range.  An example of the latter is a printing routine that must do exact
integer arithmetic on the fraction.

In the agenda for the August meeting it was also proposed that there be
a function to return the precision of the representation of a given flonum
(presumably in bits); this would be in addition to the "epsilon" constants
described on page 143 of the Colander.

A goal of all this is to make it possible to write portable numeric functions,
such as the trigonometric functions and my debugged version of Steele's
totally accurate floating-point number printer.  These would be portable
to all implementations but perhaps not as efficient as hand-crafted routines
that avoided bignum arithmetic, used special machine instructions, avoided
computing to more precision than the machine really has, etc.

Proposal:

SCALE-FLOAT x e -> y

y = (* x (expt 2.0 e)) and is a float of the same type as x.
SCALE-FLOAT is more efficient than exponentiating and multiplying, and
also cannot overflow or underflow unless the final result (y) cannot
be represented.

x is also allowed to be a rational, in which case y is of the default
type (same as the FLOAT function).

[x being allowed to be a rational can be removed if anyone objects.  But
note that this function has to be generic across the different float types
in any case, so it might as well be generic across all number types.]

UNSCALE-FLOAT y -> x e
The first value, x, is a float of the same type as y.  The second value, e,
is an integer such that (= y (* x (expt 2.0 e))).

The magnitude of x is zero or between 1/b and 1 inclusive, where b is the
radix of the representation: 2 on most machines, but examples of 8 and
16, and I think 4, exist.  x has the same sign as y.

It is an error if y is a rational rather than a float, or if y is an
infinity.  (Leave infinity out of the Common Lisp manual, though.)
It is not an error if y is zero.

FLOAT-MANTISSA x -> f
FLOAT-EXPONENT x -> e
FLOAT-SIGN x -> s
FLOAT-PRECISION x -> p
f is a non-negative integer, e is an integer, s is 0 or 1 (1 for a negative sign).
(= x (* (SCALE-FLOAT (FLOAT f x) e) (IF (ZEROP s) 1 -1))) is true.
It is up to the implementation whether f is the smallest possible integer
(zeros on the right are removed and e is increased), or f is an integer with
as many bits as the precision of the representation of x, or perhaps a "few"
more.  The only thing guaranteed about f is that it is non-negative and
the above equality is true.

f is non-negative to avoid problems with minus zero.  s is 1 for minus zero
even though MINUSP is not true of minus zero (otherwise the FLOAT-SIGN function
would be redundant).

p is an integer, the number of bits of precision in x.  This is a constant
for each flonum representation type (except perhaps for variable-precision
"bigfloats").

[I am amenable to converting these four functions into one function that
returns four values if anyone can come up with a name.  EXPLODE-FLOAT is
the best so far, and it's not very good, especially since the traditional
EXPLODE function has been flushed from Common Lisp.  Perhaps DECODE-FLOAT.]

[I am amenable to adding a function that takes f, e, and s as arguments
and returns x.  It might be called ENCODE-FLOAT or MAKE-FLOAT.  It ought to
take either a type argument or an optional fourth argument, the way FLOAT
takes an optional second argument, which is an example of the type to return.]

FTRUNC x -> fp ip
The FTRUNC function as it is already defined provides the fraction-part and
integer-part operations.

These functions exist now on the Lisp machines, with different names and slightly
different semantics in some cases.  They are very easy to write.

Comments?  Suggestions for names?

```
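
The semantics proposed above can be tried out with a minimal Python sketch. This is only an illustration under stated assumptions, not the Lisp machine code the message refers to: it assumes IEEE binary64 floats (so the radix b is 2 and the precision is 53 bits), and the names `scale_float`, `unscale_float`, `decode_float`, and `encode_float` are hypothetical stand-ins for SCALE-FLOAT, UNSCALE-FLOAT, the four FLOAT-MANTISSA/EXPONENT/SIGN/PRECISION functions combined, and the suggested ENCODE-FLOAT. One difference to note: Python's `math.frexp` returns a fraction in the half-open range [1/2, 1), whereas the proposal allows both ends.

```python
import math
import sys


def scale_float(x, e):
    """Return x * 2**e without forming 2**e as a separate float.

    x may be an int or Fraction as well; it is converted to the default
    float type first, mirroring the rational case in the proposal.
    """
    return math.ldexp(float(x), e)


def unscale_float(y):
    """Return (x, e) with y == x * 2**e and |x| zero or in [1/2, 1)."""
    if math.isinf(y) or math.isnan(y):
        raise ValueError("unscale_float is undefined for infinities and NaNs")
    return math.frexp(y)


def decode_float(x):
    """Return (f, e, s, p) as in the proposal.

    f is a non-negative integer, e an integer, s is 0 or 1, and p is the
    precision in bits.  This sketch always returns f with p bits for
    normalized numbers; the proposal leaves that choice to the implementation.
    """
    p = sys.float_info.mant_dig            # 53 for binary64 (assumed)
    s = 1 if math.copysign(1.0, x) < 0 else 0   # 1 for minus zero too
    frac, exp = math.frexp(abs(x))
    f = int(math.ldexp(frac, p))           # exact: frac has at most p bits
    return f, exp - p, s, p


def encode_float(f, e, s):
    """Inverse of decode_float: rebuild the float from f, e, and s."""
    y = math.ldexp(float(f), e)
    return -y if s else y
```

For example, `unscale_float(8.0)` gives `(0.5, 4)`, and `scale_float(1e-300, 1100)` succeeds even though `2.0 ** 1100` by itself overflows, which is the behavior the SCALE-FLOAT paragraph asks for. Returning the sign separately is what lets minus zero survive a round trip through `decode_float` and `encode_float`.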