
Issue #97, Colander page 134: floating-point assembly and disassembly



I am not completely happy with the FLOAT-FRACTION, FLOAT-EXPONENT, and
SCALE-FLOAT functions in the Colander edition.  At the meeting in August I
was assigned to make a proposal.  I am slow.

A minor issue is that the range of FLOAT-FRACTION fails to include zero (which
of course it must), and is inclusive at both ends, which means that there
are two possible return values for some numbers.  I guess that this ugliness
has to stay because some implementations require this freedom for hardware
reasons, and it doesn't make a big difference from a numerical analysis point
of view.  My proposal is to include zero in the range and to add a note about
two possible values for numbers that are an exact power of the base.

A more significant issue is that some applications that break down a flonum into
a fraction and an exponent, or assemble a flonum from a fraction and an
exponent, are best served by representing the fraction as a flonum, while
others are best served by representing it as an integer.  An example of
the former is a numerical routine that scales its argument into a certain
range.  An example of the latter is a printing routine that must do exact
integer arithmetic on the fraction.
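
To illustrate the two representations, here is a rough sketch in Python; it is
not part of the proposal.  It assumes IEEE double-precision floats with radix 2
(the precision is read from sys.float_info.mant_dig), and the helper names
float_fraction, integer_fraction, and exact_value are made up for this example.

```python
import math
import sys
from fractions import Fraction

P = sys.float_info.mant_dig  # bits of precision; 53 for IEEE doubles

def float_fraction(x):
    """Return (m, e) with m a float, 0.5 <= |m| < 1, and x == m * 2**e."""
    return math.frexp(x)

def integer_fraction(x):
    """Return (f, e) with f an integer and x == f * 2**e (x must be finite)."""
    m, e = math.frexp(x)
    # Scaling m by 2**P is exact and yields an integer-valued float.
    return int(math.ldexp(m, P)), e - P

def exact_value(x):
    """Exact rational value of x, computed with integer arithmetic only."""
    f, e = integer_fraction(x)
    return Fraction(f) * Fraction(2) ** e
```

A scaling routine would use float_fraction and stay in floating point, while a
printer would start from integer_fraction (or exact_value) so that no rounding
creeps into its digit generation.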

In the agenda for the August meeting it was also proposed that there be
a function to return the precision of the representation of a given flonum
(presumably in bits); this would be in addition to the "epsilon" constants
described on page 143 of the Colander.

A goal of all this is to make it possible to write portable numeric functions,
such as the trigonometric functions and my debugged version of Steele's
totally accurate floating-point number printer.  These would be portable
to all implementations but perhaps not as efficient as hand-crafted routines
that avoided bignum arithmetic, used special machine instructions, avoided
computing to more precision than the machine really has, etc.

Proposal:

SCALE-FLOAT x e -> y

  y = (* x (expt 2.0 e)) and is a float of the same type as x.
  SCALE-FLOAT is more efficient than exponentiating and multiplying, and
  also cannot overflow or underflow unless the final result (y) cannot
  be represented.

  x is also allowed to be a rational, in which case y is of the default
  type (same as the FLOAT function).

  [x being allowed to be a rational can be removed if anyone objects.  But
   note that this function has to be generic across the different float types
   in any case, so it might as well be generic across all number types.]
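
As a sketch of the intended behavior (in Python, assuming IEEE doubles; the
names scale_float and naive_scale are hypothetical), math.ldexp plays the role
of SCALE-FLOAT, and the naive version shows how the intermediate power can
overflow even though the final result is representable.

```python
import math
from fractions import Fraction

def scale_float(x, e):
    """Sketch of SCALE-FLOAT: x * 2**e without forming 2**e as a float."""
    # float() also accepts ints and Fractions, mirroring the rational case.
    return math.ldexp(float(x), e)

def naive_scale(x, e):
    """Exponentiate, then multiply; the intermediate 2.0**e can overflow."""
    return x * (2.0 ** e)
```

For example, scale_float(0.5, 1024) returns 2**1023 as a float, whereas
naive_scale(0.5, 1024) raises OverflowError in Python because 2.0**1024 is not
representable as a double.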

UNSCALE-FLOAT y -> x e
  The first value, x, is a float of the same type as y.  The second value, e,
  is an integer such that (= y (* x (expt 2.0 e))).

  The magnitude of x is zero or between 1/b and 1 inclusive, where b is the
  radix of the representation: 2 on most machines, but examples of 8 and
  16, and I think 4, exist.  x has the same sign as y.

  It is an error if y is a rational rather than a float, or if y is an
  infinity.  (Leave infinity out of the Common Lisp manual, though.)
  It is not an error if y is zero.
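
A minimal sketch of these rules in Python, assuming radix 2 and IEEE doubles.
The name unscale_float is hypothetical, and rejecting NaN is an extra
assumption of this sketch that the proposal does not discuss.

```python
import math

def unscale_float(y):
    """Sketch of UNSCALE-FLOAT: return (x, e) with y == x * 2**e."""
    if not isinstance(y, float):
        raise TypeError("y must be a float, not a rational")
    if math.isinf(y) or math.isnan(y):  # NaN check is this sketch's assumption
        raise ValueError("y must be finite")
    # frexp gives 0.5 <= |x| < 1, or x == 0 (sign preserved) when y is zero.
    return math.frexp(y)
```

Note that math.frexp uses the half-open range [1/2, 1), so for an exact power
of two such as 1.0 it returns (0.5, 1) and never (1.0, 0); the proposal permits
either.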

FLOAT-MANTISSA x -> f
FLOAT-EXPONENT x -> e
FLOAT-SIGN x -> s
FLOAT-PRECISION x -> p
  f is a non-negative integer, e is an integer, s is 1 or 0.
  (= x (* (SCALE-FLOAT (FLOAT f x) e) (IF (ZEROP s) 1 -1))) is true.
  It is up to the implementation whether f is the smallest possible integer
  (zeros on the right are removed and e is increased), or f is an integer with
  as many bits as the precision of the representation of x, or perhaps a "few"
  more.  The only thing guaranteed about f is that it is non-negative and
  the above equality is true.

  f is non-negative to avoid problems with minus zero.  s is 1 for minus zero
  even though MINUSP is not true of minus zero (otherwise the FLOAT-SIGN function
  would be redundant).

  p is an integer, the number of bits of precision in x.  This is a constant
  for each flonum representation type (except perhaps for variable-precision
  "bigfloats").

  [I am amenable to converting these four functions into one function that
  returns four values if anyone can come up with a name.  EXPLODE-FLOAT is
  the best so far, and it's not very good, especially since the traditional
  EXPLODE function has been flushed from Common Lisp.  Perhaps DECODE-FLOAT.]

  [I am amenable to adding a function that takes f, e, and s as arguments
   and returns x.  It might be called ENCODE-FLOAT or MAKE-FLOAT.  It ought to
   take either a type argument or an optional fourth argument, the way FLOAT
   takes an optional second argument, which is an example of the type to return.]
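
  To make the combined form concrete, here is a sketch in Python, assuming
  IEEE doubles with radix 2.  The names decode_float and encode_float follow
  the suggestions above but are only illustrative, and this version always
  returns an f scaled to the full precision p, which is just one of the
  choices the proposal leaves to the implementation.

```python
import math
import sys

def decode_float(x):
    """Return (f, e, s, p) for a finite float x, as in the proposal."""
    p = sys.float_info.mant_dig                  # FLOAT-PRECISION: 53 for doubles
    s = 1 if math.copysign(1.0, x) < 0 else 0    # 1 even for minus zero
    m, exp = math.frexp(abs(x))
    f = int(math.ldexp(m, p))                    # non-negative integer below 2**p
    return f, exp - p, s, p

def encode_float(f, e, s):
    """Inverse sketch: rebuild the float from f, e, and s."""
    y = math.ldexp(float(f), e)
    return -y if s else y
```

  With this sketch decode_float(1.0) yields f = 2**52 and e = -52, but
  encode_float(1, 0, 0) rebuilds the same number, which is the "smallest
  possible integer" alternative mentioned above.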

FTRUNC x -> fp ip
  The FTRUNC function as it is already defined provides the fraction-part and
  integer-part operations.
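
  For comparison, a small Python sketch of the same split, assuming the value
  order fp, ip shown above (the name ftrunc here is only a stand-in for the
  Lisp function):

```python
import math

def ftrunc(x):
    """Return (fp, ip): fractional and integer parts of x, both floats."""
    fp, ip = math.modf(x)  # both parts carry the sign of x
    return fp, ip
```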

These functions exist now on the Lisp machines, with different names and slightly
different semantics in some cases.  They are very easy to write.

Comments?  Suggestions for names?