[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Revised array proposal (long)

Here is a revision of my array proposal, fixed up in response to some of
the feedback I've received.  See if you like it any better than the
original.  In particular, I have explictly indicated that certain
redundant forms such as MAKE-VECTOR should be retained, and I have
removed the :PRINT keyword, since I now believe that it causes more
trouble than it is worth.  A revised printing proposal appears at the
end of the document.


Arrays can be 1-D or multi-D.  All arrays can be created by MAKE-ARRAY
and can be accessed with AREF.  Storage is done via SETF of an AREF.
The term VECTOR refers to any array of exactly one dimension.
Vectors are special, in that they are also sequences, and can be
referenced by ELT.  Also, only vectors can have fill pointers.

Vectors can be specialized along several distinct axes.  The first is by
the type of the elements, as specified by the :ELEMENT-TYPE keyword to
MAKE-ARRAY.  A vector whose element-type is STRING-CHAR is referred to
as a STRING.  Strings, when they print, use the "..." syntax; they also
are the legal inputs to a family of string-functions, as defined in the
manual.  A vector whose element-type is BIT (alias (MOD 2)), is a
BIT-VECTOR.  These are special because they form the set of legal inputs
to the boolean bit-vector functions.  (We might also want to print them
in a strange way -- see below.)

Some implementations may provide a special, highly efficient
representation for simple vectors.  A simple vector is (of course) 1-D,
cannot have a fill pointer, cannot be displaced, and cannot be altered
in size after its creation.  To get a simple vector, you use the :SIMPLE
keyword to MAKE-ARRAY with a non-null value.  If there are any
conflicting options specified, an error is signalled.  If an
implementation does not support simple vectors, this keyword/value is
ignored except that the error is still signalled on inconsistent cases.

We need a new set of type specifiers for simple things: SIMPLE-VECTOR,
SIMPLE-STRING, and SIMPLE-BIT-VECTOR, with the corresponding
type-predicate functions.  Simple vectors are referenced by AREF in the
usual way, but the user may use THE or DECLARE to indicate at
compile-time that the argument is simple, with a corresponding increase
in efficiency.  Implementations that do not support simple vectors
ignore the "simple" part of these declarations.

Strings (simple or non-simple) self-eval; all other arrays cause an
error when passed to EVAL.  EQUAL descends into strings, but not
into any other arrays.  EQUALP descends into arrays of all kinds,
comparing the corresponding elements with EQUALP.  EQUALP is false
if the array dimensions are not the same, but it is not sensitive to
the element-type of the array, whether it is simple, etc.  In comparing
the dimensions of vectors, EQUALP uses the length from 0 to the fill
pointer; it does not look at any elements beyond the fill pointer.

The set of type-specifiers required for all of this is ARRAY, VECTOR,
Each of these has a corresponding type-P predicate, and each can be
specified in list from, along with the element-type and dimension(s).

MAKE-ARRAY takes the following keywords: :ELEMENT-TYPE, :INITIAL-VALUE,
:INITIAL-CONTENTS, :FILL-POINTER, and :SIMPLE.  There is still some
discussion as to whether we should retain array displacement, which

The following functions are redundant, but should be retained for
clarity and emphasis in code: MAKE-VECTOR, MAKE-STRING, MAKE-BIT-VECTOR.
MAKE-VECTOR takes the same keywords as MAKE-ARRAY, but can only take a
single integer as the dimension argument.  MAKE-STRING and
MAKE-BIT-VECTOR are like MAKE-VECTOR, but do not take the :ELEMENT-TYPE
keyword, since the element-type is implicit.  Similarly, we should
retain the forms VREF, CHAR, and BIT, which are identical in operation
to AREF, but which declare their aray argument to be VECTOR, STRING, or
BIT-VECTOR, respectively.

If the :SIMPLE keyword is not specified to MAKE-ARRAY or related forms,
the default is NIL.  However, vectors produced by random forms such as
CONCATENATE are simple, and vectors created when the reader sees #(...)
or "..." are also simple.

As a general rule, arrays are printed in a simple format that, upon
being read back in, produces a form that is EQUALP to the original.
However, some information may be lost in the printing process:
element-type restrictions, whether a vector is simple, whether it has a
fill pointer, whether it is displaced, and the identity of any element
that lies beyond the fill pointer.  This choice was made to favor ease
of interactive use; if the user really wants to preserve in printed form
some complex data structure containing non-simple arrays, he will have
to develop his own printer.

A switch, SUPPRESS-ARRAY-PRINTING, is provided for users who have lots
of large arrays around and don't want to see them trying to print.  If
non-null, this switch causes all arrays except strings to print in a
short, non-readable form that does not include the elements:
#<array-...>.  In addition, the printing of arrays and vectors (but not
of strings) is subject to PRINLEVEL and PRINLENGTH.

Strings, simple or otherwise, print using the "..."  syntax.  Upon
read-in, the "..." syntax creates a simple string.

Bit-vectors, simple or otherwise, print using the #"101010..." syntax.
Upon read-in, this format produces a simple bit-vector.  Bit vectors do

All other vectors print out using the #(...) syntax, observing
in as a simple vector of element-type T.

All other arrays print out using the syntax #nA(...), where n is the
number of dimensions and the list is a nest of sublists n levels deep,
with the array elements at the deepest level.  This form observes
in as an array of element-type T.

Query: I am still a bit uneasy about the funny string-like syntax for
bit vectors.  Clearly we need some way to read these in that does not
turn into a type-T vector.  An alternative might be to allow #(...) to
be a vector of element-type T, as it is now, but to take the #n(...)
syntax to mean a vector of element-type (MOD n).  A bit-vector would
then be #2(1 0 1 0...) and we would have a parallel notation available
for byte vectors, 32-bit word vectors, etc.  The use of the #n(...)
syntax to indicate the length of the vector always struck me as a bit
useless anyway.  One flaw in this scheme is that it does not extend to
multi-D arrays.  Before someone suggests it, let me say that I don't
like #nAm(...), where n is the rank and m is the element-type -- it
would be too hard to remember which number was which.  But even with
this flaw, the #n(...) syntax might be useful.