
Vectors and Arrays



There is yet another rather fundamental issue that we need to discuss:
how to handle vectors and arrays.  It is not clear to me what, if
anything, was decided at the November meeting.  There is a line in the
"Decisions" document indicating that henceforth vector is to be a
subtype of array, but this could mean any number of things, some of them
reasonable and some of them not.  Let me try briefly to spell out the
issues as I see them and to propose a possible solution.

First, let me explain the rationale for making vectors and arrays
distinct types in the Swiss Cheese edition.

In non-microcoded implementations (which Common Lisp MUST accommodate if
it is to be at all common), it is important to have a very simple vector
data type for quick access.  Features that are only used occasionally
and that make access more expensive should not be included in the
simplest kind of vector.  That means leaving out fill pointers and the
ability to expand vectors after they are allocated, since the latter
requires an extra level of indirection or some sort of forwarding
pointer.  These simple vectors are referenced with VREF and VSET, which
tip off the compiler that the vector is going to be a simple one.
Bit-vectors and strings must also be of this simple form, for the same
reason: they need to be accessed very efficiently.
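
For concreteness, access to a simple vector under this scheme might look
something like the following.  VREF and VSET are the proposed operators;
MAKE-VECTOR is a made-up constructor used here only for illustration:

    (setq v (make-vector 10))   ; a hypothetical simple-vector constructor
    (vset v 3 'foo)             ; store FOO in element 3
    (vref v 3)                  ; => FOO, compiled as a single indexed load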

Given a vector data type, it is very straightforward to build arrays
into the system.  An array is simply a record structure (built from a
vector of type T) that contains slots for the data vector (or string),
the number of indices, the range of each index, a fill pointer, perhaps
some header slots, and so on.  The actual data is in a second vector.
Arrays are inherently more expensive to reference (using AREF and ASET),
and it seems to me that this is where you want to put the frills.  The
extra level of indirection makes it possible to expand the array by
allocating a new data vector; the expanded array (meaning the header
vector) is EQ to the original.  A fill pointer adds negligible expense
here, and makes sense since the array is able to grow.  (To provide fill
pointers without growability is pretty ugly.)
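
To make the layout concrete, here is one way such an array header might
be put together on top of a vector of type T.  The slot names are purely
illustrative; the actual layout is up to the implementation:

    ;; Sketch only: an array header built from a vector of type T.
    (defstruct (array-header (:type vector))
      data            ; the second vector (or string) holding the elements
      rank            ; the number of indices
      dimensions      ; the range of each index
      fill-pointer    ; NIL, or the current fill pointer
      leader)         ; any additional header slots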

So, the original proposal, as reflected in Swiss Cheese, was that
vectors and arrays be separate types, even if the array is 1-D.  The
difference is that arrays can be expanded and can have fill pointers and
headers, while vectors cannot.  Strings and bit-vectors would be
vectors; if you want the added hair, you create a 1-D array of bits or
of characters.  VREF only works on vectors and can therefore be
open-coded efficiently; there is no reason why AREF should not work on
both types, but the array operations that depend on the fancy features
will only work on arrays.
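
The intended division of labor between the two accessors can be pictured
as follows, where S is a simple string and A is a 1-D array in the sense
above.  This is a sketch of the proposal, not code that runs in any
existing implementation:

    (vref s 0)     ; legal: S is a vector, so VREF can be open-coded
    (aref s 0)     ; also legal: AREF works on vectors and arrays alike
    (vref a 0)     ; an error: A is an array, not a simple vector
    (aref a 0)     ; legal, but pays for the extra level of indirection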

The problem is that the Lisp Machine people have already done this the
opposite way: they implement arrays as the fundamental entity, complete
with headers, fill pointers, displacement, and growability.  There is no
simpler or cheaper form of vector-like object available on their system.
(I think that this is a questionable decision, even for a microcoded
system, but this is not the forum in which to debate that.  The fact
remains that their view of arrays is woven all through Zetalisp
and they evidently do not want to change it.)

Now, if the Lisp Machine people really wanted to, they could easily
implement the simpler kind of vector using their arrays.  There would
simply be a header bit in certain 1-D arrays that marks these as vectors;
arrays so marked would not be growable and could not be given headers or
fill pointers.  Those things that we would have called vectors in the
original scheme would have this bit set, and true arrays would not.
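
In such a scheme, the vector test might amount to nothing more than a
rank check plus a look at that bit.  ARRAY-SIMPLE-BIT below is a made-up
accessor standing in for however the real header bit would be read:

    ;; Sketch only: VECTORP on top of bit-marked 1-D arrays.
    (defun vectorp (x)
      (and (arrayp x)
           (= (array-rank x) 1)
           (array-simple-bit x)))   ; hypothetical header-bit accessor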

I was not at the November meeting, but I gather that the Lisp Machine
folks rejected this suggestion -- why do extra work just to break
certain features that you are already paying for and that, in the case
of strings, are already being used in some places?  The position stated
by Moon was that there would be no distinction between vectors and any
other 1-D arrays in Zetalisp.  However, if we simply merge these types
throughout Common Lisp, the non-microcoded implementations are screwed.

Could everyone live with the following proposal?

1. Vector is a subtype of Array.  String and Bit-Vector are subtypes of
Vector.

2. AREF and ASET work for all arrays (including the subtypes listed
above).  The generic sequence operators likewise work for all of the
above, but only when the array is 1-D.  (I believe that the proposal to
treat multi-D arrays as sequences was voted down.)

3. VREF and VSET work only for vectors, including Strings and
Bit-Vectors.

4. We need a new predicate (call it "EXTENSIBLEP"?).  If an array is
extensible, then one can grow it, give it a fill pointer, displace it,
etc.

5. In the Common Lisp spec, we say that vectors and their subtypes
(including strings) are not, in general, extensible.  The arrays
created by MAKE-ARRAY are extensible, at least as a default.  Thus, in
vanilla Common Lisp, users could choose between fast, simple vectors and
strings and the slower, extensible 1-D arrays (see the sketch following
this list).

6. Implementations (including Zetalisp) will be free to make vectors
extensible.  In such implementations, all arrays would be extensible and
there would be no difference between vectors and 1-D arrays.
Implementations that take this step would be upward-compatible supersets
of Common Lisp.  Code from vanilla implementations can be ported to
Zetalisp without change, and nothing will break; the converse is not
true, of course.  This is just one of the ways in which Zetalisp is a
superset, so we haven't really given anything up by allowing this
flexibility.

7. It would be nice if the superset implementations provided a
"compatibility mode" switch which would signal a (correctable) runtime
error if a vector is used in an extensible way.  One could turn this on
in order to debug code that is meant to be portable to all Common Lisp
implementations.  This, of course, is optional.
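
Under points 3 through 6, a fragment of vanilla Common Lisp might behave
as follows.  EXTENSIBLEP is the proposed predicate from point 4, and the
results shown assume the vanilla defaults of point 5 (a superset
implementation could legitimately return T for both):

    (setq v (make-string 100))   ; a string, hence a vector: fast, fixed
    (setq a (make-array 100))    ; a 1-D array: extensible by default
    (extensiblep v)              ; => NIL -- no growing, no fill pointer
    (extensiblep a)              ; => T   -- may be grown, displaced, etc.
    (vref v 0)                   ; legal and cheap
    (vref a 0)                   ; an error: A is not a vector
    (aref a 0)                   ; legal, through the header indirection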