[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Why Vectors? and taking a cue from SYSLISP



This note is probably the only comment I'll have to say  during this round
(via mail) on the "Vectors and Arrays" issue.  There is so much in
the mails on it now that I think we should have had more face-to-face
discussions, preferably by a small representative group which could
present its recommendations.

Originally, the NIL proposal wanted a distinction between
ARRAYs, with potentially hairy access methods, and simple
linear index-accessed data, which we broke down into three
prominent cases of VECTORs of "Q"s, STRINGs of characters,
and BITStrings of random data.  The function names VREF,
CHAR, and BIT/NIBBLE are merely access methods, for there is
a certain amount of "mediation" that has to be done upon
access of a sequence with packed elements.  Admittedly, this
distinction is non-productive when micro-code can select the
right access method at runtime (based on some internal structure
of the datum), but it is critical for efficient open-compilation
on stock hardware.  So much for history and rationale.

Some of the discussion on the point now seems to be centered
around just exactly how these data structures will be implemented,
and what consequences that might have for the micro-coded case.
E.g., do we need two kinds of VECTORs?  I don't think so, but in
order to implement vectors to have the "growabililty" property it may
be better to drop the data format of the existing NIL implementations
(where the length count is stored in the word preceeding the data)
For instance, if vectors (all kinds: Q, character, and bit) are
implemented as a fixed word header with a count/active component and
an address component then the following advantages(+)/disadvantages(-)
can be seen:
  1+) Normal user code has type-safe operations on structured data
      (at least in interpreter and close-compiled code)
  2+) "system" type code can just extract the address part, and
      deal with the data words almost as if the code were written
      in a machine-independent systems language (like "C"?)  I think
      the SYSLISP approach of the UTAH people may be somewhat like this.
  3-) Access to an individual element, by the normal user-level functions,
      is slower by one memory reference;  but this may be of lesser
      importance if most "heavy" usage is through system functions like
      STRING-REVERSE-SEARCH.  There is also room for optimization
      by clever compilers to bypass most of the "extra" time.
  4-) use of "addresses", rather than typed data is a loophole in
      the memory integrity of the system;  but who needs to protect
      the system programmer from himself anyway.
  5+) hardware forwarding pointers wouldn't be necessary to make
      growability and sharability work -- they work by updating the
      address and length components of the vector header;  true, there
      would not be full compatibility with forwarding-pointer
      implementations (installing a new "address" part loses some
      updates that wouldn't be lost under forwarding pointers), but
      at least NSUBSTRING and many others could work properly.
  6-) without micro-code, it would probably be a loss to permit random
      addresses (read, locatives) into the middle of vectors; thus
      sharability would probably require a little extra work somewhere so
      that the GC wouldn't get fould up.  Shared data may need to be
      identified.  This can be worked out.
  7+) even "bibop" implementations with generally-non-relocating GC can
      implement these kinds of vectors (that is, with "headers") without
      much trouble.
  8+) it will be easier to deal with chunks of memory allocated by a
      host (non-Lisp) operating system this way;  e.g. a page, whereby
      any "header" for the page need not appear at any fixed offset
      from the beginning of the  page.

As far as I can see, retaining the NIL idea of having header information
stored at fixed offset from the first address primarily alleviates point 3
above.  It also permits two kinds of vectors (one with and one without
header information) to be implemented so that the same open-coded accessing
sequence will work for both.  I think we may be going down a wrong track by
adhering to this design, which is leading us to two kinds of vectors.
The SYSLISP approach, with possibly additional "system" function names
for the various access methods, should be an attractive alternative.
[DLW's note of
    Date: Sunday, 14 March 1982, 18:27-EST
    From: Daniel L. Weinreb <dlw at MIT-AI>
    Subject: Re: Vectors and Arrays
    To: FAHLMAN at CMU-20C
seems to indicate his confusion about the state of this question -- it
does need to be cleared up.]