[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Why Vectors? and taking a cue from SYSLISP
- To: fahlman at CMU-10A
- Subject: Why Vectors? and taking a cue from SYSLISP
- From: Jon L White <JONL at MIT-MC>
- Date: Mon, 15 Mar 1982 03:08:00 -0000
- Cc: COMMON-LISP at SU-AI
This note is probably the only comment I'll have to say during this round
(via mail) on the "Vectors and Arrays" issue. There is so much in
the mails on it now that I think we should have had more face-to-face
discussions, preferably by a small representative group which could
present its recommendations.
Originally, the NIL proposal wanted a distinction between
ARRAYs, with potentially hairy access methods, and simple
linear index-accessed data, which we broke down into three
prominent cases of VECTORs of "Q"s, STRINGs of characters,
and BITStrings of random data. The function names VREF,
CHAR, and BIT/NIBBLE are merely access methods, for there is
a certain amount of "mediation" that has to be done upon
access of a sequence with packed elements. Admittedly, this
distinction is non-productive when micro-code can select the
right access method at runtime (based on some internal structure
of the datum), but it is critical for efficient open-compilation
on stock hardware. So much for history and rationale.
Some of the discussion on the point now seems to be centered
around just exactly how these data structures will be implemented,
and what consequences that might have for the micro-coded case.
E.g., do we need two kinds of VECTORs? I don't think so, but in
order to implement vectors to have the "growabililty" property it may
be better to drop the data format of the existing NIL implementations
(where the length count is stored in the word preceeding the data)
For instance, if vectors (all kinds: Q, character, and bit) are
implemented as a fixed word header with a count/active component and
an address component then the following advantages(+)/disadvantages(-)
can be seen:
1+) Normal user code has type-safe operations on structured data
(at least in interpreter and close-compiled code)
2+) "system" type code can just extract the address part, and
deal with the data words almost as if the code were written
in a machine-independent systems language (like "C"?) I think
the SYSLISP approach of the UTAH people may be somewhat like this.
3-) Access to an individual element, by the normal user-level functions,
is slower by one memory reference; but this may be of lesser
importance if most "heavy" usage is through system functions like
STRING-REVERSE-SEARCH. There is also room for optimization
by clever compilers to bypass most of the "extra" time.
4-) use of "addresses", rather than typed data is a loophole in
the memory integrity of the system; but who needs to protect
the system programmer from himself anyway.
5+) hardware forwarding pointers wouldn't be necessary to make
growability and sharability work -- they work by updating the
address and length components of the vector header; true, there
would not be full compatibility with forwarding-pointer
implementations (installing a new "address" part loses some
updates that wouldn't be lost under forwarding pointers), but
at least NSUBSTRING and many others could work properly.
6-) without micro-code, it would probably be a loss to permit random
addresses (read, locatives) into the middle of vectors; thus
sharability would probably require a little extra work somewhere so
that the GC wouldn't get fould up. Shared data may need to be
identified. This can be worked out.
7+) even "bibop" implementations with generally-non-relocating GC can
implement these kinds of vectors (that is, with "headers") without
8+) it will be easier to deal with chunks of memory allocated by a
host (non-Lisp) operating system this way; e.g. a page, whereby
any "header" for the page need not appear at any fixed offset
from the beginning of the page.
As far as I can see, retaining the NIL idea of having header information
stored at fixed offset from the first address primarily alleviates point 3
above. It also permits two kinds of vectors (one with and one without
header information) to be implemented so that the same open-coded accessing
sequence will work for both. I think we may be going down a wrong track by
adhering to this design, which is leading us to two kinds of vectors.
The SYSLISP approach, with possibly additional "system" function names
for the various access methods, should be an attractive alternative.
[DLW's note of
Date: Sunday, 14 March 1982, 18:27-EST
From: Daniel L. Weinreb <dlw at MIT-AI>
Subject: Re: Vectors and Arrays
To: FAHLMAN at CMU-20C
seems to indicate his confusion about the state of this question -- it
does need to be cleared up.]