
Jan 1 cs proposal comments



>>   From: "David A. Moon" <Moon@SCRC-STONY-BROOK.ARPA>
>>   Subject: Comments on the Character proposal dated January 1, 1989
>>
>>   Page 6 -- *all-registry-names* should be renamed to
>>   *all-character-registry-names*; the word "registry" by itself
>>   is too general.

I made this change to the latest version of the proposal.

>>
>>   Page 9 -- the fourth bullet requires a defined total ordering of all
>>   characters.  This seems unnecessary, and is impossible to implement in any
>>   system (such as Symbolics Genera) that allows dynamic addition of character
>>   registries by third-party software vendors and by users; in such a system
>>   character codes have to be allocated dynamically and therefore their order
>>   cannot be fixed ahead of time.

You are quite right.  This bullet is removed.

>>
>>   Page 9 -- This says an implementation must define the result of
>>   standard-char-p on the characters it supports.  I think that is incorrect.
>>   Common Lisp fully defines the result of standard-char-p, which is NIL
>>   for all characters added by an implementation.

Right.  This bullet is removed.

>>
>>   Page 14 -- This EXTERNAL-WIDTH function probably should be part of a
>>   database facility or a terminal screen template facility; I'm not sure it
>>   is useful by itself.  Also note that its result is only meaningful with
>>   respect to a specific state of the stream.  To give two examples, with the
>>   SO/SI encoding the answer can vary by 1 depending on whether the stream is
>>   already shifted into the correct state for the first character; with the
>>   universal encoding Symbolics uses, the answer can vary by a lot depending on
>>   whether the character repertoires appearing in the string have been used
>>   earlier on the same stream (and hence have been assigned encoding numbers).
>>   Because of this dependence on the state of the stream, I cannot think of
>>   any correct use of EXTERNAL-WIDTH that does not involve immediately
>>   outputting the string to the stream.  Therefore I believe the same effect
>>   can be achieved without adding any new functions, by calling FILE-POSITION,
>>   outputting to the stream, calling FILE-POSITION again, and subtracting.  If
>>   you still want to propose this feature, you should change the name: use
>>   "length" instead of "width", since that's the word Common Lisp always uses,
>>   and use a name that relates to the :EXTERNAL-CODE-FORMAT option to OPEN;
>>   for example, STRING-LENGTH-IN-EXTERNAL-CODE-FORMAT or
>>   EXTERNAL-CODED-STRING-LENGTH.

I changed the name to EXTERNAL-CODED-STRING-LENGTH.  The description
already contained a comment regarding current state.  Actually, I
favored the STREAM-INFO proposal which was voted down.  This is
much less ambitious, but I still feel it is more useful than actually
forcing I/O, backing up and rewriting.  It's also not clear
that your alternative has the same effect since it seems that
some unwanted side-effects would occur such as premature appearance
on a display screen.
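
For reference, the FILE-POSITION alternative described in the comment
could be sketched roughly as follows.  MEASURED-EXTERNAL-LENGTH is a
hypothetical name used only for illustration; it is not part of the
proposal.  The sketch assumes the stream supports FILE-POSITION, and
it really does perform the output, which is exactly the side effect
questioned above.

```lisp
;; Hypothetical helper: measure how far FILE-POSITION advances when
;; STRING is written to STREAM.  The units of the result are whatever
;; the implementation uses for FILE-POSITION on that stream.
(defun measured-external-length (string stream)
  (let ((start (file-position stream)))
    ;; The string is actually written; there is no way to undo this here.
    (write-string string stream)
    (let ((end (file-position stream)))
      ;; FILE-POSITION returns NIL when the position cannot be determined.
      (and (integerp start)
           (integerp end)
           (- end start)))))
```

The result depends on the state of the stream at the time of the call,
so it says nothing about what a later call with the same string would
return.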

>>
>>   Page 24 -- I can't figure out what you intend the meaning of SIMPLE-STRING
>>   to be.  Your report mostly does not mention it, but it doesn't say to
>>   remove it either.  If I have correctly correlated page 24 back to CLtL, you
>>   are defining SIMPLE-STRING to be synonymous with SIMPLE-GENERAL-STRING.
>>   Maybe what you really meant, though, was what you said in November you
>>   would do, which was to make SIMPLE-STRING mean (AND STRING SIMPLE-ARRAY),
>>   in other words a union of several subtypes.  This is particularly confusing
>>   because Common Lisp uses the name SIMPLE-VECTOR to mean what you might call
>>   a simple general vector, that is, (SIMPLE-ARRAY T 1) rather than
>>   (SIMPLE-ARRAY * 1).  Here are my suggestions for what to do with the
>>   various names for string subtypes:
>>
>>     STRING                  As a union of all strings, this is fine.
>>     GENERAL-STRING          I think (VECTOR CHARACTER) is just as good.
>>     BASE-STRING             I think (VECTOR BASE-CHARACTER) is just as good.
>>     SIMPLE-STRING           Should mean (SIMPLE-ARRAY CHARACTER 1).
>>     SIMPLE-BASE-STRING      This is fine.
>>     SIMPLE-GENERAL-STRING   This name is horrible, use SIMPLE-STRING.
>>
>>   My rationale for these suggestions largely comes from thinking about
>>   which of these names would ever be used in type declarations and about
>>   how these names relate to the other names already in Common Lisp.  To
>>   repeat older comments:
>>
>>     Pages 19 and 20 introduce a new type named simple-base-string, in addition
>>     to simple-string.  If you think about how simple-string would be used for
>>     compiler optimization, it makes sense for simple-string to be the name for
>>     the single simplest representation, rather than a name for a whole family
>>     of representations that would have to be discriminated at run time.  Thus
>>     what you call simple-base-string should be called simple-string, and what
>>     you call simple-string should just be called (simple-array character (*)).
>>     This would not be an incompatible change in the meaning of simple-string.
>>     Simple-string would be analogous to simple-vector.
>>
>>   I changed my mind slightly on that and now claim that while SIMPLE-STRING
>>   should still be a single representation, not a union, it should be the
>>   representation that can hold all characters.  This is both because of the
>>   principle that correct programs should be easier to write than
>>   extra-efficient programs, and because of the powerful analogy with the name
>>   SIMPLE-VECTOR.  Then the name SIMPLE-BASE-STRING is also needed for
>>   convenient type declarations of the more efficient but less functional
>>   string representation.  That name is good, by analogy to BASE-CHARACTER.
>>
>>   Adopting the above suggestions helps you decide what to do about the
>>   SCHAR, SBCHAR, and SGCHAR mess.  First of all, you only need two functions,
>>   not three, because there are only two specified specialized representations.
>>   SCHAR should be for what I've called SIMPLE-STRING, SBCHAR should be
>>   for SIMPLE-BASE-STRING, and SGCHAR is not needed.  (In fact I would prefer
>>   to remove all of the specialized versions of AREF from the language, in
>>   favor of THE or type declarations, but I know that would only pass over
>>   some people's dead bodies so I won't push it.)
>>
>>   In case you are wondering, I have no quarrel with the name BASE-CHARACTER
>>   and would not want to see it removed.  I guess I differ from Larry here,
>>   unless I erred when I wrote down his comments during the meeting.

The statement on p24 making SIMPLE-STRING == (SIMPLE-ARRAY CHARACTER (*))
was in error.  P25 had it right.  Since we changed SCHAR to accept
all simple strings there is no reason for SGCHAR and SBCHAR and
these are eliminated.

  String and simple-string are (more clearly I hope) defined as union
types.  I've changed the terminology from 'for the purpose of
declaration' to 'for object creation'.   Perhaps there is a better
term but the effect seems to be identical to what you suggest. That is,
correct, portable programs are easier to write, one simply uses
string and simple-string.  More efficient, less portable programs
need to specify the specialized subtype(s) explicitly.
  Having both string and simple-string defined as union types seems
desirable on the basis of uniformity.
  Of the type abbreviations I think BASE-CHARACTER is the most
useful and GENERAL-STRING, SIMPLE-BASE-STRING and SIMPLE-GENERAL-STRING
less so.  I don't believe that any of these really complicate the
language.
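
  As an illustration of the portable versus specialized distinction,
here is a minimal sketch.  It assumes SIMPLE-STRING is a union type,
SCHAR accepts every simple string, and SIMPLE-BASE-STRING names the
specialized subtype; the function names COUNT-SPACES and
COUNT-SPACES-BASE are made up for the example.

```lisp
;; Portable version: the declaration covers every simple string
;; representation, and SCHAR works on all of them.
(defun count-spaces (s)
  (declare (type simple-string s))
  (let ((n 0))
    (dotimes (i (length s) n)
      (when (char= (schar s i) #\Space)
        (incf n)))))

;; Specialized version: the caller promises the more restricted
;; representation, which a compiler may be able to exploit.
(defun count-spaces-base (s)
  (declare (type simple-base-string s))
  (let ((n 0))
    (dotimes (i (length s) n)
      (when (char= (schar s i) #\Space)
        (incf n)))))
```

The bodies are identical; only the declaration differs, which is the
sense in which the specialized subtype has to be named explicitly.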

>>
>>   Page 25 -- The discussion of STRING and SIMPLE-STRING thinks that there
>>   is a distinction between declaration and discrimination, but Common Lisp
>>   no longer has such a distinction.  Even when Common Lisp did have such
>>   a distinction, the meanings for declaration stated here were incorrect.

I changed this to 'object creation'.  Perhaps there is a better term.

>>
>>   Page 29 -- *all-character-registry-names* has to be a variable, not a
>>   constant, to accommodate systems (such as Symbolics Genera) that allow
>>   dynamic addition of character registries by third-party software vendors
>>   and by users.

Right, I made this change.

>>
>>   Page 35 -- CHAR-REGISTRY should be renamed to CHAR-REGISTRY-NAME, so that
>>   if at some later time character registry objects are added, there is no
>>   possibility of confusion about whether this function returns a name or
>>   an object.

Right, I made this change.

>>
>>   Page 40 -- the default :ELEMENT-TYPE for OPEN cannot be BASE-CHARACTER.  I
>>   think this was discussed at the X3J13 meeting.  The report suffers from a
>>   confusion between two meanings of BASE-CHARACTER: the character type
>>   implemented most efficiently by the Lisp, and the character type most
>>   natural to the file system.  These are not always the same.  Furthermore,
>>   in a network-based system that supports multiple file systems equally
>>   (Symbolics Genera is an example), each file system might have a different
>>   natural character type.  BASE-CHARACTER should just mean the character type
>>   implemented most efficiently by the Lisp.  The default for :ELEMENT-TYPE
>>   has two viable choices that I can see, and maybe you should just propose
>>   both and let people vote:
>>
>>     (1) CHARACTER.  This matches the behavior of MAKE-STRING and friends,
>>     adheres to the principle that writing correct programs should be easier
>>     than writing extra-efficient programs (since making a program correct
>>     requires making every part of it correct, while making a program
>>     efficient only requires improving the bottlenecks), and doesn't cost
>>     anything in implementations that don't have extended characters.
>>
>>     (2) The most natural type for the particular pathname being opened.
>>     In some systems this would be a constant, and in a subset of those
>>     systems this would be BASE-CHARACTER, however in general this might
>>     depend on the host, device, or even type fields of the pathname,
>>     and might also depend on information stored in the file system.
>>     In general this would always be an (improper) supertype of
>>     BASE-CHARACTER, but it's probably a bad idea to make that a requirement,
>>     as some file systems might not be able to implement it conveniently.
>>     Again this doesn't cost anything in implementations that don't have
>>     extended characters.

The discussion on p16 about the base coded character set efficiency
has been removed.  The default element-type now states that it is
implementation defined as character or a subtype of character.

>>
>>   The relationship of option 2 to :ELEMENT-TYPE :DEFAULT (a feature that
>>   already exists in Common Lisp) needs to be clarified.  Perhaps they
>>   are the same.

The same?  I don't understand.  For example, I can imagine the
element-type default as base-character and the external format
defaulted to either an ASCII or EBCDIC encoding.

>>
>>   Also the following promise from 14 November did not show up in the report:
>>
>>     >>     There should be a name for the "natural" encoding and there should be a
>>     >>     specification of the properties of the natural encoding that a programmer
>>     >>     can rely on.  Suggestions for the name include :BASE, :NATURAL, and
>>     >>     :INTERCHANGE.  The definition probably involves the concept of data
>>     >>     interchange with non-Lisp programs on the same system.
>>
>>     This will be added to the revision.

I lied.  No one came up with the 'properties' of such an encoding.
Do you have some text to suggest?

>>
>>   Appendix B -- I disagree with the way you've used deprecation.  I'll
>>   comment on each individual point:
>>    - I see no justification for deprecating STANDARD-CHAR.
>>    - I agree that STRING-CHAR should be deprecated, not deleted nor kept.
>>    - I think fonts and bits should be removed outright, not deprecated,
>>      because no portable program could possibly be using them.
>>    - I think the CHAR-INT function needs to be kept, although the INT-CHAR
>>      function should go away.  This is for hashing.  See comments below
>>      on character attributes.

I've removed Appendix B and mention of deprecation.  STANDARD-CHAR
is simply (characterp :standard).  String-char is back in,
implementation-defined as either character or base-character (and
maybe it should be voted on as a deprecated type).

>>
>>   No particular page -- the use of strings for naming registries, labelling
>>   characters, and naming external code formats is objectionable.  Nothing
>>   else in Common Lisp is named by strings.  Use of strings might lead to
>>   efficiency problems.  We feel that keyword symbols are the appropriate
>>   objects to use for these three kinds of names.

I changed these back to symbols.

>>
>>   No particular page -- We agree with the deprecation or deletion of the two
>>   particular character attributes defined by CLtL, but not with the
>>   deprecation of the whole concept of character attributes.  In fact on page
>>   20 you say "characters are uniquely distinguished by their codes," which
>>   makes it impossible to have character attributes at all.  The language must
>>   define how conforming programs should be written so that they will work
>>   both in implementations with character attributes and in implementations
>>   without them.  For example, the value of (eql x (code-char (char-code x)))
>>   is unspecified.  Another thing that needs to be said is that the exact
>>   character operations (char=, string=, etc.) respect all character
>>   attributes, while the inexact character operations (char-equal,
>>   string-equal, etc.) respect or ignore each character attribute in an
>>   implementation-defined but consistent fashion.  Some of what you say on
>>   page 44 about attributes in general needs to be part of the spec, not
>>   deprecated.  I would retain everything on that page except for INT-CHAR and
>>   the last bullet (referring to bits and fonts), and I would add a remark
>>   that FIND-SYMBOL and INTERN respect character attributes.  If you want,
>>   perhaps I or someone else at Symbolics can provide exact text for what
>>   to say about character attributes that you could insert into your report.

I moved the attribute list previously in Appendix B back into the
description of characters.  Let me know what text you would like
to see for FIND-SYMBOL and INTERN and I'll add it to the list.

>>   No particular page -- On the subject of defining character registries in a
>>   separate document, and relating them to ISO standards for character
>>   encoding: I think that's fine.  I don't see anything wrong with introducing
>>   the concept of character registry and the requirement that each character
>>   object relates to exactly one registry.  However, I think the somewhat
>>   random list of character registries on pages 7-8 and again on page 21 does
>>   not belong in the language specification.  Even the names of the

Right.  They are not part of the Common LISP standard.  The revised
document is considerably clearer in this regard.

>>   standardized character registries belong in the character registry
>>   standard, not in the Common Lisp language standard.  I'm confused about the
>>   meaning of BASE, STANDARD, and CONTROL as character registry names; these
>>   are mentioned in your report but not explained very well.  If these are
>>   character registries that are required to exist in all Common Lisp
>>   implementations, then unlike the others they do belong in the Common Lisp
>>   language standard, not in the character registry standard.

By CONTROL, I meant a registry which contains the various control
codes mentioned in the various ISO coded character set standards.
BASE and STANDARD are no longer mentioned here.  They are allowed
as Common LISP repertoire names in characterp and the character
type specifier.

>>
>>   At the meeting there was some discussion about the issue of enumerating all
>>   characters in a character registry.  People claimed incorrectly that it was
>>   impossible.  In fact it's possible to do this, with questionable
>>   efficiency, by the following program:
>>
>>     (dotimes (code char-code-limit)
>>       (let ((char (code-char code)))
>>         (when char
>>           (when (eq (char-registry-name char) desired-registry-name)
>>             ... process this char ...))))
>>
>>   Of course you have to change the EQ to EQUALP if you continue to use
>>   strings to name character registries.  For more efficiency, you could add
>>   a way to iterate over all the codes in one character registry, but I think
>>   that is unnecessary.
>>
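
A runnable variant of the loop above, as a sketch only: since
CHAR-REGISTRY-NAME exists only in the proposal, the registry test is
replaced here by an arbitrary predicate supplied by the caller.
COUNT-CHARACTERS-IF is a hypothetical name.

```lisp
;; Walk every code below CHAR-CODE-LIMIT and count the characters
;; that satisfy PREDICATE.  A registry test such as
;; (eq (char-registry-name char) desired-registry-name)
;; would be passed in as the predicate.
(defun count-characters-if (predicate)
  (let ((count 0))
    (dotimes (code char-code-limit count)
      (let ((char (code-char code)))
        ;; CODE-CHAR returns NIL for codes that have no character.
        (when (and char (funcall predicate char))
          (incf count))))))
```

For example, (count-characters-if #'standard-char-p) counts the
standard characters.  The cost is proportional to CHAR-CODE-LIMIT
regardless of how small the selected set is, which is the
"questionable efficiency" mentioned in the comment.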
>>
>>   TYPOS:

Right. I've made these corrections.

>>
>>   25 -- base-string is missing from the Table 4-1 amendment.
>>
>>   26 -- general-string is not an array of BASE characters, also the first
>>   two paragraphs under A.4.8 are garbled (the two separate sentences for
>>   strings for symbols got smushed together).
>>
>>   37 -- This says the default for the :ELEMENT-TYPE option to MAKE-STRING
>>   is SIMPLE-STRING.  Actually it's CHARACTER.
>>