cs proposal comments

>>   From: David N Gray <Gray@DSG.csc.ti.com>
>>   Subject: characters proposal
>>   I have read the document titled "Extensions to Common LISP to Support
>>   International Character Sets" dated January 1, 1989, and feel that it is
>>   not much of an improvement over what we saw in October.  Following are
>>   some random comments about things I happened to notice; this is not
>>   intended to be a comprehensive analysis.
>>   First, documents such as this ought to be labelled with an X3J13
>>   document number so that they can be referred to conveniently and
>>   unambiguously.
>>   "Appendix A" and "Appendix B" really should be chapters 3 and 4 since
>>   they are an essential part of the proposal, rather than being an
>>   appendage to it.

Appendix B is now eliminated.  Appendix A is really quite unlike
chapters 1 and 2 in structure.

>>   Page 7 says that the definition of semi-standard-characters "is replaced
>>   by a more uniform approach with introduction of the Control Character
>>   Registry".  Do you really mean that it _will_be_ replaced when the
>>   Control Character Registry is defined in some subsequent document?  I
>>   certainly don't see anything in this document that could be considered a
>>   replacement.

Yes.  The revision is clearer on this.  This document does not define
names for character registries nor their constituents.

>>   This whole concept of registries seems rather strange.  Is the intent
>>   that the alphabetic characters of the standard characters are to be in
>>   the "Latin" registry while characters such as period and comma are in
>>   "Latin-Punctuation"?   Is #\NEWLINE in the "Control" registry?  Where do
>>   the digits go -- "Mathematical"?  Is #\- a "Latin-Punctuation" or a
>>   "Mathematical"?  Which registry is #\SPACE in?  Now tell me what to do
>>   with the extra non-Latin alphabetic characters used in Swedish?  Does
>>   that require a separate registry for just those additional characters?
>>   Now we have simple text in a single language using characters from at
>>   least four different registries.  Do you really think it possible to
>>   agree on a "fixed", non-extensible, set of "Mathematical" or "Pattern"
>>   characters?

Actually, I believe the simplicity of the registry framework will make
agreement easy.  Currently, members of the coded character set
committees spend vast amounts of time lobbying for inclusion of their
favorite character(s) in the 'popular' coded character set standard.
Not being included means fewer installations will support their
native language properly.

I think a new group, hopefully formed within the programming languages
community, should define the registries rather than the existing coded
character set committees.  There is no competition between registries,
i.e., no advantage of one over another.  What this committee has to
agree upon is 1) a useful set of registry names and 2) the definition
of the constituents of each registry.  The only arguments I would
anticipate are debates of the type "are the semantics of my alpha the
same as or different from your alpha?"

By the way, the registries are fixed only in that a Common LISP
implementation cannot modify the standard definitions.  This guarantees
that an application program can portably rely on the composition and
decomposition functions to establish the availability of any given
character.

>>   Page 9 says that an implementation needs to specify the total ordering
>>   of characters within each registry, but what about the ordering of
>>   characters in different registries?  Is that completely undefined?

There is no ordering of characters within registries.  As mentioned
in Hawaii, the character index (a number) was changed to character
label (a symbol) throughout the proposal.

>>   Page 25 section A.4.5 doesn't specify the syntax of a registry name; did
>>   you intend it to be a string?

These have been changed to be symbols.

>>   Page 27 has an example using  (typep x '(character "standard"))  but
>>   page 25 said that had to be a registry name; "standard" is not a
>>   registry name.

The revision is clearer on this.  CHARACTER and CHARACTERP can take a
registry name, :base, or :standard.  The meanings of :base and :standard
are defined by Common LISP as the base character repertoire and the
standard character repertoire, respectively.

>>   Page 29 - *ALL-REGISTER-NAMES* -- a list of strings?

Now a list of symbols.

>>   Page 33 -- FIND-CHAR -- does the index value within a registry have any
>>   portable meaning?  Is that intended to be specified for the standard
>>   registries?  Is "base" supposed to be accepted here?  If not, how can
>>   you access the base codes?  If I were going to construct a character
>>   from its index value, it would be more meaningful to use an index
>>   relative to some coded character set rather than these registries.

FIND-CHAR takes a character label and registry.  These are specified
by the registry standard.  Base is not a registry name.  We have
introduced a new function CHAR-CCS-VALUE which takes a character
object and a coded character set name (a symbol) and returns the
encoding of the character in the coded character set.
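
As a rough illustration of the two lookups described above, here is a
toy model in portable Common Lisp.  It is only a sketch: the names
TOY-FIND-CHAR and TOY-CHAR-CCS-VALUE, the LATIN registry, the labels
SMALL-A and SMALL-B, and the ASCII table are all invented for this
example and are not taken from the proposal, which defines neither
registry names nor their constituents.

```lisp
;; Toy model only.  The registry name, labels, and table contents are
;; hypothetical and exist solely to show the shape of the two lookups.

(defvar *toy-registries* (make-hash-table :test #'eq)
  "Maps a registry name (a symbol) to an alist of (label . character).")

(setf (gethash 'latin *toy-registries*)
      '((small-a . #\a) (small-b . #\b)))

(defvar *toy-ccs-tables*
  '((ascii . ((#\a . 97) (#\b . 98))))
  "Maps a coded character set name (a symbol) to an alist of
(character . code).")

(defun toy-find-char (label registry)
  "Return the character named by LABEL in REGISTRY, or NIL if absent."
  (cdr (assoc label (gethash registry *toy-registries*))))

(defun toy-char-ccs-value (char ccs-name)
  "Return the code of CHAR in coded character set CCS-NAME, or NIL."
  (cdr (assoc char (cdr (assoc ccs-name *toy-ccs-tables*)))))
```

In this model (toy-find-char 'small-a 'latin) looks a character up by
label and registry, and (toy-char-ccs-value #\a 'ascii) maps that
character to its code in one particular coded character set.  The
point is that the label identifies the character independently of any
encoding, while the code exists only relative to a named coded
character set.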

>>   Page 36, the last sentence doesn't make sense.  The default for
>>   :ELEMENT-TYPE would have to be either CHARACTER or BASE-CHARACTER.

Right. I've made this change.

>>   Page 37, section A.22.1.1 -- the part being deleted specifies the
>>   meaning of including tab and form-feed characters in a Common Lisp
>>   source file; do you really intend that to not have any standard meaning?
>>   If my editor uses tabs for indenting, does that mean that the resulting
>>   source file is not a standard-conforming program?

That really depends on the definition of a conforming program. Is
this defined yet?

>>   Page 38, the first reference to p360 of CLtL should be p353; the
>>   deletion here says that there shall not be any standard name for the
>>   commonly used control characters such as tab and form-feed.  That still
>>   seems wrong to me.
>>   Page 41, what's the point of appending "ccs" to the name of the
>>   standard?  Presumably that stands for "coded character set", but isn't
>>   that adequately implied by the fact that this string will follow the
>>   keyword :EXTERNAL-CODE-FORMAT ?   The use of "default" seems odd since
>>   :DEFAULT is used everywhere else.

This was to distinguish a coded character set from the set of characters
(repertoire) it represents, i.e., to distinguish the ISO8859/6-1987
coded character set from the ISO8859/6-1987 repertoire.  In fact, the
ISO coded character set standards never refer to repertoires in
isolation (i.e., without the codes), so I've dropped the 'ccs'.  Also,
"default" is now :DEFAULT as elsewhere.

>>   I agree with Moon that the excising of bits and fonts has not been done
>>   carefully enough for them to be compatible extensions.

I think the new revision takes care of this by incorporating the
attribute list as part of the language proper (i.e., not deprecated).