READ and "illegal" characters

To: SEB1525@draper.com, common-lisp@SAIL.STANFORD.EDU
Subject: READ and "illegal" characters
From: Michael Greenwald <Greenwald@STONY-BROOK.SCRC.Symbolics.COM>
Date: Tue, 30 Aug 88 12:46 EDT
In-reply-to: The message of 25 Aug 88 13:36 EDT from "Steve Bacher (Batchman)" <SEB1525@draper.com>

    Date: Thu, 25 Aug 88 13:36 EDT
    From: "Steve Bacher (Batchman)" <SEB1525@draper.com>

    Taking the description of the CL reader at face value, I infer that an
    "illegal" character may occur in a symbol name if it is preceded by a
    backslash ("single escape"), but not if it occurs inside a pair of
    vertical bars ("multiple escape").  This seems strange.  Is it merely
    an oversight, or is it intentional?

There appear to be two types of "illegal" characters - (a) a character
with an "illegal" syntax type, or (b) a constituent character with an
"illegal" attribute.

Only illegal characters of type (b) are specified in the manual.  Since
a programmer cannot specify the syntactic type of a character directly
(you can only copy it with set-syntax-from-char), it is up to the
implementors to allow or disallow an "illegal" syntactic type (type (a)
illegal characters) in their implementation.

Step 9 on page 337 specifically says that the reader performs "one of
the following actions" according to the character's >syntactic< type.

If you want multiple escapes to behave identically to single escapes,
you can choose to give no character an "illegal" syntactic type.
(Make them all whitespace, or constituent with an "illegal" attribute.)
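
As a rough sketch of what "make them whitespace" amounts to, the same
effect can be seen from user code by copying whitespace syntax onto a
character with set-syntax-from-char.  The function name below is
hypothetical, and #\! is only an arbitrary stand-in for a non-standard
character; an implementor would do the equivalent inside the initial
readtable.

```lisp
;; Sketch only: READ-WITH-BANG-AS-WHITESPACE is a hypothetical helper.
;; It works on a private copy of the standard readtable, so the global
;; readtable is left untouched.
(defun read-with-bang-as-whitespace (string)
  (let ((*readtable* (copy-readtable nil)))
    ;; Arguments are TO-CHAR then FROM-CHAR; the source readtable
    ;; defaults to the standard one.
    (set-syntax-from-char #\! #\Space)
    (read-from-string string)))
```

With this in place, (read-with-bang-as-whitespace "foo!bar") should read
only the token before the "!", since the "!" now terminates the token
the way a space would.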

This finesses the question of whether CLtL means to treat single and
multiple escapes differently.  But it does mean that the question isn't
>significant<.

Notice, though, that portability of printed representation isn't an
issue here, because none of the standard characters have an "illegal"
syntax type. 
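
For standard characters the two escape styles can be compared directly.
The following is a small sketch, assuming the standard readtable; the
helper name is hypothetical.

```lisp
;; Sketch only: SAME-NAME-P is a hypothetical helper.  It reads one
;; symbol from each string under a copy of the standard readtable and
;; compares the resulting symbol names.
(defun same-name-p (single-escaped multiple-escaped)
  (let ((*readtable* (copy-readtable nil)))
    (string= (symbol-name (read-from-string single-escaped))
             (symbol-name (read-from-string multiple-escaped)))))

;; The first string contains the text \a\ \b (each character escaped
;; individually); the second contains |a b|.
(same-name-p "\\a\\ \\b" "|a b|")
```

Both spellings should yield a symbol whose name is "a b", so a printer
is free to emit either form for names made of standard characters.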

    This is a potentially significant problem, because it mandates that the
    printer must slashify "illegal" characters by preceding each one
    individually with a backslash rather than being able to just surround
    the entire name with vertical bars.  For some implementations (i.e. mine),
    it is easier to embar the entire name, once it is determined that funny
    characters are present somewhere in the name.

    Is it intended that all characters not listed in the table as constituent,
    macro, etc. are "illegal"?  Or might an implementation be able to treat
    them all as constituent characters?

I believe the latter must be correct (the implementor can choose the
syntactic type of all non-standard characters); otherwise it would be
impossible to read in symbols in (for example) Japanese.