long-char, kanji



        First, we could alter the type hierarchy as Moon suggests, and begin to
        encourage implementations to exercise their right to have zero-length
        font and bit fields in characters.  A lot of us have come to feel that
        these were a major mistake and should begin to disappear.  (We wouldn't
        legislate these fields away, just not implement them and encourage code
        developers not to use them.)  An implementation that does this can have
        Fat-Strings with 16-bits per char, all of it Char-Code.

    This would be fine.  The only problem is that if the implementation later
    wants to add character styles, it has to double the width of fat-strings
    or add a third type of string.

True, but my guess is that few implementations will choose to add such a
thing.  I think our current view at CMU (Rob will correct me if I'm
wrong) is that highlighting and the other things you do with "styles" are
better accomplished with some sort of external data structure that
indicates where the highlighting starts and stops.  It seems wasteful to
do this on a per-character basis, and even more wasteful to tax every
character (or even just every Japanese character) with a field to
indicate possible style modification.  We wouldn't make it illegal to do
this, but many implementations will go for the 2x compactness instead.
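
To make that concrete, here is a rough sketch of the sort of external
structure I mean.  The names are made up for this message and nothing here is
a proposal for the language itself:

  ;; Sketch only.  Style runs live beside the string, not in the characters:
  ;; each run is a (start end style) triple over a plain, style-free string.
  (defstruct styled-text
    (string "" :type string)       ; the characters themselves, no per-char style
    (runs   '() :type list))       ; e.g. ((0 4 :bold) (10 17 :underline))

  (defun style-at (text index)
    "Return the style in effect at INDEX of TEXT, or NIL if there is none."
    (dolist (run (styled-text-runs text))
      (when (and (<= (first run) index) (< index (second run)))
        (return (third run)))))

The point is just that the 16-bit string stays dense, and only the text that
is actually annotated pays for the annotation.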

        Alternatively, we could say that Fat-Char is a subtype of Character, with
        Char-Bit and Char-Font of zero.  String-Char is a subtype of Fat-Char,
        with a Char-Code that fits (or can be mapped) into eight bits.  A
        Thin-String holds only characters that are of type String-Char.  A
        Fat-String holds Fat-Chars (some of which may also be String-Chars).  If
        you want a vector of characters that have non-zero bits and fonts, then
        you use (Vector Character).  I'm not sure what we do with the String
        type-specifier; the two reasonable possibilities are to equate it to
        Thin-String or to the union of Thin and Fat Strings.

    I take it the way this differs from your first alternative is that there
    are three subtypes of character and three subtypes of string, and you
    propose to name the additional types CHARACTER and (VECTOR CHARACTER).
    I don't think that's viable.  The informal definition of STRING is
    anything that prints with double-quotes around it.  Surely any
    one-dimensional array of characters should qualify as a string.  I don't
    think it makes sense to use the name STRING for a specialized subtype of
    (VECTOR CHARACTER) and have a different name for the general thing; I
    think it's always cleaner to use the short name for the general thing
    and qualified names for the specializations of it.  Surely using STRING
    to mean the union of thin and fat strings, excluding extra-fat strings,
    would be confusing.

As I read the manual, Common Lisp strings are not now allowed to contain
any characters with non-zero bit and font attributes.  Arbitrary
characters can be stored in vectors of type Character, which are not
Strings and do not print with the double-quote notation.  I am just
suggesting that we preserve this status quo: the name String might be
extended to include Fat-String (in the narrow sense of Fat-String
defined above) but not to include vectors of arbitrary characters.
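
Concretely, the distinction I have in mind amounts to something like the
following (just a sketch, using the Char-Bits, Char-Font, and String-Char
names as I read them in the manual; how a character vector actually prints
is of course up to the implementation):

  ;; Sketch only: a character may be stored in a string exactly when its
  ;; bits and font attributes are zero, i.e. when it is a String-Char.
  (defun storable-in-string-p (char)
    (and (characterp char)
         (zerop (char-bits char))
         (zerop (char-font char))))    ; equivalently, (typep char 'string-char)

  ;; Anything else belongs in a general vector of characters, which is not
  ;; a String and prints as #(...) rather than with double-quotes:
  ;;   (make-array 10 :element-type 'character)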

-- Scott