[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
long-char, kanji
First, we could alter the type hierarchy as Moon suggests, and begin to
encourage implementations to exercise their right to have zero-length
font and bit fields in characters. A lot of us have come to feel that
these were a major mistake and should begin to disappear. (We wouldn't
legislate these fields away, just not implement them and encourage code
developers not to use them.) An implementation that does this can have
Fat-Strings with 16-bits per char, all of it Char-Code.
This would be fine. The only problem is that if the implementation later
wants to add character styles, it has to double the width of fat-strings
or add a third type of string.
True, but my guess is that few implementations will choose to add such a
thing. I think our current view at CMU (Rob will correct me if I'm
wrong) is that highlighting and the other things you do with "styles" is
better accomplished with some sort of external data structure that
indicates where the highlighting starts and stops. It seems wasteful to
do this on a per-character basis, and even more wasteful to tax every
character (or even just every Japanese character) with a field to
indicate possible style modification. We wouldn't make it illegal to do
this, but many implementations will go for the 2x compactness instead.
Alternatively, we could say that Fat-Char is a subtype of Character, with
Char-Bit and Char-Font of zero. String-Char is a subtype of Fat-Char,
with a Char-Code that fits (or can be mapped) into eight bits. A
Thin-String holds only characters that are of type String-Char. A
Fat-String holds Fat-Chars (some of which may also be String-Chars). If
you want a vector of characters that have non-zero bits and fonts, then
you use (Vector Character). I'm not sure what we do with the String
type-specifier; the two reasonable possibilities are to equate it to
Thin-String or tow the union of Thin and Fat Strings.
I take it the way this differs from your first alternative is that there
are three subtypes of character and three subtypes of string, and you
propose to name the additional types CHARACTER and (VECTOR CHARACTER).
I don't think that's viable. The informal definition of STRING is
anything that prints with double-quotes around it. Surely any one
dimensional array of characters should qualify as a string. I don't
think it makes sense to use the name STRING for a specialized subtype of
(VECTOR CHARACTER) and have a different name for the general thing; I
think it's always cleaner to use the short name for the general thing
and qualified names for the specializations of it. Surely using STRING
to mean the union of thin and fat strings, excluding extra-fat strings,
would be confusing.
As I read the manual, Common Lisp strings are not now allowed to contain
any characters with non-zero bit and font attributes. Arbitrary
characters can be stored in vectors of type Character, which are not
Strings and do not print with the double-quote notation. I am just
suggesting that we preserve this staus quo: the name String might be
extended to include Fat-String (in the narrow sense of Fat-String
defined above) but not to include vectors of arbitrary characters.
-- Scott