
Re: long-char, kanji

>From ccut!Shasta!@SU-AI.ARPA:nuyens.pa%Xerox.COM@u-tokyo.junet Wed Jun  4 11:49:48 1986
>Date: 3 Jun 86 15:22 PDT
>Subject: re: long-char, kanji
>To: common-lisp@su-ai.ARPA
>  ...
>Strings are represented as homogeneous simple vectors of thin (8 bit) or
>fat (16 bit) characters.  Ignoring storage taken to represent them, the
>difference between fat characters and thin characters is transparent to
>the user.  In particular, since we allow fat characters in symbol print
>names, we use an equivalent of Ida's string-normalize function to
>guarantee unique representation for hashing.  
This is the most important decision point, I think, and I agree with it.
With Moon's idea, the relation between thin and fat characters is like the
relation between fixnum and bignum.
This means the characters in a thin string and those in a fat string are
completely independent representations.
But any foreign character set may contain characters that have the same
appearance as the standard characters:
the space character, alphabetic characters, terminating macro characters, and so on.
Actually, JIS C 6226 has a second set of codes for the standard characters,
and I think other foreign character sets may likewise have characters with the
same visual figure as standard-char.
With Moon's ASSURE-FAT-STRING idea, once there is a fat character in a string,
the string can never be reduced to a thin string, even if a later modification
leaves it containing only characters representable in thin code.
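To make my point concrete, a reducing counterpart to ASSURE-FAT-STRING could be written. (This is only a sketch; the function names are my own invention, not part of any proposal, and it assumes the CLtL STRING-CHAR type denotes the thin representation.)

```lisp
;; Sketch only -- names THIN-REPRESENTABLE-P and MAYBE-THIN-STRING
;; are hypothetical, not from Moon's proposal.

(defun thin-representable-p (string)
  ;; True when every character of STRING fits in the thin (8-bit) code.
  (every #'(lambda (c) (typep c 'string-char)) string))

(defun maybe-thin-string (string)
  ;; Copy STRING down to a thin string when possible,
  ;; otherwise return it unchanged.
  (if (thin-representable-p string)
      (coerce string 'simple-string)   ; thin copy
      string))
```

Whether an implementation should perform such reduction automatically after each modification, or only on demand, is of course a separate question.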

>NS includes all "JIS C 6226" graphic characters including the 6300 most
>common Japanese kanji.  There are also Hiragana and Katakana character
>codes specified.  (While there is substantial overlap with the Japanese
>kanji, Chinese characters are semantically separate and their character
>code assignments have not yet been published.)
The reason I stick to the kanji issue is not only that I am Japanese; I feel
it is the test case for coping with multi-byte characters, and as a Common Lisper
I feel a need to polish up the character data type.

>type hierarchy:
>Since we have char-bits-limit = char-font-limit = 1, STANDARD-CHAR is
>the same as STRING-CHAR.  I agree with Moon that STRING should be
>(VECTOR CHARACTER) and provide specialisations (even though this is a
>change from the status quo).  In our applications, we do as Fahlman
>suggests and use external data-structures to represent the sort of
>information encoded in "styles".  (It is hard to standardize which
>attributes should be made part of style (some people claim "case" should
>be a style bit!)).  

I also like the "style" idea.
I don't want to use fonts.
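As I read the proposal above that STRING be (VECTOR CHARACTER) with specializations, both widths would then satisfy STRINGP. (A sketch of my reading only, not an official definition; the type names THIN-STRING and FAT-STRING are mine.)

```lisp
;; Sketch -- these DEFTYPE names are hypothetical, for illustration.
(deftype thin-string () '(vector string-char))  ; 8-bit characters
(deftype fat-string  () '(vector character))    ; full-width characters

;; Under the proposal, STRING == (VECTOR CHARACTER), so a thin string
;; is simply the specialized subtype, and STRINGP holds for both.
```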

>number of character codes required:
>At first glance it seems hard to imagine exceeding 16 bits.  Note
>however that the 7200 characters in NS don't include Chinese, Korean,
>Farsi,  Hindi, etc.  How many times have you been *sure* that the FOO
>field wouldn't be required to be larger than 16 bits?
As far as the Japanese character set is concerned, 16 bits for char-code is enough.
But for an international standard, I feel that room for more bits is needed.

>Greg Nuyens
>Text, Graphics and Printing,
>Xerox AI Systems

Masayuki Ida