
Re: A multi-byte character extension proposal

To: baggins@ibm.com, ida%tansei.cc.u-tokyo.junet%utokyo-relay.csnet@RELAY.CS.NET
Subject: Re: A multi-byte character extension proposal
From: Masayuki Ida <a37078%tansei.cc.u-tokyo.junet%utokyo-relay.csnet@RELAY.CS.NET>
Date: Thu, 25 Jun 87 23:48:59+0900
Cc: common-lisp@SAIL.STANFORD.EDU

Date: Tue, 23 Jun 87 12:54:31 PDT
From: "Thomas Linden (Thom)" <baggins@ibm.com>

Are characters with different codes always syntactically
distinct?
Yes.
Can the standard character #\( have two different codes,
corresponding, for example, to two different external file system
representations of that character?
No.
(because you said 'the standard character'...)
We cannot talk about the different representations on files.
Some implementations may read 'similar' characters into one internal
code, but others may not.

Does the JEIDA proposal permit two different string-chars to have
the same print glyph, '(' for example, but different syntactical
properties?
We did not discuss this issue, because characters that have the same
print glyph but different syntactic properties were outside our scope.
There are several Japanese characters which have similar glyphs,
but their glyphs are not 'the same' (except for blank characters).

Is it
allowable to map both of these sets of codes into the one,
internal Lisp character code set when inputting data to Lisp, and
adopt our own conventions for translating output back to single
and double byte?
yes.

An elaboration of the previous question: Is it possible for an
implementation to represent all of the standard characters internally
with 2-byte codes, and to map some 2-byte character codes and some
1-byte character codes in system files onto the same set of 2-byte
internal codes for the standard characters when read into Lisp?
yes.
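
As an illustration of this kind of input mapping, here is a minimal sketch
in Python. It is not part of the JEIDA proposal. It assumes, purely for the
example, that the external double-byte variants are the Unicode fullwidth
forms U+FF01 through U+FF5E, which sit at a fixed offset of 0xFEE0 from
ASCII 0x21 through 0x7E. The function name to_internal is hypothetical.

```python
# Sketch: fold double-byte (fullwidth) variants of ASCII characters onto
# one internal code on input. Assumption: the external variants are the
# Unicode fullwidth forms U+FF01..U+FF5E; an implementation of the
# proposal would use its own JIS-based tables instead.

FULLWIDTH_FIRST = 0xFF01    # fullwidth '!'
FULLWIDTH_LAST = 0xFF5E     # fullwidth '~'
OFFSET = 0xFEE0             # fullwidth code minus the matching ASCII code
IDEOGRAPHIC_SPACE = 0x3000  # double-byte blank


def to_internal(text: str) -> str:
    """Map each external character to its single internal code."""
    out = []
    for ch in text:
        code = ord(ch)
        if FULLWIDTH_FIRST <= code <= FULLWIDTH_LAST:
            out.append(chr(code - OFFSET))  # fold onto the ASCII code
        elif code == IDEOGRAPHIC_SPACE:
            out.append(" ")                 # fold the double-byte blank
        else:
            out.append(ch)                  # all other characters unchanged
    return "".join(out)
```

With such a mapping, a single-byte and a double-byte spelling of the same
expression arrive in Lisp as the same internal string. Translating output
back to single or double byte would follow the implementation's own
convention.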

The English copy we saw of the proposal did not contain section 4.4.
Based on our own translation from the original in Japanese, this
section seems to discuss implementation issues.
Since we could not reach a firm conclusion on the issue,
section 4.4 of the early draft in Japanese was deleted.
The proposal leaves implementors a great deal of freedom.
there
seem to be two possible treatments of double byte characters. The
first is the case where a double-byte character can be a standard
character. The second is where a double-byte character cannot be
a standard character.
I think so too.

5a)

Implementation dependent.

5b)

Is the difference
between option 1 and option 2 whether the Lisp system would
recognize a single-byte version and a double-byte version
of this symbol-name in the same file as referring to the same
(EQ) symbol?
Yes.

1. (list abc /fg " xy " )
--------------------------------
2. (list abc /fg " xy " )
-- ---- ----- --- ----
3. (list abc /fg " xy " )
------------------------ -----
We tried to select exactly one of the three 'options' above,
but we found that we cannot decide until the ISO-related standardization
of Japanese character representation is complete.

5c)
I cannot understand what you said.
I cannot imagine a situation in which "there is a character which has the same print glyph but a different code."

5d)
Implementation dependent.
A standard character may be single-byte or multi-byte,
according to the definition of the implementation.

5e)

Is section 4.4 a part of the proposal to ANSI?
No.

If you could elaborate (in English) on the content of section
4.4, we would greatly appreciate it.
Please ask IBM Japan (your subsidiary) about the complex issues
behind section 4.4 of the early draft in Japanese.
We need more observations on other languages, file systems,
operating systems, and the refinement of the JIS character set definition
itself before we can make a firm guideline on the matter.

Correct?
Your interpretation is compatible with our proposal.

If a Lisp system supports a large character code set, need it allow
every character of type string-char to have a non-constituent syntax
type defined in the readtable, or is the proposal's default that
only standard characters need be represented in the readtable?

CLtL says (22.1.5 page 360):
"every character of type string-char must be represented in the readtable."
The members felt that, since we extended the definition of string-char to
include Japanese characters, a natural interpretation of CLtL implies that
the readtable must have more than 64k 'logical' entries.
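
One reading of 'logical' entries is that the readtable answers for every
character code without storing each entry physically. The Python sketch
below illustrates that reading only; it is not taken from the proposal or
from CLtL. The class name SparseReadtable and its methods are hypothetical,
and the default of constituent syntax for unlisted characters is an
assumption made for the example.

```python
# Sketch: a readtable with one logical entry per character code but
# physical storage only for entries that differ from the default.
# Assumption: characters without an explicit entry are constituents.

class SparseReadtable:
    DEFAULT = "constituent"

    def __init__(self):
        self._explicit = {}  # character code -> syntax type

    def set_syntax(self, char, syntax):
        """Record a syntax type; default entries are not stored."""
        if syntax == self.DEFAULT:
            self._explicit.pop(ord(char), None)
        else:
            self._explicit[ord(char)] = syntax

    def syntax(self, char):
        """Every character has a logical entry, stored or not."""
        return self._explicit.get(ord(char), self.DEFAULT)

    def stored_entries(self):
        """Number of physically stored entries."""
        return len(self._explicit)


rt = SparseReadtable()
rt.set_syntax("(", "terminating macro")
rt.set_syntax(" ", "whitespace")
```

Here a Japanese character that was never mentioned still has a syntax type
when looked up, while only two entries are actually stored.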

Regards,
Thom Linden

Masayuki Ida

PS: Our proposal draws on several Japanese and US CL implementations.
Though we welcome any opinions on our proposal, I feel
the final decision will be made by ANSI, JIS, and ISO.
One of the members of our WG will attend the X3J13 meeting, since
I cannot leave my university next week.
He is a very active member and he knows the process.
He is scheduled to give a presentation on this issue at the X3J13 meeting.
