portability of pathnames
Date: Sun, 22 Jun 1986 20:30 EDT
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: SANDRA <LOOSEMORE at UTAH-20.ARPA>
Cc: common-lisp@SU-AI.ARPA
I'm going through old mail to make a list of issues we need to settle,
or at least work on. I came across your complaint about portability of
pathnames and the problems with make-pathname. I was just wondering if
you had anything specific to propose. If not, I'll just add this to the
agenda of things we collectively need to think about, but I'm not sure
there's a good solution to be had.
By the way, in my work with Macsyma I've seen most of the same problems
as Sandra mentioned in her message that kicked off this line of conversation.
She sounded in that message like she expected to get dumped on, but I
hope that she's neither dumped on nor ignored. Most of those comments were
very to the point.
I do take very minor issue with her remark that the package issue seems
a small one next to the other issues she cited. My reason for pushing so
hard to get these package issues resolved is that they impede everyone's
ability to get a foothold in a Lisp they're trying to port to. If we can't
get expressions to read the same in each Lisp, then we're stripped
even of the ability to talk about language problems at the level of
expressions and must too frequently resort to discussions of the meaning
of source text. Also, in practice, implementation-specific workarounds are
something you can get to much more easily once the syntactic barriers are
resolved.
But I don't mean to diminish the importance of these other issues. Pathnames
are a horror to use in CL. Here's a list of the gripes I have with pathnames
which come to mind just off the top of my head; I'm sure if I thought harder
I could think of others. Maybe Sandra could add some of her favorites...
* Canonical case. If you study the Symbolics pathname system, you'll note
that elaborate pains are taken to make the case of the components
be stored in uppercase for interchange purposes even if they're
composed as a namestring in another case. This allows the internal
representation of the Unix pathname /joe/math.text and the Tops-20
pathname <JOE>MATH.TEXT to use the same internal notation, with a
name of "MATH" and type of "TEXT", and allows cross-file-system
merging to be done correctly. The result of the current system is
that one must write gross things like:
    (MAKE-PATHNAME :NAME THE-GIVEN-NAME
                   :TYPE (IF *LOWERCASE-FILENAMES-P* "text" "TEXT"))
and initialize the *LOWERCASE-FILENAMES-P* variable on the basis of
implementation-specific information. As Moon points out, the
Symbolics pathname system does this sort of thing invisibly, and
people interested in how to fix this should study the documentation.
It may seem hairy, but a portable file system interface is going to
necessarily be somewhat hairy just because of the variance of file
systems. I think given the constraints, it's not gratuitously hairy.
* What can go in a host slot? CLtL doesn't say whether a Lisp
implementation on host "FOO" is required to treat :HOST "FOO"
the same as :HOST NIL or :HOST "" in MAKE-PATHNAME. In fact,
nothing says whether "FOO:" or "FOO::" might be allowed (depending
on what the native notation for hosts was); I definitely don't
think they should be, but there's nothing I can find protecting me
from an implementation making this the -only- way to notate a host.
* What can go in a directory slot? CLtL says it can hold a string,
but it doesn't say whether the string contains any notational
devices. eg, on VMS, is "FOO" ok for a directory or do you want
"[FOO]"? "FOO" would seem the most portable, since it doesn't get
involved in the fact that TOPS-20 might want "<FOO>" and the LispM
might want ">FOO>". But it doesn't matter much anyway, because if
you want to talk about subdirs, "FOO.BAR" doesn't completely hide the
implementation: it works for systems that use the notation
"<FOO.BAR>" or "[FOO.BAR]" but not for those that use ">FOO>BAR" or "/FOO/BAR".
Without this much information, the kinds of operations you can do
on the contents are unreasonably limited. On the LispM, you say
:DIRECTORY "JOE" but in VAXLISP you say :DIRECTORY "[JOE]". The
LispM idea of allowing this to contain a list of directory names,
as in ("FOO" "BAR") to mean /FOO/BAR or >FOO>BAR> is clearly more
reasonable and I can't imagine why it was not adopted.
* Canonical types. The extension which is used for certain standard
kinds of files varies from implementation to implementation. eg,
some systems call text files .txt and others .text. Some call
lisp files .lsp, others .lisp, and others .clisp. Some call binary
files .BIN, others .FAS, and so on. It would be nice if we'd
adopted the LispM's canonical type system so that certain dignified
file types could be predefined for use with portable programs.
Thus, (MAKE-PATHNAME :NAME "FOO" :TYPE :LISP) could refer to
"FOO.LISP" in some implementations, "foo.l" in others, etc.
* This business about semi-standard features like :NEWEST and :OLDEST
is a pain. We need those features, but we should fully enumerate the
entire set of possible contents and exactly what they denote, even
if not everyone supports them all. It should be possible to construct
a program that would be "ready for anything". Perhaps each
implementation could keep a list of which keywords were valid for
that implementation.
* No way is provided for creating a relative pathname. This would
be very useful for merging purposes even on systems which don't
provide a namestring syntax for pathnames. It is especially
essential in the absence of a clear specification of what the
directory slot contains.
* One issue is that there are so many fields which are allowed to
contain implementation-dependent gunk as to make those fields
effectively write-only.
* Printing pathnames. We provide no convenient way to print a
pathname. On the LispM you can do (FORMAT T "~A" pathname) but
not all implementations support that because CLtL doesn't say
it should work. Doing (FORMAT T "~A" (NAMESTRING pathname))
seems dumb since, among other things, it forces gratuitous consing.
* How do you compare pathnames? EQUAL pathnames are not obliged
to be EQ. Since pathnames contain all these options for
implementation-dependent featurism, the user is not able to
write a PATHNAME-EQUAL. As far as I can tell, an implementation
in which a directory slot of "FOO.BAR" and ("FOO" "BAR") are
equivalent is not constrained to return T for EQUAL on two
pathnames which contain identical things except one uses
"FOO.BAR" and the other uses ("FOO" "BAR"). Indeed, even doing
(EQUAL (NAMESTRING X) (NAMESTRING Y)) isn't good enough because,
for example, VAX VMS allows logical names like "FOO:[.BAR]X.Y" to
expand into "DEV1A:[FOO][.BAR]X.Y". I don't care if
"FOO:[.BAR]X.Y" is PATHNAME-EQUAL to "DEV1A:[FOO][.BAR]X.Y"
because that's a semantic issue that may get caught up in how
the FOO logical device is implemented, but I do care that
"DEV1A:[FOO][.BAR]X.Y" and "DEV1A:[FOO.BAR]X.Y" are PATHNAME-EQUAL
because that's just a syntactic issue ... but I see no way of
writing a portable PATHNAME-EQUAL.
* I consider it to be a complete bug (and the only one that I've
seen which I believe to also be a bug in the Symbolics pathname
system) that you can't create a non-hosted pathname. eg, in
the case of someone doing
    (MERGE-PATHNAMES "" "FOO")
and later planning to do
    (MERGE-PATHNAMES * "JOE::")
where "::" is the host syntax used by the book, if you force the
first merge to put a host on, then the second merge won't pick
up the "JOE" and the wrong thing will happen. This actually came
up in MACSYMA and I was forced to invent my own pathname system
which holds a CL pathname in a slot and also holds host-valid-p
info that it keeps set to NIL after the first merge above (which
must be done via MY-MERGE-PATHNAMES, not CL's MERGE-PATHNAMES)
so that MY-MERGE-PATHNAMES can correctly do the second merge.
* The phrase "in which case no parsing is needed, but an error
check may be made for matching hosts" at the end of the first
paragraph of the description of PARSE-NAMESTRING on p414 is
an invitation to disaster since we don't specify how to obtain
even the current machine's host name or in what syntax it should
be presented in order to make this function happy. For example,
(PARSE-NAMESTRING "FOO.LISP") in VAXLISP might return
    #S(PATHNAME :HOST "PETER" :DEVICE NIL :DIRECTORY NIL
                :NAME "FOO" :TYPE "LISP" :VERSION NIL)
but (PARSE-NAMESTRING "FOO.LISP" "PETER")
errs telling me that host "" and "PETER" conflict. I might
report this as a bug and maybe they'd even fix it for me but
I'd have nothing to fall back on if they disagreed because CLtL
certainly doesn't come out and claim it's a bug.
* The description of the pathname system offers no examples of
using it in any non-trivial way. All the examples use strings as
arguments, but that's just the problem. In portable applications,
strings just don't work. Sometimes, you're merging something that
was typed in by the user but rarely is it being merged with something
else typed by the user. The other thing is often something your
program wanted to have wired in. If you tried to write even the
simplest program using the given primitives in an even slightly
non-trivial way, the problem would become apparent. eg, try to
figure out how to specify the examples on p414 or p415 in a
portable way. To put yourself in the right frame of mind,
replace "DUMPER" on p414 with "MACSYMA" or "KEE" or "MYCIN" or
something that you don't think of as TOPS-20 specific. The
example on p415 is hard just as it is, for exactly the reasons
of canonical types I've mentioned above. What am I expected to
write? Top of p423 is the only place CLtL tries to do this, and
it more or less succeeds in this trivial case ... except for the
fact MERGE-PATHNAME-DEFAULTS isn't in the index and I suspect
never made it into the spec. Any way you cut it, these three
examples just don't do enough to illustrate what you can and
can't do with the given primitives.
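
To make the canonical-case, canonical-type, and directory-slot gripes above concrete, here is a minimal sketch of the kind of shim a portable program ends up carrying around today. Apart from *LOWERCASE-FILENAMES-P*, every name in it (*CANONICAL-TYPES*, *DIRECTORY-SYNTAX*, LOCAL-CASE, LOCAL-TYPE, LOCAL-DIRECTORY) is hypothetical, and the default values are assumptions; nothing here is specified by CLtL or taken from the Symbolics system.

```lisp
;; Hypothetical per-port configuration. Each port would have to set these
;; from implementation-specific knowledge; CLtL defines none of them.
(defvar *lowercase-filenames-p* nil
  "True where the file system's customary case is lowercase (e.g. Unix).")

(defvar *canonical-types*
  '((:lisp . "LISP") (:text . "TEXT") (:binary . "BIN"))
  "Assumed table from canonical type keywords to uppercase interchange strings.")

(defvar *directory-syntax* '("<" "." ">")
  "Assumed list of (opener separator closer) strings; TOPS-20 style by default.")

(defun local-case (string)
  "Convert an uppercase interchange STRING to the local customary case."
  (if *lowercase-filenames-p* (string-downcase string) string))

(defun local-type (type)
  "Resolve a canonical type keyword, or pass a string through, in local case."
  (let ((entry (assoc type *canonical-types*)))
    (local-case (if entry (cdr entry) type))))

(defun local-directory (components)
  "Render a list such as (\"FOO\" \"BAR\") in the assumed local directory syntax."
  (let ((opener (first *directory-syntax*))
        (separator (second *directory-syntax*))
        (closer (third *directory-syntax*)))
    (with-output-to-string (out)
      (write-string opener out)
      (do ((tail components (cdr tail)))
          ((null tail))
        (write-string (local-case (car tail)) out)
        ;; Emit the separator only between components, not after the last.
        (when (cdr tail) (write-string separator out)))
      (write-string closer out))))
```

A caller would then write something like (MAKE-PATHNAME :DIRECTORY (LOCAL-DIRECTORY '("FOO" "BAR")) :NAME (LOCAL-CASE "MATH") :TYPE (LOCAL-TYPE :TEXT)). Whether a given implementation accepts the resulting directory string is exactly the unspecified part, so this only moves the implementation dependence into three variables rather than removing it.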
I do think pathnames are useful only for the most trivial purposes in CL.
I don't think this means we should flush them. I think we should
seriously study systems, particularly those offered by the LispM
vendors, where there's been success in dealing with multiple file
systems, and then I think we should agree on the additional mechanisms
necessary to make things really work.
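
As one illustration of such a mechanism, the purely syntactic half of the PATHNAME-EQUAL problem described above can be sketched for the directory slot alone. This is a sketch under stated assumptions, not a proposal: the function names are hypothetical, and it assumes that the subdirectory separator inside a directory string is known (here "." as on TOPS-20 or VMS) and that any surrounding brackets have already been stripped. Neither assumption is guaranteed by CLtL, which is the point of the complaint.

```lisp
(defun split-directory (string separator)
  "Split STRING at each SEPARATOR character into a list of substrings."
  (let ((parts '())
        (start 0))
    (do ((end (position separator string :start start)
              (position separator string :start start)))
        ((null end))
      (push (subseq string start end) parts)
      (setq start (1+ end)))
    ;; Whatever follows the last separator is the final component.
    (push (subseq string start) parts)
    (nreverse parts)))

(defun directory-components (directory &optional (separator #\.))
  "Return DIRECTORY as a list of names, whether given as a string or a list.
SEPARATOR is an assumed, implementation-supplied character."
  (if (stringp directory)
      (split-directory directory separator)
      directory))

(defun directory-equal (x y)
  "True if X and Y name the same directory path syntactically.
Logical names and other semantic equivalences are deliberately ignored."
  (equal (directory-components x) (directory-components y)))
```

With these definitions, (DIRECTORY-EQUAL "FOO.BAR" '("FOO" "BAR")) is true. A full PATHNAME-EQUAL would need the same kind of normalization for every other slot, which is only possible once the legal contents of each slot are enumerated.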