[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
hash table types
- To: George J. Carrette <GJC at MIT-MC>
- Subject: hash table types
- From: David C. Plummer <DCP at SCRC-TENEX>
- Date: Sun, 12 May 1985 14:56:00 -0000
- Cc: DCP at SCRC-QUABBIN, TIM at MIT-MC, COMMON-LISP at SU-AI
- In-rely-to: <[MIT-MC].499100.850512.GJC>
Date: Sun, 12 May 85 08:51:09 EST
From: George J. Carrette <GJC@MIT-MC>
We figured that fixing the root of the problem, the lack of
canonicalization in the hardware database would possibly cost
enough time to break a production development window and put
off the introduction of a new hardware item for a few months,
with possible losses in sales of a million dollars or more
for the year. Given this sort of lossage mode one cant stress
enough the importance of having *some* way of getting
compatibility between RELEASE N and RELEASE N+1. Remember
that at some time in "zetalisp" EQUAL on type STRING meant
STRING-EQUAL, which was for better or worse case insensitive.
LMI release 2 and presumably also Symbolics release 6 does
away with this. If there is nothing available which behaves
in exactly the same way as the *old* EQUAL-HASH-TABLE then
people can be screwed badly. Remember also that many ugly
things that were left in Common-Lisp from maclisp days (such
as the empty list being a SYMBOL, making the set of all lists
and all symbols have a non-null intersection) were justified
soley on the basis that there was no *automatic* way for
people to detect if they were depending on such lossage and
convert their programs. To quote Dave Moon: "Our customers
just wont stand for it."
For that exact reason, "Our customers just won't stand for it",
Release 6 is compatible with Release 5 w.r.t. EQUAL-HASH-TABLEs,
i.e., case is ignoreed. Release 6 provides hints on how to write
code that will work in both Release 6 and Release 7, requiring
recompilation and possibly minimal, if any conversion. In
Release 7 there will most likely be a CASE-INSENSITIVE-HASH-TABLE
that uses CASE-INSENSITIVE-EQUAL as the predicate and
CASE-INSENSITIVE-EQUAL-HASH (unless EQUAL-HASH will continue to
ignore case).
More generally, and on the generic issue: Symbolics has had
generic hash tables for quite a while. I don't know if Lisp
Machine Lisp is general has. The definer of the hash table is
required to write two methods, the equality predicate and the
hash function. They are quite useful. All implementations must
do a dispatch, since the entrypoints are the same for all types
of hash tables. In a flavor-based implementation, the dispatch
happens at the message send level. In other implementations, my
guess is that FUNCALL is used where the function comes from some
defstruct slot. In any case, generalizing does not appear to be
a hard thing to implement.
The external things I see needing changing are
* An additional argument to make-hash-table. My vote for a name
is :hash-function.
* If the hash function is not provided, it defaults from the
predicate.
* It is an error not to supply a hash function if the predicate
is not one of those builtin (currently, EQ, EQL and EQUAL).
[Implementatyions may supply more builtins and opt not to
signal the error.]
* Requirements of the hash function: It must return an integer,
possibly restricted to fixnum, possibly required to be
positive. [The LispM builtin hash functions attempt to return
positive fixnums that use as many bits as possible. This is
so that there is more uniform distribution after it is MODed
with the hash table modulus.]
* Hints on how to write implementation independent, efficient
hash functions. This is the hardest part. Maybe there are
enough constants defined in the language to make this possible
(e.g., bits-per-fixnum, etc). Hmmm... Many LispM hash
functions use the functions ROT (better called ROTATE-FIXNUM)
and LSH (better called LOGICAL-SHIFT-FIXNUM). They often use
LOGXOR, but this is part of CLtL. I have a memory of being
told that ROT and LSH were consciously not part of the CL
language because they weren't mathematical. With names like
ROTATE-FIXNUM and LOGICAL-SHIFT-FIXNUM maybe they should be
put back? [Also see Turing's biography about the hardware
designers diking out his boolean instructions for similar
reasons.]
* The use of MAKNUM, %POINTER or whatever it is for a particular
implementation is to be forbidden for user suplied hash
functions. This is because the GC can move objects and change
their addresses, and there is no way in CL to communicate this
between the hash table and the GC. Individual implementations
may wish to implement such a communication and allow %POINTER
and use it for builtin hash tables to improve efficiency.
Conclusion: generic hash tables are useful and probably simple to
implement. User supplied hash functions are required but
difficult.
Additionaly, I think there should be a CASE-INSENSITIVE-EQUAL
function and a corresponding (builtin) hash table type. I have
an uneasy feeling about EQUALP hash tables. I think the
uneasiness comes in the traversing of arbitrary array structure
that EQUALP implies and the possible inefficiency. The example
needs I have so far seen for EQUALP are dealt with by
CASE-INSENSITIVE-EQUAL.