[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

hash table question




SXHASH has the contract of returning the same value on things that
"print alike".  This is implicit in the definition of sxhash that says:
"(equal x y) implies (= (sxhash x) (sxhash y))".  [See CLtL p285; also
the rule-of-thumb definition of EQUAL on p80.]

In your example, you have two items which are EQ but not EQUAL  [wait, 
read the rest before you judge this a contradiction].  Since they are
not equal, you can't expect them to have the same sxhash values.

The "two items" involved are two readings of the same "memory cell" at 
different times.  Since it is "the same memory cell", the two items are 
EQ;  but since an intrusive update occured during the time between the 
the two readings, "they" are no longer equal.

I find it somewhat curious that the following dumb definition perfectly 
satisfies  CLtL's requirements for SXHASH:
    (defun sxhash (x) 5)
and this definition would give you the before/after equivalence you are 
asking about.  How SXHASH differs fundamentally from the non-Common-Lisp 
function %POINTER available in ZetaLisp (or the LOC in Interlisp) is that
it must be independent of the address of a "memory cell", if any, holding 
the object.  CLtL tries to say this with language like "incarnation" and
"core image".  But, note how well the above dumb definition of sxhash 
satisfies this independence criterion!

On the other hand, 5 really isn't a very good hashing function.  It would
give very poor distribution if it were the basis of gethash -- all entries
would be in the same collision chain -- so it seems reasonable to descend 
into structures, even updatable structures, and compute a value that is, 
by design, potentially different after the object is updated.  The value 
returned by %POINTER (and LOC) wouldn't be affected by the RPLACA in your 
example, but they have other problems: 
  (1) they don't have the "equal[x,y] ==> sxhash[x] = sxhash[y]" property 
      mentioned above, and 
  (2) they don't remain the same over a relocating gc or a generational 
      scavenge.

If your point in asking this question is to examine the semantics of
EQ-type hash tables, then I think there is general consensus that
EQ-type tables don't simply mean that the "test" is EQ (as opposed
to EQUAL), but that the collision chains are based upon the object
"address" rather than the object "contents".  EQ-testing hash tables
with collision chains built the other way are certainly conceivable,
but one wonders what value they would be.  As I surmised in a previous 
note to this community, there is an unfortunate tendency to think that 
EQ-tables are "much faster" than EQUAL-tables, and the semantically 
wrong choice of :test is often made because of a mistaken conception 
about performance.   [A typical place where EQ-type tables *are* indeed 
the correct choice is in circularity detection, or possibly in macro-
expansion caches like MACROMEMO of MacLisp and CLISPARRAY of Interlisp].

This may also explain why there hasn't been a greater interest in
opening up an independent :sxhash argument to MAKE-HASH-TABLE;  EQ-type
tables are only really useful when it is the "address" of the key
that is important (rather than its contents), so suppying some other
:sxhash argument wouldn't be as useful as the built-in %POINTER.  Also,
most people are reluctant to admit %POINTER/LOC as a primitive in the
language since it is so deeply affected by memory-management strategy.


-- JonL --