[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

EQUAL, and hash tables, and value/object distinctions



    Date: Thu, 30 Jun 88 12:40:54 PDT
    From: jrose@Sun.COM (John Rose)

    Two remarks on EQUAL and hash tables, from a hash table fan.

    ...
       Date: Wed, 29 Jun 88 18:57:09 PDT
       From: Jon L White <edsel!jonl@labrea.stanford.edu>

       As you pointed out in subsequent discussion, a user-supplied EQUAL method,
       whether for CLOS classes or as a defstruct option, leaves open the question
       of mechanically verifying that the alleged predicate is reflexive, symmetric,
       and transitive.  Another major problem with using hash-tables on objects
       with non-default EQUAL methods is that the SXHASH function must be similarly
       extended for these data types; SXHASH must obey the property that
	   (equal x y)   imples   (= (sxhash x) (sxhash x))
       See CLtL p285.  At one time, there was a suggestion that hash tables admit 
       a user-supplied equivalence predicate; but it never got very far for just 
       this reason, that the equivalence predicate must be *** paired up** with an
       appropriate "numericalizer" function, and many implementors felt that users 
       wouldn't understand these issues and wouldn't be able to construct up their 
       own versions of sxhash to go with their alleged equivalence predicates. 

    Yuck.  Who are these brilliant Implementors, that we may all worship
    at their feet?  For they are the curators of such profound knowledge as
	   (EQUALITY-PREDICATE X Y)   ==>   (= (HASH-FN X) (HASH-FN Y))

Yes, but the equality-predicate must be transitive over the domain of
possible keys in order for a hash-function to work.

For example, one cannot construct a hash-function for #'= unless you limit
the valid keys to not include both a floating point number and two
integers that both coerce to that same float.

(i.e. 
  (let* ((a (expt 2 56))
	 (b (1+ a))
	 (c (float a)))
    (values (= a b) (= a c) (= b c)))
 => NIL T T
)

I'm not voting for condescension, nor for protecting programmers from
themselves.  I just want to point out that it is slightly more
complicated than you make it seem.

My personal opinion is that CLtL should have allowed both the predicate
and hash-function to be defined once it included hash tables in the
language spec.  In a real application the programmer will probably want
to construct a custom hash function anyway, even for :TEST 'EQ.  If the
table has a known, or typical, set of keys, that might allow a much more
efficient, or more collision-free, hash function.

Hash tables aren't so complicated that one can't construct them
yourself.  My guess (and only a guess) is that the only reason they're
included in the language spec is to give a portable way of hashing on
%pointer where that might be useful.  SXHASH would have been enough, and
that allows easy implementation of the common case of EQUAL hash tables,
too.  

The minimalist in me still wonders why hash tables were included in
CLtL.  

    Seriously, I hear an alarming amount of condescension in the above reasoning.

    The above logical implication is ALL THAT IS NEEDED to construct an
    appropriate hash function.  It is the sole and fundamental insight which
    makes hash tables work; I have relied on it for years, building many a
    hash table in various extensible frameworks.  What's the problem of
    requiring a hash-table builder from promising that his hash and equality
    functions bear the simple relation to each other?  And if he can't
    handle that, what chance does he have of building a working stream type,
    or making sense out of CLOS?  (Note that many vendors have supplied more
    or less hairy stream definition facilities, which usually require
    invariants much more complex than the hash invariant above.)