[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

EQUAL, and hash tables, and value/object distinctions



   Date: Thu, 30 Jun 88 16:36 EDT
   From: Michael Greenwald <Greenwald@STONY-BROOK.SCRC.Symbolics.COM>

       Date: Thu, 30 Jun 88 12:40:54 PDT
       From: jrose@Sun.COM (John Rose)

       Two remarks on EQUAL and hash tables, from a hash table fan.

       ...
	  Date: Wed, 29 Jun 88 18:57:09 PDT
	  From: Jon L White <edsel!jonl@labrea.stanford.edu>

	  As you pointed out in subsequent discussion, a user-supplied EQUAL method,
	  whether for CLOS classes or as a defstruct option, leaves open the question
	  of mechanically verifying that the alleged predicate is reflexive, symmetric,
	  and transitive.
       ...

       [Flame from Rose...]
	      (EQUALITY-PREDICATE X Y)   ==>   (= (HASH-FN X) (HASH-FN Y))

   Yes, but the equality-predicate must be transitive over the domain of
   possible keys in order for a hash-function to work.

   For example, one cannot construct a hash-function for #'= unless you limit
   the valid keys to not include both a floating point number and two
   integers that both coerce to that same float.

   (i.e. 
     (let* ((a (expt 2 56))
	    (b (1+ a))
	    (c (float a)))
       (values (= a b) (= a c) (= b c)))
    => NIL T T
   )

   I'm not voting for condescension, nor for protecting programmers from
   themselves.  I just want to point out that it is slightly more
   complicated than you make it seem.

Well, for the record, I'll try not to gloss over the fact that the
EQUALITY-PREDICATE must be an equality predicate (:-).  That is, it must
be reflexive, symmetric, and transitive.  In my experience, equality
predicates are not difficult to write, and are verified by inspection,
so "reflexive, symmetric, and transitive" is not really a burden.
Writing a hash function takes a little more effort, since you've got to
make sure its structure accurately mirrors the equality predicate
you've already written.

Your point about #'= and domains is well taken.  I had forgotten
about the CL numeric conversions.  I think the general principle
here is that if you're comparing equality over domains inside
of which there are implicit conversions that lose information,
equality testing and hashing should be performed in such a way that
the information is reliably thrown out, or reliably retained.
(And these two choices yield key domains of different granularities.)
For example, a hash table over real numbers including floats might
use one of these equality predicates instead of #'=:
	;; Coarser grain:
	#'(lambda (x y) (= (float x 0L0) (float y 0L0)))
	;; Finer grain:
	#'(lambda (x y) (= (rational x) (rational y)))

					-- John