[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: hash tables and GC
Now that mail on this topic has quieted down, I would like to try to
summarize. My original message suggested that something like Pop11's
"temporary properties" would be useful in Common Lisp, and said that
temporary properties were similar to hash tables except that being
in one did not prevent GC. This left imprecise such things as the
meaning of "in".
What I had in mind was this: Consider an EQ hash table. I can keep
track of some property of objects by entering the objects as keys in
a hash table. There is, however, an unfortunate side effect: the
keys cannot be garbage collected until the table is. I suspect
this greatly restricts the cases in which hash tables can be used.
For example, one way to associate new information with an object would
be to use CLOS to define a subclass of the object's class. The
subclass would have one additional slot, and you could call CHANGE-CLASS
to add that slot to a given object. In this case, the association
would not affect the collectibility of the object. However, there may
be times when you would rather not make the new information be part of
the object (or when you can't because the object isn't of a kind that
can have its class changed). It might seem that you could use a hash
table. But since objects have indefinite lifetimes, the hash table
has to stay around more or less forever, and then the objects have to
stay around forever too.
[Suppose you implement SYMBOL-PLIST using a hash table instead of a
slot that's part of each symbol. This seems reasonable until you
notice that then no symbol with a plist -- interned or not --
could ever be collected.]
It's probably best to have a concrete proposal, so I'll make one:
Add a keyword parameter :TEMPORARY to MAKE-HASH-TABLE. If this
argument is specified and not NIL, and if :TEST is #'EQ or #'EQL,
entries in the table may be removed by the GC if the key (i.e.,
an object EQ or EQL to the key) is accessible only through the
table(*). Entries in EQUAL tables are never so removed, nor are
numbers in EQL tables. [Explanation: in these cases, it is
generally possible to construct new objects that are respectively
EQUAL or EQL to the key.]
(*) Actually, I think this should be "accessible only through
such tables", because something might be in more than one.
Pop11 has a similar capability as "temporary properties".
(For this purpose, we can think of Pop as Lisp with a different
T has "weak sets". T also has OBJECT-HASH and OBJECT-UNHASH,
which use "weak pointers". If (OBJECT-HASH object) => i, then
(OBJECT-UNHASH i) => object if the object is still accessible
and false if it is not. i is an integer.
R2RS Scheme also has OBJECT-HASH and OBJECT-UNHASH, but they
were not "essential" and have since been removed.
0. Can such tables be implemented?
It seems possible that some GC methods might have difficulty,
but no one has mentioned anything that makes temporary entries
impossible (or even impractical).
Note that since CL does not require any GC at all, it probably
cannot require that entries ever actually vanish (even if there
is a GC).
1. Tables or sets?
For some applications, all that's required is a "set" of objects
that might be used (or reused) if they still exist. This is
equivalent to a table where the value for a key is always T or
NIL but is simpler and requires less storage. Moreover, temporary
table entries can be implemented, given such "weak sets", by
occasionally going through the table and removing entries whose
keys are not still in the set.
Minimalists may therefore prefer sets; pragmatists will tend to favor
tables because they're more generally useful.
2. Hash tables or a new kind of table?
There is some reason to say that adding a new feature to hash tables
is not the best solution. In particular, there are problems with EQL
and EQUAL tables. However, a new kind of table would most likely be
so similar to a hash table that it might as well *be* a hash table.
This dilemma could be considered an argument for sets instead of
* Instead of a keyword argument to MAKE-HASH-TABLE, have a separate
constructor, MAKE-PROPERTY-TABLE, say. This implies that, instead
of property-table being a subtype of hash-table, hash- and property-
tables would both be subtypes of a type table.
However, it is difficult to come up with a good name for these things,
and I find this relationship between the two types of table less
natural than adding a new dimension to hash tables.
* Have a keyword argument to MAKE-ARRAY. [This was suggested by
Christopher Eliot <ELIOT@edu.umass.cs>, who also suggested the
keyword :GC-PROTECTING.] Membership in an array is, of course,
by EQ, thus avoiding questions about EQL and EQUAL. Such arrays
would be a kind of "weak set".
I do not think such arrays would be very useful as-is because
they would require a linear search for membership.
3. What about EQL and EQUAL hash tables?
It does not make much sense to have temporary entries in EQUAL hash
tables. Suppose an entry could be removed if there were no accessible
objects EQUAL to the key. This is not a condition that it is practical
to test. Nor is it sufficient: it is often possible to create new
EQUAL objects, and so the entry should remain unless it is not
possible to create new EQUAL objects.
EQL tables share this problem, because it is always possible to create
a new EQL number. Nonetheless, any entry that was not a number could
be subject to collection as in EQ tables.
Well, what about numbers in EQ tables? What about numbers that are
represented directly rather than given storage of their own? Such
problems already exist for EQ hash tables and are not (I think) made
noticeably worse by the addition of temporary entries. (Suppose
you have a number. How do you know *that number* is in an EQ
4. Pointers to the value.
In several messages, it was suggested (or assumed) that a table entry
would not be removed unless both the key and its associated value
could be collected. Against this notion, I would cite the following:
(1) The key and value are not symmetric (hash tables are mappings
from keys to values) and so it is reasonable to treat them differently.
(2) Tables in which only the key need be inaccessible are more useful --
often the value will be something like T or NIL that will have a
lifetime at least as long as the table's.
5. The implications of MAPHASH.
> Date: Fri, 22 Jul 88 15:19 EDT
> From: Barry Margolin <firstname.lastname@example.org>
> Date: Fri, 22 Jul 88 18:23:53 BST
> From: Jeff Dalton <jeff%aiai.edinburgh.ac.uk@NSS.Cs.Ucl.AC.UK>
> Even if you do a MAPHASH, there is no way for you to tell K should be
> in the table (or even that it exists) unless you have independent
> access to it. So, the possibility of MAPHASH would still allow K
> and V to be removed. Of course, you could gather various statistics
> with MAPHASH that would change if K were removed (e.g., the number of
> keys would decrease), but I think that's OK.
> That's not quite correct. You could remember some property of the
> key, or a value that you expect to find, without retaining independent
> access to the key itself. [...]
> (defun gethash-by-pname (string table &optional default)
> (maphash #'(lambda (key val)
> (when (and (symbolp key)
> (string-equal (symbol-name key) string))
> (return-from gethash-by-pname (values val t)))
> (values default nil))
> The general point is that MAPHASH allows you to find entries that have
> properties other than the one defined by the table's equality predicate.
OK, but assuming some key K has pname S and you have no pointer to K
other then the entry in the hash table, all your program can know is
that some key has (well, had) pname S, not that K was in the table.
To know that K (up to EQ) is in the table, you have to have a pointer
Jeff Dalton, JANET: J.Dalton@uk.ac.ed
AI Applications Institute, ARPA: J.Daltonemail@example.com
Edinburgh University. UUCP: ...!ukc!ed.ac.uk!J.Dalton