Monday, November 2, 2009

Recipe 5.5. Using an Array or Other Modifiable Object as a Hash Key










Recipe 5.5. Using an Array or Other Modifiable Object as a Hash Key



Problem


You want to use a modifiable built-in object (an array or a hash, but not a string) as a key in a hash, even while you modify the object in place. A naive solution tends to lose hash values once the keys are modified:



coordinates = [10, 5]
treasure_map = { coordinates => 'jewels' }
treasure_map[coordinates] # => "jewels"

# Add a z-coordinate to indicate how deep the treasure is buried.
coordinates << -5

coordinates # => [10, 5, -5]
treasure_map[coordinates] # => nil
# Oh no!





Solution


The easiest solution is to call the
Hash#rehash
method every time you modify one of the hash's keys.
Hash#rehash
will repair the broken treasure map defined above:



treasure_map.rehash
treasure_map[coordinates] # => "jewels"



If this is too much code, you might consider changing the definition of the object you use as a hash key, so that modifications don't affect the way the hash treats it.


Suppose you want a reliably hashable Array class. If you want this behavior universally, you can reopen the Array class and redefine hash to give you the new behavior. But it's safer to define a subclass of Array that implements a reliable-hashing mixin, and to use that subclass only for the
Arrays
you want to use
as hash keys:



module ReliablyHashable
def hash
return object_id
end
end

class ReliablyHashableArray < Array
include ReliablyHashable
end



It's now possible to keep track of the jewels:



coordinates = ReliablyHashableArray.new([10,5])
treasure_map = { coordinates => 'jewels' }
treasure_map[coordinates] # => "jewels"

# Add a z-coordinate to indicate how deep the treasure is buried.
coordinates.push(-5)

treasure_map[coordinates] # => "jewels"





Discussion


Ruby performs hash lookups using not the key object itself but the object's hash code (an integer obtained from the key by calling its hash method). The default implementation of hash, in Object, uses an object's internal ID as its hash code. Array, Hash, and String override this method to provide different behavior.


In the initial example, the hash code of [10,5] is 41 and the hash code of [10,5,5] is83. The mapping of the coordinate list to 'jewels' is still present (it'll still show up in an iteration over each_pair), but once you change the coordinate list, you can no longer use that variable as a key.


You may also run into this problem when you use a hash or a string as a hash key, and then modify the key in place. This happens because the hash implementations of many built-in classes try to make sure that two objects that are "the same" (for instance, two distinct
arrays with the same contents, or two distinct but identical strings) get the same hash value. When coordinates is [10,5], it has a hash code of 41, like any other Array containing [10,5]. When coordinates is [10,5,5] it has a hash code of83, like any other Array with those contents.


Because of the potential for confusion, some languages don't let you use arrays or hashes
as hash keys at all. Ruby lets you do it, but you have to face the consequences if the key changes. Fortunately, you can dodge the consequences by overriding hash to work the way you want.


Since an object's internal ID never changes, the Object implementation is what you want to get reliable hashing. To get it back, you'll have to override or subclass the hash method of Array or Hash (depending on what type of key you're having trouble with).


The implementations of hash given in the solution violate the principle that different representations of the same data should have the same hash code. This means that two ReliablyHashableArray objects will have different
hash codes even if they have the same contents. For instance:



a = [1,2]
b = a.clone
a.hash # => 11
b.hash # => 11
a =
ReliablyHashableArray.new([1,2])
b = a.clone
a.hash # => -606031406
b.hash # => -606034266



If you want a particular value in a hash to be accessible by two different
arrays with the same contents, then you must key it to a regular array instead of a ReliablyHashableArray. You can't have it both ways. If an object is to have the same hash key as its earlier self, it can't also have the same hash key as another representation of its current state.


Another solution is to freeze your
hash keys. Any frozen object can be reliably used as a hash key, since you can't do anything to a frozen object that would cause its hash code to change. Ruby uses this solution: when you use a string as a hash key, Ruby copies the string, freezes the copy, and uses that as the actual hash key.




See Also


  • Recipe 8.15, "Freezing an Object to Prevent Changes"













No comments: