codegourmet

savory code and other culinary highlights

Hash Constructors Are the Devil’s Work!


Hash constructors are a convenient way to default hash values.
But when you default hash values to complex objects, you might get unexpected results and hard-to-find bugs.

Let’s say we have a structure of counters, which we want to increment:

counters = {
  warnings: 0,
  errors: 0
}

counters[:errors] += 1 # => 1

This way of defaulting the hash has the drawback that the incrementer needs to know which keys are available. Otherwise a missing key returns nil and we'd get a NoMethodError, because nil has no + method.
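
To make that failure concrete, here is a minimal sketch. The key :notices is made up for illustration; any key missing from the literal behaves the same way.

```ruby
counters = { warnings: 0, errors: 0 }

# :notices (hypothetical key) was never initialized, so the lookup
# returns nil and `nil + 1` raises before anything is assigned.
error = begin
  counters[:notices] += 1
  nil
rescue NoMethodError => e
  e
end

error.class              # => NoMethodError
counters.key?(:notices)  # => false
```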

A hash with default values is created via Hash::new.

counters = Hash.new(0)
counters[:errors] += 1 # => 1
counters[:warnings]    # => 0

Pretty straightforward, right? This works because we told Hash to return the object given as the constructor argument whenever a key is not found (the default is nil).
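
A small sketch of what is and isn't stored may help here. A plain read only returns the default, while += is a real assignment and therefore creates the key:

```ruby
counters = Hash.new(0)

counters.default     # => 0
counters[:warnings]  # => 0 (read only, nothing is stored)
counters.keys        # => []

# += expands to counters[:errors] = counters[:errors] + 1,
# so the key is actually written to the hash.
counters[:errors] += 1
counters[:errors]    # => 1
counters.keys        # => [:errors]
```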

Hash Constructors with complex objects

Now let’s tackle another example:

measurements = {
  get_requests: [],
  post_requests: []
}

measurements[:get_requests] << 0.451 # => [0.451]

Blech! Hash::new to the rescue:

measurements = Hash.new([])
measurements[:get_requests] << 0.451 # => [0.451]

Neat! But wait:

measurements[:get_requests] # => [0.451]
measurements                # => {}

What?? Where did our values go? Weird, we now have a hash with “hidden” values! Even more weirdness:

measurements[:post_requests] # => [0.451]
measurements[:get_requests] << 0.733
measurements[:post_requests] # => [0.451, 0.733]

The symptoms are obvious: the key :post_requests should give a default value of [], but it returns :get_requests’s value! To find out why the assignments are “leaking” into other hash values, let’s rewrite the erroneous code by unrolling the constructor call and the array access:

# initialization:
default_value = []
measurements = Hash.new(default_value)

# dereferencing + assignment:
(measurements[:get_requests])[0] = 0.451

Can you now see what actually happened? We passed a reference to one single array object as the default for the hash constructor. In Ruby, variables and hash defaults hold references to objects, so the array is never copied: every lookup of an unknown key hands back that same mutable array.

This way we always modify the default value, because Hash is always returning a reference to it when we access an unknown key. We actually never really modified the value for the key :get_requests!

Proof:

default_value = []
measurements = Hash.new(default_value)

default_value                          # => []
measurements[:get_requests] << 0.451
default_value                          # => [0.451]

measurements[:get_requests].object_id  # => 70152525348120
measurements[:post_requests].object_id # => 70152525348120
default_value.object_id                # => 70152525348120

# => all objects are the same!
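
The "hidden" values can also be inspected directly through Hash#default. The sketch below additionally shows one possible workaround, offered as an illustration rather than a recommendation: += builds a new array and assigns it to the key, so the shared default is never mutated. The name timings is made up for this example.

```ruby
measurements = Hash.new([])
measurements[:get_requests] << 0.451

# The pushed value lives in the default object, not under any key.
measurements.default  # => [0.451]
measurements.keys     # => []

# Hypothetical second hash: += expands to
# timings[:get_requests] = timings[:get_requests] + [0.451],
# which creates a new array and stores it under the key.
timings = Hash.new([])
timings[:get_requests] += [0.451]
timings[:get_requests]   # => [0.451]
timings[:post_requests]  # => []
timings.keys             # => [:get_requests]
```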

Hash Constructor Blocks

The solution is the block syntax of the Hash constructor. From the Ruby core documentation:

If a block is specified, it [...] should return the default value.
It is the block's responsibility to store the value in the hash if required.

The first statement alone is not sufficient in our case:

measurements = Hash.new do |hash, key|
  []
end

measurements[:get_requests] << 0.451
measurements[:get_requests]          # => []

This made things even worse! We’re losing the values because again what looks like an array-push/assignment is only a read access - and thus the value won’t get stored (instead, the returned default [] will be modified and then garbage collected!).
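
A short sketch of why the value disappears: with a non-storing block, every read of the same key runs the block again and returns a different array. The variable names first and second are only for illustration.

```ruby
measurements = Hash.new { |hash, key| [] }

first  = measurements[:get_requests]
second = measurements[:get_requests]

first.equal?(second)  # => false, each read ran the block again

# Pushing onto one of these throwaway arrays never reaches the hash.
first << 0.451
first                        # => [0.451]
measurements[:get_requests]  # => []
measurements.keys            # => []
```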

The second part of the documentation is very important for complex object default values: we need to store the value in the hash, so that on the next call (and on an immediate assignment!) the default value won’t be returned:

measurements = Hash.new do |hash, key|
  hash[key] = []
end

measurements[:get_requests] << 0.451
measurements[:get_requests]          # => [0.451]
measurements[:post_requests]         # => []

measurements[:get_requests].object_id  # => 70021749097560
measurements[:post_requests].object_id # => 69975815914720

Unresolved Weirdness

Now every time an unknown hash key is read, the block is evaluated. Note that we not only return a new array, but also store it in the hash on read. This has the following side effect:

measurements = Hash.new do |hash, key|
  hash[key] = []
end

measurements.keys           # => []
measurements[:get_requests] # => []

measurements                # => { get_requests: [] }
measurements.keys           # => [:get_requests]

This might not seem like a problem, since implicitly every hash key of measurements has the value []. But it's confusing that measurements gets modified on read, and you definitely wouldn't expect measurements.keys to return different values before and after the read access!
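
If the insert-on-read side effect is a concern, one option is to inspect the hash with methods that bypass the default block. This is a sketch using Hash#key? and Hash#fetch with an explicit fallback, neither of which calls the block:

```ruby
measurements = Hash.new { |hash, key| hash[key] = [] }

# Neither call triggers the default block, so nothing is stored.
measurements.key?(:get_requests)       # => false
measurements.fetch(:get_requests, [])  # => []
measurements.keys                      # => []

# A plain [] read does trigger it and creates the key.
measurements[:get_requests]            # => []
measurements.keys                      # => [:get_requests]
```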

Conclusion

  • flat Hash constructors with atomic default values are elegant and easy to use
  • always use assignment-block syntax for hash constructors whose default is a mutable object (such as an array or a hash)
  • hash constructors with block syntax have side effects

Hash constructors for complex objects are dangerous

The errors stemming from a seemingly simple use of Hash.new([]) were really hard to find, as is always the case with unexpected behavior or messed-up object references.
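
One way to make this kind of bug fail loudly instead of silently, sketched here as a suggestion that is not part of the original investigation, is to freeze the default object. Any attempt to mutate the shared default then raises immediately (FrozenError on Ruby 2.5 and later, which is a subclass of RuntimeError).

```ruby
measurements = Hash.new([].freeze)

# Mutating the shared, frozen default raises instead of
# silently leaking the value into every unknown key.
error = begin
  measurements[:get_requests] << 0.451
  nil
rescue RuntimeError => e  # FrozenError on Ruby 2.5+
  e
end

error.nil?            # => false
measurements.default  # => []
```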

This constructor call looks so harmless that I promised myself to steer clear of hash constructors in the future - except for number values!

The correct implementation of supplying a block does have its own problems, which might bite anyone who reads from a defaulting hash and then queries Hash#keys.

Happy Coding! – codegourmet
