What if a HashMap is full?


I know that Java's HashMap has capacity and load factor parameters. So, if the number of items in the hashmap is more than capacity * load factor, a new hashmap will be reconstructed. I have some questions about this reconstruction:

  1. Will the previous hashmap be reclaimed, or will it still be in use after the reconstruction happens?
  2. Since we need a larger hashmap, will the hash function be changed?
  3. For ConcurrentHashMap, what if one thread is inserting (and this insert operation has led to a reconstruction) and another thread is reading? For example, will it read from the old hashmap or from the new hashmap?

There are 5 best solutions below


Will the previous hashmap be reclaimed, or will it still be in use after the reconstruction happens?

It's still the same HashMap; only the internal storage is reconstructed. After the reconstruction, the old bucket array is no longer needed and can be garbage collected.

Update: internally, HashMap has a Node<K,V>[] table. During a resize, a new array is constructed, the elements are moved over, and then table is replaced with the new array. After that operation, the map itself no longer references the old array, so unless something else still references it (which is unlikely, since table is package-private), it is eligible for GC.
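
A minimal sketch of that table swap, assuming a simplified map (this is not the real HashMap source; the class, field, and method names are illustrative):

    // Simplified sketch of the table swap on resize; not java.util.HashMap's code.
    class SketchMap<K, V> {

        // Singly linked bucket node, analogous to HashMap's internal Node<K,V>.
        static final class Node<K, V> {
            final int hash;
            final K key;
            V value;
            Node<K, V> next;

            Node(int hash, K key, V value, Node<K, V> next) {
                this.hash = hash; this.key = key; this.value = value; this.next = next;
            }
        }

        @SuppressWarnings("unchecked")
        Node<K, V>[] table = (Node<K, V>[]) new Node[16];

        void resize() {
            Node<K, V>[] oldTab = table;
            @SuppressWarnings("unchecked")
            Node<K, V>[] newTab = (Node<K, V>[]) new Node[oldTab.length * 2];

            // Re-link every node into the bucket it belongs to in the larger array.
            for (Node<K, V> bucket : oldTab) {
                for (Node<K, V> node = bucket; node != null; ) {
                    Node<K, V> next = node.next;
                    int index = (newTab.length - 1) & node.hash;
                    node.next = newTab[index];
                    newTab[index] = node;
                    node = next;
                }
            }

            // Swap: the map now references only the new array. Nothing else points
            // at the old array (its nodes were re-linked), so it is eligible for GC.
            table = newTab;
        }
    }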

Since we need a larger hashmap, will the hash function be changed?

No, the hash function won't change. It doesn't depend on the number of buckets; instead, the generated hash is mapped to a bucket index (e.g. by applying a modulo or a bit mask).

Update: HashMap calculates the bucket index like this: (n - 1) & hash, where n is the length of the table array (the capacity, not the number of entries) and hash is derived from the key's hashCode() (HashMap additionally spreads the high bits with h ^ (h >>> 16)). Neither of these depends on the map itself.
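
A small sketch of that index calculation (the spreading step mirrors what HashMap does with hashCode(); the table sizes are just examples):

    // Sketch: the hash comes only from the key; only the index mapping uses the table length.
    public class BucketIndexDemo {

        // HashMap spreads the high bits of hashCode() into the low bits like this.
        static int spread(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int hash = spread("example".hashCode()); // independent of the map's size
            System.out.println("index in 16 buckets: " + ((16 - 1) & hash));
            System.out.println("index in 32 buckets: " + ((32 - 1) & hash));
        }
    }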

For ConcurrentHashMap, what if one thread is inserting (and this insert operation has led to a reconstruction) and another thread is reading? For example, will it read from the old hashmap or from the new hashmap?

I'd have to guess here (I'll look up the code later) but I assume as long as threads are reading from the old buckets they'll still be in use and will be freed later.

Update: I had a quick look at the ConcurrentHashMap sources. There is a reference to the current table, which is used by get(), and a possible nextTable, which is the target of resize operations. During a resize, the elements are transferred to nextTable, and at the end table is set to nextTable, effectively switching tables.

This means that during resizing the old table is still read and at some point it gets replaced. During insert operations there might be some blocking, e.g. by using synchronized blocks, especially if a resize is needed or already being performed.

The documentation also hints at this:

A hash table supporting full concurrency of retrievals and high expected concurrency for updates.

This means that get won't block but put, remove etc. might block at some point.
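
A small sketch of that behavior (one thread inserting enough entries to force internal resizes while another thread reads; which values the reader sees depends on timing, but get() itself never blocks):

    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: a writer keeps inserting (forcing several resizes) while a reader polls.
    public class ConcurrentReadDuringResize {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>(16);

            Thread writer = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    map.put(i, i); // triggers internal resizes along the way
                }
            });

            Thread reader = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    map.get(i); // returns null or the value, but never blocks
                }
            });

            writer.start();
            reader.start();
            writer.join();
            reader.join();
            System.out.println("final size: " + map.size());
        }
    }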


Will the previous hashmap be reclaimed, or will it still be in use after the reconstruction happens?

If there are no references to prior structures, they can be garbage collected.

Since we need a larger hashmap, will the hash function be changed?

Not exactly. The hash is still the same; only the way hashes map to buckets changes. For example, before the expansion, the hashes 27103 and 32504 may map to the same bucket; after the expansion, they may map to different buckets. The hash map will have more buckets and fewer elements in each bucket.
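
A small sketch with made-up hash values (5 and 21, chosen so they collide in a 16-bucket table), using the same bit-mask indexing HashMap uses:

    // Sketch: two hashes that share a bucket at 16 slots split apart at 32 slots.
    public class BucketSplitDemo {
        public static void main(String[] args) {
            for (int hash : new int[] {5, 21}) {
                int before = (16 - 1) & hash; // both map to bucket 5
                int after  = (32 - 1) & hash; // 5 stays at 5, 21 moves to 21
                System.out.println("hash " + hash + ": bucket " + before + " -> " + after);
            }
        }
    }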

For ConcurrentHashMap, what if one thread is inserting (and this insert operation has led to a reconstruction) and another thread is reading? For example, will it read from the old hashmap or from the new hashmap?

If you care, then you have to use a lock. You use ConcurrentHashMap because you want concurrency. If you want predictable ordering between reads and writes, you have to order them. This is true even if there is no re-construction. If you do a read and write at the same time, the read may or may not reflect the write. If you do a write in one thread and issue a read from another thread before that write returns, the read may or may not reflect the write. ConcurrentHashMap gives you sane results but doesn't enforce ordering (how could it?).
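
If what you actually need is an atomic read-modify-write per key, ConcurrentHashMap's compute-style methods cover that without an external lock; a minimal sketch:

    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: per-key atomic operations. These make an individual read-modify-write
    // atomic, but they still don't order unrelated operations from other threads.
    public class AtomicOpsDemo {
        public static void main(String[] args) {
            ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

            counts.putIfAbsent("hits", 0);                            // write only if absent
            counts.merge("hits", 1, Integer::sum);                    // atomic increment
            counts.compute("hits", (k, v) -> v == null ? 1 : v + 1);  // atomic read-modify-write

            System.out.println(counts.get("hits")); // prints 2
        }
    }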


From the HashMap API documentation:

1)

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

2) If you know the size well in advance, it's better to create the hash buckets up front by specifying the capacity when constructing the map object:

HashMap(int initialCapacity)
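
A minimal usage sketch (the entry count and the division by the default load factor of 0.75 are illustrative; the point is simply to avoid resizes while filling the map):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: pre-size the map so no resize happens while loading a known number of entries.
    public class PresizedMapDemo {
        public static void main(String[] args) {
            int expectedEntries = 10_000;
            // Divide by the load factor so the entry count stays below the resize threshold.
            Map<String, Integer> map = new HashMap<>((int) (expectedEntries / 0.75f) + 1);

            for (int i = 0; i < expectedEntries; i++) {
                map.put("key" + i, i);
            }
            System.out.println(map.size());
        }
    }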
  1. The previous table will be garbage collected.
  2. The hash function won't change, but the bucket a given hash maps to might, because the bucket index is derived from both the hash and the current table size.
  3. HashMap is not thread-safe. Use ConcurrentHashMap if the map is shared amongst threads.

As the number of elements in the HashMap increases, the capacity is expanded. The load factor is the measure that decides when to increase the capacity of the Map. The default load factor is 0.75, meaning the capacity grows once the map is 75% full.

When the number of items in the Map crosses the threshold, the capacity of the Map is doubled. When the capacity is increased, all the entries (existing and new) need to be distributed evenly across the buckets. This requires rehashing: for each existing key-value pair, the bucket index is recomputed using the increased capacity.
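
A small numeric sketch of that threshold logic, assuming the default initial capacity of 16 and load factor 0.75 (simplified; the real check happens inside put):

    // Sketch: how the capacity doubles as the entry count crosses each threshold.
    public class ThresholdDemo {
        public static void main(String[] args) {
            float loadFactor = 0.75f;
            int capacity = 16; // default initial capacity

            for (int entries = 1; entries <= 100; entries++) {
                int threshold = (int) (capacity * loadFactor);
                if (entries > threshold) {
                    capacity *= 2; // the bucket array doubles and entries are redistributed
                    System.out.println(entries + " entries -> capacity " + capacity);
                }
            }
        }
    }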

For the third question, you should use ConcurrentHashMap to prevent bugs.