Bucket count in unordered_map


In the sample program given below (source: http://www.cplusplus.com/reference/unordered_map/unordered_map/rehash/)

// unordered_map::rehash
#include <iostream>
#include <string>
#include <unordered_map>

int main ()
{
  std::unordered_map<std::string,std::string> mymap;

  mymap.rehash(20);

  mymap["house"] = "maison";
  mymap["apple"] = "pomme";
  mymap["tree"] = "arbre";
  mymap["book"] = "livre";
  mymap["door"] = "porte";
  mymap["grapefruit"] = "pamplemousse";

  std::cout << "current bucket_count: " << mymap.bucket_count() << std::endl;

  return 0;
}

the output is:

current bucket_count: 23

Why does the bucket count become 23? What's the impact on heap size? When is the heap allocation done: on the rehash or on the actual inserts? And when does dynamic deallocation happen: when clear() is used, when erase() is used, or both?


There are 2 answers below.

Answer 1:

The default rehash policy used by libstdc++ is to go up to the smallest prime number of buckets greater than or equal to the number requested. 23 is the smallest prime above 20.
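
As a quick illustration, here is a sketch that requests a few different bucket counts and prints what the container actually allocates. The exact numbers are implementation-defined; with libstdc++ each result is typically the smallest prime greater than or equal to the request.

// bucket_count after rehash requests
// (results shown assume libstdc++; other implementations may round differently)
#include <iostream>
#include <unordered_map>

int main ()
{
  std::unordered_map<int,int> m;

  for (std::size_t requested : {10, 20, 24, 50})
  {
    m.rehash(requested);
    std::cout << "requested " << requested
              << " -> bucket_count " << m.bucket_count() << std::endl;
  }
  // with libstdc++ this typically prints 11, 23, 29, 53:
  // the smallest prime >= each requested value

  return 0;
}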

Answer 2:

Hash tables are usually sized to be "comfortably" larger than the number of items to be stored in the table. This is because the probability of two different items mapping to the same bucket increases as a hash table fills.

As the following image from Wikipedia (image source) shows, for some methods of collision resolution the behavior of the hash table degrades dramatically if its "load factor" --- the fraction of buckets in use --- surpasses a certain threshold.

Cache misses per look-up in a hash table

Therefore, the number of buckets should always be larger than the number of elements in your hash table.
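
To see that in action, here is a small sketch (my own, not from the question) that inserts elements one at a time and prints whenever the container grows its bucket array. An unordered_map rehashes automatically when the load factor would exceed max_load_factor (1.0 by default); the exact growth points and bucket counts are implementation-defined.

// watching bucket_count grow as elements are inserted
// (exact bucket counts and growth points are implementation-defined)
#include <iostream>
#include <unordered_map>

int main ()
{
  std::unordered_map<int,int> m;
  std::size_t buckets = m.bucket_count();

  for (int i = 0; i < 100; ++i)
  {
    m[i] = i;
    if (m.bucket_count() != buckets)
    {
      buckets = m.bucket_count();
      std::cout << "after " << m.size() << " elements: "
                << buckets << " buckets, load_factor "
                << m.load_factor() << std::endl;
    }
  }

  return 0;
}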

Having the number of buckets be a prime number helps ensure that entries are distributed evenly across the table. In general, any hash value that shares a common factor with the number of buckets will land in a bucket that is a multiple of that factor. Therefore, if you set the number of buckets to 20 and your hash values happen to be even, you'll waste about 50% of the table's space; the situation is worse still if your hash values have factors like 4, 5, or 10. The sketch below illustrates this.
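
A small standalone sketch of that effect (plain modular arithmetic, not an actual unordered_map): hash values that are all multiples of 5 land in only 4 of 20 buckets, but spread over all 23 buckets when the count is prime.

// bucket = hash % bucket_count; count how many distinct buckets get used
// when every hash value is a multiple of 5
#include <iostream>
#include <set>

int main ()
{
  for (unsigned buckets : {20u, 23u})
  {
    std::set<unsigned> used;
    for (unsigned hash = 0; hash < 1000; hash += 5)
      used.insert(hash % buckets);
    std::cout << buckets << " buckets: " << used.size()
              << " distinct buckets used" << std::endl;
  }
  // 20 buckets: only 4 buckets (0, 5, 10, 15) are ever used
  // 23 buckets: all 23 buckets are used

  return 0;
}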

Knowing the above, you can see why the hash table might be larger than you specify: the extra space (usually) improves performance. You can also see why the number of buckets is a prime: that gives better usage of the space you do have. Combining these makes 23 seem like a reasonable choice.