Why should the number of buckets in Hive be equal to the number of reducers?


In Hive, why should the number of buckets be equal to the number of reducers?


There are 2 best solutions below

Best answer

Because, all else being equal, this is the most efficient arrangement for MapReduce: each reducer writes exactly one bucket, so the work is divided evenly among the reducers.

In Hive 0.x and 1.x you have to set hive.enforce.bucketing = true before inserting into the table. With this setting, the number of reducers is determined automatically from the number of buckets in your table. In later versions of Hive (2.x) the property is no longer needed, because bucketing is always enforced.

Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables


The number of reducers launched when inserting into a bucketed table is a divisor of the number of buckets in that table. Hive selects the largest divisor that does not exceed the configured maximum number of reducers (hive.exec.reducers.max) and launches that many reducers.

Example:

Number of buckets in the table: 5956
hive.exec.reducers.max=1009
Divisors of 5956 (= 4 * 1489, and 1489 is prime): 1, 2, 4, 1489, 2978, 5956
Number of launched reducers: 4

The two candidate divisors around the limit are 4 and 1489. Since at most 1009 reducers may be launched, 1489 is ruled out and only 4 reducers run, which can be painfully slow for a large table.

Setting hive.exec.reducers.max=2000 will launch 1489 reducers.
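
The arithmetic in this example can be checked with a short script. This is only a sketch of the rule as described in this answer (pick the largest divisor of the bucket count that does not exceed the reducer limit), not Hive's actual code; the function name reducers_for_buckets is made up for illustration.

```python
def reducers_for_buckets(num_buckets: int, max_reducers: int) -> int:
    """Return the largest divisor of num_buckets that is <= max_reducers.

    Hypothetical helper: it mirrors the rule described above,
    not the Hive source code.
    """
    if num_buckets < 1 or max_reducers < 1:
        raise ValueError("both arguments must be positive")
    # Walk downwards from the limit; 1 divides everything,
    # so the loop always returns.
    for candidate in range(min(num_buckets, max_reducers), 0, -1):
        if num_buckets % candidate == 0:
            return candidate


# Values taken from the example above.
print(reducers_for_buckets(5956, 1009))  # 4
print(reducers_for_buckets(5956, 2000))  # 1489
```

A bucket count with many small divisors (for example a power of two) leaves more usable reducer counts under any given limit than a count such as 5956, whose only divisors are 1, 2, 4, 1489, 2978 and 5956.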