How does bucketing work for Hive ACID tables?


In Hive, I understand how bucketing works for external tables and non-ACID managed tables. Based on the column specified in the CLUSTERED BY clause of the DDL statement, a bucket is identified for each row, and the row is written to the corresponding bucket file in the table's directory on HDFS.
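To make the classic bucketing rule concrete, here is a minimal sketch of how a bucket number can be derived from a column value. It assumes the legacy scheme of taking a non-negative hash modulo the bucket count, with a Java-style string hash for illustration. The function names are hypothetical, and the hash Hive actually applies depends on the column type and the table's bucketing version, so treat this as an illustration rather than Hive's exact code.

```python
def java_string_hash(s: str) -> int:
    """Java-style String.hashCode() for ASCII input, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for_hash(hash_code: int, num_buckets: int) -> int:
    """Clear the sign bit, then take the remainder by the bucket count."""
    return (hash_code & 0x7FFFFFFF) % num_buckets


def bucket_for_int(value: int, num_buckets: int) -> int:
    # Assumption: an int column hashes to its own value.
    return bucket_for_hash(value, num_buckets)


def bucket_for_string(value: str, num_buckets: int) -> int:
    # Assumption: a string column uses the Java-style hash above.
    return bucket_for_hash(java_string_hash(value), num_buckets)


if __name__ == "__main__":
    for key in (7, 8, 9):
        print(key, "-> bucket", bucket_for_int(key, 3))
```

With 3 buckets, the keys 7, 8 and 9 map to buckets 1, 2 and 0 under this rule, so rows with the same key always land in the same bucket file.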

For Hive ACID tables, I checked the table's directory structure and noticed that data is written to specific bucket files inside the delta directory, even though no bucketing was configured in the DDL statement when the table was created. Here is an example:

hdfs dfs -ls /warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000
Found 3 items
/warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000/bucket_00000_0
/warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000/bucket_00001_0
/warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000/bucket_00002_0

Can someone please help me understand this directory structure? Why are there 3 bucket files inside the delta directory for the employee table?

There is 1 best solution below

Raid On

If you are curious about the code, see the link below.

https://github.com/apache/hive/blob/36f5d91acb0fac00a5d46049bd45b744fe9aaab6/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L490

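Basically, this is done to support delete operations in Hive. A delete event has to identify the row it removes, and the row's ROW__ID includes a bucket ID, so ACID tables write bucket files even when the DDL has no CLUSTERED BY clause.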
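As a rough illustration of why the bucket matters for deletes, the sketch below parses the directory and file names from the question and shows where a delete event for a given row could be written. It assumes the naming pattern delta_<minWriteId>_<maxWriteId>_<statementId> and bucket_<bucketId> with an optional trailing suffix. RowId, parse_delta_dir, parse_bucket_file and delete_event_path are hypothetical names, not Hive APIs. The real ROW__ID stores the bucket as an encoded integer rather than the plain number used here, and the trailing suffix on bucket file names is left out of the generated path.

```python
import re
from typing import NamedTuple


class DeltaDir(NamedTuple):
    min_write_id: int
    max_write_id: int
    statement_id: int


class RowId(NamedTuple):
    # Simplified stand-in for ROW__ID (write ID, bucket, row ID).
    write_id: int
    bucket: int
    row_id: int


def parse_delta_dir(name: str) -> DeltaDir:
    """Split a name like delta_0000001_0000001_0000 into its numeric parts."""
    m = re.fullmatch(r"delta_(\d+)_(\d+)_(\d+)", name)
    if m is None:
        raise ValueError(f"not a delta directory name: {name}")
    return DeltaDir(*(int(g) for g in m.groups()))


def parse_bucket_file(name: str) -> int:
    """Return the bucket number from a name like bucket_00002_0 or bucket_00002."""
    m = re.fullmatch(r"bucket_(\d+)(?:_\d+)?", name)
    if m is None:
        raise ValueError(f"not a bucket file name: {name}")
    return int(m.group(1))


def delete_event_path(table_dir: str, delete_write_id: int, target: RowId) -> str:
    """Hypothetical: a delete event goes to the same-numbered bucket in a delete_delta."""
    w = f"{delete_write_id:07d}"
    return f"{table_dir}/delete_delta_{w}_{w}_0000/bucket_{target.bucket:05d}"


if __name__ == "__main__":
    print(parse_delta_dir("delta_0000001_0000001_0000"))
    print(parse_bucket_file("bucket_00002_0"))
    # Row 0 written by write ID 1 into bucket 2, deleted by write ID 5.
    print(delete_event_path("/t", 5, RowId(write_id=1, bucket=2, row_id=0)))
```

The point of the sketch is that the bucket in the row identifier tells the delete which bucket file the original row lives in, so the delete event can be matched against that file when the table is read or compacted.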