In Hive, I understand how bucketing works for external tables and non-ACID managed tables. Based on the column specified in the CLUSTERED BY clause of the DDL statement, a bucket is chosen for each row, and the row is written to the corresponding bucket file on HDFS.
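To make the mechanism in the question concrete: for a CLUSTERED BY table, the bucket of a row is roughly the hash of the clustered-by column modulo the number of buckets. The Python sketch below illustrates only that idea. `simple_hash` is a simplified stand-in, not Hive's real hash (which depends on the column type and the table's `bucketing_version`). `bucket_for`, `assign_buckets`, and the sample rows are hypothetical.

```python
def simple_hash(value):
    """Simplified stand-in for Hive's column hash; NOT Hive's exact function.

    Integers hash to themselves; anything else is hashed like Java's
    String.hashCode (31 * h + char), kept to 32 bits.
    """
    if isinstance(value, int):
        return value
    h = 0
    for ch in str(value):
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_for(value, num_buckets):
    # Clear the sign bit (like Java's `hash & Integer.MAX_VALUE`), then take the modulus.
    return (simple_hash(value) & 0x7FFFFFFF) % num_buckets


def assign_buckets(rows, key, num_buckets):
    """Group rows by the bucket number their clustered-by column maps to."""
    buckets = {}
    for row in rows:
        buckets.setdefault(bucket_for(row[key], num_buckets), []).append(row)
    return buckets


rows = [{"id": i, "name": f"emp{i}"} for i in range(1, 7)]
for bucket, group in sorted(assign_buckets(rows, "id", 3).items()):
    print(bucket, [r["id"] for r in group])
```

With 3 buckets and integer ids 1 to 6, this prints `0 [3, 6]`, `1 [1, 4]` and `2 [2, 5]`: rows with the same clustered-by value always land in the same bucket.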
For Hive ACID tables, I checked the tables' directory structure and noticed that data is written to specific bucket files inside the delta directory, even though no bucketing was configured in the DDL statement that created the table. Here is an example:
hdfs dfs -ls /warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000
Found 3 items
/warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000/bucket_00000_0
/warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000/bucket_00001_0
/warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000/bucket_00002_0
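To read the listing, it helps to split each path into its parts. The Python sketch below assumes the naming pattern visible in the listing: `delta_<minWriteId>_<maxWriteId>_<statementId>` for the directory and `bucket_<bucketId>` plus an optional suffix for the file. The regexes and `parse_acid_path` are hypothetical helpers, and the trailing `_0` suffix is kept but deliberately not interpreted.

```python
import re

# Hypothetical patterns based on the names in the listing above.
DELTA_RE = re.compile(r"^(delete_)?delta_(\d+)_(\d+)(?:_(\d+))?$")
BUCKET_RE = re.compile(r"^bucket_(\d+)(?:_(\d+))?$")


def parse_acid_path(path):
    """Split an ACID data-file path into its delta-directory and bucket parts."""
    parts = path.rstrip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"not a delta bucket file: {path}")
    delta_dir, file_name = parts[-2], parts[-1]
    d = DELTA_RE.match(delta_dir)
    b = BUCKET_RE.match(file_name)
    if d is None or b is None:
        raise ValueError(f"not a delta bucket file: {path}")
    return {
        "is_delete_delta": d.group(1) is not None,
        "min_write_id": int(d.group(2)),
        "max_write_id": int(d.group(3)),
        "statement_id": int(d.group(4)) if d.group(4) is not None else None,
        "bucket_id": int(b.group(1)),
        "suffix": b.group(2),  # trailing "_0" is kept but not interpreted here
    }


base = "/warehouse/tablespace/managed/hive/part.db/employee/delta_0000001_0000001_0000"
for name in ["bucket_00000_0", "bucket_00001_0", "bucket_00002_0"]:
    info = parse_acid_path(f"{base}/{name}")
    print(info["bucket_id"], info["min_write_id"], info["statement_id"])
```

This prints `0 1 0`, `1 1 0` and `2 1 0`. All three files belong to the same write (write id 1, statement 0) and differ only in the bucket number.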
Can someone please help me understand the directory structure above? Why are there 3 buckets inside the delta directory of the employee table?
If you are curious about the code, see the link below:
https://github.com/apache/hive/blob/36f5d91acb0fac00a5d46049bd45b744fe9aaab6/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L490
Basically, this is done to support delete operations in Hive (and updates, which ACID tables implement as a delete plus an insert).
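As I understand Hive 3 ACID, each row carries a hidden ROW__ID made of the write id, the bucket id, and a row id within that bucket file. A DELETE records those ids in a delete_delta directory, and readers drop the matching rows when merging. Each writer task numbers its rows independently, so the bucket id is what keeps the key unique even in an unbucketed table. The Python sketch below simulates only that merge. `RowId`, `read_with_deletes`, and the sample rows are hypothetical. Hive's real reader, including the encoded form of the bucket field, is considerably more involved.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Simplified ROW__ID: which write, which bucket file, which row in that file."""
    write_id: int   # write (transaction) that inserted the row
    bucket_id: int  # bucket file the row landed in (plain number; Hive 3 stores it encoded)
    row_id: int     # row position, counted separately inside each bucket file


def read_with_deletes(inserted, delete_events):
    """Merge-on-read: keep inserted rows whose RowId has no matching delete event."""
    deleted = set(delete_events)
    return [value for rid, value in inserted if rid not in deleted]


# Three writers each wrote one bucket file in write 1, so row_id 0 appears
# in every bucket; only the full (write_id, bucket_id, row_id) key is unique.
inserted = [
    (RowId(1, 0, 0), "alice"),
    (RowId(1, 0, 1), "bob"),
    (RowId(1, 1, 0), "carol"),
    (RowId(1, 2, 0), "dave"),
]
# A later DELETE records only the RowId of the removed row.
delete_events = [RowId(1, 1, 0)]
print(read_with_deletes(inserted, delete_events))
```

This prints `['alice', 'bob', 'dave']`. Dropping `bucket_id` from the key would make `RowId(1, 1, 0)` indistinguishable from the first rows of buckets 0 and 2.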