When does HIVE (not) use WHERE clause on partition as predicate filter

1.7k Views Asked by At

I have two tables, tbl_a and tbl_b, both formatted as ORC, and partitioned on the column dt. One table uses a partition format %Y%m%d%H%M whereas the other table uses %Y%m%d.

When I look at the execution plan of a simple SELECT + WHERE statement in Hive 1.2.1.2.4 (using TEZ 0.7.0.2.4), I only see the predicate filter set for tbl_a, but not the other tbl_b. This means, that in tbl_b a full table scan will be executed and not only a scan over the desired partition. Both queried partitions exist and contain data.

hive> EXPLAIN SELECT * FROM tbl_a WHERE dt='1001010600';
OK
Plan not optimized by CBO.

Stage-0
  Fetch Operator
     limit:-1
     Select Operator [SEL_2]
        outputColumnNames:["_col0","_col1","_col2","_col3"]
        Filter Operator [FIL_4]
           predicate:(dt = '1001010600') (type: boolean)
           TableScan [TS_0]
              alias:tbl_a

Time taken: 0.866 seconds, Fetched: 12 row(s)

hive> EXPLAIN SELECT * FROM tbl_b WHERE dt='161001';
OK
Plan not optimized by CBO.

Stage-0
  Fetch Operator
     limit:-1
     Select Operator [SEL_2]
        outputColumnNames:["_col0","_col1","_col2","_col3"]
        TableScan [TS_0]
           alias:tbl_b

Time taken: 0.904 seconds, Fetched: 10 row(s)

This behavior doesn't look very clear to me. Which criteria decides if a where condition is used as predicate filter or not?

0

There are 0 best solutions below