I have two tables, tbl_a
and tbl_b
, both formatted as ORC, and partitioned on the column dt
. One table uses a partition format %Y%m%d%H%M
whereas the other table uses %Y%m%d
.
When I look at the execution plan of a simple SELECT + WHERE statement in Hive 1.2.1.2.4 (using TEZ 0.7.0.2.4), I only see the predicate filter set for tbl_a
, but not the other tbl_b
. This means, that in tbl_b
a full table scan will be executed and not only a scan over the desired partition. Both queried partitions exist and contain data.
hive> EXPLAIN SELECT * FROM tbl_a WHERE dt='1001010600';
OK
Plan not optimized by CBO.
Stage-0
Fetch Operator
limit:-1
Select Operator [SEL_2]
outputColumnNames:["_col0","_col1","_col2","_col3"]
Filter Operator [FIL_4]
predicate:(dt = '1001010600') (type: boolean)
TableScan [TS_0]
alias:tbl_a
Time taken: 0.866 seconds, Fetched: 12 row(s)
hive> EXPLAIN SELECT * FROM tbl_b WHERE dt='161001';
OK
Plan not optimized by CBO.
Stage-0
Fetch Operator
limit:-1
Select Operator [SEL_2]
outputColumnNames:["_col0","_col1","_col2","_col3"]
TableScan [TS_0]
alias:tbl_b
Time taken: 0.904 seconds, Fetched: 10 row(s)
This behavior doesn't look very clear to me. Which criteria decides if a where condition is used as predicate filter or not?