Any way to compute statistics on a hive table for all partitions with a single analyze command?

33.7k Views Asked by At

The syntax I see for computing statistics in hive seems to indicate the answer to the title question would be 'no':

ANALYZE TABLE [TABLENAME] PARTITION(parcol1=…, partcol2=….) COMPUTE STATISTICS

However, I wanted to throw it out here, since it i surprising that it were always required to write a script to iterate over the partitions to generate the per-partition statements. We have about a thousand partitions on this small table right now and it will be growing by orders of magnitude.

BTW I tried the following without specifying the partition:

hive> analyze table metrics compute statistics;
FAILED: SemanticException [Error 10115]: Table is partitioned and partition specification is needed
3

There are 3 best solutions below

2
msciwoj On BEST ANSWER

Yes, you can.

At least from hive v0.13 which I'm on. Just try partition spec syntax without specific values (no =… bits)

If you're using FOR COLUMNS then you can't due to the bug: https://issues.apache.org/jira/browse/HIVE-4861

1
SKP On

According to Hive manual if you do not specify partition specs statistics are gathered for entire table, https://cwiki.apache.org/confluence/display/Hive/StatsDev

When the user issues that command, he may or may not specify the partition specs. If the user doesn't specify any partition specs, statistics are gathered for the table as well as all the partitions (if any).
3
minhas23 On

I am on latest Hive 1.2 and the following command works very fine

hive> analyze table member partition(day) compute statistics noscan;
Partition mobi_mysql.member{day=20150831} stats: [numFiles=7, numRows=-1, totalSize=4735943322, rawDataSize=-1]
Partition mobi_mysql.member{day=20150901} stats: [numFiles=7, numRows=117512, totalSize=19741804, rawDataSize=0]
Partition mobi_mysql.member{day=20150902} stats: [numFiles=7, numRows=-1, totalSize=17734601, rawDataSize=-1]
Partition mobi_mysql.member{day=20150903} stats: [numFiles=7, numRows=-1, totalSize=13091084, rawDataSize=-1]
OK
Time taken: 2.089 seconds