Hive - Is '=' mandatory in the directory name for an external table to treat it as a partition?


I am new to Hive and have the basic question below.

I am trying to create an external table on an HDFS directory at the location

    /projects/score/output/scores_2020-06-30.gzip

but Hive does not treat it as a partition.

Does the developer need to rename the directory from "scores_yyyy-mm-dd.gzip" to "scores=yyyy-mm-dd", for example "/projects/score/output/scores=2020-06-30", before Hive will treat it as a partition?

That is, is it mandatory to have '=' in the directory name for an external table to treat it as a partition?

Or can I change something in the table definition below? This is what I am trying:

    CREATE EXTERNAL TABLE IF NOT EXISTS XYZ (
    ...
    )
    PARTITIONED BY (scores STRING)
    LOCATION '/projects/score/output/';

Thanks in advance!


There is 1 best solution below


You can define a partition on top of any location, even outside the table directory, using ALTER TABLE ADD PARTITION. A partition in HDFS is a directory, usually inside the table location but not necessarily. If it is inside the table directory, you can use MSCK REPAIR TABLE to attach existing subdirectories as partitions: it scans the table location and adds the partition metadata.
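As a rough sketch of the MSCK REPAIR TABLE route, assuming the table XYZ from the question and a hypothetical layout in which the data files sit in a key=value subdirectory of the table location:

```sql
-- Assumed (hypothetical) layout under the table location:
--   /projects/score/output/scores=2020-06-30/<data files>

-- Scan the table location and register the partition directories found there.
MSCK REPAIR TABLE XYZ;

-- Check which partitions the metastore now knows about.
SHOW PARTITIONS XYZ;
```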

In your example the partition directory is missing: you have only the table directory with a file inside. The filename does not matter in this case.

It is not strictly mandatory for partition directories to be named key=value. MSCK REPAIR TABLE may not work with other names in your Hive distribution, but you can still add partitions with the ALTER TABLE ADD PARTITION ... LOCATION command on top of any directory.
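For a directory that does not follow key=value, a minimal sketch of the explicit route might look like this. The directory /projects/score/output/2020-06-30 is a hypothetical example, not a path from the question:

```sql
-- Register one partition and point it at an arbitrary directory.
-- The path below is hypothetical; replace it with the real data directory.
ALTER TABLE XYZ ADD IF NOT EXISTS
  PARTITION (scores = '2020-06-30')
  LOCATION '/projects/score/output/2020-06-30';

-- A filter on the partition column now reads only that directory.
SELECT * FROM XYZ WHERE scores = '2020-06-30';
```

The partition value comes from the PARTITION clause, not from the directory name, which is why the directory can be named anything.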

The behavior may depend on the vendor. For example, on Qubole, ALTER TABLE RECOVER PARTITIONS (the EMR alternative to MSCK REPAIR TABLE) works fine with directories like '2020-06-30'.

By default, when you insert data using dynamic partitioning, Hive creates partition folders in the format key=value. If you create partition directories with some other tool, using just 'value' as the folder name is okay. Check whether MSCK REPAIR works in your case. If it does not, and you need MSCK REPAIR, create the directories as key=value.
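To illustrate the default naming, here is a small sketch of a dynamic-partition insert. The tables scores_demo and scores_staging, their columns, and the /tmp/scores_demo location are all hypothetical and used only for illustration:

```sql
-- Hypothetical target table, partitioned the same way as in the question.
CREATE EXTERNAL TABLE IF NOT EXISTS scores_demo (id STRING, score DOUBLE)
PARTITIONED BY (scores STRING)
LOCATION '/tmp/scores_demo';

-- Allow fully dynamic partition values for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
-- scores_staging (id, score, score_date) is an assumed source table.
INSERT INTO TABLE scores_demo PARTITION (scores)
SELECT id, score, score_date FROM scores_staging;

-- Hive writes each partition to a folder named key=value,
-- e.g. /tmp/scores_demo/scores=2020-06-30/
```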

The names and the number of files inside a partition folder do not matter in this context.