Multiple compression on same hive table


I have a Hive table partitioned by year/month, containing at least 7 years of data. What I want to do is compress the latest data (say, up to 1 year old) with Snappy, but the older data with a better compression technique like gzip. How can I do this in Hive?

You can overwrite different partitions with different compression settings: configure the compression codec, then INSERT OVERWRITE the partitions you want compressed with that codec.

For Snappy:

--enable output compression, then pick the codec
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

Overwrite partitions using the Snappy codec:

--enable dynamic partitions
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

--use a WHERE condition to select the new data only
insert overwrite table table_name partition(partition_key)
select * from table_name where partition_key <your condition here> ;
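For instance, if the table were partitioned by hypothetical year and month columns, a Snappy pass over the recent partitions might look like this (table name, column names, and the cutoff are illustrative, not from the question):

```sql
-- Illustrative sketch: assumes partition columns year/month; adjust to your schema
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- select * returns the partition columns last, which is what
-- dynamic partitioning expects when reading from the same table
insert overwrite table my_table partition(year, month)
select * from my_table
where year >= 2023;  -- the "recent" partitions only
```

Only the partitions matched by the WHERE clause are rewritten; the rest of the table is left untouched.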

For gzip, use GzipCodec:

--enable output compression, then pick the codec
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

Overwrite partitions using the Gzip codec:

--enable dynamic partitions
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

--use a WHERE condition to select the OLD data only
insert overwrite table table_name partition(partition_key)
select * from table_name where partition_key <your condition here> ;
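The gzip pass is symmetric: same settings pattern, inverted condition. Again assuming hypothetical year/month partition columns and an illustrative cutoff:

```sql
-- Illustrative sketch: rewrite the older partitions with gzip
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- complement of the condition used for the recent (Snappy) partitions,
-- so every partition ends up compressed exactly once
insert overwrite table my_table partition(year, month)
select * from my_table
where year < 2023;  -- the older partitions only
```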

In this way you can use different compression codecs for different partitions, and you do not need to specify the codec when querying the table: Hive automatically recognizes which codec to use when decompressing each file.

Note that this does not apply to self-contained file formats like ORC or Parquet, which manage their own compression through table properties, for example orc.compress=SNAPPY.
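As a hypothetical sketch of the ORC case (table and column names are made up), compression there is declared as a table property rather than through job-level codec settings:

```sql
-- Hypothetical ORC table: compression is a table property, not a job setting
CREATE TABLE sales_orc (
  id BIGINT,
  amount DOUBLE
)
PARTITIONED BY (year INT, month INT)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');

-- the property can be changed later; it affects newly written files only,
-- existing partitions keep the codec they were written with
ALTER TABLE sales_orc SET TBLPROPERTIES ('orc.compress'='ZLIB');
```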