Using: Amazon AWS Hive (0.13)
Trying to: output ORC files with Snappy compression.
```sql
create external table output(
  col1 string)
partitioned by (col2 string)
stored as orc
location 's3://mybucket'
tblproperties("orc.compress"="SNAPPY");

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.compress.output = true;
set mapred.output.compression.type = BLOCK;
set mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;

insert into table output
partition(col2)
select col1, col2 from input;
```
The problem is that when I look at the output in the mybucket directory, the files do not have a SNAPPY extension, although they are binary files. What setting am I missing to get these ORC files compressed and written with a SNAPPY extension?
ORC files are binary files in a specialized format. When you specify `orc.compress = SNAPPY` in the table properties, the contents of the file are compressed using Snappy. ORC is a semi-columnar file format; take a look at this documentation for more information about how the data is laid out.
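In short, your files are compressed with the Snappy codec; you just can't tell from the file name, because it is the blocks inside the file that are compressed rather than the file as a whole. The codec is recorded in the file's own metadata, so ORC readers do not need a .snappy extension to know how to decompress it.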
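If you want to confirm this, a minimal sketch against the `output` table from the question is below. It only checks what the table was declared with: look for `orc.compress` with the value `SNAPPY` in the output. To see the codec actually recorded in a particular file, running `hive --orcfiledump <path-to-orc-file>` from a shell should print a `Compression:` line; the file path under `s3://mybucket` depends on the partition and the file names Hive generated, so it is left as a placeholder here.

```sql
-- Full table metadata; orc.compress appears under "Table Parameters".
DESCRIBE FORMATTED output;

-- Only the table properties, including orc.compress.
SHOW TBLPROPERTIES output;
```

Keep in mind that the table property describes how new files should be written, while the file dump reports how an individual file was actually written, so the dump is the more direct check.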