Hive SQL: adding SORT BY or DISTRIBUTE BY makes the result files bigger than before


All of my Hive tables use LZO compression. I have two Hive SQL statements like this:

[1]

set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
insert overwrite table a partition(dt='20160420')
select col1, col2 ... from b where dt='20160420';

Because SQL [1] has no reduce stage, it creates many small files.

[2]

set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
insert overwrite table a partition(dt='20160420')
select col1, col2 ... from b where dt='20160420'
  sort by col1;

The only difference is the last line: SQL [2] adds the "sort by" clause.

The row count and content are the same, but the output files of [2] are much bigger than those of [1]: our HDFS usage for this data is almost twice what it was before.

Can you help me find the reason?
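
One thing that might be worth checking (this is an assumption, not something confirmed for these tables) is whether the row order itself changes how well the data compresses. Block compressors only find repeats within a limited look-back window, so if neighbouring rows in b are similar and "sort by col1" scatters them, the same rows can compress worse. Below is a minimal Python sketch of that effect on synthetic data. It uses zlib from the standard library as a stand-in for LZO, and the row layout, the helper names make_rows and compressed_size, and all sizes are hypothetical:

```python
import random
import zlib


def make_rows(groups=500, rows_per_group=20, seed=0):
    """Build synthetic (col1, col2) rows where neighbouring rows share col2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(groups):
        # One long value repeated across a run of consecutive rows.
        col2 = "".join(rng.choice("0123456789abcdef") for _ in range(200))
        for _ in range(rows_per_group):
            col1 = rng.randrange(1_000_000)  # sort key, unrelated to col2
            rows.append((col1, col2))
    return rows


def compressed_size(rows):
    """Serialize rows as tab-separated text and return the zlib-compressed size."""
    data = "\n".join(f"{c1:06d}\t{c2}" for c1, c2 in rows).encode("ascii")
    return len(zlib.compress(data))


rows = make_rows()
size_original = compressed_size(rows)
# Same rows, only reordered by col1 (roughly what a sort does to one file).
size_sorted = compressed_size(sorted(rows, key=lambda r: r[0]))

print("original order:", size_original, "bytes")
print("sorted by col1:", size_sorted, "bytes")
```

If the real data behaves like this, comparing the uncompressed size of the two partitions (which should match) against their compressed size would help confirm or rule out this explanation.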
