Indexed .lzo log files performing slower than .gz compression


I have some log files compressed with LZO at compression level 7 and with gzip at its default compression level, and my results are as follows.

Run time of the same MapReduce job over each input:

  • 1GB .gz file - 340 seconds
  • 1GB .lzo file un-indexed - 410 seconds
  • 1GB .lzo file indexed - 380 seconds

For the .lzo runs, the MapReduce job uses the Hadoop-LZO library's LzoTextInputFormat class instead of the usual TextInputFormat class. That's the only difference.

With the indexed file I see 37 map tasks come through and split up the job, so the .index file is being used, but the job is still slower than the gzip run. Any ideas why?
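
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch of the effective throughput and average split size. It assumes each input is exactly 1024 MB and that the 37 map tasks split the indexed file evenly; both are assumptions for illustration, not measurements, and the helper names are made up for this sketch.

```python
# Timings (seconds) copied from the list above.
# ASSUMPTION: each input file is exactly 1 GB = 1024 MB.
SIZE_MB = 1024
MAP_TASKS_INDEXED = 37  # map tasks observed for the indexed .lzo run

timings_s = {
    "gz": 340,
    "lzo_unindexed": 410,
    "lzo_indexed": 380,
}


def throughput_mb_per_s(size_mb, seconds):
    """Effective end-to-end job throughput in MB/s."""
    return size_mb / seconds


def slowdown_vs(baseline_s, other_s):
    """Percentage by which other_s is slower than baseline_s."""
    return (other_s - baseline_s) / baseline_s * 100.0


throughputs = {
    name: throughput_mb_per_s(SIZE_MB, secs) for name, secs in timings_s.items()
}

# ASSUMPTION: splits are evenly sized across the 37 map tasks.
avg_split_mb = SIZE_MB / MAP_TASKS_INDEXED

for name, rate in throughputs.items():
    print(f"{name}: {rate:.2f} MB/s")
print(f"average split size (indexed .lzo): {avg_split_mb:.1f} MB")
print(f"indexed .lzo vs gz: {slowdown_vs(timings_s['gz'], timings_s['lzo_indexed']):.1f}% slower")
```

Under those assumptions every run processes only about 2.5 to 3 MB/s overall, and each of the 37 splits is under 30 MB.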


There are 0 best solutions below