We see that a map task can accept and output both compressed and uncompressed data. I was going through Cloudera training, and the instructor mentioned that a reduce task's input has to be in the form of key/value pairs and thus can't work on compressed data.
Is that right? If so, how can I handle network latency when transferring big data from the shuffle/partition phase to the reduce tasks?
Thanks for your help.
If the Mapper can output compressed data, then of course the Reducer can accept compressed data. This is transparent to both of them: the intermediate output is compressed and decompressed automatically. I think the instructor must have been saying that Hadoop decompresses that compressed input for you, since the Reducer does not expect compressed data that it has to decompress itself. Compressing the map output is, in fact, the usual way to reduce how much data has to cross the network from the shuffle to the reduce tasks.
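For example, here is a minimal driver sketch that turns on map-output compression (assuming the Hadoop 2.x property names and the org.apache.hadoop.mapreduce API; the class name and the Snappy codec choice are just illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;

    public class CompressedShuffleDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress the intermediate map output that is shuffled to reducers.
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Snappy trades some compression ratio for speed; any installed codec works.
        conf.setClass("mapreduce.map.output.compress.codec",
                      SnappyCodec.class, CompressionCodec.class);
        Job job = Job.getInstance(conf, "compressed-shuffle-example");
        // ... configure mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Neither the Mapper nor the Reducer code changes at all; the framework compresses and decompresses the shuffled bytes for you.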
Reducers can also output compressed data, and that's controlled separately; you can choose the codec. You can also read compressed data as input to a Mapper automatically.
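Continuing the sketch above, final-output compression is set on the job through FileOutputFormat (again assuming the org.apache.hadoop.mapreduce.lib.output API):

    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Compress the job's final output; independent of map-output compression.
    FileOutputFormat.setCompressOutput(job, true);
    // Gzip is fine for output that is read as a whole; prefer a splittable codec
    // such as BZip2Codec if a later MapReduce job needs to split these files.
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

For compressed input there is nothing to set: TextInputFormat and friends detect the codec from the file extension and decompress automatically.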
There are some catches, though. For example, gzip-compressed files can't be split across Mappers, and that's bad for parallelism. But a bzip2-compressed file can be split in some cases.
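You can see that distinction in the codec classes themselves: BZip2Codec implements Hadoop's SplittableCompressionCodec interface, while GzipCodec does not. A quick probe (a sketch, assuming the standard org.apache.hadoop.io.compress classes are on the classpath):

    import org.apache.hadoop.io.compress.BZip2Codec;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    public class SplittableCheck {
      public static void main(String[] args) {
        CompressionCodec[] codecs = { new GzipCodec(), new BZip2Codec() };
        for (CompressionCodec codec : codecs) {
          // Only codecs implementing SplittableCompressionCodec allow one file
          // to become several input splits, i.e. several parallel Mappers.
          System.out.println(codec.getClass().getSimpleName() + " splittable: "
              + (codec instanceof SplittableCompressionCodec));
        }
      }
    }

This prints false for gzip and true for bzip2, which is exactly why a single large gzip file ends up on one Mapper.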