How can I identify the input format used in a MapReduce program?

I just started learning Hadoop, and I see there are various input formats. I have a few programs to study, and my main question is: how can I identify whether the input format is TextInputFormat, KeyValueTextInputFormat, or something else? Your help is really appreciated.

There is 1 solution below

You don't have to identify which InputFormat is being used by the MapReduce program.

The InputFormat is something you specify explicitly in your program's driver code, and the MapReduce job uses whatever you set there.

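If you don't specify anything, the job uses the default, TextInputFormat, which extends FileInputFormat<LongWritable, Text>. Each record's key is the byte offset of a line in the file, and the value is the line's text. That's why, in a simple word count program, you often see the Mapper class defined as:

// Input key/value: LongWritable (byte offset), Text (line contents)
// Output key/value: Text (word), IntWritable (count)
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    //...
}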
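
To make the difference between the two formats from the question concrete, here is a rough sketch in plain Java, with no Hadoop dependency, that mimics how TextInputFormat and KeyValueTextInputFormat would hand the same lines to a mapper. The class and method names (InputFormatSketch, asTextInput, asKeyValueInput) are hypothetical and only for illustration. It assumes UTF-8 text, "\n" line endings, and the default tab separator of KeyValueTextInputFormat. It is not how Hadoop implements its record readers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class InputFormatSketch {

    // Mimics TextInputFormat: key = byte offset where the line starts, value = the line.
    static List<String[]> asTextInput(String data) {
        List<String[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : data.split("\n")) {
            records.add(new String[] { Long.toString(offset), line });
            // +1 accounts for the newline character that ended this line
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Mimics KeyValueTextInputFormat with a tab separator:
    // key = text before the first tab, value = everything after it.
    // A line without a tab becomes the key, with an empty value.
    static List<String[]> asKeyValueInput(String data) {
        List<String[]> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                records.add(new String[] { line, "" });
            } else {
                records.add(new String[] { line.substring(0, tab), line.substring(tab + 1) });
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "apple\t3\nbanana\t7\ncherry\n";
        for (String[] r : asTextInput(data)) {
            System.out.println("TextInputFormat-like:         key=" + r[0] + " value=" + r[1]);
        }
        for (String[] r : asKeyValueInput(data)) {
            System.out.println("KeyValueTextInputFormat-like: key=" + r[0] + " value=" + r[1]);
        }
    }
}
```

This also gives a practical hint when reading someone else's program: a mapper whose input types are LongWritable and Text is consistent with TextInputFormat, while one whose input types are Text and Text is consistent with KeyValueTextInputFormat.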

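You can specify the InputFormat on the JobConf object (the older org.apache.hadoop.mapred API). With the newer org.apache.hadoop.mapreduce API, the equivalent call is setInputFormatClass on a Job. To find out which format an existing program uses, look for one of these calls in its driver. With JobConf it looks like this: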

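// Old API: JobConf and the SequenceFile formats come from org.apache.hadoop.mapred
JobConf job = new JobConf(new Configuration(), MyJob.class);

// Override the default TextInputFormat / TextOutputFormat
job.setInputFormat(SequenceFileInputFormat.class);
job.setOutputFormat(SequenceFileOutputFormat.class);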

See the InputFormat class documentation for further reading.