Change the default delimiter of the mapreduce

569 Views Asked by fTTTTT At 28 July 2025 at 11:18

Hi, I am a beginner with MapReduce, and I want to write a WordCount-style job that outputs the K/V pairs. However, I don't want to use the tab character as the key/value delimiter in the output file. How can I change it?

The code I use is slightly different from the example one. Here is the driver class.

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Job1");
    job.setJarByClass(Simpletask.class);
    job.setMapperClass(TokenizerMapper.class);
    //job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

Since I want each output file name to correspond to the reducer's partition, I use MultipleOutputs (mos.write()) in the reduce function, so the code is slightly different.

    // Assumed to be declared elsewhere in the reducer class (not shown in the question):
    //   Text result; MultipleOutputs<IntWritable, Text> mos;
    public void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text val : values) {
            String[] entry = val.toString().split(",");
            // Assume the MBR is entry[1]; this can be replaced by a function
            // that calculates the MBR from the coordinates.
            String MBR = entry[1];
            String mes_line = entry[0] + ",MBR" + MBR + " ";
            result.set(mes_line);
            // The base output path comes from generateFileName(key).
            mos.write(key, result, generateFileName(key));
        }
    }

Any help will be appreciated! Thank you!

There is 1 best solution below

Ramzy On 03 August 2015 at 03:05

Since you are using FileInputFormat (TextInputFormat by default), the key passed to the mapper is the byte offset of the line in the file, and the value is the line itself. It is up to the mapper to split the input line on whatever delimiter you need, so you can split the record in the map method. The default key/value behavior comes from the specific input format, such as TextInputFormat.
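
As a rough illustration of the point above, here is a minimal plain-Java sketch of the two delimiter decisions: splitting an input line yourself, and joining a key and value with a separator of your choice. It uses no Hadoop classes; `DelimiterSketch`, `splitRecord`, `formatPair`, and the sample record are hypothetical names made up for this example. On the output side, the new-API TextOutputFormat should read its key/value separator from the `mapreduce.output.textoutputformat.separator` property (tab by default), so calling `conf.set("mapreduce.output.textoutputformat.separator", ",")` before `Job.getInstance(conf, ...)` is the usual way to change the delimiter in the written files. Please check the property name against your Hadoop version.

```java
import java.util.regex.Pattern;

// Plain-Java stand-in for the two delimiter decisions in a job; no Hadoop classes are used.
class DelimiterSketch {

    // Input side: the mapper receives one line as its value and splits it itself.
    // Pattern.quote keeps characters such as '|' from being read as regex syntax.
    static String[] splitRecord(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    // Output side: a text output line is key + separator + value.
    static String formatPair(String key, String value, String separator) {
        return key + separator + value;
    }

    public static void main(String[] args) {
        // Hypothetical input record using '|' as the field delimiter.
        String[] fields = splitRecord("id7|10 20 30 40", "|");
        // prints: id7,10 20 30 40
        System.out.println(formatPair(fields[0], fields[1], ","));
    }
}
```

In a real job the split would live in the map method and the joining would be left to the output format, so only the separator property needs to change in the driver.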
