Hadoop ArrayWritable giving me a ClassCastException


EDIT: PROBLEM SOLVED - I had a pretty silly error.

I've got a MapReduce pipeline consisting of a map, reduce, map, and reduce. I use SequenceFileOutputFormat for the first reduce and SequenceFileInputFormat for the second map, and as far as I can tell I'm using both correctly. The types I'm passing between the two jobs are IntWritable and IntPairArrayWritable (a custom ArrayWritable subclass that holds IntPairWritables from Mahout). The problem is that when the second map reads an IntPairArrayWritable, I get a ClassCastException as soon as I try to pull the individual IntPairWritables out. I'm not sure whether this comes from how I'm using the ArrayWritable class or from my use of SequenceFile{Input,Output}Format - I've looked at a bunch of examples here and elsewhere, and both look right to me, but I'm still getting the error. Any help?

The specifics:

Here's my first reducer class:

public static class WalkIdReducer extends MapReduceBase implements
        Reducer<IntWritable, IntPairWritable, IntWritable, IntPairArrayWritable> {

    @Override
    public void reduce(IntWritable walk_id, Iterator<IntPairWritable> values,
            OutputCollector<IntWritable, IntPairArrayWritable> output,
            Reporter reporter) throws IOException {
        ArrayList<IntPairWritable> value_array = new ArrayList<IntPairWritable>();
        while (values.hasNext()) {
            // Hadoop's old API reuses the Writable instance behind this
            // iterator, so copy each value instead of storing the same
            // (mutated) object N times.
            IntPairWritable pair = values.next();
            value_array.add(new IntPairWritable(pair.getFirst(), pair.getSecond()));
        }
        output.collect(walk_id, IntPairArrayWritable.fromArrayList(value_array));
    }
}

And the second mapper class:

public static class NodePairMapper extends MapReduceBase implements
        Mapper<IntWritable, IntPairArrayWritable, IntPairWritable, Text> {

    @Override
    public void map(IntWritable key, IntPairArrayWritable value,
            OutputCollector<IntPairWritable, Text> output,
            Reporter reporter) throws IOException {
        // The following line gives a ClassCastException;
        // See IntPairArrayWritable.toArrayList(), below
        ArrayList<IntPairWritable> values = value.toArrayList();
        // other unimportant stuff
    }
}

Relevant parts of the job configuration for the first MapReduce:

    conf.setReducerClass(WalkIdReducer.class);
    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(IntPairArrayWritable.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);

And for the second MapReduce:

    conf.setInputFormat(SequenceFileInputFormat.class);
    conf.setMapperClass(NodePairMapper.class);
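
For completeness, here's roughly how the two jobs are chained together (the path names below are made up for illustration; everything uses the old org.apache.hadoop.mapred API):

// First job: writes SequenceFiles of <IntWritable, IntPairArrayWritable>.
// The reducer's input values (IntPairWritable) differ from its output
// values (IntPairArrayWritable), so the map output value class has to be
// declared separately. (The first job's mapper is omitted here.)
JobConf first = new JobConf(WalkAnalyzer.class);
first.setReducerClass(WalkIdReducer.class);
first.setMapOutputValueClass(IntPairWritable.class);
first.setOutputKeyClass(IntWritable.class);
first.setOutputValueClass(IntPairArrayWritable.class);
first.setOutputFormat(SequenceFileOutputFormat.class);
FileInputFormat.setInputPaths(first, new Path("input"));       // made-up path
FileOutputFormat.setOutputPath(first, new Path("walk_ids"));   // made-up path
JobClient.runJob(first);

// Second job: reads those SequenceFiles back in.
JobConf second = new JobConf(WalkAnalyzer.class);
second.setInputFormat(SequenceFileInputFormat.class);
second.setMapperClass(NodePairMapper.class);
FileInputFormat.setInputPaths(second, new Path("walk_ids"));
FileOutputFormat.setOutputPath(second, new Path("output"));    // made-up path
JobClient.runJob(second);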

And, finally, my ArrayWritable subclass:

public static class IntPairArrayWritable extends ArrayWritable
{
    // These two constructors are what people say is all you need
    // when creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairArrayWritable.class);
    }

    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairArrayWritable.class, values);
    }

    // Some convenience methods, so I can use ArrayLists in
    // other parts of the code
    public static IntPairArrayWritable fromArrayList(
            ArrayList<IntPairWritable> array) {
        IntPairArrayWritable writable = new IntPairArrayWritable();
        writable.set(array.toArray(new IntPairWritable[array.size()]));
        return writable;
    }

    public ArrayList<IntPairWritable> toArrayList() {
        ArrayList<IntPairWritable> array = new ArrayList<IntPairWritable>();
        for (Writable pair : this.get()) {
            // This line is what kills it.  I get a ClassCastException here.
            IntPairWritable int_pair = (IntPairWritable) pair;
            array.add(int_pair);
        }
        return array;
    }
}

The specific error I get is the following:

java.lang.ClassCastException: WalkAnalyzer$IntPairArrayWritable cannot be cast to org.apache.mahout.common.IntPairWritable
    at WalkAnalyzer$IntPairArrayWritable.toArrayList(WalkAnalyzer.java:231)
    at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:84)
    at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:77)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

I'm pretty baffled as to why get() is handing back instances of WalkAnalyzer$IntPairArrayWritable - I expected it to return an array of the elements contained in the IntPairArrayWritable, as the API documentation states.

EDIT

I found the problem. It was in how I wrote the constructors for IntPairArrayWritable: I called super(IntPairArrayWritable.class); when I should have called super(IntPairWritable.class);, so the deserialized array was being filled with the wrong element type. The code should actually look like this:

public static class IntPairArrayWritable extends ArrayWritable
{
    // These two constructors are what people say is all you need
    // when creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairWritable.class);
    }

    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairWritable.class, values);
    }
}
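
In hindsight the exception makes sense: the class handed to the super() constructor is exactly what ArrayWritable instantiates for each element when it reads itself back in. Paraphrasing Hadoop's ArrayWritable.readFields() from memory (not the verbatim source):

public void readFields(DataInput in) throws IOException {
    values = new Writable[in.readInt()];
    for (int i = 0; i < values.length; i++) {
        // Each element is instantiated from valueClass -- the argument
        // given to the constructor -- and then populated from the stream.
        Writable value = WritableFactories.newInstance(valueClass);
        value.readFields(in);
        values[i] = value;
    }
}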

I suppose it would have been a good idea to pick a name for the ArrayWritable subclass that was harder to confuse with IntPairWritable itself; the error would have been easier to spot.
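
For anyone who wants to check the fix without running a job, a quick in-memory round trip works. This sketch assumes the fromArrayList()/toArrayList() helpers from above, DataOutputBuffer and DataInputBuffer from org.apache.hadoop.io, and Mahout's two-int IntPairWritable constructor:

ArrayList<IntPairWritable> original = new ArrayList<IntPairWritable>();
original.add(new IntPairWritable(1, 2));
original.add(new IntPairWritable(3, 4));

// Serialize and deserialize in memory, the same way a SequenceFile would.
DataOutputBuffer out = new DataOutputBuffer();
IntPairArrayWritable.fromArrayList(original).write(out);

DataInputBuffer in = new DataInputBuffer();
in.reset(out.getData(), out.getLength());
IntPairArrayWritable read = new IntPairArrayWritable();
read.readFields(in);

// With super(IntPairWritable.class), the cast inside toArrayList() succeeds.
ArrayList<IntPairWritable> roundTripped = read.toArrayList();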

1 Answer

Check your import statements for IntPairWritable. It looks like you picked up the wrong package name in the Mapper and are therefore casting to a different class, even though its name is also IntPairWritable.