Gora MongoDB exception: can't serialize Utf8


I'm trying to get Nutch 2.3 to work with MongoDB, but I get the following exception:

java.lang.IllegalArgumentException: can't serialize class org.apache.avro.util.Utf8
at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:284)
at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:185)

I found the following ticket related to this problem, which says it should be resolved in Nutch 2.3: https://issues.apache.org/jira/browse/NUTCH-1843

There's another ticket for the Gora project, https://issues.apache.org/jira/browse/GORA-388, which says this issue is actually resolved in Gora 0.6. However, Nutch 2.3 uses Gora 0.5, so I don't see how the issue could be resolved in Nutch 2.3.

I would really like to use MongoDB, but I can't seem to get past this issue. Does anyone have insight into this problem? Is it a configuration issue?

Best answer

The solution is to apply the patch from https://issues.apache.org/jira/browse/NUTCH-1946 to your project. The patch upgrades Gora to 0.6, which contains the fix for this problem.
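
For orientation only, here is a rough sketch of what the Gora upgrade might look like in the dependency declarations of ivy/ivy.xml once the patch is applied. The file location, module names, and attribute layout are assumptions based on a typical Nutch 2.x source tree, and the patch may touch other files as well, so treat the patch attached to the ticket as the authoritative change.

```xml
<!-- Assumed excerpt from ivy/ivy.xml; verify against the NUTCH-1946 patch. -->

<!-- Gora core, moved from rev 0.5 to rev 0.6 -->
<dependency org="org.apache.gora" name="gora-core" rev="0.6" conf="*->default"/>

<!-- MongoDB backend for Gora; its revision should match gora-core -->
<dependency org="org.apache.gora" name="gora-mongodb" rev="0.6" conf="*->default"/>
```

After a dependency change like this, Nutch needs to be rebuilt so that the 0.6 jars end up on the runtime classpath.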

If you run into a RuntimeException during the GeneratorJob, add the following to your nutch-site.xml:

<property>
    <name>io.serializations</name>
    <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
    <description>A list of serialization classes that can be used for
        obtaining serializers and deserializers.</description>
</property>