Mongo Hadoop Connector Issue


I am trying to run a MapReduce job that reads from MongoDB and writes to HDFS, but I cannot get the job to run, and I could not find an example of this setup. The first issue is that if I set a Mongo input path, the job also looks for a Mongo output path. Now I am also getting an authentication error, even though my MongoDB instance does not have authentication enabled.

final Configuration conf = getConf();
final Job job = new Job(conf, "sort");
MongoConfig config = new MongoConfig(conf);
MongoConfigUtil.setInputFormat(getConf(), MongoInputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/trythisdir"));
MongoConfigUtil.setInputURI(conf,"mongodb://localhost:27017/fake_data.file");
//conf.set("mongo.output.uri", "mongodb://localhost:27017/fake_data.file");
job.setJarByClass(imageExtractor.class);
job.setMapperClass(imageExtractorMapper.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

job.setInputFormatClass( MongoInputFormat.class );

// Execute job and return status
return job.waitForCompletion(true) ? 0 : 1;

Edit: This is the current error I am having:

Exception in thread "main" java.lang.IllegalArgumentException: Couldn't connect and authenticate to get collection
    at com.mongodb.hadoop.util.MongoConfigUtil.getCollection(MongoConfigUtil.java:353)
    at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitterByStats(MongoSplitterFactory.java:71)
    at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitter(MongoSplitterFactory.java:107)
    at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:56)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1079)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1096)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:177)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:995)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
    at com.orbis.image.extractor.mongo.imageExtractor.run(imageExtractor.java:103)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.orbis.image.extractor.mongo.imageExtractor.main(imageExtractor.java:78)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NullPointerException
    at com.mongodb.MongoURI.<init>(MongoURI.java:148)
    at com.mongodb.MongoClient.<init>(MongoClient.java:268)
    at com.mongodb.hadoop.util.MongoConfigUtil.getCollection(MongoConfigUtil.java:351)
    ... 22 more

There are 3 best solutions below

helmy

You haven't shared the complete code so it's hard to tell, but what you've got there does not look consistent with typical usage of the MongoDB Connector for Hadoop.

I would suggest that you start with the examples on GitHub.

İlker Korkut

Late answer, but it may be helpful for others. I encountered the same problem while working with Apache Spark.

I think you should correctly set mongo.input.uri and mongo.output.uri, which Hadoop will use, and also set the input and output formats.

/* Set the input and output URIs used by Spark (Hadoop) */
conf.set("mongo.input.uri", "mongodb://localhost:27017/dbName.inputColName");
conf.set("mongo.output.uri", "mongodb://localhost:27017/dbName.outputColName");

/* Set the input and output formats */
job.setInputFormatClass(MongoInputFormat.class);
job.setOutputFormatClass(MongoOutputFormat.class);

By the way, a typo in the "mongo.input.uri" or "mongo.output.uri" key string causes the same error.
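
To make the typo case easier to spot, one option is to check the keys yourself before submitting the job. The following is only a minimal sketch using the plain JDK: a Map stands in for the Hadoop Configuration, and the class MongoUriKeyCheck and the helper requireMongoUri are hypothetical names invented for illustration, not part of the connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in sketch: a plain Map plays the role of the Hadoop Configuration. */
final class MongoUriKeyCheck {

    static final String INPUT_URI_KEY = "mongo.input.uri";
    static final String OUTPUT_URI_KEY = "mongo.output.uri";

    /** Hypothetical helper: returns the URI stored under key, or fails with a readable message. */
    static String requireMongoUri(Map<String, String> settings, String key) {
        String uri = settings.get(key); // null when the key was never set or was misspelled
        if (uri == null) {
            throw new IllegalStateException(
                    "Missing setting '" + key + "'; keys present: " + settings.keySet());
        }
        if (!uri.startsWith("mongodb://")) {
            throw new IllegalStateException(
                    "Setting '" + key + "' is not a mongodb:// URI: " + uri);
        }
        return uri;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mongo.input.uri", "mongodb://localhost:27017/dbName.inputColName");
        // Deliberate typo in the key ("ouput") to show what the check catches.
        settings.put("mongo.ouput.uri", "mongodb://localhost:27017/dbName.outputColName");

        System.out.println(requireMongoUri(settings, INPUT_URI_KEY));
        try {
            requireMongoUri(settings, OUTPUT_URI_KEY);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, a misspelled key fails immediately with a message listing the keys that were actually set, rather than surfacing later as a NullPointerException inside the connector.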

O.Chougna

Replace:

MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/fake_data.file");

by:

MongoConfigUtil.setInputURI(job.getConfiguration(), "mongodb://localhost:27017/fake_data.file");

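The Job constructor copies the Configuration it is given, so the conf object is already 'consumed' by your job and later changes to it are not seen. You need to set the URI directly on the job's own configuration, via job.getConfiguration().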
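
The effect can be reproduced without Hadoop. The sketch below is a toy model, not the Hadoop API: FakeJob is a hypothetical stand-in that copies its settings at construction time, and a HashMap stands in for the Configuration. It assumes only that the job takes its copy once, when it is created.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigSnapshotDemo {

    /** Hypothetical stand-in for a job that copies its settings when constructed. */
    static final class FakeJob {
        private final Map<String, String> configuration;

        FakeJob(Map<String, String> conf) {
            this.configuration = new HashMap<>(conf); // copy taken once, at construction
        }

        Map<String, String> getConfiguration() {
            return configuration; // the live map that the job itself reads
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        FakeJob job = new FakeJob(conf);

        // Too late: the job already holds its own copy, so this write is invisible to it.
        conf.put("mongo.input.uri", "mongodb://localhost:27017/fake_data.file");
        System.out.println("after conf.put: "
                + job.getConfiguration().get("mongo.input.uri"));

        // Writing to the job's own configuration is what the job actually sees.
        job.getConfiguration().put("mongo.input.uri", "mongodb://localhost:27017/fake_data.file");
        System.out.println("after job.getConfiguration().put: "
                + job.getConfiguration().get("mongo.input.uri"));
    }
}
```

The first line printed shows null for the URI and the second shows the URI itself. A null input URI is consistent with the NullPointerException at MongoURI.<init> in the stack trace from the question.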