I am trying to run a MapReduce job: I pull from Mongo and then write to HDFS, but I cannot seem to get the job to run. I could not find an example, but the issues I am having that if I set an input path of Mongo it loos for the output path of Mongo. And now I am getting an authentication error when my MongoDB instance does not have authentication.
final Configuration conf = getConf();
final Job job = new Job(conf, "sort");
MongoConfig config = new MongoConfig(conf);
MongoConfigUtil.setInputFormat(getConf(), MongoInputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/trythisdir"));
MongoConfigUtil.setInputURI(conf,"mongodb://localhost:27017/fake_data.file");
//conf.set("mongo.output.uri", "mongodb://localhost:27017/fake_data.file");
job.setJarByClass(imageExtractor.class);
job.setMapperClass(imageExtractorMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass( MongoInputFormat.class );
// Execute job and return status
return job.waitForCompletion(true) ? 0 : 1;
Edit: This is the current error I am having:
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't connect and authenticate to get collection
at com.mongodb.hadoop.util.MongoConfigUtil.getCollection(MongoConfigUtil.java:353)
at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitterByStats(MongoSplitterFactory.java:71)
at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitter(MongoSplitterFactory.java:107)
at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:56)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1079)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1096)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:177)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:995)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at com.orbis.image.extractor.mongo.imageExtractor.run(imageExtractor.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.orbis.image.extractor.mongo.imageExtractor.main(imageExtractor.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NullPointerException
at com.mongodb.MongoURI.<init>(MongoURI.java:148)
at com.mongodb.MongoClient.<init>(MongoClient.java:268)
at com.mongodb.hadoop.util.MongoConfigUtil.getCollection(MongoConfigUtil.java:351)
... 22 more
You haven't shared the complete code so it's hard to tell, but what you've got there does not look consistent with typical usage of the MongoDB Connector for Hadoop.
I would suggest that you start with the examples in github.