MongoDB Hadoop error: No FileSystem for scheme: mongodb


I'm trying to get a basic Spark example running with the MongoDB Hadoop connector. I'm using Hadoop 2.6.0 and mongo-hadoop 1.3.1. I'm not sure where the connector jars should go for this Hadoop version. Here are the locations I've tried:

  • $HADOOP_HOME/libexec/share/hadoop/mapreduce
  • $HADOOP_HOME/libexec/share/hadoop/mapreduce/lib
  • $HADOOP_HOME/libexec/share/hadoop/hdfs
  • $HADOOP_HOME/libexec/share/hadoop/hdfs/lib

Here is the snippet of code I'm using to load a collection into Hadoop:

Configuration bsonConfig = new Configuration();
bsonConfig.set("mongo.job.input.format", "MongoInputFormat.class");
JavaPairRDD<Object, BSONObject> zipData = sc.newAPIHadoopFile(
        "mongodb://127.0.0.1:27017/zipsdb.zips",
        MongoInputFormat.class, Object.class, BSONObject.class, bsonConfig);

I get the following error no matter where the jar is placed:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: mongodb
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:505)
at org.apache.spark.SparkContext.newAPIHadoopFile(SparkContext.scala:774)
at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopFile(JavaSparkContext.scala:471)

I don't see any other errors in the Hadoop logs. I suspect I'm missing something in my configuration, or that Hadoop 2.6.0 is not compatible with this connector. Any help is much appreciated.
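
A possible reading of the stack trace above: newAPIHadoopFile passes its first argument to FileInputFormat.addInputPath, which asks Hadoop for a FileSystem registered under the URI scheme. Nothing is registered for "mongodb", so the lookup probably fails regardless of where the jars sit. The route usually shown for mongo-hadoop with Spark is to put the URI in the mongo.input.uri setting and call newAPIHadoopRDD, which takes no path argument. The sketch below only assembles those key/value pairs with the standard library so that it runs without Spark or the connector. The class and method names (MongoInputSettings, forCollection) are hypothetical, and whether this clears the error with Hadoop 2.6.0 and mongo-hadoop 1.3.1 is an assumption, not something verified here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the settings the connector reads,
// instead of encoding the collection as a file path.
class MongoInputSettings {

    // Returns the key/value pairs to copy into a Hadoop Configuration.
    static Map<String, String> forCollection(String mongoUri) {
        Map<String, String> settings = new LinkedHashMap<>();
        // The collection to read, given as a connector setting rather than a path.
        settings.put("mongo.input.uri", mongoUri);
        // Fully qualified class name, not the literal "MongoInputFormat.class".
        settings.put("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
        return settings;
    }

    public static void main(String[] args) {
        forCollection("mongodb://127.0.0.1:27017/zipsdb.zips")
                .forEach((key, value) -> System.out.println(key + " = " + value));
        // With Spark and the connector on the classpath, each pair would be applied via
        // bsonConfig.set(key, value), and the RDD created with
        // sc.newAPIHadoopRDD(bsonConfig, MongoInputFormat.class, Object.class, BSONObject.class)
        // in place of sc.newAPIHadoopFile(...).
    }
}
```

Because newAPIHadoopRDD never builds a Hadoop Path from the MongoDB URI, the FileSystem lookup seen in the trace should not be reached on that code path.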
