I am trying to connect from a Hive database to a collection in MongoDB using the driver jars provided on the wiki site. Here are the steps I followed:
I created a collection in MongoDB called "Diamond" under a database called "Moe", and it has 20 documents.
I wanted to connect from Hive via the Hadoop MongoDB Driver and view these documents via Hive.
I have both MongoDB and Hive installed and configured on the same server. However, I don't see any variable called HIVE_CLASSPATH, and I wonder where that is.
So I installed 3 drivers on the server: -
mongo-hadoop-core-1.5.2.jar;
mongo-hadoop-hive-1.5.2.jar;
mongo-java-driver-3.0.0.jar;
Now I connect to Hive and add these 3 jars to my classpath with the following commands (they get added successfully): -
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-hadoop-hive-1.5.2.jar;
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-hadoop-core-1.5.2.jar;
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-java-driver-3.0.0.jar;
Now I create a table in HIVE: -
CREATE TABLE Diamond
(
carat DOUBLE,
cut STRING,
color STRING,
clarity STRING,
depth DOUBLE,
table DOUBLE,
price DOUBLE,
xcord DOUBLE,
ycord DOUBLE,
zcord DOUBLE
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"carat":"carat","cut":"cut",
"color":"color", "clarity":"clarity", "depth":"depth", "table":"table",
"price":"price", "xcord":"x", "ycord":"y", "zcord":"z"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/Moe.Diamond');
However when I execute the above command in Hive I get the error below: -
java.lang.NoClassDefFoundError: com/mongodb/util/JSON
at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:110)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:210)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:268)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:261)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:587)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:573)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3784)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:256)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:155)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: com.mongodb.util.JSON
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 23 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.DDLTask
I have tried the following:
- Placing the jars in every possible directory, with no effect.
- The class that is supposed to be missing is pretty much present in the jar file.
- Oh yes, and the MongoStorageHandler class is very much in the jar.
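One way to double-check the claim that the class is present is to inspect the jars directly, since a jar is just a zip archive. The following is a minimal sketch, not part of the original setup: the helper names `jar_contains_class` and `jars_with_class` are made up for illustration, and it only tells you what is inside the files on disk, not what the Hive class loader actually sees at runtime.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) holds an entry for class_name."""
    # A class such as com.mongodb.util.JSON is stored in the jar
    # as the entry com/mongodb/util/JSON.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def jars_with_class(jar_paths, class_name):
    """Return the subset of jar_paths that contain class_name."""
    return [path for path in jar_paths if jar_contains_class(path, class_name)]
```

For example, calling `jars_with_class` with the three jar paths used in the `add jar` commands and the class name `com.mongodb.util.JSON` would list which of those files really ship the class named in the stack trace.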
I am done breaking my head with this !! If anyone can shed some light on what I could do to alleviate my anxiety, it would be great.
Thanks again. Mario
I identified what the issue was. To connect from Hive to MongoDB, the MongoDB driver invokes a Java class from a Hive jar library.
This class is supposed to be found in the jar file hive-exec-0.11.0.1.3.2.0-111.jar. However, it is available only in more recent versions of Hive, not in older ones.
It is not available in 0.11.0.1.3.2.0-111 but is present in 0.13.0.2.1.7.0-784.
The solution here was to connect to a version of Hive that is supported by the driver. MongoDB does state that its driver supports a certain version of Hadoop, but it doesn't drill down to the individual applications (Hive, Sqoop).
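As a rough illustration of that version check, here is a small sketch that reads the Hive version out of a hive-exec jar name. It rests on two assumptions that are mine, not the driver's documentation: that the first three numbers in the jar name are the Hive version (with the distribution build following), and that 0.13.0 is a usable threshold simply because it is the version where the class was found above. The names `hive_version_from_jar`, `looks_supported` and `ASSUMED_MIN_HIVE` are hypothetical.

```python
import re

# Assumption: the jar name starts with the Hive version, e.g.
# hive-exec-0.11.0.1.3.2.0-111.jar -> Hive 0.11.0
_HIVE_EXEC = re.compile(r"hive-exec-(\d+)\.(\d+)\.(\d+)")

# Assumption: not an official minimum, only the version observed to work above.
ASSUMED_MIN_HIVE = (0, 13, 0)


def hive_version_from_jar(jar_name):
    """Return (major, minor, patch) parsed from a hive-exec jar name, or None."""
    match = _HIVE_EXEC.search(jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def looks_supported(jar_name, minimum=ASSUMED_MIN_HIVE):
    """Return True if the jar's Hive version is at least the assumed minimum."""
    version = hive_version_from_jar(jar_name)
    return version is not None and version >= minimum
```

With these assumptions, `looks_supported("hive-exec-0.11.0.1.3.2.0-111.jar")` is false and `looks_supported("hive-exec-0.13.0.2.1.7.0-784.jar")` is true, which matches what I saw.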