Spark JobServer JDBC-ClassNotFound error

I have:

- Hadoop
- Spark JobServer
- SQL Database

I have created a file to access my SQL database from a local instance of the Spark JobServer. In order to do this, I first have to load my JDBC driver with this command: Class.forName("com.mysql.jdbc.Driver");. However, when I try to execute the file on the Spark JobServer, I get a ClassNotFoundException:

"message": "com.mysql.jdbc.Driver",
"errorClass": "java.lang.ClassNotFoundException",

I have read that in order to load the JDBC driver, you have to change some configuration, either in the application.conf file of the Spark JobServer or in its server_start.sh file. I have done this as follows. In server_start.sh I changed the cmd value that is passed as a spark-submit command:

cmd='$SPARK_HOME/bin/spark-submit --class $MAIN --driver-memory $JOBSERVER_MEMORY
  --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS"
  --conf "spark.executor.extraClassPath=hdfs://quickstart.cloudera:8020/user/cloudera/mysql-connector-java-5.1.38-bin.jar"
  --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES"
  --driver-class-path "hdfs://quickstart.cloudera:8020/user/cloudera/mysql-connector-java-5.1.38-bin.jar"
  --jars "hdfs://quickstart.cloudera:8020/user/cloudera/mysql-connector-java-5.1.38-bin.jar"
  $@ $appdir/spark-job-server.jar $conffile'

I also changed some lines in the application.conf file of the Spark JobServer, which is used when starting the instance:

# JDBC driver, full classpath
jdbc-driver = com.mysql.jdbc.Driver

# dependent-jar-uris = ["hdfs://quickstart.cloudera:8020/user/cloudera/mysql-connector-java-5.1.38-bin.jar"]

But the error that the JDBC class cannot be found still comes back.

I have already checked for the following errors:

ERROR 1: In case somebody thinks that I just have the wrong file path (which could very well be the case, as far as I know myself), I have checked for the correct file on HDFS with hadoop fs -ls hdfs://quickstart.cloudera:8020/user/cloudera/ and the file was there:

-rw-r--r--   1 cloudera cloudera     983914 2016-01-26 02:23 hdfs://quickstart.cloudera:8020/user/cloudera/mysql-connector-java-5.1.38-bin.jar

ERROR 2: I have the necessary dependency declared in my build.sbt file, libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.+", and the import statement import java.sql._ in my Scala file.

How can I solve this ClassNotFoundException? Are there any good alternatives to JDBC for connecting to SQL?

2 Answers

Accepted answer:

We have something like this in local.conf:

  # JDBC driver, full classpath
  jdbc-driver = org.postgresql.Driver

  # Directory where default H2 driver stores its data. Only needed for H2.
  rootdir = "/var/spark-jobserver/sqldao/data"

  jdbc {
    url = "jdbc:postgresql://dbserver/spark_jobserver"
    user = "****"
    password = "****"
  }

  dbcp {
    maxactive = 20
    maxidle = 10
    initialsize = 10
  }

And in the start script I have:

EXTRA_JARS="/opt/spark-jobserver/lib/*"

CLASSPATH="$appdir:$appdir/spark-job-server.jar:$EXTRA_JARS:$(dse spark-classpath)"

And all dependent files that are used by Spark JobServer are put in /opt/spark-jobserver/lib.

I have not used HDFS to load jars for the job server.

But if you need the MySQL driver to be loaded on Spark worker nodes, then you should do it via dependent-jar-uris. I think that is what you are doing now.
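
In the job or context configuration, that would look something like this (a sketch; the local file path is an assumption, not a value from the question):

  # Context/job configuration entry -- the path below is just an example location
  dependent-jar-uris = ["file:///opt/spark-jobserver/lib/mysql-connector-java-5.1.38-bin.jar"]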

Second answer:

I have packaged the project using sbt assembly, and it finally works. I am happy.
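
In other words, build a single fat jar that bundles the MySQL connector (a sketch; the plugin version is an assumption, so check for a current release):

  // project/plugins.sbt -- version is an assumption; use a current sbt-assembly release
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

Running sbt assembly then produces one jar containing the driver classes, so Class.forName("com.mysql.jdbc.Driver") resolves without any extra classpath settings.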

But having HDFS files in your dependent-jar-uris does not actually work, so don't use HDFS links as your dependent-jar-uris.

Also, read this link in case you are curious: https://github.com/spark-jobserver/spark-jobserver/issues/372