I've created an EMR cluster with the Glue Data Catalog enabled. When I invoke spark-shell, I can successfully list the tables stored in a Glue database via
spark.catalog.setCurrentDatabase("test")
spark.catalog.listTables
However, when I submit a job via spark-submit, I get a fatal error:
ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: Database 'test' does not exist.;
I am creating the SparkSession inside the job submitted through spark-submit with
SparkSession.builder.enableHiveSupport.getOrCreate
EMR 5.9.0 has just been released. Please give it a shot; it should work for you.
Relevant documentation:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html
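
If it helps to see which catalog the submitted job is actually talking to, here is a minimal diagnostic sketch. The object name GlueCatalogCheck and the app name are made up for illustration. The explicit hive.metastore.client.factory.class setting is an assumption on my part: it is the Glue client factory class named in the EMR Glue documentation for the spark-hive-site classification, and setting it on the builder may be unnecessary (or have no effect) on a cluster that is already configured for Glue.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical diagnostic job: prints what the session's catalog can see.
object GlueCatalogCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("glue-catalog-check") // illustrative name
      // Assumption: explicitly point the Hive metastore client at Glue.
      // This mirrors the spark-hive-site setting from the EMR docs and
      // may be redundant if the cluster was created with Glue enabled.
      .config(
        "hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
      .enableHiveSupport()
      .getOrCreate()

    // "hive" means Hive support is active; "in-memory" means it is not.
    println(spark.conf.get("spark.sql.catalogImplementation"))

    // List every database visible to this session.
    spark.catalog.listDatabases().show(false)

    // Only switch databases if 'test' is actually visible.
    if (spark.catalog.databaseExists("test")) {
      spark.catalog.setCurrentDatabase("test")
      spark.catalog.listTables().show(false)
    } else {
      println("Database 'test' is not visible to this session")
    }

    spark.stop()
  }
}
```

If the job prints only the default database while spark-shell shows your Glue databases, the submitted job is most likely not picking up the Glue metastore configuration.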