Using GeoSpark UDFs (installed via Maven) on Databricks on Azure?


I'm trying out Databricks on Azure with a Spark cluster: Spark 3.0.0, Scala 2.12. On the cluster (!) I installed geospark:1.3.1 and geospark-sql_2.3:1.3.1, inspired by https://databricks.com/notebooks/geospark-notebook.html. I love SQL and would like to run GeoSpark queries.

I run this (from Notebook):

%scala

import com.vividsolutions.jts.geom.{Coordinate, Geometry, GeometryFactory}
import org.datasyslab.geospark.formatMapper.shapefileParser.ShapefileReader
import org.datasyslab.geospark.spatialRDD.SpatialRDD
import org.datasyslab.geosparksql.utils.{Adapter, GeoSparkSQLRegistrator}
GeoSparkSQLRegistrator.registerAll(sqlContext)

When I run this check:

%scala 
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .appName("Spark SQL UDF scalar example")
  .getOrCreate()


spark.catalog.listFunctions().filter("name like 'ST%P%' ").show(false)


/** spark.catalog.listTables().show() 
spark.sql("SELECT ST_Point(0,0) FROM ( VALUES (42) ) AS t(a); ").show() */

The output is:

|name      |database|description|className                                                |isTemporary|
|ST_NPoints|null    |null       |org.apache.spark.sql.geosparksql.expressions.ST_NPoints$|true       |
|ST_Point  |null    |null       |org.apache.spark.sql.geosparksql.expressions.ST_Point$  |true       |
...

But this

%sql
SELECT t.a, ST_Point(0,0) AS p
FROM (VALUES (42)) AS t(a);

Fails:

Error in SQL statement: NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/codegen/CodegenFallback$class

What is it I'm doing wrong?

P.S. I also tried:

CREATE FUNCTION ST_Point AS 'org.apache.spark.sql.geosparksql.expressions.ST_Point$';

with and without the trailing dollar sign. The CREATE FUNCTION statement returns OK; however, running the SELECT that uses ST_Point then fails with:

 Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.geosparksql.expressions.ST_Point$'; line 1 pos 12

1 answer:

GeoSpark 1.3.1 seems to be built for Spark 2.x, see [1]. If you need Spark 3.x, try upgrading to GeoSpark 1.3.2; otherwise, try downgrading to Spark 2.x.

[1] http://sedona.apache.org/download/GeoSpark-All-Modules-Maven-Central-Coordinates/
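For the downgrade path, that means running a Databricks cluster on a Spark 2.x runtime with version-matched cluster libraries. A minimal build.sbt sketch of what "matched" looks like (the Spark 2.4.7 version is an assumption; the GeoSpark coordinates are the ones from the question, and the `_2.3` suffix on the SQL module marks the Spark line it was compiled against):

```scala
// build.sbt fragment (sketch, not a definitive setup):
// GeoSpark 1.3.1 targets Spark 2.x, so pin a Spark 2.x version
// alongside the Spark-2.3-era GeoSpark SQL module.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"        % "2.4.7" % Provided, // assumed 2.x version
  "org.datasyslab"   %  "geospark"         % "1.3.1",
  "org.datasyslab"   %  "geospark-sql_2.3" % "1.3.1"
)
```

On Databricks itself you would install the two `org.datasyslab` coordinates as cluster libraries rather than via sbt, but the same rule applies: the GeoSpark SQL module's Spark suffix must match the runtime's Spark major version, which is exactly what breaks on a Spark 3.0.0 cluster.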