pyspark: namespace in jar file not found


I'm trying to import classes from an external jar with PySpark. I'm running the spark-shell with --jars and the path to the jar that contains the classes I want to use.

However, when I import a class inside my code, the namespace is not found:

from io.warp10.spark import WarpScriptFilterFunction

The error:

 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 Traceback (most recent call last):
   File "warp10-test.py", line 1, in <module>
     from io.warp10.spark import WarpScriptFilterFunction
 ImportError: No module named warp10.spark

Best answer

You have to use a WarpScript™ UDF if you want to run on Spark.

Here is an example:

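from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql.types import ArrayType, StringType

spark = SparkSession.builder.appName("WarpScript Spark Test").getOrCreate()
sc = spark.sparkContext

sqlContext = SQLContext(sc)

# Register the Java UDF class from the jar under the SQL function name "foo".
# The third argument declares the return type of the UDF.
sqlContext.registerJavaFunction("foo", "io.warp10.spark.WarpScriptUDF3", ArrayType(StringType()))

# Call the UDF from SQL; here the first argument is a WarpScript snippet.
# print(...) with a single argument works on both Python 2 and Python 3.
print(sqlContext.sql("SELECT foo('SNAPSHOT \"Easy!\"', 3.14, 'pi')").collect())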
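
If registering the function fails because the class cannot be found, it may help to check that the jar passed with --jars really contains the class named in registerJavaFunction. A jar is a zip archive, and a class a.b.C is stored in it as the entry a/b/C.class. The sketch below is only an illustration: the helper name jar_contains_class is made up for this example, and the in-memory archive stands in for the real jar, whose path you would pass instead.

```python
import io
import zipfile


def jar_contains_class(jar, class_name):
    """Return True if the jar (path or file-like object) holds class_name.

    Hypothetical helper for illustration; not part of PySpark or Warp 10.
    """
    # A fully qualified class a.b.C is stored as the zip entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()


# Stand-in for the real jar: a tiny in-memory archive with one empty entry.
demo_jar = io.BytesIO()
with zipfile.ZipFile(demo_jar, "w") as zf:
    zf.writestr("io/warp10/spark/WarpScriptUDF3.class", b"")

print(jar_contains_class(demo_jar, "io.warp10.spark.WarpScriptUDF3"))
```

With a real jar, replace demo_jar with the same path that is given to --jars.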

For more information, see: https://www.warp10.io/content/05_Ecosystem/04_Data_Science/06_Spark/02_WarpScript_PySpark
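
As for why the original import fails: a Python import statement only looks for Python modules, while a jar passed with --jars ends up on the JVM classpath, which Python's import system does not search. On top of that, the name io resolves to Python's standard-library io module, which is not a package and has no warp10 submodule. The sketch below reproduces this without Spark; try_import is a made-up helper name used only for this illustration.

```python
import importlib


def try_import(module_name):
    """Return None if the module imports, otherwise the ImportError message.

    Hypothetical helper for illustration only.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return str(exc)
    return None


# "io" is the standard-library module, so this import succeeds.
print(try_import("io"))

# There is no Python module io.warp10.spark, whatever jars the JVM has loaded.
print(try_import("io.warp10.spark"))
```

The second call reports an import error even when the jar is correctly loaded by Spark, which is why the class has to be reached through a registered Java UDF instead.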