Include package in Spark local mode


I'm writing some unit tests for my Spark code in Python. My code depends on spark-csv. In production I use spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 to submit my Python script.

I'm using pytest to run my tests with Spark in local mode:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('myapp').setMaster('local[1]')
sc = SparkContext(conf=conf)
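
For context, that conf lives inside a session-scoped pytest fixture, roughly like this (a sketch; the conftest.py location and fixture name are just my own choices):

# conftest.py - rough sketch of the fixture that builds the local SparkContext
import pytest
from pyspark import SparkConf, SparkContext

@pytest.fixture(scope='session')
def sc():
    conf = SparkConf().setAppName('myapp').setMaster('local[1]')
    spark_context = SparkContext(conf=conf)
    yield spark_context    # hand the context to the tests
    spark_context.stop()   # tear it down after the test session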

My question is: since pytest isn't using spark-submit to run my code, how can I provide my spark-csv dependency to the Python process?

1 Answer

BEST ANSWER

You can sort this out with the spark.driver.extraClassPath setting. Open your spark-defaults.conf file

and add the property:

 spark.driver.extraClassPath /Volumes/work/bigdata/CHD5.4/spark-1.4.0-bin-hadoop2.6/lib/spark-csv_2.11-1.1.0.jar:/Volumes/work/bigdata/CHD5.4/spark-1.4.0-bin-hadoop2.6/lib/commons-csv-1.1.jar

After setting the above, you don't even need the --packages flag when running from the shell.

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
df = sqlContext.read.format('com.databricks.spark.csv').options(header='false').load(BASE_DATA_PATH + '/ssi.csv')
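
If you are calling this from your pytest tests rather than a shell, the same read works once the jars are on the driver classpath; a rough sketch, assuming a fixture named sc and an illustrative sample CSV path (neither is from your post):

from pyspark.sql import SQLContext

# Illustrative test: exercises the spark-csv reader against a small local file.
def test_load_csv(sc):
    sqlContext = SQLContext(sc)
    df = (sqlContext.read
          .format('com.databricks.spark.csv')
          .options(header='false')
          .load('tests/data/sample.csv'))
    assert df.count() > 0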

Both jars matter, since spark-csv depends on the Apache commons-csv jar. You can either build the spark-csv jar yourself or download it from the Maven repository.