I'm writing some unit tests for my Spark code in Python. My code depends on spark-csv. In production I use spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 to submit my Python script.
I'm using pytest to run my tests with Spark in local mode:
conf = SparkConf().setAppName('myapp').setMaster('local[1]')
sc = SparkContext(conf=conf)
My question is: since pytest isn't using spark-submit to run my code, how can I provide my spark-csv dependency to the Python process?
You can sort this out with your config file: open spark-defaults.conf and add a spark.driver.extraClassPath property pointing at both jars.
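For example, a sketch of the entry, assuming the jars sit under a hypothetical /path/to directory (adjust the paths and the commons-csv version to whatever you actually downloaded; the classpath separator is ":" on Linux/macOS):

```
spark.driver.extraClassPath /path/to/spark-csv_2.10-1.0.3.jar:/path/to/commons-csv-1.1.jar
```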
After setting the above, you don't even need the --packages flag when running from the shell.
Both jars are important, as spark-csv depends on the Apache commons-csv jar. The spark-csv jar you can either build yourself or download from the Maven repository.
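For the pytest case specifically, you can also set the same property programmatically on the SparkConf your tests build, instead of (or in addition to) spark-defaults.conf. A minimal sketch, assuming hypothetical jar locations under /path/to; the pyspark lines are commented out so the snippet stands on its own, and whether a driver classpath set this way is honored can depend on your Spark version, so the spark-defaults.conf route is the more reliable one:

```python
import os

# Hypothetical locations of the two jars -- adjust to wherever you
# built or downloaded spark-csv and commons-csv.
jars = [
    '/path/to/spark-csv_2.10-1.0.3.jar',
    '/path/to/commons-csv-1.1.jar',
]

# spark.driver.extraClassPath takes an os-specific path-separated list.
extra_classpath = os.pathsep.join(jars)

# The same property the answer puts in spark-defaults.conf, applied to
# the SparkConf used by the tests (uncomment once pyspark is installed):
# from pyspark import SparkConf, SparkContext
# conf = (SparkConf()
#         .setAppName('myapp')
#         .setMaster('local[1]')
#         .set('spark.driver.extraClassPath', extra_classpath))
# sc = SparkContext(conf=conf)

print(extra_classpath)
```

This keeps the test setup self-contained: pytest creates the SparkContext itself, so the dependency has to be wired in before the context starts rather than via spark-submit.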