Enable package in local Spark mode
I am writing some unit tests for my Spark code in Python. My code depends on spark-csv. In production, I use spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 to submit my Python script.
I am using pytest to run my tests with Spark in local mode:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('myapp').setMaster('local[1]')
sc = SparkContext(conf=conf)
My question is: since pytest doesn't use spark-submit to run my code, how can I provide the spark-csv dependency to the Python process?
You can use the spark.driver.extraClassPath property to solve the problem. In your spark-defaults.conf file, add the property:
spark.driver.extraClassPath /Volumes/work/bigdata/CHD5.4/spark-1.4.0-bin-hadoop2.6/lib/spark-csv_2.11-1.1.0.jar:/Volumes/work/bigdata/CHD5.4/spark-1.4.0-bin-hadoop2.6/lib/commons-csv-1.1.jar
After setting the above, you don't even need the --packages flag when running from the shell:
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
df = sqlContext.read.format('com.databricks.spark.csv').options(header='false').load(BASE_DATA_PATH + '/ssi.csv')
Both jars are important, as spark-csv depends on the Apache commons-csv jar. You can either build the spark-csv jar yourself or download it from the Maven repository.
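For the pytest part of the question: because pyspark launches the driver JVM through spark-submit even in local mode, the spark-defaults.conf entry above should be picked up without any extra flags. A minimal sketch of what the test setup could look like (the fixture name and the sample CSV path are illustrative, not from the original post):

# conftest.py - minimal sketch; assumes $SPARK_HOME/conf/spark-defaults.conf
# already contains the spark.driver.extraClassPath entry shown above.
import pytest
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

@pytest.fixture(scope='session')
def sql_context():
    conf = SparkConf().setAppName('myapp-tests').setMaster('local[1]')
    # The driver JVM is started via spark-submit, which reads spark-defaults.conf,
    # so the spark-csv and commons-csv jars end up on the driver classpath.
    sc = SparkContext(conf=conf)
    yield SQLContext(sc)
    sc.stop()

def test_load_csv(sql_context):
    df = sql_context.read.format('com.databricks.spark.csv') \
        .options(header='false') \
        .load('tests/resources/sample.csv')  # hypothetical test fixture file
    assert df.count() > 0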