KeyError: SPARK_HOME during SparkConf initialization

I am new to Spark and I want to run a Python script from the command line. I have tested pyspark interactively and it works, but I get this error when trying to create sc:

File "test.py", line 10, in <module>
    conf=(SparkConf().setMaster('local').setAppName('a').setSparkHome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway
    SPARK_HOME = os.environ["SPARK_HOME"]
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'SPARK_HOME'

1 answer


There seem to be two problems.

The first one is the path you are using. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.
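
As a rough illustration (a simplified sketch, not the actual PySpark source), the gateway launcher looks for bin/spark-submit under SPARK_HOME, which is why pointing it at the bin directory breaks:

import os

# Simplified sketch: PySpark effectively resolves bin/spark-submit under SPARK_HOME
good = "/home/dirk/spark-1.4.1-bin-hadoop2.6"      # installation root
bad = "/home/dirk/spark-1.4.1-bin-hadoop2.6/bin"   # what the question used
print(os.path.join(good, "bin", "spark-submit"))   # ...-hadoop2.6/bin/spark-submit (exists)
print(os.path.join(bad, "bin", "spark-submit"))    # ...-hadoop2.6/bin/bin/spark-submit (does not exist)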

The second problem is the way setSparkHome is used. If you check its docstring, its purpose is to

set path where Spark is installed on worker nodes
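
In other words, setSparkHome only records a configuration property (spark.home) for the worker side; it does not set SPARK_HOME on the driver. A minimal sketch, assuming SPARK_HOME is already exported so that SparkConf can be constructed at all:

import os
from pyspark import SparkConf

# Needed only so that SparkConf() itself can be created (see below)
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"

conf = SparkConf().setSparkHome("/home/dirk/spark-1.4.1-bin-hadoop2.6")
print(conf.get("spark.home"))     # the only thing setSparkHome changed
print(os.environ["SPARK_HOME"])   # untouched by setSparkHome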



The SparkConf constructor assumes that SPARK_HOME on the master is already set. It calls pyspark.context.SparkContext._ensure_initialized, which calls pyspark.java_gateway.launch_gateway, which tries to read SPARK_HOME and fails.

To deal with this, you should set SPARK_HOME before creating SparkConf:

import os
from pyspark import SparkConf

# Set SPARK_HOME to the Spark installation root before touching SparkConf
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = (SparkConf().setMaster('local').setAppName('a'))
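
With SPARK_HOME set this way, creating the context from conf should work. A quick sanity check (not part of the original answer):

from pyspark import SparkContext

sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).count())  # simple job; should print 10
sc.stop()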

      
