PySpark Installation - Could not find Spark jars directory

I am having a lot of problems with Spark on Windows, so let me explain the error.

There are many tutorials for installing it and for solving various problems; however, I have tried for hours and still cannot get it to work.

I have Java 8, which is on the system PATH:

C:\>java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)


I also have Python 2.7 with Anaconda 4.4:

C:\Program Files (x86)\Spark\python\dist>python -V
Python 2.7.13 :: Anaconda 4.4.0 (64-bit)


Just in case, I also have Scala, SBT, and GOW:

C:\>scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

C:\>gow -version
Gow 0.8.0 - The lightweight alternative to Cygwin

C:\>sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
> about
[info] This is sbt 0.13.15


So, let's move on to the installation:

  • I first downloaded Spark 2.1.1 with the package type Prebuilt for Apache Hadoop 2.7 and later

  • I extracted it to a specific folder, say C:\Programs\Spark.

  • In the Python folder, I ran python setup.py sdist, which should produce a tgz file suitable for pip, and it did.

  • Moving into the dist folder, I ran pip install NAME_OF_PACKAGE.tgz. It installed successfully, as conda list confirms:

    C:\>conda list
    # packages in environment at C:\Program Files (x86)\Anaconda2:
    #
    ...
    pyspark                   2.1.1+hadoop2.7           <pip>
    ...
    

    I had some doubts, so I looked into Anaconda's Scripts and site-packages folders. Both had what I expected: there are scripts for pyspark, spark-shell, and so on. The pyspark folder in site-packages also has everything from the jars folder, plus its own bin folder, which likewise contains the scripts above. (There is a quick check of this layout in the sketch right after this list.)

  • For Hadoop, I downloaded winutils.exe and pasted it into the Spark bin folder, and also into the bin folder of the pyspark package in site-packages.

  • With all this in place, importing pyspark works without issue:

    C:\Users\Rolando Casanueva>python
    Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    Anaconda is brought to you by Continuum Analytics.
    Please check out: http://continuum.io/thanks and https://anaconda.org
    >>> import pyspark
    >>> 
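
As a sanity check on that layout, here is a minimal sketch (assuming the pip-installed package really does ship jars and bin directly under the pyspark folder, as it appears to) that locates the installed package and confirms both folders are populated:

    import os
    import pyspark

    # Locate the installed pyspark package.
    pkg_dir = os.path.dirname(pyspark.__file__)
    print(pkg_dir)

    # The launcher needs both folders; if either is missing or empty, a
    # "Failed to find Spark jars directory" error would not be surprising.
    for sub in ("jars", "bin"):
        path = os.path.join(pkg_dir, sub)
        exists = os.path.isdir(path)
        n = len(os.listdir(path)) if exists else 0
        print("%s: exists=%s, entries=%d" % (sub, exists, n))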


FIRST QUESTION: Do I also need to paste winutils.exe into the Python scripts folder?
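
From what I have read, Spark does not search any scripts folder for winutils.exe; on Windows the Hadoop code resolves it through the HADOOP_HOME environment variable, expecting it at %HADOOP_HOME%\bin\winutils.exe. A minimal sketch of my understanding, using a hypothetical C:\Programs\Hadoop folder that contains bin\winutils.exe:

    import os

    # Hypothetical layout: C:\Programs\Hadoop\bin\winutils.exe
    # Hadoop looks winutils.exe up via HADOOP_HOME, not via any scripts
    # folder, so this must be set before the SparkContext is created.
    os.environ["HADOOP_HOME"] = r"C:\Programs\Hadoop"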

Now to the main issue: the problem occurs when actually using pyspark, which raises this exception:

C:\Users\Rolando Casanueva>python
Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pyspark
>>> pyspark.SparkContext()
C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark
'Files' is not recognized as an internal or external command,
operable program or batch file.
Failed to find Spark jars directory.
You need to build Spark before running this program.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\context.py", line 259, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\java_gateway.py", line 96, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>
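
The 'Files' is not recognized line makes me suspect the launcher is mishandling the space in C:\Program Files (x86)\Anaconda2\..., cutting the path off at the first space. If that is right, pointing SPARK_HOME at the space-free extraction folder before creating the context should sidestep it. A sketch of the workaround I plan to try, assuming SPARK_HOME takes precedence over the auto-detected install location:

    import os

    # Use the space-free extraction folder from the steps above instead
    # of letting pyspark derive a path under C:\Program Files (x86)\.
    os.environ["SPARK_HOME"] = r"C:\Programs\Spark"

    import pyspark
    sc = pyspark.SparkContext()

If SPARK_HOME really is honored first, the launcher should never touch the space-laden Anaconda path at all.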


I have also tried installing in other ways:

  • Following this video tutorial: https://www.youtube.com/watch?v=omlwDosMGVk

  • Installing Spark as a Jupyter add-on, as described here: https://mas-dse.github.io/DSE230/installation/windows/

  • And finally as described above.

The same error is displayed with every installation.

SECOND QUESTION: How can I solve this problem?

ADDITIONAL QUESTION: Any other advice on how to install it?
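
For completeness, one alternative I keep seeing recommended instead of pip-installing the sdist is the findspark package, which adds an extracted Spark folder to sys.path at runtime. A sketch, assuming my extraction folder from the steps above:

    import findspark

    # Add the extracted Spark folder to sys.path before importing pyspark.
    findspark.init(r"C:\Programs\Spark")

    import pyspark
    sc = pyspark.SparkContext()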
