Error while instantiating

I installed Spark 2.2 with winutils on Windows 10. When I try to run pyspark, I get the following exception:

pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'

      

I already tried setting 777 permissions on the tmp/hive folder, but it still doesn't work:

winutils.exe chmod -R 777 C:\tmp\hive

      

After applying this, the problem remains the same. I am using pyspark 2.2 on Windows 10. [screenshot: Spark environment variables]

Here is the pyspark shell: [screenshot: pyspark shell output]

Please help me figure this out. Thank you.

+3




9 replies


Port 9000?! It must be Hadoop-related, as I don't remember that port being used by Spark. I would recommend using spark-shell first to eliminate any additional "hops", i.e. spark-shell does not require two runtimes, one for Spark itself and one for Python.

Given the exception, I'm pretty sure the problem is that you have some Hive or Hadoop configuration lying around somewhere, and Spark is apparently picking it up.

The "Caused by" part seems to indicate that port 9000 is used when Spark SQL's Hive-aware subsystem is created.



Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.net.ConnectException: Call from DESKTOP-SDNSD47/192.168.10.143 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused

Review the environment variables on Windows 10 (perhaps using the set command on the command line) and remove anything Hadoop-related.
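
For example, a minimal sketch on the Windows command line (the variable cleared below is just a typical Hadoop-related one, not necessarily one that is set on your machine):

rem list any Hadoop- or Hive-related environment variables
set | findstr /I "HADOOP HIVE"

rem clear one of them for the current session only (HADOOP_CONF_DIR is only an example)
set HADOOP_CONF_DIR=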

+1




I had the same problem using the "pyspark" command as well as "spark-shell" (for Scala) on my macOS with apache-spark 2.2. Based on some research, I figured out that this was because my JDK 9.0.1 did not work with Apache Spark. Both errors were resolved by switching from JDK 9 to JDK 8.



Maybe this can help with your Windows installation too.
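
For example, on macOS you can point JAVA_HOME at an installed JDK 8 for the current shell session (a sketch; it assumes JDK 8 is already installed):

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
java -version    # should now report 1.8.x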

+2




Posting this answer for posterity. I faced the same error. The way I solved it was to try spark-shell first instead of pyspark. The error message was more direct.

This gave a better idea of what was wrong: there was an S3 access error. Next, I checked the EC2 instance role/profile for this instance; it had S3 admin access.

Then I grepped for s3:// in all the conf files under the /etc/ directory, and found this property in core-site.xml:

<property>
  <!-- URI of NN. Fully qualified. No IP. -->
  <name>fs.defaultFS</name>
  <value>s3://arvind-glue-temp/</value>
</property>

Then I remembered: I had removed HDFS as the default filesystem and set it to S3. I had created this EC2 instance from an earlier AMI and forgot to update the S3 bucket to match the newer account.

Once I updated the S3 bucket to one that is accessible to the current profile of the EC2 instance, it worked.
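
For reference, the grep step described above can be reproduced with something like this (a sketch; adjust the directory to wherever your Hadoop/Spark conf files live):

grep -rn "s3://" /etc/ 2>/dev/null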

+1




To use Spark on Windows OS, you can follow this tutorial.

NOTE: Make sure your hostname resolves correctly to your IP address, and that localhost resolves as well; an unresolvable localhost has caused problems in the past.
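
For example, using the hostname and IP that appear in the error message above, the hosts file (C:\Windows\System32\drivers\etc\hosts on Windows, /etc/hosts on Linux) would contain entries along these lines (a sketch; use your own machine's name and address):

127.0.0.1          localhost
192.168.10.143     DESKTOP-SDNSD47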

In addition, you should provide the full stack trace, as it helps debug the problem quickly and saves guesswork.

Let me know if it helps. Cheers.

0




Try this, it worked for me! Open a command prompt in administrator mode and then run the "pyspark" command. This should open the Spark session without errors.

0




I also faced this error on Ubuntu 16.04:

raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'

      

This is because I already had ./bin/spark-shell running.

So just kill that spark-shell and run ./bin/pyspark.
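
A minimal sketch of that (the spark-shell typically appears in jps output as SparkSubmit; replace <pid> with the id you actually see):

jps            # find the id of the running spark-shell
kill <pid>     # <pid> is a placeholder for that id
./bin/pyspark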

0




I also ran into this error on macOS 10 and solved it by using Java 8 instead of Java 9.

When Java 9 is the default version resolved in the environment, pyspark will throw the error above, and you will see a "name 'xx' is not defined" error when trying to access sc, spark, etc. from the shell or Jupyter.

For more details, see this link.

0




You should have a hive-site.xml file in your Spark config directory. Changing the port from 9000 to 9083 solved the problem for me.

Please make sure the property is updated in the hive-site.xml files placed in both the Hive config and the Spark config directories:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

      

For me on Ubuntu, the hive-site.xml locations are:

/home/hadoop/hive/conf/

and

/home/hadoop/spark/conf/
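
A hedged example of keeping the two copies in sync, using the paths above:

cp /home/hadoop/hive/conf/hive-site.xml /home/hadoop/spark/conf/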

0




I had the same error on Spark 2.2.0, so I checked the ports and found that some other Spark processes were holding the port. All you have to do is go to your shell and run jps to see whether another Spark process exists:

user@server:~$ jps

      

You will then see a list of the Java processes running on your system, for example:

78849 Jps
53409 RemoteInterpreterServer
78627 RemoteInterpreterServer
78515 RemoteInterpreterServer
76244 RemoteInterpreterServer
58566 SparkSubmit
77510 NameNode
74601 ZeppelinServer
17912 Master
77755 SecondaryNameNode
74684 LivyServer
77854 SparkSubmit

      


Then you have to terminate the Spark session by its process id using the following command:

sudo kill -9 #process id#
ex : sudo kill -9 58566

      

or you can use this

 sudo killall -9 SparkSubmit

      

If there is no open Spark session in memory, you will get "process not found".

Now try running your code again!

Bon Voyage

0








