Is Scala a must for Spark?

I am new to Spark. Its docs say that it is available in either Scala or Python.

And some blogs say that Spark depends on Scala (e.g. http://cn.soulmachine.me/blog/20130614/ ). So I'm wondering: is Scala a prerequisite for Spark? (Do I have to install Scala first because of a dependency?)

+3




2 answers


Java is a prerequisite for Spark, along with many other transitive dependencies (the Scala compiler is just a library for the JVM). PySpark simply connects remotely (via a socket) to the JVM using Py4J (Python-Java interop). Py4J is bundled with PySpark.
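For example, once a SparkContext is up, every RDD call made from Python is forwarded over that socket to the JVM. A minimal sketch (note that sc._jvm is an internal PySpark attribute, shown here only to make the gateway visible, not public API):

    from pyspark import SparkContext

    sc = SparkContext("local", "py4j-demo")

    # Python submits this job to the JVM over the Py4J socket; the JVM
    # schedules it and feeds data to Python workers that run the lambda.
    print(sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x).sum())

    # The gateway itself is reachable too (internal attribute):
    print(sc._jvm.java.lang.System.getProperty("java.version"))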

PySpark requires Python 2.6 or higher. PySpark applications are executed using the standard CPython interpreter to support Python Modules that use C extensions. We have not tested PySpark with Python 3 or with alternative Python interpreters such as PyPy or Jython.

All of PySpark's library dependencies, including Py4J, are bundled with PySpark and automatically imported.

PySpark standalone applications should be run using the bin/pyspark script, which automatically configures the Java and Python environment using the settings in conf/spark-env.sh or .cmd. The script will automatically add the bin/pyspark package to the PYTHONPATH.
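A minimal standalone script along those lines might look like the sketch below (the file name wordcount.py and the input path are placeholders):

    # wordcount.py -- launched as: bin/pyspark wordcount.py
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("WordCount")
    sc = SparkContext(conf=conf)

    counts = (sc.textFile("input.txt")               # placeholder input path
                .flatMap(lambda line: line.split())  # one record per word
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    print(counts.take(10))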

https://spark.apache.org/docs/0.9.1/python-programming-guide.html - this guide shows how to build and run everything with the Scala/Java build tool SBT, which automatically downloads all dependencies (including Scala) from a remote repository. You can also use Maven.



If you don't want Java on your computer, you can run Spark on any other machine and configure PySpark to use it (via SparkConf().setMaster).
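Configuring that could look like the following sketch (spark://master-host:7077 is a placeholder for wherever the Spark master actually runs):

    from pyspark import SparkConf, SparkContext

    # Point the client at a Spark master running on another machine.
    conf = (SparkConf()
            .setMaster("spark://master-host:7077")  # placeholder master URL
            .setAppName("remote-client"))
    sc = SparkContext(conf=conf)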

So you need Java on the master node running Spark itself (and all its Java dependencies, like Scala), and Python 2.6 for the Python client.

0




The Spark API has the following language bindings:

  • Scala
  • Java
  • Python

Scala is a natural fit because it supports functional programming, which is obviously useful in the big data field. Most of the tutorials and code snippets that you find on the net are written in Scala.



As for the runtime dependencies, please take a look at the project download page:

"Spark runs on Java 6+ and Python 2.6+. For the Scala API, Spark 1.2.0 uses Scala 2.10. You will need to use a compatible version of Scala (2.10.x)."

0








