How do I run a Scala script using spark-submit (similar to a Python script)?
I am trying to execute a simple Scala script using Spark as described in the Spark Quick Start tutorial. I have no problem executing the following Python code:
"""SimpleApp.py"""
from pyspark import SparkContext
logFile = "tmp.txt" # Should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print "Lines with a: %i, lines with b: %i" % (numAs, numBs)
I am executing this code using the following command:
/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.py
However, if I try to do the same with Scala, I run into technical problems. In more detail, the code I'm trying to follow is the following:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "tmp.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
I am trying to execute it like this:
/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.scala
As a result, I receive the following error message:
Error: Cannot load main class from JAR file
Does anyone know what I am doing wrong?
Use spark-submit --help to find out the parameters and arguments.
$ ./bin/spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
As you can see, the first usage of spark-submit requires <app jar | python file>.
The app jar argument is the jar of a Spark application with the main object (SimpleApp in your case).
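Once the jar is built, the submit command looks roughly like this (the jar path and file name are illustrative; they depend on your project name and Scala version):
/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/scala-2.11/simple-project_2.11-1.0.jar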
You can build the application jar with sbt or Maven, as described in the official Self-Contained Applications documentation:
Suppose we wish to write a self-contained application using the Spark API. We will walk through a simple application in Scala (with sbt), Java (with Maven) and Python.
and then in the section that follows:
we can create a JAR package containing the application's code, then use the spark-submit script to run our program.
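For reference, here is a minimal build.sbt sketch for the sbt route. The project name is an assumption, and the Scala version should match the one your Spark distribution was built with (2.11 for the prebuilt 2.1.0 binaries):
name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
Running sbt package from the project root then produces a jar under target/scala-2.11/ that you can pass to spark-submit.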
P.S. Use Spark 2.1.1.
I want to add to @JacekLaskowski's answer an alternative solution that I sometimes use for POC or testing purposes.
It is to run a script.scala inside spark-shell with :load.
:load /path/to/script.scala
You don't need to define a SparkContext / SparkSession, as the script will use the variables already defined in the REPL scope. You also don't need to wrap your code in a Scala object.
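For example, a script.scala along these lines could be loaded directly (a sketch; it relies on the sc value that spark-shell already provides):
// script.scala -- no object wrapper, no SparkContext setup needed
val logFile = "tmp.txt" // Should be some file on your system
val logData = sc.textFile(logFile).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println(s"Lines with a: $numAs, Lines with b: $numBs")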
P.S.: I see this more as a hack, not something to use for production purposes.