Best practice for creating a SparkSession object in Scala for use in both unit tests and spark-submit
I am trying to write a conversion method from DataFrame to DataFrame, and I also want to test it with scalatest.
As you know, in Spark 2.x with the Scala API, you can create a SparkSession object like this:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
.config("spark.master", "local[2]")
.getOrCreate()
This code works great in unit tests. But when I run it with spark-submit, the cluster parameters do not take effect. For example,
spark-submit --master yarn --deploy-mode client --num-executors 10 ...
does not create any executors.
I found that the spark-submit arguments are applied when I remove the .config("spark.master", "local[2]") line from the code above. But without setting the master, the unit test code does not work.
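In other words, for spark-submit the session would have to be created without a master, so the submit arguments can take effect (a minimal sketch; Main and the app name are just placeholders):

import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // No master is set here, so --master, --num-executors, etc.
    // passed to spark-submit take effect.
    val spark = SparkSession.builder
      .appName("MyApp")
      .getOrCreate()
    // ... run the job ...
    spark.stop()
  }
}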
I tried to separate the code that creates the spark object (SparkSession) into a part for testing and a part for main. But so many blocks of code require a spark object, such as import spark.implicits._ and spark.createDataFrame(rdd, schema).
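For example, a method under test ends up looking something like this (a sketch; Transformations and dropNullValues are just illustrative names):

import org.apache.spark.sql.{DataFrame, SparkSession}

object Transformations {
  // Illustrative only: the method needs a SparkSession in scope,
  // e.g. for the $"col" syntax provided by spark.implicits._
  def dropNullValues(df: DataFrame)(implicit spark: SparkSession): DataFrame = {
    import spark.implicits._
    df.filter($"value".isNotNull)
  }
}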
Is there a best practice for writing code that creates a spark object for both testing and running with spark-submit?
One way is to create a trait that provides the SparkContext / SparkSession, and use it in your test cases, like so:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SparkSession}

trait SparkTestContext {
  private val master = "local[*]"
  private val appName = "testing"

  // Only needed on Windows, so that Hadoop can locate winutils.exe
  System.setProperty("hadoop.home.dir", "c:\\winutils\\")

  private val conf: SparkConf = new SparkConf()
    .setMaster(master)
    .setAppName(appName)
    .set("spark.driver.allowMultipleContexts", "false")
    .set("spark.ui.enabled", "false")

  val ss: SparkSession = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
  val sc: SparkContext = ss.sparkContext
  val sqlContext: SQLContext = ss.sqlContext
}
And your test class header looks like this:
class TestWithSparkTest extends BaseSpec with SparkTestContext with Matchers {
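Fleshed out into a runnable suite, that might look as follows (a sketch: I've assumed scalatest's FunSuite in place of BaseSpec, which is not shown here, and the test body is illustrative):

import org.scalatest.{FunSuite, Matchers}

class TestWithSparkTest extends FunSuite with SparkTestContext with Matchers {
  test("create a DataFrame from a local collection") {
    // ss is a val in the trait, so its implicits can be imported directly
    import ss.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    df.count() shouldBe 2
  }
}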
I made a version where Spark shuts down correctly after the tests.
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
trait SparkTest extends FunSuite with BeforeAndAfterAll with Matchers {
  var ss: SparkSession = _
  var sc: SparkContext = _
  var sqlContext: SQLContext = _

  // Build the local session once, before any test in the suite runs.
  override def beforeAll(): Unit = {
    val master = "local[*]"
    val appName = "MyApp"
    val conf: SparkConf = new SparkConf()
      .setMaster(master)
      .setAppName(appName)
      .set("spark.driver.allowMultipleContexts", "false")
      .set("spark.ui.enabled", "false")
    ss = SparkSession.builder().config(conf).getOrCreate()
    sc = ss.sparkContext
    sqlContext = ss.sqlContext
    super.beforeAll()
  }

  // Stop Spark after the whole suite has finished.
  override def afterAll(): Unit = {
    sc.stop()
    super.afterAll()
  }
}
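A concrete suite then just mixes the trait in, for example (a sketch; the test body is illustrative):

class SparkTestExample extends SparkTest {
  test("count rows of a generated DataFrame") {
    // ss is a var here, so copy it to a val before importing its implicits
    val spark = ss
    import spark.implicits._
    val df = Seq(1, 2, 3).toDF("n")
    df.count() shouldBe 3
  }
}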