Can't run Spark inside a Scala worksheet in IntelliJ IDEA
The following code runs without issue if I put it inside an object that extends the App trait and launch it with IDEA's Run command.
However, when I try to run it from a worksheet, I run into one of these scenarios:
1- If the first line is present, I get:
Task not serializable: java.io.NotSerializableException: A$A34$A$A34
2- If the first line is commented out, I get:
Unable to generate an encoder for inner class `A$A35$A$A35$A12` without access to the scope that this class was defined in.
//First line!
org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
case class AClass(id: Int, f1: Int, f2: Int)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Test App")
  .getOrCreate()
import spark.implicits._
val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("f1", IntegerType),
  StructField("f2", IntegerType)))
val df = spark.read.schema(schema)
  .option("header", "true")
  .csv("dataset.csv")
// Displays the content of the DataFrame to stdout
df.show()
val ads = df.as[AClass]
// This is the line that causes the serialization error
ads.foreach(x => println(x))
The project was built using the Idea Scala plugin and this is my build.sbt:
...
scalaVersion := "2.10.6"
scalacOptions += "-unchecked"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "2.1.0",
  "org.apache.spark" % "spark-sql_2.10" % "2.1.0",
  "org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
)
I tried the solution from this answer, but it doesn't work in IDEA Ultimate 2017.1, which I am using. Also, when I use worksheets, I would prefer not to add an extra object to the worksheet if at all possible.
If I call collect() on the Dataset and get an array of AClass instances, there are no more errors. It is working with the Dataset directly that causes the error.
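For reference, here is a minimal sketch of that collect() workaround, using the same setup and dataset.csv as the worksheet above. I kept the first line on the assumption that the encoder still needs the outer scope. My guess is that this avoids the error because collect() brings the rows to the driver, so no closure defined in the worksheet has to be serialized. It only suits data that fits in driver memory.

```scala
// Sketch of the collect() workaround; same setup as the worksheet above.
// Assumption: the outer scope is still registered so the AClass encoder can be built.
org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

case class AClass(id: Int, f1: Int, f2: Int)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Test App")
  .getOrCreate()
import spark.implicits._

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("f1", IntegerType),
  StructField("f2", IntegerType)))

val ads = spark.read.schema(schema)
  .option("header", "true")
  .csv("dataset.csv")
  .as[AClass]

// collect() materializes every row on the driver as a local Array[AClass].
val rows: Array[AClass] = ads.collect()

// Array.foreach is an ordinary local Scala call, not a Spark task.
rows.foreach(println)
```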
Use Eclipse compatibility mode (open Preferences -> type "scala" -> under Languages & Frameworks, select Scala -> choose Worksheet -> select Eclipse compatibility mode). See https://gist.github.com/RAbraham/585939e5390d46a7d6f8