Spark job fails with NoSuchMethodError on basic Scala methods
The problem is that every job fails with the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
at ps.sparkapp.Classification$.main(Classification.scala:35)
at ps.sparkapp.Classification.main(Classification.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This exception means that the task cannot find the method at runtime. I am developing with IntelliJ IDEA Community Edition. The package compiles without problems and all dependencies are packaged correctly. Here is my build.sbt:
name := "SparkApp"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.1"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.1.1"
Running scala -version locally gives:
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
I found out that this error has something to do with Scala itself, because it only happens when I use functionality that is native to Scala, such as a for loop, .map, or .drop(2). The class is still written in Scala either way, but as long as I avoid methods like .map or .drop(2), everything works fine. Here is the relevant code:
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.linalg.Vector

object Classification {

  def main(args: Array[String]) {
    ...
    //df.printSchema()
    var dataset = df.groupBy("user_id", "measurement_date").pivot("rank").min()
    val col = dataset.schema.fieldNames.drop(2) // <- here the error happens

    // take all features and put them into one vector
    val assembler = new VectorAssembler()
      .setInputCols(col)
      .setOutputCol("features")

    val data = assembler.transform(dataset)
    data.printSchema()
    data.show()

    sc.stop()
  }
}
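For context on why the stack trace points at scala.Predef: as far as I understand, drop is not defined on Array itself, so .drop(2) on the Array[String] returned by schema.fieldNames goes through the implicit conversion scala.Predef.refArrayOps, which is exactly the symbol the NoSuchMethodError names. A minimal standalone sketch outside Spark (the field names here are placeholders, not my real schema):

object DropSketch {
  def main(args: Array[String]): Unit = {
    // schema.fieldNames returns an Array[String]; these values are placeholders
    val fieldNames: Array[String] = Array("user_id", "measurement_date", "1", "2")

    // Sugared form, as in my Classification.scala: the compiler inserts
    // the implicit conversion from scala.Predef before calling drop
    val cols = fieldNames.drop(2)

    // Roughly the explicit form of the same call; this refArrayOps is the
    // method the runtime says it cannot find
    val colsExplicit = scala.Predef.refArrayOps(fieldNames).drop(2)

    println(cols.mkString(", "))         // 1, 2
    println(colsExplicit.mkString(", ")) // 1, 2
  }
}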
As I said, if I don't use .drop(2), everything works fine, but avoiding these methods is not an option, since that would cripple the code. I couldn't find any solution online. Any ideas?
BTW: I can use these methods without any problem in the Spark shell, which I find strange.
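Just to illustrate (my real DataFrame is different, these columns are made up), something along these lines runs without errors in spark-shell:

// spark-shell for Spark 2.1.1; spark.implicits._ is already imported there
val df = Seq((1, "2017-06-01", 0.5, 0.7)).toDF("user_id", "measurement_date", "a", "b")
val cols = df.schema.fieldNames.drop(2)  // no NoSuchMethodError here
println(cols.mkString(", "))             // prints: a, b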
Thanks in advance.
NOTE 1)
I am using Spark version 2.1.1
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
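If it is useful, a quick way I know of to check which scala-library is actually on the classpath at runtime would be something like this (printed from the driver):

// Scala library version loaded at runtime, e.g. "version 2.11.8"
println(scala.util.Properties.versionString)

// Where the Predef class was loaded from (may be None for bootstrap-loaded classes)
println(Option(scala.Predef.getClass.getProtectionDomain.getCodeSource).map(_.getLocation))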