Spark job fails with NoSuchMethodError on basic Scala methods
The problem is that every job fails with the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
at ps.sparkapp.Classification$.main(Classification.scala:35)
at ps.sparkapp.Classification.main(Classification.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This exception means that the task cannot find the method at runtime. I am developing with IntelliJ IDEA Community Edition. The package compiles without problems and all dependencies are packaged correctly. Here is my build.sbt:
name := "SparkApp"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.1"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.1.1"
Running scala -version locally gives:
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
I found out that this error has something to do with Scala itself, because it only happens when I use functionality that is native to Scala, such as a for loop, .map, or .drop(2). The class is still written in Scala either way, but as long as I avoid methods like .map or .drop(2), everything works fine. Here is the relevant code:
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.linalg.Vector

object Classification {

  def main(args: Array[String]) {
    ...
    //df.printSchema()
    var dataset = df.groupBy("user_id", "measurement_date").pivot("rank").min()
    val col = dataset.schema.fieldNames.drop(2) // <- here the error happens

    // take all features and put them into one vector
    val assembler = new VectorAssembler()
      .setInputCols(col)
      .setOutputCol("features")

    val data = assembler.transform(dataset)
    data.printSchema()
    data.show()

    sc.stop()
  }
}
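For context on why the stack trace points at scala.Predef: as far as I understand, drop is not defined on Array itself, so .drop(2) on the Array[String] returned by schema.fieldNames goes through the implicit conversion scala.Predef.refArrayOps, which is exactly the symbol the NoSuchMethodError names. A minimal standalone sketch outside Spark (the field names here are placeholders, not my real schema):

object DropSketch {
  def main(args: Array[String]): Unit = {
    // schema.fieldNames returns an Array[String]; these values are placeholders
    val fieldNames: Array[String] = Array("user_id", "measurement_date", "1", "2")

    // Sugared form, as in my Classification.scala: the compiler inserts
    // the implicit conversion from scala.Predef before calling drop
    val cols = fieldNames.drop(2)

    // Roughly the explicit form of the same call; this refArrayOps is the
    // method the runtime says it cannot find
    val colsExplicit = scala.Predef.refArrayOps(fieldNames).drop(2)

    println(cols.mkString(", "))         // 1, 2
    println(colsExplicit.mkString(", ")) // 1, 2
  }
}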
As I said, if I don't use .drop(2), everything works fine, but avoiding these methods is not an option, since that would cripple the code. I couldn't find any solution online. Any ideas?
BTW: I can use these methods without any problem in the Spark shell, which I find strange.
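Just to illustrate (my real DataFrame is different, these columns are made up), something along these lines runs without errors in spark-shell:

// spark-shell for Spark 2.1.1; spark.implicits._ is already imported there
val df = Seq((1, "2017-06-01", 0.5, 0.7)).toDF("user_id", "measurement_date", "a", "b")
val cols = df.schema.fieldNames.drop(2)  // no NoSuchMethodError here
println(cols.mkString(", "))             // prints: a, b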
Thanks in advance.
NOTE 1)
I am using Spark version 2.1.1
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
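If it is useful, a quick way I know of to check which scala-library is actually on the classpath at runtime would be something like this (printed from the driver):

// Scala library version loaded at runtime, e.g. "version 2.11.8"
println(scala.util.Properties.versionString)

// Where the Predef class was loaded from (may be None for bootstrap-loaded classes)
println(Option(scala.Predef.getClass.getProtectionDomain.getCodeSource).map(_.getLocation))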