Correct way to create Spark Fat Jar using SBT
I need a Fat Jar with Spark because I am creating a custom node for Knime. The node is essentially a self-contained jar inside Knime, and I believe a Fat Jar is the only way to spawn a local Spark job from it. We will eventually submit the job to a remote cluster, but for now I need it to run this way.
I made the Fat Jar using sbt-assembly: https://github.com/sbt/sbt-assembly
I created an empty sbt project, added spark-core to the dependencies, and built the jar. I added it to the manifest of my custom Knime node and tried to run a simple job (parallelize a collection, collect it, and print it). It starts up, but I get this error:
Missing config setting for key 'akka.version'
I have no idea how to solve it.
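A minimal sketch of the kind of job described above, assuming the Spark 1.3 `SparkContext` API and a local master. The object name, app name, and input values are hypothetical and only illustrate the parallelize, collect, print sequence:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the job started from the Knime node.
object LocalJobSketch {
  // Runs a trivial job on a local master and returns the collected values.
  def runJob(): Array[Int] = {
    val conf = new SparkConf()
      .setAppName("KnimeLocalJob") // assumed name, not from the original post
      .setMaster("local[*]")       // run inside this JVM using all cores
    val sc = new SparkContext(conf)
    try {
      // Distribute a small collection and bring it back to the driver.
      sc.parallelize(Seq(1, 2, 3, 4, 5)).collect()
    } finally {
      sc.stop() // always release the context, even if the job fails
    }
  }

  def main(args: Array[String]): Unit =
    runJob().foreach(println)
}
```

Creating the `SparkContext` is the step that starts Akka, so this is where a missing `akka.version` setting would surface.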
Edit: this is my build.sbt
name := "SparkFatJar"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.0"
)
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.3.8"
assemblyJarName in assembly := "SparkFatJar.jar"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
I found this mergestrategy for Spark somewhere on the internet, but I can't find the source right now.
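I think the problem is with how you defined assemblyMergeStrategy. Your catch-all case applies MergeStrategy.first to every file outside META-INF, so only one of the reference.conf files survives in the jar. Akka keeps its default settings, including akka.version, in its own reference.conf, so that key is most likely being dropped. Try the following: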
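assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // Concatenate config files so every library's defaults (e.g. akka.version) are kept
  case "application.conf" => MergeStrategy.concat
  case "reference.conf" => MergeStrategy.concat
  case x =>
    // Fall back to sbt-assembly's default strategy for everything else
    val baseStrategy = (assemblyMergeStrategy in assembly).value
    baseStrategy(x)
}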
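To check whether the rebuilt jar really carries the Akka defaults, a small sketch like the one below can help. It assumes the Typesafe Config library (pulled in by Akka) is on the classpath. The object name `AkkaConfigCheck` and its helper are hypothetical and not part of Spark or sbt-assembly:

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical helper for inspecting the merged configuration.
object AkkaConfigCheck {
  // Returns the value of akka.version if the given config defines it.
  def akkaVersion(config: Config): Option[String] =
    if (config.hasPath("akka.version")) Some(config.getString("akka.version"))
    else None

  def main(args: Array[String]): Unit = {
    // ConfigFactory.load() reads application.conf and reference.conf
    // from the classpath, i.e. from the assembled jar when run from it.
    akkaVersion(ConfigFactory.load()) match {
      case Some(v) => println(s"akka.version is set: $v")
      case None    => println("akka.version is missing; reference.conf was probably not merged")
    }
  }
}
```

If this reports the key as missing when run against the fat jar, the reference.conf files were probably still not concatenated. Inside Knime the class loader may differ from a plain JVM, so treat the result as a hint only.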