How to submit a Spark job programmatically

Submitting the job with spark-submit works fine.

But when I try to submit it programmatically with the command below:

mvn exec:java -Dexec.mainClass="org.cybergen.SubmitJobExample" -Dexec.args="/opt/spark/current/README.md Please"

I get the following errors.

Application log:

15/05/12 17:19:46 INFO AppClient$ClientActor: Connecting to master spark://cyborg:7077...
15/05/12 17:19:46 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@cyborg:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/05/12 17:20:06 INFO AppClient$ClientActor: Connecting to master spark://cyborg:7077...
15/05/12 17:20:06 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@cyborg:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

Spark master log:

15/05/12 17:33:22 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@cyborg:7077] <- [akka.tcp://sparkDriver@10.18.26.116:49592]: Error [org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 7674242335164700840, local class serialVersionUID = 2596819202403185464] [
java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 7674242335164700840, local class serialVersionUID = 2596819202403185464
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
    at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
    at scala.util.Try$.apply(Try.scala:161)
    at akka.serialization.Serialization.deserialize(Serialization.scala:98)
    at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63)
    at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
    at scala.util.Try$.apply(Try.scala:161)
    at akka.serialization.Serialization.deserialize(Serialization.scala:98)
    at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
    at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58)
    at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58)
    at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76)
    at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
]
15/05/12 17:33:22 INFO Master: akka.tcp://sparkDriver@10.18.26.116:49592 got disassociated, removing it.
15/05/12 17:33:22 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@10.18.26.116:49592] has failed, address is now gated for [5000] ms. Reason is: [org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 7674242335164700840, local class serialVersionUID = 2596819202403185464].
15/05/12 17:33:22 INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkDriver%4010.18.26.116%3A49592-6/endpointWriter/endpointReader-akka.tcp%3A%2F%2FsparkDriver%4010.18.26.116%3A49592-0#1749840468] was not delivered. [10] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
15/05/12 17:33:22 INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40127.0.0.1%3A50366-7#-1224275483] was not delivered. [11] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
15/05/12 17:33:42 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@cyborg:7077] <- [akka.tcp://sparkDriver@10.18.26.116:49592]: Error [org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 7674242335164700840, local class serialVersionUID = 2596819202403185464] [
java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 7674242335164700840, local class serialVersionUID = 2596819202403185464
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)


SparkTestJob is the Spark job class:

import org.apache.spark.{SparkConf, SparkContext}

class SparkTestJob(val filePath: String = "", val filter: String = "") extends Serializable {
  def runWordCount(): Long = {
    val conf = new SparkConf()
      .setAppName("word count for the word " + filter)
      .setMaster("spark://cyborg:7077")
      .setJars(Seq("/tmp/spark-example-1.0-SNAPSHOT-driver.jar"))
      .setSparkHome("/opt/spark/current")
    val sc = new SparkContext(conf)
    val file = sc.textFile(filePath)
    // Count the lines that contain the filter word, then release the context.
    val count = file.filter(line => line.contains(filter)).count()
    sc.stop()
    count
  }
}


SubmitJobExample is the object that instantiates SparkTestJob and runs the job:

object SubmitJobExample {

  def main(args: Array[String]): Unit = {
    if (args.length == 2) {
      val fileName = args(0)
      val filterByWord = args(1)
      println("Reading file " + fileName + " for word " + filterByWord)
      val jobObject = new SparkTestJob(fileName, filterByWord)
      println("word count for the file " + fileName + " is " + jobObject.runWordCount())
    } else {
      // No arguments: fall back to the bundled README and the word "Please".
      val jobObject = new SparkTestJob("/opt/spark/current/README.md", "Please")
      println("word count for the file /opt/spark/current/README.md is " + jobObject.runWordCount())
    }
  }
}
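
Note that setJars in SparkTestJob expects the assembled application jar to already exist at /tmp/spark-example-1.0-SNAPSHOT-driver.jar, so the project has to be packaged and the jar copied there before running exec:java. A minimal sketch, assuming the -driver classifier comes from this project's own assembly or shade configuration:

mvn package
cp target/spark-example-1.0-SNAPSHOT-driver.jar /tmp/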


2 answers


The actual problem was that the Spark version of one of the dependencies did not match the Spark version running on the cluster. Pinning all dependencies to the same Spark version fixed the problem.

The reason it worked when submitted with spark-submit is classpath precedence: spark-submit puts the cluster's own Spark jars first, so the correct version of the Spark classes was used.
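
For illustration, a minimal pom.xml sketch of that fix, assuming (both are assumptions) the cluster runs Spark 1.3.1 on Scala 2.10; match spark.version to whatever your master actually runs:

<properties>
  <!-- Assumption: set this to the exact Spark version running on the cluster. -->
  <spark.version>1.3.1</spark.version>
</properties>

<dependencies>
  <!-- Every Spark artifact must share the same version; a mismatch changes the
       computed serialVersionUIDs and causes the InvalidClassException above. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>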

Your code looks good to me. To debug serialization issues, run the JVM with -Dsun.io.serialization.extendedDebugInfo=true. This adds extra detail to serialization errors such as NotSerializableException, so you can see exactly what was being serialized.
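
Since mvn exec:java runs the main class inside the Maven JVM, the flag can be passed directly on the mvn command line; a sketch based on the command from the question:

mvn exec:java -Dsun.io.serialization.extendedDebugInfo=true \
    -Dexec.mainClass="org.cybergen.SubmitJobExample" \
    -Dexec.args="/opt/spark/current/README.md Please"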


