Strange org.apache.spark.SparkException: Spark is aborted due to phase failure

I am trying to deploy a spark app offline. In this application, I am exploring the Naive Bayes classifier using tf-idf vectors.

I wrote an app similar to this post ( Spark MLLib TFIDF implementation for LogisticRegression ) The difference is that I take each document and also a marker and normalize it.

JavaRDD<Document> termDocsRdd = sc.parallelize(fileNameList).flatMap(new FlatMapFunction<String, Document>() {
        @Override
        public Iterable<Document> call(String fileName) 
        { 
            return Arrays.asList(parsingFunction(fileName)); 
        }        
    });

      

Thus, each copy of the document has a textField that contains the normalized text of the document as a list of strings (a list of words) and a labelField that contains the document's label as a double. parsingFunction doesn't have Spark functions like map or flatMap etc. Thus, it does not contain data distribution functions.

When I run my application in local mode - it works fine and in predictive mode the classifier classifies the test documents correctly, but when I try to run it in stanalone mode - I have some problems -

When I run the master and worker nodes on the same computer, the application works, but the predictions are worse than in local mode. When I run the master on one machine and the worker on another, the application crashes with the following error:

14/12/02 11:19:17 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:17 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 1]
14/12/02 11:19:17 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 0.0 (TID 4, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:17 INFO scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 2]
14/12/02 11:19:17 INFO scheduler.TaskSetManager: Starting task 2.1 in stage 0.0 (TID 5, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 3) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 3]
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 0.0 (TID 6, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 2.1 in stage 0.0 (TID 5) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 4]
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Starting task 2.2 in stage 0.0 (TID 7, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 1.1 in stage 0.0 (TID 4) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 5]
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Starting task 1.2 in stage 0.0 (TID 8, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 0.0 (TID 6) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 6]
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 0.0 (TID 9, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 1.2 in stage 0.0 (TID 8) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 7]
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Starting task 1.3 in stage 0.0 (TID 10, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 2.2 in stage 0.0 (TID 7) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 8]
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Starting task 2.3 in stage 0.0 (TID 11, fujitsu10.inevm.ru, PROCESS_LOCAL, 1298 bytes)
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 2.3 in stage 0.0 (TID 11) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 9]
14/12/02 11:19:18 ERROR scheduler.TaskSetManager: Task 2 in stage 0.0 failed 4 times; aborting job
14/12/02 11:19:18 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
14/12/02 11:19:18 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 1.3 in stage 0.0 (TID 10) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 10]
14/12/02 11:19:18 INFO scheduler.DAGScheduler: Failed to run reduce at RDDFunctions.scala:111
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 11, fujitsu10.inevm.ru): java.lang.NullPointerException: 
    maven.maven1.App$3.call(App.java:178)
    maven.maven1.App$3.call(App.java:1)
    org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:923)
    scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:236)
    org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
    org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/12/02 11:19:18 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 0.0 (TID 9) on executor fujitsu10.inevm.ru: java.lang.NullPointerException (null) [duplicate 11]
14/12/02 11:19:18 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 

      

In the logs I found:

14/12/02 11:19:20 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@fujitsu11:7077] -> [akka.tcp://sparkDriver@fujitsu11.inevm.ru:54481]: Error [Association failed with         [akka.tcp://sparkDriver@fujitsu11.inevm.ru:54481]] [
akka.remote.EndpointAssociationException: Association failed with     [akka.tcp://sparkDriver@fujitsu11.inevm.ru:54481]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection     refused: no further information: fujitsu11.inevm.ru/192.168.3.5:54481
]

      

I am debugging the application and found that it crashes after this code:

IDFModel idfModel = new IDF().fit(hashedData);

      

Can anyone know what's going on?

Thank.

PS I am using Spark 1.1.0 for Windows 7 64-bit. Both machines have 8 cores and 16GB of RAM.

+3


source to share





All Articles