Spark: how do I connect to the master, or resolve the error "WARN TaskSchedulerImpl: Initial job has not accepted any resources"?

Please tell me how to solve the following problem.

First, I have confirmed that the code below runs when the master is set to "local".

Then I started two EC2 instances (m1.large). However, when I set the master to "spark://MASTER_PUBLIC_DNS:7077", the "TaskSchedulerImpl" warning below appears and the job does not run.

Even when I change the master from the valid DNS to an invalid one (spark://INVALID_DNS:7077), the same message appears.

Namely: "WARN TaskSchedulerImpl: The initial task did not accept any resources, check your cluster UI to make sure the workers are registered and have sufficient memory."

It seems to be a resource problem. As suggested in that comment, I assigned 12 GB of memory to this cluster, but it still fails.

#!/usr/bin/env python                                                                                     
# -*- coding: utf-8 -*- 
from pyspark import SparkContext, SparkConf 
from pyspark.mllib.classification import LogisticRegressionWithSGD 
from pyspark.mllib.regression import LabeledPoint 
from numpy import array 

# Load and parse the data 
def parsePoint(line): 
  values = [float(x) for x in line.split(' ')] 
  return LabeledPoint(values[0], values[1:]) 
appName = "testsparkapp" 
master = "spark://MASTER_PUBLIC_DNS:7077" 
#master = "local" 


conf = SparkConf().setAppName(appName).setMaster(master) 
sc = SparkContext(conf=conf) 

data = sc.textFile("/root/spark/mllib/data/sample_svm_data.txt") 
parsedData = data.map(parsePoint) 

# Build the model 
model = LogisticRegressionWithSGD.train(parsedData) 

# Evaluating the model on training data 
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features))) 
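# Training error = fraction of points whose prediction differs from the true label 
# (note: the tuple-unpacking lambda below is Python 2 syntax) 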
trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsedData.count()) 
print("Training Error = " + str(trainErr))     


Additional information

I tried three things that a friend advised me to do.

1. I opened the master port, 7077.

2. In the master URL, I used the hostname, not the IP address.

-> With that, I was able to connect to the master (I verified it with the cluster UI).

3. I tried to set the executor memory and worker max heap size as below, but it still fails (see the PySpark sketch after this list).

SparkConf().set("spark.executor.memory", "4g").set("worker_max_heapsize", "2g")

The worker shows that 6.3 GB are usable (I checked this in the UI). The instance type is m1.large.

-> I then found a warning in my driver log and an error in the worker's stderr.
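(For reference, in PySpark terms I believe the memory setting from point 3 corresponds roughly to the sketch below. "spark.executor.memory" is a standard Spark property; I am not sure "worker_max_heapsize" is one, so I have left it out here.)

from pyspark import SparkConf, SparkContext 

# Minimal sketch of setting the executor memory in the same driver code as above 
conf = (SparkConf() 
        .setAppName("testsparkapp") 
        .setMaster("spark://MASTER_PUBLIC_DNS:7077")  # placeholder, as above 
        .set("spark.executor.memory", "4g")) 
sc = SparkContext(conf=conf) 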

My driver log:

14/08/08 06:11:59 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory


Worker stderr:

14/08/08 06:14:04 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@PRIVATE_HOST_NAME1:52011/user/Worker
14/08/08 06:15:07 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@PRIVATE_HOST_NAME1:52201] -> [akka.tcp://spark@PRIVATE_HOST_NAME2:38286] disassociated! Shutting down.




1 answer


The spark-ec2 script configures the Spark cluster on EC2 in standalone mode, which means it cannot handle remote job submissions. I was struggling with the same error you describe for days before finding out that this is not supported. The error message is unfortunately misleading.



So you need to copy your files to the master node, log in to it, and run your Spark job from there.
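For example, something along these lines (a rough sketch only; the /root/spark paths and the use of spark-submit assume the default spark-ec2 layout, and MASTER_PUBLIC_DNS is the same placeholder as in your question). After copying the script and data to the master (e.g. with scp) and logging in, you would start the driver there:

# run_on_master.py -- minimal sketch, meant to be run on the master node itself, 
# e.g. with /root/spark/bin/spark-submit run_on_master.py (path assumed, not verified) 
from pyspark import SparkConf, SparkContext 

conf = (SparkConf() 
        .setAppName("testsparkapp") 
        .setMaster("spark://MASTER_PUBLIC_DNS:7077")  # placeholder master URL 
        .set("spark.executor.memory", "4g")) 
sc = SparkContext(conf=conf) 

# The input must be readable by the workers as well (local path on every node, or HDFS/S3) 
data = sc.textFile("/root/spark/mllib/data/sample_svm_data.txt") 
print(data.count()) 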
