CDH-5.4.0, Spark Yarn, Cluster Mode and Java

I have a CDH-5.4.0 cluster with 4 nodes, running Spark on YARN.
I have an environment variable YARN_CONF_DIR pointing to a directory that contains a copy of the config files taken from one of the cluster members (which is where the address of the YARN ResourceManager comes from).
I want to run Spark jobs from Java:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf sparkConf = new SparkConf()
        .setMaster("yarn-cluster")   // "yarn-cluster" or "yarn-client"
        .setAppName("SparX");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String path = "hdfs://virtual-machine-12.local:8020/mockRecords.csv";
JavaRDD<String> textFile = sc.textFile(path);
System.out.println(textFile.count());
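Before building the context, it can help to confirm that the configuration directory is actually visible to the JVM that launches the job. A minimal sanity-check sketch (the class name ConfCheck and the helper are my own, not part of Spark):

```java
// Sketch: verify that YARN_CONF_DIR / HADOOP_CONF_DIR are visible to this JVM
// before creating a JavaSparkContext against a YARN master.
public class ConfCheck {

    // True if at least one of the two config-dir values is set and non-empty.
    static boolean hasClusterConfig(String yarnConfDir, String hadoopConfDir) {
        return (yarnConfDir != null && !yarnConfDir.isEmpty())
            || (hadoopConfDir != null && !hadoopConfDir.isEmpty());
    }

    public static void main(String[] args) {
        String yarn = System.getenv("YARN_CONF_DIR");
        String hadoop = System.getenv("HADOOP_CONF_DIR");
        if (!hasClusterConfig(yarn, hadoop)) {
            System.err.println("Neither YARN_CONF_DIR nor HADOOP_CONF_DIR is set; "
                    + "Spark cannot locate the ResourceManager.");
        }
    }
}
```

Note that these variables must be set in the environment of the process that launches the job (e.g. your IDE run configuration), not just in a login shell.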

      

If I run the program in yarn-cluster mode, I get a NullPointerException:

at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:580)

If I run it in yarn-client mode, the code hangs after new JavaSparkContext(sparkConf).

Any idea what I'm missing?
Thanks!



1 answer


Make sure HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory that contains the (client-side) configuration files for the Hadoop cluster.



For your CDH-5.4.0 cluster, you can download the client configuration from Cluster / yarn / Actions / Download Client Configuration, unzip it, and point HADOOP_CONF_DIR or YARN_CONF_DIR at the resulting directory.
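The unzip-and-point step above can be sketched as follows (the path /opt/cdh-client-conf is a hypothetical example; use wherever you actually unpacked the archive):

```shell
# Unpack the client configuration downloaded from Cloudera Manager
# into a hypothetical location, then point both variables at it.
export HADOOP_CONF_DIR=/opt/cdh-client-conf
export YARN_CONF_DIR=/opt/cdh-client-conf
```

Set these in the environment of whatever launches your Java program so that the JVM inherits them.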







