CDH-5.4.0, Spark Yarn, Cluster Mode and Java
I have a CDH-5.4.0 cluster with 4 nodes, running Spark on YARN.
I have an environment variable YARN_CONF_DIR pointing to a directory that contains a copy of the configuration files taken from one of the cluster members (the ones that give the address of the YARN ResourceManager).
I want to run Spark jobs from Java:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf sparkConf = new SparkConf()
        .setMaster("yarn-cluster") // "yarn-cluster" or "yarn-client"
        .setAppName("SparX");
JavaSparkContext sc = new JavaSparkContext(sparkConf);

String path = "hdfs://virtual-machine-12.local:8020/mockRecords.csv";
JavaRDD<String> textFile = sc.textFile(path);
System.out.println(textFile.count());
If I run the program in yarn-cluster mode, I get a NullPointerException:

at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:580)
If I run the program in yarn-client mode, the code hangs after new JavaSparkContext(sparkConf).
Any idea what I'm missing?
Thanks!
Make sure HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory that contains the (client-side) configuration files for the Hadoop cluster.
For your CDH-5.4.0 cluster, you can download the configuration via Cluster → yarn → Actions → Download Client Configuration, unzip it, and point HADOOP_CONF_DIR or YARN_CONF_DIR at the resulting directory.