Spark fails to save in hadoop (permission denied for user)

Question

I am creating a Spark app that counts the words in a file. I run the app on the Cloudera QuickStart VM. Everything works when I use the cloudera user's directory, but when I try to read or write in another user's directory, Hadoop denies permission. How can I change the Hadoop user that Spark runs as?

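package user1.item1

import user1.Article
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.SparkContext._ // implicits that provide reduceByKey on pair RDDs (needed before Spark 1.3)
import scala.util.{Try, Success, Failure}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Attempt to switch the user; Context is assumed to be defined elsewhere in this project.
    Context.User = "espacechange"
    // Input lives under /user/user1, outside the cloudera user's home directory.
    val filename = "hdfs://quickstart.cloudera:8020/user/user1/test/wiki_test/wikipedia.txt"
    val conf = new SparkConf().setAppName("word count")
    val sc = new SparkContext(conf)
    // Parse each line into an Article, then count the occurrences of each word.
    val wikipedia = sc.textFile(filename).map(Article.parseWikipediaArticle)
    val counts = wikipedia.flatMap(line => line.text.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    // Output also goes under /user/user1, so it needs write permission there.
    counts.saveAsTextFile("hdfs://quickstart.cloudera:8020/user/user1/test/word_count")
  }
}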

+3

scala apache-spark cloudera-cdh

scauglog Dec 11 '14 at 15:55

2 answers

It depends on how your cluster authenticates users. With the default (simple, non-Kerberos) authentication, you can set the following environment variable:

$ export HADOOP_USER_NAME=hdfs

Run the export above before calling spark-submit.
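
If you would rather not export the variable for the whole shell session, a small wrapper can set it for a single command only. This is a minimal sketch, assuming simple (non-Kerberos) authentication; the function name run_as_hadoop_user is hypothetical and not part of Hadoop or Spark.

```shell
# Hypothetical helper: run one command with HADOOP_USER_NAME set,
# without changing the calling shell's environment.
# Usage: run_as_hadoop_user <hadoop-user> <command> [args...]
run_as_hadoop_user() (
  # The parentheses run the body in a subshell, so nothing leaks out.
  hadoop_user="$1"
  shift
  HADOOP_USER_NAME="$hadoop_user" "$@"
)
```

For example, run_as_hadoop_user hdfs spark-submit .... should submit the app as hdfs while the rest of the session keeps its current user.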

+8

gonbe Dec 12 '14 at 12:40 am

Nicola ferraro · Accepted Answer · 2014-12-11T23:12:46+0000

You need to run the spark-submit script as a different OS user.

For example, the following command runs the Spark app as the hdfs user, and therefore with that user's HDFS permissions:

sudo -u hdfs spark-submit ....
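
Switching the OS user works because, as far as I know, a Hadoop client under simple (non-Kerberos) authentication uses HADOOP_USER_NAME if it is set and otherwise falls back to the OS login name. The sketch below mirrors that lookup so you can check which user HDFS is likely to see; effective_hadoop_user is a hypothetical helper, not a Hadoop command, and it ignores Kerberos and the Java system property of the same name.

```shell
# Hypothetical helper: print the user name a Hadoop client would most likely
# act as under simple authentication.
effective_hadoop_user() {
  if [ -n "${HADOOP_USER_NAME:-}" ]; then
    # An explicit override takes precedence over the OS user.
    printf '%s\n' "$HADOOP_USER_NAME"
  else
    # Otherwise fall back to the name of the current OS user.
    id -un
  fi
}
```

Calling effective_hadoop_user in the same shell (or under the same sudo -u) as spark-submit is a quick sanity check before submitting the job.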
