Error reading file in Spark

I'm having a hard time figuring out why Spark doesn't have access to a file I'm adding to the context. Below is my code in the REPL:

scala> sc.addFile("/home/ubuntu/my_demo/src/main/resources/feature_matrix.json")

scala> val featureFile = sc.textFile(SparkFiles.get("feature_matrix.json"))

featureFile: org.apache.spark.rdd.RDD[String] = /tmp/spark/ubuntu/spark-d7a13d92-2923-4a04-a9a5-ad93b3650167/feature_matrix.json MappedRDD[1] at textFile at <console>:60

scala> featureFile.first()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: cfs://172.30.26.95/tmp/spark/ubuntu/spark-d7a13d92-2923-4a04-a9a5-ad93b3650167/feature_matrix.json


The file does exist in /tmp/spark/ubuntu/spark-d7a13d92-2923-4a04-a9a5-ad93b3650167/feature_matrix.json

Any help would be appreciated.


1 answer


If you are using addFile, then you need to use SparkFiles.get to retrieve it. Also, addFile is lazy on the worker side: the file is only fetched to a node when a task actually runs there, so it is quite possible it has not been placed where you are looking until you actually call first(), which creates a bit of a circle here. Note as well that textFile resolves a bare path against the cluster's default filesystem (cfs:// in your case) rather than the local disk, which is why the read fails even though the file exists locally on the driver.
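As a rough sketch of one workaround (assuming the file is small enough to read line by line), you can resolve the path with SparkFiles.get inside a task, so it points at the node-local copy instead of being handed to textFile:

import org.apache.spark.SparkFiles
import scala.io.Source

sc.addFile("/home/ubuntu/my_demo/src/main/resources/feature_matrix.json")

// Resolve the node-local path inside the task, not on the driver,
// and read the file with plain Scala I/O instead of textFile.
val featureLines = sc.parallelize(Seq(0)).flatMap { _ =>
  Source.fromFile(SparkFiles.get("feature_matrix.json")).getLines().toList
}

featureLines.first()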



All that said, I don't know that reading a SparkFiles path as your very first action is ever going to be a smart idea. Use something like --files with spark-submit and the files will be placed in your working directory.
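For example (the class and jar names below are hypothetical):

spark-submit \
  --class com.example.MyDemo \
  --files /home/ubuntu/my_demo/src/main/resources/feature_matrix.json \
  my_demo.jar

The file should then be available by its bare name, feature_matrix.json, in the working directory of the executors, or via SparkFiles.get("feature_matrix.json").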
