Spark saveAsNewAPIHadoopFile works in local mode but not in cluster mode

After upgrading to CDH 5.4 and Spark Streaming 1.3, I ran into a strange issue where saveAsNewAPIHadoopFile no longer saves files to HDFS as expected. I can see that a _temporary directory is created, but when the save finishes, _temporary is removed and the output directory is left empty except for a _SUCCESS file. My impression is that the part files are written, but they are never moved out of _temporary before it is deleted.

This issue only occurs when the job runs on the Spark cluster. If I run the same job with a local master, the files are saved as expected.

Some help would be appreciated.





1 answer


Are you running this on your laptop / desktop?



One way this can happen is if the path you are using for your output is a relative path on NFS. In that case, Spark resolves relative paths against hdfs:// rather than file://, so the output never lands on the local disk.
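To make the resolution explicit, here is a minimal sketch (plain Python, no Spark required) of how an unqualified output path gets resolved against the cluster's default filesystem. The default-filesystem URI and home-directory prefix are hypothetical placeholders; the point is that only a path with an explicit scheme such as file:// escapes HDFS resolution:

```python
from urllib.parse import urlparse

def qualify_output_path(path, default_fs="hdfs://namenode:8020"):
    """Return a fully-qualified URI for a Hadoop output path.

    Paths without an explicit scheme are resolved against the
    cluster's default filesystem (fs.defaultFS), which is why a
    relative NFS path works locally but seems to vanish on a cluster.
    default_fs and the home-directory prefix below are illustrative.
    """
    if urlparse(path).scheme:            # already qualified, e.g. file:// or hdfs://
        return path
    if not path.startswith("/"):
        path = "/user/spark/" + path     # hypothetical HDFS home directory
    return default_fs + path

# An unqualified relative path silently resolves to HDFS:
print(qualify_output_path("output/run1"))
# → hdfs://namenode:8020/user/spark/output/run1

# To force the local/NFS filesystem, qualify the path explicitly:
print(qualify_output_path("file:///mnt/nfs/output/run1"))
# → file:///mnt/nfs/output/run1
```

The fully-qualified URI is what you would then pass as the path argument to saveAsNewAPIHadoopFile, so the commit step and the final rename happen on the filesystem you actually intended.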







