BlockMissingException while processing data from HDFS in a Spark standalone cluster
I set up Spark on a Hadoop cluster with 2 workers and 2 datanodes. The first machine hosts: spark master, namenode, worker-1, datanode-1. The second machine hosts: worker-2, datanode-2.
The Hadoop cluster has 2 files in the /usr directory: NOTICE.txt on datanode-1 and README.txt on datanode-2. I want to create an RDD from these two files and do a line count.
On the first machine I started the spark shell with the master set to spark://masterIP:7077.
Then I created an RDD in the Scala shell with `val rdd = sc.textFile("/usr/")`, but when I ran the count action `rdd.count()`, it threw the following error:
(TID 2, masterIP, executor 1): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1313298757-masterIP-1499412323227:blk_1073741827_1003 file=/usr/README.txt
worker-1 picks up NOTICE.txt, but worker-2 does not pick up README.txt.
I can't figure out what the problem is; any help would be appreciated, thanks.