BlockMissingException while processing data from HDFS in a Spark standalone cluster

I set up Spark on a Hadoop cluster with 2 workers and 2 datanodes. The first machine hosts the Spark master, namenode, worker-1, and datanode-1. The second machine hosts worker-2 and datanode-2.

The Hadoop cluster has 2 files in the /usr directory: NOTICE.txt on datanode-1 and README.txt on datanode-2.

I want to create an RDD from these two files and do a line count.

On the first machine I started the Spark shell against the standalone master spark://masterIP:7077.

Then I created an RDD in the Scala shell with val rdd = sc.textFile("/usr/"), but when I ran the count operation rdd.count() it threw the following error:

(TID 2, masterIP, executor 1): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1313298757-masterIP-1499412323227:blk_1073741827_1003 file=/usr/README.txt
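
For completeness, the exact spark-shell steps are sketched below. The bare /usr/ path is what I actually typed; the explicit hdfs:// form in the comment is only a guess at what it resolves to, since the real namenode host and port come from fs.defaultFS on my cluster.

    // Run inside spark-shell, which was started against spark://masterIP:7077.
    // The bare path resolves against fs.defaultFS; spelled out it would be
    // something like hdfs://<namenode-host>:8020/usr/ (host and port are
    // placeholders, not my actual values).
    val rdd = sc.textFile("/usr/")

    // The count action is what triggers reading the HDFS blocks and fails
    // with the BlockMissingException above for /usr/README.txt.
    rdd.count()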

      

worker-1 picks up NOTICE.txt, but worker-2 does not pick up README.txt.
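
To see which datanodes the namenode reports for the failing file, I can run this from the same spark-shell; it is only a diagnostic sketch using the Hadoop FileSystem API that Spark already has on its classpath:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Ask the namenode which hosts hold the blocks of the file that fails.
    // /usr/README.txt is the path from the error above.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val status = fs.getFileStatus(new Path("/usr/README.txt"))
    val blocks = fs.getFileBlockLocations(status, 0, status.getLen)
    blocks.foreach(b => println(s"offset ${b.getOffset}: hosts ${b.getHosts.mkString(", ")}"))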

I can't figure out what the problem is; any help would be appreciated, thanks.
