"No space left on device" error with Amazon EMR instances and S3

I am running a MapReduce job on Amazon EMR that generates 40 output files of roughly 130 MB each. The 9 most recently failed tasks all died with the "No space left on device" exception shown below. Is this a matter of cluster configuration? The job completes without problems with fewer input files, fewer output files, and fewer reducers. Any help would be much appreciated. Thank you! Full stack trace:

Error: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.write(MultipartUploadOutputStream.java:135)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:60)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:83)
at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:105)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)


EDIT

I made some further attempts, but unfortunately I am still getting errors. I thought I might be running out of disk space on my instances because of the replication factor mentioned in the comments below, so I tried large instances instead of the medium ones I had experimented with so far. But this time I got a different exception:

Error: java.io.IOException: Error closing multipart upload
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadMultiParts(MultipartUploadOutputStream.java:207)
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.close(MultipartUploadOutputStream.java:222)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.util.concurrent.ExecutionException: com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received. (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; 
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188) 


As a result, only about 70% of the expected output files are produced; the remaining reduce tasks never complete. I also tried uploading a large file to my S3 bucket, in case there was not enough space there, but that does not seem to be the problem.

I am using the AWS Elastic MapReduce service. Any ideas?

2 answers


The error means there is no room left to store the output (or temporary output) of your MapReduce job.

Some things to check:



  • Have you removed unnecessary files from HDFS? Run hadoop dfs -ls / to check the files stored on HDFS. (If you are using the trash feature, make sure you empty it too.)
  • Are you using compression to store the output (or temporary output) of your jobs? You can do this by setting the output format to SequenceFileOutputFormat, or by calling setCompressMapOutput(true) to compress the intermediate map output; a sketch covering this and the replication setting follows this list.

  • What is the replication factor? It defaults to 3, but if you are short on space you can risk lowering it to 2 or 1 to get your program running.
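
For what it's worth, here is a minimal sketch of how the compression and replication suggestions above could be applied with the Hadoop 2 (mapreduce) API; the class name, job name, and codec choice are placeholders of mine, not something taken from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SpaceSavingSettingsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate map output (new-API equivalent of setCompressMapOutput(true)).
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Lower HDFS replication from the default of 3 to save space.
        conf.set("dfs.replication", "2");

        Job job = Job.getInstance(conf, "space-saving-sketch"); // placeholder job name
        // Compress the final job output as well; GzipCodec is just one possible codec.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        // ... mapper, reducer, input/output paths, etc. would be configured here as usual ...
    }
}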

The problem might be that some of your reducers are producing significantly more data than others, so check your code as well.
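If you want to check whether that skew is happening, one rough way to make it visible (an illustration of mine, not part of the original answer; the class, counter group, and counter names are made up) is to bump a per-reduce-task counter and compare the values on the job's counters page:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Rough sketch: count the records handled by each reduce task under a custom
// counter group so that uneven partitions show up in the job counters.
public class SkewCheckReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String counterName = "records-in-reduce-"
                + context.getTaskAttemptID().getTaskID().getId();
        for (Text value : values) {
            context.getCounter("skew-check", counterName).increment(1);
            context.write(key, value); // pass values through unchanged
        }
    }
}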



I got out-of-space errors on AMI 3.2.x, whereas I did not on AMI 3.1.x. Switch the AMI version and see what happens.
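
For reference, a minimal sketch of pinning the AMI version when launching a cluster with the AWS SDK for Java (v1); the cluster name, instance types, instance count, and role/credential setup are placeholders I am assuming, not details from the question:

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

public class LaunchWithPinnedAmi {
    public static void main(String[] args) {
        // Uses the default credential chain; IAM roles and regions omitted for brevity.
        AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient();
        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("ami-test-cluster")            // placeholder cluster name
                .withAmiVersion("3.1.4")                 // pin an AMI 3.1.x release
                .withInstances(new JobFlowInstancesConfig()
                        .withMasterInstanceType("m1.large")   // placeholder instance types
                        .withSlaveInstanceType("m1.large")
                        .withInstanceCount(5)
                        .withKeepJobFlowAliveWhenNoSteps(false));
        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("Started job flow " + result.getJobFlowId());
    }
}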


