"No space left on device" error with Amazon EMR instances and S3

I am running a MapReduce job on Amazon EMR that generates 40 output files of roughly 130 MB each. The 9 most recently failed tasks all died with the "No space left on device" exception shown below. Is this a matter of cluster configuration? The job completes without problems with fewer input files, fewer output files, and fewer reducers. Any help would be much appreciated. Thank you! Full stack trace:

Error: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.write(MultipartUploadOutputStream.java:135)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:60)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:83)
at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:105)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)


EDIT

I made some further attempts, but unfortunately I am still getting errors. I thought I might be running out of disk space on my instances because of the replication factor mentioned in the comments below, so I tried large instances instead of the medium ones I had experimented with so far. But this time I got a different exception:

Error: java.io.IOException: Error closing multipart upload
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadMultiParts(MultipartUploadOutputStream.java:207)
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.close(MultipartUploadOutputStream.java:222)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.util.concurrent.ExecutionException: com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received. (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; 
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188) 


As a result, only about 70% of the expected output files are produced; the remaining reduce tasks never complete. I also tried uploading a large file to my S3 bucket, in case there was not enough space there, but that does not seem to be the problem.

I am using the AWS Elastic MapReduce service. Any ideas?

2 answers


The error means there is no room left to store the output (or temporary output) of your MapReduce job.

Some things to check:



  • Have you removed unnecessary files from HDFS? Run hadoop dfs -ls / to check the files stored on HDFS. (If you are using the trash feature, make sure you empty it too.)
  • Are you using compression to store the output (or temporary output) of your jobs? You can do this by setting the output format to SequenceFileOutputFormat, or by calling setCompressMapOutput(true) to compress the intermediate map output; a sketch covering this and the replication setting follows this list.

  • What is the replication factor? It defaults to 3, but if you are short on space you can risk lowering it to 2 or 1 to get your program running.
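
For what it's worth, here is a minimal sketch of how the compression and replication suggestions above could be applied with the Hadoop 2 (mapreduce) API; the class name, job name, and codec choice are placeholders of mine, not something taken from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SpaceSavingSettingsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate map output (new-API equivalent of setCompressMapOutput(true)).
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Lower HDFS replication from the default of 3 to save space.
        conf.set("dfs.replication", "2");

        Job job = Job.getInstance(conf, "space-saving-sketch"); // placeholder job name
        // Compress the final job output as well; GzipCodec is just one possible codec.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        // ... mapper, reducer, input/output paths, etc. would be configured here as usual ...
    }
}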

The problem might be that some of your reducers are producing significantly more data than others, so check your code as well.
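If you want to check whether that skew is happening, one rough way to make it visible (an illustration of mine, not part of the original answer; the class, counter group, and counter names are made up) is to bump a per-reduce-task counter and compare the values on the job's counters page:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Rough sketch: count the records handled by each reduce task under a custom
// counter group so that uneven partitions show up in the job counters.
public class SkewCheckReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String counterName = "records-in-reduce-"
                + context.getTaskAttemptID().getTaskID().getId();
        for (Text value : values) {
            context.getCounter("skew-check", counterName).increment(1);
            context.write(key, value); // pass values through unchanged
        }
    }
}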



I got out-of-space errors on AMI 3.2.x, whereas I did not on AMI 3.1.x. Switch the AMI version and see what happens.
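
For reference, a minimal sketch of pinning the AMI version when launching a cluster with the AWS SDK for Java (v1); the cluster name, instance types, instance count, and role/credential setup are placeholders I am assuming, not details from the question:

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

public class LaunchWithPinnedAmi {
    public static void main(String[] args) {
        // Uses the default credential chain; IAM roles and regions omitted for brevity.
        AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient();
        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("ami-test-cluster")            // placeholder cluster name
                .withAmiVersion("3.1.4")                 // pin an AMI 3.1.x release
                .withInstances(new JobFlowInstancesConfig()
                        .withMasterInstanceType("m1.large")   // placeholder instance types
                        .withSlaveInstanceType("m1.large")
                        .withInstanceCount(5)
                        .withKeepJobFlowAliveWhenNoSteps(false));
        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("Started job flow " + result.getJobFlowId());
    }
}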


