CDH5.2: MR, unable to initialize any output collector
Cloudera CDH 5.2 QuickStart VM; Cloudera Manager shows all nodes in state GREEN.
I got it building in Eclipse with MR, including all the relevant Cloudera jars in the build path: avro-1.7.6-cdh5.2.0.jar, avro-mapred-1.7.6-cdh5.2.0-hadoop2.jar, hadoop-common-2.5.0-cdh5.2.0.jar, hadoop-mapreduce-client-core-2.5.0-cdh5.2.0.jar
I ran the following job:
hadoop jar jproject1.jar avro00.AvroUserPrefCount -libjars ${LIBJARS} avro/00/in avro/00/out
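For reference, here is a minimal sketch of how ${LIBJARS} is typically assembled for a job like this. The jar paths below are hypothetical, not taken from the question; substitute your actual locations on the QuickStart VM.

```shell
# Hypothetical jar locations -- substitute your own paths.
A1=/usr/lib/avro/avro-1.7.6-cdh5.2.0.jar
A2=/usr/lib/avro/avro-mapred-1.7.6-cdh5.2.0-hadoop2.jar

# -libjars expects a comma-separated list (these jars are shipped with
# the job to the cluster), while the local client classpath uses colons.
export LIBJARS=$A1,$A2
export HADOOP_CLASSPATH=$A1:$A2

echo "$LIBJARS"
```

Note the separator difference: a colon in LIBJARS (a common mistake) makes the cluster-side classpath silently wrong.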
I am getting the following error. Is this a Java heap issue? Any comments? Thank you in advance.
14/11/14 01:02:40 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
14/11/14 01:02:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/14 01:02:43 INFO mapreduce.JobSubmitter: number of splits:1
14/11/14 01:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415950730849_0001
14/11/14 01:02:45 INFO impl.YarnClientImpl: Submitted application application_1415950730849_0001
14/11/14 01:02:45 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1415950730849_0001/
14/11/14 01:02:45 INFO mapreduce.Job: Running job: job_1415950730849_0001
14/11/14 01:03:04 INFO mapreduce.Job: Job job_1415950730849_0001 running in uber mode : false
14/11/14 01:03:04 INFO mapreduce.Job: map 0% reduce 0%
14/11/14 01:03:11 INFO mapreduce.Job: Task Id : attempt_1415950730849_0001_m_000000_0, Status : FAILED
Error: java.io.IOException: Unable to initialize any output collector
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
...
...
Checking the complete task log of the failed attempt attempt_1415950730849_0001_m_000000_0 will tell you why you hit this exception.
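As one way to pull up that log (a sketch using the application and attempt IDs from the output above; these commands run against the cluster, so adjust to your environment):

```shell
# Fetch the aggregated logs for the whole application (requires YARN
# log aggregation to be enabled; otherwise use the ResourceManager UI
# at http://quickstart.cloudera:8088/).
yarn logs -applicationId application_1415950730849_0001 | less

# Or pull the logs of the specific failed attempt:
mapred job -logs job_1415950730849_0001 attempt_1415950730849_0001_m_000000_0
```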
The most common reason for this error is a misconfigured io.sort.mb value in your job. Its value should never be anywhere close to (or higher than) the map task's heap size, and should never exceed ~2000 MB (the maximum Java array size).
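As a rough sanity check, the constraint can be sketched like this (the numbers below are made up for illustration, not CDH defaults; read your actual values from mapred-site.xml or Cloudera Manager):

```shell
# Hypothetical values for illustration.
SORT_MB=100       # mapreduce.task.io.sort.mb (io.sort.mb in old configs)
MAP_HEAP_MB=820   # the -Xmx value from mapreduce.map.java.opts, in MB

# The sort buffer must fit well inside the map JVM heap, and can never
# exceed ~2000 MB because it is backed by a single Java array.
if [ "$SORT_MB" -ge "$MAP_HEAP_MB" ] || [ "$SORT_MB" -gt 2000 ]; then
  STATUS="misconfigured"
else
  STATUS="ok"
fi
echo "io.sort.mb is $STATUS"
```

If your driver uses GenericOptionsParser (e.g. implements Tool), you can also override the value per job without editing config files, e.g. `hadoop jar jproject1.jar avro00.AvroUserPrefCount -D mapreduce.task.io.sort.mb=100 ...`.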
An improvement that makes the above error more descriptive of the true underlying failure has also been reported and resolved recently via MAPREDUCE-6194.
I ran into the same issue yesterday. I checked the syslog for the specific map task that failed and found that another exception thrown in that task was causing this error. In my case it was a parsing error, and once I fixed it, this error went away.
Taking a closer look at the failed task log should give you the root cause of the problem.