Writing output from the Hadoop combiner

In Hadoop 2.2.0, I have several values that I would like the combiner to write out directly, rather than sending them on to the reducer.

I tried using MultipleOutputs to write to a specific file from the combiner. Unfortunately, I get an exception when multiple combiners are created for the same mapper, because they all try to access the same file.

Is there a way to create a different output file for each combiner?

Simplified code:

    public class Combiner1 extends Reducer {

        String FILENAME = "tmp_round1_comb.txt";

        protected MultipleOutputs mos;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);

            String s = String.format("Combiner-%05d-%d",
                    context.getTaskAttemptID().getTaskID().getId(),
                    context.getTaskAttemptID().getId());
            LOG.info(s);

            mos = new MultipleOutputs(context);
        }

        @Override
        protected void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException {

            if (specialCase) {
                ...
                mos.write(out1, out2, FILENAME);

            } else {
                ...
                context.write(key, out);

            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }
    }
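What I have in mind would be something like the sketch below (plain Java, just the naming logic; this is my own guess at a scheme, not a verified fix). The two ids would come from `context.getTaskAttemptID()` in `setup()`, the same ones I already log there:

```java
// Hypothetical helper: derive a per-combiner base path from the task id and
// attempt id, so parallel combiner instances never collide on one file name.
public class UniqueCombinerName {

    // taskId mirrors context.getTaskAttemptID().getTaskID().getId(),
    // attemptId mirrors context.getTaskAttemptID().getId()
    static String uniqueName(String base, int taskId, int attemptId) {
        return String.format("%s-%05d-%d", base, taskId, attemptId);
    }

    public static void main(String[] args) {
        // e.g. task 3, attempt 1 -> "tmp_round1_comb-00003-1"
        System.out.println(uniqueName("tmp_round1_comb", 3, 1));
    }
}
```

The resulting string would be passed as the `baseOutputPath` argument to `mos.write(...)` instead of the fixed `FILENAME`, so each combiner instance gets its own file. Whether MultipleOutputs then commits such files cleanly across speculative or re-run attempts is exactly what I am unsure about.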

An exception:

    2014-10-13 14:53:00,045 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
    2014-10-13 14:53:00,045 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@61f2bf35
    java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1535)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1444)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
    Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: failed to create file /user/cloudera/scratch/Profiling_q4/20141013_145034/tmp/TMP_0_Out1/_temporary/1/_temporary/attempt_1412109710756_0057_m_000000_.../tmp_round1_comb.txt-m-00000 on client 127.0.0.1 because the file exists
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2307)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2235)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1603)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1461)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1386)
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
        at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:132)
        at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
        at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:457)
        at mapreduce.guardedfragment.executor.hadoop.combiners.GFCombiner1.reduce(GFCombiner1.java:139)
        at mapreduce.guardedfragment.executor.hadoop.combiners.GFCombiner1.reduce(GFCombiner1.java:31)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1645)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1505)
    Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): failed to create file /user/cloudera/scratch/Profiling_q4/20141013_145034/tmp/TMP_0_Out1/_temporary/1/_temporary/attempt_.../tmp_round1_comb.txt-m-00000 on client 127.0.0.1 because the file exists
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2307)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2235)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

        at org.apache.hadoop.ipc.Client.call(Client.java:1409)
        at org.apache.hadoop.ipc.Client.call(Client.java:1362)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy10.create(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy10.create(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
        at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1599)
        ... 20 more
    2014-10-13 14:53:00,048 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: Spill failed
    2014-10-13 14:53:00,049 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child: java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1535)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$300(MapTask.java:853)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.io.Text.write(Text.java:324)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1126)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at mapreduce.guardedfragment.executor.hadoop.mappers.GFMapper1Guard.map(GFMapper1Guard.java:98)
        at mapreduce.guardedfragment.executor.hadoop.mappers.GFMapper1Guard.map(GFMapper1Guard.java:37)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:55)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
    Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: ... (same FileAlreadyExistsException cause chain as in the first trace above)

    2014-10-13 14:53:00,065 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task

