Writing output from a Hadoop combiner
In Hadoop 2.2.0, I have values that can be output directly by the combiner rather than being sent on to the reducer.
I tried using MultipleOutputs to write to a specific file from the combiner. Unfortunately, I get an exception when multiple combiners are created for the same mapper, because they all try to create the same file.
Is there a way to create a different output file for each combiner instance?
Simplified code:
public class Combiner1 extends Reducer<Text, Text, Text, Text> {

    private static final String FILENAME = "tmp_round1_comb.txt";
    protected MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        String s = String.format("Combiner-%05d-%d",
                context.getTaskAttemptID().getTaskID().getId(),
                context.getTaskAttemptID().getId());
        LOG.info(s);
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        if (specialCase) {
            ...
            mos.write(out1, out2, FILENAME);
        } else {
            ...
            context.write(key, out);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}
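For context, the per-instance naming I am after can be derived from the task and attempt IDs, the same values the setup() above already formats (note the format string must be "%05d", with no space after the percent sign). Below is a minimal standalone sketch of that naming scheme, without the Hadoop dependencies; the class and method names are hypothetical, and in a real job the two ints would come from context.getTaskAttemptID().getTaskID().getId() and context.getTaskAttemptID().getId(). A caveat: combiners are an optional optimization and may run zero, one, or several times per mapper, so any side-effect output from a combiner is inherently fragile.

```java
// Sketch: build a per-combiner-attempt base file name so that concurrent
// combiner instances never collide on the same HDFS path.
public class CombinerFileName {

    // "tmp_round1_comb" plus a zero-padded task id and an attempt id,
    // e.g. task 3, attempt 0 -> "tmp_round1_comb-00003-0"
    public static String uniqueBaseName(int taskId, int attemptId) {
        return String.format("tmp_round1_comb-%05d-%d", taskId, attemptId);
    }

    public static void main(String[] args) {
        System.out.println(uniqueBaseName(3, 0));   // tmp_round1_comb-00003-0
        System.out.println(uniqueBaseName(12, 1));  // tmp_round1_comb-00012-1
    }
}
```

The resulting string would then be passed as the third argument of mos.write(key, value, baseOutputPath) instead of the fixed FILENAME, so each attempt creates its own file.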
The exception:
2014-10-13 14:53:00,045 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2014-10-13 14:53:00,045 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@61f2bf35
java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1535)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1444)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: failed to create file /user/cloudera/scratch/Profiling_q4/20141013_145034/tmp/TMP_0_Out1/_temporary/1/_temporary/attempt_1412109710756_0057_m_000000_0/tmp_round1_comb.txt-m-00000 on client 127.0.0.1 because the file exists
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2307)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2235)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1461)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1386)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:132)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:457)
at mapreduce.guardedfragment.executor.hadoop.combiners.GFCombiner1.reduce(GFCombiner1.java:139)
at mapreduce.guardedfragment.executor.hadoop.combiners.GFCombiner1.reduce(GFCombiner1.java:31)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1505)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): failed to create file /user/cloudera/scratch/Profiling_q4/20141013_145034/tmp/TMP_0_Out1/_temporary/1/_temporary/attempt_1412109710756_0057_m_000000_0/tmp_round1_comb.txt-m-00000 on client 127.0.0.1 because the file exists
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2307)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2235)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1599)
... 20 more
2014-10-13 14:53:00,048 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: Spill failed
2014-10-13 14:53:00,049 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child: java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1535)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$300(MapTask.java:853)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.io.Text.write(Text.java:324)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1126)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at mapreduce.guardedfragment.executor.hadoop.mappers.GFMapper1Guard.map(GFMapper1Guard.java:98)
at mapreduce.guardedfragment.executor.hadoop.mappers.GFMapper1Guard.map(GFMapper1Guard.java:37)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:55)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: failed to create file /user/cloudera/scratch/Profiling_q4/20141013_145034/tmp/TMP_0_Out1/_temporary/1/_temporary/attempt_1412109710756_0057_m_000000_0/tmp_round1_comb.txt-m-00000 on client 127.0.0.1 because the file exists
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2307)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2235)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1461)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1386)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:132)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:457)
at mapreduce.guardedfragment.executor.hadoop.combiners.GFCombiner1.reduce(GFCombiner1.java:139)
at mapreduce.guardedfragment.executor.hadoop.combiners.GFCombiner1.reduce(GFCombiner1.java:31)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1505)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): failed to create file /user/cloudera/scratch/Profiling_q4/20141013_145034/tmp/TMP_0_Out1/_temporary/1/_temporary/attempt_1412109710756_0057_m_000000_0/tmp_round1_comb.txt-m-00000 on client 127.0.0.1 because the file exists
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2307)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2235)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1599)
... 20 more
2014-10-13 14:53:00,065 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task