Hadoop, Python, subprocess failed with code 127
I am trying to accomplish a very simple task with MapReduce.
mapper.py:
#!/usr/bin/env python
import sys
for line in sys.stdin:
    print line
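(As an aside: each line read from sys.stdin keeps its trailing newline and print appends another, so this mapper emits a blank line after every record. A minimal variant that avoids the doubled newlines, just a sketch and not required to fix the error below:)
#!/usr/bin/env python
import sys

# same mapper, but strip the trailing newline so records are not double-spaced
for line in sys.stdin:
    print line.rstrip('\n')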
my txt file:
qwerty
asdfgh
zxc
Command line to start the job:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper /home/cloudera/Documents/map.py \
-file /home/cloudera/Documents/map.py
The error:
INFO mapreduce.Job: Task Id : attempt_1490617885665_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
How do I fix this and run the job? When I run cat /home/cloudera/Documents/test.txt | python /home/cloudera/Documents/map.py locally, it works fine.
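(A local test that is closer to what Hadoop streaming actually does, because it exercises the shebang line instead of invoking python explicitly; these commands are a suggestion, not from the original post:)
chmod +x /home/cloudera/Documents/map.py
cat /home/cloudera/Documents/test.txt | /home/cloudera/Documents/map.py
(If this fails even though the file exists and is executable, the shebang line itself is the problem.)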
!!!!! UPDATE
There is something wrong with my .py file. I copied the mapper from the GitHub repository for Tom White's Hadoop book and everything works fine. But I cannot understand the reason. It is not permissions and not encoding (if I'm not mistaken). What else could it be?
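(One common culprit that matches these symptoms, offered as an assumption rather than something confirmed in the post: the file was saved with Windows CRLF line endings. The kernel then asks env to run a program literally named "python\r", which does not exist, so the task exits with code 127, while python map.py still works because the shebang is never consulted. A quick check and fix:)
cat -A /home/cloudera/Documents/map.py | head -1    # a trailing ^M$ means CRLF line endings
dos2unix /home/cloudera/Documents/map.py            # or: sed -i 's/\r$//' /home/cloudera/Documents/map.py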
In the -mapper argument you must set the command as it will be run on the cluster nodes, and on those nodes the file /home/cloudera/Documents/map.py does not exist (exit code 127 means the command could not be found). The files you pass with the -file option are placed in the task's working directory, so you can refer to the script simply as ./map.py. I don't remember what permissions are set on shipped files, so if it has no execute permission, invoke it as python map.py.
So the full command is:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper "python map.py" \
-file /home/cloudera/Documents/map.py
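(Alternatively, if you prefer to rely on the shebang line, a sketch under the assumption that the script's first line is a valid shebang: mark the script executable before submitting and reference it relative to the working directory:)
chmod +x /home/cloudera/Documents/map.py
(then pass -mapper ./map.py instead of -mapper "python map.py" in the same command.)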
First check python --version. If the output of python --version is
Command 'python' not found, but can be installed with:
sudo apt install python3
sudo apt install python
sudo apt install python-minimal
You also have python3 installed, you can run 'python3' instead.
then install Python with sudo apt install python and run the Hadoop job again. That was the case on my PC, and afterwards it finally worked.
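(A quick sanity check before resubmitting, just a suggestion and not part of the original answer; the paths in the question suggest a single-node Cloudera quickstart-style setup, where the submitting machine and the worker are the same:)
which python
python --version
(On a real multi-node cluster the interpreter has to be installed on every node that runs tasks, not only on the machine you submit from.)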