Hadoop, Python, subprocess code 127

I am trying to accomplish a very simple task with MapReduce.

mapper.py:

#!/usr/bin/env python
import sys
for line in sys.stdin:
    print line


my txt file:

qwerty
asdfgh
zxc


Command line to start the job:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper /home/cloudera/Documents/map.py \
-file /home/cloudera/Documents/map.py


The error:

INFO mapreduce.Job: Task Id : attempt_1490617885665_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


How do I fix this and run the job? When I run

cat /home/cloudera/Documents/test.txt | python /home/cloudera/Documents/map.py

locally, it works fine.

UPDATE

There is something wrong with my *.py file. I copied the equivalent file from the GitHub repository for Tom White's Hadoop book, and with that version everything works fine.

But I cannot understand the reason. It is not permissions and not encoding (if I'm not mistaken). What else could it be?



4 answers


I faced the same problem.

Problem: when the Python file is created in a Windows environment, the line endings are CRLF. My Hadoop runs on Linux, which expects LF line endings. With CRLF, the shebang line effectively ends in a stray carriage return, so the kernel looks for an interpreter named "python\r", fails to find it, and the subprocess exits with code 127.




Solution: after converting the line endings from CRLF to LF, the job succeeded.
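A quick way to check for and fix this from Python (a minimal sketch; the command-line tool dos2unix does the same job, and the file name map.py here is only an example):

```python
def to_unix_newlines(data):
    """Return the given bytes with CRLF line endings converted to LF."""
    return data.replace(b"\r\n", b"\n")

def convert_file(path):
    """Rewrite a script in place if it contains CRLF line endings.

    Returns True if the file was rewritten, False if it was already LF-only.
    """
    with open(path, "rb") as f:
        data = f.read()
    fixed = to_unix_newlines(data)
    if fixed != data:
        with open(path, "wb") as f:
            f.write(fixed)
        return True
    return False

# Example usage (path is hypothetical):
# convert_file("/home/cloudera/Documents/map.py")
```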




The argument to -mapper must be the command that will run on the cluster nodes, and on those nodes the path /home/cloudera/Documents/map.py does not exist. The files you pass with the -file option are shipped to each task's working directory, so you can refer to the script simply as ./map.py.

I don't remember what permissions are set on shipped files, so if the script lacks execute permission, run it as python map.py instead.



So the full command is:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper "python map.py" \
-file /home/cloudera/Documents/map.py
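Before resubmitting, it can help to reproduce the streaming contract locally. This sketch (the mapper command is a placeholder) pipes input through a command the way Hadoop streaming does and reports the exit code; 127 means the shell could not find or execute the command at all, which is exactly what PipeMapRed reports as "subprocess failed with code 127":

```python
import subprocess

def run_mapper(command, input_text):
    """Pipe input_text through a mapper command, Hadoop-streaming style.

    Returns (exit_code, stdout_text). An exit code of 127 means the shell
    could not find or execute the command itself.
    """
    proc = subprocess.Popen(
        command,
        shell=True,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate(input_text.encode())
    return proc.returncode, out.decode()

# Example usage: a command that exists passes input through,
# while a nonexistent path reproduces exit code 127 from the job log.
code, out = run_mapper("cat", "qwerty\nasdfgh\nzxc\n")
missing_code, _ = run_mapper("/no/such/mapper.py", "")
```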




There may be an error in your mapper.py or reducer.py, for example:

  1. A missing #!/usr/bin/env python shebang at the top of the files.
  2. A syntax or logic error in your Python code (e.g. print has different syntax in Python 2 and Python 3).
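One way to sidestep the print difference is to write to sys.stdout directly, so the same mapper runs unchanged under Python 2 and Python 3. A minimal sketch (not the asker's original file):

```python
#!/usr/bin/env python
# Identity mapper that avoids the Python 2 print statement vs.
# Python 3 print() function difference by writing to stdout directly.
import sys

def map_lines(lines):
    """Yield each input line unchanged, normalized to one trailing newline."""
    for line in lines:
        yield line.rstrip("\n") + "\n"

if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        sys.stdout.write(out)
```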


First check python --version. If the output is

Command 'python' not found, but can be installed with:

sudo apt install python3       
sudo apt install python        
sudo apt install python-minimal

You also have python3 installed, you can run 'python3' instead.


then install Python with sudo apt install python and rerun the Hadoop job.

That is what finally made it work on my PC.
