Hadoop, Python, subprocess code 127

I am trying to accomplish a very simple task with MapReduce.

mapper.py:

#!/usr/bin/env python
import sys
for line in sys.stdin:
    print line


my txt file:

qwerty
asdfgh
zxc


Command line to start the job:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper /home/cloudera/Documents/map.py \
-file /home/cloudera/Documents/map.py


The error:

INFO mapreduce.Job: Task Id : attempt_1490617885665_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


How do I fix this and run the job? When I run

cat /home/cloudera/Documents/test.txt | python /home/cloudera/Documents/map.py

locally, it works fine.

UPDATE

There is something wrong with my *.py file. I copied the equivalent file from the GitHub repository for Tom White's Hadoop book, and with that version everything works fine.

But I cannot understand the reason. It is not permissions and not encoding (if I'm not mistaken). What else could it be?



4 answers


I faced the same problem.

Problem: when the Python file is created in a Windows environment, the line endings are CRLF. My Hadoop runs on Linux, which expects LF line endings. With CRLF, the shebang line effectively ends in a stray carriage return, so the kernel looks for an interpreter named "python\r", fails to find it, and the subprocess exits with code 127.




Solution: after converting the line endings from CRLF to LF, the job succeeded.
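A quick way to check for and fix this from Python (a minimal sketch; the command-line tool dos2unix does the same job, and the file name map.py here is only an example):

```python
def to_unix_newlines(data):
    """Return the given bytes with CRLF line endings converted to LF."""
    return data.replace(b"\r\n", b"\n")

def convert_file(path):
    """Rewrite a script in place if it contains CRLF line endings.

    Returns True if the file was rewritten, False if it was already LF-only.
    """
    with open(path, "rb") as f:
        data = f.read()
    fixed = to_unix_newlines(data)
    if fixed != data:
        with open(path, "wb") as f:
            f.write(fixed)
        return True
    return False

# Example usage (path is hypothetical):
# convert_file("/home/cloudera/Documents/map.py")
```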




The argument to -mapper must be the command that will run on the cluster nodes, and on those nodes the path /home/cloudera/Documents/map.py does not exist. The files you pass with the -file option are shipped to each task's working directory, so you can refer to the script simply as ./map.py.

I don't remember what permissions are set on shipped files, so if the script lacks execute permission, run it as python map.py instead.



So the full command is:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper "python map.py" \
-file /home/cloudera/Documents/map.py
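Before resubmitting, it can help to reproduce the streaming contract locally. This sketch (the mapper command is a placeholder) pipes input through a command the way Hadoop streaming does and reports the exit code; 127 means the shell could not find or execute the command at all, which is exactly what PipeMapRed reports as "subprocess failed with code 127":

```python
import subprocess

def run_mapper(command, input_text):
    """Pipe input_text through a mapper command, Hadoop-streaming style.

    Returns (exit_code, stdout_text). An exit code of 127 means the shell
    could not find or execute the command itself.
    """
    proc = subprocess.Popen(
        command,
        shell=True,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate(input_text.encode())
    return proc.returncode, out.decode()

# Example usage: a command that exists passes input through,
# while a nonexistent path reproduces exit code 127 from the job log.
code, out = run_mapper("cat", "qwerty\nasdfgh\nzxc\n")
missing_code, _ = run_mapper("/no/such/mapper.py", "")
```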




There may be an error in your mapper.py or reducer.py, for example:

  1. A missing #!/usr/bin/env python shebang at the top of the files.
  2. A syntax or logic error in your Python code (e.g. print has different syntax in Python 2 and Python 3).
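One way to sidestep the print difference is to write to sys.stdout directly, so the same mapper runs unchanged under Python 2 and Python 3. A minimal sketch (not the asker's original file):

```python
#!/usr/bin/env python
# Identity mapper that avoids the Python 2 print statement vs.
# Python 3 print() function difference by writing to stdout directly.
import sys

def map_lines(lines):
    """Yield each input line unchanged, normalized to one trailing newline."""
    for line in lines:
        yield line.rstrip("\n") + "\n"

if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        sys.stdout.write(out)
```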


First check python --version. If the output is

Command 'python' not found, but can be installed with:

sudo apt install python3       
sudo apt install python        
sudo apt install python-minimal

You also have python3 installed, you can run 'python3' instead.


then install Python with sudo apt install python and rerun the Hadoop job.

That is what finally made it work on my PC.
