Python child script consumes all stdin

I found strange behavior with raw_input / readline when running Python scripts from a bash script.

In short, when the entire stdin (records separated by newlines) is passed at once to the parent script, child bash scripts consume only the input they need, while child Python scripts consume all of stdin, leaving nothing for the next children. Here is a simple example demonstrating what I mean:

Parent script (parent.sh)

#!/bin/bash

./child.sh
./child.sh
./child.py
./child.py


Bash child script (child.sh)

#!/bin/bash

read -a INPUT
echo "sh: got input: ${INPUT}"


Child Python script (child.py)

#!/usr/bin/python -B

import sys

INPUT = raw_input()
print "py: got input: {}".format(INPUT)


Expected Result

./parent.sh <<< $'aa\nbb\ncc\ndd'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> py: got input: dd


Actual Result

./parent.sh <<< $'aa\nbb\ncc\ndd\n'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> Traceback (most recent call last):
>>   File "./child.py", line 5, in <module>
>>     INPUT = raw_input()
>> EOFError: EOF when reading a line


raw_input seems to consume all the remaining lines from stdin. Using sys.stdin.readline instead of raw_input does not raise an EOFError; however, the input received is an empty string, not the expected 'dd'.

What's going on here? How can I avoid this behavior so that the last child of the script receives the expected input?

Edit: to be sure, I added a few more lines to stdin, and the result is the same:

./parent.sh <<< $'aa\nbb\ncc\ndd\nff\nee\n'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> Traceback (most recent call last):
>>   File "./child.py", line 5, in <module>
>>     INPUT = raw_input()
>> EOFError: EOF when reading a line




2 answers


Here's an easier way to demonstrate the same problem:

printf "%s\n" foo bar | {
    head -n 1
    head -n 1
}


By all accounts it looks like it should print two lines, but bar is mysteriously missing.

This is because reading one line at a time is an illusion: the UNIX programming model has no call that reads exactly one line from a stream.

Instead, what basically all tools do is consume an entire buffer, cut off the first line, and keep the rest of the buffer around for the next call. This is true of head, Python's raw_input(), C's fgets(), Java's BufferedReader.readLine(), and pretty much everything else.

Since UNIX counts the entire buffer as consumed, no matter how much of it the program actually used, the rest of the buffer is discarded when the program exits.
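You can watch this over-consumption happen from Python itself. A minimal sketch, assuming a POSIX shell (printf and { ...; } command grouping); it runs two buffered readers in sequence over one pipe via the interpreter that executes it:

```python
import subprocess
import sys

# Two buffered Python readers over one pipe: the first readline() pulls the
# whole buffer into the first process, so the second reader only sees EOF
# (readline() returns the empty string at end of file).
one_liner = "import sys; sys.stdout.write(repr(sys.stdin.readline()) + '\\n')"
script = 'printf \'aa\\nbb\\n\' | {{ "{py}" -c "{c}"; "{py}" -c "{c}"; }}'.format(
    py=sys.executable, c=one_liner)
out = subprocess.check_output(script, shell=True).decode()
print(out)
```

The first reader prints 'aa\n' and the second prints '' — the line bb was read into the first process's buffer and thrown away when it exited.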

bash, however, works around this: it reads byte by byte until it reaches a newline. This is very inefficient, but it lets read consume exactly one line from the stream, leaving the rest in place for the next process.
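What bash's read does under the hood can be sketched in Python with os.read. This is an illustration, not bash's actual source; read_line_bytewise and its fd parameter are names I made up:

```python
import os

def read_line_bytewise(fd=0):
    """Mimic bash's read builtin: pull one byte at a time so that any
    bytes after the first newline stay in the stream for the next reader."""
    chunks = []
    while True:
        b = os.read(fd, 1)       # one byte per system call -- slow but safe
        if not b or b == b"\n":  # EOF or end of line
            break
        chunks.append(b)
    return b"".join(chunks).decode()
```

Each call costs one read() system call per byte, which is exactly the inefficiency the answer mentions, but successive calls on the same descriptor get successive lines.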



You can do the same in Python by opening a raw, unbuffered reader:

import sys
import os

# Reopen stdin with no buffering (bufsize=0) so that readline()
# consumes only up to and including the first newline
f = os.fdopen(sys.stdin.fileno(), 'rb', 0)
line = f.readline()[:-1]
print "Python read: " + line


We can check this as follows:

printf "%s\n" foo bar | {
    python myscript
    python myscript
}


prints

Python read: foo
Python read: bar




By default, the Python interpreter buffers standard input. You can pass the -u option to disable this buffering, although it is less efficient.

parent.sh

#!/bin/bash

./child.sh
./child.sh
python -u child.py
python -u child.py




Output

./parent.sh <<< $'aa\nbb\ncc\ndd'
sh: got input: aa
sh: got input: bb
py: got input: cc 
py: got input: dd
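Note that this relies on Python 2 behavior: in Python 3, -u forces stdout and stderr to be unbuffered but has no effect on stdin. Under Python 3 a sketch of the equivalent fix is to reopen stdin as a raw (buffering=0) stream; read_one_line and its fd parameter below are illustrative names, not part of the question's code:

```python
import os

def read_one_line(fd=0):
    # A raw (buffering=0) stream has no peek(), so its readline() falls
    # back to fetching one byte at a time: nothing past the first newline
    # is consumed, and the rest of the input remains for the next reader.
    raw = open(fd, "rb", buffering=0, closefd=False)
    return raw.readline().rstrip(b"\n").decode()
```

This is the Python 3 counterpart of the os.fdopen(..., 'rb', 0) trick from the first answer.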

