Python child script consumes all stdin
I ran into strange behavior with raw_input / readline when running Python scripts from a bash script.
In short, when the entire stdin (one record per line) is passed at once to the parent script, child bash scripts consume only the input they need, while child Python scripts consume all of stdin, leaving nothing for the next children. Here is a simple example that demonstrates what I mean:
Parent script (parent.sh)
#!/bin/bash
./child.sh
./child.sh
./child.py
./child.py
Bash child script (child.sh)
#!/bin/bash
read -a INPUT
echo "sh: got input: ${INPUT}"
Child Python script (child.py)
#!/usr/bin/python -B
import sys
INPUT = raw_input()
print "py: got input: {}".format(INPUT)
Expected result
./parent.sh <<< $'aa\nbb\ncc\ndd'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> py: got input: dd
Actual result
./parent.sh <<< $'aa\nbb\ncc\ndd\n'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> Traceback (most recent call last):
>> File "./child.py", line 5, in <module>
>> INPUT = raw_input()
>> EOFError: EOF when reading a line
raw_input seems to consume all the remaining lines of stdin. Using sys.stdin.readline instead of raw_input does not raise an EOFError, but the input received is an empty string rather than the expected 'dd'.
What's going on here? How can I avoid this behavior so that the last child of the script receives the expected input?
Edit: to be sure, I added a few more lines to stdin, and the result is the same:
./parent.sh <<< $'aa\nbb\ncc\ndd\nff\nee\n'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> Traceback (most recent call last):
>> File "./child.py", line 5, in <module>
>> INPUT = raw_input()
>> EOFError: EOF when reading a line
Here's an easier way to demonstrate the same problem:
printf "%s\n" foo bar | {
head -n 1
head -n 1
}
By all accounts it looks like it should print two lines, but bar is mysteriously missing.
This is because reading a single line is a fiction: the UNIX programming model does not support it. A read() call asks the kernel for a number of bytes, not for a line.
Instead, what basically all tools do is read a whole buffer's worth of data, carve out the first line, and keep the rest of the buffer for their next call. This is true for head, Python's raw_input(), C's fgets(), Java's BufferedReader.readLine() and pretty much everything else.
Since UNIX considers the whole buffer consumed, no matter how much of it the program actually uses, the rest of the buffer is discarded when the program exits.
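The over-reading is easy to observe directly. The following Python 3 sketch uses an os.pipe() as a stand-in for the shared stdin (buffered_readline_demo is a made-up helper name): it reads one line through an ordinary buffered file object and then checks what is still left in the pipe for a hypothetical next process.

```python
import os


def buffered_readline_demo(data):
    """Read one line through a buffered reader, then report what the
    underlying pipe still holds for the next reader."""
    r, w = os.pipe()
    os.write(w, data)  # small payload, fits in the pipe buffer
    os.close(w)        # writer is done, so the reader will see EOF afterwards

    # Ordinary buffered binary file object on the read end.
    # closefd=False keeps the raw descriptor usable after f is closed.
    f = os.fdopen(r, "rb", closefd=False)
    first = f.readline()

    # What a second process sharing this descriptor would get.
    leftover = os.read(r, 1024)

    f.close()
    os.close(r)
    return first, leftover


if __name__ == "__main__":
    print(buffered_readline_demo(b"foo\nbar\n"))
```

On CPython this should print (b'foo\n', b''): the first readline() pulled both lines into Python's private buffer, so the pipe itself is empty for whoever reads next.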
bash, however, works around this: it reads byte by byte until it reaches a newline. This is very inefficient, but it lets read consume only one line from the stream, leaving the rest in place for the next process.
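To make the byte-at-a-time strategy concrete, here it is spelled out with raw os.read() calls in Python 3. This is only an illustration of the strategy described above, not bash's actual source; read_line_unbuffered is a hypothetical name.

```python
import os


def read_line_unbuffered(fd):
    """Return one line (including its newline) from fd, reading a single
    byte per system call so nothing after the newline is consumed."""
    chunks = []
    while True:
        byte = os.read(fd, 1)  # one read() per byte: slow, but never over-reads
        if not byte:           # EOF reached before any newline
            break
        chunks.append(byte)
        if byte == b"\n":      # stop right after the newline
            break
    return b"".join(chunks)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"foo\nbar\n")
    os.close(w)
    print(read_line_unbuffered(r))  # b'foo\n'
    print(os.read(r, 1024))         # b'bar\n' is still in the pipe
    os.close(r)
```

The cost is one system call per byte, which is why buffered readers are the default almost everywhere.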
You can do the same in Python by opening a raw, unbuffered reader:
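import sys
import os
# buffering=0: readline() reads one byte at a time and stops at the newline,
# so the rest of stdin stays available to the next process
f = os.fdopen(sys.stdin.fileno(), 'rb', 0)
line = f.readline().rstrip('\n')
print "Python read:", line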
We can check this as follows:
printf "%s\n" foo bar | {
python myscript
python myscript
}
This prints:
Python read: foo
Python read: bar
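The snippet above is Python 2 (print statement, str lines). A Python 3 sketch of the same idea follows; read_one_line is a hypothetical helper name, and its fd parameter exists only so the function can be exercised on something other than stdin.

```python
import os
import sys


def read_one_line(fd=None):
    """Read exactly one line from fd (default: stdin) without over-reading."""
    if fd is None:
        fd = sys.stdin.fileno()
    # buffering=0 returns a raw io.FileIO; in CPython its readline() has no
    # read-ahead buffer and fetches one byte per read() call, stopping at the
    # newline. closefd=False leaves the descriptor open for later readers.
    with os.fdopen(fd, "rb", buffering=0, closefd=False) as raw:
        line = raw.readline()
    return line.rstrip(b"\n").decode()
```

To use it as child.py, end the file with a line such as print("py: got input:", read_one_line()).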
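By default, the Python interpreter buffers standard input. You can use the -u option to disable this behavior, although it is less efficient. (This applies to Python 2, which the question uses; in Python 3, -u only makes stdout and stderr unbuffered and has no effect on stdin.)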
parent.sh
#!/bin/bash
./child.sh
./child.sh
python -u child.py
python -u child.py
Output
./parent.sh <<< $'aa\nbb\ncc\ndd'
sh: got input: aa
sh: got input: bb
py: got input: cc
py: got input: dd
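The whole experiment can also be reproduced from Python 3 without bash. In this sketch, run_children and both child snippets are hypothetical, written for illustration: each child is a separate interpreter started with -c, and all children share one pipe as stdin, just as the children of parent.sh share its stdin. The fixed child uses the raw-reader approach from the first answer rather than -u.

```python
import os
import subprocess
import sys

# Child that reads with input(), i.e. through Python's buffered sys.stdin.
BUFFERED_CHILD = "print('py: got input:', input())"

# Child that reads one line through an unbuffered raw reader on fd 0.
UNBUFFERED_CHILD = (
    "import os, sys\n"
    "f = os.fdopen(sys.stdin.fileno(), 'rb', buffering=0, closefd=False)\n"
    "print('py: got input:', f.readline().rstrip(b'\\n').decode())\n"
)


def run_children(child_code, data, count=2):
    """Run `count` copies of a Python child one after another, all reading
    from the same pipe. Returns a list of (returncode, stripped stdout)."""
    r, w = os.pipe()
    os.write(w, data)  # all input is written up front, like <<< in the question
    os.close(w)
    results = []
    try:
        for _ in range(count):
            proc = subprocess.run(
                [sys.executable, "-c", child_code],
                stdin=r,                   # every child inherits the same pipe
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,  # hide the EOFError traceback
            )
            results.append((proc.returncode, proc.stdout.decode().strip()))
    finally:
        os.close(r)
    return results


if __name__ == "__main__":
    print(run_children(BUFFERED_CHILD, b"cc\ndd\n"))
    print(run_children(UNBUFFERED_CHILD, b"cc\ndd\n"))
```

With the buffered child, the second run should exit with an EOFError (non-zero status, empty output), mirroring the traceback in the question; with the raw reader, each child should print its own line.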