Lazy (unbuffered) shell processing
I am trying to figure out how to get the laziest possible processing out of a standard UNIX shell pipeline. For example, say I have a command that does some computation and emits output along the way, but the computation gets progressively more expensive, so the first few lines of output appear quickly and subsequent lines arrive more and more slowly. If I'm only interested in the first few lines, I want lazy evaluation: the computation should stop as soon as possible, before it becomes too expensive.
This can be achieved with a straight-line shell pipeline, for example:
./expensive | head -n 2
However, this does not work optimally. Let's simulate the computation with a script whose per-line delay grows rapidly:
#!/bin/sh
i=1
while true; do
echo line $i
sleep $(( i ** 4 ))
i=$(( i+1 ))
done
Now when I pipe this script through head -n 2, I observe the following:
- line 1 is displayed immediately.
- After a one-second sleep, line 2 is displayed.
- Even though head -n 2 has now received two (\n-terminated) lines and exited, expensive continues to run and waits another 16 seconds (2 ** 4) before the pipeline as a whole exits.
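This latency pattern is easy to reproduce with a scaled-down stand-in (the sleeps shrunk to fractions of a second and the writer expressed as a shell function, both my own simplifications so the demo finishes quickly):

```shell
#!/bin/sh
# Scaled-down stand-in for ./expensive: sleeps 0.1s, 0.2s, ... between lines.
writer() {
  i=1
  while true; do
    echo "line $i"
    sleep "0.$i"
    i=$(( i + 1 ))
  done
}

# head is done about 0.1s in, but the pipeline only returns after the
# writer wakes from its 0.2s sleep and dies of SIGPIPE on its third echo.
writer | head -n 2
```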
Obviously this is not as lazy as we would like: ideally, expensive would exit as soon as head has received its two lines. That is not what happens; IIUC, expensive actually terminates only after trying to write its third line, because at that point it writes to its own STDOUT, which is connected through a pipe to the STDIN of head, a process that has already exited and therefore no longer reads from the pipe. As a result expensive receives SIGPIPE, which forces the sh interpreter running the script to invoke its SIGPIPE handler, which by default exits the script (although this can be changed with trap).
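To make that default concrete, here is a sketch of how expensive itself could change the behavior with trap. Note this modifies expensive, which the question deliberately rules out, and the sleeps are scaled down so the sketch terminates quickly (both changes are mine):

```shell
#!/bin/sh
# Variant of expensive with a SIGPIPE handler installed, wrapped in a
# function and with the sleeps scaled down so it finishes fast.
expensive_trapped() {
  trap 'echo "reader went away, stopping" >&2; exit 0' PIPE
  i=1
  while true; do
    # With SIGPIPE trapped, the failed write surfaces as a nonzero exit
    # status from echo instead of silently killing the shell.
    echo "line $i" || exit 0
    sleep "0.$i"
    i=$(( i + 1 ))
  done
}

expensive_trapped | head -n 2
```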
So the question is: how can I make expensive terminate as soon as head completes, rather than only when expensive tries to write its third line into a pipe that no longer has a reader on the other end? Since the pipeline is constructed and driven by an interactive shell into which I typed the command ./expensive | head -n 2, presumably the interactive shell is where any solution would live, not in any modification of expensive or head. Is there some native trick or extra utility for constructing pipelines with the behavior I want? Or is this simply impossible in bash or zsh, and the only way is to write my own pipeline manager (in Ruby or Python, say) that notices when the reader quits and immediately kills the writer?
If foreground control is all you care about, you can run expensive in a process substitution; it still blocks until its next write attempt, but head exits immediately (and your script's flow control can continue) as soon as it has received its input:
head -n 2 < <(exec ./expensive)
# expensive still runs 16 seconds in the background, but doesn't block your program
bash 4.4 and newer store the PID of a process substitution in $!, which allows the process to be managed like any other background process:
# REQUIRES BASH 4.4 OR NEWER
exec {expensive_fd}< <(exec ./expensive); expensive_pid=$!
head -n 2 <&"$expensive_fd" # read the content we want
exec {expensive_fd}<&- # close the descriptor
kill "$expensive_pid" # and kill the process
Another approach is a coprocess, which has the advantage of requiring only bash 4.0:
# magic: store stdin and stdout FDs in an array named "expensive", and PID in expensive_PID
coproc expensive { exec ./expensive; }
# read two lines from input FD...
head -n 2 <&"${expensive[0]}"
# ...and kill the process.
kill "$expensive_PID"
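Assuming a bash binary is on the PATH (invoked explicitly below, in case the script itself runs under a different sh), the coproc pattern can be smoke-tested with a cheap stand-in writer of my own:

```shell
#!/bin/sh
# Smoke test of the coproc pattern: a fast fake writer replaces ./expensive,
# and bash is invoked explicitly since coproc needs bash >= 4.0.
coproc_demo() {
bash <<'EOF'
coproc slow { i=1; while true; do echo "line $i"; sleep 0.1; i=$((i+1)); done; }
head -n 2 <&"${slow[0]}"       # read the two lines we want
kill "$slow_PID" 2>/dev/null   # then kill the coprocess right away
EOF
}

coproc_demo
```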
I will answer using a POSIX shell.
What you can do is use a fifo instead of a pipe, and kill the first link of the pipeline when the second one completes.
If the expensive process is a leaf process, or if it takes care of killing its own children, you can use a plain kill. If it is a process-spawning shell script, you should run it in its own process group (doable with set -m
) and kill it with a process-group kill.
Sample code:
#!/bin/sh -e
expensive()
{
i=1
while true; do
echo line $i
sleep 0.$i #sped it up a little
echo >&2 slept
i=$(( i+1 ))
done
}
echo >&2 NORMAL
expensive | head -n2
#line 1
#slept
#line 2
#slept
echo >&2 SPED-UP
mkfifo pipe
exec 3<>pipe
rm pipe
set -m; expensive >&3 & set +m
<&3 head -n 2
kill -- -$!
#line 1
#slept
#line 2
If you run this, the second run's output should not contain the second slept, which means the first link was killed when head completed, not when the first link tried to write after head had completed.