Lazy (unbuffered) shell processing

I am trying to figure out how to perform the laziest possible processing on a standard UNIX shell pipeline. Say I have a command that produces output as it computes, but each successive line is more expensive to compute, so the first few lines arrive quickly and later ones ever more slowly. If I'm only interested in the first few lines, I want to obtain them lazily, terminating the computation as early as possible, before it gets too expensive.

This can be achieved with a straight-line shell pipeline, for example:

./expensive | head -n 2

However, this does not behave optimally. Let's simulate the computation with a script whose delay grows as the fourth power of the line number:

#!/bin/bash

i=1
while true; do
    echo line $i
    sleep $(( i ** 4 ))
    i=$(( i+1 ))
done

Now, when I pipe this script into head -n 2 , I observe the following:

  • line 1 is printed immediately.
  • After a one-second sleep, line 2 is printed.
  • Even though head -n 2 has now received two ( \n -terminated) lines and exited, expensive keeps running and sleeps another 16 seconds ( 2 ** 4 ) before the pipeline as a whole exits.

Obviously, this is not as lazy as we would like: ideally, expensive would exit as soon as head has received its two lines. That does not happen; IIUC, expensive actually ends only after attempting to write its third line, because at that point it is writing to its own stdout, which is connected through the pipe to the stdin of a head process that has already exited and therefore no longer reads from the pipe. As a result, expensive receives SIGPIPE, which forces the bash interpreter running the script to invoke its SIGPIPE handler, which by default exits the script (although this can be changed with trap ).
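That default SIGPIPE behavior is easy to observe directly. Here is a minimal sketch (the function name and messages are mine, not from the question) in which the writer traps SIGPIPE and announces it on stderr before exiting; without the trap, bash would simply terminate the writer silently at the third write:

```shell
#!/bin/bash
# Minimal sketch of the failure mode described above: the writer only
# learns that its reader is gone when a write raises SIGPIPE.
writer() {
    # Trap SIGPIPE so the writer can report it; the default disposition
    # would terminate it silently. 141 = 128 + 13 (SIGPIPE's number).
    trap 'echo "writer: caught SIGPIPE" >&2; exit 141' PIPE
    i=1
    while true; do
        echo "line $i"
        sleep 1
        i=$(( i+1 ))
    done
}

# head exits after two lines; the writer's third echo then hits the
# closed pipe and the trap fires.
writer | head -n 2
```

Note that the trap only fires at the next write attempt; nothing here makes the writer exit the moment head does, which is exactly the gap the question is about.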

So the question is: how can I make expensive terminate immediately when head completes, rather than only when expensive tries to write its third line into a pipe that no longer has a reader on the other end? Since the pipeline is built and driven by the interactive shell process into which I typed the command ./expensive | head -n 2 , presumably any solution to this problem would live in that interactive shell, not in modifications to expensive or head ? Is there a native trick or an additional utility that can construct pipelines with the behavior I want? Or is it simply not possible to achieve this in bash or zsh , and the only way is to write my own pipeline manager (in Ruby or Python, for example) that notices when the reader quits and immediately kills the writer?


2 answers


If all you care about is foreground flow control, you can run expensive in a process substitution; it still blocks until its next write attempt, but head exits immediately (and the script's flow control can continue) as soon as it has received its input:

head -n 2 < <(exec ./expensive)
# expensive still runs 16 seconds in the background, but doesn't block your program

In bash 4.4, process substitutions additionally store their PID in $! , which allows the process to be managed in the same way as other background processes:



# REQUIRES BASH 4.4 OR NEWER
exec {expensive_fd}< <(exec ./expensive); expensive_pid=$!
head -n 2 <&"$expensive_fd"  # read the content we want
exec {expensive_fd}<&-       # close the descriptor
kill "$expensive_pid"        # and kill the process

Another approach is a coprocess, which has the advantage of requiring only bash 4.0:

# magic: store stdin and stdout FDs in an array named "expensive", and PID in expensive_PID
coproc expensive { exec ./expensive; }

# read two lines from the coprocess's output FD...
head -n 2 <&"${expensive[0]}"

# ...and kill the process.
kill "$expensive_PID"



I'll answer in terms of the POSIX shell.

What you can do is use a FIFO instead of a pipe, and kill the first stage of the pipeline when the second one completes.

If the expensive command is a leaf process, or if it takes care of killing its own children, a simple kill will do. If it is a shell script that spawns child processes, you must start it in its own process group (doable with set -m ) and kill it with a process-group kill.



Sample code:

#!/bin/sh -e
expensive()
{
    i=1
    while true; do
        echo line $i
        sleep 0.$i     #sped it up a little
        echo >&2 slept 
        i=$(( i+1 ))
    done
}
echo >&2 NORMAL
expensive | head -n2
#line 1
#slept
#line 2
#slept

echo >&2 SPED-UP
mkfifo pipe
exec 3<>pipe 
rm pipe
set -m; expensive  >&3 & set +m
<&3 head -n 2
kill -- -$!
#line 1
#slept
#line 2

If you run this, the second run's output should not contain the second "slept" line, which means the first stage was killed when head completed, not when the first stage tried to write output after head had completed.







