Call shell commands from a command list until all commands have completed

I have a list of shell commands that I'd like to call. At most four processes should be running at any one time.

My basic idea would be to keep sending commands to the shell until 4 commands are active. The script would then repeatedly check the number of active processes by counting the processes whose command line contains a common string, e.g. "nohup scrapy crawl urlMonitor".

Once the number of running processes drops below 4, the next command is sent to the shell, and this continues until every command in the list has finished executing.

Is there a way to do this with a shell script? I'm guessing it will involve some kind of loop with a break condition, as well as a way to check the number of active processes. Unfortunately I'm not that good at shell scripting, so maybe someone can point me in the right direction?

nohup scrapy crawl urlMonitor -a slice=0 &
nohup scrapy crawl urlMonitor -a slice=1 &
nohup scrapy crawl urlMonitor -a slice=2 &
nohup scrapy crawl urlMonitor -a slice=3 &
nohup scrapy crawl urlMonitor -a slice=4 &
nohup scrapy crawl urlMonitor -a slice=5 &
nohup scrapy crawl urlMonitor -a slice=6 &
nohup scrapy crawl urlMonitor -a slice=7 &
nohup scrapy crawl urlMonitor -a slice=8 &
nohup scrapy crawl urlMonitor -a slice=9 &
nohup scrapy crawl urlMonitor -a slice=10 &
nohup scrapy crawl urlMonitor -a slice=11 &
nohup scrapy crawl urlMonitor -a slice=12 &
nohup scrapy crawl urlMonitor -a slice=13 &
nohup scrapy crawl urlMonitor -a slice=14 &
nohup scrapy crawl urlMonitor -a slice=15 &
nohup scrapy crawl urlMonitor -a slice=16 &
nohup scrapy crawl urlMonitor -a slice=17 &
nohup scrapy crawl urlMonitor -a slice=18 &
nohup scrapy crawl urlMonitor -a slice=19 &
nohup scrapy crawl urlMonitor -a slice=20 &
nohup scrapy crawl urlMonitor -a slice=21 &
nohup scrapy crawl urlMonitor -a slice=22 &
nohup scrapy crawl urlMonitor -a slice=23 &
nohup scrapy crawl urlMonitor -a slice=24 &
nohup scrapy crawl urlMonitor -a slice=25 &
nohup scrapy crawl urlMonitor -a slice=26 &
nohup scrapy crawl urlMonitor -a slice=27 &
nohup scrapy crawl urlMonitor -a slice=28 &
nohup scrapy crawl urlMonitor -a slice=29 &
nohup scrapy crawl urlMonitor -a slice=30 &
nohup scrapy crawl urlMonitor -a slice=31 &
nohup scrapy crawl urlMonitor -a slice=32 &
nohup scrapy crawl urlMonitor -a slice=33 &
nohup scrapy crawl urlMonitor -a slice=34 &
nohup scrapy crawl urlMonitor -a slice=35 &
nohup scrapy crawl urlMonitor -a slice=36 &
nohup scrapy crawl urlMonitor -a slice=37 &
nohup scrapy crawl urlMonitor -a slice=38 &

4 answers


Here is a general method that always ensures that fewer than 4 jobs are running before another job is started (although it can briefly exceed 4 jobs if a single line starts several jobs at once):

#!/bin/bash

max_nb_jobs=4
commands_file=$1

while IFS= read -r line; do
   while :; do
      mapfile -t jobs < <(jobs -pr)
      ((${#jobs[@]}<max_nb_jobs)) && break
      wait -n
   done
   eval "$line"
done < "$commands_file"

wait


Use this script with your command file as the first argument.

How does it work? For each line read from the file, we first make sure that fewer than max_nb_jobs jobs are running, by counting the currently running jobs (obtained from jobs -pr). If there are max_nb_jobs or more, we wait for the next job to terminate (wait -n) and count the running jobs again. Once fewer than max_nb_jobs are running, we eval the line.
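
For example, if the script is saved as run_jobs.sh (an arbitrary name used here) and the nohup ... & lines from the question are stored in commands.txt, you would run:

chmod +x run_jobs.sh
./run_jobs.sh commands.txt

Note that each line in the command file should end with & (as the lines in the question do), so that eval returns immediately and the job is visible to jobs -pr.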




Update

Here's a similar script that doesn't use wait -n (which requires Bash 4.3 or newer). Everything seems to work well (tested on Debian with Bash 4.2):

#!/bin/bash

set -m

max_nb_jobs=4
file_list=$1

sleep_jobs() {
   # This function sleeps until there are less than $1 jobs running
   # Make sure that you have set -m before using this function!
   local n=$1 jobs
   while mapfile -t jobs < <(jobs -pr) && ((${#jobs[@]}>=n)); do
      coproc read
      trap "echo >&${COPROC[1]}; trap '' SIGCHLD" SIGCHLD
      wait $COPROC_PID
   done
}

while IFS= read -r line; do
   sleep_jobs $max_nb_jobs
   eval "$line"
done < "$file_list"

wait


If you want up to 4 running at a time, try something like:

max_procs=4
active_procs=0

for proc_num in {0..38}; do
    # If we already have max_procs running, wait for one to finish first
    if ((active_procs >= max_procs)); then
        wait -n
        ((active_procs--))
    fi

    nohup your_cmd_here &
    ((active_procs++))
done

# Wait for all remaining procs to finish
wait




This is a variation on sputnick's answer that runs up to max_procs jobs concurrently. As soon as one finishes, it starts the next. The wait -n command waits for the next process to finish, rather than waiting for all of them to finish.
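
With the crawl command from the question plugged in for the your_cmd_here placeholder, the launch line becomes (using the loop variable as the slice number):

nohup scrapy crawl urlMonitor -a slice=$proc_num &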


Try this:

for i in {0..38}; do
    nohup scrapy crawl urlMonitor -a slice=$i & _pid=$!
    ((++i%4==0)) && wait $_pid
done


help wait:

wait: wait [-n] [id ...]
Wait for job completion and return exit status.

Waits for each process identified by an ID, which may be a process ID or a
job specification, and reports its termination status.  If ID is not
given, waits for all currently active child processes, and the return
status is zero.  If ID is a job specification, waits for all processes
in that job's pipeline.

If the -n option is supplied, waits for the next job to terminate and
returns its exit status.

Exit Status:
Returns the status of the last ID; fails if ID is invalid or an invalid
option is given.
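
With the -n option described above (available in Bash 4.3 and newer), a sketch of the same loop that starts a new crawl as soon as any running one finishes, instead of waiting on the last PID of each batch of four:

for i in {0..38}; do
    nohup scrapy crawl urlMonitor -a slice=$i &
    # Once four crawls have been started, wait for any one of them
    # to finish before launching the next.
    ((i >= 3)) && wait -n
done
wait    # wait for the last crawls to finish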



You can do this easily with GNU parallel or even just xargs. To wit:

declare -i i=0
while sleep 1; do
    printf 'slice=%d\n' $((i++))
done | xargs -n1 -P4 nohup scrapy crawl urlMonitor -a


The while loop will run forever; if there is a real hard limit that you know of, you can just use a for loop instead, such as:

for i in {0..100}…


The sleep 1 is also useful because it lets the shell handle signals more efficiently.
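
If the number of slices is known up front, a minimal sketch of the same idea with a bounded argument list (assuming GNU parallel, or a GNU xargs with -P support, is installed):

# At most 4 crawls at a time over slices 0..38, with GNU parallel:
parallel -j4 nohup scrapy crawl urlMonitor -a slice={} ::: {0..38}

# The equivalent with xargs, fed a finite list of arguments:
printf 'slice=%d\n' {0..38} | xargs -n1 -P4 nohup scrapy crawl urlMonitor -a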
