Call shell commands from a command list until all commands have completed
I have a list of shell commands I'd like to run. Up to four processes should be running at the same time.
My basic idea is to send commands to the shell until 4 commands are active. The script then checks the process count by looking for a common string, e.g. "nohup scrapy crawl urlMonitor".
As soon as the process count drops below 4, the next command is sent to the shell, until all commands have finished.
Is there a way to do this with a shell script? I suppose it would involve some kind of endless loop, a break condition, and a method to check for the active processes. Unfortunately I am not that good at shell scripting, so maybe someone can point me in the right direction? My command list looks like this:
nohup scrapy crawl urlMonitor -a slice=0 &
nohup scrapy crawl urlMonitor -a slice=1 &
nohup scrapy crawl urlMonitor -a slice=2 &
nohup scrapy crawl urlMonitor -a slice=3 &
nohup scrapy crawl urlMonitor -a slice=4 &
nohup scrapy crawl urlMonitor -a slice=5 &
nohup scrapy crawl urlMonitor -a slice=6 &
nohup scrapy crawl urlMonitor -a slice=7 &
nohup scrapy crawl urlMonitor -a slice=8 &
nohup scrapy crawl urlMonitor -a slice=9 &
nohup scrapy crawl urlMonitor -a slice=10 &
nohup scrapy crawl urlMonitor -a slice=11 &
nohup scrapy crawl urlMonitor -a slice=12 &
nohup scrapy crawl urlMonitor -a slice=13 &
nohup scrapy crawl urlMonitor -a slice=14 &
nohup scrapy crawl urlMonitor -a slice=15 &
nohup scrapy crawl urlMonitor -a slice=16 &
nohup scrapy crawl urlMonitor -a slice=17 &
nohup scrapy crawl urlMonitor -a slice=18 &
nohup scrapy crawl urlMonitor -a slice=19 &
nohup scrapy crawl urlMonitor -a slice=20 &
nohup scrapy crawl urlMonitor -a slice=21 &
nohup scrapy crawl urlMonitor -a slice=22 &
nohup scrapy crawl urlMonitor -a slice=23 &
nohup scrapy crawl urlMonitor -a slice=24 &
nohup scrapy crawl urlMonitor -a slice=25 &
nohup scrapy crawl urlMonitor -a slice=26 &
nohup scrapy crawl urlMonitor -a slice=27 &
nohup scrapy crawl urlMonitor -a slice=28 &
nohup scrapy crawl urlMonitor -a slice=29 &
nohup scrapy crawl urlMonitor -a slice=30 &
nohup scrapy crawl urlMonitor -a slice=31 &
nohup scrapy crawl urlMonitor -a slice=32 &
nohup scrapy crawl urlMonitor -a slice=33 &
nohup scrapy crawl urlMonitor -a slice=34 &
nohup scrapy crawl urlMonitor -a slice=35 &
nohup scrapy crawl urlMonitor -a slice=36 &
nohup scrapy crawl urlMonitor -a slice=37 &
nohup scrapy crawl urlMonitor -a slice=38 &
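For reference, here is a minimal sketch of the counting idea described in the question: it counts matching processes with pgrep -fc and polls until fewer than 4 are running. The match pattern follows the question's "nohup scrapy crawl urlMonitor" example (without nohup, since nohup execs the command), and the 5-second poll interval is an arbitrary choice.
#!/bin/bash
# Sketch of the idea above: only launch the next slice while fewer than 4
# matching crawl processes are running. pgrep -fc counts processes whose
# full command line matches the pattern.
for i in {0..38}; do
    while [ "$(pgrep -fc 'scrapy crawl urlMonitor')" -ge 4 ]; do
        sleep 5    # poll interval, chosen arbitrarily
    done
    nohup scrapy crawl urlMonitor -a slice=$i &
done
wait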
Here is a general method that always makes sure there are fewer than 4 jobs running before starting another one (though there can be more than 4 jobs at the same time if a single line starts several jobs at once):
#!/bin/bash
max_nb_jobs=4
commands_file=$1
while IFS= read -r line; do
    while :; do
        mapfile -t jobs < <(jobs -pr)
        ((${#jobs[@]}<max_nb_jobs)) && break
        wait -n
    done
    eval "$line"
done < "$commands_file"
wait
Use this script with your file as the first argument.
How does it work? For each line read, we first make sure that fewer than max_nb_jobs jobs are running, by counting the running jobs (obtained with jobs -pr). If there are max_nb_jobs or more, we wait for the next job to terminate (wait -n) and count the running jobs again. Once fewer than max_nb_jobs are running, we eval the line.
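For example, assuming the script above is saved as throttle.sh (the file names here are only placeholders) and commands.txt lists one command per line, each ending in & just like the list in the question:
chmod +x throttle.sh
./throttle.sh commands.txt    # commands.txt: nohup scrapy crawl urlMonitor -a slice=0 & ...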
Update
Here's a similar script that doesn't use wait -n (which requires Bash 4.3 or newer). Everything seems to work well (tested on Debian with Bash 4.2):
#!/bin/bash
set -m
max_nb_jobs=4
file_list=$1
sleep_jobs() {
    # This function sleeps until there are less than $1 jobs running
    # Make sure that you have set -m before using this function!
    local n=$1 jobs
    while mapfile -t jobs < <(jobs -pr) && ((${#jobs[@]}>=n)); do
        # The read coprocess just blocks; when a child terminates, the SIGCHLD
        # trap fires, unblocking the wait below and feeding the read a line so
        # the coprocess exits and can be re-created on the next iteration.
        coproc read
        trap "echo >&${COPROC[1]}; trap '' SIGCHLD" SIGCHLD
        wait $COPROC_PID
    done
}
while IFS= read -r line; do
    sleep_jobs $max_nb_jobs
    eval "$line"
done < "$file_list"
wait
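For what it's worth, here is a standalone toy version of the coproc/SIGCHLD trick as I read it. This is an illustration only, not part of the original answer; it needs set -m and a Bash recent enough to support coproc.
#!/bin/bash
set -m
sleep 3 &
sleep 1 &
coproc read                                       # coprocess blocks on read
trap "echo >&${COPROC[1]}; trap '' SIGCHLD" SIGCHLD
wait $COPROC_PID                                  # returns once a child exits
echo "one of the sleeps has finished"
wait                                              # clean up the remaining sleep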
If you want 4 commands running at all times, try something like:
max_procs=4
active_procs=0
for proc_num in {0..38}; do
    # If we already have max_procs running, wait for one to finish first
    if ((active_procs >= max_procs)); then
        wait -n
        ((active_procs--))
    fi
    nohup your_cmd_here &
    ((active_procs++))
done
# Wait for all remaining procs to finish
wait
This is a variation on sputnick's answer that keeps up to max_procs jobs running concurrently. As soon as one finishes, it starts the next one. The wait -n command waits for the next process to finish, rather than waiting for all of them (note that wait -n requires Bash 4.3 or newer).
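Applied to the command list from the question, the placeholder command simply becomes the actual crawl, e.g.:
max_procs=4
active_procs=0
for slice in {0..38}; do
    if ((active_procs >= max_procs)); then
        wait -n            # block until any one crawl finishes
        ((active_procs--))
    fi
    nohup scrapy crawl urlMonitor -a slice=$slice &
    ((active_procs++))
done
wait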
Try this:
for i in {0..38}; do
    nohup scrapy crawl urlMonitor -a slice=$i & _pid=$!
    # After every 4th job, wait for that job before starting the next batch
    (( (i + 1) % 4 == 0 )) && wait $_pid
done
From help wait:
wait: wait [-n] [id ...]
Wait for job completion and return exit status.
Waits for each process identified by an ID, which may be a process ID or a
job specification, and reports its termination status. If ID is not
given, waits for all currently active child processes, and the return
status is zero. If ID is a job specification, waits for all processes
in that job's pipeline.
If the -n option is supplied, waits for the next job to terminate and
returns its exit status.
Exit Status:
Returns the status of the last ID; fails if ID is invalid or an invalid
option is given.
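A quick toy illustration of the three forms described in that help text, using sleep in place of the crawl commands:
sleep 5 & pid1=$!
sleep 1 &
wait -n          # returns as soon as the first child exits (here, the sleep 1)
wait "$pid1"     # returns when that specific process exits
wait             # returns when all remaining children have exited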
You can do this easily with GNU parallel, or even with just xargs. To wit:
declare -i i=0
while sleep 1; do
    printf 'slice=%d\n' $((i++))
done | xargs -n1 -P4 nohup scrapy crawl urlMonitor -a
The while loop will run forever; if there is an actual hard limit that you know of, you can just use a for loop instead, like:
for i in {0..100}…
The sleep 1 is also useful because it allows the shell to handle signals more efficiently.
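For completeness, here is a sketch of the GNU parallel variant mentioned above (assuming GNU parallel is installed; -j4 caps the number of concurrent jobs, and {} is replaced by each input value):
seq 0 38 | parallel -j4 nohup scrapy crawl urlMonitor -a slice={}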