Finish a batch job gracefully before it is killed by the wall time limit

I am running a batch job using SLURM. The process I start in the job file is iterative. After each iteration, the program can be stopped gracefully by creating a file called stop. I would like such a stop command to be issued automatically one hour before the job is killed when it reaches the wall time limit.


1 answer


You can have Slurm send a signal to the job a given amount of time before it reaches its time limit, using the --signal option. From the sbatch man page:

--signal=[B:]<sig_num>[@<sig_time>] When a job is within sig_time seconds of its end time, send it the signal sig_num. Due to the resolution of event handling by Slurm, the signal may be sent up to 60 seconds earlier than specified. sig_num may be either a signal number or a signal name (for example, "10" or "USR1"). sig_time must be an integer value between 0 and 65535. By default, no signal is sent before the job's end time. If a sig_num is specified without any sig_time, the default time will be 60 seconds. Use the "B:" option to signal only the batch shell; none of the other processes will be signaled. By default, all job steps will be signaled, but not the batch shell itself.

If you can modify your program to stop when it catches this signal, rather than looking for the file, that is the best option.



If you cannot, add something like

trap "touch ./stop" SIGUSR1

to your submission script. With --signal=B:SIGUSR1@3600, the script will catch the signal SIGUSR1 and create the file stop one hour before the allocation ends.
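One caveat with the trap approach: a shell runs a trap handler only after the current foreground command has finished, so the program should be started in the background and waited for. The following is a minimal sketch of such a submission script, not a definitive implementation. The --time value and the iterate function are assumptions for illustration; iterate is a hypothetical stand-in for the real iterative program. When run outside Slurm (SLURM_JOB_ID unset), the sketch sends itself SIGUSR1 after two seconds so the behaviour can be tried locally.

```shell
#!/bin/bash
#SBATCH --time=04:00:00        # example wall time (assumption)
#SBATCH --signal=B:USR1@3600   # signal the batch shell about 1 h before the limit

# On SIGUSR1, create the stop file that the program checks for.
trap 'touch ./stop' USR1

# Hypothetical stand-in for the real program: one "iteration" per second,
# exiting after the iteration in which ./stop appears.
iterate() {
    while [ ! -e ./stop ]; do
        sleep 1
    done
}

rm -f ./stop
iterate &                      # background, so the shell can run the trap
pid=$!

# Outside Slurm, simulate the early-warning signal for a local trial.
if [ -z "$SLURM_JOB_ID" ]; then
    ( sleep 2; kill -USR1 $$ ) &
fi

# wait returns early when a trapped signal arrives, so keep waiting
# until the program has really exited.
while kill -0 "$pid" 2>/dev/null; do
    wait "$pid"
done
```

In a real job, replace the iterate function and the line that starts it with the actual program invocation followed by &.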

Note that only recent versions of Slurm support the B: option in --signal. If your version does not have it, you will need to set up a watchdog. Examples are here.
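As a rough illustration of the watchdog idea (this is an assumption about what such a watchdog could look like, not the examples the answer linked to), a background subshell can sleep for the wall time minus one hour and then create the stop file. The helper name start_watchdog and the 4-hour wall time in the usage comment are hypothetical.

```shell
#!/bin/bash

# start_watchdog DELAY FILE
# After DELAY seconds, create FILE. Runs in the background; the PID of the
# background subshell is available in $! right after the call.
start_watchdog() {
    ( sleep "$1" && touch "$2" ) &
}

# Usage inside a job script with a 4-hour limit (assumption):
#   start_watchdog $(( 4*3600 - 3600 )) ./stop
#   ./my_program        # hypothetical program name
```

The delay is counted from the moment the script starts, so this only approximates "one hour before the limit" if the program is launched right away.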
