Ticket 1685 - Add an sbatch option to block until the batch has finished executing
Summary: Add an sbatch option to block until the batch has finished executing
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 14.11.0
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Moe Jette
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2015-05-21 07:42 MDT by Marios Hadjieleftheriou
Modified: 2015-09-24 09:07 MDT

See Also:
Site: Lion Cave Capital
Version Fixed: 16.05.0-pre1


Attachments
sbatch --wait patch for Slurm v14.08 (7.83 KB, patch)
2015-09-24 09:06 MDT, Moe Jette

Description Marios Hadjieleftheriou 2015-05-21 07:42:25 MDT
By default sbatch submits a script and returns immediately.

Currently, there are two simple ways to know when a job has finished:
1. Notify by email.
2. Run a dependent job with afterany, afternotok, or afterok.

#1 is mostly useful when humans run a batch script.

#2 could be used by a workflow manager, but it has issues. afterok and afternotok cannot be used in practice: if the batch finishes successfully, the afternotok job blocks indefinitely, and if the batch finishes with an error, the afterok job does. afterany works well, but the dependent command then has no way of knowing the exit status of the batch.
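
As a rough illustration of the afterany approach, a workflow manager might chain the two submissions as in the sketch below. The helper names and script arguments are hypothetical; --parsable and --dependency are existing sbatch options.

```python
import subprocess


def parse_job_id(parsable_output: str) -> str:
    """Extract the job id from `sbatch --parsable` output.

    The output is the job id, optionally followed by ';' and a cluster name.
    """
    return parsable_output.strip().split(";")[0]


def dependency_command(job_id: str, script: str, kind: str = "afterany") -> list:
    """Build an sbatch command that runs `script` once `job_id` satisfies `kind`."""
    if kind not in {"afterany", "afterok", "afternotok"}:
        raise ValueError(f"unsupported dependency type: {kind}")
    return ["sbatch", f"--dependency={kind}:{job_id}", script]


def submit_chain(batch_script: str, followup_script: str) -> None:
    """Submit a batch, then a follow-up job that starts once it ends (needs Slurm)."""
    out = subprocess.run(
        ["sbatch", "--parsable", batch_script],
        check=True, capture_output=True, text=True,
    ).stdout
    subprocess.run(dependency_command(parse_job_id(out), followup_script), check=True)
```

Even with this chaining in place, the follow-up script is not handed the exit status of the batch, which is the limitation described above.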

There are other ways to detect that a batch has finished, and all of them rely on some form of polling: polling the job's output files for a signal that each job has completed, or repeatedly calling scontrol show job until the job state changes, and so on. All of these approaches are error-prone and impose a heavy burden on the user.
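
For reference, a polling loop around scontrol show job might look roughly like the sketch below. The function names, the polling interval, and the set of "active" states are assumptions for illustration, and the state list is not exhaustive. Having to get such details right is part of the burden mentioned above.

```python
import re
import subprocess
import time

# Assumed (not exhaustive) set of states in which a job is still in flight.
ACTIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def parse_job_state(scontrol_output: str):
    """Return the JobState value from `scontrol show job` output, or None."""
    match = re.search(r"\bJobState=(\S+)", scontrol_output)
    return match.group(1) if match else None


def is_finished(state) -> bool:
    """Treat any known state outside ACTIVE_STATES as finished."""
    return state is not None and state not in ACTIVE_STATES


def wait_by_polling(job_id: str, interval_s: float = 30.0) -> str:
    """Poll scontrol until the job leaves the active states (needs Slurm)."""
    while True:
        result = subprocess.run(
            ["scontrol", "show", "job", job_id],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            # The job record may already be gone, in which case the state is lost.
            raise RuntimeError(f"scontrol failed: {result.stderr.strip()}")
        state = parse_job_state(result.stdout)
        if is_finished(state):
            return state
        time.sleep(interval_s)
```

Note that the loop returns a state name rather than the exit code of the batch script, and it fails outright if the controller no longer holds a record for the job.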

But there is a simple way to fix this problem: introduce an sbatch parameter that instructs sbatch to block and wait until the batch has finished. The exit code of sbatch can be the exit code of the batch script or, for job arrays, the largest exit code across all jobs.
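
The proposed exit-code rule for job arrays can be written down as a small sketch. The function name is hypothetical, and this only restates the rule suggested here, not how sbatch implements it.

```python
def combined_exit_code(task_exit_codes) -> int:
    """Return the largest exit code among the tasks of a job array."""
    codes = list(task_exit_codes)
    if not codes:
        raise ValueError("at least one exit code is required")
    return max(codes)
```

Under this rule the combined code is zero only if every task in the array exited with zero.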
Comment 1 Moe Jette 2015-09-04 09:51:18 MDT
It was too late to get this into version 15.08, but it will be in the next major release (16.05, May 2016) and this patch will apply cleanly to version 15.08 if you care to use it.
https://github.com/SchedMD/slurm/commit/6638cafa62ab93e92eb9623449999d55a160bddc
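
Assuming the option is spelled --wait, as the title of the attached patch suggests, a workflow manager could use it roughly as sketched below. The helper names and the script argument are hypothetical.

```python
import subprocess


def wait_command(script: str) -> list:
    """Build an sbatch command that blocks until the submitted job ends."""
    return ["sbatch", "--wait", script]


def run_and_wait(script: str) -> int:
    """Submit `script`, block until it ends, and return sbatch's exit code.

    Requires a Slurm installation whose sbatch supports --wait.
    """
    return subprocess.run(wait_command(script)).returncode
```

A caller would then branch on the value returned by run_and_wait instead of polling or chaining dependent jobs.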
Comment 2 Rodney Mach 2015-09-24 06:28:57 MDT
Could I get a patch against 14.11.9? We also need this.
Comment 3 Moe Jette 2015-09-24 09:06:37 MDT
Created attachment 2246
sbatch --wait patch for Slurm v14.08
Comment 4 Moe Jette 2015-09-24 09:07:14 MDT
(In reply to Rodney Mach from comment #2)
> Could I get a patch against 14.11.9? We also need this.

Done, see the attachment.