By default, sbatch submits a script and returns immediately. Currently, there are two simple ways to know when a job has finished:

1. Notify by email.
2. Run a dependent job with afterany, afternotok, or afterok.

#1 is mostly useful when a human runs the batch script. #2 could be used by a workflow manager, but it has issues. First, afterok and afternotok cannot be used in practice: if the batch finishes successfully, the afternotok job blocks indefinitely, and vice versa if the batch finishes with an error. afterany works well, but the command it invokes does not know the exit status of the batch.

There are other ways to detect that a batch has finished, but all of them rely on some sort of polling: polling the job's output files for a signal that each job has completed, polling scontrol show job until the job state changes, and so on. All of these approaches are error-prone and impose a heavy burden on the user.

There is a simple way to fix this problem: introduce an sbatch parameter that instructs sbatch to block until the batch has finished. The exit code of sbatch can then be the exit code of the batch script or, for job arrays, the largest exit code across all jobs.
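For illustration, the scontrol polling approach mentioned above might look roughly like the following. This is only a sketch: the helper names, the polling interval, and the set of terminal states (a subset of Slurm's job states) are assumptions, and it relies on the JobState= and ExitCode= fields printed by scontrol show job.

```python
import re
import subprocess
import time

# Assumed subset of Slurm job states after which the job will not run again.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL"}


def parse_job_state(scontrol_output):
    """Extract (state, exit_code) from `scontrol show job` text.

    Returns (None, None) for any field that cannot be found.
    """
    state = re.search(r"\bJobState=(\S+)", scontrol_output)
    code = re.search(r"\bExitCode=(\d+):(\d+)", scontrol_output)
    return (state.group(1) if state else None,
            int(code.group(1)) if code else None)


def wait_for_job(job_id, interval=30.0):
    """Poll scontrol until the job reaches a terminal state; return its exit code."""
    while True:
        result = subprocess.run(
            ["scontrol", "show", "job", str(job_id)],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            # The job record may already have been purged; the status is lost.
            raise RuntimeError("job %s is no longer known to scontrol" % job_id)
        state, exit_code = parse_job_state(result.stdout)
        if state in TERMINAL_STATES:
            return exit_code
        time.sleep(interval)
```

Even this small loop has to parse text output, pick a polling interval, and cope with the job record disappearing between polls, which is the kind of burden described above.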
It was too late to get this into version 15.08, but it will be in the next major release (16.05, May 2016). The patch will apply cleanly to version 15.08 if you care to use it:
https://github.com/SchedMD/slurm/commit/6638cafa62ab93e92eb9623449999d55a160bddc
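As a rough sketch of how a workflow manager could use the blocking behaviour once it is available, assuming the option is spelled --wait (as in the patch title in this thread) and that sbatch exits with the batch script's exit code as requested above. The helper names build_wait_command and run_and_wait are hypothetical.

```python
import subprocess


def build_wait_command(script, extra_args=()):
    """Build the argv for a blocking submission (hypothetical helper)."""
    return ["sbatch", "--wait", *extra_args, script]


def run_and_wait(script, extra_args=()):
    """Submit the script, block until it finishes, and return sbatch's exit code."""
    return subprocess.run(build_wait_command(script, extra_args)).returncode
```

A caller would then branch on the return value of run_and_wait("job.sh") directly, with no dependent job or polling loop.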
Could I get a patch against 14.11.9? We also need this.
Created attachment 2246: sbatch --wait patch for Slurm v14.08
(In reply to Rodney Mach from comment #2)
> Could I get a patch against 14.11.9? We also need this.

Done, see attachment.