From mailing list:

-------- Forwarded Message --------
Subject: Bug: incorrect output directory fails silently
Date: Thu, 8 Jul 2021 16:51:43 +0200
From: Dries Boers <dries.boers@slu.se>
To: slurm-users@lists.schedmd.com

Dear Slurm users,

I wanted to discuss a bug with you, which has troubled me several times. I could not find a discussion about it through Google nor DDG, although it is mentioned [here](https://github.com/snakemake/snakemake/issues/134#issuecomment-561185825) in a Snakemake Issue.

When scheduling the following file through sbatch:

```
#SBATCH --output=dir/slurm-out.log
echo "Hello world"
```

If `dir/` does not exist, Slurm fails **silently** after queuing. 'Silently' is bold there, because this is unexpected behaviour to me. I understand that there is no output file to write an error message to, but it might be good to check the `--output` path during the scheduling, just like `--account` is checked.

Does anybody know a workaround to be warned about the error?

With kind regards,

Dries Boers
PhD candidate at Swedish University of Agricultural Sciences
Reactions via the mailing list:

-------- Forwarded Message --------
Subject: Re: [slurm-users] Bug: incorrect output directory fails silently
Date: Thu, 8 Jul 2021 10:58:53 -0400
From: Jeffrey T Frey <frey@udel.edu>
To: Slurm User Community List <slurm-users@lists.schedmd.com>

I would make a feature request of SchedMD to fix the issue, then I would write a cli_filter plugin to validate the --output/--error/--input paths as desired until Slurm itself handles it.

-------- Forwarded Message --------
Subject: Re: [slurm-users] Bug: incorrect output directory fails silently
Date: Thu, 8 Jul 2021 17:51:42 +0200
From: Marcus Boden <mboden@gwdg.de>
To: slurm-users@lists.schedmd.com

I already answered tons of tickets due to this, when our users are confused, that the job silently fails. The problem is, you cannot solve this with a job_submit or cli_filter, as you do not know the situation of the file system at job runtime. Or even on the node in the end.

At least the slurmd gives an error, so you could scan the logs for this error and maybe use that to automate something.

Best,
Marcus

-------- Forwarded Message --------
Subject: Re: [slurm-users] Bug: incorrect output directory fails silently
Date: Thu, 8 Jul 2021 17:10:28 +0100
From: Killian Murphy <killian.murphy@york.ac.uk>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
To: Slurm User Community List <slurm-users@lists.schedmd.com>

You can't know the file system state at job runtime, but you can catch the case where the output path can't be resolved at job submission time - I expect this will catch the majority of issues (we also see this come up fairly regularly!).
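Until Slurm handles this itself, the submission-time check suggested in the replies can be approximated with a small wrapper on the user side. The following is only a sketch under several assumptions: `check_sbatch_dirs` is a hypothetical helper name (not a Slurm command), it only recognises the long `#SBATCH --output=PATH` and `#SBATCH --error=PATH` forms inside the script, it does not expand filename patterns such as `%j` in the directory part, and it resolves relative paths against the current directory, so it has to be run from the directory the job is submitted from. As Marcus Boden points out, it also cannot know the state of the file system on the compute node at runtime.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submission check; this is NOT part of Slurm.
# Reports every --output/--error directory named in a batch script that
# does not exist on the submitting host. Returns 1 if any is missing.
check_sbatch_dirs() {
    local script=$1 line path dir rc=0
    # Assumption: only the long "--output=PATH" / "--error=PATH" forms are handled.
    local re='^#SBATCH[[:space:]]+--(output|error)=([^[:space:]]+)'
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue
        path=${BASH_REMATCH[2]}
        # Relative paths are checked against the current working directory.
        dir=$(dirname -- "$path")
        if [[ ! -d $dir ]]; then
            echo "missing directory for --${BASH_REMATCH[1]}: $dir" >&2
            rc=1
        fi
    done < "$script"
    return "$rc"
}
```

With the function loaded in the shell, `check_sbatch_dirs job.sh && sbatch job.sh` submits the job only when the directories are present at submission time, which covers the `dir/slurm-out.log` case from the original email.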
When creating this report, multiple similar reports were pointed out to me:

* [Bug 2661 - Missing output file](https://bugs.schedmd.com/show_bug.cgi?id=2661)
* [Bug 3508 - sbatch --output expands path options (%) in cwd leading to job failure](https://bugs.schedmd.com/show_bug.cgi?id=3508)
* [Bug 8895 - Slurm job output to non-existent directory result into silent job failure](https://bugs.schedmd.com/show_bug.cgi?id=8895)
* [Bug 9236 - sbatch fails if --output/--error are specified and submitted under a directory that contains percent "%"](https://bugs.schedmd.com/show_bug.cgi?id=9236)

However, I did not find these when searching online, as I mention in my original email.
As an outcome of this bug report, I would appreciate it if a solution were implemented or an easy workaround were described. I am also writing this to increase the visibility of the bug. As Marcus Boden's "tons of tickets" remark shows, I am not the only one to have fallen for this. Especially for less experienced users, this leads to frustration, and that's not good for Slurm.