Ticket 12017 - Incorrect output directory fails silently
Summary: Incorrect output directory fails silently
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: - Unsupported Older Versions
Hardware: Linux Linux
Importance: --- 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2021-07-13 03:31 MDT by dries.boers
Modified: 2021-08-25 08:09 MDT

See Also:
Site: -Other-
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: UPPMAX
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description dries.boers 2021-07-13 03:31:56 MDT
From mailing list:

-------- Forwarded Message --------
Subject: 	Bug: incorrect output directory fails silently
Date: 	Thu, 8 Jul 2021 16:51:43 +0200
From: 	Dries Boers <dries.boers@slu.se>
To: 	slurm-users@lists.schedmd.com


Dear Slurm users,

I want to discuss a bug that has troubled me several times. I could not find a discussion of it through either Google or DDG, although it is mentioned [here](https://github.com/snakemake/snakemake/issues/134#issuecomment-561185825) in a Snakemake issue.


When scheduling the following file through sbatch:

```
#!/bin/bash
#SBATCH --output=dir/slurm-out.log

echo "Hello world"
```

If `dir/` does not exist, Slurm fails **silently** after queuing. I emphasize "silently" because this behaviour is unexpected to me.

I understand that there is no output file to write an error message to, but it might be good to check the `--output` path during scheduling, just as `--account` is checked.

Does anybody know a workaround to be warned about the error?


With kind regards,
Dries Boers
PhD candidate at Swedish University of Agricultural Sciences
Comment 1 dries.boers 2021-07-13 03:37:40 MDT
Reactions via the mailing list:

-------- Forwarded Message --------
Subject: 	Re: [slurm-users] Bug: incorrect output directory fails silently
Date: 	Thu, 8 Jul 2021 10:58:53 -0400
From: 	Jeffrey T Frey <frey@udel.edu>
To: 	Slurm User Community List <slurm-users@lists.schedmd.com>

I would make a feature request of SchedMD to fix the issue, then I would write a cli_filter plugin to validate the --output/--error/--input paths as desired until Slurm itself handles it.



-------- Forwarded Message --------
Subject: 	Re: [slurm-users] Bug: incorrect output directory fails silently
Date: 	Thu, 8 Jul 2021 17:51:42 +0200
From: 	Marcus Boden <mboden@gwdg.de>
To: 	slurm-users@lists.schedmd.com

I have already answered tons of tickets about this from users who are confused that their job silently fails.
The problem is that you cannot solve this with a job_submit or cli_filter plugin, because at submission time you do not know the state of the file system at job runtime, or even on the node where the job ends up running.

At least slurmd logs an error, so you could scan the logs for this error and maybe use that to automate something.

Best,
Marcus



-------- Forwarded Message --------
Subject: 	Re: [slurm-users] Bug: incorrect output directory fails silently
Date: 	Thu, 8 Jul 2021 17:10:28 +0100
From: 	Killian Murphy <killian.murphy@york.ac.uk>
Reply-To: 	Slurm User Community List <slurm-users@lists.schedmd.com>
To: 	Slurm User Community List <slurm-users@lists.schedmd.com>

You can't know the file system state at job runtime, but you can catch the case where the output path can't be resolved at job submission time - I expect this will catch the majority of issues (we also see this come up fairly regularly!).
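
A submission-time check of this kind could be sketched as a small shell wrapper, shown here only as an illustration and not as anything Slurm or SchedMD provides. The function names `sbatch_output_path` and `check_output_dir` are hypothetical. The sketch assumes the output path is given as an `#SBATCH --output` (or `-o`) directive inside the script, that the last such directive is the relevant one, and that relative paths are resolved against the directory the job is submitted from. It skips paths whose directory part contains `%` filename patterns, and it cannot tell what the file system will look like on the compute node at runtime.

```shell
#!/bin/bash
# Hypothetical helper: print the path given to --output/-o in the script's
# #SBATCH directives (prints nothing if no such directive is present).
# Assumption: if several directives are present, the last one is used.
sbatch_output_path() {
    sed -nE 's/^#SBATCH[[:space:]]+(--output|-o)[=[:space:]]+([^[:space:]]+).*$/\2/p' "$1" | tail -n 1
}

# Hypothetical helper: return 0 if the directory that would hold the output
# file exists and is writable on the submission host, 1 otherwise.
check_output_dir() {
    local script=$1 out dir
    out=$(sbatch_output_path "$script")
    # No --output directive in the script: nothing to check.
    [ -z "$out" ] && return 0
    dir=$(dirname -- "$out")
    case $dir in
        # Filename patterns such as %j or %u in the directory part are only
        # known once the job exists, so this sketch does not try to resolve them.
        *%*) return 0 ;;
    esac
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "error: output directory '$dir' does not exist or is not writable" >&2
        return 1
    fi
}
```

With these functions sourced, `check_output_dir job.sh && sbatch job.sh` submits only when the check passes. Options passed on the `sbatch` command line are not inspected by this sketch.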
Comment 2 dries.boers 2021-07-13 03:49:08 MDT
When creating this report, multiple similar reports were pointed out to me:

* [Bug 2661 - Missing output file](https://bugs.schedmd.com/show_bug.cgi?id=2661)
* [Bug 3508 - sbatch --output expands path options (%) in cwd leading to job failure](https://bugs.schedmd.com/show_bug.cgi?id=3508)
* [Bug 8895 - Slurm job output to non-existent directory result into silent job failure](https://bugs.schedmd.com/show_bug.cgi?id=8895)
* [Bug 9236 - sbatch fails if --output/--error are specified and submitted under a directory that contains percent "%"](https://bugs.schedmd.com/show_bug.cgi?id=9236)

However, I did not find them when searching online, as I mentioned in my original email.
Comment 3 dries.boers 2021-07-13 04:02:25 MDT
As an outcome of this bug report, I would appreciate it if a solution were implemented or an easy workaround described. I am also writing this to increase the visibility of this bug. As Marcus Boden's mention of "tons of tickets" shows, I am not the only one to have fallen for this. Especially for less experienced users, this leads to frustration, and that's not good for Slurm.