Summary: | Incorrect output directory fails silently | ||
---|---|---|---|
Product: | Slurm | Reporter: | dries.boers |
Component: | Scheduling | Assignee: | Jacob Jenson <jacob> |
Status: | RESOLVED INVALID | QA Contact: | |
Severity: | 6 - No support contract | ||
Priority: | --- | CC: | bryank+slurm, kilian, lyeager, mboden |
Version: | - Unsupported Older Versions | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | -Other- | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA SIte: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | UPPMAX | Linux Distro: | --- |
Machine Name: | CLE Version: | ||
Version Fixed: | Target Release: | --- | |
DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
dries.boers
2021-07-13 03:31:56 MDT
Reactions via the mailing list:

-------- Forwarded Message --------
Subject: Re: [slurm-users] Bug: incorrect output directory fails silently
Date: Thu, 8 Jul 2021 10:58:53 -0400
From: Jeffrey T Frey <frey@udel.edu>
To: Slurm User Community List <slurm-users@lists.schedmd.com>

I would make a feature request of SchedMD to fix the issue, then I would write a cli_filter plugin to validate the --output/--error/--input paths as desired until Slurm itself handles it.

-------- Forwarded Message --------
Subject: Re: [slurm-users] Bug: incorrect output directory fails silently
Date: Thu, 8 Jul 2021 17:51:42 +0200
From: Marcus Boden <mboden@gwdg.de>
To: slurm-users@lists.schedmd.com

I already answered tons of tickets due to this, when our users are confused that the job silently fails. The problem is, you cannot solve this with a job_submit or cli_filter, as you do not know the situation of the file system at job runtime. Or even on the node in the end. At least the slurmd gives an error, so you could scan the logs for this error and maybe use that to automate something.

Best,
Marcus

-------- Forwarded Message --------
Subject: Re: [slurm-users] Bug: incorrect output directory fails silently
Date: Thu, 8 Jul 2021 17:10:28 +0100
From: Killian Murphy <killian.murphy@york.ac.uk>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
To: Slurm User Community List <slurm-users@lists.schedmd.com>

You can't know the file system state at job runtime, but you can catch the case where the output path can't be resolved at job submission time - I expect this will catch the majority of issues (we also see this come up fairly regularly!).
When creating this report, multiple similar reports were pointed out to me:

* [Bug 2661 - Missing output file](https://bugs.schedmd.com/show_bug.cgi?id=2661)
* [Bug 3508 - sbatch --output expands path options (%) in cwd leading to job failure](https://bugs.schedmd.com/show_bug.cgi?id=3508)
* [Bug 8895 - Slurm job output to non-existent directory result into silent job failure](https://bugs.schedmd.com/show_bug.cgi?id=8895)
* [Bug 9236 - sbatch fails if --output/--error are specified and submitted under a directory that contains percent "%"](https://bugs.schedmd.com/show_bug.cgi?id=9236)

However, I did not find those when searching online, as I mention in my original email.

As an outcome of this bug report, I would appreciate it if a solution were implemented or an easy workaround were described. I also write this to increase the visibility of this bug. As Marcus Boden's "tons of tickets" shows, I am not the only one to have fallen for this. Especially for less experienced users, this leads to frustration, and that's not good for Slurm.
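As one possible workaround along the lines Killian Murphy suggests, the output path could be checked on the submit host before calling sbatch. The following is only a minimal sketch, not part of Slurm: the function name `check_output_path` and its `workdir` parameter are hypothetical, and it assumes the submit host sees the same file system as the compute node. As Marcus Boden points out, it cannot know the state of the file system at job runtime. Directories that contain a `%` replacement symbol (for example `%j`) are skipped, since they can only be resolved once the job runs.

```python
import os
import re

# Slurm replacement symbols such as %j or %x are only expanded at run time
# (an optional padding width may sit between % and the letter, e.g. %4j).
_PATTERN = re.compile(r"%\d*[A-Za-z]")


def check_output_path(path, workdir="."):
    """Return a problem description for `path`, or None if it looks usable.

    Hypothetical helper: only the directory part is checked, and only as
    seen from the host where this function runs.
    """
    # Relative paths are resolved against the job's working directory;
    # an absolute `path` is left unchanged by os.path.join.
    full = os.path.join(workdir, path)
    directory = os.path.dirname(full) or "."
    if _PATTERN.search(directory):
        return None  # cannot be resolved before the job runs
    if not os.path.isdir(directory):
        return f"output directory does not exist: {directory}"
    if not os.access(directory, os.W_OK | os.X_OK):
        return f"output directory is not writable: {directory}"
    return None
```

A submission wrapper could call this for the values passed to --output and --error and refuse to submit when a problem description is returned, so that the user sees an error message instead of a silently failing job.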