Ticket 2489 - Modify how sacct shows a COMPLETED job if any of its steps didn't complete successfully
Status: RESOLVED DUPLICATE of ticket 3214
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 16.05.x
Hardware: Linux Linux
Importance: --- 5 - Enhancement
Assignee: Alejandro Sanchez
Reported: 2016-02-25 19:51 MST by Alejandro Sanchez
Modified: 2016-11-02 09:05 MDT
Site: SchedMD

Description Alejandro Sanchez 2016-02-25 19:51:23 MST
This comes from Bug #2429. 

The problem involved a job with one step. The step finished in a CANCELLED state because it exceeded its memory limit, yet the job itself finished in a COMPLETED state.

When the job is queried with sacct -j <jobid>, the output reports the job as COMPLETED, which gives the user the impression that "all went fine" with the job, even though its step was CANCELLED for exceeding memory.

It's true that querying sacct -j <jobid>.<stepid> shows CANCELLED, but sacct -j <jobid> should somehow indicate whether all of the job's steps finished COMPLETED or whether any of them failed.
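
Until sacct reports this itself, a site could approximate the check by post-processing sacct output. The following is only a rough sketch, assuming the text comes from something like sacct -j <jobid> --format=JobID,State --parsable2 --noheader (pipe-separated, one row per job or step). The function name failed_steps and the sample rows (job 1234) are made up for illustration and are not taken from Bug #2429.

```python
def failed_steps(sacct_text):
    """Return {step_id: state} for every step whose state is not COMPLETED.

    Expects pipe-separated "JobID|State" rows, one per line.
    """
    bad = {}
    for line in sacct_text.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, state = line.split("|")[:2]
        if "." not in job_id:
            continue  # job-level row; only step rows carry a ".<stepid>" suffix
        # The State column can carry a suffix such as "CANCELLED by 1000";
        # keep only the leading state word.
        words = state.split()
        state = words[0] if words else ""
        if state != "COMPLETED":
            bad[job_id] = state
    return bad


# Made-up sample resembling the situation described in this ticket.
SAMPLE = """\
1234|COMPLETED
1234.0|CANCELLED by 1000
"""

if __name__ == "__main__":
    print(failed_steps(SAMPLE))
```

An empty result would mean every listed step finished COMPLETED; any entry means the job-level COMPLETED state alone is not telling the whole story.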

Proposals:

a) When querying sacct -j <jobid>, only report state COMPLETED if the job completed and all of its steps also finished in a COMPLETED state.

b) Add an extra field to the sacct -j <jobid> output indicating whether any of the job's steps failed.

c) Open for more proposals...
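
To make proposals (a) and (b) concrete, here is a minimal sketch of the decision logic in Python. It is not sacct code. The function names are hypothetical, and the choice in (a) to surface the first non-COMPLETED step state is an assumption, since the proposal does not say what should be shown instead of COMPLETED.

```python
def displayed_state(job_state, step_states):
    """Proposal (a): show COMPLETED only if the job and all its steps completed."""
    if job_state != "COMPLETED":
        return job_state
    for state in step_states:
        if state != "COMPLETED":
            # Assumption: report the first non-COMPLETED step state.
            return state
    return "COMPLETED"


def any_step_failed(step_states):
    """Proposal (b): value for a hypothetical extra field on the job row."""
    return any(state != "COMPLETED" for state in step_states)


if __name__ == "__main__":
    # The case from this ticket: job COMPLETED, its single step CANCELLED.
    print(displayed_state("COMPLETED", ["CANCELLED"]))
    print(any_step_failed(["CANCELLED"]))
```

With (a) the job row itself changes, which could affect scripts that key on the job state. With (b) the existing state column stays untouched and the new information is opt-in.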