Ticket 6000 - sacct -X on job arrays shows all jobs instead of summary
Summary: sacct -X on job arrays shows all jobs instead of summary
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 18.08.3
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Director of Support
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2018-11-08 01:46 MST by Jurij Pečar
Modified: 2019-01-09 08:36 MST

See Also:
Site: EMBL
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Description Jurij Pečar 2018-11-08 01:46:44 MST
Hi,

Up to and including Slurm 17.11, sacct -X reported a job summary on a single line, even for array jobs. We relied on that to develop scripts that are executed at job end to tell users how well their jobs did.

Now, with 18.08, sacct -X on a job array shows all array elements, one per line. As a consequence, our script is broken, and so is seff from contribs.

Is this an intentional change and our script needs to be fixed, or is this a bug?
Comment 4 Michael Hinton 2018-11-14 17:30:53 MST
Hi Jurij,

Can you paste the output you are seeing with Slurm 18.08 and 17.11? I have not been able to reproduce this; in both versions, the pending tasks in the array don't expand.

E.g. after running `sbatch --array=0-15 --wrap x-seconds.sh` on my machine,

In 18.08:
$ sacct -X
...
6148_[10-15]       wrap hintron-d+       root          1    PENDING      0:0 
6148_0             wrap hintron-d+       root          1  COMPLETED      0:0 
6148_1             wrap hintron-d+       root          1  COMPLETED      0:0 
6148_2             wrap hintron-d+       root          1  COMPLETED      0:0 
6148_3             wrap hintron-d+       root          1  COMPLETED      0:0 
6148_4             wrap hintron-d+       root          1  COMPLETED      0:0 
6148_5             wrap hintron-d+       root          1    RUNNING      0:0 
6148_6             wrap hintron-d+       root          1    RUNNING      0:0 
6148_7             wrap hintron-d+       root          1    RUNNING      0:0 
6148_8             wrap hintron-d+       root          1    RUNNING      0:0 
6148_9             wrap hintron-d+       root          1    RUNNING      0:0 

In 17.11:
$ sacct -X
...
816_[4-15]         wrap      debug       root          1    PENDING      0:0 
816_0              wrap      debug       root          1  COMPLETED      0:0 
816_1              wrap      debug       root          1  COMPLETED      0:0 
816_2              wrap      debug       root          1    RUNNING      0:0 
816_3              wrap      debug       root          1    RUNNING      0:0 

The batch steps are omitted, but no array expansion takes place anywhere. 

-X omits step info, but the documentation doesn't mention anything about it summarizing job arrays.

Thanks,
-Michael
Comment 5 Jurij Pečar 2018-11-15 01:48:01 MST
Our script runs sacct --jobs=$jobid --allocations --parsable2 --format JobName,NodeList,Partition,User,Account,ExitCode,Timelimit,NCPUS,NNodes,ReqMem,Submit,Start,End,Elapsed,State,Comment

In Slurm 17.11 this returned one line if $jobid was an array job. Now, in 18.08, it returns a line for each array element.

Based on your comment, can I assume that this was a bug in 17.11 that is now fixed in 18.08? :)

Still, for the purpose of reporting to users how well their jobs did, I would like the ability to summarize a whole job array as one job in terms of CPUs and memory allocated and used.

If you check contribs, the seff script there also depended on this functionality and is now broken. I see various weird behaviours from it, ranging from "Badly formatted array jobid <array job> with task_id = -2" to reporting 0 for CPU or memory usage. So something definitely changed from 17 to 18 in how accounting data is presented.
Comment 6 Michael Hinton 2018-12-19 16:57:49 MST
(In reply to Jurij Pečar from comment #5)
Hi Jurij, sorry for the delayed response.

> Our script does sacct --jobs=$jobid --allocations --parsable2 --format
> JobName,NodeList,Partition,User,Account,ExitCode,Timelimit,NCPUS,NNodes,
> ReqMem,Submit,Start,End,Elapsed,State,Comment
OK. I ran this command and got the same results as in my previous comment.
 
> And in Slurm 17.11 this returned one line if jobid was an array job. Now in
> 18 it returns a line for each array element.
In both 17.11 and 18.08 (and probably before), this is only true if the array job is pending. Once any job in the array starts running, that job will get its own job ID.

The way job arrays are implemented (since pre-17.02) is that the array job is a ‘meta’ job or placeholder for all the jobs in the array. Once a job is started, it leaves the meta job and becomes its own ‘real’ job with a separate job ID.

To see this in action, simply add “JobId,JobIdRaw” to --format in your sacct command. You’ll see that they all have different job ids.
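
For example, something like this (with your own array job ID substituted for <array_jobid>) should make it visible:

$ sacct -j <array_jobid> --allocations --parsable2 --format=JobID,JobIDRaw,State

Every element that has already started shows its own JobIDRaw, while the still-pending remainder stays collapsed in the meta job record.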

If you still aren’t convinced, then would you mind showing me the output to prove that 17.11 returned one line for a running array job, and that it wasn’t pending?

In squeue, there is an “--array-unique” option to show one unique pending job array element per line. Maybe you are thinking of that?

> Based on your comment can I assume that this was a bug in 17 that is now
> fixed in 18? :)
No, sorry; I was trying to show that in both 17.11 and 18.08 the job arrays break out into individual jobs when they are running.

> Still, for the purpose of reporting to the user how well their jobs did, I
> would like to have an ability to summarize whole job array as one job in
> terms of cpus and memory allocated and used.
I don’t think Slurm can do that natively.
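
(If it helps, a rough aggregation outside of sacct itself might look something like this; treat it as a sketch, with $array_jobid as a placeholder, and note that CPUTimeRAW counts allocated core-seconds rather than what was actually used:)

$ sacct -j $array_jobid -X --noheader --parsable2 --format=CPUTimeRAW,ElapsedRaw | \
    awk -F'|' '{cpu+=$1; wall+=$2} END {print "allocated core-seconds:", cpu, "wall-seconds:", wall}'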

> If you check contribs, seff script there also depended on this functionality
> and is now broken. I see various weird behaviours from it, varying from
> "Badly formatted array jobid <array job> with task_id = -2" to reporting 0
> for cpu or memory usage. So something definitely changed from 17 to 18 in
> how accounting data is presented.
We don't officially support anything in contribs/, including the seff script. However, looking at lines 110-120, I don't see anything wrong: it's iterating over jobs, not job arrays. Each job can optionally be part of a job array, and if so, it will have an array_job_id value. You'll have to show me the incorrect output.

Let me know if that answers your questions.
Thanks,
-Michael
Comment 7 Jurij Pečar 2018-12-20 02:20:50 MST
OK, thanks for the input. In January I'll do our review and stats for 2018 and then take a closer look at what exactly is going on. I will come back on this in a month or so.
Comment 8 Michael Hinton 2018-12-20 11:19:33 MST
Great. Please feel free to open this back up when you do.

Thanks,
Michael
Comment 9 Jurij Pečar 2019-01-07 01:58:18 MST
Just one example of what annoys me:

$ seff -d 24595250                                                                                                                                              
Slurm data: JobID ArrayJobID User Group State Clustername Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
Slurm data: 24595250  pecar users COMPLETED hof 4 1 0 8388608 0 34925 659 0 0

Job ID: 24595250
Cluster: hof
User/Group: pecar/users
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 09:42:05
CPU Efficiency: 1324.92% of 00:43:56 core-walltime
Job Wall-clock time: 00:10:59
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 8.00 GB (2.00 GB/core)

This isn't even an array job, just a regular job that compiled some software. Notice the discrepancy between wall-clock time and CPU time, which results in an impossible CPU efficiency figure.

This started happening with the upgrade to 18.08.

Also, the memory numbers in the Slurm DB went to 0 with that upgrade.

With a wall-clock time of 659 seconds, the maximum possible CPU time on 4 cores would be 2636 seconds. I've no clue how Slurm came to 34925 seconds ... but it could be that the number is multiplied by the number of cores a few times too many? 4*4*4*659 = 42176 is the first such product larger than 34925, and dividing the two gives 82.8% efficiency, which is what I would expect from a compilation job.
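
(For reference, that arithmetic can be reproduced with bc:)

$ echo "4*659" | bc
2636
$ echo "scale=1; 100*34925/(4*4*4*659)" | bc
82.8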


We use JobAcctGatherType=jobacct_gather/cgroup.
Comment 10 Jurij Pečar 2019-01-09 05:34:11 MST
I see now there's another bug #6095 describing the same issue, as well as a thread on the mailing list, so it's not just us observing this problem. Since it's impacting our statistics, I'd like two things:
* a workaround (switching to jobacct_gather/linux? see the sketch after this list)
* an understanding of how this came to be, so we can identify and fix the wrong values in the database
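
(To be explicit, the workaround I have in mind would be a slurm.conf change along these lines; this is just a sketch of the idea, not something we have tested yet:)

# slurm.conf: switch the accounting gather plugin (the change being asked about;
# plugin changes generally require restarting the Slurm daemons)
#JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherType=jobacct_gather/linux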

Thanks.
Comment 11 Michael Hinton 2019-01-09 08:36:09 MST
Hey Jurij,

That does seem like a problem, and I'll look into it. But could you please file a new bug? This has turned into a completely different issue than the one originally reported.

If the original issue has not been resolved, feel free to reopen this.

Thanks,
Michael