Ticket 6344 - Incorrect accounting reported by sacct
Summary: Incorrect accounting reported by sacct
Status: RESOLVED DUPLICATE of ticket 6332
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 18.08.1
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Director of Support
Reported: 2019-01-11 07:58 MST by Steve Ford
Modified: 2019-01-15 14:39 MST
Site: MSU


Description Steve Ford 2019-01-11 07:58:19 MST
We are seeing sacct report CPU usage that is far too high given the job's allocated CPUs and runtime:

sacct -j 6628825 --format=JobID%20,MaxRSS,Elapsed,NCPUS%2,SystemCPU,UserCPU%15,TotalCPU%15
               JobID     MaxRSS    Elapsed NC  SystemCPU         UserCPU        TotalCPU 
-------------------- ---------- ---------- -- ---------- --------------- --------------- 
             6628825              01:25:52  8   11:22:54     30-12:03:29     30-23:26:23 
       6628825.batch   5958536K   01:25:52  8   11:22:54     30-12:03:29     30-23:26:23 
      6628825.extern          0   01:25:52  8  00:00.001       00:00.001       00:00.002

We are using JobAcctGatherType=jobacct_gather/cgroup.

I see there is a similar issue reported in https://bugs.schedmd.com/show_bug.cgi?id=6332 where the recommendation is made to switch to using JobAcctGatherType=jobacct_gather/linux or to set JobAcctGatherFrequency=task=0. We prefer to continue using JobAcctGatherType=jobacct_gather/cgroup.

We will try setting JobAcctGatherFrequency=task=0 to see if it solves this issue for us.
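
As a sanity check on the sacct output above: a job cannot consume more CPU time than Elapsed multiplied by NCPUS. The following is a minimal Python sketch that compares the reported TotalCPU against that ceiling. The helper names (slurm_time_to_seconds, cpu_time_ratio) are hypothetical and not part of Slurm, and the parser assumes only the time layouts that appear in this ticket (D-HH:MM:SS, HH:MM:SS and MM:SS.mmm).

```python
def slurm_time_to_seconds(text: str) -> float:
    """Convert a sacct time field such as '30-23:26:23' or '00:00.001' to seconds.

    Assumption: only the D-HH:MM:SS, HH:MM:SS and MM:SS.mmm layouts are handled.
    """
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Pad missing leading fields so 'MM:SS.mmm' becomes [0, MM, SS.mmm].
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_time_ratio(elapsed: str, ncpus: int, total_cpu: str) -> float:
    """Return TotalCPU divided by the most CPU time the allocation could provide."""
    ceiling = slurm_time_to_seconds(elapsed) * ncpus
    return slurm_time_to_seconds(total_cpu) / ceiling


# Values copied from the 6628825 row of the sacct output in this ticket.
ratio = cpu_time_ratio("01:25:52", 8, "30-23:26:23")
print(f"TotalCPU is {ratio:.1f}x the allocation's ceiling")
```

For job 6628825 the ceiling is 5152 s x 8 = 41216 CPU-seconds, while the reported TotalCPU is 2676383 s, roughly 65 times what the allocation could physically deliver. Any ratio above 1.0 indicates an accounting error rather than real usage.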
Comment 1 Marshall Garey 2019-01-11 09:13:52 MST
This is definitely a duplicate of bug 6332. I'll mark it as such. Feel free to CC yourself on that bug and comment on it.

Also, we fixed several other problems with jobacct_gather/cgroup in commit 5847bd71d0b, which is in 18.08.4, so I also advise upgrading to 18.08.4 if you want to keep using jobacct_gather/cgroup.

https://github.com/schedmd/slurm/commit/5847bd71d0b

*** This ticket has been marked as a duplicate of ticket 6332 ***
Comment 2 Michael Hinton 2019-01-15 14:39:49 MST
Hey Steve,

We just committed a patch that will fix this. See bug 6332.

Thanks,
Michael