We are seeing sacct report CPU usage that is far too high given the job's allocated CPUs and runtime:

sacct -j 6628825 --format=JobID%20,MaxRSS,Elapsed,NCPUS%2,SystemCPU,UserCPU%15,TotalCPU%15

               JobID     MaxRSS    Elapsed NC  SystemCPU         UserCPU        TotalCPU
-------------------- ---------- ---------- -- ---------- --------------- ---------------
             6628825              01:25:52  8   11:22:54     30-12:03:29     30-23:26:23
       6628825.batch   5958536K   01:25:52  8   11:22:54     30-12:03:29     30-23:26:23
      6628825.extern          0   01:25:52  8  00:00.001       00:00.001       00:00.002

We are using JobAcctGatherType=jobacct_gather/cgroup.

I see there is a similar issue reported in https://bugs.schedmd.com/show_bug.cgi?id=6332, where the recommendation is to switch to JobAcctGatherType=jobacct_gather/linux or to set JobAcctGatherFrequency=task=0.

We prefer to continue using JobAcctGatherType=jobacct_gather/cgroup, so we will try setting JobAcctGatherFrequency=task=0 to see if it solves this issue for us.
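To make the size of the discrepancy concrete, here is a minimal Python sketch that compares the reported TotalCPU against the most CPU time the job could have used (NCPUS multiplied by Elapsed). It is only an illustration and not part of Slurm: the helper names slurm_time_to_seconds and cpu_time_ceiling are hypothetical, and the parser assumes the time strings look like the ones in the sacct output above ([DD-]HH:MM:SS or MM:SS.mmm).

```python
def slurm_time_to_seconds(text):
    """Convert a time string such as '30-23:26:23' or '00:00.001' to seconds.

    Hypothetical helper; assumes the [DD-]HH:MM:SS or MM:SS.mmm layouts
    seen in the sacct output quoted in this report.
    """
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Pad missing leading fields (e.g. MM:SS.mmm has no hours field).
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_time_ceiling(elapsed, ncpus):
    """Upper bound on CPU seconds: every allocated CPU busy for the whole run."""
    return ncpus * slurm_time_to_seconds(elapsed)


# Values copied from the 6628825.batch row above.
ceiling = cpu_time_ceiling("01:25:52", 8)
reported = slurm_time_to_seconds("30-23:26:23")
ratio = reported / ceiling

print(f"ceiling:  {ceiling:.0f} s")
print(f"reported: {reported:.0f} s")
print(f"reported / ceiling = {ratio:.1f}")
```

With 8 CPUs and an elapsed time of 01:25:52, the ceiling is 41216 seconds (11:26:56), while the reported TotalCPU is 2676383 seconds, roughly 65 times more than the job could have consumed.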
This is definitely a duplicate of bug 6332. I'll mark it as such. Feel free to CC yourself on that bug and comment on it.

Also, we fixed several other problems with jobacct_gather/cgroup in commit 5847bd71d0b, which is in 18.08.4, so I also advise upgrading to 18.08.4 if you want to keep using jobacct_gather/cgroup.

https://github.com/schedmd/slurm/commit/5847bd71d0b

*** This ticket has been marked as a duplicate of ticket 6332 ***
Hey Steve,

We just committed a patch that will fix this. See bug 6332.

Thanks,
Michael