Ticket 6072 - error: acct_policy_handle_accrue_time: for specific account
Summary: error: acct_policy_handle_accrue_time: for specific account
Status: RESOLVED DUPLICATE of ticket 6016
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 18.08.3
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Felip Moll
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2018-11-21 09:01 MST by Cineca HPC Systems
Modified: 2018-11-22 02:43 MST

See Also:
Site: Cineca


Description Cineca HPC Systems 2018-11-21 09:01:13 MST
Hi,

we're seeing a lot of these messages in the slurmctld logs:

Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: Assoc 11448(iscrb_comred/lmonacel/(null)) accrue_cnt underflow
Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: Assoc 10966(iscrb_comred/(null)/(null)) accrue_cnt underflow
Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: Assoc 204(iscrb/(null)/(null)) accrue_cnt underflow
Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: Assoc 1(root/(null)/(null)) accrue_cnt underflow
Nov 21 16:25:15 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: QOS knl_usr_prod accrue_cnt underflow
Nov 21 16:25:15 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: QOS knl_usr_prod acct iscrb_comred accrue_cnt underflow
Nov 21 16:25:15 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: QOS knl_usr_prod user 30146 accrue_cnt underflow

They seem to be harmless, but we'd like to understand whether some association is misconfigured, since the messages appear only for a few specific accounts that don't seem to differ from the others:

[root@mgmt01 ~]# sacctmgr show associations cluster=marconi 1 204 10966 11448 format=cluster,account%20,share,QOS%50 
   Cluster              Account     Share                                                QOS 
---------- -------------------- --------- -------------------------------------------------- 
   marconi                 root         0                                             normal 
   marconi                iscrb   4734664                                             normal 
   marconi         iscrb_comred     20833          knl_qos_bprod,knl_qos_prod,normal,qos_rcm 
   marconi         iscrb_comred    parent          knl_qos_bprod,knl_qos_prod,normal,qos_rcm 
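
To see at a glance which associations and QOS entries are affected, the log lines above can be grouped by entity. The following is a minimal sketch, assuming the messages keep exactly the format shown in the excerpt; the function name `summarize_underflows` and the regular expression are illustrative and not part of Slurm.

```python
import re
from collections import Counter

# Captures whatever sits between the function name and "accrue_cnt underflow",
# e.g. "Assoc 204(iscrb/(null)/(null))" or "QOS knl_usr_prod user 30146".
# Assumption: the message layout matches the excerpt in this ticket.
UNDERFLOW = re.compile(
    r"acct_policy_handle_accrue_time: (?P<entity>.+?) accrue_cnt underflow"
)


def summarize_underflows(lines):
    """Count accrue_cnt underflow messages per association or QOS entity."""
    counts = Counter()
    for line in lines:
        match = UNDERFLOW.search(line)
        if match:
            counts[match.group("entity")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: "
        "acct_policy_handle_accrue_time: Assoc 204(iscrb/(null)/(null)) "
        "accrue_cnt underflow",
        "Nov 21 16:25:15 r000u17l01 slurmctld[21994]: error: "
        "acct_policy_handle_accrue_time: QOS knl_usr_prod user 30146 "
        "accrue_cnt underflow",
    ]
    for entity, count in summarize_underflows(sample).most_common():
        print(f"{count:6d}  {entity}")
```

In practice the input would be the lines of the slurmctld log file rather than the two sample lines.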


Do you have any suggestions?

Best Regards,
Marcello
Comment 2 Felip Moll 2018-11-22 02:43:06 MST
Hi Marcello,

This is a duplicate of bug 6016.

We already have a fix and plan to commit it next week for 18.08.4.

This happens when job arrays are running and the controller is
reconfigured or restarted; the error may then appear when the job array
finishes or is cancelled.

It is not directly related to any particular association; one thing that
determines whether the error appears is whether you have an accrue limit
set.

The only consequence is that the accrue is not working correctly.
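
To illustrate the kind of failure the message reports, here is a toy model of a guarded counter. It is a minimal sketch and not Slurm's code: the class `AccrueCounter`, the function `simulate`, and the idea that the counter comes back as zero after a restart are all assumptions made for illustration. The actual cause is tracked in bug 6016.

```python
class AccrueCounter:
    """Toy stand-in for a per-association or per-QOS accrue counter."""

    def __init__(self, name):
        self.name = name
        self.accrue_cnt = 0
        self.errors = []

    def add(self):
        # A job (or array task) starts accruing.
        self.accrue_cnt += 1

    def remove(self):
        # Guarded decrement: refuse to go below zero and record an error,
        # mirroring the "accrue_cnt underflow" wording of the log message.
        if self.accrue_cnt == 0:
            self.errors.append(f"{self.name} accrue_cnt underflow")
            return
        self.accrue_cnt -= 1


def simulate(array_tasks):
    """Count array tasks, lose the count on a restart, then finish the tasks."""
    counter = AccrueCounter("Assoc demo")
    for _ in range(array_tasks):
        counter.add()
    # Hypothetical restart/reconfigure: the counter is rebuilt as zero while
    # the tasks are still considered to be accruing (assumption).
    counter.accrue_cnt = 0
    for _ in range(array_tasks):
        counter.remove()
    return counter


if __name__ == "__main__":
    result = simulate(3)
    for message in result.errors:
        print("error:", message)
```

Each finishing task tries to decrement a counter that no longer includes it, so every decrement is rejected and logged, which is consistent with the bursts of messages in the description.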

I am closing this bug as a duplicate; please track the status in bug 6016.

Felip

*** This ticket has been marked as a duplicate of ticket 6016 ***