Hi,

we're seeing a lot of these messages in the slurmctld logs:

Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: Assoc 11448(iscrb_comred/lmonacel/(null)) accrue_cnt underflow
Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: Assoc 10966(iscrb_comred/(null)/(null)) accrue_cnt underflow
Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: Assoc 204(iscrb/(null)/(null)) accrue_cnt underflow
Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: Assoc 1(root/(null)/(null)) accrue_cnt underflow
Nov 21 16:25:15 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: QOS knl_usr_prod accrue_cnt underflow
Nov 21 16:25:15 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: QOS knl_usr_prod acct iscrb_comred accrue_cnt underflow
Nov 21 16:25:15 r000u17l01 slurmctld[21994]: error: acct_policy_handle_accrue_time: QOS knl_usr_prod user 30146 accrue_cnt underflow

They seem to be harmless, but we'd like to understand whether some association is misconfigured, since the messages appear only for a few specific accounts that don't seem to differ from the others:

[root@mgmt01 ~]# sacctmgr show associations cluster=marconi 1 204 10966 11448 format=cluster,account%20,share,QOS%50
Cluster    Account              Share     QOS
---------- -------------------- --------- --------------------------------------------------
marconi    root                 0         normal
marconi    iscrb                4734664   normal
marconi    iscrb_comred         20833     knl_qos_bprod,knl_qos_prod,normal,qos_rcm
marconi    iscrb_comred         parent    knl_qos_bprod,knl_qos_prod,normal,qos_rcm

Do you have any suggestions?

Best Regards,
Marcello
Hi Marcello,

This is a duplicate of bug 6016. We already have a fix and plan to commit it next week for 18.08.4.

The error can occur when job arrays are running while the controller is reconfigured or restarted; it may then be triggered when the job array finishes or is cancelled. It is not directly related to any particular association. What determines whether the error appears is whether you have an accrue limit set. The only consequence is that the accrue does not work correctly.

I am closing this bug as a duplicate. Please track the status in bug 6016.

Felip

*** This ticket has been marked as a duplicate of ticket 6016 ***
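
While waiting for the fix, it may help to see exactly which associations and QOS entries are reporting the underflow. The following is a minimal Python sketch, not part of Slurm, that tallies the messages per entity. It assumes the log lines have the same shape as the excerpt in the original report; the names `UNDERFLOW_RE`, `count_underflows` and `SAMPLE` are made up for this illustration.

```python
import re
from collections import Counter

# Captures whatever sits between the function name and "accrue_cnt underflow",
# e.g. "Assoc 204(iscrb/(null)/(null))" or "QOS knl_usr_prod acct iscrb_comred".
# The pattern is an assumption based only on the log excerpt in this ticket.
UNDERFLOW_RE = re.compile(
    r"acct_policy_handle_accrue_time: (?P<entity>.+?) accrue_cnt underflow"
)


def count_underflows(lines):
    """Return a Counter mapping each affected entity to its message count."""
    counts = Counter()
    for line in lines:
        match = UNDERFLOW_RE.search(line)
        if match:
            counts[match.group("entity")] += 1
    return counts


# Sample input copied from the report above (timestamps and host kept as-is).
SAMPLE = [
    "Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: "
    "acct_policy_handle_accrue_time: Assoc 204(iscrb/(null)/(null)) accrue_cnt underflow",
    "Nov 21 16:25:14 r000u17l01 slurmctld[21994]: error: "
    "acct_policy_handle_accrue_time: Assoc 1(root/(null)/(null)) accrue_cnt underflow",
    "Nov 21 16:25:15 r000u17l01 slurmctld[21994]: error: "
    "acct_policy_handle_accrue_time: QOS knl_usr_prod acct iscrb_comred accrue_cnt underflow",
    "Nov 21 16:25:15 r000u17l01 slurmctld[21994]: some unrelated message",
]

for entity, n in sorted(count_underflows(SAMPLE).items()):
    print(f"{n:5d}  {entity}")
```

To run it against a real log, pass an open file object to `count_underflows` in place of `SAMPLE`. If the explanation above holds, the affected entities should be the ones in the hierarchy of job arrays that were running across a reconfigure or restart, rather than ones with a broken association.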