Ticket 2781 - Can JobAcctGatherType be safely changed when running jobs in production?
Summary: Can JobAcctGatherType be safely changed when running jobs in production?
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 15.08.8
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Alejandro Sanchez
Reported: 2016-05-30 13:05 MDT by Chris Samuel
Modified: 2016-05-30 19:21 MDT
Site: VLSCI

Description Chris Samuel 2016-05-30 13:05:34 MDT
Hi there,

Trying to catch up on a few things here. Danny recommended that we switch JobAcctGatherType from jobacct_gather/cgroup to jobacct_gather/linux.

Is that something that can be safely updated whilst Slurm is running, or will the slurmds be unhappy to swap how they do things on the fly?

All the best,
Chris
Comment 1 Alejandro Sanchez 2016-05-30 19:16:41 MDT
Hi Chris. Unfortunately, this parameter can't be changed on the fly with just 'scontrol reconfigure'. It requires restarting slurmd on each node while that node has no running job steps. Quoting the note on this parameter from the slurm.conf man page:

NOTE: Changing this configuration parameter changes the contents of the messages between Slurm daemons. Any previously running job steps are managed by a slurmstepd daemon that will persist through the lifetime of that job step and not change its communication protocol. Only change this configuration parameter when there are no running job steps.
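
As a rough illustration of the "no running job steps" precondition, the check could be scripted along these lines. This is only a sketch, not an official procedure: the function names and the node name in the usage note are hypothetical, and it assumes that a node with no running, suspended, or completing jobs has no job steps left on it.

```python
from __future__ import annotations

import subprocess


def parse_job_ids(squeue_output: str) -> list[str]:
    """Return one job ID per non-empty line of squeue output."""
    return [line.strip() for line in squeue_output.splitlines() if line.strip()]


def active_jobs_on_node(node: str) -> list[str]:
    """List job IDs that may still have job steps on the given node.

    Needs the Slurm client tools on PATH. The set of states is an
    assumption about which jobs can still own a slurmstepd.
    """
    result = subprocess.run(
        [
            "squeue",
            "--noheader",
            "--nodelist", node,
            "--states", "RUNNING,SUSPENDED,COMPLETING",
            "--format", "%i",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_ids(result.stdout)


def safe_to_change(job_ids: list[str]) -> bool:
    """True only when no jobs (and so no job steps) remain on the node."""
    return not job_ids
```

For example, `safe_to_change(active_jobs_on_node("node001"))` would have to return True for a node (here the made-up name `node001`) before slurmd is restarted there with the new JobAcctGatherType.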
Comment 2 Chris Samuel 2016-05-30 19:21:43 MDT
(In reply to Alejandro Sanchez from comment #1)

> Hi Chris. Unfortunately this parameter can't be modified on the fly with
> just 'scontrol reconfigure'. A slurmd restart while there are no running job
> steps in the node is required. Copying from slurm.conf man page a footnote
> on a change to this parameter:

Thanks & so sorry I missed that in the man page before! :-(

OK, so basically we'll need to put this off until we can take an outage.

Thanks!