Bug 11339

Summary: slurmctld systemd service file does not properly reload via `ExecReload`
Product: Slurm
Reporter: Gordian Edenhofer <gordian.edenhofer>
Component: slurmctld
Assignee: Jacob Jenson <jacob>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: C - Contributions    
Priority: ---    
Version: 21.08.x   
Hardware: Linux   
OS: Linux   
Site: -Other-
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---

Description Gordian Edenhofer 2021-04-10 02:56:18 MDT
As discussed by Jakub Klinkovský in this comment in the Arch Linux User Repository, https://aur.archlinux.org/packages/slurm-llnl/#comment-795099, the slurmctld systemd service does not fully apply the new configuration on reload. The `ExecReload` line in the service file reads `ExecReload=/bin/kill -HUP $MAINPID`, which should instruct slurmctld to reload its configuration. However, configuration changes such as increasing or decreasing the time limit of a partition are not applied after a reload. Jakub Klinkovský, the original reporter of this issue, suggested changing the reload command to `ExecReload=/usr/bin/scontrol reconfigure`.
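For illustration only, the proposed change could be tried locally as a systemd drop-in instead of editing the packaged unit. This is a sketch, not a recommended configuration: the drop-in path below is an assumption (the usual location created by `systemctl edit slurmctld`), and the `scontrol` path is taken from the suggestion above and may differ per distribution.

```ini
# Assumed location: /etc/systemd/system/slurmctld.service.d/override.conf

[Service]
# Shipped behaviour, as quoted in this report:
#   ExecReload=/bin/kill -HUP $MAINPID

# ExecReload= may hold several commands, so an empty assignment is needed
# to clear the inherited entry before setting a replacement.
ExecReload=
# Proposed replacement from the AUR comment:
ExecReload=/usr/bin/scontrol reconfigure
```

After adding or changing a drop-in, `systemctl daemon-reload` is needed before `systemctl reload slurmctld` picks up the new `ExecReload` command.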
Comment 1 Tim Wickberg 2021-04-12 12:10:10 MDT
(In reply to Gordian Edenhofer from comment #0)
> As discussed in this comment
> https://aur.archlinux.org/packages/slurm-llnl/#comment-795099 by Jakub
> Klinkovský in the Arch Linux User Repository the slurmctld systemd service
> does not fully apply the new configuration on reload. The `ExecReload` line in
> the service file reads `ExecReload=/bin/kill -HUP $MAINPID` which should
> instruct slurmctld to reload its configuration. However, configuration
> changes like increasing or decreasing the time limit of a partition are not
> applied after a reload. Jakub Klinkovský, the original reporter of this
> issue, suggested to change the reload command to
> `ExecReload=/usr/bin/scontrol reconfigure`.

Thank you for the report, but I would not recommend that change. (a) 'scontrol reconfigure' issues a reconfiguration request to the entire cluster, which is likely not intended, and (b) it will not actually permit changes to certain other settings at this time.

This is a limitation we're aware of and may address in the future, but in the meantime SIGHUP is the closest conceptually to what systemd is trying to do there.
Comment 2 Gordian Edenhofer 2021-04-12 12:31:59 MDT
Makes sense. Thanks for the explanation!