Bug 4393

Summary: Recommended logrotate config no longer correct
Product: Slurm
Reporter: David Gloe <david.gloe>
Component: Documentation
Assignee: Felip Moll <felip.moll>
Status: RESOLVED FIXED
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
Version: 17.02.9
Hardware: Linux
OS: Linux
Site: CRAY
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: Cray Internal
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 17.11
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---

Description David Gloe 2017-11-16 13:14:37 MST
The logrotate configuration mentioned in man slurm.conf (https://slurm.schedmd.com/slurm.conf.html) no longer works correctly as of 17.02. It mentions using /etc/init.d/slurm reconfig, but /etc/init.d/slurm is no longer installed.

Please update the documentation with command(s) which will cause slurmctld/slurmd to reload their log files.
Comment 4 Felip Moll 2017-11-21 04:23:50 MST
(In reply to David Gloe from comment #0)
> The logrotate configuration mentioned in man slurm.conf
> (https://slurm.schedmd.com/slurm.conf.html) no longer works correctly as of
> 17.02. It mentions using /etc/init.d/slurm reconfig, but /etc/init.d/slurm
> is no longer installed.
> 
> Please update the documentation with command(s) which will cause
> slurmctld/slurmd to reload their log files.

Hi David,

I changed the documentation accordingly. You can find the change in commit b7cec940facc2b7631411bf38c8d7b7b271c729d. It will be available in release 17.11.0 and later. The documentation on the website will be refreshed when 17.11 is released, in about a week.

Thanks for reporting!
Felip M
Comment 5 David Gloe 2017-11-21 11:57:41 MST
slurmctld isn't reloading the log file when I send it SIGUSR2. Is this a separate bug?

opal-p2:/home/users/dgloe # mv /var/spool/slurm/slurmctld.log /var/spool/slurm/slurmctld.log.rotated
opal-p2:/home/users/dgloe # killall -SIGUSR2 slurmctld
opal-p2:/home/users/dgloe # ls /var/spool/slurm/slurmctld.log
ls: cannot access '/var/spool/slurm/slurmctld.log': No such file or directory
opal-p2:/home/users/dgloe # tail /var/spool/slurm/slurmctld.log.rotated

[2017-11-21T12:55:54.029] error: _bb_get_pools: json parser failed on DataWarp REST API error: /opt/cray/dws/default/bin/dwgateway exited 1: dwgateway: Gateway retrieval failed

[2017-11-21T12:55:54.030] error: _load_state: failed to find DataWarp entries, what now?
[2017-11-21T12:56:06.186] burst_buffer/cray: bb_p_job_try_stage_in
[2017-11-21T12:56:24.168] error: _bb_get_pools: pools status:256 response:DataWarp REST API error: /opt/cray/dws/default/bin/dwgateway exited 1: dwgateway: Gateway retrieval failed

[2017-11-21T12:56:24.168] error: _bb_get_pools: json parser failed on DataWarp REST API error: /opt/cray/dws/default/bin/dwgateway exited 1: dwgateway: Gateway retrieval failed

[2017-11-21T12:56:24.169] error: _load_state: failed to find DataWarp entries, what now?
Comment 6 Felip Moll 2017-11-21 13:31:21 MST
(In reply to David Gloe from comment #5)
> slurmctld isn't reloading the log file when I send it SIGUSR2. Is this a
> separate bug?

Sorry, I forgot to mention that for versions prior to 17.11 the signal to send is SIGHUP instead of SIGUSR2.

Regards,
Felip M
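
The version-dependent signal described in comment 6 can be sketched as a small shell helper for a logrotate postrotate script. This is only an illustration under the assumptions stated in this thread (SIGHUP before 17.11, SIGUSR2 from 17.11 on); the function name slurm_log_signal is hypothetical and not part of Slurm.

```shell
#!/bin/sh
# Hypothetical helper: print the signal that makes the Slurm daemons reopen
# their log files for a given version string such as "17.02.9".
# Assumption taken from this bug: releases before 17.11 need SIGHUP,
# 17.11 and later use SIGUSR2.
slurm_log_signal() {
    major=${1%%.*}        # text before the first dot, e.g. "17"
    rest=${1#*.}          # text after the first dot, e.g. "02.9"
    minor=${rest%%.*}     # second version component, e.g. "02"
    if [ "$major" -lt 17 ] || { [ "$major" -eq 17 ] && [ "$minor" -lt 11 ]; }; then
        echo SIGHUP
    else
        echo SIGUSR2
    fi
}

# Possible use inside a logrotate postrotate section (not executed here):
#   sig=$(slurm_log_signal 17.02.9)
#   pkill -x --signal "$sig" slurmctld
#   pkill -x --signal "$sig" slurmd
```

With the reporter's version, slurm_log_signal 17.02.9 selects SIGHUP, which matches the correction above for the failed SIGUSR2 attempt in comment 5.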