Ticket 12249 - No syslog output from slurmstepd when slurmd runs in foreground
Summary: No syslog output from slurmstepd when slurmd runs in foreground
Status: RESOLVED DUPLICATE of ticket 10625
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmstepd
Version: 20.11.5
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Director of Support
QA Contact:
URL:
Depends on:
Blocks: 7231
Reported: 2021-08-10 14:22 MDT by David Gloe
Modified: 2021-10-22 12:41 MDT

See Also:
Site: CRAY
Cray Sites: Cray Internal


Description David Gloe 2021-08-10 14:22:27 MDT
We have Slurm configured to log to syslog by leaving SlurmdLogFile unset in slurm.conf. After upgrading to Slurm 20.11, we no longer saw any logging from slurmd at all, most likely because the systemd unit file was changed to run slurmd in the foreground.

Adding SlurmdSyslogDebug=info to slurm.conf restores the slurmd logs, but we still get nothing from slurmstepd. For example, here's what we get in syslog for a typical job step:

Aug 10 15:19:09 nid001418 slurmd[74577]: launch task StepId=80870.0 request from UID:0 GID:0 HOST:10.100.1.96 PORT:51726
Aug 10 15:19:09 nid001418 slurmd[74577]: task/affinity: lllp_distribution: JobId=80870 auto binding off: mask_cpu

And here's what we get in a log file:

[2021-08-10T15:19:09.989] launch task StepId=80870.0 request from UID:0 GID:0 HOST:10.100.1.96 PORT:51726
[2021-08-10T15:19:09.989] task/affinity: lllp_distribution: JobId=80870 auto binding off: mask_cpu
[2021-08-10T15:19:10.777] [80870.0] task/cgroup: _memcg_initialize: /slurm/uid_0/job_80870: alloc=227328MB mem.limit=227328MB memsw.limit=unlimited
[2021-08-10T15:19:10.777] [80870.0] task/cgroup: _memcg_initialize: /slurm/uid_0/job_80870/step_0: alloc=227328MB mem.limit=227328MB memsw.limit=unlimited
[2021-08-10T15:19:10.859] [80870.0] done with job

This looks similar to bug 2631.
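
For reference, the logging setup described in this report can be sketched as the slurm.conf fragment below. It is an illustration assembled from the description, not the site's actual configuration, and it shows only the two parameters named in the report.

```ini
# slurm.conf fragment (illustrative sketch, not the reporter's actual file)

# SlurmdLogFile is deliberately left unset so that slurmd is expected to
# log to syslog instead of a file.
#SlurmdLogFile=

# Added after the 20.11 upgrade. Per the report, this brings slurmd
# messages back into syslog, but slurmstepd messages are still missing.
SlurmdSyslogDebug=info
```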
Comment 2 Michael Hinton 2021-08-11 11:22:45 MDT
Hi David,

This is a known issue when running the slurmd systemd service in the foreground, which is what the slurmd service file does starting in 20.11. To revert to the old behavior, change `Type` back to `forking`, remove `-D` from the ExecStart command, and add `PIDFile=/var/run/slurmd.pid` back in. See https://bugs.schedmd.com/show_bug.cgi?id=10625#c10 for more context.
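
A minimal sketch of that workaround as a systemd drop-in, rather than an edit to the packaged unit file. The drop-in location (for example /etc/systemd/system/slurmd.service.d/override.conf), the /usr/sbin/slurmd path, and the $SLURMD_OPTIONS variable are assumptions about the local installation; compare them against the ExecStart line of the installed slurmd.service before using anything like this.

```ini
# Hypothetical drop-in: /etc/systemd/system/slurmd.service.d/override.conf
[Service]
# Go back to a daemonizing slurmd, as before 20.11.
Type=forking

# An empty ExecStart= clears the packaged command; the next line
# re-adds it without -D. Binary path and $SLURMD_OPTIONS are assumed.
ExecStart=
ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS

# Should match SlurmdPidFile in slurm.conf if that parameter is set.
PIDFile=/var/run/slurmd.pid
```

After adding a drop-in like this, `systemctl daemon-reload` followed by `systemctl restart slurmd` would be needed for the change to take effect.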

We are still looking into a permanent fix for users who rely on syslog output, but for now, let me know if this workaround works for you.

Thanks!
-Michael
Comment 5 Michael Hinton 2021-10-22 12:41:08 MDT
Hi David,

I'm going to go ahead and merge this into bug 10625. Feel free to continue engaging with us there.

Thanks!
-Michael

*** This ticket has been marked as a duplicate of ticket 10625 ***