Ticket 2631 - slurmstepd output lost when running slurmd in foreground
Summary: slurmstepd output lost when running slurmd in foreground
Status: CONFIRMED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmstepd
Version: 16.05.x
Hardware: Linux Linux
: --- 5 - Enhancement
Assignee: Unassigned Developer
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2016-04-12 20:17 MDT by Janne Blomqvist
Modified: 2021-08-10 14:53 MDT
1 user

See Also:
Site: Institut Pasteur
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Janne Blomqvist 2016-04-12 20:17:48 MDT
Hi,

It seems that when one runs slurmd in the foreground (-D command-line option), quite a lot of the initial diagnostic output from slurmstepd is lost. Later on the expected output starts to appear, so I suspect the I/O redirection for stdout/stderr isn't set up early enough, and anything written before that disappears into some black hole?

This happens both for the built-in diagnostic output and for "printf debugging" I have added myself. syslog() calls I have added myself do appear in the system log, so it's not that the functions I'm investigating are never called. FWIW, I have noticed this issue with src/common/cpu_frequency.c:{cpu_freq_cgroup_validate,cpu_freq_set}, although there are surely others as well. By the time the code calls cpu_freq_reset(), the I/O redirection has been set up and the output appears.
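The mechanism suspected above can be reproduced outside Slurm. The sketch below is a generic POSIX illustration, not slurmstepd's code: `capture_child_output()` is a hypothetical helper that forks a child whose stdout starts out on /dev/null (standing in for "not redirected yet"), prints an early line, then `dup2()`s a pipe onto stdout and prints a late line. Only the late line reaches the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: fork a child whose stdout points at /dev/null until it
 * redirects stdout into a pipe. Whatever the parent reads from the pipe is
 * copied into buf (NUL-terminated). Returns bytes read, or -1 on error.
 */
static ssize_t capture_child_output(char *buf, size_t len)
{
    int pipefd[2];

    if (len == 0 || pipe(pipefd) == -1)
        return -1;

    fflush(stdout); /* so the child does not inherit unflushed parent output */
    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: stdout initially goes nowhere useful. */
        int devnull = open("/dev/null", O_WRONLY);
        if (devnull == -1)
            _exit(1);
        dup2(devnull, STDOUT_FILENO);
        close(devnull);
        close(pipefd[0]);

        printf("early diagnostic\n");
        fflush(stdout);                 /* written to /dev/null: lost */

        dup2(pipefd[1], STDOUT_FILENO); /* redirection now in place */
        close(pipefd[1]);
        printf("late diagnostic\n");
        fflush(stdout);                 /* reaches the parent */
        _exit(0);
    }

    /* Parent: collect everything the child wrote into the pipe. */
    close(pipefd[1]);
    size_t total = 0;
    ssize_t n;
    while (total + 1 < len &&
           (n = read(pipefd[0], buf + total, len - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Called with a 256-byte buffer, this yields only "late diagnostic\n". The first fflush() matters: without it, the early line would still be sitting in the stdio buffer when dup2() runs and would be written to the pipe afterwards, so printf debugging can be lost or merely delayed depending on when buffers are flushed relative to the redirection. syslog() is unaffected either way because it does not write through fd 1, which fits the observation that the added syslog() calls still appear.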
Comment 1 Nicolas Joly 2016-11-02 07:51:42 MDT
Hi,

We just got hit by this one on our TARS cluster running 16.05.5. All slurm daemons are managed by supervisord (http://supervisord.org/), which wants all tools to be launched in foreground mode:

[...]
Programs meant to be run under supervisor should not daemonize themselves. Instead, they should run in the foreground. They should not detach from the terminal from which they are started.
[...]

And the only way to get this behavior for slurm daemons is to start them in debug mode (-D). Unfortunately, this also alters logging, sending messages to stdout/stderr instead of syslog.

It would be great to have an option (-F) to launch slurm daemons in foreground mode without altering the logging. Having slurmstepd logs when daemons run in debug mode would be good too ...

Thanks.
Comment 2 Danny Auble 2016-11-02 09:09:19 MDT
Hey Nick, yeah, for some reason if you don't set SlurmdLogFile, the output doesn't get to syslog in Daemon mode (-D). I am guessing that is by design, as most people running in that mode are debugging and perhaps don't want to see the debug output in syslog.

I have verified that if you set SlurmdLogFile=/var/log/syslog, it does show up there. Perhaps we could have an option that makes this happen by default when it is not set (hence changing this to a sev 5).

In any case, I hope the workaround is sufficient for now.
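The logging behaviour described in this comment and in comment 1 can be summarised as a small decision function. This is a hypothetical model written for illustration only: `log_targets()` and the `TARGET_*` flags are not Slurm names. It assumes, from the comments, that -D sends messages to stderr and that an unset SlurmdLogFile means syslog only in background mode; it also assumes (the ticket does not say) that the stderr copy is kept when a log file is set. `syslog_fallback` stands for the option proposed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags for where a daemon's log messages end up. */
enum {
    TARGET_STDERR = 1u << 0,
    TARGET_FILE   = 1u << 1,
    TARGET_SYSLOG = 1u << 2,
};

/*
 * Simplified model of the behaviour discussed in this ticket.
 *   foreground      - daemon started with -D
 *   logfile         - SlurmdLogFile value, or NULL when unset
 *   syslog_fallback - proposed option: use syslog under -D when no
 *                     log file is configured
 */
static unsigned int log_targets(bool foreground, const char *logfile,
                                bool syslog_fallback)
{
    unsigned int targets = 0;

    if (foreground)
        targets |= TARGET_STDERR;   /* -D: messages go to stderr */

    if (logfile != NULL)
        targets |= TARGET_FILE;     /* e.g. SlurmdLogFile=/var/log/syslog */
    else if (!foreground || syslog_fallback)
        targets |= TARGET_SYSLOG;   /* background default, or the proposal */

    return targets;
}
```

Under these assumptions, `log_targets(true, NULL, false)` gives only TARGET_STDERR, the case reported in comment 1, and setting `syslog_fallback` adds TARGET_SYSLOG. Note that the workaround writes to /var/log/syslog as an ordinary file rather than through syslog(3), so it would presumably not be forwarded by a central syslog setup like the one described in comment 3.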

Having the slurmstepd log go out through the slurmd's stdout is a different story altogether; it is a much harder issue, since we have to grab the output of a separate process and merge it into the slurmd's.
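The general technique behind "grab the output of a separate process and add it to the slurmd's" is a pipe from the child's stderr that the parent reads and re-emits. The toy sketch below shows only that core idea; it is not how slurmd or slurmstepd are implemented, and `relay_child_stderr()` and its fixed messages are placeholders. It leaves out what presumably makes this hard in practice, such as many concurrent steps, interleaved lines, and a parent that must service these pipes without blocking its other work.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical relay: run a child with stderr connected to a pipe and copy
 * each line it writes to dest, prefixed so the merged log shows its origin.
 * Returns the number of lines relayed (lines longer than the buffer are
 * split and counted per chunk), or -1 on error.
 */
static int relay_child_stderr(FILE *dest, const char *prefix)
{
    int pipefd[2];

    if (pipe(pipefd) == -1)
        return -1;

    fflush(NULL); /* child must not inherit unflushed stdio buffers */
    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: placeholder for a helper process writing diagnostics. */
        close(pipefd[0]);
        dup2(pipefd[1], STDERR_FILENO);
        close(pipefd[1]);
        fprintf(stderr, "step starting\n");
        fprintf(stderr, "step done\n");
        _exit(0);
    }

    /* Parent: read the child's stderr line by line and re-emit it. */
    close(pipefd[1]);
    FILE *in = fdopen(pipefd[0], "r");
    if (in == NULL) {
        close(pipefd[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }

    char line[512];
    int lines = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        fprintf(dest, "%s%s", prefix, line);
        lines++;
    }
    fclose(in);
    waitpid(pid, NULL, 0);
    return lines;
}
```

Passing "[slurmstepd] " as the prefix and a temporary file as `dest` produces the two child lines, each with that prefix.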
Comment 3 Nicolas Joly 2016-11-02 10:02:30 MDT
(In reply to Danny Auble from comment #2)
> Hey Nick, yeah, for some reason if you don't set SlurmdLogFile, the output
> doesn't get to syslog in Daemon mode (-D). I am guessing that is by design,
> as most people running in that mode are debugging and perhaps don't want to
> see the debug output in syslog.
> 
> I have verified that if you set SlurmdLogFile=/var/log/syslog, it does show
> up there. Perhaps we could have an option that makes this happen by default
> when it is not set (hence changing this to a sev 5).
> 
> In any case, I hope the workaround is sufficient for now.

We are aggregating all slurmd logs from our 180 compute nodes in a central location with syslog. Now we are facing some slurmstepd crashes (PR to be filed) and wanted to have the corresponding logs for the report. But this problem does not occur often, and logging locally would introduce some annoyance. We will see what we can do there.

> Having the slurmstepd log go out through the slurmd's stdout is a different
> story altogether; it is a much harder issue, since we have to grab the
> output of a separate process and merge it into the slurmd's.

Let's forget about that approach, then.

Thanks.