Ticket 6540 - Need assistance determining a sane path for XDG_RUNTIME_DIR, pam_systemd, and SLURM
Summary: Need assistance determining a sane path for XDG_RUNTIME_DIR, pam_systemd, and SLURM
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 17.11.7
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Felip Moll
 
Reported: 2019-02-19 10:29 MST by Ryan Novosielski
Modified: 2019-03-25 08:13 MDT

Site: Rutgers
Linux Distro: CentOS
Machine Name: amarel


Description Ryan Novosielski 2019-02-19 10:29:39 MST
We are seeing errors from software packages that use the value of XDG_RUNTIME_DIR and expect to be able to write to, for example, /run/user/109366. From the research I've done, this directory is normally created by the pam_systemd.so PAM module. However, that module is not part of /etc/pam.d/slurm on our system and is therefore never called.

Currently, we use pam_slurm.so in /etc/pam.d/sshd on our compute nodes. We are aware of the existence of pam_slurm_adopt.so and what it does, and we plan to implement it eventually. I believe both of these modules are used not in /etc/pam.d/slurm, but by other services such as sshd.

Ideally, I'd like to provide the XDG-related directories and variables by adding pam_systemd.so to /etc/pam.d/slurm on the compute nodes. I've been warned, however, that this module interacts with cgroups in a way that may affect the native cgroup handling in SLURM and pam_slurm_adopt.so.
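
For illustration, a minimal sketch of the kind of change being considered; the surrounding PAM lines are placeholders, not our actual stack, and the control flags would need review:

# /etc/pam.d/slurm (illustrative only)
account    required   pam_unix.so
session    required   pam_limits.so
# Proposed addition: have pam_systemd.so create /run/user/$UID and set
# XDG_RUNTIME_DIR for the job's session.
session    optional   pam_systemd.so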

Can you comment on a sensible way to move forward? Thanks!
Comment 1 Ryan Novosielski 2019-02-19 10:30:33 MST
OS is CentOS 7.5, FYI.
Comment 2 Felip Moll 2019-02-20 03:49:47 MST
(In reply to Ryan Novosielski from comment #1)
> OS is CentOS 7.5, FYI.

Hi Ryan,

We are aware of the situation and are currently discussing it internally
to find a fix. There is a proposal to change the design of pam_slurm_adopt.so
and split the module into two: one for the PAM account phase and one for session setup.

What happens now, as you found, is that pam_systemd.so sets up the session as follows:
a) Create /run/user/$UID and set XDG_RUNTIME_DIR.
b) Set XDG_SESSION_ID.
c) Create a new systemd scope unit for the session.
d) Remove /run/user/$UID on exit.

In step c), a new scope is created, meaning the process will eventually be moved into
a new cgroup. This conflicts with pam_slurm_adopt, where we adopt SSH sessions into
the Slurm job's cgroup, because pam_systemd.so runs after pam_slurm_adopt.so.
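
One way to observe this scope creation on a node where pam_systemd.so runs for SSH logins, using standard systemd tooling (session and slice names will of course differ):

# after logging in via ssh:
loginctl list-sessions                  # a new session appears for the user
grep name=systemd /proc/self/cgroup     # .../user-<uid>.slice/session-cXX.scope
systemd-cgls /user.slice                # the shell sits inside that scope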

I am now marking your bug as a duplicate of bug 5920; I encourage you to follow that one
to keep track of its status.

For the moment, the workaround is to set/unset the required directories and variables in
prolog/epilog scripts.
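
A minimal sketch of what such scripts could look like, assuming the Prolog and Epilog run as root on the compute node and a TaskProlog is used to export the variable into the job environment (the Epilog check for other running jobs of the same user on the node is deliberately omitted):

# Prolog (runs as root before the job starts)
RUNDIR="/run/user/${SLURM_JOB_UID}"
mkdir -p "$RUNDIR"
chown "${SLURM_JOB_UID}" "$RUNDIR"
chmod 700 "$RUNDIR"

# TaskProlog (runs as the job user; lines it prints that start with
# "export" are added to the task's environment)
echo "export XDG_RUNTIME_DIR=/run/user/$(id -u)"

# Epilog (runs as root after the job ends; a production script should
# remove the directory only when the user has no other jobs on the node)
rm -rf "/run/user/${SLURM_JOB_UID}"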

Please, comment on 5920 if you have further feedback.

*** This ticket has been marked as a duplicate of ticket 5920 ***
Comment 3 Ryan Novosielski 2019-03-21 09:52:41 MDT
Felip,

I have another question. The question in this bug was really about the use of pam_systemd in the "slurm" PAM service, not "sshd", "system-auth", or another service. It didn't immediately occur to me that pam_slurm_adopt, conversely, would be in the "sshd" PAM service.

My new question is the following: it seems like there is no real potential for negative interaction created by adding "pam_systemd" to the SLURM service, as the conflict is with "pam_slurm_adopt", and that module is used in a different service.
Comment 4 Felip Moll 2019-03-21 10:47:53 MDT
> My new question is the following: it seems like there is no real potential
> for negative interaction created by adding "pam_systemd" to the SLURM
> service, as the conflict is with "pam_slurm_adopt", and that module is used
> in a different service.

That's an interesting question. I need to read through the PAM code and do a couple of tests before responding properly, because I am not sure how systemd will affect the current cgroups when the module is called. I'm also afraid that systemd will steal processes back if you have both pam_systemd and pam_slurm_adopt enabled.

Let me check everything and I will come back to you.
Comment 5 Ryan Novosielski 2019-03-21 11:22:04 MDT
Thanks, I appreciate it. 

I'm going to run a test comparing /proc/self/cgroup, after 15 minutes, on a node where we have placed pam_systemd into the slurm service and on one where we haven't, to see if there is any difference.
Comment 6 Ryan Novosielski 2019-03-21 11:38:24 MDT
Node with pam_systemd in slurm service:

[novosirj@amarel-test1 ~]$ srun --reservation=pam_systemd -t 30:00 --pty bash -i
srun: job 95830766 queued and waiting for resources
srun: job 95830766 has been allocated resources
[novosirj@slepner060 ~]$ sleep 900; cat /proc/self/cgroup
11:cpuset:/slurm/uid_109366/job_95830766/step_0
10:cpuacct,cpu:/
9:blkio:/
8:freezer:/slurm/uid_109366/job_95830766/step_0
7:pids:/
6:hugetlb:/
5:devices:/
4:memory:/slurm/uid_109366/job_95830766/step_0
3:perf_event:/
2:net_prio,net_cls:/
1:name=systemd:/user.slice/user-109366.slice/session-c20.scope

Node without pam_systemd in slurm service:

[novosirj@amarel-test2 ~]$ srun --pty -t 30:00 bash -i
srun: job 95830770 queued and waiting for resources
srun: job 95830770 has been allocated resources
[novosirj@node009 ~]$ sleep 900; cat /proc/self/cgroup
11:memory:/slurm/uid_109366/job_95830770/step_0
10:freezer:/slurm/uid_109366/job_95830770/step_0
9:cpuset:/slurm/uid_109366/job_95830770/step_0
8:blkio:/
7:net_prio,net_cls:/
6:cpuacct,cpu:/
5:perf_event:/
4:devices:/
3:pids:/
2:hugetlb:/
1:name=systemd:/system.slice/slurmd.service

There does seem to be a difference; whether it matters is the question.
Comment 7 Ryan Novosielski 2019-03-22 12:10:45 MDT
Just checking to see if there's any update. Thank you!
Comment 8 Felip Moll 2019-03-25 06:02:16 MDT
(In reply to Ryan Novosielski from comment #7)
> Just checking to see if there's any update. Thank you!

The following cgroups:
11:memory:/slurm/uid_109366/job_95830770/step_0
10:freezer:/slurm/uid_109366/job_95830770/step_0
9:cpuset:/slurm/uid_109366/job_95830770/step_0

are equal in both cases. I think there is no problem if you enable this, but keep in mind that pam_slurm_adopt is affected by the problem described earlier, so adding that module later may not be possible with your modification in place.

Also, ensure the slurmd service unit has Delegate=yes set; otherwise, cgroups created by slurmd may be modified by systemd, e.g. when a service is reloaded or restarted.
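
For example, a standard systemd drop-in of the kind meant here (the path below is the conventional location created by "systemctl edit slurmd"):

# /etc/systemd/system/slurmd.service.d/override.conf
[Service]
Delegate=yes

# then apply it:
#   systemctl daemon-reload
#   systemctl restart slurmd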
Comment 9 Ryan Novosielski 2019-03-25 06:06:59 MDT
Thank you very much. Implementing pam_slurm_adopt is a slightly longer term goal and this is a good workaround for a more pressing need. We’ll keep it in mind.
Comment 10 Felip Moll 2019-03-25 07:52:33 MDT
(In reply to Ryan Novosielski from comment #9)
> Thank you very much. Implementing pam_slurm_adopt is a slightly longer term
> goal and this is a good workaround for a more pressing need. We’ll keep it
> in mind.

OK Ryan, if that is fine with you I will close this bug and keep track of the pam_systemd and pam_slurm_adopt issue in bug 5920.
Comment 11 Ryan Novosielski 2019-03-25 08:10:17 MDT
That's fine by me; thanks again.
Comment 12 Felip Moll 2019-03-25 08:13:38 MDT
Thanks,

Closing as infogiven.