We are seeing errors from software packages that use the value of XDG_RUNTIME_DIR and expect to be able to write to /run/user/109366, for example. From the research I've done, this directory is normally created by the pam_systemd.so PAM module. However, that module is not part of /etc/pam.d/slurm on our system and is therefore never called. Currently, we use pam_slurm.so in /etc/pam.d/sshd on our compute nodes. We are aware of pam_slurm_adopt.so and what it does, and plan to implement it eventually. I believe both of those modules are used not in /etc/pam.d/slurm, but by other services such as sshd. Ideally, I'd like to provide the XDG-related setup by adding pam_systemd.so to /etc/pam.d/slurm on the compute nodes. I'm told, however, that this has some sort of interaction with cgroups, which may affect the native cgroup handling in SLURM and pam_slurm_adopt.so. Can you comment on a sensible way to move forward? Thanks!
OS is CentOS 7.5, FYI.
(In reply to Ryan Novosielski from comment #1)
> OS is CentOS 7.5, FYI.

Hi Ryan,

We are aware of the situation and are currently discussing it internally to find a fix. There is a proposal to change the design of pam_slurm_adopt.so and split the module in two: one part for the PAM account phase and one for session setup.

What happens now is that, as you found, pam_systemd.so sets up the session by doing the following:

a) Create /run/user/$UID and set XDG_RUNTIME_DIR
b) Set XDG_SESSION_ID
c) Create a new systemd scope unit for the session
d) Remove /run/user/$UID on exit

In c), it creates a new scope, meaning the process will eventually be moved into a new cgroup. This conflicts with pam_slurm_adopt, where we adopt the ssh sessions into the Slurm cgroup, because pam_systemd.so runs after pam_slurm_adopt.so.

I am now marking your bug as a duplicate of bug 5920; I encourage you to follow that one to keep track of the status. For the moment, the workaround is to set/unset the required directories and variables in a prolog/epilog script. Please comment on 5920 if you have further feedback.

*** This ticket has been marked as a duplicate of ticket 5920 ***
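For reference, the prolog/epilog workaround mentioned above could be sketched roughly as below. This is a minimal, untested outline, not SchedMD-provided code: the split between Prolog and TaskProlog, and the cleanup policy, are assumptions to adapt per site.

```shell
#!/bin/bash
# Prolog fragment (runs as root on the compute node; referenced via Prolog= in
# slurm.conf). Create the per-user runtime directory that pam_systemd would
# otherwise provide.
RUNDIR="/run/user/${SLURM_JOB_UID}"
mkdir -p "$RUNDIR"
chown "${SLURM_JOB_UID}" "$RUNDIR"
chmod 700 "$RUNDIR"

# TaskProlog fragment (runs as the job user; referenced via TaskProlog= in
# slurm.conf). Lines printed as "export NAME=value" are injected into the
# job's environment.
echo "export XDG_RUNTIME_DIR=/run/user/${SLURM_JOB_UID}"

# Epilog fragment (runs as root after the job): remove the directory, but only
# if the user has no other jobs left on this node -- that check is
# site-specific and omitted here.
# rm -rf "/run/user/${SLURM_JOB_UID}"
```

Note this only replaces items a) and d) of what pam_systemd does; it does not create a session scope or set XDG_SESSION_ID.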
Felip, I have another question. The question in this bug was really about the use of pam_systemd in the "slurm" PAM service, not in "ssh", "system-auth", or another service. It didn't immediately occur to me that pam_slurm_adopt, conversely, would be in the "ssh" PAM service. My new question is this: it seems there is no real potential for a negative interaction from adding pam_systemd to the slurm service, since the conflict is with pam_slurm_adopt, and that module is used in a different service. Is that correct?
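For concreteness, what I have in mind is roughly the following; the other entries are placeholders for whatever is already in the file on a given site, and the leading "-" tells PAM to skip the line quietly if the module is not installed:

```
# /etc/pam.d/slurm -- illustrative only; keep the existing site entries
account    required   pam_nologin.so
session    required   pam_limits.so
-session   optional   pam_systemd.so
```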
> My new question is the following: it seems like there is no real potential
> for negative interaction created by adding "pam_systemd" to the SLURM
> service, as the conflict is with "pam_slurm_adopt", and that module is used
> in a different service.

That's an interesting question. I need to read through the PAM code and run a couple of tests before responding properly, because I am not sure how systemd will affect the current cgroups when the module is called. I'm also afraid that systemd will steal processes back if you have both pam_systemd and pam_slurm_adopt enabled. Let me check everything and I will come back to you.
Thanks, I appreciate it. I'm going to run a test comparing /proc/self/cgroup after 15 minutes on a node where we have placed pam_systemd into the slurm service and on one where we haven't, to see if there is any difference.
Node with pam_systemd in slurm service:

[novosirj@amarel-test1 ~]$ srun --reservation=pam_systemd -t 30:00 --pty bash -i
srun: job 95830766 queued and waiting for resources
srun: job 95830766 has been allocated resources
[novosirj@slepner060 ~]$ sleep 900; cat /proc/self/cgroup
11:cpuset:/slurm/uid_109366/job_95830766/step_0
10:cpuacct,cpu:/
9:blkio:/
8:freezer:/slurm/uid_109366/job_95830766/step_0
7:pids:/
6:hugetlb:/
5:devices:/
4:memory:/slurm/uid_109366/job_95830766/step_0
3:perf_event:/
2:net_prio,net_cls:/
1:name=systemd:/user.slice/user-109366.slice/session-c20.scope

Node without pam_systemd in slurm service:

[novosirj@amarel-test2 ~]$ srun --pty -t 30:00 bash -i
srun: job 95830770 queued and waiting for resources
srun: job 95830770 has been allocated resources
[novosirj@node009 ~]$ sleep 900; cat /proc/self/cgroup
11:memory:/slurm/uid_109366/job_95830770/step_0
10:freezer:/slurm/uid_109366/job_95830770/step_0
9:cpuset:/slurm/uid_109366/job_95830770/step_0
8:blkio:/
7:net_prio,net_cls:/
6:cpuacct,cpu:/
5:perf_event:/
4:devices:/
3:pids:/
2:hugetlb:/
1:name=systemd:/system.slice/slurmd.service

There does seem to be a difference. Whether this matters is another question.
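To make the comparison mechanical, the two listings can be normalized before diffing, so that the hierarchy numbers and job IDs (which legitimately differ per node and per job) don't obscure the real delta. A rough sketch, with file names and abbreviated sample contents that are illustrative only:

```shell
#!/bin/sh
# Abbreviated sample data modeled on the two listings above.
cat > /tmp/cgroup_with_pam_systemd <<'EOF'
4:memory:/slurm/uid_109366/job_95830766/step_0
1:name=systemd:/user.slice/user-109366.slice/session-c20.scope
EOF
cat > /tmp/cgroup_without <<'EOF'
11:memory:/slurm/uid_109366/job_95830770/step_0
1:name=systemd:/system.slice/slurmd.service
EOF

# Strip the leading hierarchy number and mask the job ID before sorting.
normalize() { sed -E 's/^[0-9]+://; s/job_[0-9]+/job_N/' "$1" | sort; }

normalize /tmp/cgroup_with_pam_systemd > /tmp/cgroup_a
normalize /tmp/cgroup_without > /tmp/cgroup_b
# Only the name=systemd line should remain different; diff exits nonzero
# when the listings differ, so don't let that abort the script.
diff /tmp/cgroup_a /tmp/cgroup_b || true
```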
Just checking to see if there's any update. Thank you!
(In reply to Ryan Novosielski from comment #7)
> Just checking to see if there's any update. Thank you!

The following cgroups:

11:memory:/slurm/uid_109366/job_95830770/step_0
10:freezer:/slurm/uid_109366/job_95830770/step_0
9:cpuset:/slurm/uid_109366/job_95830770/step_0

are equal in both cases. I think there is no problem if you enable this, but take into account that pam_slurm_adopt is affected by the original issue, so adding that module later may not be possible alongside your modification. Also, ensure the slurmd service unit has Delegate=yes set; otherwise cgroups created by slurmd may be modified by systemd, e.g. when reloading or restarting a service.
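For example, a systemd drop-in could be used; the directory and file name below follow the usual drop-in convention and are not something Slurm ships:

```
# /etc/systemd/system/slurmd.service.d/delegate.conf
[Service]
Delegate=yes
```

followed by `systemctl daemon-reload` and a restart of slurmd.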
Thank you very much. Implementing pam_slurm_adopt is a slightly longer term goal and this is a good workaround for a more pressing need. We’ll keep it in mind.
(In reply to Ryan Novosielski from comment #9)
> Thank you very much. Implementing pam_slurm_adopt is a slightly longer term
> goal and this is a good workaround for a more pressing need. We'll keep it
> in mind.

OK Ryan, if that is fine with you, I will close this bug and keep track of the pam_systemd and pam_slurm_adopt interaction in bug 5920.
That's fine by me; thanks again.
Thanks. Closing as infogiven.