I use pam_slurm_adopt and cgroups; slurm.conf has:

ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

and cgroup.conf:

CgroupMountpoint=/sys/fs/cgroup
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
AllowedRAMSpace=100
ConstrainSwapSpace=yes
AllowedSwapSpace=0
ConstrainDevices=yes

sbatch and srun processes seem to land in the correct cgroups and it all works perfectly, but the extern step processes adopted when ssh'ing into a node do not. The cpuset cgroup contains the extern PID as I expected:

/sys/fs/cgroup/cpuset/slurm/uid_xxxxx/job_xxx/cgroup.procs

but it is nowhere to be found in memory or devices:

/sys/fs/cgroup/memory/slurm/uid_xxxxx/job_xxx/cgroup.procs
/sys/fs/cgroup/devices/slurm/uid_xxxxx/job_xxx/cgroup.procs

and running "nvidia-smi" on a shared GPU node I ssh into via pam_slurm_adopt shows all GPUs, not just the ones allocated to the job. This suggests to me that the ssh shell isn't constrained in terms of memory or GPUs. Oversight, bug, or have I missed a configuration option?
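As a sanity check, one way to see every controller a given process has been adopted into is to read /proc/<pid>/cgroup directly. A minimal sketch, where the PID value is a placeholder for the ssh-adopted shell process:

# Placeholder PID of the sshd/shell process adopted by pam_slurm_adopt
PID=12345

# Prints one line per cgroup controller; for a fully adopted process the
# memory and devices entries should point at .../slurm/uid_xxxxx/job_xxx,
# not a system or user slice.
cat /proc/$PID/cgroup

# Alternatively, search the Slurm hierarchies for the PID across all controllers
grep -rl "^$PID$" /sys/fs/cgroup/*/slurm/ 2>/dev/null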
Mikael,

Can you share your pam configuration for sshd?

Cheers,
Marcin
It turns out systemd-logind wasn't disabled and masked on these nodes. The fact that the cpuset cgroup was working threw me off. Sorry for the noise.

Best regards,
Mikael

*** This ticket has been marked as a duplicate of ticket 5920 ***
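For anyone landing here with the same symptom: the fix described above amounts to stopping and masking systemd-logind on the compute nodes, since logind's session handling is known to conflict with pam_slurm_adopt's cgroup adoption. A minimal sketch; exact steps may vary by distribution, and pam_systemd may also need to be removed from the sshd PAM stack:

# Stop the running logind instance and prevent it from being started again
systemctl stop systemd-logind
systemctl mask systemd-logind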