This is a fork of a ticket from bug ID 18620 for this issue, from https://bugs.schedmd.com/show_bug.cgi?id=18620#c15:

The Training cluster is up, but has a few issues.

[root@mgmtnode ~]# slurmd -V
slurm 23.11.1

- It seems slurmd failed to start on each of node[00-09]
- it appears the cluster added pam_slurm_adopt but ssh still allows users through
- I see idle cloud[xxx] nodes?? I did not do a cloud build
- I can't ssh to the 'db' node to check it, but can ping it

I can otherwise freely ssh between the nodes, etc. Here are some diagnostic excerpts:

[root@mgmtnode ~]# sinfo -la
Fri Jan 12 12:06:49 2024
PARTITION AVAIL TIMELIMIT  JOB_SIZE   ROOT OVERSUBS GROUPS NODES STATE    RESERVATION NODELIST
cloud     up    infinite   1-infinite no   NO       all    1025  idle~                cloud[0000-1024]
debug*    up    infinite   1-infinite no   NO       all    10    unknown*             node[00-09]

[root@mgmtnode ~]# ssh db
ssh: connect to host db port 22: Connection refused

[root@mgmtnode ~]# ping db
PING db(db (2001:db8:1:1::1:3)) 56 data bytes
64 bytes from db (2001:db8:1:1::1:3): icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from db (2001:db8:1:1::1:3): icmp_seq=2 ttl=64 time=0.040 ms
64 bytes from db (2001:db8:1:1::1:3): icmp_seq=3 ttl=64 time=0.035 ms
64 bytes from db (2001:db8:1:1::1:3): icmp_seq=4 ttl=64 time=0.027 ms
^C
--- db ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3098ms
rtt min/avg/max/mdev = 0.027/0.036/0.045/0.009 ms

slurmctld is fine:

[root@mgmtnode ~]# systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/slurmctld.service.d
           └─cluster.conf
           /etc/systemd/system/slurmctld.service.d
           └─local.conf
   Active: active (running) since Fri 2024-01-12 04:15:03 EST; 7h ago
  Process: 173 ExecStartPost=/usr/local/bin/slurmctld.startup2.sh (code=exited, status=0/SUCCESS)
  Process: 113 ExecStartPre=/usr/local/bin/slurmctld.startup.sh (code=exited, status=0/SUCCESS)
  Process: 112 ExecStartPre=/usr/bin/chown slurm:slurm /var/log/slurmctld.log (code=exited, status=0/SUCCESS)
  Process: 111 ExecStartPre=/usr/bin/touch /var/log/slurmctld.log (code=exited, status=0/SUCCESS)
  Process: 110 ExecStartPre=/usr/bin/chmod -R 0770 /auth (code=exited, status=0/SUCCESS)
  Process: 107 ExecStartPre=/usr/bin/chown -R slurm:slurm /auth /etc/slurm/ (code=exited, status=0/SUCCESS)
 Main PID: 166 (slurmctld)
    Tasks: 36
   Memory: 46.2M
   CGroup: /docker.slice/docker-2b1c74d8bc42dee8dbe55d6a796b56bb3d4500586ca08fe0c94bd56f92196cc8.scope/system.slice/slurmctld.service
           ├─166 /usr/local/sbin/slurmctld --systemd
           └─167 slurmctld: slurmscriptd

Jan 12 11:52:19 mgmtnode slurmctld[166]: slurmctld: debug: Spawning registration agent for node[00-09] 10 hosts
Jan 12 11:52:31 mgmtnode slurmctld[166]: slurmctld: debug: Spawning registration agent for node[00-09] 10 hosts
Jan 12 11:52:31 mgmtnode slurmctld[166]: slurmctld: error: Nodes node[00-09] not responding
Jan 12 11:52:39 mgmtnode slurmctld[166]: slurmctld: debug: Requesting control from backup controller mgmtnode2
Jan 12 11:52:39 mgmtnode slurmctld[166]: slurmctld: debug: backup controller mgmtnode2 responding
Jan 12 11:52:43 mgmtnode slurmctld[166]: slurmctld: error: Nodes node[00-09] not responding
Jan 12 11:52:43 mgmtnode slurmctld[166]: slurmctld: debug: Spawning registration agent for node[00-09] 10 hosts
Jan 12 11:52:52 mgmtnode slurmctld[166]: slurmctld: debug: sched: Running job scheduler for full queue.
Jan 12 11:52:55 mgmtnode slurmctld[166]: slurmctld: debug: Spawning registration agent for node[00-09] 10 hosts
Jan 12 11:52:55 mgmtnode slurmctld[166]: slurmctld: error: Nodes node[00-09] not responding

But slurmd, even after a restart, fails.
They are all like that on each node I spot-checked:

[root@node07 ~]# systemctl list-units
UNIT                                     LOAD   ACTIVE SUB     DESCRIPTION
-.mount                                  loaded active mounted Root Mount
dev-log.mount                            loaded active mounted /dev/log
dev-mqueue.mount                         loaded active mounted POSIX Message Queue File System
etc-hostname.mount                       loaded active mounted /etc/hostname
etc-hosts.mount                          loaded active mounted /etc/hosts
etc-resolv.conf.mount                    loaded active mounted /etc/resolv.conf
etc-slurm.mount                          loaded active mounted /etc/slurm
etc-ssh.mount                            loaded active mounted /etc/ssh
home.mount                               loaded active mounted /home
proc-acpi.mount                          loaded active mounted /proc/acpi
proc-bus.mount                           loaded active mounted /proc/bus
proc-fs.mount                            loaded active mounted /proc/fs
proc-irq.mount                           loaded active mounted /proc/irq
proc-kcore.mount                         loaded active mounted /proc/kcore
proc-keys.mount                          loaded active mounted /proc/keys
proc-scsi.mount                          loaded active mounted /proc/scsi
proc-sysrq\x2dtrigger.mount              loaded active mounted /proc/sysrq-trigger
proc-timer_list.mount                    loaded active mounted /proc/timer_list
root.mount                               loaded active mounted root.mount
run-lock.mount                           loaded active mounted /run/lock
srv-containers.mount                     loaded active mounted /srv/containers
● sys-fs-fuse-connections.mount          masked active mounted sys-fs-fuse-connections.mount
sys-fs-fuse.mount                        loaded active mounted /sys/fs/fuse
tmp.mount                                loaded active mounted Temporary Directory (/tmp)
usr-local-src.mount                      loaded active mounted /usr/local/src
usr-share-zoneinfo-UTC.mount             loaded active mounted /usr/share/zoneinfo/UTC
var-lib-journal.mount                    loaded active mounted /var/lib/journal
var-spool-mail.mount                     loaded active mounted /var/spool/mail
systemd-ask-password-console.path        loaded active waiting Dispatch Password Requests to Console Directory Watch
systemd-ask-password-wall.path           loaded active waiting Forward Password Requests to Wall Directory Watch
init.scope                               loaded active running System and Service Manager
crond.service                            loaded active running Command Scheduler
dbus.service                             loaded active running D-Bus System Message Bus
dracut-shutdown.service                  loaded active exited  Restore /run/initramfs on shutdown
ldconfig.service                         loaded active exited  Rebuild Dynamic Linker Cache
munge.service                            loaded active running MUNGE authentication service
selinux-autorelabel-mark.service         loaded active exited  Mark the need to relabel after reboot
● slurmd.service                         loaded failed failed  Slurm node daemon
sshd.service                             loaded active running OpenSSH server daemon
systemd-journal-catalog-update.service   loaded active exited  Rebuild Journal Catalog
systemd-journal-flush.service            loaded active exited  Flush Journal to Persistent Storage
systemd-journald.service                 loaded active running Journal Service
systemd-sysusers.service                 loaded active exited  Create System Users
systemd-tmpfiles-setup.service           loaded active exited  Create Volatile Files and Directories
systemd-update-done.service              loaded active exited  Update is Completed
systemd-update-utmp.service              loaded active exited  Update UTMP about System Boot/Shutdown
systemd-user-sessions.service            loaded active exited  Permit User Sessions
-.slice                                  loaded active active  Root Slice

[root@node07 ~]# systemctl slurmd status
Unknown operation slurmd.
[root@node07 ~]# systemctl status slurmd
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/slurmd.service.d
           └─cluster.conf
           /etc/systemd/system/slurmd.service.d
           └─local.conf
   Active: failed (Result: exit-code) since Fri 2024-01-12 04:14:29 EST; 7h ago
  Process: 121 ExecStart=/usr/local/sbin/slurmd --systemd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
  Process: 114 ExecStartPre=/usr/local/bin/slurmd.startup.sh (code=exited, status=0/SUCCESS)
  Process: 113 ExecStartPre=/usr/bin/chown slurm:slurm /var/log/slurmd.log (code=exited, status=0/SUCCESS)
  Process: 112 ExecStartPre=/usr/bin/touch /var/log/slurmd.log (code=exited, status=0/SUCCESS)
 Main PID: 121 (code=exited, status=1/FAILURE)

Jan 12 04:14:29 node07 slurmd[121]: slurmd: error: ProctrackType 1 specified more than once, latest value used
Jan 12 04:14:29 node07 slurmd[121]: slurmd: debug: Log file re-opened
Jan 12 04:14:29 node07 slurmd[121]: slurmd: debug: CPUs:48 Boards:1 Sockets:2 CoresPerSocket:12 ThreadsPerCore:2
Jan 12 04:14:29 node07 slurmd[121]: slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are: 9:devices:/
Jan 12 04:14:29 node07 slurmd[121]: 7:freezer:/
Jan 12 04:14:29 node07 slurmd[121]: 6:perf_event:/
Jan 12 04:14:29 node07 slurmd[121]: 5:net_cls,net_prio:/
Jan 12 04:14:29 node07 slurmd[121]: 1:name=systemd:/
Jan 12 04:14:29 node07 slurmd[121]: 0::/docker.slice/docker-07a88b0251dce9e294f193e08a3c5a8821a24e089f927876045b1542a43e1023.scope/init.scope

[root@node07 ~]# systemctl restart slurmd
Job for slurmd.service failed because the control process exited with error code.
See "systemctl status slurmd.service" and "journalctl -xe" for details.

[root@node07 ~]# journalctl -xe
-- The system journal process has started up, opened the journal
-- files for writing and is now ready to process requests.
Jan 12 04:14:29 node07 systemd-journald[70]: Runtime journal (/run/log/journal/040e07a4495d48b1925d320d1aa78b73) is 8.0M, max 4.0G, 3.9G free.
-- Subject: Disk space used by the journal
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Runtime journal (/run/log/journal/040e07a4495d48b1925d320d1aa78b73) is currently using 8.0M.
-- Maximum allowed usage is set to 4.0G.
-- Leaving at least 4.0G free (of currently available 20.2G of disk space).
-- Enforced usage limit is thus 4.0G, of which 3.9G are still available.
--
-- The limits controlling how much disk space is used by the journal may
-- be configured with SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=,
-- RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize= settings in
-- /etc/systemd/journald.conf. See journald.conf(5) for details.
Jan 12 04:14:29 node07 systemd-journald[70]: Runtime journal (/run/log/journal/040e07a4495d48b1925d320d1aa78b73) is 8.0M, max 4.0G, 3.9G free.
-- Subject: Disk space used by the journal
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Runtime journal (/run/log/journal/040e07a4495d48b1925d320d1aa78b73) is currently using 8.0M.
-- Maximum allowed usage is set to 4.0G.
-- Leaving at least 4.0G free (of currently available 20.2G of disk space).
-- Enforced usage limit is thus 4.0G, of which 3.9G are still available.
--
-- The limits controlling how much disk space is used by the journal may
-- be configured with SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=,
-- RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize= settings in
-- /etc/systemd/journald.conf. See journald.conf(5) for details.
Jan 12 04:14:29 node07 systemd-tmpfiles[76]: [/usr/local/lib/tmpfiles.d/munge.conf:1] Line references path below legacy directory /var/run/, updating /var/run/munge → /run/munge; please update the tmpfiles.d/ drop-in file accordingly.
Jan 12 04:14:29 node07 slurmd[121]: slurmd: error: ProctrackType 1 specified more than once, latest value used
Jan 12 04:14:29 node07 slurmd[121]: slurmd: debug: Log file re-opened
Jan 12 04:14:29 node07 slurmd[121]: slurmd: debug: CPUs:48 Boards:1 Sockets:2 CoresPerSocket:12 ThreadsPerCore:2
Jan 12 04:14:29 node07 slurmd[121]: slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are: 9:devices:/
Jan 12 04:14:29 node07 slurmd[121]: 7:freezer:/
Jan 12 04:14:29 node07 slurmd[121]: 6:perf_event:/
Jan 12 04:14:29 node07 slurmd[121]: 5:net_cls,net_prio:/
Jan 12 04:14:29 node07 slurmd[121]: 1:name=systemd:/
Jan 12 04:14:29 node07 slurmd[121]: 0::/docker.slice/docker-07a88b0251dce9e294f193e08a3c5a8821a24e089f927876045b1542a43e1023.scope/init.scope
Jan 12 11:51:24 node07 slurmd[258]: slurmd: error: ProctrackType 1 specified more than once, latest value used
Jan 12 11:51:24 node07 slurmd[258]: slurmd: debug: Log file re-opened
Jan 12 11:51:24 node07 slurmd[258]: slurmd: debug: CPUs:48 Boards:1 Sockets:2 CoresPerSocket:12 ThreadsPerCore:2
Jan 12 11:51:24 node07 slurmd[258]: slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are: 9:devices:/
Jan 12 11:51:24 node07 slurmd[258]: 7:freezer:/
Jan 12 11:51:24 node07 slurmd[258]: 6:perf_event:/
Jan 12 11:51:24 node07 slurmd[258]: 5:net_cls,net_prio:/
Jan 12 11:51:24 node07 slurmd[258]: 1:name=systemd:/
Jan 12 11:51:24 node07 slurmd[258]: 0::/docker.slice/docker-07a88b0251dce9e294f193e08a3c5a8821a24e089f927876045b1542a43e1023.scope/init.scope
wilma's pam_slurm_adopt is not working (BTW, in our real clusters on el7 and el9 it does work; I'm just reporting this with the test cluster I just built):

[root@node09 /]# uname -a
Linux node09 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023 x86_64 x86_64 x86_64 GNU/Linux

[root@node09 /]# ssh wilma@node01
wilma@node01's password:
Access denied by pam_slurm_adopt: you have no active jobs on this node

[wilma@node01 ~]$ netstat -pnat
(No info could be read for "-p": geteuid()=1013 but you should be root.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address          Foreign Address        State       PID/Program name
tcp        0      0 0.0.0.0:22             0.0.0.0:*              LISTEN      -
tcp        0      0 127.0.0.11:40891       0.0.0.0:*              LISTEN      -
tcp6       0      0 :::22                  :::*                   LISTEN      -
tcp6       0      0 2001:db8:1:1::5:11:22  2001:db8:1:1::5:1:58838 ESTABLISHED -

[wilma@node01 ~]$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    0      0        0 eth0
10.11.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth0

The main issue now for me is to get slurmd working on node01-09. What could be the cause? Thanks!
(In reply to Serguei Mokhov from comment #0)
> This is a fork of a ticket from bug ID 18620 for this issue:

Thanks. Having one issue per ticket really helps us keep track of all of the issues and avoid a lot of confusion.

> - It seems slurmd failed to start on each node[00-09]

The relevant log was in comment#0:
> Jan 12 04:14:29 node07 slurmd[121]: slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are: 9:devices:/

As noted here:
> https://slurm.schedmd.com/cgroups.html

The hybrid cgroup mode is not supported by Slurm and has been removed in more recent kernels. Please configure cgroup v1 or v2 per the above link on the host system to resolve this error.

> - it appears the cluster added pam_slurm_adopt but ssh still allows through

One of the training labs includes adding pam_deny to /etc/pam.d/sshd2 to enforce pam_slurm_adopt. This will need to be done manually if you wish to see it enforced.

> - I see idle cloud[xxx] nodes?? I did not do a cloud build

They are included in the default config to avoid having multiple versions. If not built with cloud mode, they can be removed from slurm.conf or safely ignored.

> - I can't ssh to the 'db' node and check it but can ping it

Try calling `make HOST=db bash` instead. The container image is provided by the mysql project and does not include sshd.
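For reference, the enforcement pattern described above usually looks something like the fragment below. This is a hypothetical sketch only, not taken from the training materials: the exact file and account stack depend on the lab's base image, and pam_slurm_adopt takes site-specific options.

```
# Hypothetical PAM account-stack fragment (sketch, not from the lab):
# allow the login only if pam_slurm_adopt can adopt it into a running job;
# otherwise fall through to pam_deny, which rejects the session.
account    sufficient   pam_slurm_adopt.so
account    required     pam_deny.so
```

Without the pam_deny line, a denial from pam_slurm_adopt is not final and the rest of the account stack can still admit the session, which matches the behavior in comment #0 where "Access denied by pam_slurm_adopt" was printed but a shell was granted anyway.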
(In reply to Nate Rini from comment #1)
> > - It seems slurmd failed to start on each node[00-09]
>
> The relevant log was in comment#0:
> > Jan 12 04:14:29 node07 slurmd[121]: slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are: 9:devices:/
>
> As noted here:
> > https://slurm.schedmd.com/cgroups.html
>
> The hybrid cgroup mode is not supported by Slurm and has been removed in
> more recent kernels. Please configure cgroup v1 or v2 per the above link on
> the host system to resolve this error.

Oh, those are the parent system's cgroups, not the ones in the containers...
Let me see how I can fix that...

> > - it appears the cluster added pam_slurm_adopt but ssh still allows through
>
> One of the training labs includes adding pam_deny to /etc/pam.d/sshd2 to
> enforce pam_slurm_adopt. This will need to be done manually if you wish to
> see it enforced.
>
> > - I see idle cloud[xxx] nodes?? I did not do a cloud build
>
> They are included in the default config to avoid having multiple versions.
> If not built with cloud mode, they can be removed from slurm.conf or safely
> ignored.
>
> > - I can't ssh to the 'db' node and check it but can ping it
>
> Try calling `make HOST=db bash` instead. The container image is provided by
> the mysql project and does not include sshd.

Alright, thanks.
(In reply to Serguei Mokhov from comment #2)
> Oh, those are the parent system's cgroups, not the ones in the containers...
> Let me see how I can fix that...

I don't think the host system has hybrid or v1 cgroups:

# cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset          0               837             1
cpu             0               837             1
cpuacct         0               837             1
blkio           0               837             1
memory          0               837             1
devices         9               53              1
freezer         7               1               1
net_cls         5               1               1
perf_event      6               1               1
net_prio        5               1               1
hugetlb         0               837             1
pids            0               837             1
rdma            0               837             1
misc            0               837             1

# ll /sys/fs/cgroup/
total 0
-r--r--r--  1 root root 0 Dec 18 19:22 cgroup.controllers
-rw-r--r--  1 root root 0 Jan 12 14:06 cgroup.max.depth
-rw-r--r--  1 root root 0 Jan 12 14:06 cgroup.max.descendants
-rw-r--r--  1 root root 0 Dec 18 19:22 cgroup.procs
-r--r--r--  1 root root 0 Jan 12 14:06 cgroup.stat
-rw-r--r--  1 root root 0 Jan 12 10:25 cgroup.subtree_control
-rw-r--r--  1 root root 0 Jan 12 14:06 cgroup.threads
-r--r--r--  1 root root 0 Jan  8 23:41 cpuset.cpus.effective
-r--r--r--  1 root root 0 Jan  8 23:41 cpuset.mems.effective
-r--r--r--  1 root root 0 Jan 12 14:06 cpu.stat
drwxr-xr-x  2 root root 0 Jan  8 23:47 dev-hugepages.mount
drwxr-xr-x  2 root root 0 Jan  8 23:47 dev-mqueue.mount
drwxr-xr-x 24 root root 0 Jan 12 10:08 docker.slice
drwxr-xr-x  2 root root 0 Dec 18 19:22 init.scope
-r--r--r--  1 root root 0 Jan 12 14:06 io.stat
drwxr-xr-x  2 root root 0 Jan  8 23:47 machine.slice
-r--r--r--  1 root root 0 Jan 12 14:06 memory.numa_stat
--w-------  1 root root 0 Jan 12 14:06 memory.reclaim
-r--r--r--  1 root root 0 Jan 12 14:06 memory.stat
-r--r--r--  1 root root 0 Jan 12 14:06 misc.capacity
drwxr-xr-x  2 root root 0 Jan  8 23:47 proc-sys-fs-binfmt_misc.mount
drwxr-xr-x  2 root root 0 Jan  8 23:47 sys-fs-fuse-connections.mount
drwxr-xr-x  2 root root 0 Jan  8 23:47 sys-kernel-config.mount
drwxr-xr-x  2 root root 0 Jan  8 23:47 sys-kernel-debug.mount
drwxr-xr-x  2 root root 0 Jan  8 23:47 sys-kernel-tracing.mount
drwxr-xr-x 48 root root 0 Jan 12 13:52 system.slice
drwxr-xr-x  3 root root 0 Jan 12 11:44 user.slice

[root@filth docker-scale-out]# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc

[root@filth docker-scale-out]# stat -f /sys/fs/cgroup
  File: "/sys/fs/cgroup"
    ID: 0        Namelen: 255     Type: cgroup2fs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 0          Free: 0          Available: 0
Inodes: Total: 0          Free: 0

How does it detect "hybrid"?
(In reply to Serguei Mokhov from comment #3)
> How does it detect "hybrid"?

Call:
> cat /proc/self/cgroup
(In reply to Nate Rini from comment #4)
> Call:
> > cat /proc/self/cgroup

# cat /proc/self/cgroup
9:devices:/
7:freezer:/
6:perf_event:/
5:net_cls,net_prio:/
1:name=systemd:/
0::/user.slice/user-11929.slice/session-377.scope

What here tells me it is hybrid?
(In reply to Serguei Mokhov from comment #5)
> # cat /proc/self/cgroup
> 9:devices:/
> 7:freezer:/
> 6:perf_event:/
> 5:net_cls,net_prio:/
> 1:name=systemd:/
> 0::/user.slice/user-11929.slice/session-377.scope
>
> What here tells me it is hybrid?

Cgroup v2 looks like this:
> srun cat /proc/self/cgroup
> 0::/system.slice/slurmstepd.scope/job_1330/step_0/user/task_0

Please try following the suggestions below:

(In reply to Ben Glines from bug#18359 comment#8)
> We only support legacy mode (cgroup v1) and unified mode (cgroup v2), and
> not hybrid setups [3], as you have noticed.
>
> To disable hybrid mode and only enable cgroup v2 (this is the mode we
> recommend), you'll need to add "cgroup_no_v1=all" to your kernel command
> line. Depending on your setup, this can be added to GRUB_CMDLINE_LINUX="" in
> /etc/default/grub. Run `sudo update-grub` after making the change, and then
> reboot.
>
> Check the cgroup mount points to ensure that only cgroup v2 is enabled. You
> should see something like the following:
>
> $ mount | grep cgroup
> cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
>
> Let me know if you have any questions about this.
>
> [1] https://slurm.schedmd.com/slurm.conf.html#OPT_task/affinity
> [2] https://slurm.schedmd.com/cgroups.html#task
> [3] https://slurm.schedmd.com/cgroups.html#overview
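Concretely, the hybrid tell in the output from comment#5 is that numbered v1 hierarchies (`9:devices:/`, `7:freezer:/`, ...) and the v2 unified hierarchy (the `0::` line) are present at the same time. As a rough illustration only (a hypothetical helper, not how slurmd actually implements its detection), that check can be sketched as:

```shell
#!/bin/sh
# classify_cgroup_mode: read /proc/self/cgroup-style lines on stdin and
# report "legacy" (v1 only), "unified" (v2 only), or "hybrid" (both).
classify_cgroup_mode() {
    v1=0
    v2=0
    while IFS= read -r line; do
        case "$line" in
            0::*)       v2=1 ;;  # hierarchy ID 0 with empty controller list = cgroup v2
            [0-9]*:*:*) v1=1 ;;  # any numbered hierarchy with controllers = cgroup v1
        esac
    done
    if [ "$v1" -eq 1 ] && [ "$v2" -eq 1 ]; then
        echo hybrid
    elif [ "$v2" -eq 1 ]; then
        echo unified
    else
        echo legacy
    fi
}

# The output from this ticket contains both kinds of lines:
printf '9:devices:/\n1:name=systemd:/\n0::/user.slice/session-377.scope\n' \
    | classify_cgroup_mode
```

Run against a live system it would be `classify_cgroup_mode < /proc/self/cgroup`; a pure v2 host shows only the single `0::` line, as in the `srun` example above.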
(In reply to Nate Rini from comment #6)
> Cgroup v2 looks like this:
> > srun cat /proc/self/cgroup
> > 0::/system.slice/slurmstepd.scope/job_1330/step_0/user/task_0

Well, I have 0::/ in mine too :)

> Please try following the suggestions below:

That's the thing...

# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

It looks like I have nothing else mounted.

# uname -a
Linux filth 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023 x86_64 x86_64 x86_64 GNU/Linux

"RHEL 9, by default, mounts and uses cgroups-v2." We did not change that in Alma, and AlmaLinux 9 defaults to using cgroup v2 as well.

I'll try GRUB_CMDLINE_LINUX maybe next time I have the opportunity to reboot the server, but the above analysis so far suggests I don't have v1...

> (In reply to Ben Glines from bug#18359 comment#8)
> > We only support legacy mode (cgroup v1) and unified mode (cgroup v2), and
> > not hybrid setups [3], as you have noticed.
> >
> > To disable hybrid mode and only enable cgroup v2 (this is the mode we
> > recommend), you'll need to add "cgroup_no_v1=all" to your kernel command
> > line. Depending on your setup, this can be added to GRUB_CMDLINE_LINUX="" in
> > /etc/default/grub. Run `sudo update-grub` after making the change, and then
> > reboot.
> >
> > Check the cgroup mount points to ensure that only cgroup v2 is enabled. You
> > should see something like the following:
> >
> > $ mount | grep cgroup
> > cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
> >
> > Let me know if you have any questions about this.
> >
> > [1] https://slurm.schedmd.com/slurm.conf.html#OPT_task/affinity
> > [2] https://slurm.schedmd.com/cgroups.html#task
> > [3] https://slurm.schedmd.com/cgroups.html#overview
So... I've added cgroup_no_v1=all to my GRUB_CMDLINE_LINUX and rebooted. The result is virtually the same:

# cat /proc/self/cgroup
14:freezer:/
9:perf_event:/
8:net_cls,net_prio:/
7:devices:/
1:name=systemd:/
0::/user.slice/user-11929.slice/session-1.scope

# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

# make HOST=node01 bash
docker compose exec node01 /bin/bash

[root@node01 /]# systemctl status slurmd
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/slurmd.service.d
           └─cluster.conf
           /etc/systemd/system/slurmd.service.d
           └─local.conf
   Active: failed (Result: exit-code) since Fri 2024-01-12 19:10:20 EST; 6min ago
  Process: 121 ExecStart=/usr/local/sbin/slurmd --systemd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
  Process: 114 ExecStartPre=/usr/local/bin/slurmd.startup.sh (code=exited, status=0/SUCCESS)
  Process: 113 ExecStartPre=/usr/bin/chown slurm:slurm /var/log/slurmd.log (code=exited, status=0/SUCCESS)
  Process: 112 ExecStartPre=/usr/bin/touch /var/log/slurmd.log (code=exited, status=0/SUCCESS)
 Main PID: 121 (code=exited, status=1/FAILURE)

Jan 12 19:10:20 node01 slurmd[121]: slurmd: error: ProctrackType 1 specified more than once, latest value used
Jan 12 19:10:20 node01 slurmd[121]: slurmd: debug: Log file re-opened
Jan 12 19:10:20 node01 slurmd[121]: slurmd: debug: CPUs:48 Boards:1 Sockets:2 CoresPerSocket:12 ThreadsPerCore:2
Jan 12 19:10:20 node01 slurmd[121]: slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are: 14:freezer:/
Jan 12 19:10:20 node01 slurmd[121]: 9:perf_event:/
Jan 12 19:10:20 node01 slurmd[121]: 8:net_cls,net_prio:/
Jan 12 19:10:20 node01 slurmd[121]: 7:devices:/
Jan 12 19:10:20 node01 slurmd[121]: 1:name=systemd:/
Jan 12 19:10:20 node01 slurmd[121]: 0::/docker.slice/docker-4b68f66a75745866e007e7e21c61bac490d250d59925f6eb30246abf2dc22e3c.scope/init.scope

[root@node01 /]# cat /proc/self/cgroup
14:freezer:/
9:perf_event:/
8:net_cls,net_prio:/
7:devices:/
1:name=systemd:/
0::/docker.slice/docker-4b68f66a75745866e007e7e21c61bac490d250d59925f6eb30246abf2dc22e3c.scope/init.scope

[root@node01 /]# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (ro,relatime)
cgroup2 on /sys/fs/cgroup/docker.slice type cgroup2 (rw,nosuid,nodev,noexec,relatime)
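One sanity check worth doing at this point (a sketch, not taken from the ticket): confirm the flag actually reached the booted kernel via /proc/cmdline. On EL9 the Debian-style `update-grub` command from the quoted advice does not exist; grub.cfg must be regenerated with `grub2-mkconfig` (or `grubby --update-kernel=ALL --args="cgroup_no_v1=all"`), so a missed regeneration step would leave the command line unchanged after a reboot. The helper below is hypothetical and just scans a command-line string for the exact token:

```shell
#!/bin/sh
# has_no_v1: report whether a kernel command line (passed as $1)
# contains the cgroup_no_v1=all token.
has_no_v1() {
    # Pad with spaces so the pattern only matches a whole token.
    case " $1 " in
        *" cgroup_no_v1=all "*) echo yes ;;
        *)                      echo no ;;
    esac
}

# Check the currently booted kernel's command line.
has_no_v1 "$(cat /proc/cmdline)"
```

If this prints `no` after the reboot, the grub change never took effect on the running kernel, which would explain the unchanged /proc/self/cgroup output above.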
(In reply to Serguei Mokhov from comment #8)
> So... I've added cgroup_no_v1=all to my GRUB_CMDLINE_LINUX and rebooted.
> The result is virtually the same:

Is this after a reboot?
(In reply to Nate Rini from comment #9)
> Is this after a reboot?

Please try the suggestions here too:
https://slurm.schedmd.com/faq.html#cgroupv2

The output in comment#7 is clearly a hybrid cgroup mount setup.
Any chance to try the suggestion in comment#10?
It's been more than a week since comment#10. Please respond when convenient, and the ticket will automatically re-open.