Ticket 12053 - slurmd -C fails in configless mode
Summary: slurmd -C fails in configless mode
Status: RESOLVED DUPLICATE of ticket 11434
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 20.11.8
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Tim McMullan
Reported: 2021-07-16 04:37 MDT by Ward Poelmans
Modified: 2021-07-16 08:55 MDT
Site: VUB


Description Ward Poelmans 2021-07-16 04:37:45 MDT
We're using the configless mode of Slurm.

It works fine but 'slurmd -C' is broken with it:

[root@node300 ~]# time slurmd -C
NodeName=node300 slurmd: error: s_p_parse_file: unable to status file /etc/slurm/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
slurmd: error: ClusterName needs to be specified
slurmd: Considering each NUMA node as a socket
CPUs=40 Boards=1 SocketsPerBoard=4 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=191765
UpTime=18-00:03:19


It hangs for a minute and then finally prints the Nodespec line.


Neither of the following works either:
slurmd --conf-server slurmctld-server -C
slurmd -f /var/spool/slurm/slurmd/conf-cache/slurm.conf -C

If I create a symlink at /etc/slurm/slurm.conf pointing to /var/spool/slurm/slurmd/conf-cache/slurm.conf, it all works fine.
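
For anyone wanting to reproduce the symlink workaround, here is a minimal sketch. The function name link_cached_conf is hypothetical (it is not part of Slurm), and the two default paths are simply the ones quoted in this report, so they may differ on other sites. It assumes slurmd has already fetched the config into its cache.

```shell
# Hypothetical helper (not part of Slurm): make the default config path
# point at the config that configless slurmd cached locally.
# Default paths are the ones from this report; adjust for your site.
link_cached_conf() {
    cache_conf="${1:-/var/spool/slurm/slurmd/conf-cache/slurm.conf}"
    default_conf="${2:-/etc/slurm/slurm.conf}"

    # Refuse to link to a cache file that does not exist yet.
    if [ ! -e "$cache_conf" ]; then
        echo "cached config not found: $cache_conf" >&2
        return 1
    fi

    # Plain `ln -s` fails if default_conf already exists, so an
    # existing config file is never overwritten.
    mkdir -p "$(dirname "$default_conf")" &&
        ln -s "$cache_conf" "$default_conf"
}
```

Run as root with no arguments, this would create /etc/slurm/slurm.conf as a link to the cached file, after which `slurmd -C` can be retried. It is only a stopgap for the behaviour described here, not a recommended configless setup.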
Comment 2 Tim McMullan 2021-07-16 08:52:55 MDT
Hi!

This looks a lot like bug 11434.  Do you have "Sub NUMA Cluster" or its equivalent BIOS setting for your hardware enabled on your nodes?  slurmd -C isn't intended to try to read the Slurm config, but with that setting enabled there was a path where it would try to load it without ever figuring out the config situation.

Let me know if this is the case on your nodes.
Thanks,
--Tim
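
One rough way to check for a Sub NUMA Cluster style setting from the OS side is to compare the socket and NUMA node counts reported by lscpu. The sketch below is only a heuristic: the function name numa_exceeds_sockets is hypothetical, it assumes the English "Socket(s):" and "NUMA node(s):" labels that lscpu prints, and more NUMA nodes than sockets can have other causes besides this BIOS option.

```shell
# Hypothetical helper: reads `lscpu` output on stdin, prints both counts,
# and exits 0 only when there are more NUMA nodes than sockets.
numa_exceeds_sockets() {
    awk -F: '
        /^Socket\(s\):/    { sockets = $2 + 0 }
        /^NUMA node\(s\):/ { numa = $2 + 0 }
        END {
            printf "sockets=%d numa_nodes=%d\n", sockets, numa
            exit !(numa > sockets)
        }'
}
```

A typical call would be `LC_ALL=C lscpu | numa_exceeds_sockets && echo "NUMA nodes outnumber sockets"`, where LC_ALL=C keeps the labels in English.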
Comment 3 Ward Poelmans 2021-07-16 08:55:07 MDT
Yeah, that's exactly the same issue. For some reason I didn't find it.

*** This ticket has been marked as a duplicate of ticket 11434 ***