Ticket 12053

Summary: slurmd -C fails in configless mode
Product: Slurm Reporter: Ward Poelmans <ward.poelmans>
Component: Configuration Assignee: Tim McMullan <mcmullan>
Status: RESOLVED DUPLICATE QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: mcmullan
Version: 20.11.8   
Hardware: Linux   
OS: Linux   
Site: VUB

Description Ward Poelmans 2021-07-16 04:37:45 MDT
We're using the configless mode of Slurm.

It works fine, but 'slurmd -C' is broken with it:

[root@node300 ~]# time slurmd -C
NodeName=node300 slurmd: error: s_p_parse_file: unable to status file /etc/slurm/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
slurmd: error: ClusterName needs to be specified
slurmd: Considering each NUMA node as a socket
CPUs=40 Boards=1 SocketsPerBoard=4 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=191765
UpTime=18-00:03:19


It hangs for a minute and then finally prints the Nodespec line.


Neither of these works either:
slurmd --conf-server slurmctld-server -C
slurmd -f /var/spool/slurm/slurmd/conf-cache/slurm.conf -C

If I create a symlink at /etc/slurm/slurm.conf that points to /var/spool/slurm/slurmd/conf-cache/slurm.conf, it all works fine.
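
A minimal sketch of that symlink workaround as a shell function, assuming the cache and default paths quoted in this ticket. The name link_cached_conf is a hypothetical helper, not something Slurm ships, and it deliberately leaves any existing file at the default path untouched.

```shell
#!/bin/sh
# Workaround sketch: expose the configless cache at the default config path.
# link_cached_conf is a hypothetical helper name, not part of Slurm.
link_cached_conf() {
    cache_conf=$1    # e.g. /var/spool/slurm/slurmd/conf-cache/slurm.conf
    default_conf=$2  # e.g. /etc/slurm/slurm.conf

    # Nothing to link to if no cached config is present yet.
    if [ ! -e "$cache_conf" ]; then
        echo "no cached config at $cache_conf" >&2
        return 1
    fi

    # Leave an existing file or symlink alone.
    if [ -e "$default_conf" ] || [ -L "$default_conf" ]; then
        echo "$default_conf already exists, leaving it untouched" >&2
        return 0
    fi

    # Create the parent directory if needed, then the symlink itself.
    mkdir -p "$(dirname "$default_conf")" &&
        ln -s "$cache_conf" "$default_conf"
}
```

On a node this would be called as root, for example: link_cached_conf /var/spool/slurm/slurmd/conf-cache/slurm.conf /etc/slurm/slurm.conf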
Comment 2 Tim McMullan 2021-07-16 08:52:55 MDT
Hi!

This looks a lot like bug 11434.  Do you have "Sub NUMA Cluster" or its equivalent BIOS setting for your hardware enabled on your nodes?  slurmd -C isn't intended to try to read the Slurm config, but if that setting is enabled there was a code path where it would try to load the config without ever figuring out the config situation.

Let me know if this is the case on your nodes.
Thanks,
--Tim
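
One rough way to check for this from the OS side is to compare the socket and NUMA node counts that lscpu reports. The sketch below is only a heuristic under the assumption that more NUMA nodes than sockets suggests Sub NUMA Clustering (or an equivalent setting) is active. The name numa_vs_sockets is a hypothetical helper, and it reads lscpu-style text on stdin so it can be tried on saved output.

```shell
#!/bin/sh
# Heuristic sketch: compare "Socket(s)" and "NUMA node(s)" from lscpu output.
# numa_vs_sockets is a hypothetical helper name, not part of Slurm or util-linux.
numa_vs_sockets() {
    awk -F: '
        # Allow leading whitespace, since newer lscpu versions indent fields.
        /^[ \t]*Socket\(s\)/    { gsub(/[ \t]/, "", $2); sockets = $2 }
        /^[ \t]*NUMA node\(s\)/ { gsub(/[ \t]/, "", $2); numa = $2 }
        END {
            if (sockets == "" || numa == "") { print "unknown"; exit 2 }
            if (numa + 0 > sockets + 0) print "numa>sockets"
            else print "numa<=sockets"
        }'
}
```

On a node it could be run as: LC_ALL=C lscpu | numa_vs_sockets. A result of numa>sockets would be consistent with the "Considering each NUMA node as a socket" message in the output above, but the BIOS setting itself is the authoritative answer.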
Comment 3 Ward Poelmans 2021-07-16 08:55:07 MDT
Yeah, that's exactly the same issue. For some reason I didn't find it.

*** This ticket has been marked as a duplicate of ticket 11434 ***