Created attachment 14683: slurm.conf

Today we upgraded the controller node from 19.05 to 20.02.3, and immediately all Slurm commands (on the controller node) print error messages:

# sinfo --version
sinfo: error: NodeNames=a[001-140] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=d[001-019] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=d[021-022] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=d[023-068] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=g[001-021],g[024-078] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=g[079-084],g[089-110] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=g[085-088] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=h[001-002] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=i[004-030] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=i[031-050] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=x[001-168],x[181-192] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=x[169-180] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=c[001-196] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
sinfo: error: NodeNames=b[001-012] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
slurm 20.02.3

In slurm.conf we have defined NodeName with Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1. According to the slurm.conf manual, CPUs should then be calculated automatically:

"If CPUs is omitted, its default will be set equal to the product of Boards, Sockets, CoresPerSocket, and ThreadsPerCore."

and:

"Boards and CPUs are mutually exclusive."

The error message "CPUs=1 match no Sockets" would seem to be a bug. It may be the same issue as in bug 9233.

Can you please help with a workaround?

Thanks,
Ole
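For reference, the CPU count that the quoted manual text implies can be checked with a small script. This is only a sketch of the documented arithmetic, not Slurm's own parsing code: the function name expected_cpus is hypothetical, it assumes SocketsPerBoard is a per-board count while Sockets is the node total, and it assumes any omitted key defaults to 1.

```python
import re

def expected_cpus(node_line: str) -> int:
    """Return the CPU count implied by the topology keys on a NodeName line.

    Illustration only: key defaults of 1 are an assumption, not Slurm behavior.
    """
    # Collect Key=Value pairs from the line (NodeName, Weight, etc. are ignored).
    params = dict(re.findall(r"(\w+)=(\S+)", node_line))
    boards = int(params.get("Boards", 1))
    cores = int(params.get("CoresPerSocket", 1))
    threads = int(params.get("ThreadsPerCore", 1))
    if "SocketsPerBoard" in params:
        # Assumed: total sockets = boards * sockets per board.
        sockets = boards * int(params["SocketsPerBoard"])
    else:
        # Assumed: Sockets already counts every socket on the node.
        sockets = int(params.get("Sockets", 1))
    return sockets * cores * threads

line = "NodeName=x Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1"
print(expected_cpus(line))
```

With the values from this report (1 board, 2 sockets per board, 20 cores per socket, 1 thread per core) the product is 40, not the CPUs=1 shown in the error messages.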
When I change the slurm.conf lines:

NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 ...

into

NodeName=a[001-140] Weight=10001 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 ...

then the errors do not show up!

It appears that Boards=1 SocketsPerBoard=2 is not handled correctly in 20.02.3, and that one must instead use Sockets=2.

Question: Should Boards and SocketsPerBoard be working correctly, or should these parameters be deprecated?

Thanks,
Ole
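For a slurm.conf with many NodeName lines, the manual edit described above could be scripted. The following is a rough sketch, not an official tool: the helper name boards_to_sockets is hypothetical, it assumes the keys are spelled exactly as Boards and SocketsPerBoard (same case as in this report), and it assumes the total socket count is Boards multiplied by SocketsPerBoard. Review the output before deploying it.

```python
import re

def boards_to_sockets(line: str) -> str:
    """Rewrite 'Boards=B SocketsPerBoard=S' as 'Sockets=B*S' on a NodeName line."""
    # Leave comments and non-node lines untouched.
    if not line.lstrip().startswith("NodeName="):
        return line
    boards = re.search(r"\bBoards=(\d+)", line)
    per_board = re.search(r"\bSocketsPerBoard=(\d+)", line)
    if not (boards and per_board):
        return line
    # Assumed: total sockets on the node = boards * sockets per board.
    total = int(boards.group(1)) * int(per_board.group(1))
    # Drop the Boards key (and trailing blanks), then swap in the Sockets key.
    line = re.sub(r"\bBoards=\d+[ \t]*", "", line)
    line = re.sub(r"\bSocketsPerBoard=\d+", f"Sockets={total}", line)
    return line

old = ("NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2 "
       "CoresPerSocket=4 ThreadsPerCore=1")
print(boards_to_sockets(old))
```

Applied to the a[001-140] line above, this yields the same Sockets=2 form that made the errors disappear.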
Thanks Ole. As you guessed, this is a duplicate of bug 9233. I am investigating the regression right now. I am marking this bug as a duplicate, so please follow the other one from now on. Thank you for your help.

*** This bug has been marked as a duplicate of bug 9233 ***