Bug 3753 - slurmd and slurmctld fail to start with TRESBillingWeights
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 16.05.8
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Tim Wickberg
Reported: 2017-05-01 12:04 MDT by Robert Yelle
Modified: 2017-05-01 12:13 MDT
Site: University of Oregon

Attachments
slurm.conf (5.47 KB, text/plain)
2017-05-01 12:04 MDT, Robert Yelle
Description Robert Yelle 2017-05-01 12:04:24 MDT
Created attachment 4456
slurm.conf

Hello,

We are trying to set up trackable resources according to:

https://slurm.schedmd.com/tres.html

but when I try to start slurmd or slurmctld with an updated slurm.conf that sets TRESBillingWeights:

TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"

then I get this error message:

[root@n025 etc]# systemctl restart slurmd
Job for slurmd.service failed because the control process exited with error code. See "systemctl status slurmd.service" and "journalctl -xe" for details.
[root@n025 etc]# systemctl status slurmd
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-05-01 10:51:54 PDT; 9s ago
  Process: 110656 ExecStart=/cm/shared/apps/slurm/16.05.8/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 49567 (code=exited, status=0/SUCCESS)

May 01 10:51:54 n025 systemd[1]: Starting Slurm node daemon...
May 01 10:51:54 n025 slurmd[110656]: error: _parse_next_key: Parsing error at unrecognized key: TRESBillingWeights
May 01 10:51:54 n025 slurmd[110656]: error: Parse error in file /etc/slurm/slurm.conf line 80: "TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0""
May 01 10:51:54 n025 slurmd[110656]: fatal: Unable to process configuration file
May 01 10:51:54 n025 systemd[1]: slurmd.service: control process exited, code=exited status=1
May 01 10:51:54 n025 systemd[1]: Failed to start Slurm node daemon.
May 01 10:51:54 n025 systemd[1]: Unit slurmd.service entered failed state.
May 01 10:51:54 n025 systemd[1]: slurmd.service failed.

I have attached my slurm.conf file.  Let me know if you need anything else.

Thanks,

Rob Yelle
Comment 1 Tim Wickberg 2017-05-01 12:10:29 MDT
TRESBillingWeights is a Partition configuration option; it cannot be set on a line of its own, and must appear on a PartitionName= line.

There's a special keyword of DEFAULT that can be used to set default partition options. Assuming you want this to apply to all your partitions, you could add a line to slurm.conf like:

PartitionName=DEFAULT TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
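
For illustration, a minimal sketch of how that DEFAULT line might sit in slurm.conf. The partition names (short, gpu) and node ranges here are hypothetical placeholders, not taken from the attached file. A DEFAULT record only applies to the PartitionName= lines that follow it, so it belongs above the partition definitions; an individual partition can still override the weights on its own line.

```
# Defaults applied to every partition defined below this line
PartitionName=DEFAULT TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"

# Hypothetical partitions; names and node ranges are placeholders
PartitionName=short Nodes=n[001-024] Default=YES State=UP
PartitionName=gpu Nodes=n[025-028] State=UP TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=4.0"
```

In this sketch the short partition inherits the default weights, while gpu replaces them with its own value.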
Comment 2 Robert Yelle 2017-05-01 12:11:59 MDT
Thanks Tim!

Rob

Comment 3 Tim Wickberg 2017-05-01 12:13:14 MDT
Certainly. Please reopen if there's anything else I can answer; I'm marking this closed for now.

- Tim