Bug 4245 - grptres limits for nodes/cores
Summary: grptres limits for nodes/cores
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Limits
Version: 16.05.10
Hardware: Linux Linux
Importance: --- 5 - Enhancement
Assignee: Unassigned Developer
Reported: 2017-10-10 13:22 MDT by Bill Abbott
Modified: 2017-12-20 18:59 MST
Site: Rutgers
Target Release: future
DevPrio: 4 - Medium


Description Bill Abbott 2017-10-10 13:22:40 MDT
We want to implement multiple partitions that contain the same pool of nodes, and limit each partition to some subset of the nodes. The original idea was to give each partition its own QOS (but all the nodes) and set the limit in that QOS using GrpTRES=node=somenumber.

The problem is that a partition limited this way to 3 nodes (84 cores in total) can run only 3 single-core jobs at once. Per

https://slurm.schedmd.com/sacctmgr.html:

"NOTE: On GrpTRES limits dealing with nodes as a TRES. Each job's node allocation is counted separately (i.e. if a single node has resources allocated to two jobs, this is counted as two allocated nodes)."

Our current workaround is to give each partition entirely separate nodes, which limits our flexibility.

Is this fixed or planned to be fixed in a future version?

Thanks.
Comment 1 Dominik Bartkiewicz 2017-10-11 00:56:28 MDT
Hi

Why don't you want to base this on CPUs?
GrpTRES=cpu=somenumber

Dominik
Comment 2 Bill Abbott 2017-10-11 06:18:57 MDT
We guarantee owners immediate access to purchased nodes, and limiting by cpu would allow one owner to spread single-core jobs across all nodes.  That would potentially block other owners for up to max walltime.
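
To make the concern in this comment concrete, here is a minimal Python sketch of the worst case under a CPU-only cap. It assumes every job fits on one node and that the scheduler happens to place each job on a different node. The function name is hypothetical, and the figures (a 3-node entitlement of 28-core nodes in a pool of 52 nodes) are borrowed from numbers given elsewhere in this ticket. This is an illustration only, not Slurm code.

```python
def max_nodes_touched(cpu_limit, cores_per_job, pool_nodes):
    """Upper bound on the distinct nodes one owner can occupy under a CPU-only cap.

    Assumes each job runs on a single node and, in the worst case,
    every job lands on a different node of the shared pool.
    """
    max_jobs = cpu_limit // cores_per_job
    return min(max_jobs, pool_nodes)


# Illustrative figures: 3 nodes x 28 cores purchased, 52 nodes in the pool.
cpu_limit = 3 * 28

# Single-core jobs can spread over the entire pool.
print(max_nodes_touched(cpu_limit, cores_per_job=1, pool_nodes=52))

# Whole-node jobs stay within the purchased node count.
print(max_nodes_touched(cpu_limit, cores_per_job=28, pool_nodes=52))
```

Under these assumptions the CPU cap bounds the owner's total cores but not the number of nodes those cores sit on, which is the gap described above.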
Comment 5 Dominik Bartkiewicz 2017-10-16 06:09:05 MDT
Hi

I am afraid we are not planning to change this GrpTRES=node behavior.
If you have homogeneous nodes, using GrpTRES=cpu should be a good choice.
The scenario from comment 2 is generally not effective and shouldn't be used.

Dominik
Comment 7 Dominik Bartkiewicz 2017-10-17 06:11:43 MDT
Hi

Could you give me more info?
I need the total number of nodes and how many nodes you want to give to each group.
Are your nodes homogeneous?
What kind of benefit do you expect from using "floating partitions"?

Dominik
Comment 8 Bill Abbott 2017-10-23 14:10:44 MDT
Sure.  The nodes in this case are homogeneous and there are 52 of them.  Most owners will have 1-5 nodes in their partition.  We want to use floating partitions so we can bring nodes in and out of service without impacting users.

The behavior right now is that an owner with 3 nodes (28 cores each) who runs 10 single-core jobs would end up with 3 single-core jobs running, 7 single-core jobs in the queue and 81 cores sitting idle.

What we'd like instead is that those 10 single-core jobs get packed into those three nodes, no jobs end up in the queue, 74 cores still available.
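
The two outcomes described in this comment can be reproduced with a small Python sketch. It is a toy model, not Slurm's scheduler: the `admit` function and its first-fit packing are assumptions made for illustration, and every job is assumed to fit on a single node. The only thing it varies is how the node limit is counted.

```python
def admit(jobs, node_limit, cores_per_node, count_per_job):
    """Greedily place single-node jobs; return (running, queued, idle_cores).

    jobs: list of core counts, one entry per single-node job.
    count_per_job:
        True  -> every running job is charged one node against the limit
                 (the counting rule quoted from the sacctmgr note).
        False -> only distinct nodes in use count against the limit
                 (the behavior requested in this ticket).
    """
    free = [cores_per_node] * node_limit  # cores left on each entitled node
    running = queued = 0
    for cores in jobs:
        # Per-job counting: the limit is exhausted once node_limit jobs run.
        if count_per_job and running >= node_limit:
            queued += 1
            continue
        # First-fit placement onto the entitled nodes.
        for i, avail in enumerate(free):
            if avail >= cores:
                free[i] -= cores
                running += 1
                break
        else:
            queued += 1  # no node had enough free cores
    return running, queued, sum(free)


ten_single_core_jobs = [1] * 10
print(admit(ten_single_core_jobs, 3, 28, count_per_job=True))   # (3, 7, 81)
print(admit(ten_single_core_jobs, 3, 28, count_per_job=False))  # (10, 0, 74)
```

With per-job counting the limit is exhausted after three jobs regardless of how many cores remain; with distinct-node counting all ten jobs run and 74 of the 84 cores stay free.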
Comment 9 Tim Wickberg 2017-10-27 14:22:09 MDT
(In reply to Bill Abbott from comment #8)

Are the core counts the same across the nodes?

If they are, limiting the TRES based on cpu count, and leaving the node count off, would be the simplest strategy at present.

One other option, which I'm not sure if you've looked at, would be to use a PartitionQOS with MaxNodes=3. That should limit the Partition to using three nodes at a time.

You do need to be careful how this interacts with normal QOSes, though. Are you able to attach your current slurm.conf and the output from 'scontrol show assoc'?

thanks,
- Tim
Comment 10 Bill Abbott 2017-10-30 13:50:50 MDT
The core counts are the same across this pool of nodes, but I don't see how that addresses the problem.  If user 1 has 5 nodes but uses 6, then user 2 gets blocked from their node.  That violates our SLA.  How can we limit tres by cpu but ensure they all go to the same 5 nodes?

The partition option of MaxNodes looked perfect, but that apparently means "Max nodes per job", not "Max nodes per partition".  We want an arbitrary number of jobs that can't spread past 5 nodes in any case.

I'll paste the slurm.conf as a different comment, but here are the relevant lines from a single user:

NodeName=slepner[001-048] Weight=2 Feature=sandybridge,fdr RealMemory=128903 CoresPerSocket=8

PartitionName=DEFAULT PriorityTier=10 DefaultTime=2:00 MaxTime=3-0 State=UP AllocNodes=fen[1-2],amarel[1-2],helix,clcwb01

PartitionName=sas Nodes=slepner[001-002,048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas

# sacctmgr show qos sas
      Name   Priority  GraceTime    Preempt PreemptMode                                    Flags UsageThres UsageFactor       GrpTRES   GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit     GrpWall       MaxTRES MaxTRESPerNode   MaxTRESMins     MaxWall     MaxTRESPU MaxJobsPU MaxSubmitPU     MaxTRESPA MaxJobsPA MaxSubmitPA       MinTRES 
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- ------------- 
       sas          0   00:00:00                cluster                                                        1.000000        cpu=32




So what we really want is

PartitionName=sas Nodes=slepner[001-048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas

and

GrpTRES=node=2
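
As a rough check on the numbers in this comment, the following Python sketch computes the CPU cap that corresponds to a given node entitlement on homogeneous nodes. The helper names are hypothetical. CoresPerSocket=8 comes from the NodeName line quoted above, while Sockets=2 and ThreadsPerCore=1 are taken from the Nodename=DEFAULT line of the full slurm.conf posted in this ticket. It is arithmetic only and does not query Slurm.

```python
def cpus_per_node(sockets, cores_per_socket, threads_per_core=1):
    """CPUs on one node, computed from its slurm.conf topology values."""
    return sockets * cores_per_socket * threads_per_core


def cpu_limit_for(nodes, sockets, cores_per_socket, threads_per_core=1):
    """CPU cap equal to `nodes` whole nodes of identical hardware."""
    return nodes * cpus_per_node(sockets, cores_per_socket, threads_per_core)


# slepner[001-048]: Sockets=2, CoresPerSocket=8, ThreadsPerCore=1
print(cpu_limit_for(2, sockets=2, cores_per_socket=8))  # 32
```

For a two-node entitlement this gives 32, which appears consistent with the cpu=32 value shown in the sas QOS output above. As discussed earlier in the thread, such a cap limits total cores only and says nothing about which nodes those cores are on.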
Comment 11 Bill Abbott 2017-10-30 13:53:16 MDT
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=saul
ControlAddr=saul
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
DisableRootJobs=YES
#EnforcePartLimits=NO
Epilog=/etc/slurm/slurm.epilog.clean
#PrologSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
JobCheckpointDir=/var/lib/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
JobRequeue=0
#JobSubmitPlugins=1
#KillOnBadExit=0
#Licenses=foo*4,bar
#MailProg=/usr/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
#MpiDefault=none
MpiDefault=none
#MpiParams=ports=#-#
MpiParams=ports=12000-12999
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#ProctrackType=proctrack/pgid
#ProctrackType=proctrack/linuxproc
ProctrackType=proctrack/cgroup
#Prolog=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
PropagateResourceLimitsExcept=MEMLOCK
RebootProgram=/sbin/reboot
ReturnToService=0
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/tmp/slurmctld
SwitchType=switch/none
#TaskEpilog=
#TaskPlugin=task/none
TaskPlugin=task/cgroup
#TaskPluginParam=
#TaskProlog=
TopologyPlugin=topology/tree
#TmpFs=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
UsePAM=1
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
HealthCheckInterval=300
HealthCheckProgram=/usr/sbin/nhc
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
DefMemPerCPU=4096
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
#SelectType=select/linear
SelectType=select/cons_res
#SelectTypeParameters=
SelectTypeParameters=CR_CPU_Memory
#SelectTypeParameters=CR_CPU
#
#
# JOB PRIORITY
#PriorityType=priority/basic
PriorityType=priority/multifactor
PriorityDecayHalfLife=21-0
#PriorityCalcPeriod=
PriorityFavorSmall=NO
#PriorityMaxAge=
#PriorityUsageResetPeriod=
PriorityWeightAge=1000
PriorityWeightFairshare=8000
PriorityWeightJobSize=4000
PriorityWeightPartition=5000
PriorityWeightTRES=GRES/gpu=7000,GRES/mic=7000
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=squid
AccountingStorageLoc=/var/log/slurm/jobacctstor
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
AccountingStorageTRES=gres/gpu,gres/mic
AccountingStoreJobComment=YES
ClusterName=amarel
#DebugFlags=
#DebugFlags=Gres
#JobCompHost=
JobCompLoc=/var/log/slurm/jobcomp.log
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
###
#JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmSchedLogFile=/var/log/slurm/slurmsched.log
SlurmSchedLogLevel=1
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
# Preemption
PreemptMode=REQUEUE
#PreemptMode=CANCEL
PreemptType=preempt/partition_prio
#SchedulerParameters=preempt_youngest_first
#
# GPU Nodes
GresTypes=gpu,mic

Nodename=DEFAULT Sockets=2 ThreadsPerCore=1 State=UNKNOWN
# LOGIN NODES
#Nodename=amarel[1-2] RealMemory=128000 CoresPerSocket=14 ThreadsPerCore=2
#Nodename=fen[1-2] RealMemory=128000 CoresPerSocket=8 ThreadsPerCore=2
# COMPUTE NODES
NodeName=slepner[001-048] Weight=2 Feature=sandybridge,fdr RealMemory=128903 CoresPerSocket=8
NodeName=slepner[054-058] Weight=4 Feature=ivybridge,fdr RealMemory=128903 CoresPerSocket=10
NodeName=slepner[059-084] Weight=6 Feature=haswell,fdr RealMemory=128817 CoresPerSocket=12
NodeName=slepner[085-088] Weight=8 Feature=broadwell,fdr RealMemory=128817 CoresPerSocket=14
NodeName=gpu[001-003] Weight=10 Feature=sandybrige,fdr,tesla RealMemory=64391 CoresPerSocket=6 Gres=gpu:8
NodeName=gpu[004] Weight=10 Feature=sandybridge,fdr,xeonphi RealMemory=64391 CoresPerSocket=6 Gres=mic:8
NodeName=gpu[005-006] Weight=10 Feature=broadwell,fdr,maxwell RealMemory=128839 CoresPerSocket=14 Gres=gpu:4
NodeName=hal[0001-0032,0053-0072] Weight=12 Feature=broadwell,edr RealMemory=128190 CoresPerSocket=14
NodeName=hal[0033-0052] Weight=14 Feature=broadwell,edr RealMemory=257214 CoresPerSocket=14
NodeName=pascal[001-004] Weight=16 Feature=broadwell,edr,pascal RealMemory=128190 CoresPerSocket=14 Gres=gpu:2
NodeName=mem[001-002] Weight=18 Feature=broadwell,edr RealMemory=1500000 Sockets=4 CoresPerSocket=12
#NodeName=slepnert001 CPUS=4 RealMemory=2001 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
#
PartitionName=DEFAULT PriorityTier=10 DefaultTime=2:00 MaxTime=3-0 State=UP AllocNodes=fen[1-2],amarel[1-2],helix,clcwb01
PartitionName=main Nodes=ALL QOS=main Default=YES
PartitionName=bg Nodes=ALL PriorityTier=1 AllowGroups=hpctech QOS=bg
PartitionName=oarc Nodes=ALL PriorityTier=40 AllowGroups=oarc QOS=bg
PartitionName=gpu Nodes=pascal[001-004],gpu[001-006] PriorityTier=20 QOS=gpu
PartitionName=mem Nodes=mem[001-002] PriorityTier=20
#PartitionName=admin Nodes=slepnert001 DefaultTime=30 MaxTime=120 AllowAccounts=oirt


# owner partitions
# hal0001-0032 & hal0053-72 are 128 gig memory nodes
# hal0033-52 are 256 gig memory nodes
PartitionName=sas Nodes=slepner[001-002,048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas
PartitionName=mp1009_1 Nodes=slepner[003-009,048] PriorityTier=40 AllowGroups=mp1009_1 MaxTime=14-0 QOS=mp1009_1
#PartitionName=sdk94_1 Nodes=slepner[010-012,048] PriorityTier=40 AllowGroups=sdk94_1 MaxTime=14-0 QOS=sdk94_1
#PartitionName=ab1337_2 Nodes=slepner[013-027,048] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
#PartitionName=rk509_1 Nodes=slepner[028-030,048] PriorityTier=40 AllowGroups=rk509_1 MaxTime=14-0 QOS=rk509_1
#PartitionName=ll502_1 Nodes=slepner[031,048] PriorityTier=40 AllowGroups=ll502_1 MaxTime=14-0 QOS=ll502_1
#PartitionName=rs1032_1 Nodes=slepner[032-042,048] PriorityTier=40 AllowGroups=rs1032_1 MaxTime=14-0 QOS=rs1032_1
#PartitionName=cs_1 Nodes=slepner[043,048] PriorityTier=40 AllowGroups=cs_1 MaxTime=14-0 QOS=cs_1
#PartitionName=ccb_2 Nodes=slepner[044,048] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=waksman_1 Nodes=slepner[045,048] PriorityTier=40 AllowGroups=waksman_1 MaxTime=14-0 QOS=waksman_1
PartitionName=tongz_1 Nodes=slepner[046,048] PriorityTier=40 AllowGroups=tongz_1 MaxTime=14-0 QOS=tongz_1
#PartitionName=ccib_1 Nodes=slepner[054,058] PriorityTier=40 AllowGroups=ccib_1 MaxTime=14-0 QOS=ccib_1
#PartitionName=ccb_3 Nodes=slepner[055-056,058] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=pl314_1 Nodes=slepner[059,084] PriorityTier=40 AllowGroups=pl314_1 MaxTime=14-0 QOS=pl314_1
#PartitionName=ccb_4 Nodes=slepner[060-061,084] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=mischaik_1 Nodes=slepner[062-074,084] PriorityTier=40 AllowGroups=mischaik_1 MaxTime=14-0 QOS=mischaik_1
#PartitionName=ab1337_2 Nodes=slepner[075-077,084] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
#PartitionName=cqb_1 Nodes=slepner[078-079,084] PriorityTier=40 AllowGroups=cqb_1 MaxTime=14-0 QOS=cqb_1
#PartitionName=sci_1 Nodes=slepner[080,084] PriorityTier=40 AllowGroups=sci_1 MaxTime=14-0 QOS=sci_1
PartitionName=tongz_2 Nodes=gpu[005-006] PriorityTier=40 AllowGroups=tongz_1 MaxTime=14-0 QOS=tongz_2
#
PartitionName=luwang_1 Nodes=hal[0001-0009,0072] PriorityTier=40 AllowGroups=luwang_1 MaxTime=14-0 QOS=luwang_1
PartitionName=miller_1 Nodes=hal[0010,0072] PriorityTier=40 AllowGroups=miller_1 MaxTime=14-0 QOS=miller_1
PartitionName=bromberg_1 Nodes=hal[0011-0022,0072] PriorityTier=40 AllowGroups=bromberg_1 MaxTime=14-0 QOS=bromberg_1
PartitionName=mitrofanova_1 Nodes=hal[0023-0024,0072] PriorityTier=40 AllowGroups=mitrofanova MaxTime=14-0 QOS=mitrofanova_1
PartitionName=kopp_1 Nodes=hal[0025-0029,0072] PriorityTier=40 AllowGroups=kopp MaxTime=14-0 QOS=kopp_1
PartitionName=ccb_1 Nodes=hal[0030-0032,0072] PriorityTier=40 AllowGroups=ccb MaxTime=14-0 QOS=ccb_1
PartitionName=brzustowicz_1 Nodes=hal[0033-0036,0052] PriorityTier=40 AllowGroups=brzustowicz_1 MaxTime=14-0 QOS=brzustowicz_1
PartitionName=matise_1 Nodes=hal[0037,0052] PriorityTier=40 AllowGroups=matise_1 MaxTime=14-0 QOS=matise_1
PartitionName=xing_1 Nodes=hal[0038-0039,0052] PriorityTier=40 AllowGroups=xing_1 MaxTime=14-0 QOS=xing_1
PartitionName=ellison_1 Nodes=hal[0040,0052] PriorityTier=40 AllowGroups=ellison_1 MaxTime=14-0 QOS=ellison_1
PartitionName=hginj_1 Nodes=hal[0041-0044,0052] PriorityTier=40 AllowGroups=hginj_1 MaxTime=14-0 QOS=hginj_1
PartitionName=cgu_1 Nodes=hal[0045-0050,0052] PriorityTier=40 AllowGroups=cgu_1 MaxTime=14-0 QOS=cgu_1
PartitionName=genetics_1 Nodes=hal[0033-0050,0052] PriorityTier=30 AllowGroups=brzustowicz_1,matise_1,ellison_1,genetics_1,hginj_1,cgu_1 MaxTime=14-0 QOS=genetics_1
PartitionName=rshiroko_1 Nodes=hal[0051,0052] PriorityTier=40 AllowGroups=rshiroko_1 MaxTime=14-0 QOS=rshiroko_1
PartitionName=ab1337_1 Nodes=slepner[003,008] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
PartitionName=jdb252_1 Nodes=hal[0053,0072] PriorityTier=40 AllowGroups=jdb252_1 MaxTime=14-0 QOS=jdb252_1
PartitionName=ecastner_1 Nodes=hal[0054,0072] PriorityTier=40 AllowGroups=ecastner_1 MaxTime=14-0 QOS=ecastner_1
PartitionName=alangold Nodes=hal[0055,0072] PriorityTier=40 AllowGroups=alangold_1 MaxTime=14-0 QOS=alangold_1
PartitionName=njms_genomics_1 Nodes=hal[0056-0058,0072] PriorityTier=40 AllowGroups=njms_genomics_1 MaxTime=14-0 QOS=njms_genomics_1
PartitionName=dmcs_1    Nodes=hal[0059-0060,0072] PriorityTier=30 AllowGroups=dmcs_1    MaxTime=14-0 QOS=dmcs_1     # dmcs_1 and jbrodie_x overlap, dif prio, dif group
PartitionName=jbrodie_1 Nodes=hal[0059-0060,0072] PriorityTier=40 AllowGroups=jbrodie_1 MaxTime=14-0 QOS=jbrodie_1  # jbrodie_x overlap, priority different 
PartitionName=jbrodie_2 Nodes=hal[0059-0060,0072] PriorityTier=35 AllowGroups=jbrodie_1 MaxTime=14-0 QOS=jbrodie_1  # jbrodie_x overlap, group same
PartitionName=jeehiun_1 Nodes=hal[0052,0072] PriorityTier=40 AllowGroups=jeehiun_1 MaxTime=14-0 QOS=jeehiun_1 # 256=0052,0052:128=0059-0060,0072 
PartitionName=es901_1 Nodes=hal[0061,0072] PriorityTier=40 AllowGroups=es901_1 MaxTime=14-0 QOS=es901_1 # 256=0061,0052:128=0052,0072
PartitionName=jn511_1 Nodes=hal[0053-0056,0072] PriorityTier=40 AllowGroups=jn511_1 MaxTime=14-0 QOS=jn511_1 # 256=0053-0056,0052:128=0061,0072
Comment 12 Bill Abbott 2017-10-30 13:56:53 MDT
# sacctmgr show assoc where cluster=amarel|egrep -v "(general|workshop)"
   Cluster    Account       User  Partition     Share GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin 
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- ------------- 
    amarel       root                               1                                                                                                                                                  normal                         
    amarel       root       root                    1                                                                                                                                                  normal                         
    amarel     ab1337                               1                                                                                                                                                  normal                         
    amarel     ab1337     ab1337                    1                                                                                                                                                  normal                         
    amarel     ab1337     acw103                    1                                                                                                                                                  normal                         
    amarel     ab1337     rss230                    1                                                                                                                                                  normal                         
    amarel   alangold                               1                                                                                                                                                  normal                         
    amarel   alangold   alangold                    1                                                                                                                                                  normal                         
    amarel   alangold      fh204                    1                                                                                                                                                  normal                         
    amarel   alangold   kroghjes                    1                                                                                                                                                  normal                         
    amarel   alangold     sm1792                    1                                                                                                                                                  normal                         
    amarel     amarel                               1                                                                                                                                                  normal                         
    amarel   amitrofa                               1                                                                                                                                                  normal                         
    amarel   amitrofa     am2051                    1                                                                                                                                                  normal                         
    amarel   amitrofa   amitrofa                    1                                                                                                                                                  normal                         
    amarel   amitrofa      kd566                    1                                                                                                                                                  normal                         
    amarel   amitrofa      mcu17                    1                                                                                                                                                  normal                         
    amarel   amitrofa      nje17                    1                                                                                                                                                  normal                         
    amarel   amitrofa     sh1019                    1                                                                                                                                                  normal                         
    amarel   amitrofa     sp1388                    1                                                                                                                                                  normal                         
    amarel   amitrofa      zsb11                    1                                                                                                                                                  normal                         
    amarel  brannigan                               1                                                                                                                                                  normal                         
    amarel  brannigan     sm1249                    1                                                                                                                                                  normal                         
    amarel        ccb                               1                                                                                                                                                  normal                         
    amarel        ccb      an567                    1                                                                                                                                                  normal                         
    amarel        ccb      by122                    1                                                                                                                                                  normal                         
    amarel        ccb     jds375                    1                                                                                                                                                  normal                         
    amarel        ccb    jeehiun                    1                                                                                                                                                  normal                         
    amarel        ccb      jx112                    1                                                                                                                                                  normal                         
    amarel        ccb   kroghjes                    1                                                                                                                                                  normal                         
    amarel        ccb      mse48                    1                                                                                                                                                  normal                         
    amarel        ccb      mz325                    1                                                                                                                                                  normal                         
    amarel        ccb      nw187                    1                                                                                                                                                  normal                         
    amarel      cee53                               1                                                                                                                                                  normal                         
    amarel      cee53      cee53                    1                                                                                                                                                  normal                         
    amarel      cee53      cpr74                    1                                                                                                                                                  normal                         
    amarel        cqb                               1                                                                                                                                                  normal                         
    amarel        cqb     evgeni                    1                                                                                                                                                  normal                         
    amarel dmcs_rtwrf                               1                                                                                                                                                  normal                         
    amarel   ecastner                               1                                                                                                                                                  normal                         
    amarel   ecastner      bw194                    1                                                                                                                                                  normal                         
    amarel   ecastner   ecastner                    1                                                                                                                                                  normal                         
    amarel   ecastner     jds375                    1                                                                                                                                                  normal                         
    amarel   ecastner      mse48                    1                                                                                                                                                  normal                         
    amarel   ecastner      mz325                    1                                                                                                                                                  normal                         
    amarel      es901                               1                                                                                                                                                  normal                         
    amarel      es901      es901                    1                                                                                                                                                  normal                         
    amarel   genetics                               1                                                                                                                                                  normal                         
    amarel   genetics      ak917                    1                                                                                                                                                  normal                         
    amarel   genetics      azaro                    1                                                                                                                                                  normal                         
    amarel   genetics      cpr74                    1                                                                                                                                                  normal                         
    amarel   genetics     kcchen                    1                                                                                                                                                  normal                         
    amarel   genetics      vm379                    1                                                                                                                                                  normal                         
    amarel    hberman                               1                                                                                                                                                  normal                         
    amarel   jaytisch                               1                                                                                                                                                  normal                         
    amarel    jbrodie                               1                                                                                                                                                  normal                         
    amarel    jbrodie   belmonte                    1                                                                                                                                                  normal                         
    amarel    jbrodie   ccamastr                    1                                                                                                                                                  normal                         
    amarel    jbrodie    jbrodie                    1                                                                                                                                                  normal                         
    amarel    jbrodie     rjdave                    1                                                                                                                                                  normal                         
    amarel    jbrodie    tnmiles                    1                                                                                                                                                  normal                         
    amarel     jdb252                               1                                                                                                                                                  normal                         
    amarel     jdb252     jdb252                    1                                                                                                                                                  normal                         
    amarel     jdb252     sm1249                    1                                                                                                                                                  normal                         
    amarel     jdb252      tj227                    1                                                                                                                                                  normal                         
    amarel    jeehiun                               1                                                                                                                                                  normal                         
    amarel    jeehiun     aek119                    1                                                                                                                                                  normal                         
    amarel    jeehiun    jeehiun                    1                                                                                                                                                  normal                         
    amarel    jeehiun      jx112                    1                                                                                                                                                  normal                         
    amarel    jeehiun      ksg80                    1                                                                                                                                                  normal                         
    amarel    jeehiun     linphi                    1                                                                                                                                                  normal                         
    amarel    jeehiun      nw187                    1                                                                                                                                                  normal                         
    amarel    jeehiun       yn81                    1                                                                                                                                                  normal                         
    amarel      jn511                               1                                                                                                                                                  normal                         
    amarel      jn511      jn511                    1                                                                                                                                                  normal                         
    amarel       jx76                               1                                                                                                                                                  normal                         
    amarel       jx76       jx76                    1                                                                                                                                                  normal                         
    amarel       lbrz                               1                                                                                                                                                  normal                         
    amarel       lbrz      azaro                    1                                                                                                                                                  normal                         
    amarel       lbrz      vm379                    1                                                                                                                                                  normal                         
    amarel      lw506                               1                                                                                                                                                  normal                         
    amarel      lw506      gd342                    1                                                                                                                                                  normal                         
    amarel      lw506     jd1308                    1                                                                                                                                                  normal                         
    amarel      lw506      lw506                    1                                                                                                                                                  normal                         
    amarel      lw506      sz398                    1                                                                                                                                                  normal                         
    amarel      lw506      yj231                    1                                                                                                                                                  normal                         
    amarel      lw506      yw594                    1                                                                                                                                                  normal                         
    amarel     matise                               1                                                                                                                                                  normal                         
    amarel     matise     matise                    1                                                                                                                                                  normal                         
    amarel njms_geno+                               1                                                                                                                                                  normal                         
    amarel njms_geno+      clcgs                    1                                                                                                                                                  normal                         
    amarel njms_geno+       dupe                    1                                                                                                                                                  normal                         
    amarel njms_geno+   ghannysa                    1                                                                                                                                                  normal                         
    amarel njms_geno+   husainse                    1                                                                                                                                                  normal                         
    amarel njms_geno+     kevina                    1                                                                                                                                                  normal                         
    amarel njms_geno+   soteropa                    1                                                                                                                                                  normal                         
    amarel njms_geno+      yc759                    1                                                                                                                                                  normal                         
    amarel       oarc                               1                                                                                                                                                  normal                         
    amarel       oarc    babbott                    1                                                                                                                                           bg,normal,sas                         
    amarel       oarc       dupe                    1                                                                                                                                                  normal                         
    amarel       oarc   ericmars                    1                                                                                                                                                  normal                         
    amarel       oarc      gc563                    1                                                                                                                                                  normal                         
    amarel       oarc       jbv9                    1                                                                                                                                                  normal                         
    amarel       oarc     jpc303                    1                                                                                                                                                  normal                         
    amarel       oarc     kevina                    1                                                                                                                                                  normal                         
    amarel       oarc   kholodvl                    1                                                                                                                                                  normal                         
    amarel       oarc   michelso                    1                                                                                                                                                  normal                         
    amarel       oarc   novosirj                    1                                                                                                                                                  normal                         
    amarel       oarc      ts840                    1                                                                                                                                                  normal                         
    amarel       oarc      yc759                    1                                                                                                                                                  normal                         
    amarel       oirt                               1                                                                                                                                                  normal                         
    amarel       oirt   ericmars                    1                                                                                                                                                  normal                         
    amarel       oirt     kevina                    1                                                                                                                                                  normal                         
    amarel       oirt      pl427                    1                                                                                                                                                  normal                         
    amarel      rk509                               1                                                                                                                                                  normal                         
    amarel      rk509      ea289                    1                                                                                                                                                  normal                         
    amarel     rs1032                               1                                                                                                                                                  normal                         
    amarel     rs1032      ec675                    1                                                                                                                                                  normal                         
    amarel     rs1032     rs1032                    1                                                                                                                                                  normal                         
    amarel     rs1032     sss274                    1                                                                                                                                                  normal                         
    amarel   rshiroko                               1                                                                                                                                                  normal                         
    amarel   rshiroko     ak1511                    1                                                                                                                                                  normal                         
    amarel   rshiroko   rshiroko                    1                                                                                                                                                  normal                         
    amarel        sas                               1                                                                                                                                                  normal                         
    amarel        sas    babbott                    1                                                                                                                                                  normal                         
    amarel    smiller                               1                                                                                                                                                  normal                         
    amarel    smiller   sdmiller                    1                                                                                                                                                  normal                         
    amarel        soe                               1                                                                                                                                                  normal                         
    amarel      tongz                               1                                                                                                                                                  normal                         
    amarel      tongz      rj254                    1                                                                                                                                                  normal                         
    amarel      yanab                               1                                                                                                                                                  normal                         
    amarel      yanab     am2260                    1                                                                                                                                                  normal                         
    amarel      yanab     ap1397                    1                                                                                                                                                  normal                         
    amarel      yanab   chengzhu                    1                                                                                                                                                  normal                         
    amarel      yanab     cmm591                    1                                                                                                                                                  normal                         
    amarel      yanab     nal115                    1                                                                                                                                                  normal                         
    amarel      yanab      yanab                    1                                                                                                                                                  normal                         
    amarel      yanab      ym277                    1                                                                                                                                                  normal                         
    amarel      yanab      yw410                    1                                                                                                                                                  normal                         
    amarel      yanab      zz109                    1                                                                                                                                                  normal                         
    amarel       york                               1                                                                                                                                                  normal                         
    amarel       york     ag1508                    1                                                                                                                                                  normal                         
    amarel       york   giambasu                    1                                                                                                                                                  normal
Comment 14 Felip Moll 2017-12-05 11:56:25 MST
I think we understand your problem: you want to implement the idea of having a pool of nodes, and then assign a maximum usage of X nodes to each of your groups. Then, when a user of a particular group starts using a node, you just want to make this node exclusive for their group. Finally, when a job finishes, exclusivity is ended and the node becomes available again in the pool.

This feature is currently not in Slurm and I see the point in your first comment about GrpTRES=nodes=X.

For now you have these options:

- Give each partition separate nodes as you are doing.
- Create reservations for each group; it's a bit more flexible than modifying slurm.conf each time.
- Change the concept of 'owning nodes' to 'owning cores', so you will end up with the solution proposed by Dominik and Tim of using GrpTRES=cpus.


What is the specific technical reason that keeps you from considering this third option of dealing with a pool of cores instead of a pool of nodes?

If these options are not a solution for you, I fear that this ticket should be marked as an Enhancement.
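
To make the GrpTRES=nodes=X point concrete, here is a minimal Python sketch of the two ways of counting nodes. It is not Slurm code: the job records, node names and function names are hypothetical. The first function follows the sacctmgr note quoted in the description (each job's node allocation is counted separately); the second counts each physical node once, which is the behavior being asked for.

```python
from typing import Dict, List

# Hypothetical job records: job id -> names of the nodes the job runs on.
Jobs = Dict[str, List[str]]


def nodes_counted_per_job(jobs: Jobs) -> int:
    """Count nodes once per job allocation, as the sacctmgr note describes."""
    return sum(len(nodes) for nodes in jobs.values())


def distinct_nodes_in_use(jobs: Jobs) -> int:
    """Count each physical node once, however many jobs share it."""
    return len({node for nodes in jobs.values() for node in nodes})


# Three single-core jobs packed onto the same node (made-up node name).
packed = {"job1": ["node001"], "job2": ["node001"], "job3": ["node001"]}

print(nodes_counted_per_job(packed))  # 3
print(distinct_nodes_in_use(packed))  # 1
```

With a limit of 3 nodes, per-job counting treats this group as already at its limit even though only one node is in use; distinct-node counting leaves two nodes free.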
Comment 15 Bill Abbott 2017-12-07 13:36:39 MST
Felix,

You have the situation correct; that is what we want to do.  The reason is that a PI who goes out and buys their own small cluster will get guaranteed immediate access to the nodes at any time, and that will outweigh all of the excellent reasons why they shouldn't do that.

By giving them 10 nodes that they can immediately access (via preemption, without potentially waiting for max walltime), we neutralize that argument.

How would the reservations work exactly?  I assume the reservation would specify the exact 5 nodes rather than the whole pool, but how would general access users use those nodes when idle?

Enhancement seems like the right Importance.

Thanks,

Bill
Comment 16 Felip Moll 2017-12-08 02:37:54 MST
(In reply to Bill Abbott from comment #15)
> Felix,
> 
> You have the situation correct; that is what we want to do.  The reason is
> that a PI who goes out and buys their own small cluster will get guaranteed
> immediate access to the nodes at any time, and that will outweigh all of the
> excellent reasons why they shouldn't do that.
> 
> By giving them 10 nodes that they can immediately access (via preemption,
> without potentially waiting for max walltime), we neutralize that argument.
> 

I suppose then that giving a PI '10 exclusive nodes' rather than 'N exclusive cores' is more of an aesthetic reason, but understandable to some extent.


> How would the reservations work exactly?  I assume the reservation would
> specify the exact 5 nodes rather than the whole pool, but how would general
> access users use those nodes when idle?

Well, it does not provide anything different from the first option, and in fact I see now that you are overlapping partitions, so just forget this option.

If you weren't overlapping partitions, you would have a reservation for one or more specific accounts, just as you have a partition with AllowGroups.

> Enhancement seems like the right Importance.
> 

Marking it as an enhancement.

> Thanks,
> 
> Bill
Comment 17 Bill Abbott 2017-12-20 18:59:17 MST
Another option might be to give partitions MaxNodesPerGroup, MaxNodesPerAccount, MaxNodesPerPartition or something like that.  MaxNodes looks like it's actually MaxNodesPerJob, but works properly with fitting small jobs into the right number of whole nodes.
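
A rough Python sketch of the check such a limit would imply, assuming it counts distinct nodes per group. The limit itself is only proposed here and does not exist in Slurm; the function name, parameters and node names are all hypothetical.

```python
from typing import Iterable, Set


def fits_group_node_limit(
    nodes_in_use: Set[str], requested: Iterable[str], max_nodes: int
) -> bool:
    """Return True if starting a job on `requested` keeps the group's
    distinct node count at or below `max_nodes` (hypothetical limit)."""
    return len(nodes_in_use | set(requested)) <= max_nodes


# A group already running jobs on two nodes, with a limit of three.
in_use = {"node001", "node002"}

print(fits_group_node_limit(in_use, ["node001"], 3))             # True
print(fits_group_node_limit(in_use, ["node003"], 3))             # True
print(fits_group_node_limit(in_use, ["node003", "node004"], 3))  # False
```

Small jobs that land on a node the group already occupies do not consume any more of the limit, so they pack into the right number of whole nodes.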