Good afternoon SLURM Support,

We are seeking clarification on setting up a QOS based on GRES:GPU=4 per user. Previously we used a QOS with restrictions based on the number of hosts per user, but this has proven inefficient for our workloads.

1) We wish (or at least want to try) to move to a QOS restriction based on GRES:GPU=4; in short, each user account may use at most 4 GPU cards.

2) Or, more specifically, something like GRES:GPU:P40=4, so each user account may use at most 4x P40 GPU cards at any one time.

We used this command:

sacctmgr add qos P40 MaxTRESPerUser=gres/gpu=4 Flags=OverPartQOS

but it doesn't work as expected.

slurm.conf:

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
FastSchedule=1
PriorityType=priority/multifactor
AccountingStorageType=accounting_storage/slurmdbd
GresTypes=gpu
NodeName=nodes[000-012] Gres=gpu:2 Procs=28 RealMemory=233472 Sockets=2 CoresPerSocket=14 Weight=7 State=UNKNOWN

cgroup.conf:

CgroupAutomount=yes
ConstrainDevices=yes
TaskAffinity=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
AllowedDevicesFile=/opt/slurm-16.05.4/etc/cgroup_allowed_devices.conf

cgroup_allowed_devices.conf:

/dev/vd*
/dev/null
/dev/zero
/dev/urandom
/dev/cpu/*/*

gres.conf:

Name=gpu Type=P40-PCIE-16GB File=/dev/nvidia0 CPUs=0-27
Name=gpu Type=P40-PCIE-16GB File=/dev/nvidia1 CPUs=0-27

Did we miss something, or is this setup correct? Kindly advise.

Thanks.

Cheers,
Damien
Make sure you have set AccountingStorageTRES=gres/gpu in slurm.conf, then restart slurmctld. (The restart is required to push the update to slurmdbd.) Running 'scontrol reconfigure' after that will prevent a lot of warning messages about the configuration file being out of sync with the compute nodes, although that's not a big deal here; they aren't affected by that change.

After that's done, you can set your limit with:

sacctmgr modify QOS foo set MaxTRESPerUser=gres/gpu=4

Users running under that QOS will then be limited to 4 devices (requested either as --gres=gpu:4 or --gres=gpu:p40:4, or through a combination of jobs).

What is the exact misbehavior you are seeing? Which version are you running?
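For reference, the whole sequence on the controller might look like the sketch below ('foo' is a placeholder QOS name; the final command just confirms the limit was stored):

```shell
# In slurm.conf (visible to slurmctld and slurmdbd):
#   AccountingStorageTRES=gres/gpu

# Restart the controller so the new TRES is pushed to slurmdbd:
systemctl restart slurmctld

# Re-sync the config on the compute nodes to silence warnings:
scontrol reconfigure

# Apply the per-user GPU limit to the QOS:
sacctmgr modify qos foo set MaxTRESPerUser=gres/gpu=4

# Verify that the limit was recorded:
sacctmgr show qos foo format=Name,MaxTRESPerUser
```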
Hi,

Thanks for the replies.

Can we put these two together?

AccountingStorageTRES=gres/gpu
AccountingStorageType=accounting_storage/slurmdbd

Any impact?

Cheers,
Damien
(In reply to Damien from comment #2)
> Can we put these together ?
>
> AccountingStorageTRES=gres/gpu
> AccountingStorageType=accounting_storage/slurmdbd

You need both if you want to track TRES resources like GPUs and if you want to use slurmdbd. They are independent parameters; see man slurm.conf for extended info on both:

AccountingStorageTRES: Comma-separated list of resources you wish to track on the cluster. ...
AccountingStorageType: The accounting storage mechanism type. ...

> Any impact ?

If you don't track gres/gpu, the resources cannot be counted and the limits will therefore not be applied.

I assume you already have AccountingStorageEnforce=qos and GresTypes=gpu, and NodeName entries set to match gres.conf:

slurm.conf:

NodeName=gamba3 ... Gres=gpu:tesla:2

gres.conf:

NodeName=gamba3 Name=gpu Type=tesla File=/dev/nvidia[10-11]

Tell me how it goes. I checked it and believe the behavior is what you want.
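To make the relationship between the settings concrete, a minimal slurm.conf fragment (hostnames and GPU counts are illustrative) combining everything mentioned so far would be:

```
# slurm.conf (illustrative fragment)
AccountingStorageType=accounting_storage/slurmdbd   # where accounting data is stored
AccountingStorageTRES=gres/gpu                      # which extra resources to track
AccountingStorageEnforce=qos                        # actually enforce QOS limits
GresTypes=gpu
NodeName=gamba3 Gres=gpu:tesla:2 ...
```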
Hi Felip,

Many thanks for your replies.

We are using 16.05.04; yes, we are planning to update this soon.

For question 2): we have two flavours of GPU in this cluster, for example P100s and P40s.

Can we enforce a cluster-wide QOS on a particular model of GPU, rather than on GPUs in general?

Something like:

sacctmgr modify QOS foo set MaxTRESPerUser=gres/gpu:P40=4

Does this make sense? Is it logical?

Kindly advise. Thanks.

Cheers,
Damien
(In reply to Damien from comment #4)
> We are using 16.05.04, Yes, we are planning to update this soon.

This is necessary; remember that our support model requires customers to stay within the last two major releases (currently 17.02 or 17.11).

> Can we enforce a cluster-wide QOS on a particular model of GPU, rather than
> on GPUs in general?
>
> Something like:
>
> sacctmgr modify QOS foo set MaxTRESPerUser=gres/gpu:P40=4
>
> Does this make sense ? logical ?

Although it makes sense and is logical, I am sorry to say that the current implementation has some limitations, and this is one of them. Currently only gpu in general can be tracked. The same question was asked in bug 3397.

Hope it helped.
Damien,

This comment:

> Although it makes sense and is logical, I am sorry to say that the current
> implementation has some limitations, and this is one of them. Currently only
> gpu in general can be tracked. The same question was asked in bug 3397.

applied to versions before 17.02. Since 17.02 we do support the feature you are asking for, i.e.:

slurm.conf:

AccountingStorageTRES=gres/gpu:p100,gres/gpu:tesla
GresTypes=gpu
NodeName=xx2 ... Gres=gpu:p100:1
NodeName=xx3 ... Gres=gpu:tesla:1
NodeName=xx4 ... Gres=gpu:tesla:3

and:

]$ cat gres.conf
NodeName=gamba2 Name=gpu Type=p100 File=/dev/nv1
NodeName=gamba3 Name=gpu Type=tesla File=/dev/nv2
NodeName=gamba4 Name=gpu Type=tesla File=/dev/nv[3-5]

]$ sacctmgr show qos -pn
test|0|00:00:00||cluster|||1.000000|||||||||||gres/gpu:tesla=1|||||||

This should do what you are looking for.

The problem is that I just found a bug in 17.11 with this feature. I am currently working on a fix, but the functionality is there and should fit all your needs.
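Putting the per-type limit together with a job request, the flow would look like the sketch below (the 'test' QOS name and job.sh are placeholders):

```shell
# Limit each user to one tesla GPU under the 'test' QOS:
sacctmgr modify qos test set MaxTRESPerUser=gres/gpu:tesla=1

# A job requesting the typed GRES counts against that limit:
sbatch --qos=test --gres=gpu:tesla:1 job.sh

# A second job asking for another tesla GPU from the same user
# should then stay pending with a QOS-limit reason in squeue.
```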
Hi Felip,

Yes, this solution works for us; we have just tested it and it works correctly.

The main key for us is:

AccountingStorageEnforce=qos

Thanks.

Damien
Thanks for your help in this matter.

Please close this ticket.

Cheers,
Damien
(In reply to Damien from comment #13)
> Please close this ticket.

Hi Damien, thanks for your answer.

I would like to keep this open a little bit longer, since I am working on a fix in this area. Specifically: when we have a QOS limit like MaxTRESPerUser=gres/gpu:tesla=1 and a job is submitted asking for --gres=gpu:3, the QOS limit is ignored.

I don't know whether this currently affects you. If it is a concern, a possible workaround would be a job_submit lua plugin that rejects jobs which don't specify the GPU model.
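A minimal sketch of that workaround, assuming the lua plugin is enabled (JobSubmitPlugins=lua) and that the request arrives in job_desc.gres (field names vary across Slurm versions, so treat this as illustrative only):

```lua
-- job_submit.lua (illustrative sketch): reject GPU jobs that do not
-- name a GPU type, so typed QOS limits cannot be bypassed.
function slurm_job_submit(job_desc, part_list, submit_uid)
    local gres = job_desc.gres
    if gres ~= nil and string.match(gres, "gpu") then
        -- "gpu:4" contains one colon; a typed "gpu:tesla:4" contains two.
        local _, colons = string.gsub(gres, ":", ":")
        if colons < 2 then
            slurm.log_user("Please request a GPU type, e.g. --gres=gpu:tesla:1")
            return slurm.ERROR
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```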
Hi,

Documentation has now been added for this issue, so I am closing this bug.

Commit: c2c06468, available in 17.11.6.