Hi Slurm Support,

We are trying to configure 'MaxNodesPerUser' in a QOS, but are seeing some strange results:

----
[root@m3-login2 ~]# sacctmgr modify QOS name=m3h set MaxNodesPU=4
 Modified qos...
  m3h
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
[root@m3-login2 ~]#
[root@m3-login2 ~]# sacctmgr show qos m3h
      Name   Priority  GraceTime    Preempt PreemptMode  Flags UsageThres UsageFactor  GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall  MaxTRES MaxTRESPerNode MaxTRESMins MaxWall  MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA  MinTRES
---------- ---------- ---------- ---------- ----------- ------ ---------- ----------- -------- ----------- ------------- ------- --------- ------- -------- -------------- ----------- ------- ---------- --------- ----------- --------- --------- ----------- --------
       m3h          0   00:00:00               cluster                       1.000000                                                                                                              node=4

[smaruf@m3-login2 ~]$ squeue -u smaruf
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 1153281       m3h run_docm   smaruf  R 3-13:54:28      1 m3h001
 1154011       m3h run_docm   smaruf  R 2-18:47:58      1 m3h006
[smaruf@m3-login2 ~]$ sbatch slurm-serial-job-script
Submitted batch job 1157417
[smaruf@m3-login2 ~]$ sbatch slurm-serial-job-script
Submitted batch job 1157418
[smaruf@m3-login2 ~]$ sbatch slurm-serial-job-script
Submitted batch job 1157419
[smaruf@m3-login2 ~]$ sbatch slurm-serial-job-script
Submitted batch job 1157420
[smaruf@m3-login2 ~]$ sbatch slurm-serial-job-script
Submitted batch job 1157421
[smaruf@m3-login2 ~]$ sbatch slurm-serial-job-script
Submitted batch job 1157422

[smaruf@m3-login2 ~]$ squeue -u smaruf
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 1157419       m3h slurm-se   smaruf PD       0:00      1 (QOSMaxNodePerUserLimit)
 1157420       m3h slurm-se   smaruf PD       0:00      1 (QOSMaxNodePerUserLimit)
 1157421       m3h slurm-se   smaruf PD       0:00      1 (QOSMaxNodePerUserLimit)
 1157422       m3h slurm-se   smaruf PD       0:00      1 (QOSMaxNodePerUserLimit)
 1153281       m3h run_docm   smaruf  R 3-13:55:01      1 m3h001
 1154011       m3h run_docm   smaruf  R 2-18:48:31      1 m3h006
 1157418       m3h slurm-se   smaruf  R       0:15      1 m3h001
 1157417       m3h slurm-se   smaruf  R       0:18      1 m3h001
[smaruf@m3-login2 ~]$
----

These are all 1-CPU-core jobs, and m3h001 has 24 CPU-cores. In theory we should be able to squeeze more jobs onto one m3h node, but the rest are hit with 'QOSMaxNodePerUserLimit'. The intention of the 'sacctmgr' command above was to prevent one user from utilising more than 4 nodes in this partition, but it seems to translate into allowing only 4 jobs per user in the m3h partition.

We have an older cluster running Slurm 14.08; running the same sacctmgr command there does prevent one user from utilising more than 4 nodes in the selected partition.

Is our command syntax correct, or is our interpretation of the Slurm documentation wrong? Kindly advise. Thanks.

Cheers,
Damien
(In reply to Damien from comment #0)
> These are all 1-CPU-core jobs, and m3h001 has 24 CPU-cores. In theory we
> should be able to squeeze more jobs onto one m3h node, but the rest are hit
> with 'QOSMaxNodePerUserLimit'. The intention of the 'sacctmgr' command above
> was to prevent one user from utilising more than 4 nodes in this partition,
> but it seems to translate into allowing only 4 jobs per user in the m3h
> partition.
>
> Is our command syntax correct, or is our interpretation of the Slurm
> documentation wrong?

A job running on a node always counts as a separate node; the way this limit is designed, it cannot group multiple jobs running on the same node together into a single "node" resource. So as soon as you have jobs running across four nodes, no further jobs will launch, even if they could fit in alongside those other jobs.

Our usual recommendation is to structure the QOS limits around CPUs instead, to avoid this complication.

- Tim
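A sketch of that recommendation, using the m3h QOS from this thread (the 96-CPU figure is a hypothetical equivalent of 4 nodes x 24 cores; clearing a limit with `-1` follows the sacctmgr convention for unsetting values):

```shell
# Clear the per-user node cap and replace it with a per-user CPU cap
# on the m3h QOS. 96 CPUs = 4 nodes x 24 cores, so a user can occupy
# at most four nodes' worth of cores, however their jobs pack.
sacctmgr modify qos name=m3h set MaxNodesPU=-1 MaxTRESPU=cpu=96
```

With this in place, many single-core jobs from one user can share a node without tripping the limit, since the accounting is done in CPUs rather than whole nodes.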
Hi Tim,

Thanks for your reply.

1) If we implement this with CPU limits, for example 4 nodes (24 cores each), that becomes a 96-CPU limit on the selected partition. A single-core job that asks for high memory, together with other single-core jobs from the same user, could still end up spread across more than 4 nodes. Is there a method to prevent this? We are trying to limit users from taking up too many of these premium nodes in this partition.

2) On a related note, each of these nodes has 2x P100 GPU cards, see:
--
cat gres.conf
#slurm gres file for m3h001
#No Of Devices=2
Name=gpu Type=P100-PCIE-16GB File=/dev/nvidia0 CPUs=0-27
Name=gpu Type=P100-PCIE-16GB File=/dev/nvidia1 CPUs=0-27
--

Can we use a QOS to limit users to 4x GPUs each? If so, how do we set this up with sacctmgr, and with which parameters or syntax?

Kindly advise. Thanks.

Cheers,
Damien

(In reply to Tim Wickberg from comment #1)
> Our usual recommendation is to structure the QOS limits around CPUs to
> avoid this complication.
> 1) If we implement this with CPU limits, for example 4 nodes (24 cores
> each), that becomes a 96-CPU limit on the selected partition. A single-core
> job that asks for high memory, together with other single-core jobs from
> the same user, could still end up spread across more than 4 nodes. Is there
> a method to prevent this?
>
> We are trying to limit users from taking up too many of these premium
> nodes in this partition.

You'd want to look into limits around GrpTRES, and set them on mem and/or cpu values. So you could limit a user to 40 CPUs and 300GB of memory total with something like:

sacctmgr update user tim set grptres=cpu=40,mem=300gb

> 2) Each of these nodes has 2x P100 GPU cards. Can we use a QOS to limit
> users to 4x GPUs each? If so, how do we set this up with sacctmgr, and with
> which parameters or syntax?

You could either use a QOS, or set the limit on the user directly. You'd want to do a few things:

1) Make sure that you have the gpu type defined in AccountingStorageTRES in slurm.conf, e.g.:

AccountingStorageTRES=gres/gpu

(Restart slurmctld after making any change to that line.)

2) Use either MaxTRES (to limit what a single job can do) or GrpTRES (to limit what the collection of jobs can do, either for an individual user/account or in the QOS) to limit the gres/gpu type. Something like:

sacctmgr update qos normal set grptres=gres/gpu=4

on an appropriate QOS would handle what I believe you're after.
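Putting those steps together for a strictly per-user GPU cap (as opposed to the aggregate GrpTRES limit), one possible sketch is below. The QOS name m3h is taken from this thread; MaxTRESPU is the per-user variant of the TRES limits, and whether you prefer it over GrpTRES depends on whether the cap should apply to each user or to the QOS as a whole:

```shell
# 1) In slurm.conf, make sure GPU gres is tracked as a TRES
#    (then restart slurmctld):
#      AccountingStorageTRES=gres/gpu

# 2) Cap each user at 4 GPUs within the m3h QOS:
sacctmgr modify qos name=m3h set MaxTRESPU=gres/gpu=4

# 3) Verify the limit took effect:
sacctmgr show qos m3h format=Name,MaxTRESPU
```

Jobs that would push a user past 4 GPUs in this QOS would then pend with a QOS TRES-per-user reason code rather than launch.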
Thanks for the details. We will give this a try and do more testing to achieve our objectives.

Cheers,
Damien
Hey Damien - Marking resolved/infogiven now; please reopen if you have any further questions. - Tim