-------- Forwarded Message --------
Subject: [slurm-dev] Jobs blocked with Reason=BadConstraints when -N not specified
Date: Wed, 24 Feb 2016 01:56:18 -0800
From: Roche Ewan <ewan.roche@epfl.ch>
Reply-To: slurm-dev <slurm-dev@schedmd.com>
To: slurm-dev <slurm-dev@schedmd.com>

Hello,

following an update from 14.11.7 to 15.08.8 we observe what appears to be a bug. We can reproduce it on clusters with both TaskPlugin=task/affinity and TaskPlugin=task/cgroup.

The background is that one of our clusters has two groups of nodes with different core counts (16 and 24), so we advise users not to specify the number of nodes in order to make the best use of the resources. The topology plugin ensures that jobs run on one group or the other but never span both.

The problem is as follows. I submit a simple job:

#!/bin/sh
#SBATCH --ntasks=1536
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00

We then see:

$ scontrol show job 432224
JobId=432224 JobName=run64x24.job
   UserId=eroche(141633) GroupId=scitas-ge(11902)
   Priority=105081 Nice=0 Account=scitas-ge QOS=scitas
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2016-02-24T10:04:24 EligibleTime=2016-02-24T10:04:24
   StartTime=2016-02-25T15:56:16 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=parallel AllocNode:Sid=deneb2:9853
   ReqNodeList=(null) ExcNodeList=(null) NodeList=(null)
   NumNodes=64 NumCPUs=1536 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1536,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/eroche/jobs/run64x24.job
   WorkDir=/home/eroche/jobs
   StdErr=/home/eroche/jobs/slurm-432224.out
   StdIn=/dev/null
   StdOut=/home/eroche/jobs/slurm-432224.out
   Power= SICP=0

If the administrator then updates the priority of this job with

scontrol update jobid=432224 priority=10000000

the job information changes and the job is held with Reason=BadConstraints; note that MinCPUsNode has gone from 1 to 1536:

$ scontrol show job 432224
JobId=432224 JobName=run64x24.job
   UserId=eroche(141633) GroupId=scitas-ge(11902)
   Priority=0 Nice=0 Account=scitas-ge QOS=scitas
   JobState=PENDING Reason=BadConstraints Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2016-02-24T10:04:24 EligibleTime=2016-02-24T10:04:24
   StartTime=2016-02-24T10:04:54 EndTime=2016-02-24T10:04:54
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=parallel AllocNode:Sid=deneb2:9853
   ReqNodeList=(null) ExcNodeList=(null) NodeList=(null)
   NumNodes=64 NumCPUs=1536 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1536,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1536 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/eroche/jobs/run64x24.job
   WorkDir=/home/eroche/jobs
   StdErr=/home/eroche/jobs/slurm-432224.out
   StdIn=/dev/null
   StdOut=/home/eroche/jobs/slurm-432224.out
   Power= SICP=0

Looking at the scheduler logs we see:

[2016-02-24T10:04:53.742] update_job: setting pn_min_cpus from 1 to 1536 for job_id 432224
[2016-02-24T10:04:53.742] sched: update_job: setting priority to 10000000 for job_id 432224
[2016-02-24T10:04:53.742] debug2: initial priority for job 432224 is 10000000
[2016-02-24T10:04:53.743] _slurm_rpc_update_job complete JobId=432224 uid=0 usec=427
[2016-02-24T10:04:53.743] debug3: Writing job id 432224 to header record of job_state file

So for some reason pn_min_cpus gets set to the total number of tasks, and as we don't have any nodes with 1536 cores the scheduler advises accordingly:

[2016-02-24T10:04:54.224] _build_node_list: No nodes satisfy job 432224 requirements in partition parallel
[2016-02-24T10:04:54.224] sched: schedule: JobID=432224 State=0x0 NodeCnt=0 non-runnable:Requested node configuration is not available
[2016-02-24T10:04:57.229] debug3: sched: JobId=432224. State=PENDING. Reason=BadConstraints. Priority=0.
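For what it is worth, the observed values are consistent with pn_min_cpus being recomputed on every job update as the total CPU request divided by the minimum node count, which defaults to 1 when -N is not given. This is only our guess from the numbers below, not something we have confirmed in the Slurm source:

# Guess, not confirmed from the Slurm source: the logged values match
# pn_min_cpus = ceiling(ntasks * cpus_per_task / min_nodes)
ntasks=1536
cpus_per_task=1
min_nodes=1    # the default when -N is not specified
echo $(( (ntasks * cpus_per_task + min_nodes - 1) / min_nodes ))   # 1536, as logged
min_nodes=64   # after "scontrol update ... NumNodes=64-64"
echo $(( (ntasks * cpus_per_task + min_nodes - 1) / min_nodes ))   # 24, as logged

If that guess is right it would also explain why setting MinCPUsNode=1 by hand does not stick (see below): the next update would simply recompute it from the same inputs.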
If we try to change this with

scontrol update jobid=432224 MinCPUsNode=1

we see:

[2016-02-24T10:05:32.332] debug3: JobDesc: user_id=4294967294 job_id=432224 partition=(null) name=(null)
[2016-02-24T10:05:32.332] update_job: setting pn_min_cpus to 1 for job_id 432224
[2016-02-24T10:05:32.332] update_job: setting pn_min_cpus from 1 to 1536 for job_id 432224
[2016-02-24T10:05:32.333] _slurm_rpc_update_job complete JobId=432224 uid=0 usec=505
[2016-02-24T10:05:32.333] debug3: Writing job id 432224 to header record of job_state file

So the value is changed and then immediately changed back.

The only way we have found to fix this is with:

scontrol update jobid=432224 NumNodes=64-64

which results in:

[2016-02-24T10:05:58.476] debug3: JobDesc: user_id=4294967294 job_id=432224 partition=(null) name=(null)
[2016-02-24T10:05:58.476] update_job: setting min_nodes from 1 to 64 for job_id 432224
[2016-02-24T10:05:58.476] update_job: setting pn_min_cpus from 1536 to 24 for job_id 432224
[2016-02-24T10:05:58.476] _slurm_rpc_update_job complete JobId=432224 uid=0 usec=387
[2016-02-24T10:05:58.477] debug3: Writing job id 432224 to header record of job_state file

Does anybody know why pn_min_cpus is being “incorrectly” set and therefore blocking the job? If the job script includes “-N 64” everything works correctly.

Thanks

Ewan Roche
SCITAS
Ecole Polytechnique Fédérale de Lausanne
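P.S. As a stop-gap until this is understood, something along these lines could re-apply the manual fix to a blocked job automatically. It is only a sketch, and the names are ours: it assumes the job divides evenly onto the 24-core group (as the 1536-task example above does) and that the NumNodes update always resets pn_min_cpus the way it did for us.

#!/bin/sh
# unblock.sh <jobid> - hypothetical helper: pin NumNodes to an exact count
# so that the update recomputes pn_min_cpus to a sane per-node value.
JOBID=$1
CORES=24   # core count of the node group the job should land on
CPUS=$(scontrol show job "$JOBID" | grep -o 'NumCPUs=[0-9]*' | cut -d= -f2)
NODES=$(( (CPUS + CORES - 1) / CORES ))   # ceiling division: 1536/24 -> 64
scontrol update jobid="$JOBID" NumNodes="$NODES-$NODES"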