Bug 2478 - Jobs blocked with Reason=BadConstraints when -N not specified
Status: RESOLVED DUPLICATE of bug 2472
Alias: None
Product: Slurm
Classification: Unclassified
Component: Limits
Version: 15.08.8
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Tim Wickberg
Reported: 2016-02-24 02:00 MST by Tim Wickberg
Modified: 2016-02-24 05:19 MST

Site: -Other-


Description Tim Wickberg 2016-02-24 02:00:34 MST
-------- Forwarded Message --------
Subject: [slurm-dev] Jobs blocked with Reason=BadConstraints when -N not specified
Date: Wed, 24 Feb 2016 01:56:18 -0800
From: Roche Ewan <ewan.roche@epfl.ch>
Reply-To: slurm-dev <slurm-dev@schedmd.com>
To: slurm-dev <slurm-dev@schedmd.com>



Hello,
following an update from 14.11.7 to 15.08.8, we observe what appears to be a bug.

We can reproduce the bug on clusters with both TaskPlugin=task/affinity and TaskPlugin=task/cgroup.

The background is that one of our clusters has two groups of nodes with 
different core counts (16 and 24) so we advise users not to specify the 
number of nodes in order to make best use of the resources. The topology 
plugin ensures that jobs will run on one group or the other but never 
span both.

The problem is as follows:

I submit a simple job

#!/bin/sh
#SBATCH --ntasks=1536
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
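
(For reference, a minimal sketch of the submission step — assuming the script is saved as run64x24.job, matching the JobName in the output below:)

sbatch run64x24.job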



We then see

$ scontrol show job 432224
JobId=432224 JobName=run64x24.job
    UserId=eroche(141633) GroupId=scitas-ge(11902)
    Priority=105081 Nice=0 Account=scitas-ge QOS=scitas
    JobState=PENDING Reason=Priority Dependency=(null)
    Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
    RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
    SubmitTime=2016-02-24T10:04:24 EligibleTime=2016-02-24T10:04:24
    StartTime=2016-02-25T15:56:16 EndTime=Unknown
    PreemptTime=None SuspendTime=None SecsPreSuspend=0
    Partition=parallel AllocNode:Sid=deneb2:9853
    ReqNodeList=(null) ExcNodeList=(null)
    NodeList=(null)
    NumNodes=64 NumCPUs=1536 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    TRES=cpu=1536,node=1
    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
    Features=(null) Gres=(null) Reservation=(null)
    Shared=0 Contiguous=0 Licenses=(null) Network=(null)
    Command=/home/eroche/jobs/run64x24.job
    WorkDir=/home/eroche/jobs
    StdErr=/home/eroche/jobs/slurm-432224.out
    StdIn=/dev/null
    StdOut=/home/eroche/jobs/slurm-432224.out
    Power= SICP=0



If the administrator then updates the priority of this job:

scontrol update jobid=432224 priority=10000000


The job information changes and the job is now held with Reason=BadConstraints:

$ scontrol show job 432224
JobId=432224 JobName=run64x24.job
    UserId=eroche(141633) GroupId=scitas-ge(11902)
    Priority=0 Nice=0 Account=scitas-ge QOS=scitas
    JobState=PENDING Reason=BadConstraints Dependency=(null)
    Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
    RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
    SubmitTime=2016-02-24T10:04:24 EligibleTime=2016-02-24T10:04:24
    StartTime=2016-02-24T10:04:54 EndTime=2016-02-24T10:04:54
    PreemptTime=None SuspendTime=None SecsPreSuspend=0
    Partition=parallel AllocNode:Sid=deneb2:9853
    ReqNodeList=(null) ExcNodeList=(null)
    NodeList=(null)
    NumNodes=64 NumCPUs=1536 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    TRES=cpu=1536,node=1
    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
    MinCPUsNode=1536 MinMemoryNode=0 MinTmpDiskNode=0
    Features=(null) Gres=(null) Reservation=(null)
    Shared=0 Contiguous=0 Licenses=(null) Network=(null)
    Command=/home/eroche/jobs/run64x24.job
    WorkDir=/home/eroche/jobs
    StdErr=/home/eroche/jobs/slurm-432224.out
    StdIn=/dev/null
    StdOut=/home/eroche/jobs/slurm-432224.out
    Power= SICP=0



Looking at the scheduler logs we see

[2016-02-24T10:04:53.742] update_job: setting pn_min_cpus from 1 to 1536 for job_id 432224
[2016-02-24T10:04:53.742] sched: update_job: setting priority to 10000000 for job_id 432224
[2016-02-24T10:04:53.742] debug2: initial priority for job 432224 is 10000000
[2016-02-24T10:04:53.743] _slurm_rpc_update_job complete JobId=432224 uid=0 usec=427
[2016-02-24T10:04:53.743] debug3: Writing job id 432224 to header record of job_state file


So for some reason pn_min_cpus gets set to the total number of tasks, and as we don’t have any nodes with 1536 cores the scheduler reports that no nodes satisfy the job's requirements:

[2016-02-24T10:04:54.224] _build_node_list: No nodes satisfy job 432224 requirements in partition parallel
[2016-02-24T10:04:54.224] sched: schedule: JobID=432224 State=0x0 NodeCnt=0 non-runnable:Requested node configuration is not available
[2016-02-24T10:04:57.229] debug3: sched: JobId=432224. State=PENDING. Reason=BadConstraints. Priority=0.
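
(A quick way to watch the derived size fields while reproducing this — plain scontrol/grep usage, nothing specific to the original report:)

# Show only the derived size fields of the pending job
scontrol show job 432224 | tr ' ' '\n' | grep -E '^(NumNodes|NumCPUs|MinCPUsNode)='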



If we try to change this:

scontrol update jobid=432224 MinCPUsNode=1


we see

[2016-02-24T10:05:32.332] debug3: JobDesc: user_id=4294967294 job_id=432224 partition=(null) name=(null)
[2016-02-24T10:05:32.332] update_job: setting pn_min_cpus to 1 for job_id 432224
[2016-02-24T10:05:32.332] update_job: setting pn_min_cpus from 1 to 1536 for job_id 432224
[2016-02-24T10:05:32.333] _slurm_rpc_update_job complete JobId=432224 uid=0 usec=505
[2016-02-24T10:05:32.333] debug3: Writing job id 432224 to header record of job_state file



So the value gets changed and then immediately changed back.

The only way we have found to fix this is with:

scontrol update jobid=432224 NumNodes=64-64


Which results in

[2016-02-24T10:05:58.476] debug3: JobDesc: user_id=4294967294 job_id=432224 partition=(null) name=(null)
[2016-02-24T10:05:58.476] update_job: setting min_nodes from 1 to 64 for job_id 432224
[2016-02-24T10:05:58.476] update_job: setting pn_min_cpus from 1536 to 24 for job_id 432224
[2016-02-24T10:05:58.476] _slurm_rpc_update_job complete JobId=432224 uid=0 usec=387
[2016-02-24T10:05:58.477] debug3: Writing job id 432224 to header record of job_state file
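
(For sites seeing this on many jobs, a rough sketch of applying the same NumNodes workaround in bulk. It assumes the reason string is exactly "BadConstraints" and that scontrol prints NumNodes as a single number, as in the output above — test carefully before using anything like this:)

#!/bin/sh
# For every pending job held with Reason=BadConstraints, pin its node count
# to the value slurmctld already derived (NumNodes=N-N), i.e. the manual
# fix shown above with "scontrol update jobid=... NumNodes=64-64".
squeue -h -t PENDING -o "%i %r" | while read jobid reason; do
    [ "$reason" = "BadConstraints" ] || continue
    nodes=$(scontrol show job "$jobid" | tr ' ' '\n' | sed -n 's/^NumNodes=//p')
    [ -n "$nodes" ] && scontrol update jobid="$jobid" NumNodes="${nodes}-${nodes}"
done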



Does anybody know why pn_min_cpus is being “incorrectly” set and therefore blocking the job?

If the job script includes “-N 64” everything works correctly.
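
(For comparison, the same reproducer with the node count made explicit — just the script from above plus the -N line mentioned here:)

#!/bin/sh
#SBATCH --ntasks=1536
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH -N 64    # explicit node count; with this line the priority update behaves correctly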


Thanks

Ewan Roche

SCITAS
Ecole Polytechnique Fédérale de Lausanne