Ticket 5138

Summary: Reason=ReqNodeNotAvail, but unavailable nodes not in requested partition
Product: Slurm    Reporter: pbisbal
Component: Scheduling    Assignee: Tim Wickberg <tim>
Status: RESOLVED DUPLICATE    QA Contact: ---
Severity: 3 - Medium Impact
Priority: ---
Version: 17.11.4
Hardware: Linux
OS: Linux
Site: Princeton Plasma Physics Laboratory (PPPL)

Description pbisbal 2018-05-07 15:26:43 MDT
On my cluster, I have several partitions, each with their own QOS, time limits, etc.
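
For context, the relevant partition definitions look roughly like the sketch below (reconstructed from the sinfo output further down; the exact node lists, QOS bindings, and other slurm.conf options may differ from what is actually in our config):

# slurm.conf excerpt (illustrative sketch, not a verbatim copy of our config)
PartitionName=jassby  Nodes=jassby[001-006]      MaxTime=30-00:00:00  QOS=jassby  State=UP
PartitionName=mque    Nodes=ganesh[20-22,24-27]  MaxTime=8-08:00:00   QOS=mque    State=UP
PartitionName=ellis   Nodes=ellis[001-010]       MaxTime=8-08:00:00   QOS=ellis   State=UP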

Several times today, I've received complaints from users that they submitted jobs to a partition with available nodes, but the jobs are stuck in the PD state. I have spent the majority of my day investigating this but haven't turned up anything meaningful. The three jobs show the "ReqNodeNotAvail" reason, yet none of the nodes listed as not available are even in the partitions these jobs were submitted to. None of the jobs requested a specific node, either, and all of them seem to be requesting small enough amounts of resources that they should start right away.

I have checked slurmctld.log on the server and have not been able to find any clues. Is there anywhere else I should look? Any ideas what could be causing this?
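
For what it's worth, a cross-check along these lines is how I confirmed that the flagged nodes don't belong to these partitions (node names taken from the squeue output below; the sinfo format string is just one way to slice it):

# sinfo -N -n dawson[036-037,045,067],greene021,mccune008 -o "%N %P %T"
# scontrol show node greene021 | grep -E 'Partitions|State'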

Here's some relevant output from squeue, sinfo, etc.:


# squeue  -t PD  | grep ReqNodeNotAvail
            263978    jassby     NSTX jklabach PD       0:00      6 (ReqNodeNotAvail, UnavailableNodes:dawson[036-037,045,067],greene021,mccune008)
            264450      mque    den04    ehkim PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:dawson[036-037,045,067],greene021,mccune008)
            264568     ellis running-  adiallo PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:dawson[036-037,045,067],greene021,mccune008)

# for job in 263978 264450 264568; do scontrol show job $job; done 
JobId=263978 JobName=NSTX
   UserId=jklabach(40779) GroupId=users(589) MCS_label=N/A
   Priority=3279 Nice=0 Account=unix QOS=jassby
   JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:dawson[036-037,045,067],greene021,mccune008 Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=25-00:00:00 TimeMin=N/A
   SubmitTime=2018-05-04T14:45:17 EligibleTime=2018-05-04T14:45:17
   StartTime=2018-05-12T17:00:00 EndTime=2018-06-06T17:00:00 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-05-07T17:24:19
   Partition=jassby AllocNode:Sid=sunfire16:2212
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=6 NumCPUs=96 NumTasks=96 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=96,mem=280G,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=280G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/p/iterppn/BASTION/NSTX_Neutronics/NG_NSTX_InitialRun/Jassby_No_Starting_Flux.sh
   WorkDir=/p/iterppn/BASTION/NSTX_Neutronics/NG_NSTX_InitialRun
   StdErr=/p/iterppn/BASTION/NSTX_Neutronics/NG_NSTX_InitialRun/slurm-263978.out
   StdIn=/dev/null
   StdOut=/p/iterppn/BASTION/NSTX_Neutronics/NG_NSTX_InitialRun/slurm-263978.out
   Power=

JobId=264450 JobName=den04
   UserId=ehkim(30375) GroupId=users(589) MCS_label=N/A
   Priority=1084 Nice=0 Account=unix QOS=mque
   JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:dawson[036-037,045,067],greene021,mccune008 Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=8-08:00:00 TimeMin=N/A
   SubmitTime=2018-05-07T12:43:40 EligibleTime=2018-05-07T12:43:40
   StartTime=2018-05-12T17:00:00 EndTime=2018-05-21T01:00:00 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-05-07T17:24:19
   Partition=mque AllocNode:Sid=sunfire09:17228
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=18000M,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/p/space/ehkim/FW2D_WORK/POP.2.2017/nstx120752/mphi12_15e18/g230/test.job
   WorkDir=/p/space/ehkim/FW2D_WORK/POP.2.2017/nstx120752/mphi12_15e18/g230
   StdErr=/p/space/ehkim/FW2D_WORK/POP.2.2017/nstx120752/mphi12_15e18/g230/test.err
   StdIn=/dev/null
   StdOut=/p/space/ehkim/FW2D_WORK/POP.2.2017/nstx120752/mphi12_15e18/g230/test.out
   Power=
   

JobId=264568 JobName=running-BES-170881
   UserId=adiallo(30689) GroupId=users(589) MCS_label=N/A
   Priority=1032 Nice=0 Account=unix QOS=ellis
   JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:dawson[036-037,045,067],greene021,mccune008 Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=8-08:00:00 TimeMin=N/A
   SubmitTime=2018-05-07T16:37:30 EligibleTime=2018-05-07T16:37:30
   StartTime=2018-05-12T17:00:00 EndTime=2018-05-21T01:00:00 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-05-07T17:24:19
   Partition=ellis AllocNode:Sid=ganesh21:18492
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=2 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,mem=62.50G,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=32000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/p/nstxusr/user/adiallo/DIIID_WORK/BARADA-XP/BES_MAG_ANALYSIS/BICOHERENCE_BES/runbatch_B3.sbatch
   WorkDir=/p/nstxusr/user/adiallo/DIIID_WORK/BARADA-XP/BES_MAG_ANALYSIS/BICOHERENCE_BES
   StdErr=/p/nstxusr/user/adiallo/DIIID_WORK/BARADA-XP/BES_MAG_ANALYSIS/BICOHERENCE_BES/slurm-264568.out
   StdIn=/dev/null
   StdOut=/p/nstxusr/user/adiallo/DIIID_WORK/BARADA-XP/BES_MAG_ANALYSIS/BICOHERENCE_BES/slurm-264568.out
   Power=

# sinfo -p ellis,mque,jassby
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
ellis        up 8-08:00:00      3    mix ellis[001,003-004]
ellis        up 8-08:00:00      7   idle ellis[002,005-010]
jassby       up 30-00:00:0      6   idle jassby[001-006]
mque         up 8-08:00:00      2    mix ganesh[21,24]
mque         up 8-08:00:00      3  alloc ganesh[25-27]
mque         up 8-08:00:00      2   idle ganesh[20,22]
Comment 1 pbisbal 2018-05-07 15:36:51 MDT
Looks like this is a collision with a maintenance reservation I have starting Friday at 5 PM: 

# scontrol show res
ReservationName=root_10 StartTime=2018-05-11T17:00:00 EndTime=2018-05-12T17:00:00 Duration=1-00:00:00
   Nodes=beast002,dawson[027-062,064-065,067-068,071-072,074-083,085-099,101-103,105,107-110,112-162],ellis[001-010],fielder[001-011],ganesh[20-22,24-27],greene[001-048],jassby[001-006],kruskal[001,003-005,007-008,010-016,018,020-028,030-036],mccune[001-040] NodeCnt=279 CoreCnt=7080 Features=(null) PartitionName=(null) Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES
   TRES=cpu=7112
   Users=root Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a

We recently reduced the maximum time limit on our most commonly used partitions to 48 hours (down from 8 days), so I wasn't thinking about the smaller, less-used partitions that still have longer time limits that can intersect with the maintenance window. Since these jobs' time limits would overlap the reservation, Slurm won't start them until the reservation ends, which is why their StartTime is 2018-05-12T17:00:00, the reservation's EndTime.
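
Going forward, a rough way to spot this kind of collision before users notice is to compare pending jobs' time limits against the next reservation window, along these lines (the -O names are standard squeue format fields; adjust to taste):

# scontrol show res | grep StartTime
# squeue -t PD -O jobid,partition,timelimit,starttime,reasonlist

Any pending job whose TimeLimit is longer than the time remaining before the reservation's StartTime will generally be held until the reservation ends.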

So this appears to be my fault, but is it possible to change the reason shown for jobs that are stuck pending because of a reservation? "ReqNodeNotAvail" is completely misleading, since the nodes actually are available, just not for the length of time requested.

Changing the reason for this situation will result in a lot less work for me (and all other Slurm admins, I bet) in the days/hours preceding a reservation. 

Prentice
Comment 2 Tim Wickberg 2018-05-07 18:29:45 MDT
We cannot change this on 17.11, but we are working on changing that reason field for 18.08; I'm closing this bug as a duplicate of bug 4987, which is tracking that work.

- Tim

*** This ticket has been marked as a duplicate of ticket 4987 ***
Comment 3 pbisbal 2018-05-08 07:32:36 MDT
That's perfect. Thanks.