So we found a "feature" in Slurm today which is a bit concerning. If a job at the top of the queue requests a specific node, it blocks the primary scheduler entirely, even though jobs behind it may have space available and would not use the node the top job requested. Those other jobs instead have to wait for the backfill loop, which is quite a bit slower than the primary loop. Needless to say, this is not optimal.

It would be better for the primary loop to note that the job in question is requesting a specific node, block that node off, and then move on to schedule the jobs behind it, knowing it cannot place them on the node the first job is waiting for. That way the primary loop can reach deeper into the queue when specific nodes are requested.

This has blocked up our scheduler a couple of times now. First, we had a series of jobs sent out to rebuild specific nodes; because the nodes to be reimaged were not yet available, those jobs blocked the primary scheduler from scheduling anything lower down the priority tree. We also had a case where a user requested a specific node that would not become available for three days. As you can imagine, having the primary scheduling loop go only one job deep for several days, while plenty of job slots sit open on other nodes with jobs ready to fill them, is suboptimal.

Anyways, if we could get a fix for this put in for the next release, that would be great. All the main loop needs to do when it sees a job requesting a specific node is note that and move on, not just fail. If it gets to a job it can't schedule that isn't requesting specific nodes, then it is fine for it to stop. But we can't have it blocking on requests for specific nodes.
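To illustrate the requested behavior, here is a minimal sketch (in Python, not Slurm's actual C code; job and node names are made up) of a priority-ordered scheduling loop that, on hitting a job whose required nodes aren't free, reserves only those nodes and keeps going, stopping only at a generic job that doesn't fit:

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    ncpus: int                                        # CPUs the job needs
    required_nodes: set = field(default_factory=set)  # empty = any node is fine

def schedule(jobs, free_cpus):
    """jobs: priority-ordered list; free_cpus: {node: idle CPU count}.
    Returns a list of (job name, node) placements."""
    blocked = set()        # nodes reserved for higher-priority waiting jobs
    placements = []
    for job in jobs:
        # Candidate nodes: the required ones if given, else anything not blocked.
        candidates = job.required_nodes or (set(free_cpus) - blocked)
        node = next((n for n in sorted(candidates)
                     if n not in blocked and free_cpus.get(n, 0) >= job.ncpus),
                    None)
        if node is not None:
            free_cpus[node] -= job.ncpus
            placements.append((job.name, node))
        elif job.required_nodes:
            # Patched behavior: block only the requested nodes and move on,
            # instead of stopping the whole loop here.
            blocked |= job.required_nodes
        else:
            # A generic job that doesn't fit anywhere: stop, as before.
            break
    return placements

free = {"node1": 0, "node2": 4, "node3": 4}
jobs = [Job("reimage", 1, {"node1"}),  # node1 busy: used to stall the queue
        Job("batch_a", 4),
        Job("batch_b", 4)]
print(schedule(jobs, free))  # batch jobs still land on node2/node3
```

With the old behavior, the `reimage` job at the head of the queue would have stopped the loop and both batch jobs would have waited for backfill; here they are placed immediately on the unaffected nodes.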
Created attachment 636 [details]
block job's required nodes rather than entire queue

Commit is here: https://github.com/SchedMD/slurm/commit/eafc0a4fd99a2c7d5edfe6df118a96ca038af4c2
Closing bug; patch provided 2/20/14.