Ticket 595 - Jobs Blocking Primary Scheduler
Summary: Jobs Blocking Primary Scheduler
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 2.6.x
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Moe Jette
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2014-02-20 07:01 MST by Moe Jette
Modified: 2014-02-24 02:08 MST
1 user

See Also:
Site: Harvard University
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 2.6.7, 14.03.0-pre7
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
block job's required nodes rather than entire queue (954 bytes, patch)
2014-02-20 08:07 MST, Moe Jette

Description Moe Jette 2014-02-20 07:01:30 MST
So we found a "feature" in Slurm today which is a bit concerning. If a job that requests a specific node sits at the top of the queue, it ends up blocking the primary scheduler entirely, even though jobs behind it may have space available and may not use the node the top job requested.  Instead those other jobs have to wait for the backfill loop, which is quite a bit slower than the primary loop.

Needless to say, this is not optimal.  It would be best for the primary loop to note that the job in question is requesting a specific node, block that node off, and then move on to schedule the jobs behind it, knowing it cannot place them on the node the first job is waiting for.  That way the primary loop can reach deeper into the queue when specific nodes are requested.

We've had this block up our scheduler a couple of times now.  First, we had a series of jobs that were sent out to rebuild specific nodes.  Those ended up blocking the primary scheduler from scheduling jobs lower in the priority tree, because the nodes to be reimaged were not yet available.  We also had a case where a user requested a specific node that would not become available for 3 days.  As you can imagine, having the primary scheduling loop go only one job deep for a few days, while there are plenty of open job slots on other nodes and jobs that could fill them, is suboptimal.

Anyways, if we could get a fix for this into the next release, that would be great.  All the main loop needs to do, when it sees a job requesting a specific node, is note that and move on rather than just stopping.  If it reaches a job it can't schedule that is not requesting specific nodes, it is fine for it to stop there.  But it can't keep blocking on requests for specific nodes.
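A minimal sketch of the requested behavior (this is illustrative pseudocode, not the actual slurmctld C code; the job/node representation and the `schedule` function are invented for this example). The key change: when a job's required nodes are unavailable, reserve just those nodes and continue down the queue, instead of terminating the whole scheduling pass.

```python
def schedule(queue, idle_nodes):
    """queue: priority-ordered list of job dicts with 'id', 'nodes_needed'
    (count), and optional 'required_nodes'.  Returns (job_id, nodes) pairs."""
    idle = set(idle_nodes)
    blocked = set()   # nodes held for higher-priority jobs that require them
    placements = []
    for job in queue:
        required = set(job.get("required_nodes", ()))
        avail = idle - blocked
        if required - avail:
            # A required node is busy or already blocked.
            # Old behavior: stop the entire scheduling loop here.
            # New behavior: block only this job's required nodes and move on.
            blocked |= required
            continue
        if len(avail) < job["nodes_needed"]:
            break  # genuinely out of resources; stopping here is still fine
        # Place the job: its required nodes first, then any other idle nodes.
        chosen = set(required)
        for node in avail - required:
            if len(chosen) == job["nodes_needed"]:
                break
            chosen.add(node)
        placements.append((job["id"], chosen))
        idle -= chosen
    return placements
```

With the old behavior, a top-priority job requiring a down node (say, one being reimaged) would leave every job behind it to the backfill loop; with this change, only that node is withheld and lower-priority jobs are still placed in the fast primary pass.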
Comment 1 Moe Jette 2014-02-20 08:07:32 MST
Created attachment 636 [details]
block job's required nodes rather than entire queue

Commit is here:

https://github.com/SchedMD/slurm/commit/eafc0a4fd99a2c7d5edfe6df118a96ca038af4c2
Comment 2 Moe Jette 2014-02-24 02:08:32 MST
closing bug, patch provided 2/20/14