Bug 8069 - scontrol schedule command
Summary: scontrol schedule command
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 19.05.3
Hardware: Linux / OS: Linux
Importance: --- 5 - Enhancement
Assignee: Unassigned Developer
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-11-07 10:27 MST by Paul Edmon
Modified: 2019-11-08 10:14 MST
CC List: 1 user

See Also:
Site: Harvard University
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Paul Edmon 2019-11-07 10:27:28 MST
A nice addition to the functionality of scontrol would be the ability to force the scheduler to consider a specific job immediately. I'm not speaking of upping its priority, but rather having the scheduling logic just run for that job, both primary and backfill. Naturally you would want to limit this to admins only. It would work by doing:

scontrol schedule JOBID

The scheduler would then inspect that job immediately and see if it can be scheduled in either the primary loop or the backfill loop, and if it can't be scheduled, spit back when it thinks it will run.

This would be useful for jobs that have had their parameters adjusted by the admin to better fit in the queue, and the admin wants the scheduler to reconsider the job in light of those modifications. Or alternatively, there might be a super low priority job that the admin thinks would work for backfill, and they want to see if the scheduler will push it through without modifying its priority. Or the admin has a user who is curious when their job may run, but it is so low on the priority chain that the backfill scheduler hasn't looked at it and thus hasn't given an estimated time.
Comment 2 Jason Booth 2019-11-07 13:37:46 MST
Paul, we received your request. Normally requests like this are accomplished with a paid engagement/sponsorship. Is this something your site would be interested in sponsoring?

One option that currently exists is the --test-only option, which tests a job before it is submitted and returns an expected start time, so this may help when testing out new job options to see when the scheduler thinks the job could start.
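
For reference, --test-only is available on the submission commands themselves; an illustrative invocation (the script name here is only a placeholder) would be:

sbatch --test-only myjob.sbatch

srun accepts the same flag and likewise reports an estimated start time without actually launching the job.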
Comment 3 Tim Wickberg 2019-11-07 13:45:53 MST
> This would be useful for jobs that have had their parameters adjusted by
> the admin to better fit in the queue, and the admin wants the scheduler to
> reconsider the job in light of those modifications. Or alternatively, there
> might be a super low priority job that the admin thinks would work for
> backfill, and they want to see if the scheduler will push it through
> without modifying its priority. Or the admin has a user who is curious
> when their job may run, but it is so low on the priority chain that the
> backfill scheduler hasn't looked at it and thus hasn't given an estimated
> time.

Hi Paul -

The backfill scheduler doesn't really work that way. There's no way to consider an individual job... backfill needs to work through the entire ensemble of jobs to rebuild its plan for the whole system.

Now... you may be asking why we can't just build off the current state and consider one extra job. Well... we don't save any of this intermediate state. It's much simpler and faster to throw it out each backfill cycle and start again from scratch, rather than try to revise the backfill map as jobs start/stop, nodes come and go, and job priorities change.

For the main scheduler... while we could add something of this nature, I should note that a quick version of the main scheduler already runs and tests a few high-priority jobs whenever a job completes or a node changes state. On most systems this happens extremely frequently. (You can disable this with SchedulerParameters=defer.)
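
For reference, defer is one of the comma-separated SchedulerParameters values in slurm.conf; an illustrative fragment (not a site recommendation, and normally combined with other scheduler options on the same line) would be:

SchedulerParameters=defer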

Otherwise, waiting for the main scheduling cycle to kick in would be my main suggestion; with normal tuning this runs once a minute anyway. I don't see any value in adding a way to force this to happen immediately; you could always restart slurmctld to achieve that if desired.
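
For reference, the main scheduling cycle interval is governed by the sched_interval option of SchedulerParameters; an illustrative slurm.conf fragment (the value shown is just the common 60-second setting, not a tuned recommendation) would be:

SchedulerParameters=sched_interval=60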

Our usual recommendation for this type of request is to look into further backfill tuning so that backfill considers additional jobs and updates the reason fields on jobs that might otherwise not be considered. If you'd like some pointers on that, please open a ticket to discuss.
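
For reference, that backfill tuning is typically done with the bf_* options of SchedulerParameters, such as bf_interval, bf_max_job_test, and bf_window; an illustrative slurm.conf fragment (the values are placeholders, not recommendations) would be:

SchedulerParameters=bf_interval=30,bf_max_job_test=1000,bf_window=2880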

We do have some plans to add additional status indications to jobs in the ~20.11 release (it's a private customer bug, so I won't link it), but otherwise I can't see us building something that works quite as you've described, so I will move to close this as resolved/infogiven.

- Tim