Ticket 2565 - backfill scheduler priority threshold
Summary: backfill scheduler priority threshold
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 15.08.8
Hardware: Cray XC Linux
Importance: --- 3 - Medium Impact
Assignee: Moe Jette
Reported: 2016-03-17 05:38 MDT by Doug Jacobsen
Modified: 2016-03-24 15:22 MDT

Site: NERSC
Version Fixed: 16.05.0-pre2


Attachments
Add bf_min_prio_reserve option to SchedulerParameters (7.29 KB, patch)
2016-03-18 05:22 MDT, Moe Jette
Add bf_min_prio_reserve option to SchedulerParameters (7.05 KB, patch)
2016-03-18 05:30 MDT, Moe Jette

Description Doug Jacobsen 2016-03-17 05:38:20 MDT
Hello,

As we discussed yesterday, NERSC typically has an extremely long queue with a variety of priority needs.  At present, to prevent jobs from getting excessive priority and messing up reservations for bigger jobs, we are using user and partition cutoffs in the BF scheduling algorithm.

Yesterday we discussed the possibility of setting a job priority threshold: jobs above the threshold would be considered for reservations, while jobs below it could still be started out of order, within the constraints set by the high-priority segment, but would not add further constraints or reservations themselves.

The purpose of this would be to increase the number of jobs that can be considered for backfill in order to improve utilization.

Furthermore, the priority threshold gives me a clear dividing line where I can position high priority jobs with their QOS to avoid having jobs "pop up" and disrupt resource reservations.

This threshold will help me to organize my priority function better and improve utilization.

Depending on how fast we can get through the low priority segment, I may eventually ask that we shuffle the jobs in the low priority segment (or better all jobs that don't have a start time), and scan through them in random order.  This would allow a short scheduling cycle to run more frequently and probabilistically see many jobs.  But I think adding the priority threshold should be the higher priority portion of this.
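
A minimal sketch of the shuffling idea described above, for illustration only. Nothing in this ticket implements it, and the function name `examination_order`, the per-cycle `budget`, and the sample queue are all hypothetical. Jobs above the threshold keep strict priority order, while the low-priority segment is shuffled so that repeated short cycles sample different jobs.

```python
import random

def examination_order(jobs, min_prio_reserve, budget, rng):
    """jobs: list of (name, priority). Returns the names examined this cycle.

    High-priority jobs keep strict priority order; the low-priority segment
    is shuffled so that repeated short cycles sample different jobs.
    """
    high = sorted((j for j in jobs if j[1] >= min_prio_reserve),
                  key=lambda j: -j[1])
    low = [j for j in jobs if j[1] < min_prio_reserve]
    rng.shuffle(low)  # in-place shuffle of the low-priority segment only
    return [name for name, _ in (high + low)[:budget]]

# Hypothetical queue: 3 jobs above a threshold of 500, 20 below it.
queue = [(f"hi{i}", 1000 - i) for i in range(3)] + \
        [(f"lo{i}", 100 - i) for i in range(20)]

rng = random.Random(0)  # seeded only to make the demo repeatable
seen = set()
for _ in range(200):  # 200 short cycles, 8 jobs examined per cycle
    seen.update(examination_order(queue, 500, 8, rng))

print(len(seen), "of", len(queue), "jobs examined at least once")
```

Each cycle looks at only a few low-priority jobs, but over many cycles every job is very likely to be examined, which is the probabilistic coverage the request refers to.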

If at all possible, I would love to have this feature in the 15.08 series as this is a clear and present issue for us.

Thanks so much,
Doug Jacobsen
Comment 1 Moe Jette 2016-03-18 03:32:28 MDT
It should be relatively simple. I don't anticipate any problem adding this feature to Slurm version 16.05, and I can perhaps give you an early patch for 15.08.
Comment 2 Moe Jette 2016-03-18 05:22:24 MDT
Created attachment 2886
Add bf_min_prio_reserve option to SchedulerParameters

This patch is for Slurm version 15.08 and adds a "bf_min_prio_reserve" option to SchedulerParameters. If set, then only jobs with a priority equal to or higher than the configured value will have the backfill scheduler reserve resources for starting at a later time. The time required for processing jobs with a lower priority is vastly lower since the scheduling logic only needs to determine if the job can be started _now_ rather than simulating the termination of all running jobs to determine when and where it might start in the future. If most jobs have a priority lower than the threshold, your bf_max_job_part and bf_max_job_user parameters can be either vastly increased or eliminated.
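
The behaviour described above can be sketched as a toy model. This is not Slurm's code: the `Job` class, `fits`, `backfill_pass`, and all job data are hypothetical, nodes are treated as interchangeable, and time is in minutes. It only illustrates the split between the cheap "can it start now?" check applied to every job and the costlier search for a future start time that is performed only for jobs at or above the threshold.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: int
    nodes: int
    minutes: int  # requested time limit

def fits(allocs, total, start, job):
    """True if job.nodes stay free for the whole window [start, start + minutes)."""
    end = start + job.minutes
    # The free node count only drops where an allocation begins, so checking
    # the window start plus every allocation start inside the window suffices.
    points = [start] + [s for s, _, _ in allocs if start < s < end]
    return all(
        total - sum(n for s, e, n in allocs if s <= t < e) >= job.nodes
        for t in points
    )

def backfill_pass(pending, running, total, min_prio_reserve, now=0):
    """Assumes every job requests at most `total` nodes."""
    allocs = list(running)  # (start, end, nodes) for running jobs and reservations
    started, reserved, skipped = [], [], []
    for job in sorted(pending, key=lambda j: -j.priority):
        if fits(allocs, total, now, job):
            allocs.append((now, now + job.minutes, job.nodes))
            started.append(job.name)
        elif job.priority >= min_prio_reserve:
            # Costly path: walk future release times to find the earliest start.
            candidates = sorted({e for _, e, _ in allocs if e > now})
            start = next(t for t in candidates if fits(allocs, total, t, job))
            allocs.append((start, start + job.minutes, job.nodes))
            reserved.append((job.name, start))
        else:
            skipped.append(job.name)  # below threshold: no reservation is made
    return started, reserved, skipped

running = [(0, 60, 8)]  # 8 of 10 nodes busy until minute 60
pending = [
    Job("big", 1000, 10, 120),
    Job("small_hi", 900, 1, 30),
    Job("small_lo", 100, 1, 90),  # would overlap the reservation made for "big"
    Job("tiny_lo", 50, 1, 30),
]
started, reserved, skipped = backfill_pass(pending, running, total=10,
                                           min_prio_reserve=500)
print(started, reserved, skipped)
```

In this example "big" receives a reservation at minute 60, "tiny_lo" is backfilled because it finishes before that reservation begins, and "small_lo" is skipped without creating a reservation of its own. In slurm.conf the option is set as an entry in SchedulerParameters, for example `SchedulerParameters=bf_min_prio_reserve=100000`, where the value shown is illustrative only.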

I do not intend to add this new feature to version 15.08, but it will go into 16.05 with some additional logic for the non-backfill scheduler (which runs at more frequent intervals).
Comment 3 Moe Jette 2016-03-18 05:30:08 MDT
Created attachment 2887
Add bf_min_prio_reserve option to SchedulerParameters

Slight revision to previous patch based upon merge issues with v16.05
Comment 4 Doug Jacobsen 2016-03-18 07:16:00 MDT
This is fantastic, Moe.  Thank you!  I'll spin this up on our test system
now and start planning a transition if it works as it reads.

-Doug

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
dmjacobsen@lbl.gov

Comment 5 Moe Jette 2016-03-18 07:43:50 MDT
I've committed to Slurm version 16.05 the requested functionality here:
https://github.com/SchedMD/slurm/commit/45560872d84358779690e1164de842c5061bbd36

Note the changes in v16.05 are a bit more comprehensive than those in the attached patch for v15.08 and there are other changes in the code base, so please work with the attached patch (which was made specifically for v15.08) until you upgrade to v16.05.

I'm also going to lower the priority and close the ticket now, but if you encounter problems feel free to re-open it.
Comment 6 Moe Jette 2016-03-18 08:47:40 MDT
Patch also provided for v15.08.
Comment 8 Doug Jacobsen 2016-03-24 15:22:36 MDT
Hi Moe,

I wanted to let you know that even though we've only had this enabled since
Tuesday on cori, and for about 12 hours on edison (starting today at 9AM),
the effect on our schedule was immediately noticeable.  And our users are
seeing the difference as well.

Throughput and responsiveness have both improved.  The scheduling cycle on
cori dropped from 120s as measured by sdiag to 5s overnight and 25-30s
during the day.  We're reviewing far more jobs for backfill, and
utilization is fantastic.  Typically cori has been achieving 85-90% at
max.  Yesterday we hit 95.8% and are on track for 97% today -- all time
highs for cori.

I really appreciate you implementing this feature for us - thank you.  Will
let you know how it progresses over time.

I did restructure the priority parameters a bit when I enabled it.  I've
ensured that "premium" priority jobs are run in a qos that has a base
priority *at* the threshold level.  I've set our debug jobs to run in a qos
that starts at a priority that will take 60 minutes to age up to the
threshold.  Regular jobs start 3 days of aging below the priority threshold, and
low priority jobs start lower still.  Other than a small contribution from fairshare, I've
removed all "stacking" priority additions (e.g., TRES contributions), so it
is almost entirely qos + age with a dash of fairshare for local ordering.

-Doug
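
The "time to age up to the threshold" arithmetic above can be sketched as follows. The figures are hypothetical and are not NERSC's settings. The sketch assumes the age contribution grows linearly from 0 to PriorityWeightAge over PriorityMaxAge, as in Slurm's multifactor priority plugin, and it ignores the small fairshare term. The helper name `base_priority` is made up for this example.

```python
def base_priority(threshold, minutes_below, weight_age, max_age_minutes):
    """Base (QOS) priority such that age alone lifts a job to `threshold`
    after `minutes_below` minutes in the queue."""
    if minutes_below > max_age_minutes:
        raise ValueError("age factor saturates before the job reaches the threshold")
    # Priority points gained from age after `minutes_below` minutes (linear growth).
    gained = weight_age * minutes_below // max_age_minutes
    return threshold - gained

# Hypothetical settings: threshold 100000, age weight 20160,
# maximum age 14 days, i.e. one priority point gained per minute queued.
THRESHOLD = 100_000
WEIGHT_AGE = 20_160
MAX_AGE_MIN = 14 * 24 * 60

premium = base_priority(THRESHOLD, 0, WEIGHT_AGE, MAX_AGE_MIN)            # at the threshold
debug = base_priority(THRESHOLD, 60, WEIGHT_AGE, MAX_AGE_MIN)             # 60 minutes below
regular = base_priority(THRESHOLD, 3 * 24 * 60, WEIGHT_AGE, MAX_AGE_MIN)  # 3 days below
print(premium, debug, regular)  # 100000 99940 95680
```

With these numbers a debug job becomes eligible for a reservation after an hour in the queue and a regular job after three days, while premium jobs are eligible immediately.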
