Hello,

As we discussed yesterday, NERSC typically has an extremely long queue with a variety of priority needs. At present, to prevent jobs from getting excessive priority and messing up reservations for bigger jobs, we are using user and partition cutoffs in the BF scheduling algorithm.

Yesterday we discussed the possibility of setting a job priority value threshold wherein jobs above a certain priority threshold would be considered for reservations, and jobs below that threshold could still be started out of order based on the constraints set by the high-priority segment, but would not add further constraints or reservations themselves.

The purpose of this would be to increase the number of jobs that can be considered for backfill in order to improve utilization. Furthermore, the priority threshold gives me a clear dividing line where I can position high priority jobs with their QOS to avoid having jobs "pop up" and disrupt resource reservations. This threshold will help me to organize my priority function better and improve utilization.

Depending on how fast we can get through the low priority segment, I may eventually ask that we shuffle the jobs in the low priority segment (or, better, all jobs that don't have a start time) and scan through them in random order. This would allow a short scheduling cycle to run more frequently and probabilistically see many jobs. But I think adding the priority threshold should be the higher priority portion of this.

If at all possible, I would love to have this feature in the 15.08 series, as this is a clear and present issue for us.

Thanks so much,
Doug Jacobsen
It should be relatively simple. I don't anticipate any problem adding this feature to Slurm version 16.05, and I can perhaps give you an early patch for 15.08.
Created attachment 2886: Add bf_min_prio_reserve option to SchedulerParameters

This patch is for Slurm version 15.08 and adds a "bf_min_prio_reserve" option to SchedulerParameters. If set, then only jobs with a priority equal to or higher than the configured value will have the backfill scheduler reserve resources for starting at a later time. The time required for processing jobs with a lower priority is vastly lower, since the scheduling logic only needs to determine if the job can be started _now_ rather than simulating the termination of all running jobs to determine when and where it might start in the future. If most jobs have a priority lower than the threshold, your bf_max_job_part and bf_max_job_user parameters can be either vastly increased or eliminated.

I do not intend to merge this new feature into the version 15.08 release series (the patch is provided separately), but it will go into 16.05 with some additional logic for the non-backfill scheduler (which runs at more frequent intervals).
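As a rough illustration of the threshold behavior described above, here is a toy model in Python. It is not Slurm's backfill code: the `Job` class, the `backfill` and `fits` functions, and the single-resource "nodes" model are all hypothetical, and the only thing taken from the thread is the rule that jobs at or above the threshold may reserve future resources while jobs below it are only tested for starting now.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int
    nodes: int
    runtime: int  # requested wall time in minutes


def free_at(total, allocs, t):
    """Free nodes at time t, given allocations as (start, end, nodes)."""
    return total - sum(n for (s, e, n) in allocs if s <= t < e)


def fits(total, allocs, start, runtime, nodes):
    """True if `nodes` are free for the whole window [start, start + runtime)."""
    end = start + runtime
    # Free capacity can only drop where another allocation begins,
    # so checking the window start and those points is sufficient.
    points = {start} | {s for (s, _, _) in allocs if start < s < end}
    return all(free_at(total, allocs, t) >= nodes for t in points)


def backfill(total, running, pending, min_prio_reserve):
    """One toy backfill pass.

    running: list of (end_minute, nodes) for jobs already executing.
    Assumes every pending job requests at most `total` nodes.
    Returns (started job names, [(reserved job name, start minute)]).
    """
    allocs = [(0, end, nodes) for (end, nodes) in running]
    started, reserved = [], []
    for job in sorted(pending, key=lambda j: -j.priority):
        if fits(total, allocs, 0, job.runtime, job.nodes):
            # Can start now without disturbing any existing reservation.
            allocs.append((0, job.runtime, job.nodes))
            started.append(job.name)
        elif job.priority >= min_prio_reserve:
            # Expensive path: search future release times for a start slot.
            candidates = sorted({e for (_, e, _) in allocs})
            start = next(t for t in candidates
                         if fits(total, allocs, t, job.runtime, job.nodes))
            allocs.append((start, start + job.runtime, job.nodes))
            reserved.append((job.name, start))
        # Below the threshold and unable to start now: skip the job
        # without adding a reservation or any new constraint.
    return started, reserved


if __name__ == "__main__":
    running = [(60, 8)]  # 8 of 10 nodes busy for the next 60 minutes
    pending = [
        Job("A", priority=1000, nodes=10, runtime=120),
        Job("B", priority=100, nodes=1, runtime=90),
        Job("C", priority=90, nodes=2, runtime=30),
    ]
    print(backfill(10, running, pending, min_prio_reserve=500))
    print(backfill(10, running, pending, min_prio_reserve=0))
```

In this example, job B would run into A's reservation, so with the threshold at 500 it is simply skipped, while with the threshold at 0 it receives its own future reservation and therefore adds another constraint for every job examined after it.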
Created attachment 2887: Add bf_min_prio_reserve option to SchedulerParameters

Slight revision to previous patch based upon merge issues with v16.05
This is fantastic, Moe. Thank you! I'll spin this up on our test system now and start planning a transition if it works as it reads.

-Doug

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
dmjacobsen@lbl.gov

On Fri, Mar 18, 2016 at 11:30 AM, <bugs@schedmd.com> wrote:
> Moe Jette <jette@schedmd.com> changed bug 2565 <https://bugs.schedmd.com/show_bug.cgi?id=2565>:
> Attachment #2886 marked obsolete.
I've committed the requested functionality to Slurm version 16.05 here: https://github.com/SchedMD/slurm/commit/45560872d84358779690e1164de842c5061bbd36

Note the changes in v16.05 are a bit more comprehensive than those in the attached patch for v15.08 and there are other changes in the code base, so please work with the attached patch (which was made specifically for v15.08) until you upgrade to v16.05. I'm also going to lower the priority and close the ticket now, but if you encounter problems feel free to re-open it.
Patch also provided for v15.08.
Hi Moe,

I wanted to let you know that even though we've only had this enabled since Tuesday on cori, and for about 12 hours on edison (starting today at 9AM), the effect on our schedule was immediately noticeable. And our users are seeing the difference as well. Throughput and responsiveness have both improved.

The scheduling cycle on cori, as measured by sdiag, dropped from 120s to 5s overnight and 25-30s during the day. We're reviewing far more jobs for backfill, and utilization is fantastic. Typically cori has been achieving 85-90% at max. Yesterday we hit 95.8% and are on track for 97% today -- all time highs for cori.

I really appreciate you implementing this feature for us - thank you. Will let you know how it progresses over time.

I did restructure the priority parameters a bit when I enabled it. I've ensured that "premium" priority jobs are run in a qos that has a base priority *at* the threshold level. I've set our debug jobs to run in a qos that starts at a priority that will take 60 minutes to age up to the threshold. Regular jobs start 3 days of aging below the priority threshold, and low priority jobs start lower still. Other than a small contribution from fairshare, I've removed all "stacking" priority additions (e.g., TRES contributions), so it is almost entirely qos + age with a dash of fairshare for local ordering.

-Doug

On Fri, Mar 18, 2016 at 2:47 PM, <bugs@schedmd.com> wrote:
> Moe Jette <jette@schedmd.com> changed bug 2565 <https://bugs.schedmd.com/show_bug.cgi?id=2565>:
> Status: UNCONFIRMED -> RESOLVED
> Resolution: --- -> FIXED
> Version Fixed: 16.05.0-pre2
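The QOS layout described in this comment (premium at the threshold, debug 60 minutes of aging below it, regular 3 days below) comes down to simple arithmetic once the aging rate is known. The sketch below assumes that the age contribution grows linearly up to a maximum age, as in Slurm's multifactor priority plugin, and that fairshare is ignored. All numeric values (the threshold, the age weight, the maximum age) and both function names are hypothetical and are not NERSC's actual settings.

```python
def age_points_per_minute(weight_age, max_age_minutes):
    """Priority points gained per minute of queue wait (linear aging)."""
    return weight_age / max_age_minutes


def base_priority_for_delay(threshold, delay_minutes, weight_age, max_age_minutes):
    """Starting priority from which age alone reaches `threshold`
    after `delay_minutes` in the queue."""
    if delay_minutes > max_age_minutes:
        # Aging stops at the maximum age, so the threshold is unreachable.
        raise ValueError("delay exceeds the maximum age; threshold is never reached")
    rate = age_points_per_minute(weight_age, max_age_minutes)
    return threshold - delay_minutes * rate


if __name__ == "__main__":
    # Hypothetical values: 1 priority point per minute, capped at 7 days.
    THRESHOLD = 100_000
    WEIGHT_AGE = 10_080
    MAX_AGE_MIN = 7 * 24 * 60

    for qos, delay in [("premium", 0), ("debug", 60), ("regular", 3 * 24 * 60)]:
        base = base_priority_for_delay(THRESHOLD, delay, WEIGHT_AGE, MAX_AGE_MIN)
        print(f"{qos}: base priority {base:.0f}, reaches threshold after {delay} min")
```

Keeping the priority nearly pure qos + age, as the comment describes, is what makes this kind of calculation predictable: any additional stacking term would shift the time at which a job crosses the threshold.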