Bug 2692 - QOS with guaranteed start time
Summary: QOS with guaranteed start time
Status: RESOLVED TIMEDOUT
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 14.11.11
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2016-05-03 20:12 MDT by Steven Young
Modified: 2017-04-04 16:07 MDT
CC List: 0 users

See Also:
Site: University of Oxford


Description Steven Young 2016-05-03 20:12:51 MDT
We have a requirement to allow a set of users to have high priority access to a proportion of our cluster.  The extra degree of difficulty is that they have bursty usage, so when they do submit work, they want a guarantee that their jobs will start within 12 hours on their proportion of the cluster.

In our cluster the relevant partition for this discussion is our compute partition, which holds our compute nodes.  MaxTime for the compute partition is greater than 12 hours (currently it is 5 days).  We currently use Multifactor Priority with Fair Tree fairshare, and we have two QOS defined: normal and priority.  Our priority weighting is set so that Fairshare and QOS are equally weighted, with Age and JobSize weighted less; partition weighting is currently zero.  Setting up a superpriority QOS with GrpNodes set to the required value would give these users higher-priority access to their proportion of the cluster, but it won't let us guarantee the 12-hour start time, since we normally have a backlog of jobs requesting multiple days of walltime.
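For concreteness, the sort of superpriority QOS we have in mind would be created along these lines (the QOS name, priority value, node count, and account names are just placeholders):

    sacctmgr add qos superpriority Priority=10000 GrpNodes=20
    sacctmgr modify account name=groupa,groupb set qos+=superpriority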

I was recently re-reading the SLURM documentation on Reservations (http://slurm.schedmd.com/reservations.html), specifically the section on Reservations Floating Through Time.  I.e., we could create a reservation with Flags=TIME_FLOAT and StartTime=now+12hours, and the nodes assigned to this reservation would only accept jobs with a requested TimeLimit of 12 hours or less.  That gets us part way to meeting the requirement.
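Based on that page, a floating reservation along these lines seems to be what's intended (the reservation name, node count, and account list below are placeholders):

    scontrol create reservation ReservationName=burst Accounts=groupa,groupb \
        StartTime=now+12hours Duration=UNLIMITED NodeCnt=20 Flags=TIME_FLOAT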

Having read about these reservations I am now wondering whether there is any way SLURM could be "improved", so that users from the specific SLURM accounts that should have high priority access can be allowed to run on the reservation.
Comment 1 Danny Auble 2016-05-10 00:14:54 MDT
Hi Steven, we are looking into this.  Right now a job must be submitted to a reservation (or altered later to use one) in order to run in it, and once the reservation is set it needs to be removed from the job before the job can run anywhere else.  Given that, I'm not sure how easy the requested improvement would be; I don't think we want to allow jobs to use a reservation unless they explicitly asked for it.  Let us think a little on this and get back to you.  I'm guessing something else will be the solution, though.
Comment 2 Tim Wickberg 2016-05-11 01:59:43 MDT
We've been batting it around, and I think the floating reservation, possibly with a job_submit plugin providing some assistance, is the best way to implement this right now. You may need to provide some additional scripting alongside that to disable the TIME_FLOAT flag on demand, and/or to recreate the reservation once used, but that's all going to be heavily dependent on your site requirements.
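Since I'm not certain the TIME_FLOAT flag can be cleared in place, the scripting I have in mind would simply delete and re-create the reservation, roughly like this (untested; the reservation name, node count, and account list are placeholders):

    # pin the window so pending high-priority jobs can be scheduled into it
    scontrol delete ReservationName=burst
    scontrol create reservation ReservationName=burst Accounts=groupa,groupb \
        StartTime=now Duration=12:00:00 NodeCnt=20

    # once that work has drained, restore the floating reservation
    scontrol delete ReservationName=burst
    scontrol create reservation ReservationName=burst Accounts=groupa,groupb \
        StartTime=now+12hours Duration=UNLIMITED NodeCnt=20 Flags=TIME_FLOAT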

Slurm generally eschews approaches to resource management that require keeping resources idle - the reservation at least makes it obvious that things are running in an unusual manner, and I don't see a clean approach to adding this as a QoS feature.

I can classify this as an enhancement request if you'd like, but I don't think we'd tackle this without additional customer demand and a good architectural approach to implementing it, which remains unclear.

- Tim
Comment 3 Steven Young 2016-05-11 23:02:39 MDT
Hi, (Hope the bugzilla incoming email configuration works on 
bugs.schedmd.com),
[Resending from theteam@arc.ox.ac.uk as I got an error sending from my 
personal email address]

So if someone can provide more details about floating reservation and 
job_submit plugin (and additional scripting) we will be happy to 
implement this as a test and then roll it out into production.

In my original post to the slurm-devel list:

<https://groups.google.com/d/msg/slurm-devel/dcCkYt5M0xk/xNL1XUOKLQAJ>

I described some thinking about having a TIME_FLOAT reservation and 
having some script watch for jobs with the superpriority QOS which then 
reduces the reservation to allow the pending jobs to run.  I'll be 
interested to see how my earlier thinking differs from the solution you 
have in mind.

Another comment.  I'm still keen for the jobs being submitted by this 
special group of users to be submitted with a specific QOS so that the 
Slurm accounting database is able to provide access control to the 
special service level.

Cheers,
Steve.

Comment 4 Tim Wickberg 2016-05-12 01:19:48 MDT
(In reply to Steven Young from comment #3)
> Hi, (Hope the bugzilla incoming email configuration works on 
> bugs.schedmd.com),
> [Resending from theteam@arc.ox.ac.uk as I got an error sending from my 
> personal email address]

It does, but you must send from an address with an account in Bugzilla as you've figured out.

> So if someone can provide more details about floating reservation and 
> job_submit plugin (and additional scripting) we will be happy to 
> implement this as a test and then roll it out into production.

There are a few examples of job_submit.lua scripts. Unfortunately there is no specific documentation for the plugin; your best source of info on which fields are available, and on how to look into partition/reservation info, is the plugin source itself (plugins/job_submit/lua/job_submit_lua.c).

The job_submit script can only affect the job itself; I'm not sure yet how you'd change the reservation start time automatically, or how best to maintain that reservation. Again, this isn't something directly supported, so you'd be on your own for the implementation.

> In my original post to the slurm-devel list:
> 
> <https://groups.google.com/d/msg/slurm-devel/dcCkYt5M0xk/xNL1XUOKLQAJ>
> 
> I described some thinking about having a TIME_FLOAT reservation and 
> having some script watch for jobs with the superpriority QOS which then 
> reduces the reservation to allow the pending jobs to run.  I'll be 
> interested to see how my earlier thinking differs from the solution you 
> have in mind.
> 
> Another comment.  I'm still keen for the jobs being submitted by this 
> special group of users to be submitted with a specific QOS so that the 
> Slurm accounting database is able to provide access control to the 
> special service level.

Testing against a given QOS in the job_submit.lua plugin should be simple, and does sound like a reasonable approach.
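As a rough, untested illustration of that check (the QOS and reservation names are placeholders, and which job_desc fields are exposed should be verified against job_submit_lua.c for your release):

    -- job_submit.lua sketch: route superpriority jobs into the reservation
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.qos == "superpriority" and job_desc.reservation == nil then
            -- assumes a reservation named "burst" already exists
            job_desc.reservation = "burst"
            slurm.log_info("job_submit: uid %d routed to reservation burst", submit_uid)
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end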

- Tim
Comment 5 Tim Wickberg 2016-05-23 08:20:13 MDT
Marking as resolved/infogiven. Please reopen if there's anything else I can help with on this.

- Tim
Comment 6 Steven Young 2016-05-25 00:09:35 MDT
Hi,

Apologies for not getting back to you about our requirement sooner.  I 
do think we want more help.  I just reread the comments on the bug 
report.  The main thing I want is to ask about the thinking that you 
mentioned in Comment 2.  You mentioned a floating reservation with 
possibly a job_submit plugin providing some assistance.  I'd really like 
more details about this thinking.  Can someone please provide an 
explanation of how we might meet our guaranteed start time requirement?

Cheers,
Steve.

Comment 7 Tim Wickberg 2016-05-31 03:42:53 MDT
I haven't tried this, but I think it could be made to work. It's not ideal, but we don't have a plan to add a specific-time-to-launch setting at this point.

Roughly, I think you'd want to:

- Create a floating reservation (TIME_FLOAT) for the desired resources with the required lead time.

- Create a job_submit plugin that automatically sets the job to use the reservation, and removes the TIME_FLOAT flag from the reservation.

- Re-create the necessary reservation using a slurmctld epilog script when the job completes (a rough sketch follows below).
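For that last step, something along these lines could serve as a starting point; it is untested, the reservation parameters are placeholders, and rather than inspecting which reservation the finishing job used it simply re-creates the floating reservation whenever it is missing:

    #!/bin/bash
    # EpilogSlurmctld sketch: restore the floating reservation if it is gone
    if ! scontrol show reservation burst > /dev/null 2>&1; then
        scontrol create reservation ReservationName=burst Accounts=groupa,groupb \
            StartTime=now+12hours Duration=UNLIMITED NodeCnt=20 Flags=TIME_FLOAT
    fi
    exit 0

(The script would be referenced from slurm.conf with EpilogSlurmctld=/path/to/script.)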

Comment 8 Steven Young 2016-06-01 20:25:00 MDT
Hi Tim,

This doesn't seem sufficient.  Here are some clarifying questions:

Your scheme seems to assume that there will be a single job submitted 
which is intended to use the reserved nodes.  This isn't the case. 
There are a couple of groups (SLURM accounts) which will have access to 
the superpriority QOS.  The users will submit work with the QOS setting 
and then wait, though they will expect that if the work fits into their 
nodes (perhaps specified by a GrpNodes setting for the QOS), then it 
should start within the guaranteed lead time.

How can your scheme work for the situation where there are multiple jobs 
requesting the superpriority QOS?  Could the job_submit plugin move 
nodes from the TIME_FLOAT reservation to a non-floating reservation 
which the job is set to use?

Cheers,
Steve.

Comment 9 Tim Wickberg 2017-03-08 22:07:09 MST
Hi Steven -

My sincere apologies for not responding to this sooner; I have no excuse for not having handled this a long time ago. Moving on to the question at hand:

Unfortunately I still do not have a perfect way to model this.

I have a few different ideas in mind, although none are a perfect match given the requirements.

The first would be to define a "floating" partition that covers all nodes in the cluster, and restrict access to that partition to the specific accounts that need this higher-priority access.

A higher PriorityTier setting (16.05+) would ensure jobs submitted to that partition would always start as soon as resources free up, and block jobs from other partitions from launching. Combined with low MaxTime limits on the other partitions, you could ensure that at least some nodes are available on the required deadline.

"Floating" partitions can be created by setting the Partition Nodes= line to encompass the entire cluster, but then limiting the number of cpus/nodes that the partition can have access to at a given time with a PartitionQOS.

A PartitionQOS allows you to apply any of the limit types available in a QOS to a partition - e.g. 'MaxTres=nodes=10' as part of a QOS applied to the partition would permit the partition access to up to 10 nodes at a time out of any referenced in the Nodes definition on that partition.
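To make that concrete, here is a sketch of the sort of configuration I have in mind (the names, node list, and limits are placeholders; I'm using GrpTRES on the assumption that an aggregate cap across the partition, rather than a per-job limit, is what you want):

    # QOS that caps the floating partition's total usage (15.08+ TRES syntax)
    sacctmgr add qos float_part GrpTRES=node=10

    # slurm.conf: floating partition spanning the cluster (PriorityTier is 16.05+)
    PartitionName=float Nodes=node[001-200] PriorityTier=10 QOS=float_part AllowAccounts=groupa,groupb
    # the other partitions would need a short enough MaxTime to free nodes by the deadline
    PartitionName=compute Nodes=node[001-200] MaxTime=12:00:00 Default=YES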

Unfortunately none of this addresses the specific deadline nature of your request - the only way to ensure that would be to leverage Reservations (as previously discussed, although that does admittedly cause other complications for successive jobs), or to make sure that at least some of the nodes under consideration have a short enough MaxTime limit to be able to free up by the deadline.

A few additional side questions:

- Have you had a chance to upgrade to a newer release? Support for the 14.11 series ended in November, and we highly encourage customers to stay current. I think you may find some advantages to modelling part of your requirements with PartitionQOS settings, which are only available in the 15.08 releases and later.

- Do you mind attaching a current copy of your slurm.conf file for reference? That helps me frame this discussion better, and I may be able to make further suggestions based on the specific settings you're currently using.

A few documentation references that may be of interest follow. If you have any questions I will make sure they are promptly handled, if not by me then by one of our other support engineers. Again, my apologies for not getting back to you much sooner on this.

- Tim

Partition QOS: https://slurm.schedmd.com/qos.html#partition

Preemption (I would suggest Partition based preemption, although QOS preemption may also be applicable depending on the configuration): https://slurm.schedmd.com/preempt.html

Comment 10 Tim Wickberg 2017-04-04 16:07:28 MDT
Hi Steven -

I'm marking this closed for now, as I haven't seen a reply to comment 9. Please reopen if there's anything else I can address, or feel free to file a new bug as well.

cheers,
- Tim