We have a requirement to allow a set of users high-priority access to a proportion of our cluster. The extra degree of difficulty is that their usage is bursty, so when they do submit work, they want a guarantee that their jobs will start within 12 hours on their proportion of the cluster.

In our cluster the relevant partition for this discussion is our compute partition, which holds our compute nodes. MaxTime for the compute partition is greater than 12 hours (currently it is 5 days). We currently use Multifactor Priority with Fair Tree fairshare. We also currently have two QOS defined: normal and priority. Our priority weighting is set so that Fairshare and QOS are equally weighted, with Age and JobSize weighted less. Partition weighting is currently zero.

Setting up a superpriority QOS with a GrpNodes setting of the required value will allow us to provide higher-priority access to the required proportion of the cluster, but won't allow us to guarantee the 12-hour start time, since we normally have a backlog of jobs asking for multiple days of walltime.

I was recently re-reading the SLURM documentation on reservations (http://slurm.schedmd.com/reservations.html), specifically the section on reservations floating through time. I.e., we could create a reservation that has Flags=TIME_FLOAT and StartTime=now+12hours, and the nodes assigned to this reservation would only allow jobs with a requested TimeLimit of 12 hours or less. That gets us part way to meeting the requirement.

Having read about these reservations, I am now wondering whether there is any way SLURM could be "improved" so that users from the specific SLURM accounts that should have high-priority access can be allowed to run on the reservation.
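For reference, a floating reservation of the kind described in the documentation can be created with scontrol roughly as follows. This is a sketch: the reservation name, node count, and account name are placeholders, not anything from our configuration.

```shell
# Create a reservation whose start time floats 12 hours ahead of "now".
# Jobs whose requested time limit would run past the floating start time
# are kept off these nodes, so they can always free up within 12 hours.
scontrol create reservation ReservationName=superpriority_float \
    StartTime=now+12hours Duration=infinite Flags=TIME_FLOAT \
    NodeCnt=16 Accounts=hipri

# Inspect the result:
scontrol show reservation superpriority_float
```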
Hi Steven, we are looking into this. Right now a job must be submitted to a reservation (or altered later to use one), and once it is associated with a reservation it must be removed from the job before it can run anywhere else. Given that, I'm not sure how easy the requested improvement would be; meaning, I don't think we want to allow jobs to use a reservation unless they explicitly asked for it. Let us think a little on this and get back to you. I'm guessing something else will be the solution, though.
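To illustrate the explicit association Tim describes, attaching and detaching a job from a reservation looks like this today (the reservation name and job ID are illustrative):

```shell
# Submit a job directly into a specific reservation:
sbatch --reservation=superpriority_float job.sh

# Attach an already-pending job to the reservation:
scontrol update JobId=12345 ReservationName=superpriority_float

# Detach it again so it can be scheduled anywhere:
scontrol update JobId=12345 ReservationName=
```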
We've been batting it around, and I think the floating reservation, possibly with a job_submit plugin providing some assistance, is the best way to implement this right now. You may need to provide some additional scripting alongside that to disable the TIME_FLOAT flag on demand, and/or to recreate the reservation once used, but that's all going to be heavily dependent on your site requirements.

Slurm generally eschews approaches to resource management that require keeping resources idle - the reservation at least makes it obvious that things are running in an unusual manner, and I don't see a clean approach to adding this as a QoS feature.

I can classify this as an enhancement request if you'd like, but I don't think we'd tackle this without additional customer demand and a good architectural approach to implementing it, neither of which is clear at this point.

- Tim
Hi, (Hope the bugzilla incoming email configuration works on bugs.schedmd.com.)

[Resending from theteam@arc.ox.ac.uk as I got an error sending from my personal email address.]

So if someone can provide more details about the floating reservation and job_submit plugin (and additional scripting), we will be happy to implement this as a test and then roll it out into production.

In my original post to the slurm-devel list:

<https://groups.google.com/d/msg/slurm-devel/dcCkYt5M0xk/xNL1XUOKLQAJ>

I described some thinking about having a TIME_FLOAT reservation and having a script watch for jobs with the superpriority QOS which then reduces the reservation to allow the pending jobs to run. I'll be interested to see how my earlier thinking differs from the solution you have in mind.

Another comment: I'm still keen for the jobs submitted by this special group of users to carry a specific QOS, so that the Slurm accounting database is able to provide access control to the special service level.

Cheers,
Steve.
(In reply to Steven Young from comment #3)
> Hi, (Hope the bugzilla incoming email configuration works on
> bugs.schedmd.com)
>
> [Resending from theteam@arc.ox.ac.uk as I got an error sending from my
> personal email address]

It does, but you must send from an address with an account in Bugzilla, as you've figured out.

> So if someone can provide more details about floating reservation and
> job_submit plugin (and additional scripting) we will be happy to
> implement this as a test and then roll it out into production.

There are a few examples of job_submit.lua scripts around. Unfortunately there is no specific documentation for the plugin, and your best source of information on which fields are available, and on how to look into partition / reservation info, is the plugin source itself (plugins/job_submit/lua/job_submit_lua.c).

The job_submit script can only affect the job itself. I'm not sure yet how you'd change the reservation start time automatically, or how best to maintain that reservation. Again, this isn't something directly supported, so you'd be on your own for the implementation.

> In my original post to the slurm-devel list:
>
> <https://groups.google.com/d/msg/slurm-devel/dcCkYt5M0xk/xNL1XUOKLQAJ>
>
> I described some thinking about having a TIME_FLOAT reservation and
> having some script watch for jobs with the superpriority QOS which then
> reduces the reservation to allow the pending jobs to run. I'll be
> interested to see how my earlier thinking differs from the solution you
> have in mind.
>
> Another comment. I'm still keen for the jobs being submitted by this
> special group of users to be submitted with a specific QOS so that the
> Slurm accounting database is able to provide access control to the
> special service level.

Testing against a given QOS in the job_submit.lua plugin should be simple, and does sound like a reasonable approach.

- Tim
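As an illustration of the QOS test mentioned above, a job_submit.lua along these lines could route superpriority jobs into the reservation. This is only a sketch: the QOS name "superpriority" and reservation name "superpriority_float" are placeholders, and the available job_desc fields should be verified against plugins/job_submit/lua/job_submit_lua.c for the Slurm version in use.

```lua
-- Sketch of a job_submit.lua hook that forces jobs carrying the
-- superpriority QOS into a dedicated reservation. Names are examples.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.qos == "superpriority" then
        -- Pin the job to the (hypothetical) reservation so it is
        -- eligible for the reserved nodes.
        job_desc.reservation = "superpriority_float"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```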
Marking as resolved/infogiven. Please reopen if there's anything else I can help with on this. - Tim
Hi,

Apologies for not getting back to you about our requirement sooner. I do think we want more help. I just reread the comments on the bug report. The main thing I want is to ask about the thinking you mentioned in comment 2: a floating reservation, possibly with a job_submit plugin providing some assistance. I'd really like more details about this thinking. Can someone please provide an explanation of how we might meet our guaranteed start-time requirement?

Cheers,
Steve.
I haven't tried this, but I think it could be made to work. It's not ideal, but we don't have a plan to add a specific-time-to-launch setting at this point.

Roughly, I think you'd want to:

- Create a floating reservation (TIME_FLOAT) for the desired resources with the required lead time.

- Create a job_submit plugin that automatically sets the job to use the reservation, and removes the TIME_FLOAT flag from the reservation.

- Re-create the necessary reservation using a slurmctld epilog script when the job completes.
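The reservation-side half of those steps might be scripted roughly as below. This is an untested sketch: the reservation name, node count, and account are placeholders, and since it's unclear whether the TIME_FLOAT flag can be cleared in place on all versions, the sketch deletes and re-creates the reservation instead.

```shell
#!/bin/sh
# Sketch of the reservation lifecycle for the scheme above.
RESV=superpriority_float

# Step 1: floating reservation, always 12 hours ahead of "now".
scontrol create reservation ReservationName=$RESV \
    StartTime=now+12hours Duration=infinite Flags=TIME_FLOAT \
    NodeCnt=16 Accounts=hipri

# Step 2 (triggered when a superpriority job is submitted): replace the
# floating reservation with a fixed one the job can actually start in.
scontrol delete ReservationName=$RESV
scontrol create reservation ReservationName=$RESV \
    StartTime=now Duration=12:00:00 NodeCnt=16 Accounts=hipri

# Step 3 (from a slurmctld epilog script, once the job completes):
# restore the floating reservation for the next burst.
scontrol delete ReservationName=$RESV
scontrol create reservation ReservationName=$RESV \
    StartTime=now+12hours Duration=infinite Flags=TIME_FLOAT \
    NodeCnt=16 Accounts=hipri
```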
Hi Tim,

This doesn't seem sufficient. Here are some clarifying questions.

Your scheme seems to assume that there will be a single job submitted which is intended to use the reserved nodes. This isn't the case. There are a couple of groups (SLURM accounts) which will have access to the superpriority QOS. The users will submit work with the QOS set and then wait, though they will expect that if the work fits into their nodes (perhaps specified by a GrpNodes setting for the QOS), then it should start within the guaranteed lead time.

How can your scheme work for the situation where there are multiple jobs requesting the superpriority QOS? Could the job_submit plugin move nodes from the TIME_FLOAT reservation to a non-floating reservation which the job is set to use?

Cheers,
Steve.
Hi Steven -

My sincere apologies for not responding to this sooner; I have no excuse for not having handled this a long time ago. Moving on to the question at hand:

Unfortunately I still do not have a perfect way to model this. I have a few different ideas in mind, although none are a perfect match for the requirements.

The first would be to define a "floating" partition that covers all nodes in the cluster, and restrict access to that partition to the specific accounts that need this higher-priority access. A higher PriorityTier setting (16.05+) would ensure jobs submitted to that partition always start as soon as resources free up, and block jobs from other partitions from launching. Combined with low MaxTime limits on the other partitions, you could ensure that at least some nodes are available by the required deadline.

"Floating" partitions can be created by setting the partition's Nodes= line to encompass the entire cluster, but then limiting the number of cpus/nodes that the partition can have access to at a given time with a PartitionQOS. A PartitionQOS allows you to apply any of the limit types available in a QOS to a partition - e.g. 'MaxTRES=nodes=10' as part of a QOS applied to the partition would permit the partition access to up to 10 nodes at a time out of any referenced in the Nodes= definition on that partition.

Unfortunately none of this addresses the specific deadline nature of your request - the only ways to ensure that would be to leverage reservations (as previously discussed, although that does admittedly cause other complications for successive jobs), or to make sure that at least some of the nodes under consideration have a short enough MaxTime limit to be able to free up by the deadline.

A few additional side questions:

- Have you had a chance to upgrade to a newer release? Support for the 14.11 series ended in November, and we highly encourage customers to stay current. I think you may find some advantages to modelling part of your requirements with PartitionQOS settings, which are only available in the 15.08 release and later.

- Do you mind attaching a current copy of your slurm.conf file for reference? That helps me frame this discussion better, and I may be able to make further suggestions based on the specific settings you're currently using.

A few documentation references that may be of interest follow. If you have any questions I will make sure they are promptly handled, if not by me then by one of our other support engineers. Again, my apologies for not getting back to you much sooner on this.

- Tim

Partition QOS: https://slurm.schedmd.com/qos.html#partition

Preemption (I would suggest partition-based preemption, although QOS preemption may also be applicable depending on the configuration): https://slurm.schedmd.com/preempt.html
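The floating-partition idea described above might look roughly like this in slurm.conf. This is a sketch under assumptions: the partition, QOS, and account names, the node list, and the 10-node cap are all placeholders, not settings from this site's configuration.

```
# slurm.conf: a "floating" high-priority partition overlapping compute.
# PriorityTier is available in 16.05+; higher-tier partitions are
# considered first when their jobs compete for the same nodes.
PartitionName=compute Nodes=node[001-100] MaxTime=5-00:00:00 PriorityTier=1 Default=YES
PartitionName=hipri Nodes=node[001-100] MaxTime=12:00:00 PriorityTier=10 AllowAccounts=hipri QOS=hipri_part

# Then cap the floating partition's footprint with a Partition QOS,
# created in the accounting database, e.g.:
#   sacctmgr add qos hipri_part set MaxTRES=nodes=10
```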
Hi Steven - I'm marking this closed for now, as I haven't seen a reply to comment 9. Please reopen if there's anything else I can address, or feel free to file a new bug as well. cheers, - Tim