I have already opened Bug 15184, where I presented our hard requirement to be able to skip powering down nodes that are drained. Here I would like to discuss some other ideas I have for improving the power saving mechanism as a whole.

I think it would be really nice if SuspendExcNodes and SuspendExcParts could be updated at runtime, and if those updates were restart-proof. The interface could look like:

```
scontrol suspendexcnodes=+/-<nodelist>
scontrol suspendexcparts=+/-<partitions>
```

What is configured in SuspendExcNodes and SuspendExcParts should stay static and unchangeable, but we could add nodes and partitions dynamically (and be able to remove them later). This data should also be stored in state files so that it can be restored after restarting slurmctld.

What do you think?
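The semantics requested above can be sketched as follows. This is only an illustration of the idea, not Slurm code: the class `SuspendExclusions`, its method names, and the JSON state file are all hypothetical. It assumes the configured entries stay immutable, that `+`/`-` updates touch only a separate dynamic set, and that the dynamic set is written to disk so it survives a restart.

```python
import json
import os
import tempfile


class SuspendExclusions:
    """Hypothetical model: static config entries plus a dynamic, persisted set."""

    def __init__(self, static_nodes, state_path):
        # Entries from the configuration file; never modified at runtime.
        self._static = frozenset(static_nodes)
        self._state_path = state_path
        # Entries added at runtime; restored from the state file if present.
        self._dynamic = self._load()

    def _load(self):
        try:
            with open(self._state_path) as fh:
                return set(json.load(fh))
        except FileNotFoundError:
            return set()

    def _save(self):
        # Write to a temporary file, then rename it over the state file,
        # so an interrupted write does not leave a truncated state file.
        directory = os.path.dirname(self._state_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as fh:
            json.dump(sorted(self._dynamic), fh)
        os.replace(tmp_path, self._state_path)

    def add(self, nodes):
        """Model of the '+' update: add nodes to the dynamic set."""
        self._dynamic |= set(nodes)
        self._save()

    def remove(self, nodes):
        """Model of the '-' update: only dynamic entries can be removed."""
        self._dynamic -= set(nodes)
        self._save()

    def excluded(self):
        """All nodes currently excluded from suspension."""
        return self._static | self._dynamic
```

In this sketch, removing a node that only appears in the static list is a no-op, which matches the requirement that the configured values stay unchangeable.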
This is a really nice idea: dynamically updating the exclude configuration for the power saving mechanism.
SuspendExcNodes has a peculiar kind of nodelist with the optional ":" separator. This is used to specify groups of nodes from which a certain number should stay online.
See: https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendExcNodes

Is it important to you to be able to add and remove from lists with this special ":" syntax? If so, what specifically is needed for your workflow?

-Scott
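To make the ":" syntax concrete, here is a small sketch of how a single entry could be split into a node expression and an optional count. The helper `split_exc_entry` is hypothetical and is not how Slurm parses the option; it does not expand the hostlist expression, and the example entry `nid[10-20]:4` is assumed to mean that 4 nodes from that set should stay online.

```python
def split_exc_entry(entry):
    """Split 'nodes[:count]' into (nodes, count); count is None when absent."""
    nodes, sep, count = entry.rpartition(":")
    if not sep:
        # No ":" present: the whole entry is a plain node expression.
        return entry, None
    if not nodes or not count.isdigit():
        raise ValueError(f"malformed entry: {entry!r}")
    return nodes, int(count)


# A plain entry excludes every listed node; a counted entry keeps only
# that many nodes of the group online.
plain = split_exc_entry("nid[10-20]")
counted = split_exc_entry("nid[10-20]:4")
```

An add/remove interface would have to decide how `+`/`-` interacts with such counted groups, which is why the question of whether this syntax is needed matters for the design.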
(In reply to Scott Hilton from comment #6)
> SuspendExcNodes has a peculiar kind of nodelist with the optional ":"
> separator. This is used to specify groups of nodes from which a certain
> number should stay online.
> See: https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendExcNodes
>
> Is it important to you to be able to add and remove from lists with this
> special ":" syntax? If so what is specifically needed for your workflow?
>
> -Scott

Hi Scott!

No, for our case I would say that the ":" feature is not needed. As far as I can imagine, we will only need to exclude specific nodes from suspension, e.g. because we want to use them for various reasons (reserving them for a course, doing some tests on them, running the testsuite, etc.) or to keep them online for maintenance, hardware work, etc.

Cheers,
Valantis
Valantis,

We have completed this feature and it should be part of release 23.02. See commits fc5ec8c83f - 77c1c7d7ae.

-Scott
(In reply to Scott Hilton from comment #13)
> Valantis,
>
> We have completed this feature and it should be part of release 23.02. See
> commits fc5ec8c83f - 77c1c7d7ae.
>
>
> -Scott

Hi Scott,

that's great! I see that we will be able to dynamically update the excluded states too :)

Thank you very much!
-Valantis