| Summary: | power_save module should not suspend drained nodes | | |
| --- | --- | --- | --- |
| Product: | Slurm | Reporter: | Ole.H.Nielsen <Ole.H.Nielsen> |
| Component: | slurmctld | Assignee: | Skyler Malinowski <skyler> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | CC: | c.paschoulas |
| Version: | 21.08.8 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | DTU Physics | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description

Ole.H.Nielsen@fysik.dtu.dk, 2022-12-06 04:04:49 MST
**Skyler Malinowski (comment #1):**

Hi Ole,

The good news is that we agree that drained nodes should not be powered down by power_save. It looks like bug #15184 will be addressing this; that enhancement could land in 23.11 (not confirmed).

In the meantime I can look into this as a bug fix. Given the change to Slurm power_save behavior, this bug fix would likely land in 23.02.

For now, there are some workarounds to this issue (with 'idle_on_node_suspend' on):
- Update SuspendExc with those nodes and reconfigure.
- (22.05) Use a new partition with SuspendTime=INFINITE and move the node into this partition with scontrol.

Best,
Skyler

**Ole.H.Nielsen@fysik.dtu.dk (comment #2):**

Hi Skyler,

(In reply to Skyler Malinowski from comment #1)
> The good news is that we agree that drained nodes should not be powered down
> by power_save. It looks like bug #15184 will be addressing this; that
> enhancement could land in 23.11 (not confirmed).
>
> In the meantime I can look into this as a bug fix. Given the change to Slurm
> power_save behavior, this bug fix would likely land in 23.02.

Yeah, this is a pretty bad bug in the power_save module :-( I hope you can get the fix into 23.02. Nodes in the drained state, as well as maint, should be exempted.

> For now, there are some workarounds to this issue (with
> 'idle_on_node_suspend' on):
> - Update SuspendExc with those nodes and reconfigure.
> - (22.05) Use a new partition with SuspendTime=INFINITE and move the node
>   into this partition with scontrol.

I like the idea of adding nodes that require maintenance to SuspendExc in slurm.conf. This adds one extra step for maintenance work, but that would be acceptable until a proper bug fix is in place.

Thanks,
Ole

**Valantis:**

(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #2)
> I like the idea of adding nodes that require maintenance to SuspendExc in
> slurm.conf. This adds one extra step for maintenance work, but that would
> be acceptable until a proper bug fix is in place.
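For reference, the two workarounds described above could be sketched as the following slurm.conf excerpt. This is illustrative only: the node and partition names are made up, and the exact parameter name is SuspendExcNodes (the comments above abbreviate it as "SuspendExc").

```
# (a) Exempt specific nodes from power_save suspension, then apply the
#     change with `scontrol reconfigure`:
SuspendExcNodes=node[17-18]

# (b) 22.05 and later: a partition whose nodes are never suspended;
#     nodes needing maintenance are moved into it with scontrol:
PartitionName=maint Nodes=node[17-18] SuspendTime=INFINITE
```

Both approaches require an extra administrative step per maintenance event, which is the drawback Ole notes below.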
Dear Ole, please check bug 15185 ;) If you agree with that, please push from your side too to convince SchedMD to implement that functionality.

Best Regards,
Valantis

**Ole.H.Nielsen@fysik.dtu.dk (comment #4):**

Based also on discussions in bug 15184 and bug 15185, I would like to urge SchedMD to consider, for the 23.02 release, some fixes for the power_save plugin so that it handles on-premise nodes in a sensible way. This is really important for all customers in Europe and other regions where electricity prices have soared recently, and HPC centers are being asked to save money by cutting power consumption as much as possible.

IMHO, how the power_save plugin treats on-premise nodes should be reconsidered. Among the node states listed in, e.g., the sinfo manual page, it seems to me that nodes in the following states MUST be exempted automatically from suspension by slurmctld: DOWN, DRAIN (for nodes in the DRAINING or DRAINED states), DRAINED, DRAINING, FAIL, MAINT, NO_RESPOND, REBOOT_ISSUED, REBOOT_REQUESTED, RESV, RESERVED, UNK, and UNKNOWN.

Nodes that are "drained" must obviously *NOT* be powered off, but the "down" state is also used at our site whenever we perform software and firmware updates. In fact, I would like to propose that "idle" should be the only state eligible for suspend/powering-down when dealing with on-premise (non-cloud) nodes! Hopefully this could be implemented in the slurmctld code for all nodes that do *not* have a state=cloud. If this is not feasible, yet another slurm.conf parameter might have to be introduced, for example:

    SuspendExcStates=down,drained,fail,maint,reboot_issued,reserved,unknown

Furthermore, please consider also bug 15184 comment 8, where it has been found that slurmctld will start a job even if not all nodes assigned to the job have yet been resumed successfully.
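To show how the proposed parameter would sit alongside the existing power_save configuration, here is a hypothetical slurm.conf excerpt. The SuspendProgram/ResumeProgram paths and the SuspendTime value are illustrative, and SuspendExcStates is the proposal above, not a parameter that exists in 21.08/22.05:

```
# Existing power_save knobs (paths and timing are illustrative):
SuspendProgram=/usr/local/sbin/node_suspend.sh
ResumeProgram=/usr/local/sbin/node_resume.sh
SuspendTime=1800          # seconds idle before a node is suspended

# Proposed: never suspend nodes in any of these states
SuspendExcStates=down,drained,fail,maint,reboot_issued,reserved,unknown
```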
The Jülich customer site has had to develop a lot of complex logic to work around problems with the power_save plugin :-(

I hope this request makes sense.

**Skyler Malinowski (comment #6):**

(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #4)
> Nodes that are "drained" must obviously *NOT* be powered off, but the
> "down" state is also used at our site whenever we perform software and
> firmware updates. In fact, I would like to propose that "idle" should be
> the only state eligible for suspend/powering-down when dealing with
> on-premise (non-cloud) nodes!

I can see the benefit of only considering IDLE nodes for power_save. DOWN and DRAIN in particular would make sense not to suspend when nodes are in those states or have those flags. Certain interactions will need to be reconsidered (e.g. POWER_DOWN_FORCE, POWER_DOWN_ASAP) and fixed/adjusted accordingly.

It is entirely possible that this cannot be fixed as a bug due to the scope, or due to knock-on effects that would ripple into bug 15184. Instead this ticket would be marked as a duplicate of bug 15184 or, at minimum, closed as info-given because of the workaround. I will need to talk internally about how we want to handle this.

> Hopefully this could be implemented in the slurmctld code for all nodes that
> do *not* have a state=cloud.

I do not like branching/conditional handling for CLOUD vs non-CLOUD nodes. I want both to be handled in the same way; otherwise admin/user expectations and understanding can get muddled. Besides, the rationale for not powering down IDLE and DRAIN nodes is the same regardless of CLOUD or non-CLOUD nodes -- not having power_save interact with the node for debugging or maintenance reasons.

> Furthermore, please consider also bug 15184 comment 8, where it has been
> found that slurmctld will start a job even if not all nodes assigned to the
> job have yet been resumed successfully.
> The Jülich customer site has had to develop a lot of complex logic to work
> around problems with the power_save plugin :-(

There certainly is room for Slurm improvements! And I would imagine that this will be addressed in a future release of Slurm.

Your suggestions do make sense and are appreciated, but some are out of the scope of this ticket. We will take your words into consideration for bug 15184 and bug 15185. Thank you for voicing similarly felt shortcomings of Slurm.

**Ole.H.Nielsen@fysik.dtu.dk:**

Hi Skyler,

(In reply to Skyler Malinowski from comment #6)
> It is entirely possible that this cannot be fixed as a bug due to the scope,
> or due to knock-on effects that would ripple into bug 15184. Instead this
> ticket would be marked as a duplicate of bug 15184 or, at minimum, closed as
> info-given because of the workaround. I will need to talk internally about
> how we want to handle this.

OK, I appreciate that the power_save plugin is quite complex and needs to be fixed very carefully in a coming release, hopefully in 23.02.

> I do not like branching/conditional handling for CLOUD vs non-CLOUD nodes. I
> want both to be handled in the same way; otherwise admin/user expectations
> and understanding can get muddled. Besides, the rationale for not powering
> down IDLE and DRAIN nodes is the same regardless of CLOUD or non-CLOUD
> nodes -- not having power_save interact with the node for debugging or
> maintenance reasons.

I agree with this argument.

> > Furthermore, please consider also bug 15184 comment 8, where it has been
> > found that slurmctld will start a job even if not all nodes assigned to
> > the job have yet been resumed successfully. The Jülich customer site has
> > had to develop a lot of complex logic to work around problems with the
> > power_save plugin :-(
>
> There certainly is room for Slurm improvements! And I would imagine that
> this will be addressed in a future release of Slurm.

Yes, please! The soaring prices of electricity in Europe make this a high-priority concern for HPC sites.
> Your suggestions do make sense and are appreciated, but some are out of the
> scope of this ticket. We will take your words into consideration for bug
> 15184 and bug 15185. Thank you for voicing similarly felt shortcomings of
> Slurm.

I appreciate your attentiveness! I hope that SchedMD will act to help customers save on their increasing electricity bills. At this time the power_save plugin is unfortunately somewhat lacking.

Best regards,
Ole

**Skyler Malinowski:**

Hi Ole,

I will mark this ticket as a duplicate of bug 15184. That ticket encompasses your request and should be the one where the changes are made. It does not make sense for me to make intermediate changes for this ticket.

Thanks,
Skyler

*** This ticket has been marked as a duplicate of ticket 15184 ***