Summary: | Reason=ReqNodeNotAvail, but unavailable nodes not in requested partition | ||
---|---|---|---|
Product: | Slurm | Reporter: | pbisbal |
Component: | Scheduling | Assignee: | Tim Wickberg <tim> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | 3 - Medium Impact | ||
Priority: | --- | ||
Version: | 17.11.4 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Princeton Plasma Physics Laboratory (PPPL) | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA SIte: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Linux Distro: | --- |
Machine Name: | | CLE Version: | |
Version Fixed: | | Target Release: | --- |
DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
pbisbal
2018-05-07 15:26:43 MDT
Looks like this is a collision with a maintenance reservation I have starting Friday at 5 PM:

    # scontrol show res
    ReservationName=root_10 StartTime=2018-05-11T17:00:00 EndTime=2018-05-12T17:00:00 Duration=1-00:00:00
       Nodes=beast002,dawson[027-062,064-065,067-068,071-072,074-083,085-099,101-103,105,107-110,112-162],ellis[001-010],fielder[001-011],ganesh[20-22,24-27],greene[001-048],jassby[001-006],kruskal[001,003-005,007-008,010-016,018,020-028,030-036],mccune[001-040] NodeCnt=279 CoreCnt=7080 Features=(null) PartitionName=(null) Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES
       TRES=cpu=7112
       Users=root Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a

We recently changed the max. time limit for our most commonly used partitions to be only 48 hours (down from 8 days), so I wasn't thinking about the smaller, less used partitions that still had longer time limits that could intersect with the maintenance window.

So this appears to be my fault, but is it possible to change the reason for jobs that are stuck in pending due to a reservation? "ReqNodeNotAvail" is completely misleading, since the nodes actually are available, just not for the time requested. Changing the reason for this situation will result in a lot less work for me (and all other Slurm admins, I bet) in the days/hours preceding a reservation.

Prentice

We cannot change this currently on 17.11, but we are working on changing that reason field for the 18.08; I'm closing this bug as a duplicate of bug 4987 which is tracking work on that.

- Tim

*** This ticket has been marked as a duplicate of ticket 4987 ***

That's perfect. Thanks.
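For readers who hit the same symptom, the collision described in this ticket comes down to interval arithmetic: a pending job is held if its start time plus its time limit would run into the reservation window. Below is a minimal Python sketch of that check. It is a simplified model, not Slurm's actual scheduler logic. The function name `overlaps_reservation` is hypothetical, the 8-day limit stands in for the unspecified "longer time limits" on the smaller partitions, and the ticket's filing timestamp is used as "now" with time zones ignored.

```python
from datetime import datetime, timedelta


def overlaps_reservation(start, time_limit, res_start, res_end):
    """Return True if a job starting at `start` and running for up to
    `time_limit` could still be running during [res_start, res_end)."""
    job_end = start + time_limit
    return start < res_end and job_end > res_start


# Reservation window taken from the `scontrol show res` output in the ticket.
res_start = datetime(2018, 5, 11, 17, 0, 0)
res_end = datetime(2018, 5, 12, 17, 0, 0)

# Assumption: use the ticket's filing time as the earliest possible start.
now = datetime(2018, 5, 7, 15, 26, 43)

# A 48-hour job ends on 2018-05-09, before the reservation starts.
fits_48h = not overlaps_reservation(now, timedelta(hours=48), res_start, res_end)

# Assumption: an 8-day limit, as an example of a "longer" partition limit.
# Such a job would end on 2018-05-15, after the reservation has begun.
fits_8d = not overlaps_reservation(now, timedelta(days=8), res_start, res_end)

print(fits_48h, fits_8d)  # True False
```

Under these assumptions the 48-hour job can start immediately, while the 8-day job has to wait, even though the nodes are idle when it is submitted.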