I am testing how to prevent users in one account from starting any new jobs during a reservation, created as follows:

$ scontrol create reservation starttime=11:00:00 duration=12:00:00 flags=ignore_jobs ReservationName=migrate_ecs nodes=ALL Accounts=-ecsstud

The starttime is within the hour, and the account does have running jobs that would conflict with the reservation, hence I use flags=ignore_jobs during my testing.

The purpose of this reservation is to migrate all user home directories in the account "ecsstud" from one NFS file server to another, so I have to ensure that there are no running jobs from this account during a particular time interval. All other accounts in the system should continue to run jobs unaffected by the reservation.

Unfortunately, the above reservation apparently blocks all user jobs in the system:

$ squeue | grep Reserved
3589143_12 xeon16 normal job xxxxx catvip PENDING 270196 0:00 2021-04-19 1-00:00:00 2 32 3900M (ReqNodeNotAvail, Reserved for maintenance)
3590179_1 xeon16 normal job xxxx catvip PENDING 269973 0:00 2021-04-19 1-00:00:00 2 32 3900M (ReqNodeNotAvail, Reserved for maintenance)
3590233 xeon16 normal ann yyyyy ecsvip PENDING 267121 0:00 2021-04-19 2-02:00:00 1 16 60G (ReqNodeNotAvail, Reserved for maintenance)
(many lines deleted)

According to the scontrol man-page, flags=ignore_jobs, which I used in this test, should not necessarily imply (ReqNodeNotAvail, Reserved for maintenance).

Question: Can you confirm the apparent implication ignore_jobs => maintenance?

When I omit flags=ignore_jobs and select a starttime beyond the longest currently running job:

$ scontrol create reservation starttime=2021-05-01T11:00:00 duration=12:00:00 ReservationName=migrate_ecs nodes=ALL Accounts=-ecsstud

then I don't see the blocked-jobs problem.

Question: Can you offer advice on the idea of creating a reservation which excludes ALL nodes for a few selected accounts?
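As a side note, the denied-account list can be read back from the reservation record; in Slurm's syntax an account name prefixed with '-' is denied access while all other accounts remain allowed. A minimal sketch extracting that field with awk, using a stand-in line where on a live system one would parse the output of `scontrol show reservation migrate_ecs`:

```shell
# Extract the Accounts= field from scontrol show reservation output.
# The sample line below stands in for live scontrol output.
line='Users=(null) Groups=(null) Accounts=-ecsstud Licenses=(null) State=INACTIVE'

# Split the space-separated key=value pairs onto lines and pick Accounts.
echo "$line" | tr ' ' '\n' | awk -F= '$1 == "Accounts" {print $2}'
```

The printed value `-ecsstud` confirms the reservation denies only that account.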
Hi Ole,

You've got the right idea for the reservation, but it looks like you're just missing a couple of flags that should get it to do what you want. If you add the 'FLEX' flag, jobs that qualify for the reservation are allowed to start before the reservation begins and to continue running after it ends, rather than the default behavior of waiting until they can start within the reservation's time window. Another flag you would want to add is 'MAGNETIC'. This makes it so that any job that qualifies for the reservation is allowed to run in it without having requested it at submit time.

Here's an example of how it would look with these flags added to what you were already doing:

$ scontrol create reservation reservationname=exclude_account starttime=12:10:00 duration=30:00 flags=ignore_jobs,magnetic,flex nodes=ALL accounts=-sub1
Reservation created: exclude_account

$ scontrol show res
ReservationName=exclude_account StartTime=2021-04-19T12:10:00 EndTime=2021-04-19T12:40:00 Duration=00:30:00
   Nodes=kitt,node[01-18] NodeCnt=19 CoreCnt=456 Features=(null) PartitionName=(null)
   Flags=FLEX,IGNORE_JOBS,SPEC_NODES,ALL_NODES,MAGNETIC TRES=cpu=456
   Users=(null) Groups=(null) Accounts=-sub1 Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)

I submit one job to the account that is excluded (sub1) and another to an account that is able to run (sub2):

$ sbatch -N1 -t10:00 -Asub1 --wrap='srun sleep 600'
Submitted batch job 26216
$ sbatch -N1 -t10:00 -Asub2 --wrap='srun sleep 600'
Submitted batch job 26217

The job requesting the 'sub2' account is able to start while the 'sub1' job is held:

$ squeue
 JOBID PARTITION  NAME USER ST  TIME NODES NODELIST(REASON)
 26216     debug  wrap  ben PD  0:00     1 (ReqNodeNotAvail, Reserved for maintenance)
 26217     debug  wrap  ben  R  0:02     1 node01

Let me know if you have any questions about this or if you don't see the same behavior.

Thanks,
Ben
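A relative start time like starttime=12:10:00 resolves to today's date, so when scripting a reservation it can be safer to compute an absolute timestamp first. A minimal sketch, assuming GNU date is available (the scontrol command in the comment is illustrative and not executed here):

```shell
# Compute an absolute start time one hour from now, in the
# YYYY-MM-DDTHH:MM:SS format that scontrol accepts.
# Requires GNU date for the -d option.
start=$(date -d '+1 hour' '+%Y-%m-%dT%H:%M:%S')
echo "$start"

# The reservation would then be created with (not executed here):
#   scontrol create reservation reservationname=exclude_account \
#     starttime="$start" duration=30:00 \
#     flags=ignore_jobs,magnetic,flex nodes=ALL accounts=-sub1
```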
Hi Ben,

Thanks for the useful suggestions of flags=flex,magnetic. I have just created a new reservation starting about 1 hour into the future:

$ scontrol create reservation starttime=11:00:00 duration=1:00:00 flags=ignore_jobs,magnetic,flex ReservationName=migrate_ecs nodes=ALL Accounts=-ecsstud
Reservation created: migrate_ecs

$ scontrol show reservation
ReservationName=migrate_ecs StartTime=2021-04-20T11:00:00 EndTime=2021-04-20T12:00:00 Duration=01:00:00
   Nodes=a[001-128],b[001-012],c[001-196],d[001-019,021-033,035-054,056-068],g[001-021,024-066,068-110],h[001-002],i[004-050],s[001-004],x[001-192] NodeCnt=753 CoreCnt=21224 Features=(null) PartitionName=(null)
   Flags=FLEX,IGNORE_JOBS,SPEC_NODES,ALL_NODES,MAGNETIC TRES=cpu=21384
   Users=(null) Groups=(null) Accounts=-ecsstud Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)

Unfortunately, this still causes queued jobs to get the incorrect state (ReqNodeNotAvail, Reserved for maintenance):

$ squeue | grep Reser | head
3597704 xeon16 normal Ba2HNNH- xxxx ecsvip PENDING 260389 0:00 2021-04-20 6-06:00:00 8 128 3900M (ReqNodeNotAvail, Reserved for maintenance)
3569139 xeon24 normal asr.gs@c zzzz camdvip PENDING 260289 0:00 2021-04-10 2-00:00:00 3 72 250000M (ReqNodeNotAvail, Reserved for maintenance)
3596587 xeon24 normal job yyyy catvip PENDING 260246 0:00 2021-04-20 2-00:00:00 6 144 10000M (ReqNodeNotAvail, Reserved for maintenance)
3569162 xeon24 normal asr.gs@c zzzz camdvip PENDING 260181 0:00 2021-04-10 2-00:00:00 3 72 250000M (ReqNodeNotAvail, Reserved for maintenance)
3575434 xeon24 normal asr.gs@c zzzz camdvip PENDING 260096 0:00 2021-04-12 2-02:00:00 3 72 250000M (ReqNodeNotAvail, Reserved for maintenance)
3575447 xeon24 normal asr.gs@c zzzz camdvip PENDING 260068 0:00 2021-04-12 2-00:00:00 3 72 250000M (ReqNodeNotAvail, Reserved for maintenance)
3575448 xeon24 normal asr.gs@c zzzz camdvip PENDING 259983 0:00 2021-04-12 2-00:00:00 3 72 250000M (ReqNodeNotAvail, Reserved for maintenance)
3575466 xeon24 normal asr.gs@c zzzz camdvip PENDING 259915 0:00 2021-04-12 2-00:00:00 3 72 250000M (ReqNodeNotAvail, Reserved for maintenance)
3576381 xeon24 normal asr.gs@c zzzz camdvip PENDING 259747 0:00 2021-04-12 2-00:00:00 3 72 250000M (ReqNodeNotAvail, Reserved for maintenance)
3576437 xeon24 normal asr.gs@c zzzz camdvip PENDING 259728 0:00 2021-04-12 2-00:00:00 3 72 250000M (ReqNodeNotAvail, Reserved for maintenance)

Do you have any ideas how to avoid this problem?

Thanks,
Ole
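Long listings like the one above are easier to audit when condensed into per-account reason counts. A minimal sketch using awk over stand-in lines; on a live system the input could come from something like `squeue -h -t PD -o '%a|%r'` (that format string is an assumption, not taken from the ticket):

```shell
# Tally pending-job Reasons per account from "account|reason" lines.
# The printf lines below stand in for live squeue output.
printf '%s\n' \
  'camdstud|ReqNodeNotAvail, Reserved for maintenance' \
  'camdstud|ReqNodeNotAvail, Reserved for maintenance' \
  'camdvip|Dependency' \
  'ecsvip|ReqNodeNotAvail, Reserved for maintenance' |
awk -F'|' '{n[$1 " :: " $2]++} END {for (k in n) print n[k], k}' |
sort -rn
```

The most frequent account/reason pair is printed first, which makes it obvious at a glance whether the maintenance Reason is confined to the excluded account or spread across all of them.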
Hi Ole,

I think you may be seeing this Reason for queued jobs that don't have the resources to start immediately. When the reservation is in place, it's shown as the Reason whenever there aren't free resources, even though those jobs would be allowed to run in the reservation. Here's an example showing this.

I create a reservation on all my nodes, as shown previously:

$ scontrol create reservation reservationname=exclude_account starttime=13:40:00 duration=30:00 flags=ignore_jobs,magnetic,flex nodes=ALL accounts=-sub1
Reservation created: exclude_account

Then I submit a job that uses all the nodes and a second job that requests just one:

$ sbatch -N19 --exclusive -t10:00 -Asub2 --wrap='srun sleep 600'
Submitted batch job 26227
$ sbatch -N1 --exclusive -t10:00 -Asub2 --wrap='srun sleep 10'
Submitted batch job 26228

The system-wide job starts first, so my small job isn't able to start and shows a Reason of "ReqNodeNotAvail, Reserved for maintenance":

$ squeue
 JOBID PARTITION  NAME USER ST  TIME NODES NODELIST(REASON)
 26228     debug  wrap  ben PD  0:00     1 (ReqNodeNotAvail, Reserved for maintenance)
 26227     debug  wrap  ben  R  0:13    19 kitt,node[01-18]

In the output you sent I only see the jobs that have this as a Reason, so I don't know what the availability of system resources looked like at the time you got that output. Did you see some jobs start during this time? Let me know if the example I showed doesn't seem to apply in your case.

Thanks,
Ben
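One quick way to tell the two situations apart is to check whether any nodes were actually idle while a job was stuck with this Reason: if nothing is idle, the job is waiting on resources rather than on the reservation itself. A minimal sketch over stand-in state/count pairs; on a live system the input could come from something like `sinfo -h -o '%t %D'` (an assumed invocation, not taken from the ticket):

```shell
# Sum the node count of every "idle" state line from "state count" pairs.
# The printf lines below stand in for live sinfo output.
printf '%s\n' 'alloc 18' 'mix 1' 'idle 0' |
awk '$1 == "idle" {idle += $2}
     END {print (idle > 0 ? "idle nodes: " idle : "no idle nodes")}'
```

With the sample data this prints "no idle nodes", matching the scenario above where the 19-node job had filled the cluster.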
Hi Ben,

(In reply to Ben Roberts from comment #3)
> I think you may be seeing this Reason for queued jobs that don't have
> resources to start immediately. When the reservation is in place it's
> showing that as a reason when there aren't resources free, even though the
> jobs would be able to run in the reservation. Here's an example showing
> this.

It seems to me that your reservation exclude_account impacts all accounts (such as sub2 and its two jobs), whereas my expectation was that it would only be noticeable to users in the sub1 account.

> The system-wide job starts first, so my small job isn't able to start and
> shows a Reason of "ReqNodeNotAvail, Reserved for maintenance".
> $ squeue
>  JOBID PARTITION  NAME USER ST  TIME NODES NODELIST(REASON)
>  26228     debug  wrap  ben PD  0:00     1 (ReqNodeNotAvail, Reserved for maintenance)
>  26227     debug  wrap  ben  R  0:13    19 kitt,node[01-18]

Yes, this is what I don't understand: I would expect job 26228 to be Pending with a Reason of Resources instead, just as if the reservation exclude_account didn't exist.

> In the output you sent I only see the jobs that have this as a reason, so I
> don't know what the availability of system resources looked like at the time
> that you got that output. Did you see some jobs start during this time?
> Let me know if it doesn't look like the example I showed seems to apply in
> your case.
I have now created a new reservation, just as in Comment 2, and all Pending jobs now have the unexpected Reason (except for jobs with a Dependency):

$ squeue -t PD | head
JOBID PARTITION QOS NAME USER ACCOUNT STATE PRIORITY TIME SUBMIT_TIM TIME_LIMIT NODES CPUS MIN_MEM NODELIST(REASON)
3603165 xeon16 normal FeCoNi_n xxx camdstud PENDING 349746 0:00 2021-04-21 1-00:00:00 1 16 60000M (ReqNodeNotAvail, Reserved for maintenance)
3603164 xeon16 normal FeCoNi_n xxx camdstud PENDING 349746 0:00 2021-04-21 1-00:00:00 1 16 60000M (ReqNodeNotAvail, Reserved for maintenance)
3603163 xeon16 normal FeCoNi_n xxx camdstud PENDING 349746 0:00 2021-04-21 1-00:00:00 1 16 60000M (ReqNodeNotAvail, Reserved for maintenance)
3603162 xeon16 normal FeCoNi_n xxx camdstud PENDING 349746 0:00 2021-04-21 1-00:00:00 1 16 60000M (ReqNodeNotAvail, Reserved for maintenance)
3603161 xeon16 normal FeCoNi_n xxx camdstud PENDING 349746 0:00 2021-04-21 1-00:00:00 1 16 60000M (ReqNodeNotAvail, Reserved for maintenance)
3603160 xeon16 normal FeCoNi_n xxx camdstud PENDING 349746 0:00 2021-04-21 1-00:00:00 1 16 60000M (ReqNodeNotAvail, Reserved for maintenance)
3603166 xeon16 normal FeCoNi_n xxx camdstud PENDING 349737 0:00 2021-04-21 1-00:00:00 1 16 60000M (ReqNodeNotAvail, Reserved for maintenance)
3598955 xeon16 normal graphene yyy camdvip PENDING 328415 0:00 2021-04-20 2:00:00 2 32 62.50G (Dependency)
3602892 xeon16 normal FeCN zzz ecsvip PENDING 303439 0:00 2021-04-21 7-00:00:00 6 96 3900M (ReqNodeNotAvail, Reserved for maintenance)

In the slurmctld logfile I do see jobs starting both before and after the reservation's starttime, so that part seems to work correctly.

Maybe I don't understand the concept of reservations deeply enough, but in the present scenario I think Reason=(ReqNodeNotAvail, Reserved for maintenance) should not be printed for accounts that are unaffected by the reservation. I guess my observations boil down to a request, referring to your example in Comment 3:

1.
Jobs in account sub1 should be blocked with Reason=(ReqNodeNotAvail, Reserved for maintenance).
2. Jobs in all other accounts (such as sub2) should simply be Pending with Reason=Resources, just as if reservationname=exclude_account didn't exist at all.

Does this make sense to you? If so, could I ask the Slurm developers to consider this request for a future 20.11.x version?

Thanks,
Ole
Hi Ole,

Thanks for your patience while we looked at this. Changing the behavior so that the Reason for jobs in this situation shows "Resources" has been identified as an enhancement. This enhancement will be worked on by another engineer, but there isn't yet a target version identified for the work. We'll let you know when there is progress on it.

Thanks,
Ben