Ticket 12369 - Magnetic reservation does not attract jobs if a job on the magnetic reservation node is in 'CG' state
Summary: Magnetic reservation does not attract jobs if a job on the magnetic reservati...
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 20.02.7
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Dominik Bartkiewicz
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2021-08-27 06:40 MDT by Bas van der Vlies
Modified: 2021-08-30 03:39 MDT

See Also:
Site: SURF


Description Bas van der Vlies 2021-08-27 06:40:15 MDT
I had to open a separate ticket for this magnetic reservation issue. It originated from:
 * bug 12350, comment 6

And I got a response:
 * bug 12350, comment 11

To summarize: I create a magnetic reservation on a 16-core node. The way to trigger the problem is not the test from the response, but the test I describe here:
 * terminal 1: watch -n1 "squeue -u <username> | sort"
 * terminal 2: submit 4 jobs --> these jobs are scheduled on the magnetic reservation node.
 * terminal 1: see if a job is in "CG" state
 * terminal 2: quickly submit another 4 jobs --> these jobs are scheduled on other nodes due to the "CG" state.

If there are no jobs in the "CG" state, I can just submit the jobs and they are scheduled on the reservation:
 * terminal 2: submit 4 jobs
 * wait 1 sec 
 * terminal 2: again submit 4 jobs
 * All these jobs end up on the magnetic reservation node.
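
The "is a job in the CG state?" check from terminal 1 can also be scripted instead of watched by eye. This is only a minimal sketch: it assumes the default squeue column layout, where the job state is the fifth column, and `count_completing` is a hypothetical helper name, not part of Slurm.

```shell
#!/bin/sh
# Hypothetical helper: count jobs in the CG (completing) state.
# Reads squeue output on stdin and assumes the default layout:
#   JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
# so the state (ST) is the fifth whitespace-separated column.
# The header line is ignored because its fifth column is "ST".
count_completing() {
    awk '$5 == "CG" { n++ } END { print n + 0 }'
}

# Example use on a cluster (not run here):
#   squeue -u <username> | count_completing
```

A result of 0 corresponds to the second scenario above, where all submitted jobs land on the magnetic reservation node; any other value corresponds to the first.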

regards 

Bas
Comment 1 jaap.dijkshoorn@surf.nl 2021-08-27 06:55:39 MDT
This is even triggered with only 1 job on a node. As soon as that job is in the CG state, other jobs are scheduled on another node.
Comment 2 Bas van der Vlies 2021-08-27 08:19:34 MDT
When we add '--reservation=magentic' on the command line, the job goes to the 'PD' state:
 * 2739    shared submit.s      bas PD       0:00      1 (Resources)

and waits until the job in the 'CG' state has finished. Then we can submit jobs again, and the node accepts jobs until another job enters the 'CG' state.

The question is: is the 'CG' state a blocking state for scheduling jobs on a node, and is there an option to override it?

Thanks

Bas
Comment 3 Bas van der Vlies 2021-08-30 03:39:56 MDT
I read in the documentation that this is the expected behaviour and that there are options for it: CompleteWait and reduce_completing_frag. We have to reduce the time spent in the slurmd Epilog script.
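
To see how a cluster currently has these options set, one could filter the output of `scontrol show config`. This is only a sketch: it assumes that command prints one `Key = value` pair per line, and `show_completing_settings` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the settings relevant to the
# COMPLETING-state behaviour. Reads "Key = value" lines on stdin
# (assumed layout of `scontrol show config` output).
# CompleteWait is a slurm.conf parameter; reduce_completing_frag,
# when enabled, appears as a value of SchedulerParameters.
show_completing_settings() {
    grep -E '^(CompleteWait|SchedulerParameters)[[:space:]]*='
}

# Example use on a cluster (not run here):
#   scontrol show config | show_completing_settings
```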

This issue can be closed.

Bas