Created attachment 18305 [details]
multipartition gres fix

At our site we have a lot of partitions for the different CPU/GPU types. To make it easy for the users, we have written a job_submit.lua script that submits to all CPU partitions or all GPU partitions. But this fails when we make use of a GRES specification.

In this cluster we do not have GPUs, so I defined a GRES type:
* cpu_type

We have defined two non-consumable, count-only GRES types:
* e5_2650_v1
* e5_2650_v2

and two partitions:
* cpu_e5_2650_v1 --> 1 node
* cpu_e5_2650_v2 --> 1 node

The last partition checked is `cpu_e5_2650_v2`. This is important for this example.

Now we submit a job that requires GRES `e5_2650_v1`:
* srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash
* a second job with the same GRES type fails with:
```
srun: error: Unable to allocate resources: Requested node configuration is not available
```

When we use the other GRES type `e5_2650_v2` instead, the second job is queued, which is what I would also expect for the example above.

So the error code of the last partition examined determines the error code returned to the user. When we use the GRES type `e5_2650_v1`, the last partition `cpu_e5_2650_v2` returns `ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE` (2014), and that is what the user sees. But in the first partition the job could run; its nodes are merely busy (`ESLURM_NODES_BUSY` = 2016). We should return this state once we have examined all partitions. The attached patch implements this behaviour. I do not know if this is the right approach.
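To illustrate the idea (not the actual patch, which is attached above), here is a minimal self-contained C sketch of the error-code preference we have in mind: while iterating over the requested partitions, remember the "most recoverable" error, so that a partition that is merely busy wins over one that can never satisfy the GRES request. The error values are the ones quoted above; the function and variable names are hypothetical and do not correspond to Slurm internals.

```c
#include <stdio.h>

/* Error codes as quoted above (see slurm_errno.h for the real definitions). */
#define ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE 2014
#define ESLURM_NODES_BUSY                        2016

/* Hypothetical per-partition result; in Slurm this would come from the
 * node-selection logic for each partition in the job's partition list. */
static int try_partition(int part_idx)
{
    /* Partition 0 could run the job but its nodes are busy;
     * partition 1 can never satisfy the GRES request. */
    return part_idx == 0 ? ESLURM_NODES_BUSY
                         : ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE;
}

int main(void)
{
    int best_rc = ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE;

    for (int i = 0; i < 2; i++) {
        int rc = try_partition(i);
        /* Prefer "nodes busy" over "config unavailable": the former means
         * the job could still run later in at least one partition. */
        if (rc == ESLURM_NODES_BUSY)
            best_rc = rc;
    }

    printf("error returned to user: %d (%s)\n", best_rc,
           best_rc == ESLURM_NODES_BUSY
               ? "ESLURM_NODES_BUSY"
               : "ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE");
    return 0;
}
```

With this preference, the multi-partition example above would report "nodes busy" (and queue the job) instead of failing outright because the last partition checked happened to be the unsuitable one.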
Created attachment 18629 [details]
Also add patch for 20.02 version

This is the multipartition fix for Slurm version 20.02, which we are also using.
Is there an update on this issue? Will it be addressed, or is there a fix in a newer Slurm version?