While running multi-node MPI jobs under Slurm, CPUs are fully utilized on the batch host only, while utilization on the other nodes is very low. It looks like all the processes are sharing a single core on each of the other nodes and use independent cores only on the batch host. That makes multi-node jobs run forever.

On the batch host, utilization is almost 100% for each process:

  PID USER     PR NI    VIRT  RES   SHR S %CPU %MEM   TIME+ COMMAND
55136 mazatyae 20  0 9213248 8.1g 13180 R 99.7  1.6 8:23.24 xhpl
55143 mazatyae 20  0 9228744 8.2g 12684 R 99.7  1.6 8:24.41 xhpl
55144 mazatyae 20  0 9138108 8.1g 12676 R 99.7  1.6 8:23.72 xhpl
55146 mazatyae 20  0 9142304 8.1g 12804 R 99.7  1.6 8:23.63 xhpl
55132 mazatyae 20  0 9279780 8.1g 13288 R 99.3  1.6 8:23.92 xhpl
55133 mazatyae 20  0 9300568 8.2g 13292 R 99.3  1.6 8:24.21 xhpl
55134 mazatyae 20  0 9299688 8.2g 13380 R 99.3  1.6 8:24.23 xhpl

On another node running the same job, utilization is very low; only one core is utilized and all processes seem to be sharing it:

  PID USER     PR NI    VIRT  RES   SHR S %CPU %MEM   TIME+ COMMAND
34772 mazatyae 20  0 8734076 7.4g 10220 R  3.6  1.5 0:16.15 xhpl
34759 mazatyae 20  0 8819632 7.4g 10212 R  3.3  1.5 0:16.14 xhpl
34760 mazatyae 20  0 8819632 7.4g 10228 R  3.3  1.5 0:16.15 xhpl
34761 mazatyae 20  0 8819632 7.4g 10212 R  3.3  1.5 0:16.14 xhpl
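A quick way to confirm the suspected core sharing (a diagnostic sketch, not part of the original report) is to print each xhpl process's CPU affinity on the underutilized node; if every PID reports the same single core, the ranks are all pinned together:

# Run on the slow node: list the CPUs each xhpl process is allowed to use.
for pid in $(pgrep -u "$USER" xhpl); do
    taskset -cp "$pid"    # prints e.g. "pid 34759's current affinity list: 0"
done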
Created attachment 17195: Slurm configuration.
Slurm 20.11.1
PMIx 3.2.2
CentOS 7.9
sbatch file:

#!/bin/bash
#SBATCH -N 2
#SBATCH -n 64
#SBATCH --tasks-per-node=32
#SBATCH --cpus-per-task=4
#SBATCH --partition=batch
#SBATCH -J hpl
#SBATCH -o hpl-NPS4-32threads.%N.%J.out
#SBATCH -e hpl-NPS4-32threads.%N.%J.err
#SBATCH --time=04:10:00
#SBATCH --mem=0
#SBATCH --reservation=IBEX_CS

# run the application:
module load intelstack-default
module load openmpi/4.0.1/.gnu-6.4.0

mpirun -np 64 --mca btl self,vader --report-bindings --map-by l3cache -x OMP_NUM_THREADS=4 -x OMP_PROC_BIND=TRUE -x OMP_PLACES=cores ./xhpl

$ squeue -j 13329509
    JOBID PARTITION  NAME     USER ST  TIME NODES NODELIST(REASON)
 13329509     batch   hpl wickhagj  R  0:08     2 cn506-02-l,cn506-03-l

cn506-02-l:

top - 14:57:00 up 22:32, 1 user, load average: 25.54, 9.20, 3.39
Tasks: 1463 total, 33 running, 1430 sleeping, 0 stopped, 0 zombie
%Cpu(s): 25.0 us, 0.0 sy, 0.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 52820464+total, 41650988+free, 10474582+used, 6948924 buff/cache
KiB Swap: 31457276 total, 31447756 free, 9520 used. 41612816+avail Mem

  PID USER     PR NI    VIRT  RES   SHR S  %CPU %MEM   TIME+ COMMAND
80613 wickhagj 20  0 3125448 2.6g 13544 R 100.3  0.5 1:45.12 xhpl
80600 wickhagj 20  0 3064572 2.5g 13604 R 100.0  0.5 1:45.04 xhpl
80601 wickhagj 20  0 3085320 2.5g 13556 R 100.0  0.5 1:45.03 xhpl
80602 wickhagj 20  0 3098100 2.5g 13868 R 100.0  0.5 1:45.04 xhpl
80603 wickhagj 20  0 3064572 2.5g 13484 R 100.0  0.5 1:45.04 xhpl
80604 wickhagj 20  0 3133088 2.6g 13524 R 100.0  0.5 1:45.08 xhpl
80605 wickhagj 20  0 3166268 2.6g 13628 R 100.0  0.5 1:45.09 xhpl
80606 wickhagj 20  0 3154176 2.6g 13832 R 100.0  0.5 1:45.08 xhpl

cn506-03-l:

top - 14:57:24 up 22:33, 1 user, load average: 2.05, 0.89, 0.43
Tasks: 1409 total, 6 running, 1403 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.5 sy, 0.0 ni, 99.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 52820464+total, 50044947+free, 20958684 used, 6796484 buff/cache
KiB Swap: 31457276 total, 31457020 free, 256 used. 49999168+avail Mem

  PID USER     PR NI   VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
80416 wickhagj 20  0 474080 68496 11316 S  4.6  0.0 0:05.88 xhpl
80419 wickhagj 20  0 474080 70472 11316 S  4.6  0.0 0:05.92 xhpl
80420 wickhagj 20  0 474080 68508 11324 S  4.6  0.0 0:05.88 xhpl
80424 wickhagj 20  0 474080 68488 11312 S  4.6  0.0 0:05.90 xhpl
80425 wickhagj 20  0 474080 68504 11324 S  4.6  0.0 0:05.91 xhpl
80427 wickhagj 20  0 474080 68492 11312 S  4.6  0.0 0:05.88 xhpl
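(As an aside, one way to see the effective CPU mask from inside the job itself; this command is an illustration, not from the report. OMPI_COMM_WORLD_RANK is Open MPI's per-rank environment variable, and Cpus_allowed_list is the kernel's affinity field in /proc:)

# Launch a trivial step that prints each rank's host and allowed CPU list.
mpirun -np 64 bash -c \
  'echo "$(hostname) rank $OMPI_COMM_WORLD_RANK: $(grep Cpus_allowed_list /proc/self/status)"'

With the problem present, all ranks on the non-batch node would report the same one-CPU list, while ranks on the batch host report distinct cores.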
Bump.
Greg,

Sorry for the delay. Could you please try:

  export SLURM_WHOLE=1

before the mpirun call? This is very likely a duplicate of Bug 10383, where you can find more details.

cheers,
Marcin
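(For clarity, a sketch of where the workaround goes in the job script above; the mpirun line is copied unchanged from the original submission, and the comment reflects the 20.11 step-resource change discussed in Bug 10383:)

# Give the step that mpirun launches access to the whole node's resources,
# instead of the 20.11 default of an exclusive subset (one CPU per node,
# which matches the single shared core seen on the non-batch node).
export SLURM_WHOLE=1

mpirun -np 64 --mca btl self,vader --report-bindings --map-by l3cache \
       -x OMP_NUM_THREADS=4 -x OMP_PROC_BIND=TRUE -x OMP_PLACES=cores ./xhpl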
Dear Marcin,

Thanks for the workaround. Confirming it works for us. Please resolve this ticket.

With thanks,
-Greg
Resolving as duplicate.

*** This bug has been marked as a duplicate of bug 10383 ***