Howdy,

What ports (or ranges) need to be open to the master and compute nodes for a job submission host? The host will only submit batch and interactive jobs, along with running the usual Slurm commands (squeue, scontrol, etc.).

I'm testing this on a virtual machine and find that job submission works with the firewall down on the submit host. With the firewall running (tcp ports 22 and 6188 open to the cluster), I see from running:

srun --partition=long --ntasks=1 --mem-per-cpu=1024 --time=48:00:00 --job-name=rsync -vvvvvv --pty /bin/bash

that it's using random high-range ports:

---snip---
srun: debug2: srun PMI messages to port=36789
srun: debug: Entering slurm_allocation_msg_thr_create()
srun: debug: port from net_stream_listen is 40867
...
srun: debug2: initialized job control port 44870
...
srun: debug: initialized stdio listening socket, port 37197
---snip---

Does a submit host need to have all high-range TCP ports open to the cluster? If so, what is the range?

Thanks,
Mike
This is what we have defined in slurm.conf regarding ports:

$ grep -i port /etc/slurm/slurm.conf
SlurmctldPort=6817
SlurmdPort=6818
#SchedulerPort=

In case it is relevant, here is the size of our cluster:

Our current cluster has 96 nodes, 2304 CPU cores, 8 GPUs and 4 Phi
We are expanding soon to 114 nodes, 2736 CPU cores, 80 GPUs and 4 Phi

I was able to work around the issue last night by opening ports 30,000 through 63,000 to the compute nodes and masters, although I suspect that's not the full range.
Hello Mike,

"srun" listens on a random port unless you set SrunPortRange in your slurm.conf file. For example:

SrunPortRange=60001-63000

See more here:
https://slurm.schedmd.com/slurm.conf.html#OPT_SrunPortRange

One important note you'll see in the documentation:

Note: A sufficient number of ports must be configured based on the estimated number of srun on the submission nodes considering that srun opens 3 listening ports plus 2 more for every 48 hosts. Example:

srun -N 48 will use 5 listening ports.
srun -N 50 will use 7 listening ports.
srun -N 200 will use 13 listening ports.

Let me know if you have any further questions or if I'm okay to close this bug.

Regards
Tim
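
As a rough sizing aid, the arithmetic in the documentation note can be sketched in Python. This is only a sketch under stated assumptions: the rounding up per started group of 48 hosts is inferred from the note's three examples (it is not taken from the Slurm source), the function names are hypothetical, and the range is assumed to be inclusive at both ends.

```python
HOSTS_PER_GROUP = 48  # from the note: "2 more for every 48 hosts"


def srun_listening_ports(num_hosts: int) -> int:
    """Listening ports a single srun opens, following the documentation note.

    Assumption: every started group of 48 hosts costs 2 extra ports
    (rounding up), which matches the note's examples for -N 48, 50 and 200.
    """
    if num_hosts < 1:
        raise ValueError("num_hosts must be at least 1")
    groups = (num_hosts + HOSTS_PER_GROUP - 1) // HOSTS_PER_GROUP  # ceiling division
    return 3 + 2 * groups


def max_concurrent_sruns(range_start: int, range_end: int, num_hosts: int) -> int:
    """Hypothetical helper: how many sruns of this size fit in a port range.

    Assumption: the range is inclusive, so 60001-63000 holds 3000 ports.
    """
    range_size = range_end - range_start + 1
    return range_size // srun_listening_ports(num_hosts)


if __name__ == "__main__":
    for n in (48, 50, 200):
        print(f"srun -N {n}: {srun_listening_ports(n)} listening ports")
    # Worst case for the expanded 114-node cluster with SrunPortRange=60001-63000
    print(max_concurrent_sruns(60001, 63000, 114))
```

The loop reproduces the 5, 7 and 13 ports from the note. The last line estimates how many simultaneous full-cluster sruns from one submit host the example range could hold. The same range would then be the one to open on the submit host's firewall to the compute nodes and masters.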
Hey Mike - I'm marking this resolved/infogiven while Tim's out on vacation. Please reopen if there's anything further we can answer.

cheers,
- Tim