Bug 3856 - Ports that must be open on a Submit Host's Firewall?
Summary: Ports that must be open on a Submit Host's Firewall?
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 16.05.8
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Director of Support
Reported: 2017-06-01 10:08 MDT by UAB Research Computing
Modified: 2017-06-05 21:18 MDT
Site: UAB


Description UAB Research Computing 2017-06-01 10:08:24 MDT
Howdy,
 
What ports (or ranges) need to be open to the master and compute nodes for a job submission host? The host will only submit batch and interactive jobs along with running the usual Slurm commands (squeue, scontrol, etc.).
 
I'm testing this on a virtual machine and find that with the firewall down on the submit host, job submission works. With the firewall running (TCP ports 22 and 6188 open to the cluster), it does not.
 
I see from running:
 
srun --partition=long --ntasks=1 --mem-per-cpu=1024 --time=48:00:00 --job-name=rsync -vvvvvv --pty /bin/bash
 
that it's using random high-range ports:
---snip---
srun: debug2: srun PMI messages to port=36789
srun: debug:  Entering slurm_allocation_msg_thr_create()
srun: debug:  port from net_stream_listen is 40867
...
srun: debug2: initialized job control port 44870
...
srun: debug:  initialized stdio listening socket, port 37197
---snip---
 
Does a submit host need to have all of the high-numbered TCP ports open to the cluster? If so, what is the range?
 
Thanks, Mike
Comment 2 UAB Research Computing 2017-06-01 10:32:42 MDT
This is what we have defined in slurm.conf regarding ports:

$ grep -i port /etc/slurm/slurm.conf
SlurmctldPort=6817
SlurmdPort=6818
#SchedulerPort=

In case it's relevant, here is the size of our cluster:
Our current cluster has 96 nodes, 2304 CPU cores, 8 GPUs and 4 Phi

We are expanding soon to 114 nodes, 2736 CPU cores, 80 GPUs and 4 Phi

I was able to work around the issue last night by opening ports 30,000 through 63,000 to the compute nodes and masters, although I suspect that's not the full range.
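
For context, opening a range like that on the submit host's firewall would look roughly like the line below. This is only a sketch of the workaround described above, assuming a plain iptables setup; the 10.0.0.0/24 cluster subnet is a placeholder, not something taken from this report.

# Hedged sketch: allow the cluster subnet (placeholder 10.0.0.0/24) to reach
# the high TCP range opened on the submit host as a workaround.
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 30000:63000 -j ACCEPT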
Comment 3 Tim Shaw 2017-06-02 08:44:18 MDT
Hello Mike,

"srun" listens on a random port unless you set SrunPortRange in your slurm.conf file.  For example:

SrunPortRange=60001-63000

See more here:

https://slurm.schedmd.com/slurm.conf.html#OPT_SrunPortRange
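
Putting that together with the ports already shown in comment 2, the port-related lines in slurm.conf would look something like the sketch below (the 60001-63000 range is just the example value above, not a recommendation). The submit host's firewall then only needs to accept inbound connections from the cluster on that range; the usual client commands (squeue, scontrol, sbatch) typically only make outbound connections to SlurmctldPort.

SlurmctldPort=6817
SlurmdPort=6818
SrunPortRange=60001-63000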

One important note you'll see in the documentation:

Note: A sufficient number of ports must be configured based on the estimated number of srun on the submission nodes considering that srun opens 3 listening ports plus 2 more for every 48 hosts. Example:

srun -N 48 will use 5 listening ports.
srun -N 50 will use 7 listening ports.
srun -N 200 will use 13 listening ports.
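
Reading those examples as 3 base ports plus 2 more for every started block of 48 hosts (my inference from the examples, not a formula stated verbatim in the documentation), a rough sizing for this cluster would be:

# ports per srun = 3 + 2 * ceil(nodes / 48), inferred from the examples above
ports_per_srun() { echo $(( 3 + 2 * ( ($1 + 47) / 48 ) )); }
ports_per_srun 96    # current 96-node cluster  -> 7 ports per srun
ports_per_srun 114   # planned 114-node cluster -> 9 ports per srun

Multiplying that per-srun count by the number of srun commands expected to run concurrently on the submit host gives a lower bound on how wide SrunPortRange should be.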

Let me know if you have any further questions or if I'm okay to close this bug.

Regards

Tim
Comment 4 Tim Wickberg 2017-06-05 21:18:10 MDT
Hey Mike -

I'm marking this resolved/infogiven while Tim's out on vacation. Please reopen if there's anything further we can answer.

cheers,
- Tim