Note: although this is suspiciously similar to 13342, I'm not sure of the rules here, so I'm posting it as a new bug. Please merge it with that one if desired.

slurm.conf has:

    LaunchParameters=use_interactive_step
    PrologFlags=X11
    InteractiveStepOptions="--interactive --pty --preserve-env --mpi=none $SHELL"

"salloc --x11" runs successfully and connects the user to the allocated node, but running any X program from that node fails with "Error: Can't open display: localhost:xx.0". DISPLAY is set correctly and xauth has the right cookie copied from the source host, so nothing looks obviously wrong. The Slurm logs show this error:

    [2022-05-22T00:42:29.439] [91.extern] error: _x11_socket_read: slurm_open_msg_conn: Connection refused

Interestingly, everything works fine if the job is submitted from a host other than the Slurm control host. I tested this while making sure the job landed on the same compute node. Mine is a smallish cluster, and the login node is also the Slurm control node.

Before anyone asks: without --x11, running "salloc /usr/bin/bash" followed by "ssh -X $SLURM_NODELIST xeyes" does work, irrespective of the submission node.
Solved: all nodes had a line in /etc/hosts of the form

    127.0.1.1 <nodename>

Ubuntu adds this when the hostname is set. Removing this line on the controller fixes the issue. I guess this entry makes the controller give slurmstepd 127.0.1.1 as the IP address of the host that started the job. I'm not sure why only X11 forwarding is affected.
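
For anyone who wants to check other machines for the same kind of entry, here is a minimal sketch of a checker. It is not part of Slurm; the function name loopback_entries_for is made up for illustration, and it only reads text in /etc/hosts format. It does not consult DNS or nsswitch.conf, so treat it as a quick first check rather than a full diagnosis.

```python
import ipaddress


def loopback_entries_for(hostname, hosts_text):
    """Return the loopback addresses that hosts_text maps to hostname.

    hosts_text is the content of a file in /etc/hosts format.
    (Hypothetical helper for illustration; not a Slurm API.)
    """
    matches = []
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # skip lines whose first field is not an IP address
        # 127.0.0.0/8 and ::1 both count as loopback.
        if ip.is_loopback and hostname in names:
            matches.append(address)
    return matches
```

To use it on a real machine, pass it socket.gethostname() and the contents of /etc/hosts. A non-empty result on the controller would match the situation described above. Note that the comparison is an exact string match, so a short name and a fully qualified name are treated as different hosts.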