Bug 6533 - X11 Forwarding Fails with FastX, succeeds with ssh -X
Summary: X11 Forwarding Fails with FastX, succeeds with ssh -X
Status: RESOLVED DUPLICATE of bug 3647
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 18.08.5
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Director of Support
Reported: 2019-02-15 12:51 MST by Alex Mamach
Modified: 2019-02-19 09:12 MST
Site: Northwestern
Linux Distro: RHEL

slurm.conf file (16.48 KB, text/plain)
2019-02-15 12:51 MST, Alex Mamach

Description Alex Mamach 2019-02-15 12:51:59 MST
Created attachment 9194
slurm.conf file


We've been working on setting up X11 forwarding for our user applications. Our users typically connect to our cluster using either ssh -X, or, more frequently, a remote display application called FastX.

When we attempt to use X11 forwarding in a job (for example, srun -A myaccount --time 4:00 --partition=mypartition --x11 xclock) after connecting with ssh -X, things work as expected. However, doing the same after connecting with FastX generates an error upon job submission: srun: error: Cannot forward to local display. Can only use X11 forwarding with network displays.

After looking into some previous bug reports regarding X11 forwarding, I saw mention that Slurm looks at both the DISPLAY and HOSTNAME variables for the --x11 option. Interestingly, when connecting with ssh -X, DISPLAY will show a value such as localhost:11.0. However, when using FastX, DISPLAY instead has a value like :103, missing the localhost component of the variable.

Connecting with FastX does allow me to run a GUI application on the login nodes, as well as on any compute node I connect to directly with ssh -X, so the issue seems to be specific to Slurm's --x11 flag.

Do you have any thoughts as to what might be going on here? I've attached our slurm.conf file in the event it proves helpful.

Thank you!

Comment 1 Alex Mamach 2019-02-15 13:25:23 MST
Upon further investigation, I believe this may be due to lines 92-96 in x11_util.c:

	if (display[0] == ':') {
		error("Cannot forward to local display. "
		      "Can only use X11 forwarding with network displays.");

If this is in fact the reason, is there any danger in us removing this check? I'm not entirely clear on what it's attempting to protect against or prevent, and maybe there's a better way for us to handle this than modifying the code.
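
To make the difference between the two DISPLAY values concrete, here is a minimal standalone sketch of the kind of test the snippet above performs. It is not Slurm's code: the helper names display_is_local and display_host are hypothetical, and the host parsing assumes a simple host:display.screen value (no IPv6 literals).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Slurm.
 * Returns true when DISPLAY has no host part before the colon
 * (for example ":103"), which is the case the quoted check rejects. */
bool display_is_local(const char *display)
{
	return display != NULL && display[0] == ':';
}

/* Hypothetical helper, not part of Slurm.
 * Copies the host part of DISPLAY (the text before the last colon)
 * into buf. Returns false if DISPLAY is NULL, has no colon, has an
 * empty host part, or the host does not fit in buf. */
bool display_host(const char *display, char *buf, size_t len)
{
	const char *colon;
	size_t n;

	if (display == NULL || buf == NULL)
		return false;

	colon = strrchr(display, ':');
	if (colon == NULL)
		return false;

	n = (size_t)(colon - display);
	if (n == 0 || n >= len)
		return false;

	memcpy(buf, display, n);
	buf[n] = '\0';
	return true;
}
```

With the values reported above, display_is_local(":103") is true (the FastX case), while display_is_local("localhost:11.0") is false and display_host yields "localhost" (the ssh -X case).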

Thanks again!
Comment 3 Jason Booth 2019-02-15 15:14:57 MST
Hi Alex,

This is a copy and paste from https://bugs.schedmd.com/show_bug.cgi?id=6233

Our X11 forwarding implementation cannot connect to unix sockets at this time. This is something we may look at in a future release.

Two options:

- Use "ssh -X localhost", then run "srun --x11" within that SSH session. SSH itself will handle translation between a TCP socket that Slurm's implementation can use and the local unix socket.

- Disable our built-in integration, and use the SPANK X11 plugin instead. Due to differences in how it forwards traffic, it can accommodate use of a unix socket instead of a network socket.

We hope to address these limitations soon, and we are actively looking into a possible solution for 19.05. That work is being tracked through https://bugs.schedmd.com/show_bug.cgi?id=3647.

Comment 4 Jason Booth 2019-02-19 09:12:33 MST
Hi Alex,
 I am resolving this issue for now. The work that we are doing for X11 is targeted for 19.05 via the following issue.


 Please consult the release notes for the upcoming 19.05 release for details once it has been officially released.

*** This bug has been marked as a duplicate of bug 3647 ***