Bug 6533 - X11 Forwarding Fails with FastX, succeeds with ssh -X
Summary: X11 Forwarding Fails with FastX, succeeds with ssh -X
Status: RESOLVED DUPLICATE of bug 3647
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 18.08.5
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Director of Support
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-02-15 12:51 MST by Alex Mamach
Modified: 2019-02-19 09:12 MST
0 users

See Also:
Site: Northwestern
Alineos Sites: ---
Bull/Atos Sites: ---
Confidential Site: ---
Cray Sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: RHEL
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---


Attachments
slurm.conf file (16.48 KB, text/plain)
2019-02-15 12:51 MST, Alex Mamach

Description Alex Mamach 2019-02-15 12:51:59 MST
Created attachment 9194
slurm.conf file

Hi,

We've been working on setting up X11 forwarding for our user applications. Our users typically connect to our cluster using either ssh -X, or, more frequently, a remote display application called FastX.

When we attempt to use X11 forwarding in a job (for example, srun -A myaccount --time 4:00 --partition=mypartition --x11 xclock) after connecting with ssh -X, things work as expected. However, doing the same after connecting with FastX generates an error upon job submission: srun: error: Cannot forward to local display. Can only use X11 forwarding with network displays.

After looking into some previous bug reports regarding X11 forwarding, I saw mention that Slurm looks at both the DISPLAY and HOSTNAME variables for the --x11 option. Interestingly, when connecting with ssh -X, DISPLAY will show a value such as localhost:11.0. However, when using FastX, DISPLAY instead has a value like :103, missing the localhost component of the variable.

Interestingly, connecting with FastX allows me to run a GUI application on the login nodes, as well as any compute node I directly connect to with ssh -X, so the issue seems to be particular to Slurm's --x11 flag.

Do you have any thoughts as to what might be going on here? I've attached our slurm.conf file in the event it proves helpful.

Thank you!

Alex
Comment 1 Alex Mamach 2019-02-15 13:25:23 MST
Upon further investigation, I believe this may be due to lines 92-96 in x11_util.c:

	if (display[0] == ':') {
		error("Cannot forward to local display. "
		      "Can only use X11 forwarding with network displays.");
		exit(-1);
	}

If this is in fact the reason, is there any danger in us removing this check? I'm not entirely clear what it's attempting to protect or prevent, and maybe there's a better way for us to navigate than modifying the code.

Thanks again!
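
For anyone reproducing this, the check quoted above can be mirrored in a login shell before submitting a job. The following is only a sketch: display_kind is a hypothetical helper name, and the "unset" case is an addition that the quoted snippet does not cover. It classifies a DISPLAY value by its first character, the same test that x11_util.c applies.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): classify an X11 display string
# using the same first-character test as the quoted x11_util.c snippet.
# A value that starts with ':' has no host part and is treated as local.
display_kind() {
    case "$1" in
        "")  echo "unset" ;;    # assumption: empty/unset handled separately
        :*)  echo "local" ;;    # e.g. ":103" as seen under FastX
        *)   echo "network" ;;  # e.g. "localhost:11.0" as seen under ssh -X
    esac
}

# Report on the current session.
echo "DISPLAY='${DISPLAY:-}' -> $(display_kind "${DISPLAY:-}")"
```

A session reported as "local" is the case in which srun --x11 exits with the error above.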
Comment 3 Jason Booth 2019-02-15 15:14:57 MST
Hi Alex,

This is a copy and paste from https://bugs.schedmd.com/show_bug.cgi?id=6233

Our X11 forwarding implementation cannot connect to unix sockets at this time; this is something we may look at in a future release.

Two options:

- Use "ssh -X localhost", then run "srun --x11" within that SSH session. SSH itself will handle translation between a TCP socket that Slurm's implementation can use to the local unix socket.

- Disable our built-in integration, and use the SPANK X11 plugin instead. Due to differences in how it forwards traffic, it can accommodate use of a unix socket instead of a network socket.
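
The first option could be sketched as a small wrapper. This is an illustration only: x11_job_command is a hypothetical name, and the function prints the command line instead of running it. It also assumes that passing srun as a remote command to "ssh -X localhost" behaves the same as typing it inside an interactive SSH session, which depends on the site's sshd and key setup and is not confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the command line to use for an X11 job.
# If the display string has no host part (starts with ':'), wrap srun in
# "ssh -X localhost" so that SSH provides a TCP-backed display.
# It only prints the command; nothing is executed.
x11_job_command() {
    display=$1
    shift
    case "$display" in
        :*) echo "ssh -X localhost srun --x11 $*" ;;
        *)  echo "srun --x11 $*" ;;
    esac
}

# Show what would be run for the current session.
x11_job_command "${DISPLAY:-}" xclock
```

Arguments are joined with plain spaces, so anything that needs quoting on the remote side would have to be handled separately.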

We hope to address these limitations soon. We are actively looking into a possible solution for 19.05, and that work is being tracked through https://bugs.schedmd.com/show_bug.cgi?id=3647.

-Jason
Comment 4 Jason Booth 2019-02-19 09:12:33 MST
Hi Alex,
 
I am resolving this issue for now. The work that we are doing for X11 is targeted for 19.05 and is tracked in the following issue:

https://bugs.schedmd.com/show_bug.cgi?id=3647

Please consult the 19.05 release notes for details once that version has been officially released.

*** This bug has been marked as a duplicate of bug 3647 ***