Ticket 11609 - job_container/tmpfs: private /tmp directory remains owned by root in some cases
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: 20.11.6
Hardware: Linux
OS: Linux
Priority: ---
Severity: 4 - Minor Issue
Assignee: Tim McMullan
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2021-05-12 12:59 MDT by Jake Rundall
Modified: 2021-06-01 14:39 MDT
CC: 3 users

See Also:
Site: NCSA
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 21.08pre1
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Jake Rundall 2021-05-12 12:59:00 MDT
When using JobContainerType=job_container/tmpfs it appears that the job-specific/private /tmp directory remains owned by root in certain cases:

(1) User allocates 1+ nodes w/ salloc. Unless the user runs srun, the private /tmp directories remain owned by root, e.g.:
/lscratch/job_tmp/509:
drwx------ 2 root root  6 May 12 13:54 .509
A user might allocate 1+ nodes and then SSH to the nodes to do their work, but in this case the job's /tmp is inaccessible to them. If the user executes srun in the allocation, the job's /tmp becomes owned by them on all nodes in the allocation.

(2) Multi-node job allocated using sbatch. In this case the job's /tmp on the first node (where the batch script runs) is owned by the user, but it remains owned by root on the other nodes in the allocation. As in #1, if/once an srun command is executed from the batch script, the job's /tmp directory becomes owned by the user and is accessible.
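For reference, scenario 1 can be demonstrated with a session along these lines. This is an illustrative transcript, not output captured from the site: the hostname, the placeholder job ID, and the /lscratch/job_tmp base path (taken from the listing above) are assumptions.

```
# Case 1: salloc an allocation, then SSH to a node without ever running srun
$ salloc -N2
$ ssh node01 ls -ld /lscratch/job_tmp/<jobid>/.<jobid>
drwx------ 2 root root 6 May 12 13:54 .<jobid>    # still root-owned; job /tmp unusable

# After any srun in the allocation, ownership is corrected on all nodes:
$ srun -N2 true
$ ssh node01 ls -ld /lscratch/job_tmp/<jobid>/.<jobid>
drwx------ 2 user user 6 May 12 13:55 .<jobid>
```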
Comment 1 Tim McMullan 2021-05-12 15:02:50 MDT
Thanks for the report, I've been able to reproduce what you are describing! I'm taking a look at the cause and I'll let you know what I find!

Thanks,
--Tim
Comment 5 Tim McMullan 2021-05-21 11:46:23 MDT
As I mentioned in ticket 11673, we did push a patch series to master that fixes this problem (and actually cleans the plugin up a bit at the same time), but it had a bug when restarting the slurmd with jobs running.

The proper fix for it required changing the job_container API slightly, so it could only land in master.

There are some possibilities for a workaround in 20.11 though.

The problem is tied to *something* needing to join the container first, so if you use the interactive step feature instead of salloc + ssh, it should handle the issue you noted initially.
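For readers following along: the interactive step feature referred to here is enabled via slurm.conf. A minimal sketch, assuming a 20.11-era configuration:

```
# slurm.conf: make salloc launch an interactive step on the first
# allocated node (instead of a shell on the submit host), so the job
# container is joined -- and the private /tmp chowned -- without the
# user needing to run srun or ssh explicitly.
LaunchParameters=use_interactive_step
```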

The second observation is trickier: is it actually causing a problem for you, or is it just a further consequence of the first issue?

Thanks!
--Tim
Comment 6 Jake Rundall 2021-05-24 08:13:17 MDT
Thanks, Tim. The 2nd scenario is actually problematic for batch jobs that use SSH to get MPI communication started.

We were able to just forgo use of job_container/tmpfs for now, and that might remain our chosen path until the fix is usable. But it probably depends on the timeline.

Does the fix going into master mean that it won't be available until the next major release of Slurm (~21.X), or might it be included in 20.11.8?
Comment 7 Tim McMullan 2021-05-25 06:18:58 MDT
(In reply to Jake Rundall from comment #6)
> Thanks, Tim. The 2nd scenario is actually problematic for batch jobs that
> use SSH to get MPI communication started.

Thanks for the extra context there!

> We were able to just forgo use of job_container/tmpfs for now, and that
> might remain our chosen path until the fix is usable. But it probably
> depends on the timeline.
> 
> Does the fix going into master mean that it won't be available until the
> next major release of Slurm (~21.X), or might it be included in 20.11.8?

The fix will be in 21.08, but won't be in 20.11.8 (due to the need to change the API). In my tests, if anything tries to join the container first (e.g. a prolog script), the ownership is changed before we try to launch the job.
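Based on the above, a minimal prolog seems to be enough to trigger the container join as a 20.11 workaround; the path and script below are a hypothetical sketch, not a configuration from this site:

```
# slurm.conf: any configured Prolog causes slurmd to join the job's
# container before launch, which (per the comment above) fixes the
# /tmp ownership as a side effect. Path is illustrative.
Prolog=/etc/slurm/prolog.sh
```

The prolog script itself can be trivial, since it is the act of joining the container that matters, not what the script does:

```
#!/bin/sh
# /etc/slurm/prolog.sh -- intentionally a no-op (sketch).
exit 0
```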

Thanks!
--Tim
Comment 8 Tim McMullan 2021-05-28 11:45:25 MDT
I just wanted to check in and see if this answered your question!

Thanks,
--Tim
Comment 9 Jake Rundall 2021-06-01 11:30:26 MDT
Yes, thank you!
Comment 10 Tim McMullan 2021-06-01 14:39:37 MDT
(In reply to Jake Rundall from comment #9)
> Yes, thank you!

Sounds good!  I'll resolve this for now, but let us know if you have any other questions!

Thanks!
--Tim