| Summary: | job_container/tmpfs empty at start |
| --- | --- |
| Product: | Slurm |
| Component: | slurmd |
| Version: | 20.11.5 |
| Hardware / OS: | Linux |
| Severity: | 4 - Minor Issue |
| Status: | RESOLVED INFOGIVEN |
| Site: | Harvard University |
| Reporter: | Paul Edmon <pedmon> |
| Assignee: | Marshall Garey <marshall> |

Description
Paul Edmon
2021-03-23 13:10:26 MDT

Comment 1
Marshall Garey
2021-03-24

From what I can see you don't have anything misconfigured. Here's how the job_container/tmpfs plugin works.

Here's my job_container.conf:

    # job_container.conf
    AutoBasePath=true
    BasePath=/home/marshall/job_containers

First, a quick explanation for why my directories will look different than yours: I'm using multiple-slurmd to simulate a real cluster, but it's all on a single node. With multiple-slurmd there is a directory inside BasePath for each node, and the job_id directories are created inside the node(s) where the job runs. I have nodes n1-[1-10] defined in my slurm.conf. Since you aren't using multiple-slurmd, you won't see the nodename directories inside BasePath, just the job id.

Before a job is launched, slurmd creates a mount namespace for the job at <BasePath>/<job_id>/.ns. slurmd makes the root (/) directory a private mount for the job. Then a private /tmp directory is mounted at <BasePath>/<job_id>/.<job_id>. Then the BasePath directory is unmounted so the job can't see the BasePath mount, but the job can see the private /tmp mount. From outside the job, you can't see the job's /tmp mount (since it's private), but you can see the BasePath mount and the mount namespace for the job. root can view what's actually in the job's /tmp directory by looking at <BasePath>/<job_id>/.<job_id>.

In addition, /dev/shm is *not* mounted at the BasePath. /dev/shm is unmounted, then a new private tmpfs is created at /dev/shm.

Since slurmd runs as root, the directories in BasePath are owned by root.

To illustrate how this works, here are excerpts from findmnt from inside and outside the job.

Viewing mounts from outside the job:

    $ findmnt -l -o target,source,fstype,propagation
    TARGET                                      SOURCE                                               FSTYPE  PROPAGATION
    /                                           /dev/nvme0n1p5                                       ext4    shared
    /dev/shm                                    tmpfs                                                tmpfs   shared
    /home/marshall/job_containers/n1-1          /dev/nvme0n1p5[/home/marshall/job_containers/n1-1]   ext4    private
    /home/marshall/job_containers/n1-1/609/.ns  nsfs[mnt:[4026533234]]                               nsfs    private

Viewing mounts from within the job:

    $ findmnt -l -o target,source,fstype,propagation
    TARGET    SOURCE                                                        FSTYPE  PROPAGATION
    /         /dev/nvme0n1p5                                                ext4    private
    /tmp      /dev/nvme0n1p5[/home/marshall/job_containers/n1-1/609/.609]  ext4    private
    /dev/shm  tmpfs                                                         tmpfs   private

Here's my current /tmp directory from outside the job:

    $ ls /tmp
    cscope.49356/
    sddm-:0-DxMYRL=
    sddm-auth9350208e-6093-4d49-b324-ee6e2eb46be7=
    ssh-I9zVn59cDxhk/
    systemd-private-4864c0321a9141f18c5b1ca5545cd58b-apache2.service-6jY2vg/
    systemd-private-4864c0321a9141f18c5b1ca5545cd58b-colord.service-rnwl5g/
    systemd-private-4864c0321a9141f18c5b1ca5545cd58b-haveged.service-r7ENHf/
    systemd-private-4864c0321a9141f18c5b1ca5545cd58b-ModemManager.service-tB7D9g/
    systemd-private-4864c0321a9141f18c5b1ca5545cd58b-systemd-logind.service-ZlsM6g/
    systemd-private-4864c0321a9141f18c5b1ca5545cd58b-systemd-resolved.service-HZKwJf/
    systemd-private-4864c0321a9141f18c5b1ca5545cd58b-systemd-timesyncd.service-FqkVDi/
    systemd-private-4864c0321a9141f18c5b1ca5545cd58b-upower.service-GvkFCi/
    Temp-3dca24b8-59a7-44da-bd35-bf28d885c0da/
    Temp-8a7d60ac-d3fa-4f72-8c03-644a5fc24bd0/
    xauth-1000-_0

And my /tmp directory from inside the job:

    $ ls /tmp
    $

It's empty.

For a user to use these private /tmp or /dev/shm mounts in their job, they simply need to use "/tmp" or "/dev/shm"; they can't use <BasePath>/<job_id>. The specifics of how these are mounted are hidden from the user, and they're cleaned up at the end of the job.
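For context, here is a minimal sketch of the two configuration pieces that enable this behaviour. The BasePath value below is only an example (adjust for your site); JobContainerType in slurm.conf selects the plugin, and AutoBasePath/BasePath are the job_container.conf options shown above.

    # slurm.conf (excerpt) -- selects the tmpfs job container plugin
    JobContainerType=job_container/tmpfs

    # job_container.conf -- example BasePath, adjust for your site
    AutoBasePath=true
    BasePath=/var/spool/slurm/containers

With this in place, slurmd creates the <BasePath>/<job_id>/.ns and <BasePath>/<job_id>/.<job_id> entries described above for each job.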
So from within the job, if I create a file in /tmp, then I can see it (as root) in <BasePath>/<job_id>/.<job_id>:

In the job:

    $ touch /tmp/qwerty

As root:

    # ls /home/marshall/job_containers/n1-1/609/.609/
    qwerty

If I allocate another job on another node and do the same thing:

In the job:

    $ touch /tmp/asdf

As root:

    # ls /home/marshall/job_containers/n1-2/610/.610/
    asdf

Does this make sense? Do you have any more questions about how it works?
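Besides listing <BasePath>/<job_id>/.<job_id> directly, root can also enter the job's mount namespace through the .ns file the plugin bind-mounts. This is a sketch and an assumption, not something demonstrated in this ticket; the path follows the example job 609 above.

    # As root on the compute node: enter the job's mount namespace via the
    # nsfs file at <BasePath>/<job_id>/.ns and list the job's private /tmp.
    nsenter --mount=/home/marshall/job_containers/n1-1/609/.ns ls -l /tmp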
Comment 2
Paul Edmon

Ah okay, now I see it. So it is working properly. It would be good to add this description to the docs, since it's not clear that the plugin is making hidden directories or what exactly it is doing, which makes it hard to confirm it is actually doing what is intended.

I don't know if you want this to be a separate ticket, but I have two feature requests:

1. Set the mount path to something other than /tmp: we actually have users using /scratch instead of /tmp for local temporary space, so it would be better if we could direct the plugin to create one of these for /scratch instead.

2. Multiple paths: in principle I could see wanting to do this for more than one storage path, e.g. /tmp, /scratch, and maybe /globalscratch all sharded like this. It would be a nice extension to be able to do this for multiple storage paths.

Thanks for the info!

-Paul Edmon-

Comment 3
Marshall Garey
2021-03-25

Hey Paul,

No worries! It was confusing to me and others at first, too. We already have an internal bug open to improve the documentation, and I just updated that bug.

As to your request, there is already an existing ticket/enhancement request for this: bug 11135. Your requests sound the same as others on there, but can you re-post your requests on bug 11135 so we have a record of it over there? It's good for us to know when lots of people want the same thing.

What you can do right now for your requests:

#1 - It should be really simple for you to change the plugin: simply change "tmp" to "scratch" in the plugin where it mounts and unmounts the directory. There will probably be a few places to change it. Another way of doing this might be to symlink /scratch to /tmp in a job prolog, since the container plugin does its work immediately before the job prolog. I haven't tested making a symlink to /tmp in the job prolog, but I think it would work (see the sketch below).

#2 - No workarounds that I can think of.
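A minimal, untested sketch of the prolog workaround suggested in #1 above. It assumes /scratch does not already exist as a real directory on the node; the symlink makes /scratch resolve to /tmp, which inside a job using job_container/tmpfs is the job's private /tmp.

    #!/bin/bash
    # Prolog sketch (untested, per the suggestion above): point /scratch at /tmp
    # so that, inside a job, /scratch resolves to the per-job private /tmp.
    # /scratch is an example path; skip if it already exists.
    if [ ! -e /scratch ]; then
        ln -s /tmp /scratch
    fi
    exit 0

This would be wired up by pointing Prolog= in slurm.conf at the script (e.g. Prolog=/etc/slurm/prolog.sh, path as an example).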
Comment 4
Paul Edmon

Cool, I will do that. Thanks. This is a great new feature, by the way; we are eager to use it.

-Paul Edmon-

Comment 5
Marshall Garey

Sounds good - closing this as infogiven.