Bug 4101 - `salloc --uid` should be rejected from non-root accounts
Summary: `salloc --uid` should be rejected from non-root accounts
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands (show other bugs)
Version: 17.11.x
Hardware: Linux Linux
: --- 4 - Minor Issue
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-08-22 13:40 MDT by Tim Wickberg
Modified: 2017-10-04 22:44 MDT (History)
1 user (show)

See Also:
Site: SchedMD
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 17.11.0-pre3
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
salloc-setuid-root-only patch for branch slurm-17.02 (1.07 KB, patch)
2017-08-22 20:15 MDT, Philip Kovacs
Details | Diff
salloc_privileges patch (1.34 KB, patch)
2017-08-23 01:06 MDT, Philip Kovacs
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tim Wickberg 2017-08-22 13:40:02 MDT
`salloc --uid` should only be used as root, not as the SlurmUser. Otherwise, the allocation request will succeed, but salloc will die with:

salloc: error: setgroups: Operation not permitted

(Or "salloc: error: setuid: Operation not permitted" prior to the patch from bug 4095.)

And result in an orphaned job left on the cluster under that account.
Comment 1 Philip Kovacs 2017-08-22 18:58:06 MDT
The salloc opt.c module is defaulting the uid to getuid() and salloc.c is issuing setuid() unless the option uid is -1, i.e. --uid is actually an implicit option executing in all cases.   The setgroups problem you mention below comes from a5b47f7b6, your commit to slurm-17.02 changing my c7e6d864de code which was committed to master.  

master is fine, slurm-17.02 is now problematic.

I suggest reverting a5b47f7b6 which breaks things.  If you want to enforce root only setuid, then do that.
Comment 2 Tim Wickberg 2017-08-22 19:41:00 MDT
(In reply to Philip Kovacs from comment #1)
> The salloc opt.c module is defaulting the uid to getuid() and salloc.c is
> issuing setuid() unless the option uid is -1, i.e. --uid is actually an
> implicit option executing in all cases.   The setgroups problem you mention
> below comes from a5b47f7b6, your commit to slurm-17.02 changing my
> c7e6d864de code which was committed to master.  

Commit c7e6d864de is also on slurm-17.02. (If you're looking at the github interface it may not list all branches that commit exists on for some reason. IIRC it will start showing alternate tags once a new tag is created with that commit.)

Prior to any of these commits, salloc does not work if not run as root, and leaves the stray job as described. The only discrepancy is in which error will cause salloc to exit - setgroups (after my commit), or setuid (prior to any of these commits under discussion).

This is 17.02.7:

tim@redshift:~$ salloc --uid 0
salloc: error: setuid: Operation not permitted
tim@redshift:~$ salloc --gid 0
salloc: error: setgid: Operation not permitted

You'll note there are two orphaned jobs running afterwards.

> master is fine, slurm-17.02 is now problematic.
> 
> I suggest reverting a5b47f7b6 which breaks things.  If you want to enforce
> root only setuid, then do that.

The salloc man page description of '--uid' outlines this perfectly well - " This  option is only valid for user root." - and I plan to move forward to enforcing that on this bug.

I do not plan to alter sbatch at this time - although I'm confident the original intent was never to expose this to non-root users, as this was done as a compatibility mechanism with Moab/Maui which would have been running as root anyways.
Comment 3 Philip Kovacs 2017-08-22 20:15:44 MDT
Created attachment 5127 [details]
salloc-setuid-root-only patch for branch slurm-17.02

Tim, 

This patch for branch slurm-17.02 should be pretty close to what you need.  Warns, but ignores, requests to change uid if not root.

Regards,

Phil
Comment 4 Philip Kovacs 2017-08-22 20:23:09 MDT
and something similar for the setgid() portion higher up in the module....
Comment 5 Tim Wickberg 2017-08-22 20:26:49 MDT
Thanks for the quick response, but I don't want to add warnings here. Given this would not have worked for non-root before, I'm only interested in avoiding the orphaned jobs now with further work.

And the allocation request denial should come from the slurmctld itself (although a secondary check in salloc is fine).
Comment 7 Philip Kovacs 2017-08-23 01:04:14 MDT
I'll add this thought, with a patch to demonstrate.  In my local codebase I've moved the setgid and setuid blocks above the slurm allocation request area, so that the process succeeds or fails with its uid/gid changes well before requesting any allocation from slurmctld.  This is working well for me, and it eliminates the stranded allocations from ctld that concern you.  If ctld doesn't like the altered uid/gid of the caller, per the contents of the options sent to it, you get back an appropriate response and all is well.  

In no case below are theer any stranded allocations created.

# ex 1 (unprivileged user)
#
[phil@pf26vm2 ~]$ salloc whoami
salloc: Granted job allocation 11
phil
salloc: Relinquishing job allocation 11
salloc: Job allocation 11 has been revoked.
[phil@pf26vm2 ~]$ squeue -l
Wed Aug 23 02:51:51 2017
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)

# ex 2 (unprivileged user)
#
[phil@pf26vm2 ~]$ salloc --uid 0 whoami
salloc: error: setgroups: Operation not permitted
[phil@pf26vm2 ~]$ squeue -l
Wed Aug 23 02:54:16 2017
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)

# ex 3 (unprivileged user)
#
[phil@pf26vm2 ~]$ salloc --gid 0 whoami
salloc: error: setgid: Operation not permitted
[phil@pf26vm2 ~]$ squeue -l
Wed Aug 23 02:54:59 2017
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)

# ex 4 (as root)
#
[root@pf26vm2 ~]# salloc whoami
salloc: Granted job allocation 12
root
salloc: Relinquishing job allocation 12
salloc: Job allocation 12 has been revoked.
[root@pf26vm2 ~]# squeue -l
Wed Aug 23 02:56:06 2017
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)

# ex 5 (as root)
#
[root@pf26vm2 ~]# salloc --uid phil whoami
salloc: Granted job allocation 13
phil
salloc: Relinquishing job allocation 13
salloc: Job allocation 13 has been revoked.

etc.
Comment 8 Philip Kovacs 2017-08-23 01:06:29 MDT
Created attachment 5128 [details]
salloc_privileges patch

This patch is against the 17.02.7 tarball source, slightly behind the latest commits.
Comment 11 Tim Wickberg 2017-10-04 22:44:11 MDT
Commit 52086a9bc0ff2 now strictly handles the enforcement for salloc, sbatch, and srun.

Phillip - I could not use your patch as-is. Moving the setuid() call above the allocation request means the slurmctld would receive the RPC signed as that user, not as root, and thus not be able to distinguish between them anymore.

Admittedly, the 'su' command is preferable in almost all instances, and I would suggest anyone with such a workflow move to that rather than this quirky approach. I may look into removing these options altogether before the 18.08 release.

Marking closed.