Bug 10449 - Difference between 'salloc srun' and 'srun' regarding --exclusive
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 20.11.1
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Director of Support
Reported: 2020-12-15 14:49 MST by Luke Yeager
Modified: 2020-12-17 15:00 MST

Site: NVIDIA (PSLA)


Description Luke Yeager 2020-12-15 14:49:30 MST
With Slurm 20.11:
> $ salloc srun grep Cpus_allowed_list /proc/self/status
> Cpus_allowed_list:      0,40
> $ srun grep Cpus_allowed_list /proc/self/status
> Cpus_allowed_list:      0-79
In the first case, the srun is getting the new default behavior of '--exclusive', as discussed in bug#10383. But in the second case, it isn't.

I wonder if this is because the '--exclusive' flag is overloaded - it means something different for sbatch/salloc than it does for srun.
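If I'm reading the man pages right, the two meanings look roughly like this (the job script and step command below are just placeholders, not from my reproducer):

$ sbatch --exclusive job.sh   # allocation-level: don't share these nodes with other jobs
$ srun --exclusive ./step     # step-level, inside a job: give this step dedicated CPUs
                              # that other steps of the same job can't use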
Comment 1 Luke Yeager 2020-12-16 10:13:39 MST
This is probably only a relevant issue for partitions with OverSubscribe=EXCLUSIVE.
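For context, a hypothetical partition definition of the kind I mean (partition and node names are made up):

    PartitionName=batch Nodes=node[001-010] OverSubscribe=EXCLUSIVE Default=YES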
Comment 2 Michael Hinton 2020-12-16 17:13:35 MST
Hi Luke,

(In reply to Luke Yeager from comment #0)
> With Slurm 20.11:
> > $ salloc srun grep Cpus_allowed_list /proc/self/status
> > Cpus_allowed_list:      0,40
> > $ srun grep Cpus_allowed_list /proc/self/status
> > Cpus_allowed_list:      0-79
> In the first case, the srun is getting the new default behavior of
> '--exclusive', as discussed in bug#10383. But in the second case, it isn't.
> 
> I wonder if this is because the '--exclusive' flag is overloaded - it means
> something different for sbatch/salloc than it does for srun.
Regarding the second case: It's because --exclusive/OverSubscribe=exclusive mean something different when srun itself is *making the allocation* vs. when srun is running *within an existing allocation* created by salloc/sbatch.

When srun is by itself on the command line, it makes an implicit allocation, but then only one step is subsequently created (the # of steps matches the # of `srun` invocations). So it makes sense that the sole step gets everything in the allocation by default, because there is no other step for it to share with. (So in effect it seems as if --whole is implied here.)

If the step wasn't given everything in the allocation, how would you indicate what the step *should* get? So I think this isn't a bug, and is behaving how we want it to behave. 
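To make that concrete, here is a rough sketch of the two situations (the per-step option values are arbitrary, and I'm reusing the grep from your reproducer):

# Standalone srun: implicit allocation, exactly one step, so the step gets everything
$ srun grep Cpus_allowed_list /proc/self/status

# Inside an existing allocation: each step gets only what it requests...
$ salloc --exclusive
$ srun -n1 --cpus-per-task=2 grep Cpus_allowed_list /proc/self/status
# ...unless you explicitly ask for the whole allocation
$ srun --whole grep Cpus_allowed_list /proc/self/status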

Thanks,
-Michael
Comment 3 Luke Yeager 2020-12-16 17:38:27 MST
(In reply to Michael Hinton from comment #2)
> Regarding the second case: It's because --exclusive/OverSubscribe=exclusive
> mean something different when srun itself is *making the allocation* vs.
> when srun is running *within an existing allocation* created by
> salloc/sbatch.
Yes, I understand this. I think the choice of nomenclature is unfortunate.

> So I think this isn't a bug, and is behaving how we want it to behave.
I would assign a pretty high astonishment factor to this discrepancy. I understood pretty quickly what was going on because I've been in the weeds with this exclusive/whole/overlap stuff this week, and because I immediately looked at how many cores I had access to. But there are bound to be some users who can't figure out why their application runs slower with 'salloc srun' vs. with 'srun'.
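(For what it's worth, I'd guess the fix those users will eventually land on is something like the line below; I haven't verified it, but I'd expect it to report the full node range, 0-79 in my example above, instead of 0,40.)

$ salloc srun --whole grep Cpus_allowed_list /proc/self/status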


There will also be users confused why this hangs (because it didn't hang for them on 20.02):

>(login_node)       $ srun --pty bash
>(within_allocation)$ srun hostname
It's going to take them a while to figure out they need the --overlap flag for the second srun.
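That is, they eventually need something like this (same hypothetical session as above):

>(within_allocation)$ srun --overlap hostname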

(I realize this isn't related to the current bug report, but I'm trying to drive the point home that the changes being discussed in bug#10383 have some pretty far-reaching and surprising implications)


If all this is indeed intended behavior, then I guess I'll close this as INFOGIVEN.
Comment 4 Michael Hinton 2020-12-17 14:43:34 MST
(In reply to Luke Yeager from comment #3)
> (In reply to Michael Hinton from comment #2)
> > So I think this isn't a bug, and is behaving how we want it to behave.
> I would assign a pretty high astonishment factor to this discrepancy. I
> understood pretty quickly what was going on because I've been in the weeds
> with this exclusive/whole/overlap stuff this week, and because I immediately
> looked at how many cores I had access to. But there are bound to be some
> users who can't figure out why their application runs slower with 'salloc
> srun' vs. with 'srun'.
> 
> There will also be users confused why this hangs (because it didn't hang for
> them on 20.02):
> 
> >(login_node)       $ srun --pty bash
> >(within_allocation)$ srun hostname
> It's going to take them a while to figure out they need the --overlap flag
> for the second srun.
Here is what we recommend, and what I think you are looking for: set this in your slurm.conf (new in 20.11):

    LaunchParameters=use_interactive_step

Then educate your users to just use `salloc` instead of `srun --pty bash` or `salloc srun --pty bash`.

What `use_interactive_step` does is make it so salloc creates an "interactive step" for the pty shell on a node in the allocation. This step is analogous to the batch step used by sbatch, so you can think of it like an interactive batch script. Just like a batch step, the interactive step has access to the entire allocation, but does NOT block regular srun steps (i.e. it's like it has an implicit --overlap). I think this is similar to what people expect:

(login_node)       $ salloc --exclusive
salloc: Granted job allocation 438
(within_allocation)$ grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list:      0-5
(within_allocation)$ srun grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list:      0
(within_allocation)$ srun --whole grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list:      0-5

See https://slurm.schedmd.com/faq.html#prompt and https://slurm.schedmd.com/slurm.conf.html#OPT_use_interactive_step. Note that LaunchParameters=use_interactive_step replaces DefaultSallocCommand.

Hopefully that helps, 
-Michael
Comment 5 Luke Yeager 2020-12-17 15:00:05 MST
That doesn't really address my overall concerns about the new behavior - it just weakens the trivial example used to create this bug.

Nonetheless, that's a neat new flag that I missed - thanks for sharing! I like making salloc behave more like sbatch.