Ticket 6183 - salloc --mem=0
Summary: salloc --mem=0
Status: RESOLVED DUPLICATE of ticket 5240
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 17.11.7
Hardware: Linux
Priority: ---
Severity: 4 - Minor Issue
Assignee: Nate Rini
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2018-12-05 17:48 MST by Ben Matthews
Modified: 2018-12-06 10:54 MST

See Also:
Site: UCAR
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Ben Matthews 2018-12-05 17:48:02 MST
So, another variation on a theme for you (the related tickets are assigned to Nate, I think):

The salloc man page says that if I ask for --mem=0, I'll get all the memory on the node (presumably all the schedulable memory). This doesn't seem to be the case:

matthews@cheyenne1:~> /sys/fs/cgroup/memory/slurm/uid_24712/job_1946802/memory.soft_limit_in_bytes^C
matthews@cheyenne1:~> salloc -C casper --mem=0 --exclusive -t 12:00 -A sssg0001 srun --mem=0 --pty --export=HOME=$HOME,PATH=$PATH,TERM=$TERM,SHELL=$SHELL bash -l
salloc: Granted job allocation 1946829
salloc: Waiting for resource configuration
salloc: Nodes casper01 are ready for job
Resetting modules to system default
bash-4.2$ cat /sys/fs/cgroup/memory/slurm/uid_24712/job_1946829/memory.limit_in_bytes
67108864000

Of course, if I know to ask for more, I can get it:

matthews@cheyenne1:~> salloc -C casper --mem=200G --exclusive -t 12:00 -A sssg0001 srun --mem=0 -w casper01 --pty --export=HOME=$HOME,PATH=$PATH,TERM=$TERM,SHELL=$SHELL bash -l
salloc: Granted job allocation 1946833
salloc: Waiting for resource configuration
salloc: Nodes casper01 are ready for job
Resetting modules to system default
bash-4.2$ cat /sys/fs/cgroup/memory/slurm/uid_24712/job_1946833/memory.limit_in_bytes
214748364800


From slurm.conf:

NodeName=casper01 NodeAddr=casper01.ucar.edu Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 Features=casper,skylake,mlx5_0 State=UNKNOWN Weight=200 RealMemory=257000

PartitionName=dav Nodes=pronghorn[01-16],geyser[01-16],caldera[01-16],casper[01-26] Default=YES MaxTime=INFINITE Shared=YES DefMemPerCPU=1827 MaxMemPerNode=1160001 TRESBillingWeights="CPU=1.0,Mem=1G" State=UP

DefMemPerCPU * 72 is still about twice that 62.5 GiB number (you do get ~130 GB if you use only --exclusive, so that path works). Why 62.5 GiB? Shouldn't I have ~257 GB when I specify --mem=0?
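
For reference, the arithmetic behind those figures (72 CPUs = 2 sockets x 18 cores x 2 threads from the NodeName line above) can be sanity-checked in any POSIX shell:

echo $((67108864000 / 1024 / 1024 / 1024))    # observed cgroup limit: prints 62 (62.5 GiB exactly)
echo $((214748364800 / 1024 / 1024 / 1024))   # limit when asking for --mem=200G: prints 200
echo $((1827 * 72))                           # DefMemPerCPU * 72 CPUs = 131544 MB, the ~130G seen with --exclusive alone
echo $((257000 / 1024))                       # RealMemory 257000 MB is ~250 GiB, the expected --mem=0 value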
Comment 2 Nate Rini 2018-12-06 10:44:23 MST
(In reply to Ben Matthews from comment #0)
> The salloc man page says that if I ask for --mem=0, I'll get all the memory
> on the node (presumably all the schedulable memory). This doesn't seem to be
> the case:

Looks like you found bug #5240, which was fixed in 17.11.8+.

You can try the patch directly if you want to verify: https://github.com/SchedMD/slurm/commit/d52d8f4f0ce1a5b86bb0691630da0dc3dace1683
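
For example, a minimal sketch against a source checkout (the tag name here is illustrative; substitute whatever matches your deployed 17.11.7 build):

git clone https://github.com/SchedMD/slurm.git
cd slurm
git checkout slurm-17-11-7-1    # illustrative tag; use your running version
git cherry-pick d52d8f4f0ce1a5b86bb0691630da0dc3dace1683
# rebuild and reinstall slurmd/slurmctld, then retry the salloc above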

--Nate
Comment 4 Ben Matthews 2018-12-06 10:49:22 MST
Ok, thanks. Sorry to bother you then. We're planning on moving to 18.08 next week anyway. Assuming this is also fixed in the latest 18.08.x, I'm fine with closing this ticket.
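
Once we're on 18.08, a quick re-check would be something like this (assuming the same cgroup layout as in the transcripts above; SLURM_JOB_ID is set inside the allocation):

salloc -C casper --mem=0 --exclusive -t 12:00 -A sssg0001 srun --pty bash -l
cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/memory.limit_in_bytes
# expect roughly RealMemory (257000 MB), not the 62.5 GiB seen on 17.11.7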
Comment 5 Nate Rini 2018-12-06 10:54:33 MST
(In reply to Ben Matthews from comment #4)
> Assuming this is also fixed in the latest 18.08.x, I'm fine
> with closing this ticket.

Yes, 18.08 has the fix. Closing ticket.

--Nate

*** This ticket has been marked as a duplicate of ticket 5240 ***