So, another variation on a theme for you (the related tickets are assigned to Nate, I think): the salloc man page says that if I ask for --mem=0, I'll get all the memory on the node (presumably all the schedulable memory). This doesn't seem to be the case:

matthews@cheyenne1:~> /sys/fs/cgroup/memory/slurm/uid_24712/job_1946802/memory.soft_limit_in_bytes^C
matthews@cheyenne1:~> salloc -C casper --mem=0 --exclusive -t 12:00 -A sssg0001 srun --mem=0 --pty --export=HOME=$HOME,PATH=$PATH,TERM=$TERM,SHELL=$SHELL bash -l
salloc: Granted job allocation 1946829
salloc: Waiting for resource configuration
salloc: Nodes casper01 are ready for job
Resetting modules to system default
bash-4.2$ cat /sys/fs/cgroup/memory/slurm/uid_24712/job_1946829/memory.limit_in_bytes
67108864000

Of course, if I know to ask for more, I can get it:

matthews@cheyenne1:~> salloc -C casper --mem=200G --exclusive -t 12:00 -A sssg0001 srun --mem=0 -w casper01 --pty --export=HOME=$HOME,PATH=$PATH,TERM=$TERM,SHELL=$SHELL bash -l
salloc: Granted job allocation 1946833
salloc: Waiting for resource configuration
salloc: Nodes casper01 are ready for job
Resetting modules to system default
bash-4.2$ cat /sys/fs/cgroup/memory/slurm/uid_24712/job_1946833/memory.limit_in_bytes
214748364800

From slurm.conf:

NodeName=casper01 NodeAddr=casper01.ucar.edu Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 Features=casper,skylake,mlx5_0 State=UNKNOWN Weight=200 RealMemory=257000
PartitionName=dav Nodes=pronghorn[01-16],geyser[01-16],caldera[01-16],casper[01-26] Default=YES MaxTime=INFINITE Shared=YES DefMemPerCPU=1827 MaxMemPerNode=1160001 TRESBillingWeights="CPU=1.0,Mem=1G" State=UP

DefMemPerCPU*72 is still roughly twice that 62G number (you do get ~130G if you specify only --exclusive, so that part does work). Why 62G? Shouldn't I get ~257G when I specify --mem=0?
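For reference, this is roughly the check I'm doing by hand above: compare the cgroup limit the job actually got against what slurmctld thinks the node has. Just a rough sketch, not anything official -- the job id and node name are the ones from my session, and it has to run from inside the allocation so the cgroup path exists:

#!/bin/bash
# Sketch: compare the memory cgroup limit a job received against the node's
# configured RealMemory. JOBID/NODE below are assumptions from my session.
JOBID=1946829                     # substitute the real job id
NODE=casper01                     # substitute the node the job landed on
CGROUP=/sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${JOBID}/memory.limit_in_bytes

limit_bytes=$(cat "${CGROUP}")
echo "cgroup limit: $((limit_bytes / 1024 / 1024)) MiB"

# What slurmctld thinks the node has (MiB), per RealMemory in slurm.conf:
scontrol show node "${NODE}" | grep -o 'RealMemory=[0-9]*'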
(In reply to Ben Matthews from comment #0)
> The salloc man page says that if I ask for --mem=0, I'll get all the memory
> on the node (presumably all the schedulable memory). This doesn't seem to be
> the case:

Looks like you found bug #5240, which was fixed in 17.11.8+.

You can try the patch directly if you want to verify:
https://github.com/SchedMD/slurm/commit/d52d8f4f0ce1a5b86bb0691630da0dc3dace1683

--Nate
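If you want to test it ahead of your upgrade, something along these lines should work against your 17.11 source tree (rough sketch only -- the .patch URL is just GitHub's commit-as-patch form, and the directory name is whatever your checkout is called):

# Sketch: apply the referenced commit to an existing 17.11 source tree,
# then rebuild and reinstall slurmctld/slurmd as you normally would.
cd slurm-17.11.7    # assumed name of your source checkout
wget https://github.com/SchedMD/slurm/commit/d52d8f4f0ce1a5b86bb0691630da0dc3dace1683.patch
patch -p1 < d52d8f4f0ce1a5b86bb0691630da0dc3dace1683.patch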
Ok, thanks. Sorry to bother you then. We're planning on moving to 18.08 next week anyway. Assuming this is also fixed in the latest 18.08.x, I'm fine with closing this ticket.
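Once we're on 18.08 I'll just rerun the reproducer from above and confirm the cgroup limit lines up with RealMemory -- roughly like this (our node/account names; rough sketch, adjust as needed):

salloc -C casper --mem=0 --exclusive -t 12:00 -A sssg0001 srun --mem=0 --pty bash -l
# inside the allocation (job id will differ):
cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/memory.limit_in_bytes
# expect a value corresponding to RealMemory=257000, not 67108864000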
(In reply to Ben Matthews from comment #4)
> Assuming this is also fixed in the latest 18.08.x, I'm fine
> with closing this ticket.

Yes, 18.08 has the fix. Closing ticket.

--Nate

*** This ticket has been marked as a duplicate of ticket 5240 ***