Ticket 8986 - OverMemoryKill enforces only step memory limit, not total usage
Summary: OverMemoryKill enforces only step memory limit, not total usage
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Limits
Version: 19.05.6
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Nate Rini
Reported: 2020-05-05 06:03 MDT by CSC sysadmins
Modified: 2020-05-29 14:04 MDT
Site: CSC - IT Center for Science
Version Fixed: 20.02.4, 20.11


Description CSC sysadmins 2020-05-05 06:03:04 MDT
Nate suggested to open a separate ticket for this case:

https://bugs.schedmd.com/show_bug.cgi?id=8656#c101

"The read values look valid considering memory usage overhead. OverMemoryKill appears to only enforce the step memory limit (instead of the job total) currently, which I believe is worthy of a new ticket."
Comment 1 Nate Rini 2020-05-06 10:53:00 MDT
Tommi

Looking into what Slurm should be doing.

--Nate
Comment 2 CSC sysadmins 2020-05-11 08:53:49 MDT
(In reply to Nate Rini from comment #1)

> Looking into what Slurm should be doing.

Hi,

The only reliable solution that comes to my mind is to combine the PSS of the extern step and the running job step and verify that the total is under the limit. Or do you mean the case where --mem-per-cpu is set and an extern step is also consuming memory?
Comment 8 Nate Rini 2020-05-29 14:04:39 MDT
(In reply to Tommi Tervo from comment #2)
> (In reply to Nate Rini from comment #1)
> > Looking into what Slurm should be doing.
After consulting internally about how OverMemoryKill works, we decided that this is a documentation issue (updated here: https://github.com/SchedMD/slurm/commit/b82d7c29f4fabea702dba3b08e9581e450c4f064).

OverMemoryKill is not suggested due to its inherent limitations; instead, we suggest using cgroups with 'ConstrainRAMSpace=yes', which limits memory on a per-job/step basis.
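
For readers following this suggestion, a minimal sketch of the cgroup-based setup might look like the fragment below. It assumes the site uses the proctrack/cgroup and task/cgroup plugins and that OverMemoryKill was enabled through JobAcctGatherParams; the exact set of options depends on the Slurm version, so check the slurm.conf and cgroup.conf man pages before applying anything.

```
# slurm.conf (excerpt, illustrative only)
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
# If OverMemoryKill is listed in JobAcctGatherParams, remove it there
# once the cgroup limit is in place.

# cgroup.conf (excerpt, illustrative only)
ConstrainRAMSpace=yes
```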

> The only reliable solution that comes to my mind is to combine the PSS of the
> extern step and the running job step and verify that the total is under the limit.

Each step/task (process tree) in a job forks a new slurmstepd instance, and each instance would have to communicate with the lead slurmd instance in order to enforce a limit for the whole job. None of the required RPCs or functionality currently exists to implement this with OverMemoryKill. Extern steps and MPI jobs actually fork secondary task instances, which also enforce limits only against their own process tree and slurmstepd instance, further complicating matters.
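
The gap described above can be illustrated with a toy model. This is not Slurm code: the `Step` class, both function names, and the memory figures are hypothetical and only show why a check that runs once per step can pass while the job as a whole is over its limit.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    used_mb: int  # memory sampled for this step's own process tree (made-up value)


def steps_over_limit(steps, limit_mb):
    """Per-step check: each step is compared with the limit in isolation."""
    return [s.name for s in steps if s.used_mb > limit_mb]


def job_over_limit(steps, limit_mb):
    """Job-wide check: the summed usage of all steps is compared with the limit."""
    return sum(s.used_mb for s in steps) > limit_mb


# Hypothetical job with a 1000 MB limit and two steps, each under the limit.
steps = [Step("extern", 600), Step("batch", 700)]
limit_mb = 1000

print(steps_over_limit(steps, limit_mb))  # no single step exceeds the limit
print(job_over_limit(steps, limit_mb))    # the combined usage does
```

In this model the per-step check reports nothing while the job-wide check flags the job. The job-wide sum is the part that would need communication between the slurmstepd instances.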

> Or do you mean the case where --mem-per-cpu is set and an extern step is also consuming memory?

Memory limits are set per job, and they can also be set for steps/tasks when using cgroups with 'ConstrainRAMSpace=yes', thanks to the built-in hierarchy of cgroups in the Linux kernel. There is currently no plan to implement this for OverMemoryKill, as we no longer suggest that sites use it.

I'm closing this ticket. If you have any questions, please reply here and we can continue from there.

Thanks,
--Nate