Bug 3874

Summary:	Cgroup ConstrainKmemSpace=no problems with lustre 2.5 and 2.8
Product:	Slurm	Reporter:	Mark Schmitz <mschmit>
Component:	Limits	Assignee:	Tim Wickberg <tim>
Status:	RESOLVED FIXED	QA Contact:
Severity:	3 - Medium Impact
Priority:	---	CC:	cvalvin, da, lipari1
Version:	17.02.2
Hardware:	Linux
OS:	Linux
Site:	Sandia National Laboratories	Alineos Sites:	---
Version Fixed:	17.02.5 17.11.0-pre0	Target Release:	---

Description Mark Schmitz 2017-06-07 09:29:33 MDT

We have a problem with Slurm 17.02.2 and Lustre 2.5 on TOSS3 when using ConstrainRAMSpace=yes in cgroup.conf, even with ConstrainKmemSpace=no. It turns out that Lustre 2.5 has a bug, and merely entering the cgroup kmem code causes problems. There are two commits in the Slurm code from last year that I think made it into Slurm 16.05 and later. The first commit (3b5befc9e85652a5f826325d4049d2226eeb73a2) added an if statement in src/plugins/task/cgroup/task_cgroup_memory.c so that memory.kmem.limit_in_bytes is not set when ConstrainKmemSpace=no. However, the second commit (084c930861a2461e95f04df865bc027b9c05c8b3) reversed that and added code that simply sets memory.kmem.limit_in_bytes to 100%. So even with ConstrainKmemSpace=no the cgroup kmem code is called, and that is enough to cause all sorts of stack traces and crashes with Lustre 2.5 running under TOSS3.
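The difference between the two commits can be sketched roughly as follows. This is a simplified illustration, not the actual Slurm code: the types `cgroup_opts_t` and `cgroup_writes_t` and the functions `write_param()`, `setup_unconditional()`, and `setup_guarded()` are hypothetical, and instead of writing to the cgroup filesystem they only record which memory cgroup files would be written. It also assumes the kmem limit is only written on the ConstrainRAMSpace=yes path, as the report implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of the two cgroup.conf options involved. */
typedef struct {
    bool constrain_ram_space;  /* ConstrainRAMSpace=yes|no */
    bool constrain_kmem_space; /* ConstrainKmemSpace=yes|no */
} cgroup_opts_t;

/* Records which memory cgroup files a step setup would write. */
typedef struct {
    bool wrote_mem_limit;  /* memory.limit_in_bytes */
    bool wrote_kmem_limit; /* memory.kmem.limit_in_bytes */
} cgroup_writes_t;

/* Hypothetical stand-in for writing one cgroup parameter file:
 * it only records the write instead of touching the filesystem. */
static void write_param(cgroup_writes_t *w, const char *file)
{
    assert(w != NULL && file != NULL);
    if (strcmp(file, "memory.limit_in_bytes") == 0)
        w->wrote_mem_limit = true;
    else if (strcmp(file, "memory.kmem.limit_in_bytes") == 0)
        w->wrote_kmem_limit = true;
}

/* Behavior described in this report for commit 084c930: the kmem limit
 * file is written whenever RAM is constrained, even with
 * ConstrainKmemSpace=no (reportedly set to 100% in that case), so the
 * kernel's kmem accounting code is always entered. */
cgroup_writes_t setup_unconditional(const cgroup_opts_t *o)
{
    cgroup_writes_t w = { false, false };
    if (o->constrain_ram_space) {
        write_param(&w, "memory.limit_in_bytes");
        write_param(&w, "memory.kmem.limit_in_bytes");
    }
    return w;
}

/* Behavior with the if statement from commit 3b5befc restored: the kmem
 * limit file is never touched unless ConstrainKmemSpace=yes. */
cgroup_writes_t setup_guarded(const cgroup_opts_t *o)
{
    cgroup_writes_t w = { false, false };
    if (o->constrain_ram_space) {
        write_param(&w, "memory.limit_in_bytes");
        if (o->constrain_kmem_space)
            write_param(&w, "memory.kmem.limit_in_bytes");
    }
    return w;
}
```

With ConstrainRAMSpace=yes and ConstrainKmemSpace=no, `setup_unconditional()` still reports a write to memory.kmem.limit_in_bytes, while `setup_guarded()` does not; only the guarded form avoids the kmem code path entirely.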

We have tested with both Lustre 2.5 and Lustre 2.8, and the bug is fixed in Lustre 2.8. However, while running TOSS3 with Lustre 2.8 on a new cluster, we are getting SLAB warnings even with ConstrainKmemSpace=no. They look like this:
kernel: [311915.175925] cache_from_obj: Wrong slab cache. kmalloc-64(77:step_3) but object is from kmem_cache_node

I made a patch that puts back the if statement in src/plugins/task/cgroup/task_cgroup_memory.c so that memory.kmem.limit_in_bytes is not set when ConstrainKmemSpace=no, and that seems to solve the problem.

Why was it deemed safe in the second commit to set memory.kmem.limit_in_bytes instead of not setting it at all? The admins here feel that if ConstrainKmemSpace is set to no, the cgroup kmem code should not be called, period.

I'm assuming that the developer who submitted this patch had a reason for the second commit, so I would like to understand that reason and find a way to allow both of these cases to coexist. We do, however, need a fix.

Comment 1 Tim Wickberg 2017-06-08 15:26:39 MDT

This does appear to have been an oversight in reviewing commit 084c930.

I agree that when ConstrainKmemSpace=no is set, that limit should not be applied. I'm now looking into correcting this for the next 17.02 maintenance release.

Comment 2 Mark Schmitz 2017-06-08 15:41:03 MDT

That's great. Thanks Tim.

Comment 5 Tim Wickberg 2017-06-12 11:06:58 MDT

This is fixed with commit ba32ac48219, which will be in 17.02.5 when released.

- Tim