Bug 5504 - slurmstepd: error
Summary: slurmstepd: error
Status: RESOLVED DUPLICATE of bug 5507
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmstepd
Version: 17.11.7
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Felip Moll
Reported: 2018-07-31 03:25 MDT by Sathish
Modified: 2018-07-31 08:58 MDT
Site: AstraZeneca

Description Sathish 2018-07-31 03:25:12 MDT
Hi Team,

We see the error messages below whenever we invoke srun. Could you please assist?


$ srun hostname
srun: job 357102 queued and waiting for resources
srun: error: Lookup failed: Unknown host
srun: job 357102 has been allocated resources
seskscpn084.prim.scp
slurmstepd: error: task/cgroup: unable to add task[pid=85894] to memory cg '(null)'
slurmstepd: error: xcgroup_instantiate: unable to create cgroup '/sys/fs/cgroup/memory/slurm/uid_684277182' : No space left on device
slurmstepd: error: jobacct_gather/cgroup: unable to instanciate user 684277182 memory cgroup


Please do let me know if you need anything from my end to take this further.

Thanks
Sathish
Comment 1 Felip Moll 2018-07-31 06:46:40 MDT
(In reply to Sathish from comment #0)
> Hi Team,
>
> We see the error messages below whenever we invoke srun. Could you please
> assist?


Hi Sathish,

Can you show me the kernel version and OS version on the node?
Also, can you attach your latest slurm.conf and cgroup.conf?
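
For reference, the details requested above could be gathered on the affected node with something like the following. This is a minimal sketch, not part of the original exchange: the function name collect_node_info is hypothetical, it assumes /etc/os-release exists (true on most current distributions), and it assumes `scontrol show config` reports the slurm.conf path in a SLURM_CONF line, with cgroup.conf normally sitting in the same directory.

```shell
#!/bin/sh
# Hypothetical helper: print the node details asked for in this comment.
collect_node_info() {
    # Kernel release of the running system.
    echo "kernel: $(uname -r)"

    # OS version; read in a subshell so the os-release variables
    # do not leak into the caller's environment.
    if [ -r /etc/os-release ]; then
        ( . /etc/os-release; echo "os: ${PRETTY_NAME:-unknown}" )
    else
        echo "os: unknown (no /etc/os-release)"
    fi

    # Location of slurm.conf, only if the Slurm client tools are installed.
    # Assumption: the output contains a line starting with SLURM_CONF.
    if command -v scontrol >/dev/null 2>&1; then
        scontrol show config | grep '^SLURM_CONF' || true
    fi
    return 0
}

collect_node_info
```

The output, together with the slurm.conf and cgroup.conf files themselves, can then be attached to the bug.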
Comment 2 Sathish 2018-07-31 08:58:34 MDT
We can track this request in bug 5507.

*** This bug has been marked as a duplicate of bug 5507 ***