We are seeing the error messages below whenever we invoke srun. Could you please assist?
srun: job 357102 queued and waiting for resources
srun: error: Lookup failed: Unknown host
srun: job 357102 has been allocated resources
slurmstepd: error: task/cgroup: unable to add task[pid=85894] to memory cg '(null)'
slurmstepd: error: xcgroup_instantiate: unable to create cgroup '/sys/fs/cgroup/memory/slurm/uid_684277182' : No space left on device
slurmstepd: error: jobacct_gather/cgroup: unable to instanciate user 684277182 memory cgroup
Please do let me know if you need anything from my end to take this further.
(In reply to Sathish from comment #0)
> Hi Team,
> We are seeing the below error message when ever we invoke srun, could you
> please assist on the same ?
Can you show me the kernel version and OS version on the node?
Also, can you attach your latest slurm.conf and cgroup.conf?
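For reference, here is a minimal sketch of how the requested details could be collected on the affected compute node. The function name `collect_node_info` is hypothetical, and the `/proc/cgroups` check is an extra assumption on my part (it may help judge how many memory cgroups currently exist), not something the error output confirms is relevant.

```shell
#!/bin/sh
# Sketch only: gather kernel version, OS version, and memory cgroup counts.
# collect_node_info is a hypothetical helper name, not part of Slurm.
collect_node_info() {
    # Kernel release as reported by the running kernel
    echo "kernel: $(uname -r)"

    # OS name and version; /etc/os-release is present on most modern distributions
    if [ -r /etc/os-release ]; then
        echo "os: $(grep '^PRETTY_NAME=' /etc/os-release | cut -d= -f2- | tr -d '"')"
    else
        echo "os: unavailable (/etc/os-release not readable)"
    fi

    # Assumption: the memory controller line may be useful context.
    # Columns are: subsys_name hierarchy num_cgroups enabled
    if [ -r /proc/cgroups ]; then
        echo "memory cgroup controller:"
        grep '^memory' /proc/cgroups || echo "memory controller not listed"
    else
        echo "cgroups: /proc/cgroups not readable"
    fi
}

collect_node_info
```

Run it directly on the node where slurmstepd reported the error, and attach the output together with slurm.conf and cgroup.conf.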
We can track this request in bug 5507.
*** This bug has been marked as a duplicate of bug 5507 ***