Ticket 6522

Summary: sbatch: --ntasks not working with --ntasks-per-node
Product: Slurm Reporter: Marcus Wagner <wagner>
Component: Scheduling Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 18.08.5   
Hardware: Linux   
OS: Linux   
Site: -Other-

Description Marcus Wagner 2019-02-14 23:14:09 MST
Dear SLURM developers,

in short:

--ntasks=x schedules all tasks onto one host with x cores.
With --ntasks=x --ntasks-per-node=48, which should essentially be the same, Slurm rejects the submission with the following error message:

sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available


Our node configuration:
NodeName=nihm[001-004]   CPUs=48  Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=185000  Feature=skx8160,hostok,hpcwork,cats                     Weight=10541 State=UNKNOWN
Yes, it is a 96-thread node. The intention of CPUs=48 was to schedule by core instead of by thread.
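
The numbers in the node definition above can be checked with a short sketch. It parses a shortened copy of the NodeName line and compares the declared CPUs value with the core and thread counts implied by Sockets, CoresPerSocket and ThreadsPerCore. The helper name parse_node_line is hypothetical and not part of Slurm; the sketch only illustrates the arithmetic and makes no claim about how Slurm itself evaluates these fields.

```python
def parse_node_line(line):
    """Return the key=value fields of a slurm.conf node line as a dict.

    Hypothetical helper for illustration only, not a Slurm API.
    """
    fields = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


# Shortened copy of the node definition from this ticket.
NODE_LINE = (
    "NodeName=nihm[001-004] CPUs=48 Sockets=4 CoresPerSocket=12 "
    "ThreadsPerCore=2 RealMemory=185000"
)

node = parse_node_line(NODE_LINE)
cores = int(node["Sockets"]) * int(node["CoresPerSocket"])  # physical cores
threads = cores * int(node["ThreadsPerCore"])               # hardware threads
declared_cpus = int(node["CPUs"])                           # value set in slurm.conf

print(f"cores={cores} threads={threads} declared CPUs={declared_cpus}")
```

With these values the declared CPUs count matches the number of cores, not the number of hardware threads, which is the intent described above.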

We have configured:
$> scontrol show config | grep -i select
SelectType              = select/cons_res
SelectTypeParameters    = CR_CORE_MEMORY,CR_ONE_TASK_PER_CORE
So only one task should get scheduled per core, at least to my understanding.


With --ntasks=48 the job gets scheduled onto one host:

   NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=48,mem=182400M,node=1,billing=48

So, if I understand this right, and I want 48 tasks and want them all on one host, I should be able to set --ntasks-per-node=48. But then Slurm denies the submission.
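
The expectation stated above can be written out as simple arithmetic. This is a minimal sketch that assumes tasks are packed onto as few nodes as possible; nodes_needed is a hypothetical helper and does not reproduce Slurm's actual submission checks.

```python
import math


def nodes_needed(ntasks, ntasks_per_node):
    """Smallest node count that holds ntasks at ntasks_per_node tasks per node.

    Hypothetical helper; an assumption about the expected behaviour,
    not Slurm's own logic.
    """
    return math.ceil(ntasks / ntasks_per_node)


# The request from this ticket: 48 tasks, 48 tasks per node.
expected_nodes = nodes_needed(48, 48)
print(f"expected nodes: {expected_nodes}")
```

Under that assumption the request fits on a single node, the same placement the job received with --ntasks=48 alone.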


Have I understood something wrong?
Is there something misconfigured?

Or is this really a bug?


Don't hesitate to ask if you need any more information from me.

Best
Marcus
Comment 2 Jacob Jenson 2019-02-15 10:03:08 MST
Our system was unable to associate your email address with a supported site. If this is an error, please contact your account representative to have it corrected, or email sales@schedmd.com if you are unsure who your account representative is.

If you do not have an active Slurm support contract then please email sales@schedmd.com to inquire about support options. 

Once we have been able to associate your email address with a Slurm support contract, or your site purchases Slurm support, this ticket will be assigned to one of the SchedMD support engineers, who will resolve this issue.