Bug 6522 - sbatch: --ntasks not working with --ntasks-per-node
Summary: sbatch: --ntasks not working with --ntasks-per-node
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 18.08.5
Hardware: Linux Linux
: --- 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-02-14 23:14 MST by Marcus Wagner
Modified: 2019-02-15 10:03 MST
0 users

See Also:
Site: -Other-
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Description Marcus Wagner 2019-02-14 23:14:09 MST
Dear SLURM developers,

In short:

--ntasks=x schedules all tasks onto one host with x cores.
With --ntasks=x --ntasks-per-node=48, which should essentially be the same, Slurm denies the submission with the following error message:

sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available


Our node configuration:
NodeName=nihm[001-004]   CPUs=48  Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=185000  Feature=skx8160,hostok,hpcwork,cats                     Weight=10541 State=UNKNOWN
Yes, it is a 96-thread node (4 sockets × 12 cores × 2 threads). The intention of CPUs=48 was to schedule by core instead of by thread.
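The arithmetic behind this node definition can be checked with a small sketch. It assumes nothing about how Slurm itself parses slurm.conf; `parse_node_line` and `topology` are hypothetical helpers that only split the `key=value` tokens of the NodeName line above and compare CPUs=48 against the core and thread counts.

```python
from typing import Dict


def parse_node_line(line: str) -> Dict[str, str]:
    """Hypothetical helper: split a NodeName line into key=value fields."""
    fields: Dict[str, str] = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def topology(fields: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical helper: derive core and thread counts from the fields."""
    cores = int(fields["Sockets"]) * int(fields["CoresPerSocket"])
    threads = cores * int(fields["ThreadsPerCore"])
    return {"cores": cores, "threads": threads, "cpus": int(fields["CPUs"])}


line = ("NodeName=nihm[001-004]   CPUs=48  Sockets=4 CoresPerSocket=12 "
        "ThreadsPerCore=2 RealMemory=185000  Feature=skx8160,hostok,hpcwork,cats "
        "Weight=10541 State=UNKNOWN")
topo = topology(parse_node_line(line))
print(topo)  # {'cores': 48, 'threads': 96, 'cpus': 48}

# CPUs matches the core count, not the thread count, i.e. one schedulable CPU per core.
print("CPUs counts cores" if topo["cpus"] == topo["cores"] else "CPUs counts threads")
```

With this definition, each of the 48 schedulable CPUs corresponds to one physical core, which is what the submission reasoning below relies on.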

We have configured:
$> scontrol show config | grep -i select
SelectType              = select/cons_res
SelectTypeParameters    = CR_CORE_MEMORY,CR_ONE_TASK_PER_CORE
So only one task should be scheduled per core (at least to my understanding).


With --ntasks=48, the job gets scheduled onto one host:

   NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=48,mem=182400M,node=1,billing=48

So, if I understand this correctly, when I want 48 tasks and want them all on one host, I should be able to set --ntasks-per-node=48. But then Slurm denies the submission.
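The expectation described here can be written down as a back-of-the-envelope node count. This is only a sketch of the reporter's reasoning under stated assumptions (48 schedulable CPUs per node, one task per core, one CPU per task); `expected_nodes` is a hypothetical function and does not model Slurm's actual select/cons_res logic, which rejected the second request according to this report.

```python
import math
from typing import Optional

# Assumption: CPUs=48 per node, one CPU per core (from the NodeName line).
CPUS_PER_NODE = 48


def expected_nodes(ntasks: int, ntasks_per_node: Optional[int] = None,
                   cpus_per_task: int = 1, cpus_per_node: int = CPUS_PER_NODE) -> int:
    """Hypothetical model of the expected node count, not Slurm's algorithm."""
    # With one task per core and one CPU per core, a node holds this many tasks.
    tasks_fit = cpus_per_node // cpus_per_task
    if ntasks_per_node is not None:
        if ntasks_per_node > tasks_fit:
            raise ValueError("ntasks-per-node exceeds what one node can hold")
        tasks_fit = ntasks_per_node
    return math.ceil(ntasks / tasks_fit)


print(expected_nodes(48))                      # 1  (--ntasks=48)
print(expected_nodes(48, ntasks_per_node=48))  # 1  (--ntasks=48 --ntasks-per-node=48)
```

Under these assumptions both requests map to a single node, which is why the rejection of the second one looks inconsistent to the reporter.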


Have I understood something wrong?
Is there something misconfigured?

Or is this really a bug?


Don't hesitate to ask if you need any more information from me.

Best
Marcus
Comment 2 Jacob Jenson 2019-02-15 10:03:08 MST
Our system was unable to associate your email address with a supported site. If this is an error, please contact your account representative to have this corrected, or email sales@schedmd.com if you are unsure who your account representative is.

If you do not have an active Slurm support contract then please email sales@schedmd.com to inquire about support options. 

Once we have been able to associate your email address with a Slurm support contract or your site purchases Slurm support this ticket will be assigned to one of the SchedMD support engineers who will resolve this issue.