Bug 4010 - Exploring Hyper-Threading CPU nodes
Summary: Exploring Hyper-Threading CPU nodes
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 16.05.4
Hardware: Linux
OS: Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-07-19 01:40 MDT by Damien
Modified: 2017-08-14 01:51 MDT

See Also:
Site: Monash University


Description Damien 2017-07-19 01:40:12 MDT
Dear SchedMD

We are exploring having hyper-threading-enabled nodes in a separate partition in our SLURM cluster, and have mainly been reading this document: https://slurm.schedmd.com/mc_support.html . There are some questions for which we need clarification.


I believe SLURM reports what the OS tells it: if hyper-threading is enabled in the BIOS, SLURM will typically report the number of cores x 2. Is this a correct observation?

This special partition, let's call it PartitionName=HT, will contain only nodes with hyper-threading turned ON at the BIOS level.

If a user submits a job to partition=HT but does not want it to run in hyper-threading mode (for example, because the other available partitions are already busy), is there an #SBATCH parameter or flag to request that the job not run in hyper-threading mode?

We will have both hyper-threading-enabled and hyper-threading-disabled partitions/nodes in a single cluster, using the multifactor/fair-share mechanism. How can we define the non-hyper-threading partition to be a higher-valued partition than a hyper-threading partition? Is there a method?



Kindly advise. Thanks.


Cheers


Damien Leong
Comment 1 Marshall Garey 2017-07-20 14:30:46 MDT
Damien, 

You're correct in assuming slurm will report the number of processors differently if hyperthreading is turned on than if it is turned off. To see your actual hardware configuration, use slurmd -C.
 
If you disable hyperthreading in the BIOS, slurm will correctly report only 1 thread per core. Here's a machine with hyperthreading disabled:

marshall@knc:~/slurm/master/byu/sbin$ ./slurmd -C
NodeName=knc CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=23883 TmpDisk=51175                                       
UpTime=90-21:57:37
 
Hyperthreading is enabled on this machine:

marshall@smd-server:~$ slurmd -C 
NodeName=smd-server CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=7966 TmpDisk=555165 
UpTime=7-02:57:48 


Here's a bit of the lstopo output from the machine with hyperthreading enabled:

Machine (7966MB) 
  Package L#0 + L3 L#0 (12MB) 
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 
      PU L#0 (P#0) 
      PU L#1 (P#12)

  ...

    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#1)
      PU L#13 (P#13)

Core0 has CPUs 0 and 12, Core6 has CPUs 1 and 13, etc.

With the following configuration:
SelectType=cons_res
SelectTypeParameters=cr_core_memory
TaskPlugin=task/affinity or task/cgroup

srun -n2 will launch 2 tasks packed onto 1 core if both threads are available, but won't allow multiple jobs to share the core:

marshall@smd-server:~/byu/slurm/17.02/smd$ srun -n2 cat /proc/self/status | grep -i cpus_allowed_list
Cpus_allowed_list:      0
Cpus_allowed_list:      12

To force a task to use a whole core, use --ntasks-per-core=1.

marshall@smd-server:~/byu/slurm/17.02/smd$ srun -n2 --ntasks-per-core=1 cat /proc/self/status | grep -i cpus_allowed_list
Cpus_allowed_list:      1,13
Cpus_allowed_list:      0,12

If you want a task to only be bound on a single CPU on a core, use --cpu_bind=threads:

marshall@smd-server:~/byu/slurm/17.02/smd$ srun -n2 --ntasks-per-core=1 --cpu_bind=threads cat /proc/self/status | grep -i cpus_allowed_list
Cpus_allowed_list:      1
Cpus_allowed_list:      0
marshall@smd-server:~/byu/slurm/17.02/smd$ scontrol show job 117 | grep NumCPUs
   NumNodes=1 NumCPUs=4 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*

Another option is to use --hint=nomultithread, which is functionally equivalent (but it only works if using the task/affinity plugin):

marshall@smd-server:~/byu/slurm/17.02/smd$ srun -n2 --hint=nomultithread cat /proc/self/status | grep -i cpus_allowed_list
Cpus_allowed_list:      1
Cpus_allowed_list:      0
marshall@smd-server:~/byu/slurm/17.02/smd$ scontrol show job 117 | grep NumCPUs
   NumNodes=1 NumCPUs=4 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*

On the hyperthreaded nodes, the allocated CPU count will be rounded up to include both threads on the core, as shown above.
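
For an sbatch job, the same options can be given as #SBATCH directives in the batch script. Here's a minimal sketch, assuming a hyperthreaded partition named HT as in your example (the partition name and task count are only placeholders):

#!/bin/bash
#SBATCH --partition=HT            # hypothetical hyperthreaded partition
#SBATCH --ntasks=2
#SBATCH --hint=nomultithread      # only works with the task/affinity plugin

srun cat /proc/self/status | grep -i cpus_allowed_list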


To make one partition a higher priority than another, give it a higher PriorityTier in slurm.conf. Jobs in a partition with a higher priority tier will be scheduled before jobs in a partition with a lower priority tier.
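
As a rough sketch of how the two sets of nodes and partitions might look in slurm.conf (node names, counts, and tier values here are only illustrative, not taken from your cluster):

NodeName=ht[001-004]   Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
NodeName=noht[001-004] Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
PartitionName=HT   Nodes=ht[001-004]   PriorityTier=1  State=UP
PartitionName=NOHT Nodes=noht[001-004] PriorityTier=10 State=UP

With this, jobs in NOHT (the higher PriorityTier) are scheduled before jobs in HT.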

I suggest reading our page on cpu_management: https://slurm.schedmd.com/cpu_management.html


Does this do what you need?
Comment 2 Marshall Garey 2017-08-01 11:51:11 MDT
Hi Damien,

I'd just like to follow up with you. Were you able to figure out how to do what you wanted? Do you still have questions?
Comment 3 Marshall Garey 2017-08-08 10:55:21 MDT
Damien,

I'm going to close this as resolved/info given. Please let me know if you have any further questions or problems.

Thanks,
Marshall
Comment 4 Damien 2017-08-14 01:51:19 MDT
Thanks for the information.


Cheers

Damien