Summary: | --cpu_bind=map_cpu:<cpuid> is silently ignored (info is displayed only in slurmd logs, so the end user is not aware of it) | ||
---|---|---|---|
Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
Component: | Other | Assignee: | Marcin Stolarek <cinek> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | acgmecaselog.com |
Version: | 20.11.5 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Stanford | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA Site: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Linux Distro: | --- |
Machine Name: | CLE Version: | ||
Version Fixed: | 22.05pre1 | Target Release: | --- |
DevPrio: | --- | Emory-Cloud Sites: | --- |
Ticket Depends on: | 11247 | ||
Ticket Blocks: |
Description
Kilian Cavalotti
2021-03-25 19:35:04 MDT
Marcin Stolarek

Kilian,

Did you allocate the whole node? As documented:

> map_cpu:<list>
> [...]
> Not supported unless the entire node is allocated to the job.

In either case (non-whole-node allocation, or map_cpu/mask_cpu pointing to CPUs outside of the available range) you should see an appropriate info-level message coming from slurmstepd (where task binding really happens), either (src/plugins/task/affinity/dist_tasks.c):

> 413         info("entire node must be allocated, "
> 414              "disabling affinity");

or:

> 295         info("Ignoring user CPU binding outside of job "
> 296              "step allocation");

Unfortunately, we don't send this message to the end user today, which is a limitation of the architecture (sending log messages from slurmstepd to srun). I'll take a look to check whether it's something we can improve, but it sounds like it falls into the enhancement area.

I hope that makes it clearer for you,
Marcin

Kilian Cavalotti

Hi Marcin,

(In reply to Marcin Stolarek from comment #2)
> Did you allocate the whole node? As documented:
> > map_cpu:<list>
> > [...]
> > Not supported unless the entire node is allocated to the job.

Yes, that's with a whole-node allocation.
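As an aside, the map_cpu:<list> semantics quoted above can be sketched in a few lines: per the srun documentation, list entries give the CPU ID for task 0, task 1, and so on, and the list is reused cyclically when there are more tasks than entries. This is an illustrative sketch, not Slurm source code:

```python
# Sketch (not Slurm source): how --cpu_bind=map_cpu:<list> assigns CPU IDs
# to task ranks. Per the srun documentation, if the number of tasks exceeds
# the number of list entries, the list is reused from the beginning.

def map_cpu_binding(cpu_list, ntasks):
    """Return the CPU ID each task rank would be bound to."""
    return [cpu_list[rank % len(cpu_list)] for rank in range(ntasks)]

# e.g. --cpu_bind=map_cpu:0,2,4 with 5 tasks:
print(map_cpu_binding([0, 2, 4], 5))  # [0, 2, 4, 0, 2]
```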
Here's a complete transcript:

$ salloc -p test -N 1 --exclusive
salloc: Pending job allocation 21306623
salloc: job 21306623 queued and waiting for resources
salloc: job 21306623 has been allocated resources
salloc: Granted job allocation 21306623
salloc: Waiting for resource configuration
salloc: Nodes sh02-01n60 are ready for job

$ srun lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                20
On-line CPU(s) list:   0-19
Thread(s) per core:    1
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Stepping:              1
CPU MHz:               3129.052
CPU max MHz:           3400.0000
CPU min MHz:           1200.0000
BogoMIPS:              4788.75
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19
[...]

So, we have 20 CPUs to play with. Requesting CPU id 99 generates a non-binding mask, and results in a random CPU being used:

$ srun -n 1 --cpu_bind=verbose,map_cpu:99 bash -c 'printf "CPU: %s (pid: %s)\n" $(ps -h -o psr,pid $$)'
cpu-bind=MASK - sh02-01n60, task  0  0 [48443]: mask 0xfffff set
CPU: 3 (pid: 48443)

$ srun -n 1 --cpu_bind=verbose,map_cpu:99 bash -c 'printf "CPU: %s (pid: %s)\n" $(ps -h -o psr,pid $$)'
cpu-bind=MASK - sh02-01n60, task  0  0 [48477]: mask 0xfffff set
CPU: 5 (pid: 48477)

> In either case (non-whole-node allocation, or map_cpu/mask_cpu pointing to
> CPUs outside of the available range) you should see an appropriate info-level
> message coming from slurmstepd (where task binding really happens), either
> (src/plugins/task/affinity/dist_tasks.c):
> > 413         info("entire node must be allocated, "
> > 414              "disabling affinity");
> or:
> > 295         info("Ignoring user CPU binding outside of job "
> > 296              "step allocation");

Indeed, from the test above, I get this:

Mar 30 08:29:44 sh02-01n60.int slurmd[55645]: task/affinity: _validate_map: Ignoring user CPU binding outside
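The "mask 0xfffff set" line in the verbose output above is the key detail: 0xfffff has the low 20 bits set, i.e. it covers CPUs 0-19, the entire node, which is why the task lands on an arbitrary CPU instead of the requested CPU 99. A small sketch to decode such an affinity mask:

```python
# Decode a CPU affinity mask like the "mask 0xfffff set" reported by
# --cpu_bind=verbose: each set bit selects one CPU ID.

def mask_to_cpus(mask):
    """Return the list of CPU IDs whose bit is set in the affinity mask."""
    return [cpu for cpu in range(mask.bit_length()) if mask >> cpu & 1]

# 0xfffff = 20 low bits set, i.e. CPUs 0 through 19 (the whole node):
print(mask_to_cpus(0xFFFFF))
```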
of job step allocation
Mar 30 08:29:44 sh02-01n60.int slurmd[55645]: task/affinity: lllp_distribution: JobId=21306623 manual binding: verbose,mask_cpu,one_thread

> Unfortunately, we don't send this message to the end user today, which is a
> limitation of the architecture (sending log messages from slurmstepd to
> srun). I'll take a look to check whether it's something we can improve, but
> it sounds like it falls into the enhancement area.

I see, but I'd classify this more in the defect area :) The person who cares most about this message is the user who submitted the job: she is the one most affected by the random binding of her tasks, and she never gets to see the warning. On the other hand, the only person who can see the message is the sysadmin, who has no direct interest in knowing that affinity was disabled for that job. So I think the message is going in the wrong direction: it really should be presented to the user, not logged for the sysadmin.

One could even argue that because the requested binding (to a CPU outside the allocation) cannot be granted, the step should not run at all and should be rejected. If a user requests 64 CPUs on a 32-CPU node, the job is rejected; so if she requests a CPU binding that can't be satisfied, maybe it shouldn't go through either (rather than being executed with a different binding than the one requested).

What do you think?

Thanks,
--
Kilian

Marcin Stolarek

Kilian,

The discussed behavior got changed on the master branch[1] (Slurm 22.05 to be). In case of a CPU binding failure, the task launch will be rejected with an appropriate error message.

cheers,
Marcin

[1] https://github.com/SchedMD/slurm/commit/85af3bcb8c1fa2e9939263d3f5ef3f3625a5997c

Kilian Cavalotti

Great, thanks Marcin!

Cheers,
--
Kilian
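The behavior change described above (reject the launch instead of silently falling back to a full-node mask) can be sketched as a simple validation step. This is an assumed illustration of the logic, not the code from the linked commit:

```python
# Sketch (assumed logic, not the actual Slurm commit): validate a map_cpu
# request against the job's allocated CPUs. Before the fix, an out-of-range
# request was silently replaced by a full-node mask; after it, the task
# launch is rejected with an error.

def validate_map_cpu(requested, allocated):
    """Return the requested CPU IDs, or raise if any fall outside the allocation."""
    bad = [cpu for cpu in requested if cpu not in allocated]
    if bad:
        raise ValueError(f"CPU binding outside of job step allocation: {bad}")
    return requested

allocated = set(range(20))                 # the 20-CPU node from the transcript
print(validate_map_cpu([3, 5], allocated))  # accepted: [3, 5]
try:
    validate_map_cpu([99], allocated)       # map_cpu:99 on a 20-CPU node
except ValueError as err:
    print(err)                              # rejected with an error message
```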