Bug 10684 - mpi/pmix: PMIX_NODEID key must be included in the process-level data of the job info
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: PMIx
Version: 20.11.3
Hardware: Linux Linux
Importance: --- C - Contributions
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on: 7263
Blocks:
Reported: 2021-01-25 03:02 MST by Boris Karasev
Modified: 2021-02-01 13:53 MST
Site: -Other-
Version Fixed: 20.11.4

Attachments
bug10684_2011.patch (1.02 KB, patch)
2021-01-25 03:03 MST, Boris Karasev

Description Boris Karasev 2021-01-25 03:02:10 MST
According to the PMIx standard, the PMIX_NODEID key must be included in
the process level data.

Refs: https://pmix.github.io/uploads/2020/12/pmix-standard-v4.0.pdf

Error reproduction:
$ srun -N2 --mpi=pmix_v4 ./pmix_client -n 2
==46645== [1611566237.993365] ERROR [pmix_client.c:129:main]: rank 0: PMIx_Get nodeid failed: NOT-FOUND
srun: error: jazz29: task 0: Exited with exit code 210
srun: launch/slurm: _step_signal: Terminating StepId=2.9
srun: error: jazz30: task 1: Terminated
srun: Force Terminated StepId=2.9
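
The failure above can be pictured with a small toy model. This is only a sketch under stated assumptions: it does not call PMIx or Slurm, every name in it (toy_kv, toy_proc_entry, toy_proc_add, toy_proc_get, toy_build_entry, and the "toy.*" keys) is hypothetical, and it assumes a rank-qualified lookup consults only that rank's own entry. It shows a lookup returning a not-found status while the node ID is absent from the per-process entry, and succeeding once the key is added to each entry.

```c
/* Toy model only: none of these names are PMIx or Slurm APIs. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SUCCESS    0
#define TOY_NOT_FOUND (-1)
#define TOY_MAX_KEYS   4

/* One key/value pair in a rank's entry (hypothetical layout). */
struct toy_kv {
    const char *key;
    uint32_t value;
};

/* The per-rank ("process-level") part of the job info in this model. */
struct toy_proc_entry {
    uint32_t rank;
    struct toy_kv kvs[TOY_MAX_KEYS];
    size_t nkvs;
};

/* Append a key to a rank's entry; returns 0 on success, -1 if it is full. */
static int toy_proc_add(struct toy_proc_entry *p, const char *key,
                        uint32_t value)
{
    assert(p != NULL && key != NULL);
    if (p->nkvs >= TOY_MAX_KEYS)
        return -1;
    p->kvs[p->nkvs].key = key;
    p->kvs[p->nkvs].value = value;
    p->nkvs++;
    return 0;
}

/* Look a key up in one rank's entry only (assumption of this model). */
static int toy_proc_get(const struct toy_proc_entry *p, const char *key,
                        uint32_t *out)
{
    assert(p != NULL && key != NULL && out != NULL);
    for (size_t i = 0; i < p->nkvs; i++) {
        if (strcmp(p->kvs[i].key, key) == 0) {
            *out = p->kvs[i].value;
            return TOY_SUCCESS;
        }
    }
    return TOY_NOT_FOUND;
}

/* Build a rank's entry; include_nodeid toggles whether the node ID is
 * stored per process, which is the behavior this report asks for. */
static struct toy_proc_entry toy_build_entry(uint32_t rank, uint32_t nodeid,
                                             int include_nodeid)
{
    struct toy_proc_entry p = { .rank = rank, .nkvs = 0 };
    toy_proc_add(&p, "toy.lrank", 0);
    if (include_nodeid)
        toy_proc_add(&p, "toy.nodeid", nodeid);
    return p;
}
```

In this model, an entry built with toy_build_entry(0, 7, 0) makes toy_proc_get(&entry, "toy.nodeid", &v) return TOY_NOT_FOUND, which mirrors the NOT-FOUND error in the log. Building it with include_nodeid set to 1 makes the same lookup return TOY_SUCCESS with v set to 7.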
Comment 1 Boris Karasev 2021-01-25 03:03:48 MST
Created attachment 17593 [details]
bug10684_2011.patch
Comment 2 Boris Karasev 2021-01-31 23:34:33 MST
(In reply to Boris Karasev from comment #1)
> Created attachment 17593 [details]
> bug10684_2011.patch

This patch depends on https://bugs.schedmd.com/show_bug.cgi?id=7263.
Comment 3 Boris Karasev 2021-01-31 23:49:07 MST
(In reply to Boris Karasev from comment #2)
> (In reply to Boris Karasev from comment #1)
> > Created attachment 17593 [details]
> > bug10684_2011.patch
> 
> This patch depends on https://bugs.schedmd.com/show_bug.cgi?id=7263.

There is no direct dependency on bug 7263. The failure is simply reproduced with the PMIx v4 test suite, support for which was added in bug 7263.
Comment 4 Tim Wickberg 2021-02-01 13:53:07 MST
Comment on attachment 17593 [details]
bug10684_2011.patch

commit f950cc9831e2c808e7b25e057950ab8e7e121778
Author:     Boris Karasev <karasev.b@gmail.com>
AuthorDate: Fri Jan 22 06:45:54 2021 +0200

    mpi/pmix: include PMIX_NODEID for each process entry.
    
    According to the PMIx standard, the PMIX_NODEID key must be included in
    the process level data.
    
    Refs: https://pmix.github.io/uploads/2020/12/pmix-standard-v4.0.pdf
    
    Bug 10684.
Comment 5 Tim Wickberg 2021-02-01 13:53:38 MST
Thanks, Boris. Committed ahead of 20.11.4.