Ticket 16306 - pmix shmem permissions cause segfault
Summary: pmix shmem permissions cause segfault
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmstepd
Version: 23.02.0
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Nate Rini
 
Reported: 2023-03-17 08:09 MDT by Nate Rini
Modified: 2023-03-29 14:59 MDT

Site: SchedMD
Version Fixed: 23.02.2, 23.11.0rc1


Attachments
SLURM UID Patch (842 bytes, patch) - 2023-03-20 12:42 MDT, Samuel Gutierrez
patch for 2302 (v1) (1.76 KB, patch) - 2023-03-21 11:19 MDT, Nate Rini

Description Nate Rini 2023-03-17 08:09:39 MDT
Running the PMIx v5.0 branch with the PRRTE master branch on Slurm 23.02 results in a segfault when PMIx is configured with shmem support (the gds/shmem component).

Found this while working on bug#15536.

Originally opened:
> https://github.com/openpmix/openpmix/issues/3019#issuecomment-1473064425

Issue:
$ srun --mpi=pmix env PMIx_MCA_gds=hash /usr/local/src/prrte/examples/hello
[OmicronPersei8:2727328] PMIX ERROR: FILE_OPEN_FAILURE in file pmix_shmem.c at line 121
[OmicronPersei8:2727328] PMIX ERROR: FILE_OPEN_FAILURE in file gds_shmem.c at line 866
[OmicronPersei8:2727328] PMIX ERROR: FILE_OPEN_FAILURE in file gds_shmem.c at line 939
[OmicronPersei8:2727328] PMIX ERROR: FILE_OPEN_FAILURE in file gds_shmem.c at line 1716
[OmicronPersei8:2727328] PMIX ERROR: FILE_OPEN_FAILURE in file gds_shmem.c at line 1748
[OmicronPersei8:2727328] PMIX ERROR: FILE_OPEN_FAILURE in file gds_shmem.c at line 1766
[OmicronPersei8:2727328] PMIX ERROR: UNPACK-FAILURE in file gds_shmem.c at line 1768
srun: error: host1: task 0: Segmentation fault (core dumped)
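
For context, the FILE_OPEN_FAILURE in pmix_shmem.c is consistent with the permissions problem in the ticket title: gds/shmem has the server side create a shared-memory backing file that the client processes, running as the job user, must then open. The following standalone C sketch reproduces that mismatch in isolation; the path, mode, and uid are illustrative only and are not the actual PMIx code or file names.

/* Illustrative only: a root-created backing file that an unprivileged
 * uid (the job user) cannot open, analogous to the FILE_OPEN_FAILURE
 * above. Run as root. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/dev/shm/bug16306-demo";   /* hypothetical path */
    int fd = open(path, O_CREAT | O_RDWR, 0600);   /* owned by root, mode 0600 */

    if (fd < 0) {
        perror("create");
        return 1;
    }
    close(fd);

    /* Drop privileges to the job user (uid/gid 1000 in this reproducer). */
    if (setgid(1000) != 0 || setuid(1000) != 0) {
        perror("drop privileges");
        return 1;
    }

    fd = open(path, O_RDWR);
    if (fd < 0)        /* expected: EACCES */
        fprintf(stderr, "open as uid %d failed: %s\n",
                (int) getuid(), strerror(errno));
    else
        close(fd);
    return 0;
}

Run as root, the second open() fails with EACCES, which is the same class of failure the job-user client hits when attaching to the server-created shmem segment.
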
Comment 1 Nate Rini 2023-03-17 08:20:54 MDT
> $ sudo gdb -x gdb.script --args sbin/slurmd -Dvvvvvv -N host1
> 
> Thread 3.1 "slurmstepd" hit Breakpoint 3, stepd_step_rec_create (msg=0x4623b0, protocol_version=10240) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd_job.c:278
> 278             stepd_step_rec_t *step = NULL;
> #0  stepd_step_rec_create (msg=0x4623b0, protocol_version=10240) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd_job.c:278
> #1  0x0000000000413625 in mgr_launch_tasks_setup (msg=0x4623b0, cli=0x458a10, self=0x458ad0, protocol_version=10240) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/mgr.c:212
> #2  0x00000000004131e7 in _step_setup (cli=0x458a10, self=0x458ad0, msg=0x45eca0) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd.c:782
> #3  0x000000000040fbc9 in main (argc=1, argv=0x7fffffffe548) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd.c:146
> [New Thread 0x7ffff7764640 (LWP 3930380)]
> [New Thread 0x7ffff71ff640 (LWP 3930381)]
> [New Thread 0x7ffff70fe640 (LWP 3930382)]
> Mar 17 08:18:18.324931 3930321 slurmd       0x7ffff73a2640: debug3: _rpc_launch_tasks: return from _forkexec_slurmstepd
> Mar 17 08:18:18.325316 3930321 slurmd       0x7ffff73a2640: debug2: Finish processing RPC: REQUEST_LAUNCH_TASKS
> [New Thread 0x7ffff6ffd640 (LWP 3930384)]
> [New Thread 0x7ffff5d20640 (LWP 3930387)]
> [Switching to Thread 0x7ffff6ffd640 (LWP 3930384)]
> 
> Thread 3.5 "slurmstepd" hit Breakpoint 4, register_nspace (nptr=0x7fffcc001880) at pmdl_ompi.c:417
> 417         pmix_output_verbose(2, pmix_pmdl_base_framework.framework_output,
> (gdb) bt
> #0  register_nspace (nptr=0x7fffcc001880) at pmdl_ompi.c:417
> #1  0x00007ffff7466a5c in pmix_pmdl_base_register_nspace (nptr=0x7fffcc001880) at base/pmdl_base_stubs.c:174
> #2  0x00007ffff72a7fc4 in _register_nspace (sd=-1, args=4, cbdata=0x6efc80) at server/pmix_server.c:1146
> #3  0x00007ffff7782ee8 in ?? () from /lib/x86_64-linux-gnu/libevent_core-2.1.so.7
> #4  0x00007ffff7784bf7 in event_base_loop () from /lib/x86_64-linux-gnu/libevent_core-2.1.so.7
> #5  0x00007ffff730df70 in progress_engine (obj=0x49a8f8) at runtime/pmix_progress_threads.c:228
> #6  0x00007ffff7894b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
> #7  0x00007ffff7926a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81


GDB script: gdb.script
> handle SIG33 nostop noprint
> set pagination off
> set breakpoint pending on
> 
> set follow-fork-mode child
> set print pretty on
> set $has_container=0
> 
> b _fork_all_tasks if step->container
> commands 1
> bt
> c
> end
> 
> b _fork_child_with_wait_info if $has_container
> commands 2
> #set follow-fork-mode child
> bt
> c
> end
> 
> b stepd_step_rec_create
> commands 3
> bt
> set follow-fork-mode parent
> c
> end
> 
> b register_nspace
> 
> r
Comment 2 Ralph Castain 2023-03-17 09:04:08 MDT
Could you please check that the pmix plugin is providing the userid and groupid for the application to the call to "register_nspace"? These are required entries but the requirement may have come after the plugin was originally written.

Also, I gather you have been working with the PMIx master branch, which is now being released as v5.0. It was my understanding that SchedMD maintained the configure logic for the pmix plugin on a per-major-release basis - i.e., that the ability to use v5.0 of PMIx required a change to the configure logic. Has this been included in a release yet? If not, do you have any notion of what release might include that change?
Comment 3 Nate Rini 2023-03-17 09:08:55 MDT
I made a gdb script to trace the active uid for all the MPI entry points. Looks like all of them are run as root.

to trace: sudo gdb -x gdb.script --args sbin/slurmd -Dvvvvvv -N host1 
> Breakpoint 9, mpi_p_conf_options (full_options=0x4a4e20, full_opt_cnt=0x4c79f0) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:319
> 319             transfer_s_p_options(full_options, pmix_options, full_opt_cnt);
> #0  mpi_p_conf_options (full_options=0x4a4e20, full_opt_cnt=0x4c79f0) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:319
> #1  0x00007ffff7e4b09c in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:360
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $1 = 0
> $2 = 0
> 
> Breakpoint 9, mpi_p_conf_options (full_options=0x4a4e28, full_opt_cnt=0x4c79f4) at /home/nate/slurm/bug15536//src/src/plugins/mpi/cray_shasta/mpi_cray_shasta.c:350
> 350     }
> #0  mpi_p_conf_options (full_options=0x4a4e28, full_opt_cnt=0x4c79f4) at /home/nate/slurm/bug15536//src/src/plugins/mpi/cray_shasta/mpi_cray_shasta.c:350
> #1  0x00007ffff7e4b09c in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:360
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $3 = 0
> $4 = 0
> 
> Breakpoint 9, mpi_p_conf_options (full_options=0x4a4e30, full_opt_cnt=0x4c79f8) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:319
> 319             transfer_s_p_options(full_options, pmix_options, full_opt_cnt);
> #0  mpi_p_conf_options (full_options=0x4a4e30, full_opt_cnt=0x4c79f8) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:319
> #1  0x00007ffff7e4b09c in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:360
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $5 = 0
> $6 = 0
> 
> Breakpoint 9, mpi_p_conf_options (full_options=0x4a4e38, full_opt_cnt=0x4c79fc) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:319
> 319             transfer_s_p_options(full_options, pmix_options, full_opt_cnt);
> #0  mpi_p_conf_options (full_options=0x4a4e38, full_opt_cnt=0x4c79fc) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:319
> #1  0x00007ffff7e4b09c in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:360
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $7 = 0
> $8 = 0
> 
> Breakpoint 9, mpi_p_conf_options (full_options=0x4a4e40, full_opt_cnt=0x4c7a00) at /home/nate/slurm/bug15536//src/src/plugins/mpi/none/mpi_none.c:107
> 107     }
> #0  mpi_p_conf_options (full_options=0x4a4e40, full_opt_cnt=0x4c7a00) at /home/nate/slurm/bug15536//src/src/plugins/mpi/none/mpi_none.c:107
> #1  0x00007ffff7e4b09c in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:360
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $9 = 0
> $10 = 0
> 
> Breakpoint 9, mpi_p_conf_options (full_options=0x4a4e48, full_opt_cnt=0x4c7a04) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmi2/mpi_pmi2.c:181
> 181     }
> #0  mpi_p_conf_options (full_options=0x4a4e48, full_opt_cnt=0x4c7a04) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmi2/mpi_pmi2.c:181
> #1  0x00007ffff7e4b09c in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:360
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $11 = 0
> $12 = 0
> Mar 17 09:05:18.466590 3939082 slurmd       0x7ffff7ef0f00: debug2: No mpi.conf file (/home/nate/slurm/bug15536/etc/mpi.conf)
> 
> Breakpoint 10, mpi_p_conf_set (tbl=0x4a7980) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:324
> 324             _reset_pmix_conf();
> #0  mpi_p_conf_set (tbl=0x4a7980) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:324
> #1  0x00007ffff7e4b31f in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:409
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $13 = 0
> $14 = 0
> 
> Breakpoint 7, mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:352
> 352             s_p_hashtbl_t *tbl = s_p_hashtbl_create(pmix_options);
> #0  mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:352
> #1  0x00007ffff7e4b343 in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:416
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $15 = 0
> $16 = 0
> 
> Breakpoint 10, mpi_p_conf_set (tbl=0x0) at /home/nate/slurm/bug15536//src/src/plugins/mpi/cray_shasta/mpi_cray_shasta.c:354
> 354     }
> #0  mpi_p_conf_set (tbl=0x0) at /home/nate/slurm/bug15536//src/src/plugins/mpi/cray_shasta/mpi_cray_shasta.c:354
> #1  0x00007ffff7e4b31f in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:409
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $17 = 0
> $18 = 0
> 
> Breakpoint 7, mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/cray_shasta/mpi_cray_shasta.c:358
> 358             return NULL;
> #0  mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/cray_shasta/mpi_cray_shasta.c:358
> #1  0x00007ffff7e4b343 in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:416
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $19 = 0
> $20 = 0
> 
> Breakpoint 10, mpi_p_conf_set (tbl=0x4ab850) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:324
> 324             _reset_pmix_conf();
> #0  mpi_p_conf_set (tbl=0x4ab850) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:324
> #1  0x00007ffff7e4b31f in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:409
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $21 = 0
> $22 = 0
> 
> Breakpoint 7, mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:352
> 352             s_p_hashtbl_t *tbl = s_p_hashtbl_create(pmix_options);
> #0  mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:352
> #1  0x00007ffff7e4b343 in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:416
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $23 = 0
> $24 = 0
> 
> Breakpoint 10, mpi_p_conf_set (tbl=0x4c0d20) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:324
> 324             _reset_pmix_conf();
> #0  mpi_p_conf_set (tbl=0x4c0d20) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:324
> #1  0x00007ffff7e4b31f in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:409
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $25 = 0
> $26 = 0
> 
> Breakpoint 7, mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:352
> 352             s_p_hashtbl_t *tbl = s_p_hashtbl_create(pmix_options);
> #0  mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:352
> #1  0x00007ffff7e4b343 in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:416
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $27 = 0
> $28 = 0
> 
> Breakpoint 10, mpi_p_conf_set (tbl=0x0) at /home/nate/slurm/bug15536//src/src/plugins/mpi/none/mpi_none.c:111
> 111     }
> #0  mpi_p_conf_set (tbl=0x0) at /home/nate/slurm/bug15536//src/src/plugins/mpi/none/mpi_none.c:111
> #1  0x00007ffff7e4b31f in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:409
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $29 = 0
> $30 = 0
> 
> Breakpoint 7, mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/none/mpi_none.c:115
> 115             return NULL;
> #0  mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/none/mpi_none.c:115
> #1  0x00007ffff7e4b343 in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:416
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $31 = 0
> $32 = 0
> 
> Breakpoint 10, mpi_p_conf_set (tbl=0x0) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmi2/mpi_pmi2.c:185
> 185     }
> #0  mpi_p_conf_set (tbl=0x0) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmi2/mpi_pmi2.c:185
> #1  0x00007ffff7e4b31f in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:409
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $33 = 0
> $34 = 0
> 
> Breakpoint 7, mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmi2/mpi_pmi2.c:189
> 189             return NULL;
> #0  mpi_p_conf_get () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmi2/mpi_pmi2.c:189
> #1  0x00007ffff7e4b343 in _mpi_init_locked (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:416
> #2  0x00007ffff7e4b5c3 in _mpi_init (mpi_type=0x0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:454
> #3  0x00007ffff7e4bc81 in mpi_g_daemon_init () at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:556
> #4  0x000000000040d1b2 in main (argc=4, argv=0x7fffffffe578) at /home/nate/slurm/bug15536//src/src/slurmd/slurmd/slurmd.c:382
> $35 = 0
> $36 = 0
> 
> Thread 3.1 "slurmstepd" hit Breakpoint 10, mpi_p_conf_set (tbl=0x49f800) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:324
> 324             _reset_pmix_conf();
> #0  mpi_p_conf_set (tbl=0x49f800) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:324
> #1  0x00007ffff7e4af23 in _mpi_init_locked (mpi_type=0x7fffffffe1a0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:340
> #2  0x00007ffff7e4ce62 in mpi_conf_recv_stepd (fd=0) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:671
> #3  0x0000000000413013 in _init_from_slurmd (sock=0, argv=0x7fffffffe548, _cli=0x7fffffffe3f8, _self=0x7fffffffe3f0, _msg=0x7fffffffe3e8) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd.c:738
> #4  0x000000000040fbb2 in main (argc=1, argv=0x7fffffffe548) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd.c:142
> $37 = 0
> $38 = 0
> 
> Thread 3.1 "slurmstepd" hit Breakpoint 3, stepd_step_rec_create (msg=0x4623b0, protocol_version=10240) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd_job.c:278
> 278             stepd_step_rec_t *step = NULL;
> #0  stepd_step_rec_create (msg=0x4623b0, protocol_version=10240) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd_job.c:278
> #1  0x0000000000413625 in mgr_launch_tasks_setup (msg=0x4623b0, cli=0x458a10, self=0x458ad0, protocol_version=10240) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/mgr.c:212
> #2  0x00000000004131e7 in _step_setup (cli=0x458a10, self=0x458ad0, msg=0x45eca0) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd.c:782
> #3  0x000000000040fbc9 in main (argc=1, argv=0x7fffffffe548) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd.c:146
> [New Thread 0x7ffff7764640 (LWP 3939287)]
> [New Thread 0x7ffff71ff640 (LWP 3939288)]
> [New Thread 0x7ffff70fe640 (LWP 3939289)]
> Mar 17 09:05:43.315608 3939082 slurmd       0x7ffff73a2640: warning: slurmstepd startup took 17 sec, possible file system problem or full memory
> Mar 17 09:05:43.315708 3939082 slurmd       0x7ffff73a2640: debug3: _rpc_launch_tasks: return from _forkexec_slurmstepd
> Mar 17 09:05:43.315954 3939082 slurmd       0x7ffff73a2640: debug2: Finish processing RPC: REQUEST_LAUNCH_TASKS
> 
> Thread 3.1 "slurmstepd" hit Breakpoint 11, mpi_p_slurmstepd_prefork (step=0x46d170, env=0x46d270) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:215
> 215             pmixp_debug_hang(0);
> #0  mpi_p_slurmstepd_prefork (step=0x46d170, env=0x46d270) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:215
> #1  0x00007ffff7e4b880 in mpi_g_slurmstepd_prefork (step=0x46d170, env=0x46d270) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:498
> #2  0x000000000041612d in job_manager (step=0x46d170) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/mgr.c:1340
> #3  0x000000000040fccd in main (argc=1, argv=0x7fffffffe548) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd.c:188
> $39 = 0
> $40 = 0

gdb script:
> handle SIG33 nostop noprint
> set pagination off
> set breakpoint pending on
> 
> set follow-fork-mode child
> set print pretty on
> set $has_container=0
> 
> b _fork_all_tasks if step->container
> commands 1
> bt
> c
> end
> 
> b _fork_child_with_wait_info if $has_container
> commands 2
> #set follow-fork-mode child
> bt
> c
> end
> 
> b stepd_step_rec_create
> commands 3
> bt
> set follow-fork-mode parent
> c
> end
> 
> b plugin_id
> b mpi_p_client_fini
> b mpi_p_client_prelaunch
> b mpi_p_conf_get
> b mpi_p_conf_get_printable
> b mpi_p_conf_options
> b mpi_p_conf_set
> b mpi_p_slurmstepd_prefork
> b mpi_p_slurmstepd_task
> 
> commands 4-12
> bt
> p (int) getuid()
> p (int) geteuid()
> c
> end
> 
> r
Comment 4 Nate Rini 2023-03-17 09:11:19 MDT
(In reply to Ralph Castain from comment #2)
> Also, I gather you have been working with the PMIx master branch, which is
> now being released as v5.0.

I'm testing with PMIx:
> commit 4cd61856ae1bdbd99d83ac94b5f963ac2c7fc0b5 (HEAD -> v5.0, origin/v5.0)

PRRTE:
> commit 4725d89abe53c52343eeb49c90986c4d407d6392 (HEAD -> master, origin/master, origin/HEAD)

Slurm:
> commit 63b10b0ddf7a64667fcb49749880a06c4f409036 (github/master)


> It was my understanding that SchedMD maintained
> the configure logic for the pmix plugin on a per-major-release basis - i.e.,
> that the ability to use v5.0 of PMIx required a change to the configure
> logic. Has this been included in a release yet? If not, do you have any
> notion of what release might include that change?

Support for PMIx v5 was added for the recent slurm-23.02 release. I doubt we have any users running it at this point.
Comment 5 Nate Rini 2023-03-17 09:12:40 MDT
(In reply to Ralph Castain from comment #2)
> Could you please check that the pmix plugin is providing the userid and
> groupid for the application to the call to "register_nspace"? These are
> required entries but the requirement may have come after the plugin was
> originally written.

Looking for this now
Comment 6 Nate Rini 2023-03-17 09:21:06 MDT
(In reply to Nate Rini from comment #5)
> (In reply to Ralph Castain from comment #2)
> > Could you please check that the pmix plugin is providing the userid and
> > groupid for the application to the call to "register_nspace"? These are
> > required entries but the requirement may have come after the plugin was
> > originally written.

Based on the trace in comment#1, register_nspace() is called by progress_engine(), which runs in a PMIx-created thread. I traced start_progress_engine(), which appears to spawn that thread, to see which Slurm hook is getting called. It looks like mpi_p_slurmstepd_prefork() is the hook, and it is passed "step", which includes the job uid/gid.

> Thread 3.1 "slurmstepd" hit Breakpoint 2, start_progress_engine (trk=0x49a7a0) at runtime/pmix_progress_threads.c:260
> 260         assert(!trk->ev_active);
> (gdb) bt
> #0  start_progress_engine (trk=0x49a7a0) at runtime/pmix_progress_threads.c:260
> #1  0x00007ffff730eaf6 in pmix_progress_thread_start (name=0x7ffff74cfb50 "PMIX-wide async progress thread") at runtime/pmix_progress_threads.c:393
> #2  0x00007ffff730ca9c in pmix_rte_init (type=2, info=0x4a2790, ninfo=2, cbfunc=0x0) at runtime/pmix_init.c:554
> #3  0x00007ffff72a225a in PMIx_server_init (module=0x7ffff77ff000 <slurm_pmix_cb>, info=0x4a2790, ninfo=2) at server/pmix_server.c:608
> #4  0x00007ffff77ed787 in pmixp_lib_init () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/pmixp_client_v2.c:410
> #5  0x00007ffff77d3c8d in pmixp_libpmix_init () at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/pmixp_client.c:493
> #6  0x00007ffff77d85a1 in pmixp_stepd_init (step=0x46d170, env=0x46d270) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/pmixp_server.c:421
> #7  0x00007ffff77cf3ce in mpi_p_slurmstepd_prefork (step=0x46d170, env=0x46d270) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:221
> #8  0x00007ffff7e4b880 in mpi_g_slurmstepd_prefork (step=0x46d170, env=0x46d270) at /home/nate/slurm/bug15536//src/src/interfaces/mpi.c:498
> #9  0x000000000041612d in job_manager (step=0x46d170) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/mgr.c:1340
> #10 0x000000000040fccd in main (argc=1, argv=0x7fffffffe548) at /home/nate/slurm/bug15536//src/src/slurmd/slurmstepd/slurmstepd.c:188
> (gdb) f 7
> #7  0x00007ffff77cf3ce in mpi_p_slurmstepd_prefork (step=0x46d170, env=0x46d270) at /home/nate/slurm/bug15536//src/src/plugins/mpi/pmix/mpi_pmix.c:221
> 221             if (SLURM_SUCCESS != (ret = pmixp_stepd_init(step, env))) {
> (gdb) p step->user_name 
> $2 = 0x486eb0 "nate"
> (gdb) p step->u
> uid        user_name  
> (gdb) p step->uid
> $3 = 1000
> (gdb) p step->gid
> $4 = 1000

gdb script:
> handle SIG33 nostop noprint
> set pagination off
> set breakpoint pending on
> 
> set follow-fork-mode child
> set print pretty on
> 
> b stepd_step_rec_create
> commands 1
> bt
> set follow-fork-mode parent
> c
> end
> 
> b start_progress_engine
> 
> r
Comment 7 Ralph Castain 2023-03-17 18:15:32 MDT
(In reply to Nate Rini from comment #6)
> (In reply to Nate Rini from comment #5)
> > (In reply to Ralph Castain from comment #2)
> > > Could you please check that the pmix plugin is providing the userid and
> > > groupid for the application to the call to "register_nspace"? These are
> > > required entries but the requirement may have come after the plugin was
> > > originally written.
> 
> Based on the trace in comment#1, register_nspace() is called by
> progress_engine() which is in a pmix created thread. I traced
> start_progress_engine() as it appears to spawn this thread to see which
> Slurm hook is getting called. Looks like mpi_p_slurmstepd_prefork() is the
> hook and it provides "step" which includes the job uid/gid.
> [...]

Not exactly what I was asking - I want to know if the user/grp IDs are actually being passed to "register_nspace". You'd see a "PMIX_USERID" and "PMIX_GRPID" key being used in a pmix_info_t attribute.
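
For reference, a minimal sketch of what passing those keys looks like on the host side: loading the application's uid/gid (as uint32_t values) into the pmix_info_t array handed to PMIx_server_register_nspace(). The function below is illustrative only and is not the actual Slurm pmix plugin code; per the discussion above, these IDs are what let gds/shmem set up its backing store so the job user's processes can open it.

#include <pmix_server.h>

/* Illustrative host-side pattern; names and plumbing are hypothetical. */
static pmix_status_t register_ns_with_ids(const char *nspace, int nlocalprocs,
                                          uint32_t uid, uint32_t gid)
{
    pmix_info_t *info;
    size_t ninfo = 2;
    pmix_status_t rc;

    PMIX_INFO_CREATE(info, ninfo);
    PMIX_INFO_LOAD(&info[0], PMIX_USERID, &uid, PMIX_UINT32);
    PMIX_INFO_LOAD(&info[1], PMIX_GRPID, &gid, PMIX_UINT32);

    /* NULL callback for brevity; a real host would typically pass a
     * callback and wait for the registration to complete. */
    rc = PMIx_server_register_nspace(nspace, nlocalprocs, info, ninfo,
                                     NULL, NULL);

    PMIX_INFO_FREE(info, ninfo);
    return rc;
}

In the Slurm plugin, those values would come from the step->uid / step->gid fields shown in comment #6.
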
Comment 8 Samuel Gutierrez 2023-03-20 12:42:55 MDT
Created attachment 29420 [details]
SLURM UID Patch
Comment 9 Samuel Gutierrez 2023-03-20 12:43:42 MDT
Nate and I worked together to test a slightly modified version of the provided path. Thank you, Nate!
Comment 10 Samuel Gutierrez 2023-03-20 12:52:29 MDT
To avoid confusion, in my previous comment I meant to say *provided patch*.
Comment 13 Nate Rini 2023-03-21 11:19:37 MDT
Created attachment 29446 [details]
patch for 2302 (v1)
Comment 22 Nate Rini 2023-03-29 14:59:44 MDT
The patch is now upstream:
> https://github.com/SchedMD/slurm/commit/d23cad68dfa0a64670cc9d8bd1ec333477e4f3ff