Hi,

on our 36-core Broadwell nodes with one QLogic QDR InfiniBand HCA (16 hardware contexts), we set the environment variable PSM_RANKS_PER_CONTEXT=4 to allow more than 16 MPI processes per node. Before upgrading Slurm from 17.11.7 to 18.08.3, everything worked fine with jobs sharing the same node. After the upgrade we see the following behaviour when running two jobs like this:

$ srun -N 2 --ntasks-per-node=16 -p gll_usr_prod -w node[234-235] -t 6:00:00 --pty bash

1) Launching the same MPI application with mpirun in both jobs, everything works fine: each job occupies 4 contexts per node (16 tasks per node).

2) Launching the same MPI application, one job with srun and the second with mpirun, also works fine: each job occupies 4 contexts per node (16 tasks per node).

3) Launching the same MPI application with srun in both jobs, the first job launched works fine (4 contexts per node) while the second fails with the following errors:

[ibaccare@node234 Programs]$ srun ./hello_mpi_ompi_2.1.1
node234.11959can't open /dev/ipath, network down (err=26)
node234.11960can't open /dev/ipath, network down (err=26)
node234.11962can't open /dev/ipath, network down (err=26)
node234.11963can't open /dev/ipath, network down (err=26)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Could not detect network connectivity
--------------------------------------------------------------------------
node234.11959ipath_userinit: assign_context command failed: Network is down
[...]
node235.35701can't open /dev/ipath, network down (err=26)
node235.35706can't open /dev/ipath, network down (err=26)
node235.35709can't open /dev/ipath, network down (err=26)
node235.35710can't open /dev/ipath, network down (err=26)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Could not detect network connectivity
--------------------------------------------------------------------------
node235.35709ipath_userinit: assign_context command failed: Network is down
[...]
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
slurmstepd: error: *** STEP 247805.14 ON node234 CANCELLED AT 2018-10-31T14:54:34 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: node235: tasks 16-18: Exited with exit code 1
srun: Terminating job step 247805.14
srun: error: node235: tasks 19-31: Killed
srun: error: node234: tasks 0-4: Exited with exit code 1
srun: error: node234: tasks 5-15: Killed

Note that the error messages refer *ONLY* to 4 of the 16 processes launched by the second srun command. So the first job is correctly consuming 4 hardware contexts (leaving 12 free contexts), but the second job is apparently not allowed to share the remaining free contexts. In fact, running only 12 processes per node works:

[ibaccare@node234 Programs]$ srun -N 2 -n 24 --ntasks-per-node=12 ./hello_mpi_ompi_2.1.1
<it works!!!>

and the number of free hardware contexts is then, as expected, zero on both nodes:

[root@master ~]# xdsh node23[4,5] cat /sys/class/infiniband/qib0/nfreectxts
node234: 0
node235: 0

To sum up: when both jobs are launched with srun, context sharing does not seem to be enabled for the second job. This happens with both IntelMPI and OpenMPI.

Thanks
ale & isa
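P.S. For reference, this is the context arithmetic we expect on these nodes, as a small shell sketch (illustrative only; the nfreectxts path is the one used above). Each node has 16 hardware contexts, so with PSM_RANKS_PER_CONTEXT=4 a 16-task-per-node job should need ceil(16/4) = 4 contexts:

# illustrative sketch: compare the contexts a job should need with what is free
TASKS_PER_NODE=16
RANKS_PER_CTX=${PSM_RANKS_PER_CONTEXT:-1}
NEEDED=$(( (TASKS_PER_NODE + RANKS_PER_CTX - 1) / RANKS_PER_CTX ))  # ceil(), = 4 here
FREE=$(cat /sys/class/infiniband/qib0/nfreectxts)
echo "need ${NEEDED} contexts per node, ${FREE} currently free"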
Ale, Isa,

Can you please upload your slurm configuration?

Can you also please call:

ldd hello_mpi_ompi_2.1.1 (both for Intel and OpenMPI)
lsb_release -a
ofed_info | head -1

Thanks
--Nate
Ale, Isa,

Can you also please upload your slurm logs from the affected nodes, along with the node running slurmctld?

Thanks
--Nate
Created attachment 8218 [details]
requested info and logs

Hi Nate,

I'm attaching the following files:

slurm.conf
hello_mpi_ompi_2.1.1.ldd.out (ldd of Intelmpi exe)
hello_mpi_sleep_impi.ldd.out (ldd of Openmpi exe)
redhat-release
proc_version
ofed_info.out

and the relevant logs of the nodes and controller:

slurmd-node423.log
slurmd-node424.log
slurmctld.log

In job 259968 we launched the first srun, which consumes 4 hardware contexts:

[root@master ~]# xdsh node4[23,24] cat /sys/class/infiniband/qib0/nfreectxts
node423: 12
node424: 12

In job 259969 we first launched the second srun, which crashes, and then srun --ntasks-per-node=12, which consumes the remaining 12 contexts:

[root@master ~]# xdsh node4[23,24] cat /sys/class/infiniband/qib0/nfreectxts
node423: 0
node424: 0

thanks
ale & isa
Nate,

ERRATA:

hello_mpi_ompi_2.1.1.ldd.out (ldd of Openmpi exe)
hello_mpi_sleep_impi.ldd.out (ldd of Intelmpi exe)

Sorry for the mistake ;-)

thanks
ale
(In reply to Cineca HPC Systems from comment #4)
> In job 259969 we first launched the second srun, which crashes, and then
> srun --ntasks-per-node=12, which consumes the remaining 12 contexts

Do the processes of the first job die, or do they stay around as unkillable zombies?
(In reply to Nate Rini from comment #6)
> (In reply to Cineca HPC Systems from comment #4)
> > In job 259969 we first launched the second srun, which crashes, and then
> > srun --ntasks-per-node=12, which consumes the remaining 12 contexts
> Do the processes of the first job die, or do they stay around as unkillable
> zombies?

None of the jobs leave any zombies.

thanks
Ale, Isa,

> None of the jobs leave any zombies.

If the job processes are being cleaned up, then this is likely an issue with PSM.

Do you have the PSM_RANKS_PER_CONTEXT environment variable set in your job?

If not, can you try:
> export PSM_RANKS_PER_CONTEXT=4

--Nate
(In reply to Nate Rini from comment #8)
> Ale, Isa,
>
> > None of the jobs leave any zombies.
> If the job processes are being cleaned up, then this is likely an issue with
> PSM.
>
> Do you have the PSM_RANKS_PER_CONTEXT environment variable set in your job?
>
> If not, can you try:
> > export PSM_RANKS_PER_CONTEXT=4
>
> --Nate

As we wrote in the first comment, PSM_RANKS_PER_CONTEXT=4 is already set in the jobs' environment.

thanks
ale
Ale,

Slurm should not be affecting PSM directly.

Can you attach a copy of /etc/slurm/cgroup.conf to this ticket?

Can you also call this srun after the first job that crashes?
> srun -N 2 --ntasks-per-node=16 -p gll_usr_prod -w node[234-235] -t 6:00:00 --pty bash -c "env |grep -e PSM -e SLURM;stat /dev/ipath; ipathstats"

I would like to verify that the environment is being passed around as expected and that ipath is visible to the user processes.

Thanks,
--Nate
Hi Nate,

this is the cgroup.conf:

[ibaccare@node186 Programs]$ cat /etc/slurm/cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainDevices=yes
ConstrainKmemSpace=no
TaskAffinity=no
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30

this is the environment:

[ibaccare@node186 Programs]$ env |grep -e PSM -e SLURM
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint
SLURM_NODELIST=node[186-187]
SLURM_JOB_NAME=bash
SLURMD_NODENAME=node186
SLURM_TOPOLOGY_ADDR=node186
SLURM_NTASKS_PER_NODE=16
SLURM_PRIO_PROCESS=0
SLURM_SRUN_COMM_PORT=36285
SLURM_JOB_QOS=normal
SLURM_PTY_WIN_ROW=39
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_LIST=0x000010000,0x000040000,0x000020000,0x000080000,0x000100000,0x000200000,0x000400000,0x000800000,0x001000000,0x002000000,0x004000000,0x008000000,0x010000000,0x020000000,0x040000000,0x080000000
SLURM_NNODES=2
SLURM_STEP_NUM_NODES=2
SLURM_JOBID=296059
SLURM_NTASKS=32
SLURM_LAUNCH_NODE_IPADDR=10.23.16.165
SLURM_STEP_ID=0
SLURM_STEP_LAUNCHER_PORT=36285
SLURM_TASKS_PER_NODE=16(x2)
SLURM_WORKING_CLUSTER=galileo:io07:6817:8448
SLURM_JOB_ID=296059
SLURM_JOB_USER=ibaccare
SLURM_STEPID=0
SLURM_SRUN_COMM_HOST=10.23.16.165
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_PTY_WIN_COL=151
SLURM_UMASK=0022
SLURM_JOB_UID=28550
SLURM_NODEID=0
SLURM_SUBMIT_DIR=/galileo/home/userinternal/ibaccare
SLURM_TASK_PID=22081
SLURM_NPROCS=32
SLURM_CPUS_ON_NODE=16
SLURM_DISTRIBUTION=block
SLURM_PROCID=0
SLURM_JOB_NODELIST=node[186-187]
SLURM_PTY_PORT=34011
SLURM_LOCALID=0
PSM_RANKS_PER_CONTEXT=4
SLURM_JOB_GID=25200
SLURM_JOB_CPUS_PER_NODE=16(x2)
SLURM_CLUSTER_NAME=galileo
SLURM_GTIDS=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
SLURM_SUBMIT_HOST=node165
SLURM_JOB_PARTITION=gll_usr_prod
SLURM_STEP_NUM_TASKS=32
SLURM_JOB_ACCOUNT=cin_staff
SLURM_JOB_NUM_NODES=2
SLURM_STEP_TASKS_PER_NODE=16(x2)
SLURM_STEP_NODELIST=node[186-187]
SLURM_CPU_BIND=quiet,mask_cpu:0x000010000,0x000040000,0x000020000,0x000080000,0x000100000,0x000200000,0x000400000,0x000800000,0x001000000,0x002000000,0x004000000,0x008000000,0x010000000,0x020000000,0x040000000,0x080000000

[ibaccare@node186 Programs]$ stat /dev/ipath
  File: ‘/dev/ipath’
  Size: 0           Blocks: 0          IO Block: 4096   character special file
Device: 5h/5d   Inode: 14361       Links: 1     Device type: f6,0
Access: (0666/crw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-09-28 18:43:12.307213177 +0200
Modify: 2018-09-28 18:43:12.307213177 +0200
Change: 2018-09-28 18:43:12.307213177 +0200
 Birth: -

We don't have the ipathstats command.

thanks
ale
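P.S. If it helps, the device cgroup that Slurm builds for the job could also be inspected directly. This is only a sketch (the exact cgroup path is a guess for this system; uid and job ID taken from the env dump above), to compare against the major/minor numbers shown by stat (f6,0):

# sketch: list the devices the job's cgroup allows (adjust the path as needed)
cat /sys/fs/cgroup/devices/slurm/uid_28550/job_296059/devices.list
ls -l /dev/ipath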
Ale,

(In reply to Cineca HPC Systems from comment #11)
> [ibaccare@node186 Programs]$ env |grep -e PSM -e SLURM
> SLURM_CPU_BIND_TYPE=mask_cpu:
> PSM_RANKS_PER_CONTEXT=4

> [ibaccare@node186 Programs]$ stat /dev/ipath
>   File: ‘/dev/ipath’
> Access: (0666/crw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)

Looks like the device is visible and Slurm is not hiding it from the job with cgroups.

> We don't have the ipathstats command.

It wasn't required, but it would have been nice to verify the state of the device outside of MPI.

It doesn't look like Slurm is affecting PSM directly; your environment also appears correct.

> 36-core Broadwell nodes with one QLogic QDR InfiniBand HCA (16 hardware contexts)
> $ srun -N 2 --ntasks-per-node=16 -p gll_usr_prod -w node[234-235] -t 6:00:00 --pty bash

Looking at the Intel docs:
> Each MPI process requires a context. If there are more MPI processes than hardware contexts, the hardware contexts will be shared. They can be shared 2, 3 or 4 ways, supporting a maximum of 4x16=64 processes.

Can you try setting this for your test job?
> PSM_RANKS_PER_CONTEXT=2
> PSM_SHAREDCONTEXTS=1

I want to verify that multiple jobs can be made to work from inside an srun call. PSM_SHAREDCONTEXTS should be 1 by default, but it might be safer to make sure it is set.

> 2) Launching the same MPI application, one job with srun and the second with mpirun, also works fine: each job occupies 4 contexts per node (16 tasks per node).

Based on your description, a change is getting applied by srun, but nothing should be touching PSM directly.

Can you also try disabling CPU binding by adding this argument to your job?
> --cpu-bind=none

If neither of those works, please try setting the following for your Intel MPI job and send all the logs:
> export I_MPI_DEBUG=5

Also, please try adding this argument to your OpenMPI mpirun call:
> -mca mpi_show_mca_params all

It might be worthwhile to open a parallel ticket with Intel about Omni-Path, as we could be hitting some bug or issue with the driver.

--Nate
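P.S. To be concrete, the combined test might look something like this from inside the interactive shell of each job. This is only an untested sketch (binary names taken from your earlier comments), not a verified recipe:

# inside the srun --pty bash shell of each test job (sketch only)
export PSM_RANKS_PER_CONTEXT=2
export PSM_SHAREDCONTEXTS=1
srun --cpu-bind=none ./hello_mpi_ompi_2.1.1

# Intel MPI variant, with debug output enabled:
export I_MPI_DEBUG=5
srun --cpu-bind=none ./hello_mpi_sleep_impi

# OpenMPI mpirun variant, dumping the MCA parameters:
mpirun -mca mpi_show_mca_params all ./hello_mpi_ompi_2.1.1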
Hi Nate,

running further tests with IntelMPI we noticed that we had missed an important error it reports when two different users share the same node.

We first run a job like this with the user ibaccare:

[ibaccare@node444 ~]$ srun -N 1 -n 1 --ntasks-per-node=1 ~ibaccare/Programs/./hello_mpi_sleep_impi_env |& tee srun-ENV.1

then the user afederic runs the same job:

[afederic@node444 ~]$ srun -N 1 -n 1 --ntasks-per-node=1 ~ibaccare/Programs/./hello_mpi_sleep_impi_env |& tee srun-ENV.2
[...]
Error attaching to shared memory object in shm_open: Permission denied (err=9)
[0] MPI startup(): tmi fabric is not available and fallback fabric is not enabled
srun: error: node444: task 0: Exited with exit code 254
srun: Terminating job step 422673.41

While the first job is running we checked the psm file created in /dev/shm:

node444: -rwx------ 1 ibaccare interactive 6352896 Nov 23 17:06 psm_shm.0fff0fff-0000-0000-0000-0fff0fff0fff

If the first job is run by afederic, the error occurs in the job of ibaccare, and the psm file created has the same name but is owned by afederic:

node444: -rwx------ 1 afederic interactive 99557376 Nov 23 16:47 psm_shm.0fff0fff-0000-0000-0000-0fff0fff0fff

Hence our impression is that the second job tries to use the same file instead of creating a new one.

In addition to that, running the following jobs with OpenMPI:

[ibaccare@node444 ~]$ srun -n 32 --ntasks-per-node=16 ~ibaccare/Programs/./hello_mpi_sleep_ompi_env
[afederic@node444 ~]$ srun -n 24 --ntasks-per-node=12 ~ibaccare/Programs/./hello_mpi_sleep_ompi_env

we noticed that 2 psm files are created in /dev/shm:

[root@master ~]# xdsh node[444,455] ls -l /dev/shm \| grep -v check
node444: total 170256
node444: -rwx------ 1 afederic interactive 74702848 Nov 23 16:43 psm_shm.1a000000-1173-0000-1a00-00001a000000
node444: -rwx------ 1 ibaccare interactive 99557376 Nov 23 16:43 psm_shm.1a000000-8a73-0000-1a00-00001a000000
node455: total 170336
node455: -rwx------ 1 afederic interactive 74702848 Nov 23 16:43 psm_shm.1a000000-1173-0000-1a00-00001a000000
node455: -rwx------ 1 ibaccare interactive 99557376 Nov 23 16:43 psm_shm.1a000000-8a73-0000-1a00-00001a000000

but the second job uses one context per rank in spite of having set PSM_RANKS_PER_CONTEXT=4:

[root@master ~]# xdsh node[444,455] cat /sys/class/infiniband/qib0/nfreectxts
node444: 0
node455: 0

So while the first job is using 4 contexts per node (shared by the 16 tasks per node), the second uses all the remaining 12 contexts, one per rank.

When using mpirun for the same tests:

[ibaccare@node444 ~]$ mpirun -n 1 ~ibaccare/Programs/./hello_mpi_sleep_impi_env |& tee mpirun-ENV.1
[afederic@node444 ~]$ mpirun -n 1 ~ibaccare/Programs/./hello_mpi_sleep_impi_env |& tee mpirun-ENV.2

there are two different psm files in /dev/shm:

node444: -rwx------ 1 ibaccare interactive 6352896 Nov 23 17:25 psm_shm.48260000-cdc9-896d-577b-050011bc0a17
node444: -rwx------ 1 afederic interactive 6352896 Nov 23 17:25 psm_shm.68260000-dd90-066e-577b-050011bc0a17

and everything works fine.

We are attaching the environment files {s,mpi}run-ENV.[1,2] produced with PSM_VERBOSE_ENV=1 and I_MPI_DEBUG=5 for both srun and mpirun.

thanks
ale & isa
Created attachment 8406 [details]
mpirun and srun environment
> node444.9711Error attaching to shared memory object in shm_open: Permission denied (err=9)

Please activate debug logging on slurmd to see how the cgroups are being configured. Can you please add this line to your slurm.conf on the test nodes and SIGHUP your slurmd daemons:
> SlurmdDebug=debug3

Can you try using strace to see which file it is failing to open?
> [ibaccare@node444 ~]$ srun -n 32 --ntasks-per-node=16 strace -e open -tff -s9999 ~ibaccare/Programs/./hello_mpi_sleep_ompi_env
> [afederic@node444 ~]$ srun -n 24 --ntasks-per-node=12 strace -e open -tff -s9999 ~ibaccare/Programs/./hello_mpi_sleep_ompi_env

Please remove the SlurmdDebug line after testing to avoid filling your logs. Please attach the compressed log to this ticket.

> the second job uses one context per rank in spite of having set PSM_RANKS_PER_CONTEXT=4

The logs provided show the env is being passed correctly to the application by Slurm:
> srun-ENV.1:node444.9711env PSM_RANKS_PER_CONTEXT Number of ranks per context => 4 (default was 1)
> srun-ENV.2:node444.9683env PSM_RANKS_PER_CONTEXT Number of ranks per context => 4 (default was 1)

I suggest opening a bug with the MPI provider about this issue.

> there are two different psm files in /dev/shm
>
> node444: -rwx------ 1 ibaccare interactive 6352896 Nov 23 17:25
> psm_shm.48260000-cdc9-896d-577b-050011bc0a17
> node444: -rwx------ 1 afederic interactive 6352896 Nov 23 17:25
> psm_shm.68260000-dd90-066e-577b-050011bc0a17
>
> and everything works fine.

Are you using pam_namespace.so or pam_slurm or pam_slurm_adopt to control /dev/shm instances? Using mpirun outside of Slurm is likely escaping any kind of cgroup containment set up in your cgroup.conf (your config has constraints active on devices).

Have you tried telling the job to use only tmi, to see whether jobs work without shm?
> export I_MPI_FABRICS=tmi

--Nate
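P.S. One way the debug level could be applied and later reverted on the two test nodes, as a sketch only (assuming the xdsh-style remote execution you already use; slurmd re-reads slurm.conf on SIGHUP):

# after adding SlurmdDebug=debug3 to slurm.conf on the test nodes:
xdsh node[444,455] 'pkill -HUP slurmd'   # slurmd re-reads its config

# ... run the strace tests ...

# then remove the SlurmdDebug line again and send another SIGHUP:
xdsh node[444,455] 'pkill -HUP slurmd'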
(In reply to Nate Rini from comment #16)
> Please activate debug logging on slurmd to see how the cgroups are being
> configured. Can you please add this line to your slurm.conf on the test
> nodes and SIGHUP your slurmd daemons:
> > SlurmdDebug=debug3
>
> Can you try using strace to see which file it is failing to open?
> > [ibaccare@node444 ~]$ srun -n 32 --ntasks-per-node=16 strace -e open -tff -s9999 ~ibaccare/Programs/./hello_mpi_sleep_ompi_env
> > [afederic@node444 ~]$ srun -n 24 --ntasks-per-node=12 strace -e open -tff -s9999 ~ibaccare/Programs/./hello_mpi_sleep_ompi_env

The strace output and logs are attached (strace-and-slurmd-logs.tgz).

> The logs provided show the env is being passed correctly to the application
> by Slurm:
> > srun-ENV.1:node444.9711env PSM_RANKS_PER_CONTEXT Number of ranks per context => 4 (default was 1)
> > srun-ENV.2:node444.9683env PSM_RANKS_PER_CONTEXT Number of ranks per context => 4 (default was 1)
>
> I suggest opening a bug with the MPI provider about this issue.

Why should we open an issue with Intel or OpenMPI when everything works fine using mpirun to launch the MPI applications?

> Are you using pam_namespace.so or pam_slurm or pam_slurm_adopt to control
> /dev/shm instances? Using mpirun outside of Slurm is likely escaping any
> kind of cgroup containment set up in your cgroup.conf (your config has
> constraints active on devices).

We are only using pam_slurm_adopt, in this way:

[root@node479 ~]# grep account /etc/pam.d/sshd
account     required      pam_nologin.so
account     include       password-auth
account     sufficient    pam_slurm_adopt.so
account     required      pam_access.so

but the MPI apps were always launched inside the shell opened by srun.

> Have you tried telling the job to use only tmi, to see whether jobs work
> without shm?
> > export I_MPI_FABRICS=tmi

Yes, same results.

thanks
ale & isa
Created attachment 8439 [details]
strace of mpi apps and slurmd logs
(In reply to Cineca HPC Systems from comment #17)
> > I suggest opening a bug with the MPI provider about this issue.

Looking at the strace logs, it appears the MPI is locking other users out, or possibly it should be using a new UUID for the psm_shm file per user.

First run:
> [pid 26995] 16:40:20 open("/dev/shm/psm_shm.0fff0fff-0000-0000-0000-0fff0fff0fff", O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0700) = 6

mode = 0700 = -rwx------

Second run:
> [pid 27028] 16:40:27 open("/dev/shm/psm_shm.0fff0fff-0000-0000-0000-0fff0fff0fff", O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0700) = -1 EEXIST (File exists)
> [pid 27028] 16:40:27 open("/dev/shm/psm_shm.0fff0fff-0000-0000-0000-0fff0fff0fff", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = -1 EACCES (Permission denied)

> Why should we open an issue with Intel or OpenMPI when everything works
> fine using mpirun to launch the MPI applications?

I suspect the MPI is detecting the srun launch and altering the run behavior, causing the failure. The environment is being handed down by Slurm as expected. Slurm has no direct control over how PSM is implemented. Based on your setup, cgroups are not active against either /dev/shm or the PSM drivers.
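To illustrate the failure mode outside of MPI: the same EEXIST/EACCES pattern can be reproduced with plain shell commands against a fixed filename in /dev/shm. This is purely an illustration (not taken from your strace); the file mode here is 0600 rather than 0700, but the effect is the same:

# first user: create a private file under a fixed name (like psm_shm.<fixed uuid>)
( umask 077; echo data > /dev/shm/psm_shm.demo )
ls -l /dev/shm/psm_shm.demo
# -rw------- 1 firstuser ... /dev/shm/psm_shm.demo

# second user: exclusive create fails because the name already exists ...
bash -o noclobber -c 'echo data > /dev/shm/psm_shm.demo'
# bash: /dev/shm/psm_shm.demo: cannot overwrite existing file

# ... and falling back to opening the existing file fails because it is private
cat /dev/shm/psm_shm.demo
# cat: /dev/shm/psm_shm.demo: Permission denied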
Hi Nate,

we found a solution to force the IntelMPI library to create different /dev/shm/psm_shm.UUID files when launching with srun.

We dumped the environment of all the processes launched by Intel mpirun. The process tree is:

1. mpirun launches Intel mpiexec.hydra
2. mpiexec.hydra launches srun
3. srun launches Intel pmi_proxy via slurmctld
4. pmi_proxy launches the MPI processes

Looking at the MPI process environment variables we found the variable I_MPI_HYDRA_UUID, and looking at the file in /dev/shm we see that its UUID matches that of the variable. When the MPI processes are launched with srun, the variable I_MPI_HYDRA_UUID is not set. So by setting it before running srun we can now force the creation of two different psm files. For example:

[ibaccare@node476 Programs]$ export I_MPI_HYDRA_UUID=`uuidgen`
[ibaccare@node476 Programs]$ srun -n 32 --ntasks-per-node=16 ~ibaccare/Programs/hello_mpi_sleep_impi

[afederic@node476 ~]$ export I_MPI_HYDRA_UUID=`uuidgen`
[afederic@node476 ~]$ srun -n 32 --ntasks-per-node=16 ~ibaccare/Programs/hello_mpi_sleep_impi

[root@master ~]# xdsh node[476,477] ls -ls /dev/shm/psm\*
node476: 97224 -rwx------ 1 ibaccare interactive 99557376 Nov 29 17:18 /dev/shm/psm_shm.122f7fba-a77b-47ed-ad43-4887319f8e44
node476: 97224 -rwx------ 1 afederic interactive 99557376 Nov 29 17:18 /dev/shm/psm_shm.3c038879-af69-428d-ac7c-463e85790821
node477: 97224 -rwx------ 1 ibaccare interactive 99557376 Nov 29 17:18 /dev/shm/psm_shm.122f7fba-a77b-47ed-ad43-4887319f8e44
node477: 97224 -rwx------ 1 afederic interactive 99557376 Nov 29 17:18 /dev/shm/psm_shm.3c038879-af69-428d-ac7c-463e85790821

We cannot find a similar solution for OpenMPI.

thanks
ale
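P.S. A minimal sketch of how this workaround could be wrapped into a batch script (untested; it simply sets a per-job UUID before srun, exactly as in the interactive example above):

#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
#SBATCH -p gll_usr_prod

# Give every job its own I_MPI_HYDRA_UUID so IntelMPI/PSM creates a distinct
# /dev/shm/psm_shm.<uuid> file per job instead of reusing a fixed name.
export PSM_RANKS_PER_CONTEXT=4
export I_MPI_HYDRA_UUID=$(uuidgen)

srun ./hello_mpi_sleep_impi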
Sorry Nate, I meant slurmstepd in the line below:

> 3. srun launches Intel pmi_proxy via slurmctld
Ale,

(In reply to Cineca HPC Systems from comment #21)
> we found a solution to force the IntelMPI library to create different
> /dev/shm/psm_shm.UUID files when launching with srun.

That is good to know. Thanks for reporting that back.

> We cannot find a similar solution for OpenMPI.

We are not aware of a similar solution, but this does look like a bug that the Open MPI team should look at. We suggest that you get in contact with them for a solution rather than have us look into a workaround.

For now, I am going to resolve this issue, since the best course of action would be to talk with the OpenMPI and Intel MPI teams. Please reply to reopen this ticket.

--Nate