Bug 12376 - PMIx auto-detection not working on Ubuntu
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Build System and Packaging
Version: 21.08.0
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Tim McMullan
Reported: 2021-08-27 15:21 MDT by Felix Abecassis
Modified: 2022-04-28 12:28 MDT
Site: NVIDIA (PSLA)
Version Fixed: 22.05pre1
Description Felix Abecassis 2021-08-27 15:21:38 MDT
Inside an Ubuntu 20.04 Docker container, libpmix-dev is installed:
$ dpkg -l | grep pmix
ii  libpmix-dev:amd64                  3.1.5-1                           amd64        Development files for the PMI Exascale library
ii  libpmix2:amd64                     3.1.5-1                           amd64        Process Management Interface (Exascale) library

$ ./configure
[...]

$ grep HAVE_PMIX_ config.log
HAVE_PMIX_FALSE=''
HAVE_PMIX_TRUE='#'

$ make && make install
[...]

$ srun --mpi=list
srun: MPI types are...
srun: cray_shasta
srun: none
srun: pmi2


PMIx support was not built automatically despite the presence of libpmix-dev on Ubuntu. The problem doesn't seem to happen on CentOS.

It's likely caused by Ubuntu putting pmix.h in a different location that is not in the normal include path for the compiler:
$ dpkg -L libpmix-dev | grep pmix.h
/usr/lib/x86_64-linux-gnu/pmix/include/pmix.h


Using "./configure --with-pmix=/usr/lib/x86_64-linux-gnu/pmix" solves the issue, but it's easy to miss that PMIx support was not enabled during a build. Hopefully it is a simple fix.
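
The manual lookup above could be scripted roughly as follows. This is only a sketch: the helper name find_pmix_prefix and the candidate list are assumptions for illustration (/usr as the conventional default prefix, plus the Ubuntu 20.04 path reported above), not what Slurm's configure does.

```shell
#!/bin/sh
# Print the first candidate prefix that contains include/pmix.h.
# Usage: find_pmix_prefix DIR...
# (hypothetical helper, not part of Slurm)
find_pmix_prefix() {
    for dir in "$@"; do
        if [ -f "$dir/include/pmix.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate prefixes are an assumption; extend the list for other layouts.
if pmix_prefix=$(find_pmix_prefix /usr /usr/lib/x86_64-linux-gnu/pmix); then
    echo "suggested: ./configure --with-pmix=$pmix_prefix"
else
    echo "pmix.h not found in the candidate prefixes" >&2
fi
```

Passing the printed prefix explicitly to --with-pmix avoids depending on auto-detection at all.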

Note that OpenMPI 5.0 (due this year) will remove PMI2 support, so PMIx support when building Slurm will become a requirement for OpenMPI users:
https://github.com/open-mpi/ompi/blob/9fecace67f26752d4b86a92c5ee6443bd97f79b7/config/ompi_deleted_options.m4#L32-L42
Comment 1 Felix Abecassis 2021-08-27 16:00:59 MDT
I was about to say that the build system could rely on "pkg-config", but libpmix-dev in Ubuntu 20.04 does not ship pmix.pc, and libpmix-dev in Ubuntu 21.04 ships a broken pmix.pc that does not use the correct paths. Debian Sid has a working pmix.pc, though.
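
Since a pmix.pc can exist and still point at the wrong place, a build script might sanity-check it before trusting it. A minimal sketch, assuming pmix.pc defines the conventional "includedir" variable; the helper name pmix_header_in is made up for illustration.

```shell
#!/bin/sh
# Succeed if DIR is non-empty and contains pmix.h.
# Usage: pmix_header_in DIR
# (hypothetical helper, not part of Slurm)
pmix_header_in() {
    [ -n "$1" ] && [ -f "$1/pmix.h" ]
}

# Trust pkg-config only if pmix.pc exists AND its include directory is real.
# Assumption: pmix.pc exposes an "includedir" variable.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists pmix; then
    incdir=$(pkg-config --variable=includedir pmix)
    if pmix_header_in "$incdir"; then
        echo "pmix.pc looks usable (includedir=$incdir)"
    else
        echo "pmix.pc found, but pmix.h is not in '$incdir'" >&2
    fi
else
    echo "no pmix.pc available; fall back to --with-pmix=PATH" >&2
fi
```

The second branch is the Ubuntu 21.04 case described above (a pmix.pc with wrong paths), and the last branch is the Ubuntu 20.04 case (no pmix.pc at all).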
Comment 3 Tim McMullan 2021-08-30 10:15:29 MDT
Hi Felix,

I'm looking at a fix for PMIx auto-detection on Ubuntu, but I also wanted to make you aware of a new behavior in 21.08.

If you specify "--with-pmix" without a path and we can't find PMIx, configure will now simply fail.  In the past this was not the case: even passing the option did not mean "find the library or fail".  The same applies if you specify a path that doesn't contain PMIx.

Thanks!
--Tim
Comment 4 Felix Abecassis 2021-08-30 10:19:01 MDT
Ah, that's a welcome change that should lead to less confusion, thanks!

I've filed a bug against Ubuntu, and the pkg-config support of the pmix package will be fixed in Ubuntu 21.10 (and in 22.04, the next LTS), in case you want to rely on pkg-config too.

https://bugs.launchpad.net/ubuntu/+source/pmix/+bug/1941927
Comment 8 Felix Abecassis 2021-08-31 18:53:56 MDT
On Ubuntu 21.04, I can't compile Slurm 21.08 with PMIx support when using the libpmix-dev package from the distro.

I used the following Dockerfile:
```
FROM ubuntu:21.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y --no-install-recommends \
        curl \
        ca-certificates \
        build-essential \
        autoconf \
        automake \
        libtool \
        libpmix-dev \
        libhwloc-dev \
        libmysqlclient-dev \
        libcurl4-gnutls-dev \
        libmunge-dev \
        liblua5.3-dev lua5.3 \
        python3 \
        libyaml-dev && \
    rm -rf /var/lib/apt/lists/*

RUN cd /usr/local/src && \
    curl --proto '=https' -fSsL https://github.com/schedmd/slurm/archive/slurm-21-08-0-1.tar.gz | tar -xz && \
    cd slurm-* && \
    ./configure --prefix=/usr/local --sysconfdir=/etc/slurm --disable-debug --with-pmix=/usr/lib/x86_64-linux-gnu/pmix2 && \
    make -j && \
    make install-strip && \
    ldconfig

RUN mkdir -p /etc/slurm && \
    printf 'ClusterName=test\nSlurmctldHost=localhost\n' > /etc/slurm/slurm.conf
```

./configure doesn't complain about the pmix path, since it can find pmix.h there, but later it silently discards the path, apparently because it is confused by PMIx v4. So it looks like there are still lingering issues on this side.

Inside the container:
$ srun --mpi=list
srun: MPI types are...
srun: cray_shasta
srun: none
srun: pmi2

$ grep HAVE_PMIX config.log
HAVE_PMIX_FALSE=''
HAVE_PMIX_TRUE='#'
HAVE_PMIX_V1_FALSE=''
HAVE_PMIX_V1_TRUE='#'
HAVE_PMIX_V2_FALSE=''
HAVE_PMIX_V2_TRUE='#'
HAVE_PMIX_V3_FALSE=''
HAVE_PMIX_V3_TRUE='#'
HAVE_PMIX_V4_FALSE='#'
HAVE_PMIX_V4_TRUE=''

$ ls /usr/local/lib/slurm/mpi_*.so
/usr/local/lib/slurm/mpi_cray_shasta.so  /usr/local/lib/slurm/mpi_none.so  /usr/local/lib/slurm/mpi_pmi2.s


So HAVE_PMIX_V4 is enabled (HAVE_PMIX_V4_TRUE is empty) while HAVE_PMIX is disabled (HAVE_PMIX_FALSE is empty) at the same time.
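
For reference, Automake sets NAME_TRUE='' and NAME_FALSE='#' when a conditional holds, and the reverse when it does not. A small sketch for reading those values from config.log; the helper name am_cond_enabled is made up for illustration.

```shell
#!/bin/sh
# Succeed if the Automake conditional NAME is enabled in CONFIG_LOG,
# i.e. the log contains the line NAME_TRUE=''.
# Usage: am_cond_enabled NAME CONFIG_LOG
# (hypothetical helper, not part of Slurm)
am_cond_enabled() {
    grep -q "^${1}_TRUE=''\$" "$2"
}

# Report the two conditionals discussed above, if a config.log is present.
log=config.log
if [ -f "$log" ]; then
    for cond in HAVE_PMIX HAVE_PMIX_V4; do
        if am_cond_enabled "$cond" "$log"; then
            echo "$cond: enabled"
        else
            echo "$cond: disabled"
        fi
    done
fi
```

With the config.log shown above, this would report HAVE_PMIX as disabled and HAVE_PMIX_V4 as enabled, which is the inconsistent combination.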

A test for v4 probably needs to be added here:
https://github.com/SchedMD/slurm/blob/c9ccc37e6db7baad002bd1ef8fb97bae812ddf13/auxdir/x_ac_pmix.m4#L189-L191
I patched it manually, and mpi_pmix_v4.so was generated, but somehow it still doesn't show in "srun --mpi=list", so something else might be missing.
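
Because the build succeeds either way, a post-install check for the plugin file can make a missing PMIx plugin visible. A minimal sketch, assuming the plugin directory that follows from --prefix=/usr/local in the Dockerfile above; the helper name has_pmix_plugin is made up for illustration.

```shell
#!/bin/sh
# Succeed if at least one mpi_pmix*.so plugin exists in PLUGIN_DIR.
# Usage: has_pmix_plugin PLUGIN_DIR
# (hypothetical helper, not part of Slurm)
has_pmix_plugin() {
    for plugin in "$1"/mpi_pmix*.so; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$plugin" ]; then
            return 0
        fi
    done
    return 1
}

# Assumption: plugins are installed under /usr/local/lib/slurm.
plugin_dir=/usr/local/lib/slurm
if has_pmix_plugin "$plugin_dir"; then
    echo "PMIx plugin present in $plugin_dir"
else
    echo "no mpi_pmix*.so in $plugin_dir" >&2
fi
```

In a Dockerfile, the else branch could exit non-zero to fail the image build. As noted above, the plugin file existing was not enough for it to appear in "srun --mpi=list", so this only catches the missing-plugin case.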
Comment 9 Artem Polyakov 2021-08-31 18:59:27 MDT
Please see
https://bugs.schedmd.com/show_bug.cgi?id=12396
Comment 10 Felix Abecassis 2021-08-31 19:03:58 MDT
This other issue mentions PMIx > 4 (so v5). Are you saying it also fixes PMIx v4 support?
Comment 11 Artem Polyakov 2021-08-31 21:00:29 MDT
It should, v4 isn’t supported in the current codebase.
Comment 12 Artem Polyakov 2021-08-31 21:03:02 MDT
We were working on adding v4 support + some functionality for quite a while:
https://bugs.schedmd.com/show_bug.cgi?id=7263

But that one unfortunately got stuck, so the fix above only allows building with PMIx v4, without any new functionality.
Comment 21 Tim McMullan 2022-04-28 11:07:03 MDT
Hey Felix,

This definitely was caught up in some other PMIx work.  Some of that has settled out for 22.05, so we've now landed a fix for this in 22.05.

I also added something that should work if the pkg-config data is correct.

Thanks, and let me know if you have any issues!  I'll resolve this for now.
--Tim
Comment 22 Felix Abecassis 2022-04-28 12:28:56 MDT
I confirm the auto-detection is now working fine on the "master" branch with Ubuntu 22.04 and libpmix-dev=4.1.2-2ubuntu1.

Thank you!