Ticket 4222 - Add documentation for Slurm pmix support
Summary: Add documentation for Slurm pmix support
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Contributions
Version: 17.02.7
Hardware: Linux Linux
: --- 4 - Minor Issue
Assignee: Alejandro Sanchez
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-10-04 19:23 MDT by Chris Samuel
Modified: 2017-12-19 03:56 MST

See Also:
Site: University of Melbourne
Version Fixed: 17.11.1


Description Chris Samuel 2017-10-04 19:23:58 MDT
Hi there,

I'm helping out the Research Platforms folks here, who are upgrading from 16.05.x to 17.02.7. As part of this I thought it would be useful to get PMIx support enabled, but it looks like that is broken.

As posted to the mailing list (in case Ralph saw it and could shed some light for me):

PMIx v1.2.2: Slurm complains and tells me it wants v2.

PMIx v2.0.1: Slurm can't find it because the header files are not where Slurm looks for them. When I use a symlink hack to make PMIx detection work, the build then fails with:

/bin/sh ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../.. -I../../../../slurm  -I../../../.. -I../../../../src/common -I/usr/include -I/usr/local/pmix/latest/include -DHAVE_PMIX_VER=2   -g -O0 -pthread -Wall -g -O0 -fno-strict-aliasing -MT mpi_pmix_v2_la-pmixp_client.lo -MD -MP -MF .deps/mpi_pmix_v2_la-pmixp_client.Tpo -c -o mpi_pmix_v2_la-pmixp_client.lo `test -f 'pmixp_client.c' || echo './'`pmixp_client.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../.. -I../../../../slurm -I../../../.. -I../../../../src/common -I/usr/include -I/usr/local/pmix/latest/include -DHAVE_PMIX_VER=2 -g -O0 -pthread -Wall -g -O0 -fno-strict-aliasing -MT mpi_pmix_v2_la-pmixp_client.lo -MD -MP -MF .deps/mpi_pmix_v2_la-pmixp_client.Tpo -c pmixp_client.c  -fPIC -DPIC -o .libs/mpi_pmix_v2_la-pmixp_client.o
pmixp_client.c: In function ‘_set_procdatas’:
pmixp_client.c:468:24: error: request for member ‘size’ in something not a structure or union
   kvp->value.data.array.size = count;
                        ^
pmixp_client.c:482:24: error: request for member ‘array’ in something not a structure or union
   kvp->value.data.array.array = (pmix_info_t *)info;
                        ^
make[4]: *** [mpi_pmix_v2_la-pmixp_client.lo] Error 1


So I'm guessing that either I'm missing something (the documentation for PMIX in Slurm seems pretty much non-existent) or this is broken.

Any ideas?

All the best,
Chris
Comment 2 Alejandro Sanchez 2017-10-05 04:10:25 MDT
Hi Chris,

(In reply to Chris Samuel from comment #0)
> Hi there,
> 
> I'm helping out the Research Platforms folks here and they're upgrading from
> 16.05.x to 17.02.7.   As part of this I thought it would be useful to get
> PMIx support enabled, but it looks like that is broken.

I've just tried building locally and it worked for me, using the following component versions:
pmix v1.2
slurm 17.02.7
gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
 
> As posted to the mailing list (in case Ralph saw it and could shed some
> light for me):
> 
> PMIX v1.2.2: Slurm complains and tells me it wants v2.

Slurm 17.02.7 doesn't complain about that for me.

How are you building pmix and slurm?
Can I see the configure options / paths you use?
Which compiler / version are you using?
 
> PMIX v2.0.1: Slurm can't find it because the header files are not where it
> is looking for them, and when I do a symlink hack to make PMIX detection
> work it then fails to compile, saying:

Slurm does not support PMIx v2 yet. Bug 4131 is open to track this.
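
For anyone hitting the same detection problem, it can help to confirm which major PMIx version an install actually provides before pointing Slurm at it. The following is only a minimal sketch: it assumes the install ships a pmix_version.h header containing a line such as "#define PMIX_VERSION_MAJOR 2L" (an assumption, not something confirmed in this ticket, and older releases may lack the header). The function name pmix_major_version is hypothetical.

```shell
#!/bin/sh
# Print the PMIx major version recorded in a pmix_version.h header.
# ASSUMPTION: the header has a line like "#define PMIX_VERSION_MAJOR 2L".
pmix_major_version() {
    header="$1"
    # Fail early if the header is missing or unreadable.
    [ -r "$header" ] || return 1
    # Print the value of the define, stripping a trailing L suffix if present.
    awk '$1 == "#define" && $2 == "PMIX_VERSION_MAJOR" {
             sub(/[Ll]$/, "", $3); print $3; exit
         }' "$header"
}
```

It would be called with the path to the header under the PMIx install prefix, wherever that header lives on the system in question. An empty result means the define was not found.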

> So I'm guessing that either I'm missing something (the documentation for
> PMIX in Slurm seems pretty much non-existent) or this is broken.

I wrote an internal document, based on team advice, on how to build PMIx, Slurm and optionally Open MPI. I'll update the MPI guide on the SchedMD website to make that internal knowledge public.
 
> Any ideas?

In the meantime, here is an excerpt of the procedure from that document, which worked for me. Try following it and see if it helps; otherwise, report the steps you took and where you got stuck:

1. Install the following packages (doing so through APT worked well for me):

libevent-dev
libhwloc-dev
flex

2. Install pmix:

alex@ibiza:~/git$ git clone git@github.com:pmix/master.git pmix
alex@ibiza:~/git$ cd pmix
alex@ibiza:~/git/pmix$ git branch -a
alex@ibiza:~/git/pmix$ git checkout v1.2
alex@ibiza:~/git/pmix$ ./autogen.sh
alex@ibiza:~/git/pmix$ cd ..
alex@ibiza:~/git$ mkdir pmix_build
alex@ibiza:~/git$ cd pmix_build
alex@ibiza:~/git/pmix_build$ mkdir ../pmix_install
alex@ibiza:~/git/pmix_build$ ../pmix/configure --prefix=/home/alex/git/pmix_install
alex@ibiza:~/git/pmix_build$ make -j install >/dev/null
alex@ibiza:~/git/pmix_build$ cd ../pmix_install
alex@ibiza:~/git/pmix_install$ ls
include  lib  share
alex@ibiza:~/git/pmix_install$

3. Install slurm:

alex@ibiza:~/slurm/17.02/ibiza/slurm$ ../../slurm/configure \
--prefix=/home/alex/slurm/17.02/ibiza --enable-multiple-slurmd \
--enable-developer --enable-memory-leak-debug \
--with-pmix=/home/alex/git/pmix_install
...
checking for hwloc installation... /usr
checking for pmix installation... /home/alex/git/pmix_install
...
alex@ibiza:~/slurm/17.02/ibiza/slurm$ make -j install > /dev/null
alex@ibiza:~/slurm/17.02/ibiza/slurm$ ls -l ../lib/slurm | grep pmi
-rw-r--r-- 1 alex alex  979866 sep  8 13:19 mpi_pmi2.a
-rwxr-xr-x 1 alex alex     962 sep  8 13:19 mpi_pmi2.la
-rwxr-xr-x 1 alex alex  385960 sep  8 13:19 mpi_pmi2.so
lrwxrwxrwx 1 alex alex      16 sep  8 13:19 mpi_pmix.so -> ./mpi_pmix_v1.so
-rw-r--r-- 1 alex alex  907236 sep  8 13:19 mpi_pmix_v1.a
-rwxr-xr-x 1 alex alex    1118 sep  8 13:19 mpi_pmix_v1.la
-rwxr-xr-x 1 alex alex  400256 sep  8 13:19 mpi_pmix_v1.so
alex@ibiza:~/slurm/17.02/ibiza/slurm$ cd contribs/pmi2
alex@ibiza:~/slurm/17.02/ibiza/slurm/contribs/pmi2$ make install
alex@ibiza:~/slurm/17.02/ibiza/slurm/contribs/pmi2$ cd ../../../lib
alex@ibiza:~/slurm/17.02/ibiza/lib$ ls -l | grep pmi2
-rw-r--r-- 1 alex alex   114512 sep  8 16:46 libpmi2.a
-rwxr-xr-x 1 alex alex      958 sep  8 16:46 libpmi2.la
lrwxrwxrwx 1 alex alex       16 sep  8 16:46 libpmi2.so -> libpmi2.so.0.0.0
lrwxrwxrwx 1 alex alex       16 sep  8 16:46 libpmi2.so.0 -> libpmi2.so.0.0.0
-rwxr-xr-x 1 alex alex    81976 sep  8 16:46 libpmi2.so.0.0.0
alex@ibiza:~/slurm/17.02/ibiza/lib$
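
The "ls -l ../lib/slurm | grep pmi" check in step 3 can be wrapped in a small helper. This is a sketch only; the function name list_pmix_plugins is hypothetical, and it assumes the plugins follow the mpi_pmix*.so naming seen in the listing above.

```shell
#!/bin/sh
# List the PMIx MPI plugins found in a Slurm plugin directory
# (for example <prefix>/lib/slurm), one file name per line.
# ASSUMPTION: plugins are named mpi_pmix*.so, as in the listing above.
list_pmix_plugins() {
    plugin_dir="$1"
    for f in "$plugin_dir"/mpi_pmix*.so; do
        # When nothing matches, the glob is left unexpanded; skip it.
        [ -e "$f" ] || continue
        basename "$f"
    done
}
```

With the paths from this comment, the call would be "list_pmix_plugins /home/alex/slurm/17.02/ibiza/lib/slurm". No output means no PMIx plugin was built. On an installed system, "srun --mpi=list" also reports the MPI plugin types that srun can use.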
 
> All the best,
> Chris
Comment 3 Chris Samuel 2017-10-05 04:11:02 MDT
Hi there, thanks for your email.

I now work part time at Melbourne Bioinformatics (MB, formerly known as VLSCI).

Currently I work Monday, Wednesday and Thursday.

If your email is about the MB supercomputers then can you
please resend it to the VLSCI helpdesk at:

    help@vlsci.org.au

For requests to join the Beowulf list please wait for my response.

For other aspects of MB please see our website for details:

https://www.melbournebioinformatics.org.au/contact-us/

Otherwise I will attend to it on my return.

All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: samuel@unimelb.edu.au Phone: +61 (0)3 903 55545
Comment 4 Alejandro Sanchez 2017-10-05 04:13:35 MDT
One note. When configuring Slurm, you don't need the following options:

--enable-multiple-slurmd --enable-developer --enable-memory-leak-debug
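
With those three options dropped, the configure step from comment 2 reduces to just the prefix and the PMIx location. The sketch below only prints the resulting command rather than running it. The function name slurm_configure_cmd is hypothetical, and the relative path ../../slurm/configure is taken from the build layout in comment 2, so it will differ on other systems.

```shell
#!/bin/sh
# Print (do not run) a minimal Slurm configure command with PMIx support.
# $1: Slurm install prefix, $2: PMIx install prefix.
# ASSUMPTION: configure is reached as ../../slurm/configure, as in comment 2.
slurm_configure_cmd() {
    slurm_prefix="$1"
    pmix_prefix="$2"
    printf '%s\n' "../../slurm/configure --prefix=$slurm_prefix --with-pmix=$pmix_prefix"
}
```

For the paths in comment 2 this would be "slurm_configure_cmd /home/alex/slurm/17.02/ibiza /home/alex/git/pmix_install"; the printed line can then be reviewed and run from the build directory.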
Comment 5 Chris Samuel 2017-10-08 22:36:25 MDT
Hi there,

I've just had a chance to try again and now it works.

I suspect this is because I fixed a problem I hit while trying to install PMIx v2: the people who built this system didn't install a C++ compiler, so PMIx v2 failed during configure, saying that pthreads wasn't available (that took a while to track down).

Anyway, I've just done a test configure/build and it worked fine, so I think this was a local system config issue.  Hey ho!   It's too late for their outage window, so we'll stick with PMI2 for now.
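
Since the root cause here was a missing build tool that surfaced as a misleading configure error, a quick pre-flight check before running configure may save some time. This is a rough sketch; the function name missing_tools is hypothetical, and the list of tools to pass in is up to the site.

```shell
#!/bin/sh
# Print each named tool that cannot be found on PATH, one per line.
# No output means every tool was found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

A call such as "missing_tools gcc g++ make flex git" would cover the tools used in this ticket, where g++ stands in for whichever C++ compiler the site uses (an assumption).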

Sorry to bother you all..

All the best,
Chris
Comment 6 Alejandro Sanchez 2017-10-11 03:53:59 MDT
(In reply to Chris Samuel from comment #5)
> Hi there,
> 
> I've just had a chance to try again and now it works.
> 
> I suspect it might be because of fixing a problem I hit trying to get PMIx
> v2 installed which was the people who built this system didn't install a C++
> compiler which PMIx v2 failed during configure saying that pthreads wasn't
> available (took a while to track that one down).
> 
> Anyway I've just done a test configure/build and it worked fine now so I
> think this was a local system config issue.  Hey ho!   Too late for their
> outage window so we'll stick with PMI2 for now.
> 
> Sorry to bother you all..
> 
> All the best,
> Chris

No problem. I'll keep this open to add some guidance to our web page on how to build Slurm with PMIx support.
Comment 7 Chris Samuel 2017-10-11 03:54:28 MDT
Hi there, thanks for your email.

I now work part time at Melbourne Bioinformatics (MB, formerly known as VLSCI).

Normally I work Monday, Wednesday and Thursday but
this week I am working Monday to Wednesday and away
Thursday & Friday.

If your email is about the MB supercomputers then can you
please resend it to the VLSCI helpdesk at:

    help@vlsci.org.au

For requests to join the Beowulf list please wait for my response.

For other aspects of MB please see our website for details:

https://www.melbournebioinformatics.org.au/contact-us/

Otherwise I will attend to it on my return.

All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: samuel@unimelb.edu.au Phone: +61 (0)3 903 55545
Comment 8 Chris Samuel 2017-12-18 05:14:04 MST
Hi there, thanks for your email.

I'm afraid that I'm leaving the University of Melbourne to take a position at Swinburne University of Technology Centre for Astrophysics and Supercomputing as part of the ARC Centre of Excellence for Gravitational Wave Discovery (OzGrav).

If your email is about the MB supercomputers then can you
please resend it to the VLSCI helpdesk at:

    help@vlsci.org.au

For management issues please contact Andrew Isaac at:

    aisaac@unimelb.edu.au

For other aspects of MB please see our website for details:

https://www.melbournebioinformatics.org.au/contact-us/

All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: samuel@unimelb.edu.au Phone: +61 (0)3 903 55545
Comment 11 Alejandro Sanchez 2017-12-19 03:55:26 MST
Documentation added in the following commit, available in 17.11.1 and later:

commit 34209c471a29aeb5cf44e3521c9172c30f4b8dbb (HEAD -> slurm-17.11, origin/slurm-17.11)
Author:     Alejandro Sanchez <alex@schedmd.com>
AuthorDate: Tue Dec 19 11:53:34 2017 +0100
Commit:     Alejandro Sanchez <alex@schedmd.com>
CommitDate: Tue Dec 19 11:53:34 2017 +0100

    Docs - add Slurm/PMIx and OpenMPI build notes to the mpi_guide page.
    
    Bug 4222.