Ticket 4627

Summary: mpi/pmix plugin needs to load libpmix symbols dynamically
Product: Slurm Reporter: Philip Kovacs <pkdevel>
Component: ContributionsAssignee: Unassigned Developer <dev-unassigned>
Status: OPEN --- QA Contact:
Severity: 5 - Enhancement    
Priority: --- CC: artpol84, cvalvin, karasev.b, mschmit, sts
Version: 17.11.2   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=2443
https://bugs.schedmd.com/show_bug.cgi?id=7806
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: slurm_pmix_symbols.patch
slurm_pmix_symbols.patch
harden.tar.bz2
PMIx plugin patch: fixes symbol issue
harden.tar.bz2 patch

Description Philip Kovacs 2018-01-13 19:43:05 MST
Created attachment 5918 [details]
slurm_pmix_symbols.patch

ref: git show c539d34684 
ref: git show 05f239055b

The plugin mpi/pmix was changed with the above commits so that it uses dlopen on  libpmix.so instead of directly linking to it.

The problem is that the pmix symbols are still referenced in the plugin code as if the library was directly linked to mpi_pmix.so.   This renders the mpi_pmix.so plugin unloadable in hardened environments since the pmix symbols are only available after the dlopen occurs.  

The proper way to handle this is to dlopen libpmix and then dlsym each symbol needed.

Patch attached.
Comment 1 Philip Kovacs 2018-01-13 19:53:12 MST
Created attachment 5919 [details]
slurm_pmix_symbols.patch
Comment 2 Philip Kovacs 2018-01-15 18:43:27 MST
Created attachment 5922 [details]
harden.tar.bz2

Also attached is a small sample program I wrote to demonstrate the issue w.r.t. hardened linker flags.
Comment 3 Artem Polyakov 2018-01-16 14:32:58 MST
Philip,
We talked to Danny and we think we have a better solution for this. Need to check it and we will update shortly.
The main reason of dlopen’ing Pmix library is that we need to make sure that symbols in it have global visibility. This is important for Pmix components to work.
Danny says they were solving it by linking directly and then additionally dlopening with global flag.
If this will work as expected the fix would be much simpler: just change compile flags to link against Pmix.

One question: can you clarify term hardened environments?
Comment 4 Philip Kovacs 2018-01-16 15:12:23 MST
I attached a sample program yesterday (harden.tar.bz) that very clearly documents the hardened environment problem in simplest terms.  Please take a moment look at it.  

tar -xjf harden.tar.bz2
cd harden
make
./client

Essentially, LDFLAGS=-Wl,z,relro,-z,now will disallow any attempt to dlopen using RTDL_LAZY and instead use RTDL_NOW.

You should verify that plugins are built in this manner with 

readelf -d <plugin.so> | grep BIND_NOW

If you see BIND_NOW, the plugin cannot be loaded with unresolved symbols.

I hope that you will look at this problem generally and take steps toward hardening slurm.  

On a different bug I supplied an extensive patch to weaken certain symbols that exist in one context but not another.  

For example, both srun and slurmd load the select plugins, but only one of the two executables resolves all of the symbols in certain select plugin.
Comment 5 Artem Polyakov 2018-01-16 15:39:19 MST
Thank you Philip.
We will carefully check all of the info you provided and reply ASAP (I’m traveling now so haven’t checked tarball from my phone).
We will check with you before applying the changes.
Comment 6 Boris Karasev 2018-01-17 09:01:40 MST
Created attachment 5930 [details]
PMIx plugin patch: fixes symbol issue
Comment 7 Boris Karasev 2018-01-17 09:02:41 MST
Created attachment 5931 [details]
harden.tar.bz2 patch
Comment 8 Boris Karasev 2018-01-17 09:07:08 MST
Philip,
I looked the attached "harden" sample program, and applied to it the fix that mentioned above: direct plugin linking during compile. This fix is resolve the symbols problem for "harden".
Also I reproduced symbols issue for PMIx plugin by built the plugin with "LDFLAGS=-Wl,-z,relro,-z,now". Then applied the similar fix to the plugin, which allowed to avoid the symbol issue for PMIx plugin.
Please see the "harden" patch and PMIx pluging patch in attachment.
Can you confirm those patches are resolve the issue? Please don't forget rerun ./autogen after Slurm patching.
Comment 9 Philip Kovacs 2018-01-17 11:27:02 MST
Boris,

I applied your patch and the mpi/pmix plugin is able to load and run w/hardened flags now.  

$ srun --mpi=list hostname
srun: MPI types are...
srun: none
srun: openmpi
srun: pmi2
srun: pmix
srun: pmix_v1

$ srun --mpi=pmix hostname
p28vm1

$ readelf -d mpi_pmix_v1.so | grep BIND_NOW
 0x0000000000000018 (BIND_NOW) 


It's a little unusual to have to adjust the visibility of the symbols in libpmix with dlopen and RTLD_GLOBAL.  I suppose libpmix on some systems was built with the local visibility settings and you're forced to change them... 

All good here now.  Thanks much for your help.
Comment 10 Philip Kovacs 2018-01-17 11:32:15 MST
Also,  this bug https://bugs.schedmd.com/show_bug.cgi?id=4361 is the exact same issue (relro, now) for a different plugin.   In that case, a link to libnuma is missing.
Comment 11 Tim Wickberg 2018-01-17 13:47:01 MST
Hey folks -

At the moment, SchedMD has no customers pushing for these hardened linking flags, and we internally have no interest in applying these right now.

Even if we do explore these changes, they are significant enough that they'd need to wait for an upcoming major release for consideration.

If we do tackle this, PMIx is not the only affected library, and work will need to be done to ensure all similarly affected libraries are handled (e.g., UCX, Cray PMI), rather than trying to handle this piecemeal.

I'm retagging this as an unassigned enhancement request, to possibly be explored at a later point in time.

- Tim