Ticket 8414 - undefined symbols within shared libs for CentOS 8.1 build of slurm 19.05.5
Summary: undefined symbols within shared libs for CentOS 8.1 build of slurm 19.05.5
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 19.05.5
Hardware: Linux Linux
: --- 6 - No support contract
Assignee: Jacob Jenson
 
Reported: 2020-01-29 09:40 MST by stephan.walter
Modified: 2020-02-13 08:41 MST
3 users

Site: -Other-


Description stephan.walter 2020-01-29 09:40:46 MST
Hi

Building Slurm 19.05.5 in a CentOS 8 container is possible after applying the following patch to the slurm.spec file, but we see problems with the resulting Slurm shared libraries that do not occur when we build the same source in a CentOS 7 container.

diff org/slurm-19.05.5/slurm.spec centos8/slurm-19.05.5/slurm.spec
65c65
< BuildRequires: python
---
> BuildRequires: python3
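
The same one-line change can be scripted before running rpmbuild. This is a minimal sketch, assuming GNU sed (as shipped with CentOS) and that the spec line reads exactly `BuildRequires: python`, as in the diff above; the helper name `patch_python_buildreq` and the path in the usage comment are hypothetical:

```shell
# Hypothetical helper: switch the Python build dependency to python3,
# matching the diff above. Only a line that is exactly
# "BuildRequires: python" is rewritten; lines such as
# "BuildRequires: python-devel" are left untouched.
patch_python_buildreq() {
    spec="$1"
    sed -i 's/^BuildRequires: python$/BuildRequires: python3/' "$spec"
}

# Example (path is an assumption):
# patch_python_buildreq slurm-19.05.5/slurm.spec
```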

Some commands, such as scontrol, seem to work:
    root@slurm_master / $  scontrol PING
    Slurmctld(primary) at slurm_master is UP

but others, such as sinfo, do not:
    root@slurm_master / $  sinfo
    sinfo: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/select_cons_res.so): /usr/lib64/slurm/select_cons_res.so: undefined symbol: powercap_get_cluster_current_cap
    sinfo: error: Couldn't load specified plugin name for select/cons_res: Dlopen of plugin file failed
    sinfo: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/select_cons_tres.so): /usr/lib64/slurm/select_cons_tres.so: undefined symbol: powercap_get_cluster_current_cap
    sinfo: error: Couldn't load specified plugin name for select/cons_tres: Dlopen of plugin file failed
    sinfo: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/select_cray_aries.so): /usr/lib64/slurm/select_cray_aries.so: undefined symbol: unlock_slurmctld
    sinfo: error: Couldn't load specified plugin name for select/cray_aries: Dlopen of plugin file failed
    sinfo: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/select_linear.so): /usr/lib64/slurm/select_linear.so: undefined symbol: slurm_job_preempt_mode
    sinfo: error: Couldn't load specified plugin name for select/linear: Dlopen of plugin file failed
    sinfo: fatal: Can't find plugin for select/cons_res
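
One way to narrow this down on the affected host is to check whether a failing plugin really leaves the reported symbol undefined, and whether it was linked with immediate binding (`-z now`). With immediate binding, glibc's dynamic loader resolves every symbol when `dlopen()` loads the object, even if the caller asked for `RTLD_LAZY`, so a symbol that only the Slurm daemons would provide makes the load fail in a client such as sinfo. That this is the cause here is an assumption at this point in the thread. The sketch below assumes binutils `readelf` and `nm` are installed; `check_plugin` is a hypothetical name, not part of Slurm:

```shell
# Hypothetical helper (not part of Slurm): report whether a plugin was
# linked with immediate binding and whether it leaves a symbol undefined.
# Requires readelf and nm from binutils.
check_plugin() {
    plugin="$1"   # e.g. /usr/lib64/slurm/select_cons_res.so
    symbol="$2"   # e.g. powercap_get_cluster_current_cap

    # Linking with -z now shows up as "(FLAGS) BIND_NOW" and/or
    # "(FLAGS_1) Flags: NOW" in the dynamic section.
    if readelf -d "$plugin" | grep -Eq 'BIND_NOW|Flags:.* NOW'; then
        echo "$plugin: immediate binding (BIND_NOW)"
    else
        echo "$plugin: lazy binding"
    fi

    # nm marks symbols the object expects another object to provide with "U".
    if nm -D --undefined-only "$plugin" | grep -qw "$symbol"; then
        echo "$plugin: $symbol is undefined in the plugin itself"
    fi
}

# Example:
# check_plugin /usr/lib64/slurm/select_cons_res.so powercap_get_cluster_current_cap
```

If both lines are printed for a plugin, the load failure is consistent with immediate binding rather than with a genuinely missing library.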

Maybe this problem is related to https://bugs.schedmd.com/show_bug.cgi?id=2443

If required, we can provide the resulting RPMs (~19 MB).

Best Regards,

Stephan Walter
Comment 1 Regine Gaudin 2020-02-03 03:42:45 MST
Hi, I'm updating this bug because I see the same problem after building Slurm 19.05.5 on CentOS 8:

sacct
sacct: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/accounting_storage_slurmdbd.so): /usr/lib64/slurm/accounting_storage_slurmdbd.so: undefined symbol: unlock_slurmctld
sacct: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
sacct: error: cannot create accounting_storage context for accounting_storage/slurmdbd
Slurm unable to initialize storage plugin

slurmctld -D -vv
slurmctld: debug:  Log file re-opened
slurmctld: pidfile not locked, assuming no running daemon
slurmctld: error: Configured MailProg is invalid
slurmctld: slurmctld version 19.05.3-2 started on cluster vm
slurmctld: Munge credential signature plugin loaded
slurmctld: debug:  Munge authentication plugin loaded
slurmctld: Cray/Aries node selection plugin loaded
slurmctld: Linear node selection plugin loaded with argument 4356
slurmctld: Consumable Resources (CR) Node Selection plugin loaded with argument 4356
slurmctld: select/cons_tres loaded with argument 4356
slurmctld: preempt/none loaded
slurmctld: debug:  Checkpoint plugin loaded: checkpoint/none
slurmctld: debug:  AcctGatherEnergy NONE plugin loaded
slurmctld: debug:  AcctGatherProfile NONE plugin loaded
slurmctld: debug:  AcctGatherInterconnect NONE plugin loaded
slurmctld: debug:  AcctGatherFilesystem NONE plugin loaded
slurmctld: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/jobacct_gather_linux.so): /usr/lib64/slurm/jobacct_gather_linux.so: undefined symbol: proctrack_g_get_pids
slurmctld: error: Couldn't load specified plugin name for jobacct_gather/linux: Dlopen of plugin file failed
slurmctld: error: cannot create jobacct_gather context for jobacct_gather/linux
slurmctld: fatal: failed to initialize jobacct_gather plugin

It has a high impact.
Thanks
Comment 2 stephan.walter 2020-02-13 07:34:41 MST
I have also tested version 20.02.0-0rc1 and still see undefined symbol errors. It is even worse: slurmctld exits immediately.

slurmctld: slurmctld version 20.02.0-0rc1 started on cluster linux
slurmctld: Munge credential signature plugin loaded
slurmctld: debug:  Munge authentication plugin loaded
slurmctld: Cray/Aries node selection plugin loaded
slurmctld: preempt/none loaded
slurmctld: debug:  AcctGatherEnergy NONE plugin loaded
slurmctld: debug:  AcctGatherProfile NONE plugin loaded
slurmctld: debug:  AcctGatherInterconnect NONE plugin loaded
slurmctld: debug:  AcctGatherFilesystem NONE plugin loaded
slurmctld: debug2: No acct_gather.conf file (/etc/slurm/acct_gather.conf)
slurmctld: debug:  Job accounting gather NOT_INVOKED plugin loaded
slurmctld: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/prep_script.so): /usr/lib64/slurm/prep_script.so: undefined symbol: run_script
slurmctld: error: Couldn't load specified plugin name for prep/script: Dlopen of plugin file failed
slurmctld: error: prep_plugin_init: cannot create prep context for prep/script
slurmctld: fatal: failed to initialize prep plugin
Comment 3 stephan.walter 2020-02-13 08:40:57 MST
Hi, 

I was able to solve the problem with the explanation from https://bugs.schedmd.com/show_bug.cgi?id=2443

Here too, the problem is caused by the hardened build. The following patch fixed it for me:

309a310,313
> %undefine _hardened_build
> %global _hardened_cflags "-Wl,-z,lazy"
> %global _hardened_ldflags "-Wl,-z,lazy"
>
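
After rebuilding with a spec change like this, one way to confirm that none of the installed plugins still carries the immediate-binding flag is to scan the plugin directory. This is a sketch, assuming the plugins live in `/usr/lib64/slurm` as in the logs in this ticket and that binutils `readelf` is available; `find_bind_now_plugins` is a hypothetical name:

```shell
# Hypothetical helper: list plugins in a directory that were still linked
# with -z now. Prints nothing if every plugin uses lazy binding.
find_bind_now_plugins() {
    dir="${1:-/usr/lib64/slurm}"
    for so in "$dir"/*.so; do
        [ -e "$so" ] || continue          # glob matched nothing
        # -z now appears as "(FLAGS) BIND_NOW" and/or "(FLAGS_1) Flags: NOW".
        if readelf -d "$so" | grep -Eq 'BIND_NOW|Flags:.* NOW'; then
            echo "$so"
        fi
    done
}

# Example:
# find_bind_now_plugins /usr/lib64/slurm
```

An empty result means every plugin in the directory was linked for lazy binding. Dropping `-z now` also means these objects no longer get full RELRO, one of the protections the hardened build is meant to provide.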


It would be great if this problem could be resolved without this modification.

Best Regards,

Stephan
Comment 4 stephan.walter 2020-02-13 08:41:45 MST
I forgot to mention which file I modified:

slurm.spec