Ticket 5615 - sshare gives error undefined symbol: sort_part_tier
Status: RESOLVED DUPLICATE of ticket 5552
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 17.11.9
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Alejandro Sanchez
 
Reported: 2018-08-23 07:15 MDT by Bjørn-Helge Mevik
Modified: 2018-08-24 02:49 MDT
Site: Sigma2 Norway


Attachments
Log of rpmbuild command (1.29 MB, text/x-log), 2018-08-23 07:15 MDT, Bjørn-Helge Mevik
Main slurm config file (5.16 KB, text/plain), 2018-08-23 07:16 MDT, Bjørn-Helge Mevik
Node definition file (1.63 KB, text/plain), 2018-08-23 07:17 MDT, Bjørn-Helge Mevik

Description Bjørn-Helge Mevik 2018-08-23 07:15:22 MDT
Created attachment 7679: Log of rpmbuild command

After upgrading from 17.11.9 to 17.11.9-2 on our test cluster, whenever we run the sshare command, we get the error

sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor

Like so:

# sshare
             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
root                                          1.000000      927728      1.000000 sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
  0.000000 
 root                      root          1    0.333333      273705      0.295027 sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
  0.000000 
 normal                                  1    0.333333      654023      0.704973 sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
  0.000000 
  nn9999k                                1    0.333333      654023      0.704973 sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
  0.000000 
 optimist                                1    0.333333           0      0.000000 sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
  0.000000 
  nn9999o                                1    0.333333           0      0.000000 sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
  0.000000 
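
For readers hitting the same message, here is a minimal sketch of how the failing plugin and symbol could be pulled out of the output and then checked against the shared object. The function names parse_dlopen_error and check_symbol are hypothetical helpers, not part of Slurm, and check_symbol assumes that nm from binutils is installed.

```shell
# Hypothetical helper: read command output on stdin and print each distinct
# "<plugin path> <missing symbol>" pair found in dlopen error lines.
parse_dlopen_error() {
    sed -n 's/.*dlopen(\([^)]*\)).*undefined symbol: \([A-Za-z0-9_]*\).*/\1 \2/p' | sort -u
}

# Hypothetical helper: list the matching undefined (imported) symbol of a
# shared object. The exit status is non-zero if the symbol is not listed.
# Assumes binutils nm; -D reads the dynamic symbol table.
check_symbol() {
    lib=$1
    sym=$2
    nm -D --undefined-only "$lib" | grep -w -- "$sym"
}

# Possible usage on the affected node (not run here):
#   sshare 2>&1 | parse_dlopen_error
#   check_symbol /usr/lib64/slurm/priority_multifactor.so sort_part_tier
```

If check_symbol prints a line for sort_part_tier, the plugin expects whichever program loads it to provide that symbol, which is consistent with the dlopen failure shown above.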

We have the following RPMs installed (on the controller node):
# rpm -qa|grep slurm|sort
slurm-17.11.9-2.el7.x86_64
slurm-devel-17.11.9-2.el7.x86_64
slurm-libpmi-17.11.9-2.el7.x86_64
slurm-perlapi-17.11.9-2.el7.x86_64
slurm-slurmctld-17.11.9-2.el7.x86_64
slurm-slurmdbd-17.11.9-2.el7.x86_64

Are we missing some RPMs?
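
One quick way to rule out a mixed install is to confirm that every Slurm package reports the same version-release string. The sketch below is an illustration only: slurm_rpm_versions is a hypothetical helper, and it assumes package names shaped like the ones listed above (name-version-release.dist.arch).

```shell
# Hypothetical helper: read package names as printed by `rpm -qa` on stdin
# and print each distinct version-release (for example 17.11.9-2) once.
slurm_rpm_versions() {
    sed -n 's/^slurm.*-\([0-9][0-9.]*-[0-9][0-9]*\)\..*$/\1/p' | sort -u
}

# Possible usage on the controller node (not run here):
#   rpm -qa | grep slurm | slurm_rpm_versions
#   rpm -qf /usr/lib64/slurm/priority_multifactor.so   # which package owns the plugin
```

A single line of output means all installed Slurm packages come from the same build. More than one line would point to a partial upgrade.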

We build the RPMs with
MKDIR_P='mkdir -p' rpmbuild -tb --clean --rmsource --without zlib --without debug --with lua slurm-17.11.9-2.tar.bz2

I've attached a log of the output of this command.  I've also attached the slurm config file for the cluster.

For what it's worth, I saw in the NEWS file of 17.11.9-2 that there was a fix regarding sorting of multi-partition jobs.  In our config, we have two partitions with different PriorityTier values (normal and optimist).  Could that be related?
Comment 1 Bjørn-Helge Mevik 2018-08-23 07:16:44 MDT
Created attachment 7680: Main slurm config file
Comment 2 Bjørn-Helge Mevik 2018-08-23 07:17:19 MDT
Created attachment 7681: Node definition file
Comment 3 Alejandro Sanchez 2018-08-23 07:17:32 MDT
This was fixed in 17.11.9-2

https://github.com/SchedMD/slurm/commit/21d2ab6ed16

*** This ticket has been marked as a duplicate of ticket 5579 ***
Comment 4 Alejandro Sanchez 2018-08-23 07:21:22 MDT
Sorry, I went too fast. You're already on 17.11.9-2. Can you try applying this commit on top of that?

https://github.com/SchedMD/slurm/commit/67a82c369a7530ce7838e6294973af0082d8905b

It will be in 17.11.10.
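
For sites that build RPMs straight from the release tarball, as in the description, applying a single commit could look roughly like the sketch below. It relies on GitHub serving a commit as a patch when ".patch" is appended to the commit URL. The helper names commit_patch_url and patch_tarball are hypothetical, the tarball is assumed to unpack into a directory named after itself (slurm-17.11.9-2), and the patch is assumed to apply cleanly, which may not hold if it also touches files such as NEWS.

```shell
# Hypothetical helper: build the ".patch" URL for a commit in SchedMD/slurm.
commit_patch_url() {
    printf 'https://github.com/SchedMD/slurm/commit/%s.patch\n' "$1"
}

# Hypothetical helper: unpack the tarball, apply the commit, and repack it
# under the same name so that `rpmbuild -tb` can consume it.
# Note: this overwrites the original tarball; keep a copy if that matters.
patch_tarball() {
    tarball=$1                  # e.g. slurm-17.11.9-2.tar.bz2
    commit=$2                   # full commit hash
    dir=${tarball%.tar.bz2}     # assumed top-level directory inside the tarball
    tar -xjf "$tarball" &&
    curl -fsSL "$(commit_patch_url "$commit")" -o fix.patch &&
    patch -d "$dir" -p1 < fix.patch &&
    tar -cjf "$tarball" "$dir"
}

# Possible usage (not run here), followed by the usual rpmbuild command:
#   patch_tarball slurm-17.11.9-2.tar.bz2 67a82c369a7530ce7838e6294973af0082d8905b
```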
Comment 5 Alejandro Sanchez 2018-08-23 07:48:15 MDT
(In reply to Alejandro Sanchez from comment #4)
> Sorry, I went too fast. You're already on 17.11.9-2. Can you try applying
> this commit on top of that?
> 
> https://github.com/SchedMD/slurm/commit/67a82c369a7530ce7838e6294973af0082d8905b
> 
> It will be in 17.11.10.

By the way, I went so fast that I also marked the wrong ticket as the duplicate. I meant to reference the bug below, where the undefined symbol problem is tracked and the patch slated for 17.11.10 is discussed in a private comment:

https://bugs.schedmd.com/show_bug.cgi?id=5552
Comment 6 Bjørn-Helge Mevik 2018-08-24 01:36:43 MDT
I tried the patch now, and it worked fine!  Thanks for the quick response! :D
Comment 7 Alejandro Sanchez 2018-08-24 02:49:25 MDT
(In reply to Bjørn-Helge Mevik from comment #6)
> I tried the patch now, and it worked fine!  Thanks for the quick response! :D

Great, thanks for the feedback.

*** This ticket has been marked as a duplicate of ticket 5552 ***