Summary: | sshare gives error undefined symbol: sort_part_tier | ||
---|---|---|---|
Product: | Slurm | Reporter: | Bjørn-Helge Mevik <b.h.mevik> |
Component: | User Commands | Assignee: | Alejandro Sanchez <alex> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | alex |
Version: | 17.11.9 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Sigma2 Norway | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA Site: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Linux Distro: | --- |
Machine Name: | CLE Version: | ||
Version Fixed: | Target Release: | --- | |
DevPrio: | --- | Emory-Cloud Sites: | --- |
Attachments: |
Log of rpmbuild command
Main slurm config file
Node definition file |
Created attachment 7680: Main slurm config file
Created attachment 7681: Node definition file
This was fixed in 17.11.9-2:
https://github.com/SchedMD/slurm/commit/21d2ab6ed16

*** This ticket has been marked as a duplicate of ticket 5579 ***

Sorry, went too fast. You're already on 17.11.9-2. Can you try applying this on top of that?

https://github.com/SchedMD/slurm/commit/67a82c369a7530ce7838e6294973af0082d8905b

which will be in 17.11.10?

(In reply to Alejandro Sanchez from comment #4)
> Sorry, went too fast. You're already on 17.11.9-2. Can you try with applying
> this on top of that?
>
> https://github.com/SchedMD/slurm/commit/
> 67a82c369a7530ce7838e6294973af0082d8905b
>
> which will be in 17.11.10?

And by the way, I went so fast that I did not reference the right bug when marking this as a duplicate. The undefined symbol problem is tracked in the bug below, where the patch mentioned for 17.11.10 is discussed in a private comment:

https://bugs.schedmd.com/show_bug.cgi?id=5552

I tried the patch now, and it worked fine! Thanks for the quick response! :D

(In reply to Bjørn-Helge Mevik from comment #6)
> I tried the patch now, and it worked fine! Thanks for the quick response! :D

Great, thanks for the feedback.

*** This ticket has been marked as a duplicate of ticket 5552 ***
Created attachment 7679: Log of rpmbuild command

After upgrading from 17.11.9 to 17.11.9-2 on our test cluster, whenever we run the sshare command, we get the error

sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor

Like so:

# sshare
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 1.000000 927728 1.000000
sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
0.000000
root root 1 0.333333 273705 0.295027
sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
0.000000
normal 1 0.333333 654023 0.704973
sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
0.000000
nn9999k 1 0.333333 654023 0.704973
sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
0.000000
optimist 1 0.333333 0 0.000000
sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
0.000000
nn9999o 1 0.333333 0 0.000000
sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: Couldn't load specified plugin name for priority/multifactor: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/multifactor
0.000000

We have the following RPMs installed (on the controller node):

# rpm -qa|grep slurm|sort
slurm-17.11.9-2.el7.x86_64
slurm-devel-17.11.9-2.el7.x86_64
slurm-libpmi-17.11.9-2.el7.x86_64
slurm-perlapi-17.11.9-2.el7.x86_64
slurm-slurmctld-17.11.9-2.el7.x86_64
slurm-slurmdbd-17.11.9-2.el7.x86_64

Are we missing some RPMs? We build the RPMs with

MKDIR_P='mkdir -p' rpmbuild -tb --clean --rmsource --without zlib --without debug --with lua slurm-17.11.9-2.tar.bz2

I've attached a log of the output of this command. I've also attached the slurm config file for the cluster.

For what it's worth, I saw in the NEWS file of 17.11.9-2 that there was a fix regarding sorting of multi-partition jobs. In our config, we have two partitions with different PriorityTier values (normal and optimist). Could that be related?
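The error block in the report repeats once per sshare row, but every repetition names the same plugin and the same missing symbol. As a rough illustration only (not part of Slurm or of the fix discussed in this ticket), the sketch below counts distinct plugin/symbol pairs in captured output. The function name `summarize_dlopen_errors` and the `SAMPLE` text are hypothetical; the sample lines are copied from the report.

```python
import re
from collections import Counter

# Matches the dlopen failure as quoted in the report. \s+ tolerates the
# message being wrapped between the dlopen(...) part and the repeated path.
DLOPEN_ERROR = re.compile(
    r"dlopen\((?P<plugin>[^)]+)\):\s+\S+: undefined symbol: (?P<symbol>\w+)"
)

# Hypothetical captured output: two rows' worth of errors from the report.
SAMPLE = """\
root 1.000000 927728 1.000000
sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: cannot create priority context for priority/multifactor
0.000000
root root 1 0.333333 273705 0.295027
sshare: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/priority_multifactor.so): /usr/lib64/slurm/priority_multifactor.so: undefined symbol: sort_part_tier
sshare: error: cannot create priority context for priority/multifactor
0.000000
"""


def summarize_dlopen_errors(text):
    """Count each (plugin path, missing symbol) pair found in the text."""
    return Counter(
        (m.group("plugin"), m.group("symbol"))
        for m in DLOPEN_ERROR.finditer(text)
    )


if __name__ == "__main__":
    for (plugin, symbol), count in summarize_dlopen_errors(SAMPLE).items():
        print(f"{plugin}: undefined symbol {symbol} ({count} occurrences)")
```

Run against the full output in the report, this would show a single plugin/symbol pair, which suggests one missing symbol rather than several independent failures.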