Bug 14565

Summary: Building Slurm on EL 9 distros
Product: Slurm
Reporter: Chrysovalantis Paschoulas <c.paschoulas>
Component: Build System and Packaging
Assignee: Tim McMullan <mcmullan>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue    
Priority: ---    
Version: 21.08.8   
Hardware: Linux   
OS: Linux   
Site: Jülich
Version Fixed: 22.05.3, 23.02pre1

Description Chrysovalantis Paschoulas 2022-07-19 01:53:22 MDT
We couldn't build Slurm on Rocky 9 (an EL 9 based distro). LTO is enabled there, and the build fails at link time with errors like these:
```
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of .libs/bitstring.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.0.20392c668d20c6cc37ece527871fed88]' of .libs/bitstring.o
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of .libs/bitstring.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of .libs/bitstring.o
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of .libs/bitstring.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.config.h.8.ee6033b670b40b241d00b6031e34bef8]' of .libs/bitstring.o
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of .libs/bitstring.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.features.h.19.94fa84bfdc4fa1f32c117154c6101507]' of .libs/bitstring.o
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of .libs/bitstring.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.wordsize.h.4.baf119258a1e53d8dba67ceac44ab6bc]' of .libs/bitstring.o
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of .libs/bitstring.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.features.h.395.c91ebb1d3ab5e81df8f0ef2b8e5bffdc]' of .libs/bitstring.o
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of .libs/bitstring.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.cdefs.h.19.e56fcfbda476bcaa644cad6c858886ea]' of .libs/bitstring.o
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of .libs/bitstring.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.cdefs.h.556.0c88d1957e1b03d016e873dfd9ca7b60]' of .libs/bitstring.o
...
```
Comment 1 Chrysovalantis Paschoulas 2022-07-19 09:18:31 MDT
I guess the fix is to disable LTO in the spec file.

We don't need LTO enabled, right?
Comment 2 Tim McMullan 2022-07-19 12:24:25 MDT
(In reply to Chrysovalantis Paschoulas from comment #1)
> I guess the fix is to disable LTO in the spec file.
> 
> We don't need LTO enabled, right?

I was going to ask whether you hit this through rpmbuild or found it another way.

If it comes from rpmbuild, disabling LTO in the spec file would be a way around it.

I'm going to look into exactly why LTO is breaking things on *el9 and what the options for fixing it are.
Comment 3 Chrysovalantis Paschoulas 2022-07-20 02:50:21 MDT
(In reply to Tim McMullan from comment #2)

Great thanks!

FYI, we were able to build Slurm on el9 by disabling LTO in the spec file with:
```
%define _lto_cflags %{nil}
```

Is this the best approach to solve this issue?
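
One narrower variant of the same workaround, as a sketch only: the override could be limited to EL 9 and later by wrapping it in a conditional. This assumes the build host defines the rhel macro, which is not confirmed in this thread for Rocky 9, and it is not necessarily how the spec file should end up handling it.

```
# Disable LTO only on EL 9 and newer; other targets keep their default flags.
# Assumption: the distro defines the rhel macro with its major version.
%if 0%{?rhel} >= 9
%define _lto_cflags %{nil}
%endif
```

On hosts where rhel is undefined, the condition evaluates to 0 >= 9 and the override is skipped.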
Comment 5 Tim McMullan 2022-07-20 10:23:14 MDT
(In reply to Chrysovalantis Paschoulas from comment #3)
> FYI we could build Slurm on el9 by disabling lto in spec file with:
> ```
> %define _lto_cflags %{nil}
> ```
> 
> Is this the best approach to solve this issue?

My fix looks like:
> %undefine _lto_cflags

But the result is really the same :)
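
For anyone comparing the two spellings, here is a small spec-file sketch. It relies on general rpm macro behavior rather than anything stated in this thread: a conditional reference such as %{?_lto_cflags} expands to nothing when the macro is either empty or undefined, so both variants produce the same flags wherever the conditional form is used. Whether the distro's build flags actually use the conditional form is an assumption here.

Variant 1 keeps the macro defined with an empty body:

```
# The macro still exists, so both conditional and plain references
# to it expand to an empty string.
%define _lto_cflags %{nil}
```

Variant 2 removes the current definition:

```
# This pops the most recent definition of the macro. A plain,
# non-conditional reference to it would then be left unexpanded.
%undefine _lto_cflags
```

Use only one of the two. Applying both in this order would pop the empty definition again and bring back the previous value.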

There is some documentation from Red Hat on alternate ways of handling this problem, and I'm exploring those before I officially propose a solution here, but I think it's safe to proceed with either method of disabling LTO in the spec file.
Comment 10 Tim McMullan 2022-07-28 08:11:16 MDT
After more digging, it looks like %define _lto_cflags %{nil} is actually the preferred way to do this. We've landed a patch to disable LTO that should be in 22.05.3: https://github.com/SchedMD/slurm/commit/85efa455

Thank you for pointing this out!
--Tim