Summary: | slurm RPM dependency on libnvidia-ml.so | ||
---|---|---|---|
Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
Component: | Build System and Packaging | Assignee: | Tim Wickberg <tim> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | 3 - Medium Impact | ||
Priority: | --- | CC: | bmundim, cinek, luca.capello, lyn.gerner, novosirj, sthiell, uemit.seren |
Version: | 19.05.3 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Stanford | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA Site: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Linux Distro: | --- |
Machine Name: | Sherlock | CLE Version: | |
Version Fixed: | | Target Release: | --- |
DevPrio: | --- | Emory-Cloud Sites: | --- |
Attachments: | SPEC patch |
Description
Kilian Cavalotti
2019-10-11 12:29:30 MDT
Created attachment 11934
SPEC patch
Here's a patch to the SPEC file to not make the slurm RPMs depend on libnvidia-ml.so, even if it's been enabled at configure time.
Cheers,
--
Kilian
Hi, we have basically the same issue. We want to use configless Slurm with 20.02.2 and would like to have a single gres.conf with AutoDetect=nvml for all GPU nodes of our heterogeneous HPC cluster. Any chance this patch can be upstreamed? We have an active support contract with SchedMD.

We at NASA/NCCS are also experiencing this bug in 20.02.6 in multiple environments with GPUs. We've used the workaround (egrep -v). We don't understand why find-requires would error out even when libnvidia-ml is installed in a system default location like /usr/lib64. The resolution we would like to see is a Slurm spec file that is aware of libnvidia-ml.so.1 but does not fail to install if it doesn't find it, or else is smarter about how to find it.

Seems to be fixed in 20.06.1: https://github.com/SchedMD/slurm/commit/1be5492c274e170451ed18763e7eeea826f57cb7

This is fixed in the spec file shipped alongside Slurm starting with 20.02.6 / 20.11.0.
- Tim

*** This ticket has been marked as a duplicate of ticket 9525 ***
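For readers landing here from a search: the `egrep -v` workaround mentioned in the comments boils down to filtering the libnvidia-ml entry out of the list of generated RPM requirements. The ticket does not show the exact command used, so the following is only a minimal sketch under assumptions. The function name `filter_nvml_requires` and the sample requirement strings are hypothetical, and `grep -E` stands in for the older `egrep` spelling. How such a filter is wired into the spec file (for example around RPM's find-requires script) is not covered here.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): read one RPM requirement per
# line on stdin and drop any entry that refers to libnvidia-ml.so.
filter_nvml_requires() {
    # grep exits non-zero when every line is filtered out; "|| true" keeps
    # that case from being treated as a failure by the caller.
    grep -Ev '^libnvidia-ml\.so' || true
}

# Illustrative input only; real requirement lists come from the RPM build.
printf '%s\n' \
    'libc.so.6()(64bit)' \
    'libnvidia-ml.so.1()(64bit)' \
    'libmunge.so.2()(64bit)' | filter_nvml_requires
```

With a filter like this in place, the resulting package no longer declares a dependency on the NVIDIA library, so it can be installed on nodes without GPUs. According to the comment from Tim above, the spec file shipped with Slurm 20.02.6 / 20.11.0 and later makes this kind of workaround unnecessary.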