Summary: | Cannot load auth_munge plugin after upgrade | ||
---|---|---|---|
Product: | Slurm | Reporter: | David Matthews <david.matthews> |
Component: | slurmstepd | Assignee: | Tim Wickberg <tim> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | ||
Version: | 15.08.6 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Met Office | ||
Version Fixed: | 15.08.7, 16.05.0-pre1 | ||
Description
David Matthews
2016-01-08 00:21:41 MST
Embarrassingly, the comment you've highlighted notes what the fix is. I'll have a patch available shortly, and this will be included in 15.08.7, which we expect to release towards the end of January.

Actually, changing the check to compare major/minor only would not help. The real solution is to load the authentication plugin up-front, rather than lazy-loading it when the job completes - slurmstepd only communicates after the job has finished, since it receives the initial job information directly from slurmd at the start. Even if I changed that check, upgrading between versions would still fail.

Commit 870273ca1499 fixes this, and will be in 15.08.7, due out in a few weeks. If you want to apply this patch now you can download it here: https://github.com/SchedMD/slurm/commit/870273ca1499.patch

While reproducing this I noticed that slurmstepd will not clean up properly - you may want to check for stray slurmstepd processes on your test nodes and kill them manually.

- Tim

Tim - thanks for the fix. As far as I can see, slurmstepd exited after it timed out (~1 hour after job completion - see the slurmd.log messages I reported), so I didn't have to do any clean-up.

Do you plan on putting out something on the mailing list? I assume anyone who is already running 15.08 is going to hit this when they next upgrade unless they patch their existing release first.

(In reply to David Matthews from comment #3)
> Tim - thanks for the fix.

Certainly, that's what we're here for. I hope that didn't cause too many issues.

> Do you plan on putting out something on the mailing list? I assume anyone
> who is already running 15.08 is going to hit this when they next upgrade
> unless they patch their existing release first.

We'll note the upgrade issue on the next point release, and may have to warn about it on the next major release as well.
I don't think upgrading the installation while the node is live, as you've done, is especially common though - a lot of sites tend to upgrade it in a node image and reimage the node during maintenance, or have it installed on a central NFS mount and maintain a "current" symlink to flip between releases (which would still have allowed slurmstepd to dlopen() the correct version from its own install directory).
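
For readers following the thread, the failure mode can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Slurm's actual code: the `StepDaemon` class, the `load_plugin` helper, the text-file "plugin" and the exact-match version check are all hypothetical stand-ins, and the version strings are placeholders. It shows why a long-running process that lazy-loads a plugin after an in-place upgrade can hit a version mismatch, while one that loads the plugin up-front keeps working with the copy it already holds.

```python
import tempfile
from pathlib import Path


class PluginVersionMismatch(Exception):
    """Raised when the plugin on disk was built for a different release."""


def load_plugin(plugin_dir: Path, name: str, daemon_version: str) -> dict:
    """Toy loader (hypothetical): the 'plugin' is a text file holding its version."""
    plugin_version = (plugin_dir / name).read_text().strip()
    if plugin_version != daemon_version:
        raise PluginVersionMismatch(
            f"{name}: plugin is {plugin_version}, daemon is {daemon_version}"
        )
    return {"name": name, "version": plugin_version}


class StepDaemon:
    """Stand-in for a long-running per-job process (hypothetical class)."""

    def __init__(self, plugin_dir: Path, version: str, eager: bool):
        self.plugin_dir = plugin_dir
        self.version = version
        # Eager mode loads the auth plugin at start-up, before any upgrade
        # can replace the file on disk. Lazy mode defers the load.
        self.auth = load_plugin(plugin_dir, "auth_munge", version) if eager else None

    def report_completion(self) -> str:
        # The daemon only needs to authenticate once the job has finished.
        if self.auth is None:
            self.auth = load_plugin(self.plugin_dir, "auth_munge", self.version)
        return f"completion sent using {self.auth['name']} {self.auth['version']}"


def simulate_upgrade(eager: bool) -> str:
    """Start a daemon, upgrade the install in place, then finish the job."""
    with tempfile.TemporaryDirectory() as tmp:
        plugin_dir = Path(tmp)
        (plugin_dir / "auth_munge").write_text("15.08.6")  # placeholder version
        daemon = StepDaemon(plugin_dir, "15.08.6", eager=eager)
        # In-place upgrade while the job is still running.
        (plugin_dir / "auth_munge").write_text("15.08.7")  # placeholder version
        try:
            return daemon.report_completion()
        except PluginVersionMismatch as exc:
            return f"failed: {exc}"


if __name__ == "__main__":
    print("lazy :", simulate_upgrade(eager=False))
    print("eager:", simulate_upgrade(eager=True))
```

In this toy model, relaxing the version check would hide the error, but the lazy daemon would still load a plugin from a newer release than itself. Loading up-front avoids ever touching the upgraded file.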