Bug 14925 - Power saving prerequisites are missing in documentation
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Documentation
Version: 22.05.3
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Caden Ellis
Depends on: 14536
 
Reported: 2022-09-09 06:45 MDT by Ole.H.Nielsen@fysik.dtu.dk
Modified: 2022-11-08 13:47 MST
Site: DTU Physics
Version Fixed: 22.05.6 23.03.0pre


Description Ole.H.Nielsen@fysik.dtu.dk 2022-09-09 06:45:12 MDT
The Slurm Power Saving Guide https://slurm.schedmd.com/power_save.html states that:

> A job to node mapping is available in JSON format 

I've been told by a colleague that if you enable power saving in slurm.conf, for example: 

> ResumeTimeout=600
> SuspendTime=300  
> SuspendTimeout=120
> ResumeProgram=/usr/local/bin/cloudresume
> SuspendProgram=/usr/local/bin/cloudsuspend

but JSON was not built into slurmctld, then slurmctld will crash when it is restarted!  An error message referring to JSON is given (we don't have the logfile any more).

It seems that the power saving feature is subject to the same prerequisites as those listed on the REST API page https://slurm.schedmd.com/rest.html, or at least this package is required on an RPM-based system:

> $ yum install json-c-devel

The presence of JSON in slurmctld can be determined with:

> $ strings `which slurmctld` | grep HAVE_JSON
> HAVE_JSON_C_INC 1
> HAVE_JSON_INC 1
> HAVE_JSON 1
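
The check above could be wrapped in a small shell function for use in build or deployment scripts. This is only a minimal sketch: the function name `has_json_support` is hypothetical, and it assumes the `HAVE_JSON 1` marker appears as a plain string in the binary, as in the output shown above. It uses `grep -a` (treat binary input as text) instead of `strings`, so it depends only on grep.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeed if the binary at $1
# contains the "HAVE_JSON 1" marker seen in the strings output above.
# Returns 0 if found, 1 if not found, 2 if the file cannot be read.
has_json_support() {
    binary="$1"
    [ -r "$binary" ] || return 2
    # -a: process a binary file as if it were text; -q: no output
    grep -aq 'HAVE_JSON 1' "$binary"
}

# Example use, locating slurmctld on PATH:
# has_json_support "$(command -v slurmctld)" && echo "JSON support present"
```

A nonzero exit status can then be used to stop a package build or a rollout before slurmctld is restarted.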

Question: Could you kindly confirm that the json-c-devel package is a prerequisite when building Slurm packages?  Can you please add this documentation of prerequisites to the Slurm Power Saving Guide?

The crashing of slurmctld is of course very bad, and somewhat difficult to diagnose.

Question: Could slurmctld check whether it has JSON support built in when loading power saving parameters from slurm.conf?  If JSON is absent, slurmctld should ignore any parameters that require JSON.
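
On the administrator's side, a similar guard can be approximated before restarting slurmctld. The following is a rough sketch, not Slurm functionality: the function name `power_saving_configured` is hypothetical, it only looks for uncommented `ResumeProgram` or `SuspendProgram` lines, and it does not follow Include directives.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeed if the slurm.conf at $1
# has an uncommented ResumeProgram or SuspendProgram setting.
# Returns 0 if found, 1 if not found, 2 if the file cannot be read.
power_saving_configured() {
    conf="$1"
    [ -r "$conf" ] || return 2
    # Parameter names in slurm.conf are case-insensitive, hence -i.
    grep -Eiq '^[[:space:]]*(ResumeProgram|SuspendProgram)[[:space:]]*=' "$conf"
}

# Example use; the configuration path is an assumption and varies by site:
# if power_saving_configured /etc/slurm/slurm.conf; then
#     echo "Power saving is configured: verify JSON support before restarting slurmctld"
# fi
```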

Thanks,
Ole
Comment 2 Caden Ellis 2022-09-21 22:12:05 MDT
Hey Ole,

It is generally a good idea to have the JSON library installed when building Slurm packages. We will update the documentation to state explicitly which features require the json-c package, but for now make sure you have it installed.

I am still trying to reproduce the slurmctld crash when the json library is not built in and power saving is enabled. I will let you know when I have an update. 

Caden
Comment 3 Caden Ellis 2022-10-06 14:37:11 MDT
Ole,

I have reproduced the issue and we are modifying the docs and discussing internally how else to address this. 

Caden
Comment 9 Caden Ellis 2022-11-08 13:47:00 MST
Ole,

The dependency of the slurmctld on json-c with power saving has been removed. This will take effect in 22.05.6.

Commits 15977b96597 and c1fb7396816 contain the behavior change and the documentation change.

Note, however, that if json-c is unavailable, SLURM_RESUME_FILE will not be available to the power saving resume program.
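
A resume program that wants to work either way could test for the file before using it. The sketch below is illustrative only: `pick_resume_input` is a hypothetical helper name, and it assumes the program receives the node list as its first argument and should fall back to it when `SLURM_RESUME_FILE` is unset or unreadable.

```shell
#!/bin/sh
# Sketch of the input-selection step of a ResumeProgram (hypothetical helper).
# Assumption: $1 is the list of nodes to resume, as passed to the program.
pick_resume_input() {
    nodes="$1"
    if [ -n "${SLURM_RESUME_FILE:-}" ] && [ -r "$SLURM_RESUME_FILE" ]; then
        # The JSON job-to-node mapping is available.
        printf 'json:%s\n' "$SLURM_RESUME_FILE"
    else
        # No JSON file, for example when slurmctld was built without json-c.
        printf 'nodes:%s\n' "$nodes"
    fi
}

# Example use inside a resume script:
# pick_resume_input "$1"
```

The function only reports which input is usable; the actual node start-up logic is site-specific and is left out here.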

Thanks for pointing this out and logging the bug.

Caden