Ticket 8954 - advice needed: switching from select/cons_res to select/cons_tres?
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 19.05.4
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Ben Roberts
Reported: 2020-04-28 13:04 MDT by Jim Lawson
Modified: 2020-05-28 15:02 MDT
Site: U of Vermont
Machine Name: deepgreen


Description Jim Lawson 2020-04-28 13:04:19 MDT
Hi friends at SchedMD,

As has been discussed, most of the new scheduling features are in cons_tres, and cons_res is being deprecated.

When we first installed Slurm, we stayed with cons_res. Now I would like to switch to cons_tres. Are there any gotchas other sites have run into that we should be aware of?

Reverting: It looks like we can switch back to cons_res from cons_tres without losing the queue, according to https://slurm.schedmd.com/SLUG19/Slurm_19.05.pdf
Is this still OK, with the understanding that jobs submitted using flags supported only by cons_tres won't work?

Should we update to 20.02 before switching to cons_tres?

Thanks for your time, once again.

Jim Lawson
Comment 2 Ben Roberts 2020-04-28 15:45:18 MDT
Hi Jim,

I'm not aware of problems related to changing the SelectType plugin from cons_res to cons_tres.  We have the following note in the SelectType section of the slurm.conf documentation:
SelectType
    Identifies the type of resource selection algorithm to be used. Changing this value can only be done by restarting the slurmctld daemon. When changed, all job information (running and pending) will be lost, since the job state save format used by each plugin is different. The only exception to this is when changing from cons_res to cons_tres or from cons_tres to cons_res. However, if a job contains cons_tres-specific features and then SelectType is changed to cons_res, the job will be canceled, since there is no way for cons_res to satisfy requirements specific to cons_tres. 


That being said, it's always better to be safe than sorry. When changing the plugin type, I would recommend backing up the directory you have defined as your StateSaveLocation. I would recommend the following steps:
1.  Change the SelectType in your slurm.conf to cons_tres.
2.  Shut down slurmctld.
3.  Back up the directory defined as your StateSaveLocation.
4.  Start slurmctld.
5.  Verify that the jobs that were in the queue previously are still there and are able to start successfully. If there is a problem, you can shut Slurm down again, restore the backup, change the plugin back to cons_res, and pick up where you left off.
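
The file-handling parts of the steps above could be sketched in Python roughly as follows. This is only an illustration, not a SchedMD-provided tool: the function names are hypothetical, the regular expressions assume a simple `Key=Value` layout in slurm.conf, and stopping and starting slurmctld (steps 2 and 4) is left to whatever service manager your site uses.

```python
import re
import shutil
import time
from pathlib import Path


def set_select_type(conf_text: str, plugin: str = "select/cons_tres") -> str:
    """Step 1: return conf_text with the active SelectType line set to plugin."""
    # Match only uncommented lines; slurm.conf keywords are case-insensitive.
    pattern = re.compile(r"^([ \t]*SelectType[ \t]*=[ \t]*)\S+",
                         re.IGNORECASE | re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + plugin, conf_text)
    if count != 1:
        raise ValueError(f"expected exactly one SelectType line, found {count}")
    return new_text


def state_save_location(conf_text: str) -> str:
    """Look up the directory configured as StateSaveLocation."""
    match = re.search(r"^[ \t]*StateSaveLocation[ \t]*=[ \t]*(\S+)",
                      conf_text, re.IGNORECASE | re.MULTILINE)
    if match is None:
        raise ValueError("StateSaveLocation is not set in this config text")
    return match.group(1)


def backup_state_dir(state_dir: str, backup_root: str) -> Path:
    """Step 3: copy the state directory into a new timestamped folder.

    Assumption: slurmctld has already been stopped (step 2), so the
    files are not changing while they are copied.
    """
    dest = Path(backup_root) / f"statesave-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.copytree(state_dir, dest)
    return dest


def missing_jobs(before: str, after: str) -> set[str]:
    """Step 5: job IDs listed before the restart but absent afterwards.

    Both arguments are whitespace-separated job IDs, for example the
    saved output of `squeue -h -o %i` taken before and after the change.
    """
    return set(before.split()) - set(after.split())
```

A job reported by `missing_jobs` is not necessarily lost, since it may simply have finished while slurmctld was down, so each ID it returns is worth checking by hand before deciding to restore the backup.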

If you're more risk averse, you can wait for all the running jobs to complete and make this change during a maintenance window, but the running jobs shouldn't have a problem with a change on the slurmctld side.  

Feel free to let me know if you've got questions about any of this.

Thanks,
Ben
Comment 3 Ben Roberts 2020-05-28 15:02:26 MDT
Hi Jim,

I think the information I sent should have answered your question, and I haven't heard a follow-up question. I'm going to go ahead and close this ticket. Feel free to respond if something does come up.

Thanks,
Ben