2413 – SLURM high availability with centos 7.2

Summary: SLURM high availability with centos 7.2

Status:	RESOLVED INFOGIVEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Other
Version:	14.11.3
Hardware:	Linux Linux

Importance:	--- 4 - Minor Issue
Assignee:	Tim Wickberg

Reported:	2016-02-03 11:11 MST by Koji Tanaka
Modified:	2016-02-11 11:56 MST
Site:	OIST

Description Koji Tanaka 2016-02-03 11:11:45 MST

Dear all,

We are testing an environment in order to rebuild our cluster, Sango, with CentOS 7.2 (kernel 3.10.0-327.el7.x86_64).
We will use the latest Slurm version available and keep it updated.
Since we have two scheduling nodes, one crucial point is Slurm high availability.
Do you have any recommendations on tools to use, or any suggestions for implementing this feature?
We would also like to keep slurmdb updated and shared between the two scheduling nodes.

Please let us know; any advice would be greatly appreciated.

Thank you in advance

Comment 1 Tim Wickberg 2016-02-05 08:22:00 MST

Hi - 

One quick note - please submit any questions as Sev4 or higher. Sev5 is reserved for longer-term development projects, and does not get our immediate attention.

Slurm has native support for primary/backup operations, both for the slurmctld process as well as the database. We recommend running slurmctld as primary/backup, but usually only a single slurmdbd process.

For slurmdbd, the MySQL server tends to be the single point of failure, and we don't recommend any MySQL HA at this point in time. Communication from slurmctld to slurmdbd is designed to withstand a period of slurmdbd being unavailable, and slurmctld will cache any records it would need to write out, up to a limit.

slurmctld can run as an HA pair - both of them will need access to shared storage to store the current job spool and state data. Most sites use an NFS server for this, although any other shared filesystem should work.
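
Because both controllers must be able to write to the shared state directory, a quick probe run on each host can catch mount or permission problems before slurmctld is started. The following is only a minimal sketch, not part of Slurm: the function name is hypothetical, and it checks writability from the host it runs on, not that both hosts see the same directory.

```python
import os
import tempfile


def can_write_state_dir(path):
    """Return True if a file can be created, read back, and removed in path."""
    if not os.path.isdir(path):
        return False
    try:
        # A uniquely named probe file avoids collisions if both
        # controllers run this check at the same time.
        fd, probe = tempfile.mkstemp(prefix=".ha_probe_", dir=path)
    except OSError:
        return False
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("probe")
        with open(probe) as handle:
            return handle.read() == "probe"
    except OSError:
        return False
    finally:
        # Always clean up the probe file so the state directory is untouched.
        os.remove(probe)
```

It would be run on each controller as the account that slurmctld runs under, for example `can_write_state_dir("/shared/slurm/state")`, where the path is a placeholder for whatever StateSaveLocation points to.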

The slurm.conf man page has details on the various options:

http://slurm.schedmd.com/slurm.conf.html

Briefly, you would simply need to set
BackupController
BackupAddr
and ensure that the
StateSaveLocation=
is accessible on both hosts, then start slurmctld on both systems. The `scontrol ping` command can give a brief summary of the status of both systems, and `scontrol takeover` can be used to force the backup controller to assume control.
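
For monitoring, the output of `scontrol ping` can be parsed by a small script. The sketch below assumes the output has the form `Slurmctld(primary/backup) at ctl1/ctl2 are UP/DOWN`; that format and the host names are assumptions that should be checked against the Slurm version in use, since the wording may differ between releases.

```python
import re
import subprocess

# Assumed output format; verify against your Slurm version.
PING_RE = re.compile(
    r"Slurmctld\(primary/backup\) at ([^/\s]+)/([^/\s]+) are (UP|DOWN)/(UP|DOWN)"
)


def parse_ping(text):
    """Return host and state for both controllers, or None if unrecognized."""
    match = PING_RE.search(text)
    if match is None:
        return None
    primary_host, backup_host, primary_state, backup_state = match.groups()
    return {
        "primary": (primary_host, primary_state),
        "backup": (backup_host, backup_state),
    }


def controller_status():
    """Run `scontrol ping` and parse its output (requires scontrol in PATH)."""
    result = subprocess.run(
        ["scontrol", "ping"], capture_output=True, text=True, check=True
    )
    return parse_ping(result.stdout)
```

A result of `None` means the output did not match the assumed format, which should be treated as "unknown" rather than as a failure of either controller.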

Comment 2 Tim Wickberg 2016-02-11 07:58:50 MST

Did you have any further questions on the setup, or should I go ahead and mark this as resolved?

cheers,
- Tim

Comment 3 Koji Tanaka 2016-02-11 10:26:12 MST

Dear Tim, 

thank you so much, and sorry for the late reply.
Right now our HA configuration consists of two schedulers with DRBD and Pacemaker configured.
This configuration was set up by the vendor rather than by us. We are soon going to rebuild everything on our own with CentOS 7.2, so we would like to understand what your recommendations about HA are.
The main reason is that we are not sure whether we are going to replicate the same configuration.
Do you know if other sites have implemented something similar?

Thank you so much again

Comment 4 Tim Wickberg 2016-02-11 11:36:12 MST

(In reply to Francesca Tartaglione from comment #3)
> Dear Tim, 
> 
> thank you so much and sorry for the late reply.
> Right now our HA configuration is with 2 schedulers with DRBD and Pacemaker
> configured.

We do not recommend running that way; Slurm's native primary/backup mechanism is definitely preferable.

You do need to have the state directory shared between them; usually this is a shared NFS filesystem, but it could be DRBD, GFS, or OCFS2, or something from a commercial vendor.

> This configuration was not made by us but by the vendor, and soon we are
> going to rebuild everything on our own with CentOS 7.2, so we would like
> to understand what your recommendations about HA are.
> The reason mainly is that we are not sure if we are going to replicate the
> same configuration.
> Do you know if other places implemented something similar?

I've seen some products implement a primary/standby failover method with Pacemaker and it seems to work, but again we don't recommend it.

Once you get a single slurmctld process up, adding the second is just a matter of adding the 
BackupController=
BackupAddr=

lines to the configuration, ensuring that the 
StateSaveLocation=
is accessible on both systems, and starting slurmctld on both.
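
As an illustration of the settings named above, here is a small sketch that scans slurm.conf text and reports which of the three HA-related parameters are missing. The host names, addresses, and path in the sample are hypothetical, and the parser is deliberately simplified: it handles one `Key=Value` setting per line, which is enough for these global options but not for multi-value lines such as node definitions.

```python
REQUIRED_FOR_HA = ("BackupController", "BackupAddr", "StateSaveLocation")

SAMPLE_CONF = """\
# Hypothetical host names, addresses, and path, for illustration only.
ControlMachine=ctl1
ControlAddr=10.0.0.1
BackupController=ctl2
BackupAddr=10.0.0.2
StateSaveLocation=/shared/slurm/state
"""


def parse_conf(text):
    """Collect Key=Value settings, with keys lowercased for lookup."""
    settings = {}
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def missing_ha_settings(text):
    """Return the required HA parameters that are absent or empty."""
    settings = parse_conf(text)
    return [name for name in REQUIRED_FOR_HA if not settings.get(name.lower())]
```

Calling `missing_ha_settings(SAMPLE_CONF)` returns an empty list; a configuration with no backup controller defined would list `BackupController` and `BackupAddr`. The same slurm.conf should be in place on both controllers.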

I'd recommend sticking with a single slurmdbd instance, against a non-HA MySQL - all of Slurm's database interactions are cached in the event that slurmdbd is unavailable, and it's less complex than introducing MySQL replication into the mix. Simpler is better from our perspective - the fewer technologies in the mix the less there is to troubleshoot.

Please let me know if you run into any problems when testing, or need further details.

- Tim

Comment 5 Koji Tanaka 2016-02-11 11:43:43 MST

Dear Tim,

thank you for your quick reply and the information.
We will test the configuration you recommended on our test cluster and go from there.

Thank you so much for the clarification.

Best regards

Comment 6 Tim Wickberg 2016-02-11 11:56:37 MST

Certainly.

I'm going ahead and marking this as resolved/infogiven for now; please re-open if you have further questions.