2457 – best practices for archiving and purging of slurmdbd


Summary: best practices for archiving and purging of slurmdbd

Status:	RESOLVED INFOGIVEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Database
Version:	14.11.10
Hardware:	Linux Linux

Importance:	--- 3 - Medium Impact
Assignee:	Tim Wickberg
QA Contact:

URL:

Depends on:
Blocks:

Reported:	2016-02-18 10:42 MST by Steve McMahon
Modified:	2016-02-19 08:17 MST
CC List:	1 user

See Also:
Site:	CSIRO


Description Steve McMahon 2016-02-18 10:42:33 MST

Hi,

We are having performance problems with slurmdbd.  We know that we can turn on archiving and purging but wanted to get a feel for best practices before we do this.  Are there any best practices?  Should we just use the defaults?

If we turn on archiving and purging, does that happen straight away?

We do keep backups of the MySQL database, of course.

We would like to keep the data over time to keep an eye on trends etc.

Regards.
Steve McMahon
CSIRO.

Comment 1 Tim Wickberg 2016-02-18 10:55:37 MST

(In reply to Steve McMahon from comment #0)
> Hi,
> 
> We are having performance problems with slurmdbd.  We know that we can turn
> on archiving and purging but wanted to get a feel for best practices before
> we do this.  Are there any best practices?  Should we just use the defaults?

The slurmdbd.conf man page describes these settings and how often each purge runs; we don't have any specific recommendations beyond that.
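As a rough illustration of how those settings fit together, here is a minimal sketch of a hypothetical helper (`render_archive_purge` is not part of Slurm) that renders the archive/purge lines of slurmdbd.conf from a retention policy. It assumes the parameter names from the slurmdbd.conf man page (ArchiveDir, ArchiveJobs, PurgeJobAfter and so on) and the "Nmonth" value syntax; the archive path and retention months in the example are placeholders, not recommendations, so check the man page for your version before using anything like this.

```python
# Hypothetical helper: renders the archive/purge lines of slurmdbd.conf from a
# retention policy. Parameter names follow the slurmdbd.conf man page; the
# retention values passed in are up to the site.

# Record type -> name of its Archive* flag (note ArchiveSuspend has no "s").
ARCHIVE_FLAGS = {
    "Event": "ArchiveEvents",
    "Job": "ArchiveJobs",
    "Resv": "ArchiveResvs",
    "Step": "ArchiveSteps",
    "Suspend": "ArchiveSuspend",
}


def render_archive_purge(archive_dir, retention_months, archive=True):
    """Return slurmdbd.conf lines for a retention policy.

    retention_months maps a record type ("Job", "Step", ...) to the number of
    months of records to keep; types left out are never purged. Records are
    only archived as part of a purge, so an Archive* flag is emitted only for
    types that also get a Purge*After line.
    """
    unknown = set(retention_months) - set(ARCHIVE_FLAGS)
    if unknown:
        raise ValueError(f"unknown record types: {sorted(unknown)}")
    lines = [f"ArchiveDir={archive_dir}"] if archive else []
    for rtype, flag in ARCHIVE_FLAGS.items():
        months = retention_months.get(rtype)
        if months is None:
            continue  # not purged, so nothing would be archived either
        if months < 1:
            raise ValueError(f"{rtype} retention must be at least 1 month")
        if archive:
            lines.append(f"{flag}=yes")
        lines.append(f"Purge{rtype}After={months}month")
    return lines


if __name__ == "__main__":
    # Placeholder path and retention periods, for illustration only.
    for line in render_archive_purge("/var/spool/slurm/archive",
                                     {"Job": 12, "Step": 6, "Event": 6}):
        print(line)
```

Pairing each Archive* flag with its Purge*After line is the main point: turning on archiving for a record type has no effect unless that type is also being purged.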

For performance though, you may want to check on a few other things as well:

innodb_buffer_pool_size can have a huge impact - we'd recommend setting this as high as half the RAM available on the slurmdbd server. You can check the current setting in MySQL like so:

mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
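To turn "as high as half the RAM" into a concrete number, a minimal sketch like the following could help. It assumes MySQL runs on the same Linux host as slurmdbd and reads physical memory via `os.sysconf`; `suggested_buffer_pool` is a hypothetical helper, and the value it prints is an upper bound for the my.cnf setting, not a tuned figure.

```python
import os


def total_ram_bytes():
    """Physical memory on this host, from sysconf page size and page count."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def suggested_buffer_pool(total_bytes, fraction=0.5):
    """Return an innodb_buffer_pool_size line capped at a fraction of RAM.

    Rounds down to whole MiB so the value can use my.cnf's M suffix.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    mib = int(total_bytes * fraction) // (1024 * 1024)
    return f"innodb_buffer_pool_size = {mib}M"


if __name__ == "__main__":
    print(suggested_buffer_pool(total_ram_bytes()))
```

On a 64 GiB server this yields `innodb_buffer_pool_size = 32768M`. Whether the running server picks up a changed value without a restart depends on the MySQL version, so check its documentation.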

I will also note that 15.08 did improve the database performance - some of the indices have been rearranged, and you might find that alleviates your current concerns.

You can safely update and run a newer slurmdbd while leaving the rest of the cluster on an older release (within two releases); this is the upgrade process we recommend anyway. I've personally found it easiest to upgrade the database a week ahead of the main cluster, but a longer gap is also fine.

Note that the upgrade itself can take quite a while, since the tables are rearranged during it.
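The compatibility rule above (slurmdbd may be newer than the rest of the cluster, by at most two releases) can be sketched as a simple check. The `RELEASES` list is an assumption that covers only the release series around this ticket and would need extending; `dbd_supports` is a hypothetical helper, not a Slurm tool.

```python
# Slurm release series in order. Assumed partial list covering only the
# releases around this ticket; extend it for other versions.
RELEASES = ["14.03", "14.11", "15.08", "16.05"]


def series(version):
    """Reduce a full version such as '14.11.10' to its release series '14.11'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def dbd_supports(dbd_version, daemon_version):
    """True if a slurmdbd at dbd_version can serve daemons at daemon_version.

    Encodes the rule from this ticket: slurmdbd is upgraded first, so it may be
    the same release or newer, but by no more than two releases. Raises
    ValueError for a release series missing from RELEASES.
    """
    dbd = RELEASES.index(series(dbd_version))
    other = RELEASES.index(series(daemon_version))
    return 0 <= dbd - other <= 2


if __name__ == "__main__":
    print(dbd_supports("15.08.8", "14.11.10"))
```

For this site, a 15.08 slurmdbd serving a 14.11.10 cluster passes the check, while the reverse (an older slurmdbd behind a newer controller) does not.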

Comment 2 Steve McMahon 2016-02-18 14:31:30 MST

Thanks for the advice Tim.

We have already done the MySQL tuning. We are using Bright Cluster Manager, which packages Slurm (including slurmdbd), and we’d like to stay on its update cycle. There will be an update soon that takes us to Slurm 15.x anyway.

We’ll go with the defaults for purging and archiving except for ArchiveDir.

So we can close this ticket.

Steve McMahon
Solution Architect and Senior System Administrator | Scientific Computing
Information Management and Technology
CSIRO
T +61 2 6214 2968 Alt +61 4 0077 9318
steve.mcmahon@csiro.au | www.csiro.au
1 Wilf Crane Crescent, Yarralumla ACT 2600



Comment 3 Tim Wickberg 2016-02-19 08:17:36 MST

You're certainly welcome; that's what we're here for.

If I can help further or you run into any problems please let me know.