Ticket 2346 - slurmdbd may exceed MAX_BUF_SIZE on responses, filling log files
Summary: slurmdbd may exceed MAX_BUF_SIZE on responses, filling log files
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 16.05.x
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2016-01-14 04:09 MST by Tim Wickberg
Modified: 2017-08-29 20:40 MDT
CC List: 2 users

See Also:
Site: SchedMD
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 17.11.0-pre3
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Tim Wickberg 2016-01-14 04:09:00 MST
slurmdbd may exceed MAX_BUF_SIZE on responses for especially large queries, filling log files and DoS'ing the server.

The pack() functions print an error message ("Buffer size limit exceeded") but then return silently; successive pack() calls keep hitting the same check and generate additional messages. Eventually slurmdbd.log fills up (we're generating ~80 characters of log for every packed structure, so a potential 8GB response message can become 60+GB of log file) and bad things(tm) happen to the server.
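
For illustration only (this is not the actual Slurm pack code), a minimal C sketch of the pattern described above: a pack helper that logs the error and returns silently when the cap is hit, so a caller looping over a large result set emits one log line per failed call. All names and the tiny DEMO_MAX_BUF_SIZE cap are made up for the demo.

/* sketch.c - illustrates the logging pattern described above; not Slurm source. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_BUF_SIZE 64u   /* tiny stand-in for the real MAX_BUF_SIZE cap */

typedef struct {
    char    *head;
    uint32_t size;       /* bytes allocated */
    uint32_t processed;  /* bytes packed so far */
} demo_buf_t;

/* The problematic pattern: on hitting the cap we log and return silently,
 * so the caller has no idea the value was dropped and just keeps calling. */
static void demo_pack32(uint32_t val, demo_buf_t *buf)
{
    if (buf->processed + sizeof(val) > DEMO_MAX_BUF_SIZE) {
        fprintf(stderr, "Buffer size limit exceeded\n");  /* one line per failed call */
        return;
    }
    memcpy(buf->head + buf->processed, &val, sizeof(val));
    buf->processed += sizeof(val);
}

int main(void)
{
    demo_buf_t buf = { .head = malloc(DEMO_MAX_BUF_SIZE),
                       .size = DEMO_MAX_BUF_SIZE, .processed = 0 };

    /* Packing far more than the cap: every call past the limit logs again,
     * which is how a huge response turns into an even larger log file. */
    for (uint32_t i = 0; i < 100; i++)
        demo_pack32(i, &buf);

    free(buf.head);
    return 0;
}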

Two things should probably change:

1) Check whether we're exceeding the buffer size and fail gracefully rather than blindly continuing. Check at least once per record whether the pack() calls have started to fail, then bail out and clean up nicely (see the sketch after this list). Otherwise we risk OOM and other bad things.

2) Ideally, add some sanity checks to slurmdbd and a configuration option for how much memory / time queries can take. Otherwise badly conceived queries run by users can DoS the box, and potentially prevent incoming accounting messages from being committed properly, leading to data loss. Something like new MaxRespTime and MaxRespMem settings limiting each thread. At present the response could be up to 4GB of text spat out to a user's terminal, which is going to cause some exciting issues for the user if they're not redirecting it to a file.
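
A minimal sketch of the per-record check suggested in (1), assuming hypothetical names throughout (demo_buf_t, the overflowed flag, demo_pack_records); the actual fix landed separately (see comment 2) and may be structured differently. The point is just to fail once, early, instead of logging per remaining field.

/* Sketch of the per-record bail-out idea from (1); all names are hypothetical. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t size;        /* cap */
    uint32_t processed;   /* bytes packed so far */
    bool     overflowed;  /* set on the first failed pack, instead of only logging */
} demo_buf_t;

typedef struct { uint32_t nbytes; } demo_record_t;

/* Pretend to pack one record; mark the buffer as overflowed rather than
 * silently returning, so callers can detect the failure. */
static void demo_pack_record(const demo_record_t *rec, demo_buf_t *buf)
{
    if (buf->processed + rec->nbytes > buf->size) {
        buf->overflowed = true;
        return;
    }
    buf->processed += rec->nbytes;
}

/* Check once per record whether packing has started to fail, then bail out
 * and let the caller clean up, rather than looping over the rest of the
 * result set and logging an error for every remaining field. */
static int demo_pack_records(const demo_record_t *recs, uint32_t nrecs,
                             demo_buf_t *buf)
{
    for (uint32_t i = 0; i < nrecs; i++) {
        demo_pack_record(&recs[i], buf);
        if (buf->overflowed) {
            fprintf(stderr, "response exceeds buffer limit after %u records\n",
                    i + 1);  /* single log line for the whole failure */
            return -1;
        }
    }
    return 0;
}

int main(void)
{
    demo_buf_t buf = { .size = 1024, .processed = 0, .overflowed = false };
    demo_record_t recs[20];
    for (int i = 0; i < 20; i++)
        recs[i].nbytes = 100;    /* 2000 bytes total, well past the 1024-byte cap */
    return demo_pack_records(recs, 20, &buf) ? 1 : 0;
}

With a check like this, a too-large response costs one log line; the caller can free the buffer and send an error reply to the client instead of continuing to build an oversized response.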
Comment 1 Tim Wickberg 2017-05-03 11:34:49 MDT
Marking as a duplicate of newer bug 3624.

*** This ticket has been marked as a duplicate of ticket 3624 ***
Comment 2 Tim Wickberg 2017-08-07 18:59:39 MDT
Reopening this. Part (1) has been handled in bug 3614, but I'd still like to consider options for part (2).
Comment 8 Tim Wickberg 2017-08-29 20:40:33 MDT
Committed with some minor tweaks as f09d5587e6, and documented by 40068480. This will be in 17.11 when released.

- Tim