slurmdbd may exceed MAX_BUF_SIZE on responses for especially large queries, filling log files and DoS'ing the server.

The pack() functions print an error message "Buffer size limit exceeded" and return, but successive pack() calls continue to hit the same check and generate additional messages. Eventually slurmdbd.log fills up (we're generating ~80 characters of log for every packed structure, so a potential 8GB response message can become 60+GB of log file) and bad things(tm) happen to the server.

Two things should probably change:

1) Check whether we're exceeding the buffer size, and fail gracefully rather than blindly continuing. Check at least once per record whether the pack() calls have started to fail, then bail out and clean up nicely. Otherwise we risk OOM and other bad things.

2) Ideally, put some sanity checks into slurmdbd, plus a configuration option for how much memory / time queries can take. Otherwise badly envisioned queries run by users can DoS the box and potentially prevent incoming accounting messages from being committed properly, leading to data loss. Something like new MaxRespTime and MaxRespMem settings limiting each thread. At present the response could be up to 4GB of text spat out to a user terminal, which is going to cause some exciting issues for the user if they're not sending it to a file.
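To illustrate the once-per-record check suggested in point (1), here is a rough standalone sketch. It is not Slurm's actual pack code: `demo_buf_t`, `demo_pack32()`, `demo_pack_records()` and the tiny `DEMO_MAX_BUF_SIZE` are all made up for the example. The idea is a sticky overflow flag, so the error is logged once, and a check after each record so the caller can bail out early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, deliberately tiny cap so the overflow path is easy to hit. */
#define DEMO_MAX_BUF_SIZE 64u

typedef struct {
	unsigned char data[DEMO_MAX_BUF_SIZE];
	size_t used;            /* bytes packed so far */
	bool overflowed;        /* sticky: set on the first failed pack */
	unsigned errors_logged; /* how many error lines were emitted */
} demo_buf_t;

/* Hypothetical record with three fixed-width fields. */
typedef struct {
	uint32_t job_id;
	uint32_t uid;
	uint32_t elapsed;
} demo_rec_t;

/* Pack one 32-bit value; log only on the first overflow, then stay silent. */
static void demo_pack32(demo_buf_t *buf, uint32_t val)
{
	assert(buf != NULL);
	if (buf->overflowed)
		return;
	if (buf->used + sizeof(val) > DEMO_MAX_BUF_SIZE) {
		buf->overflowed = true;
		buf->errors_logged++;
		fprintf(stderr, "demo_pack32: Buffer size limit exceeded\n");
		return;
	}
	memcpy(buf->data + buf->used, &val, sizeof(val));
	buf->used += sizeof(val);
}

/*
 * Pack n records, checking once per record whether packing has failed.
 * Returns the number of records packed, or -1 if the limit was hit, in
 * which case the partially packed record is rolled back.
 */
static int demo_pack_records(demo_buf_t *buf, const demo_rec_t *recs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		size_t mark = buf->used;

		demo_pack32(buf, recs[i].job_id);
		demo_pack32(buf, recs[i].uid);
		demo_pack32(buf, recs[i].elapsed);

		if (buf->overflowed) {
			buf->used = mark; /* drop the incomplete record */
			return -1;        /* caller cleans up and returns an error */
		}
	}
	return (int)n;
}
```

With this shape, a caller that gets -1 back can free the buffer and send a short error reply instead of continuing to iterate over the remaining records, and the log gets one line per failed response rather than one per packed field.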
Marking as a duplicate of newer bug 3624.

*** This ticket has been marked as a duplicate of ticket 3624 ***
Reopening this. Part (1) has been handled in bug 3614, but I'd still like to consider options for part (2).
Committed with some minor tweaks as f09d5587e6, and documented by 40068480. This will be in 17.11 when released.

- Tim