We recently found that some sacct commands cause the slurmdbd memory usage to balloon to 50 GB or more. Additionally, slurmdbd.log filled with errors like:

[2016-01-09T03:44:15.787] error: packdouble: Buffer size limit exceeded (4294917478 > 4294901760)
[2016-01-09T03:44:15.787] error: pack32: Buffer size limit exceeded (4294917478 > 4294901760)

and filled the disk. Either the memory or the disk usage causes the slurmdbd process to die, after which it needs to be restarted.

The first time we noticed this problem was a week after updating to 14.11.10, but we can't be sure the upgrade and the problem are necessarily related.

After some investigation, we've found that a command like

sacct -a -npo 'user,account,jobid,state' --starttime=2015-01-13T09:00:00

can:

- hang and not produce any output

- alternatively, produce an error like:
  sacct: error: slurmdbd: Getting response to message type 1444
  sacct: error: slurmdbd: DBD_GET_JOBS_COND failure: No error

- cause the slurmdbd process to steadily use more memory

- keep increasing slurmdbd memory usage even after Ctrl-C'ing out of the sacct process

- fill the slurmdbd log with errors like the above, growing it by 20-30 GB in a few minutes

For the slurm user:

ulimit -a
core file size          (blocks, -c) 1
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257437
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 257437
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

There haven't been any changes to slurmdbd.conf since May 2015 (except for increasing the debug level just now to try to debug this problem).

The slurmacct database has about 11M jobs + steps.

Happy to provide any additional info you need.
(In reply to Susan Chacko from comment #0)
> We recently found that some sacct commands cause the slurmdbd memory usage
> to balloon up to 50 GB or more. Additionally, the slurmdbd.log filled with
> errors like:
> [2016-01-09T03:44:15.787] error: packdouble: Buffer size limit exceeded
> (4294917478 > 4294901760)
> [2016-01-09T03:44:15.787] error: pack32: Buffer size limit exceeded
> (4294917478 > 4294901760)
>
> and filled the disk. Either the memory or disk usage causes the slurmdbd
> process to die, and needs to be restarted.
>
> The first time we noticed this problem was a week after updating to
> 14.11.10, but we can't be sure the upgrade and problem are necessarily
> related.

How many jobs / job steps are you running per day? It looks like the response message is trying to grow to over 4 GB. Whether that's appropriate or not for your query is hard to say, although it certainly does sound like a bug. How large is the raw MySQL database at this point?

> After some investigation, we've found that a command like
>
> sacct -a -npo 'user,account,jobid,state' --starttime=2015-01-13T09:00:00
>
> can
> - hang and not produce any output
>
> - alternatively, produce an error like
> sacct: error: slurmdbd: Getting response to message type 1444
> sacct: error: slurmdbd: DBD_GET_JOBS_COND failure: No error
>
> - cause the slurmdbd process to steadily start using more memory
>
> - slurmdbd memory usage still increases after Ctrl-C'ing out of the sacct
> process

It's asynchronous - slurmdbd will still complete the query even if sacct disconnects. There's no mechanism to cancel the query at present, although that does sound like a good improvement to make.

Can you run a command directly on the database like:

select count(*) from cluster_job_table where time_start > unix_timestamp('2016-01-13');

Actually - I see the problem here - you're asking for over a year of records in one shot (2015-01-13T09:00:00). I don't know if that was intentional or not - it's 2016 now.
I could easily see a year+ of your system resulting in a huge result that goes over our response buffer. You'd also probably not like to experience a terminal window getting 4 GB of text dumped to it.

If you're intentionally trying to grab a year of results, you'll need to split it up into multiple query ranges to avoid issues.

Admittedly, slurm should handle exceedingly large queries like this better:
- setting some sort of upper bound on per-query resource use to prevent users from DOS'ing slurmdbd
- checking whether our calls to pack() are succeeding or not (rather than continuing to print junk to the log and filling the filesystem), and stopping the query sooner

I can file an enhancement request for 16.05 to explore adding such checks if you'd like.

> - the slurmdbd log fills with errors like the above and grows by 20-30 GB in
> a few minutes.
>
> For the slurm user:
> ulimit -a
> core file size          (blocks, -c) 1
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 257437
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 257437
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> There haven't been any changes to slurmdbd.conf since May 2015 (except for
> increasing the debug level just now to try to debug this problem).
>
> The slurmacct database has about 11M jobs + steps.
>
> Happy to provide any additional info you need.
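To illustrate the "split it up into multiple query ranges" suggestion above, here is a hedged sketch of a month-by-month wrapper. It assumes GNU date is available; the field list is copied from the query in comment #0, and the window bounds are only examples:

```shell
#!/bin/sh
# Sketch only: run sacct one month at a time so each slurmdbd response
# stays far below the ~4 GB buffer limit seen in the log errors above.
# 'echo' makes this a dry run -- remove it to actually run the queries.
start=2015-01-13
end=2016-01-13

cur=$start
while [ "$(date -d "$cur" +%s)" -lt "$(date -d "$end" +%s)" ]; do
    next=$(date -d "$cur +1 month" +%Y-%m-%d)
    # Note: a job that straddles a boundary can appear in two windows.
    echo sacct -a -npo 'user,account,jobid,state' \
        --starttime="$cur" --endtime="$next"
    cur=$next
done
```

Each iteration prints one bounded sacct invocation (twelve in this example range), so no single query asks slurmdbd for a year of records at once.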
(In reply to Tim Wickberg from comment #1)
> > select count(*) from cluster_job_table where time_start >
> > unix_timestamp('2016-01-13');

Looks like about 110K jobs/steps.

> Actually - I see the problem here - you're asking for over a year of records
> in one shot (2015-01-13T09:00:00). I don't know if that was intentional or
> not - it's 2016 now. I could easily see a year+ of your system resulting in
> a huge result that goes over our response buffer. You'd also probably not
> like to experience a terminal window getting 4GB of text dumped to it.

Shoot, you're right. At least some of our test queries had the wrong year. It's possible one of our users was also running queries with the wrong timestamp.

This brings up another couple of questions: at this point our db contains all jobs (~11M jobs/steps) from about a year ago when we first started up Slurm. Have we reached the point where we should be purging old completed jobs from the db? And is there a recommended method of archiving old job data from the database? We've thought of stopping slurmdbd, then copying the db file (or doing a mysql dump), deleting old completed jobs, and then restarting slurmdbd. Is that the best method?

> Admittedly, slurm should handle exceedingly large queries like this better -
> setting some sort of upper bound on per-query resource use to prevent users
> from DOS'ing slurmdbd, and checking whether our calls to pack() are
> succeeding or not (rather than continuing to print junk to the log and
> filling the filesystem), and stopping the query sooner. I can file an
> enhancement request for 16.05 to explore adding such checks if you'd like.

That would be very useful. We can be careful going forward, but it's always possible that a user will inadvertently run a large query and cause this problem.

Thanks!
Susan
> Shoot, you're right. At least some of our test queries had the wrong year. It's
> possible one of our users was also running queries with the wrong timestamp.
>
> This brings up another couple of questions: At this point our db contains all
> jobs (~ 11 M jobs/steps) from about a year ago when we first started up Slurm.
> Have we reached the point where we should be purging old completed jobs from
> the db? And is there a recommended method of archiving old job data from the
> database? We've thought of stopping slurmdbd, then copying the db file (or
> doing a mysql dump), deleting old completed jobs, and then restarting slurmdbd.
> Is that the best method?

You've probably passed the point where the historical records are useful, but that does vary by customer demands.

slurmdbd can do all of that for you. We usually try to avoid having people interact directly with MySQL except for debugging purposes - we do change some of the table structures and schema each release, and our supported API is through sacct / sacctmgr, not direct to MySQL. You can load these archive files back in at any time with sacctmgr.

Please see http://slurm.schedmd.com/slurmdbd.conf.html for more details.

ArchiveDir=/path/to/archives/
ArchiveJobs=yes
# these aren't as important, you may want to leave them off:
ArchiveEvents=yes
ArchiveResvs=yes
ArchiveSteps=yes
# figure out which records to get rid of, and at what frequency
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month

>> Admittedly, slurm should handle exceedingly large queries like this better -
>> setting some sort of upper bound on per-query resource use to prevent users
>> from DOS'ing slurmdbd, and checking whether our calls to pack() are
>> succeeding or not (rather than continuing to print junk to the log and
>> filling the filesystem), and stopping the query sooner. I can file an
>> enhancement request for 16.05 to explore adding such checks if you'd like.
> That would be very useful. We can be careful going forward, but it's always
> possible that a user will inadvertently run a large query and cause this
> problem.

Purging sufficient records such that they couldn't possibly request that much data is the easiest fix for now. I'm guessing if you keep ~2-3 months of data you'd be safe.

I've opened two enhancement requests related to this bug now, and we'll consider them as part of our normal release engineering process. (They're marked as internal; I can add you to them if you're interested.)

- Tim
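As a footnote to the archive discussion in comment #3: once the Archive*/Purge* options are set, slurmdbd purges and archives on its own schedule, but sacctmgr can also force a dump or re-load an archive file. A hedged sketch, shown as a dry run with echo; the archive file path is illustrative only (slurmdbd names the real files inside ArchiveDir after the cluster, record type, and time range):

```shell
#!/bin/sh
# Dry run of the sacctmgr archive commands -- remove 'echo' to run them
# for real against a live slurmdbd (-i skips the confirmation prompt).
# The file= path below is a made-up placeholder.
echo sacctmgr -i archive dump
echo sacctmgr -i archive load file=/path/to/archives/mycluster_job_archive
```

Loading an archive back in restores those records so sacct can query them again, without touching MySQL directly.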
(In reply to Tim Wickberg from comment #3)

Thanks a lot, Tim. We know what we need to do now, so you can close this.

> I've opened two enhancement requests related to this bug now, and we'll
> consider them as part of our normal release engineering process.
> (They're marked as internal, I can add you to them if you're interested.)

Yes, please.

Susan.
Closing. I've added you both to the enhancement requests now, as you've probably seen. No promises on when we'll get to those, though; 16.05 is coming up soon and we've got a fairly full plate for that release.

- Tim