Ticket 7858 - sreport query result exceeds size limit
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: - Unsupported Older Versions
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Gavin D. Howard
Reported: 2019-10-03 11:38 MDT by Bruno Mundim
Modified: 2019-10-16 15:46 MDT

Site: SciNet
Linux Distro: CentOS
Machine Name: Niagara


Attachments
Our current slurmdb.conf (402 bytes, text/plain)
2019-10-04 09:23 MDT, Bruno Mundim
Our current slurm.conf (7.06 KB, text/plain)
2019-10-04 09:29 MDT, Bruno Mundim
number of jobs and job steps per quarter on database tables (3.56 KB, text/plain)
2019-10-15 08:24 MDT, Bruno Mundim

Description Bruno Mundim 2019-10-03 11:38:59 MDT
We got the following error when running sreport for the previous quarter:

[root@nia-m6-sch02 ~]# sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000 start=2019-07-01 end=2019-10-01
sreport: error: slurmdbd: Query result exceeds size limit
 Problem with job query.

This query worked fine in the previous quarters we ran it, and the number
of jobs per quarter doesn't change that much. From the slurmdbd.conf man
page (the MaxQueryTimeRange entry) I gather that the query result may be
exceeding the 3GB maximum size. However, we need to produce a usage report
and this query is very important to us, so I have a few questions:

1) Is there a way to increase the maximum size so that our query is accommodated?

2) Why is this happening in the first place, since the number of jobs
doesn't change that much from quarter to quarter?

3) Do you have another suggestion to work around this issue?

Thanks,
Bruno.
Comment 4 Gavin D. Howard 2019-10-03 14:12:55 MDT
It's hard to say exactly why the query is failing now; it could be for any number of reasons. The best first guess is that more of your jobs have more steps, which also count against the limit.

Can you tell me what version of Slurm you are running? The bug report just says unsupported older version.

Also, please send me your slurm.conf and slurmdbd.conf.
Comment 5 Bruno Mundim 2019-10-04 09:23:44 MDT
Created attachment 11826 [details]
Our current slurmdb.conf
Comment 6 Bruno Mundim 2019-10-04 09:29:11 MDT
Created attachment 11827 [details]
Our current slurm.conf

We are running Slurm 17.11.12 with some patches that port X11 commits
from 18 into 17.11.12, as mentioned at

https://bugs.schedmd.com/show_bug.cgi?id=6618

I will try to count the number of job steps and jobs from one quarter to
the next, but I do have the totals as of a few weeks ago:

niagara_step_table             19907799
niagara_assoc_usage_hour_table  2630390
niagara_job_table               1800385

These numbers came from when I was timing the database upgrade from 17.11 to 19.05.
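
For reference, those totals are just row counts of the accounting tables;
something like the following reproduces them (assuming the default
slurm_acct_db database name and whatever MySQL credentials your site uses):

mysql slurm_acct_db -e "
    SELECT COUNT(*) FROM niagara_step_table;
    SELECT COUNT(*) FROM niagara_job_table;
    SELECT COUNT(*) FROM niagara_assoc_usage_hour_table;"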

Thanks,
Bruno.
Comment 7 Gavin D. Howard 2019-10-09 12:22:48 MDT
So, after looking at this, I think the best workaround would be to split the `sreport` command into three, one for each month:

> [root@nia-m6-sch02 ~]# sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000 start=2019-07-01 end=2019-08-01
> [root@nia-m6-sch02 ~]# sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000 start=2019-08-01 end=2019-09-01
> [root@nia-m6-sch02 ~]# sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000 start=2019-09-01 end=2019-10-01

I say this because I notice that you are using the `-P` option, which is usually meant for scripts to parse the output rather than for humans, so I presume a script is processing the data. While we work on figuring out what happened, that may be your easiest, and least intrusive, workaround, even if it requires some extra work.
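
For example, something along these lines would run the three monthly queries and re-sum the per-bucket counts. This is only a sketch: it assumes the `-nP` output is pipe-separated with the cluster and account in the first two fields and the grouping buckets after that, so check it against your actual column layout (and exclude any trailing percentage column) before trusting the totals.

#!/bin/bash
# Hypothetical wrapper: run sizesbyaccount per month, then re-sum the buckets.
months=("2019-07-01 2019-08-01" "2019-08-01 2019-09-01" "2019-09-01 2019-10-01")
for range in "${months[@]}"; do
    set -- $range
    sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000 \
        start=$1 end=$2
done | awk -F'|' '
{
    key = $1 "|" $2                # cluster|account (assumed field layout)
    keys[key] = 1
    for (i = 3; i <= NF; i++)      # remaining fields assumed to be bucket counts
        sum[key, i] += $i
    if (NF > maxf) maxf = NF
}
END {
    for (k in keys) {
        line = k
        for (i = 3; i <= maxf; i++)
            line = line "|" (sum[k, i] + 0)
        print line
    }
}'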

A workaround that I do NOT recommend, but is still possible, is to do as you suggested and manually raise the limit. The limit is `REASONABLE_BUF_SIZE` defined in `src/common/pack.h` near the top (at least it is in 19.05). This will require you to rebuild `slurmdbd` and all of your client commands. It may also require rebuilding `slurmctld` and `slurmd`, so if you do this, I would recommend rebuilding and redeploying everything. However, if you do this, you may break Slurm in many ways because there is probably code that relies on the assumption that `REASONABLE_BUF_SIZE` is what it is. SchedMD and I cannot help you if you do this and things go wrong.
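
If you decide to try it anyway, the mechanics would look roughly like this. This is only a sketch; the paths and configure options are placeholders to adapt to your own build recipe, and again, this is unsupported.

grep -n REASONABLE_BUF_SIZE src/common/pack.h   # find the #define
# Edit the value by hand, then rebuild everything from the same source tree
# so that daemons and clients agree on the new limit:
./configure --prefix=/opt/slurm                 # use your site's usual flags
make -j
make install
# Redeploy and restart slurmdbd, slurmctld, slurmd, and the client commands.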

In the meantime, I will talk to other engineers about this problem, although I cannot promise anything at all. The reason is that there is only one place in the code where this error can be raised: `slurm_pack_list()` in `src/common/slurm_protocol_pack.c`, which simply checks the size of the buffer it is filling with data on every iteration of the packing loop.

One thing you can do is run `slurmdbd` in a debugger and see how large the packed data actually gets. You can also do that for previous quarters to get an idea of how big they were and how close they came to the limit. However, there is not much I can do to figure out why you suddenly got that message now and not before, except in the unlikely case that you can send me your actual database.
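
For example, a rough sketch (against a test copy of slurmdbd built with debug symbols, not production; the install path is assumed, and the local variable names inside `slurm_pack_list()` vary by version, so I use `info locals` rather than guess them):

gdb --args /usr/sbin/slurmdbd -D -vv   # -D keeps slurmdbd in the foreground
(gdb) break slurm_pack_list
(gdb) run
# ...run the sreport query from another shell; each time the breakpoint hits:
(gdb) info locals                      # inspect the buffer and its current offset
(gdb) continue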

One thing that will help me is if you can send me the actual statistics from the problem quarter and from one or two quarters before, as soon as you get them. Thank you.
Comment 8 Bruno Mundim 2019-10-15 08:24:20 MDT
Created attachment 11955 [details]
number of jobs and job steps per quarter on database tables

I have attached what I believe are the numbers of jobs and job steps per
quarter since Niagara went into production. While the number of jobs doesn't
seem to have increased considerably over this period, I noticed an explosion
in the number of job steps: from 4,402,294 steps in the last quarter for
which the command still worked over the full quarter window, to 12,989,617
in the most recent quarter, 2019-07-01 to 2019-10-01. That might be what is
causing slurmdbd to choke.

On the other hand, I tried to run the command again and watched slurmdbd's
memory consumption. slurmdbd started off with only 6228 KiB of resident
memory as reported by top. When I ran the command for the previous quarter

sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000 start=2019-04-01 end=2019-07-01

slurmdbd's memory jumped to 4.8GiB. And when I tried again for the current quarter:

sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000 start=2019-07-01 end=2019-10-01

the command choked and slurmdbd's memory jumped to 14.3GiB (as reported by top).
This high memory consumption doesn't seem natural to me. I would like to run
it through a debugger, but I would like some advice on how to do that and
what to look for, since I see at least six threads running in slurmdbd and I
am not sure how they are spawned or how they relate to each other.

Also, I should note that I ran these tests against a copy of a recent
production database backup, but on a VM where I had Slurm 19.05.3-2 installed.
So when I started slurmdbd there, I had to wait 1h34m before the database was
converted to the newer Slurm format. By the way, is this upgrade operation
serial? What is the usual bottleneck? I/O? It doesn't seem to be CPU bound...

Thanks,
Bruno.
Comment 9 Gavin D. Howard 2019-10-15 11:59:49 MDT
(In reply to Bruno Mundim from comment #8)
> Created attachment 11955 [details]
> number of jobs and job steps per quarter on database tables
> 
> I have attached what I believe are the numbers of jobs and job steps per
> quarter since Niagara went into production. While the number of jobs
> doesn't seem to have increased considerably over this period, I noticed an
> explosion in the number of job steps: from 4,402,294 steps in the last
> quarter for which the command still worked over the full quarter window,
> to 12,989,617 in the most recent quarter, 2019-07-01 to 2019-10-01. That
> might be what is causing slurmdbd to choke.

I definitely think that is the reason.

> On the other hand, I tried to run the command again and watched slurmdbd's
> memory consumption. slurmdbd started off with only 6228 KiB of resident
> memory as reported by top. When I ran the command for the previous quarter
> 
> sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000
> start=2019-04-01 end=2019-07-01
> 
> slurmdbd's memory jumped to 4.8GiB. And when I tried again for the current
> quarter:
> 
> sreport job sizesbyaccount -nP grouping=80,160,400,800,1600,4000
> start=2019-07-01 end=2019-10-01
> 
> the command choked and slurmdbd's memory jumped to 14.3GiB (as reported by
> top). This high memory consumption doesn't seem natural to me.

As far as I can tell, this is mostly due to two things:

1) Various parts of Slurm duplicate information.
2) Slurm uses linked lists internally, and those lists can increase memory consumption a lot, especially when the payload per list node is small.
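
To put very rough numbers on it, dividing the peak figures you saw by the step counts you attached gives about the same cost per step in both quarters:

4.8 GiB  /  4,402,294 steps  ~= 1.2 KiB per step
14.3 GiB / 12,989,617 steps  ~= 1.2 KiB per step

So the resident memory is scaling almost linearly with the number of steps, which fits the explanation above: on the order of a kilobyte per step once the duplicated strings, list pointers, and allocator overhead are all counted.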

> I would like to run it through a debugger, but I would like some advice on
> how to do that and what to look for, since I see at least six threads
> running in slurmdbd and I am not sure how they are spawned or how they
> relate to each other.

If you are focused on memory usage, you can set breakpoints in `xmalloc()`, `xcalloc()`, `xrealloc()`, and `xfree()`. Those are wrappers for `malloc()` and friends.
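
A rough sketch of that (against a test instance, not production; depending on the build these may be macros around internal functions, so regex breakpoints are safer than the literal names):

gdb -p $(pidof slurmdbd)
(gdb) rbreak xmalloc     # rbreak sets breakpoints on every symbol matching the
(gdb) rbreak xrealloc    # regex, so it catches the real wrapper names whatever
(gdb) rbreak xfree       # they happen to be in this build
(gdb) continue
# These fire constantly, so expect it to be slow; running a test slurmdbd
# under valgrind --tool=massif is another way to see where the memory goes.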

> Also, I should note that I ran these tests against a copy of a recent
> production database backup, but on a VM where I had Slurm 19.05.3-2
> installed. So when I started slurmdbd there, I had to wait 1h34m before the
> database was converted to the newer Slurm format. By the way, is this
> upgrade operation serial? What is the usual bottleneck? I/O? It doesn't
> seem to be CPU bound...

The bottleneck is I/O because Slurm runs direct MySQL queries to update the database. Thus, it is I/O bound, and it is not as efficient as it could be.
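
If you want to confirm that on your VM during the next conversion, the standard sysstat tools are enough; for example:

iostat -x 2                     # high %util/await on the database volume points to I/O
pidstat -p $(pidof mysqld) 2    # CPU use of the MySQL/MariaDB server process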
Comment 10 Gavin D. Howard 2019-10-16 15:46:05 MDT
I am going to close this, but please feel free to reopen if you have further questions.