Ticket 2315 - sreport top users fails to run on edison system
Summary: sreport top users fails to run on edison system
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database (show other tickets)
Version: 15.08.6
Hardware: Cray XC Linux
: --- 4 - Minor Issue
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2016-01-06 23:40 MST by Doug Jacobsen
Modified: 2016-05-31 04:19 MDT (History)
0 users

See Also:
Site: NERSC
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
working query (250.66 KB, text/plain)
2016-03-15 10:52 MDT, Doug Jacobsen
Details

Note You need to log in before you can comment on or make changes to this ticket.
Description Doug Jacobsen 2016-01-06 23:40:58 MST
Hello,

When running "sreport top users cluster=edison" on the edison system, after a 15 minutes I get an error:

dmj@edison09:~> time sreport user topuser cluster=edison
sreport: error: slurmdbd: Getting response to message type 1415
sreport: error: slurmdbd: DBD_GET_USERS failure: No error
 Problem with user query.

real	15m0.367s
user	0m0.228s
sys	0m0.008s



During this time the mysql server is pegged at 100% and the offending query seems to be:

| 40795 | slurm | edique01:46690 | slurm_acct_db | Query       |    190 | Copying to tmp table                                                  | select t3.id_assoc, t1.id_tres, t1.time_start, t1.alloc_secs from "edison_assoc_usage_day_table" as  |

These queries are left running after sreport times out, and I have manually killed a couple of them.

We have over 10000 associations in the database, but then again, so does cori and this query works on cori's database (different instances of slurmdbd and mysql).

I don't see any particular output in slurmdbd.log that is informative (already set to debug2).

Do you know what might be going on here?

Thanks,
Doug
Comment 1 Tim Wickberg 2016-01-07 01:13:40 MST
15 minutes is the mysql execution timeout on slurmdbd, so for whatever reason that query isn't return in time. slurmdbd's unlikely to have logged anything interesting, aside from the fact that query didn't complete in time.

One quick set of adjustments to mysql that may help (and might not have happened for edison before, but were done for cori at some point?) is to increase the innodb_buffer_pool_size value. The default is quite low; assuming slurmdbd is standalone you could increase that to, say, 16GB. Increasing innodb_buffer_pool_instances as well (default is one, multi-socket boxes with plenty of memory should go higher) may help as well.
Comment 2 Doug Jacobsen 2016-01-07 02:09:03 MST
Hi Tim,

Thanks for getting back to me on this.  It turns out that I had
configured innodb_buffer_pool_size to 16G but a later typo reset it to
nothing.  It is likely this bug is on cori as well (since the same ansible
config was used to deploy them).  I'll check on that.

I've set innodb_buffer_pool_size to 16G correctly (and confirmed that we're
getting >16G allocated in the mysql address space).  Also set the instances
to 8 following recommendations from the MySQL documentation (>1G per
instance).

I restarted mysql and let is hang out for a bit to get those caches filled.

sreport user topusers still hangs on that query trying to copy to temporary
tables.

I've copied by my.cnf below in case you see something else that is of
concern.

There are other significant differences between cori and edison's database
servers, namely that cori's has 512G of memory and edison's has just 24G.
However, while the query to copy data to temporary tables is running, I'm
not seeing a spike in consumed resident memory, it's staying constant on
edison at around 1.2G.  I wonder if we're getting hung up on some heuristic
in mysql's memory management code...

edique01:~ # cat /etc/my.cnf
[client]
port            = 3306
socket          = /var/lib/mysql/mysql.sock


# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
# Change following line if you want to store your database elsewhere
datadir = /var/data/mysql
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
log-error=/var/log/mysql/mysqld.log

#### REPLICATION ####
log-bin=mysql-bin
binlog_format=mixed
binlog-ignore-db=mysql
sync_binlog=1

server-id = 1
#### END REPLICATION ####

#innodb_data_file_path = ibdata1:10M:autoextend
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 16G
innodb_buffer_pool_instances=8
innodb_additional_mem_pool_size = 10M
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 1G
innodb_log_buffer_size = 4M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50

# The safe_mysqld script
[safe_mysqld]
log-error       = /var/log/mysql/mysqld.log
socket          = /var/lib/mysql/mysql.sock

!includedir /etc/mysql

[mysqldump]
socket          = /var/lib/mysql/mysql.sock
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

[mysqld_multi]
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin
log        = /var/log/mysqld_multi.log



----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
dmjacobsen@lbl.gov

------------- __o
---------- _ '\<,_
----------(_)/  (_)__________________________


On Thu, Jan 7, 2016 at 7:13 AM, <bugs@schedmd.com> wrote:

> *Comment # 1 <http://bugs.schedmd.com/show_bug.cgi?id=2315#c1> on bug 2315
> <http://bugs.schedmd.com/show_bug.cgi?id=2315> from Tim Wickberg
> <tim@schedmd.com> *
>
> 15 minutes is the mysql execution timeout on slurmdbd, so for whatever reason
> that query isn't return in time. slurmdbd's unlikely to have logged anything
> interesting, aside from the fact that query didn't complete in time.
>
> One quick set of adjustments to mysql that may help (and might not have
> happened for edison before, but were done for cori at some point?) is to
> increase the innodb_buffer_pool_size value. The default is quite low; assuming
> slurmdbd is standalone you could increase that to, say, 16GB. Increasing
> innodb_buffer_pool_instances as well (default is one, multi-socket boxes with
> plenty of memory should go higher) may help as well.
>
> ------------------------------
> You are receiving this mail because:
>
>    - You reported the bug.
>
>
Comment 3 Tim Wickberg 2016-01-07 02:16:47 MST
Are there any significant differences in storage between those?

Does Cori have the database stored on a 1TB NVMe card, where Edison has 
some spinning rust that it's trying to pull from?

I haven't looked through those queries extensively, that temporary table 
may be getting flushed back to disk somehow. I think your guess that 
some MySQL memory heuristic is getting in the way may be valid as well, 
I'd expect it to fill that up faster.

One other setting that may help is CommitDelay in slurmdbd.conf. Setting 
it to 1 (= 1 second) may alleviate some pressure on MySQL and allow that 
query to complete faster rather than fight over internal locks. (Note 
that setting means a crash could loose up to a second of accounting 
records, without it slurmdbd writes everything to the database ASAP.)
Comment 4 Tim Wickberg 2016-01-14 03:26:40 MST
Hey Doug -

Did you get a chance to try out setting CommitDelay ? Just want to make sure you're not waiting on me for anything at the moment...

- Tim
Comment 5 Doug Jacobsen 2016-01-14 10:52:48 MST
Hi Tim,

Sorry, the delay was on my end.  I was temporarily delayed by total loss of the edison database, but now that that is all set and the allocation year turnover activities are calming down, I can look at this again.

Both edison and cori have nearly identical slurmdbd.conf, just the hostnames and database credentials are changed.  edison and cori each have mysql running on local disk, though the disk in edison is significantly older.  I have ordered upgrades which should arrive soon, those this is more about mirroring and capacity than speed.

cori does not have SSDs, nor are edison's disks quite at the level of "spinning rust" - though I'll remember the term -- got a laugh out of me.


Regarding CommitDelay, that is (and was) set to 1.

-Doug
Comment 6 Tim Wickberg 2016-01-14 12:07:17 MST
On 01/14/2016 07:52 PM, bugs@schedmd.com wrote:

> Sorry, the delay was on my end.  I was temporarily delayed by total loss of the
> edison database, but now that that is all set and the allocation year turnover
> activities are calming down, I can look at this again.

Gah... I've had a cold this week that's been messing with me, I should 
have put it together that the DB loss means this is probably not 
reproducible at the moment.

> Both edison and cori have nearly identical slurmdbd.conf, just the hostnames
> and database credentials are changed.  edison and cori each have mysql running
> on local disk, though the disk in edison is significantly older.  I have
> ordered upgrades which should arrive soon, those this is more about mirroring
> and capacity than speed.
>
> cori does not have SSDs, nor are edison's disks quite at the level of "spinning
> rust" - though I'll remember the term -- got a laugh out of me.

I'm surprised you haven't heard that before. I don't mean to besmirch 
either systems disks - technically all mechanical drives are 'spinning 
rust', although the specific oxide coating does vary some.

> Regarding CommitDelay, that is (and was) set to 1.

Hmm... I'm have to come up with some other suggestions then if you're 
still seeing issues, that's usually our ace-in-the-hole for improving 
slurmdbd performance.
Comment 7 Tim Wickberg 2016-02-02 07:34:06 MST
Sorry for the delay on this. Are you still seeing timeouts on these?

If so, can you briefly turn on DebugFlags=DB_QUERY in slurmdbd.conf to find the exact offending query string in the log, then try timing that query when run directly in mysql?
Comment 8 Tim Wickberg 2016-02-11 07:52:56 MST
Hey Doug -

Is this still an issue for you? I haven't seen anything on this in awhile, although I know you've been busy with some other unrelated issues.

- Tim
Comment 9 Doug Jacobsen 2016-02-11 07:54:50 MST
Hi Tim,

Yes, this is still an issue.  I started to extract the queries that sreport
was generating (they are huge), but have turned to other matters.  However
top users still does not function on edison.

I'll see if I can pull those up soon.

-Doug

On Thu, Feb 11, 2016 at 1:53 PM <bugs@schedmd.com> wrote:

> *Comment # 8 <http://bugs.schedmd.com/show_bug.cgi?id=2315#c8> on bug 2315
> <http://bugs.schedmd.com/show_bug.cgi?id=2315> from Tim Wickberg
> <tim@schedmd.com> *
>
> Hey Doug -
>
> Is this still an issue for you? I haven't seen anything on this in awhile,
> although I know you've been busy with some other unrelated issues.
>
> - Tim
>
> ------------------------------
> You are receiving this mail because:
>
>    - You reported the bug.
>
>
Comment 10 Tim Wickberg 2016-03-04 01:08:46 MST
Doug - just checking back in.

(In reply to Doug Jacobsen from comment #9)
> Hi Tim,
> 
> Yes, this is still an issue.  I started to extract the queries that sreport
> was generating (they are huge), but have turned to other matters.  However
> top users still does not function on edison.
> 
> I'll see if I can pull those up soon.
> 
> -Doug
Comment 11 Tim Wickberg 2016-03-15 08:10:13 MDT
*poke*

I know we've got a handful of NERSC issues in progress, but wanted to check and see if this was still persisting...
Comment 12 Doug Jacobsen 2016-03-15 10:28:40 MDT
Good news: it started working.

Bad news: no idea why.

It does take a long time to run this query (though finishes after about a minute now (instead of failing after 15 in the past)

-Doug

gonna have to attach it, query is more than 64k
Comment 13 Doug Jacobsen 2016-03-15 10:52:46 MDT
Created attachment 2861 [details]
working query
Comment 14 Tim Wickberg 2016-03-18 08:07:10 MDT
I wonder if that somewhat large block of (t3.id_assoc==14713 || t3.id_assoc...) is causing the MariaDB query planner to choke. It may also explain the difference between cori and edison if edison is running with significantly more associations?

Do you mind timing that query, and then timing the mostly-equivalent (but missing the t3.id_assoc block) query of:

select t3.id_assoc, t1.id_tres, t1.time_start, t1.alloc_secs from "edison_assoc_usage_day_table" as t1, "edison_assoc_table" as t2, "edison_assoc_table" as t3 where (t1.time_start < 1458025200 && t1.time_start >= 1457938800) && t1.id=t2.id_assoc && t2.lft between t3.lft and t3.rgt order by t3.id_assoc, time_start ;

I'm guessing that second form is significantly faster, although it may return some records the first form did not.
Comment 15 Tim Wickberg 2016-04-26 05:54:36 MDT
Doug -

Is this still an active concern for you, or should I go ahead and close it for the time being?

- Tim
Comment 16 Tim Wickberg 2016-05-31 04:19:12 MDT
Doug - Never saw a reply last month, so I'm assuming this isn't an active concern. Please reopen if it's still an issue.