Ticket 2828 - slurmdbd mysql duplicate entry
Summary: slurmdbd mysql duplicate entry
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 15.08.10
Hardware: Linux Linux
Importance: --- 3 - Medium Impact
Assignee: Danny Auble
 
Reported: 2016-06-14 09:47 MDT by David Richardson
Modified: 2016-06-14 15:35 MDT (History)

See Also:
Site: University of Utah
Version Fixed: 15.08.13 16.05.1 17.02.0-pre1


Description David Richardson 2016-06-14 09:47:20 MDT
Hi,

I'm running slurm 15.08.10. I have a single slurmdbd host that is used by four different clusters, each with their own slurmctld. Three of the clusters work fine. The fourth doesn't record job info to slurmdbd, although jobs continue to start, run, and exit.

On the slurmctld host, I see the errors:
Jun  8 12:11:23 kprm slurmctld[3239]: error: slurmdbd: agent queue filling, RESTART SLURMDBD NOW
Jun  8 12:11:23 kprm slurmctld[3239]: *** RESTART SLURMDBD NOW ***

On the slurmdbd host, I see: 
[2016-06-07T11:37:20.754] error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'fan speed config error' rs', 'kp184', '181', 1496857040, 1465321040, '1=20') on ' at line 1
insert into "kingspeak_resv_table" (id_resv, assoclist, flags, resv_name, nodelist, node_inx, time_end, time_start, tres) values (175, '9763,9762', 32833, 'kp184 'fan speed config error' rs', 'kp184', '181', 1496857040, 1465321040, '1=20') on duplicate key update deleted=0, assoclist='9763,9762', flags=32833, resv_name='kp184 'fan speed config error' rs', nodelist='kp184', node_inx='181', time_end=1496857040, time_start=1465321040, tres='1=20';

Both of these errors have repeated hundreds of times. Restarting the dbd and the ctld has not resolved the problem. It looks like there's a duplicate entry in one of the DB tables (which would explain why the problems are limited to one cluster), but I don't know how it got there or how to resolve it.

Your help is greatly appreciated.
Thanks,
DR
Comment 1 David Richardson 2016-06-14 10:08:48 MDT
Looking more closely at the database, it appears that this may not be a duplicate-entry SQL error after all, but a syntax error caused by the single quotes in the reservation name:
resv_name='kp184 'fan speed config error' rs'

So, perhaps a better question would be how to tell ctld to stop trying to record that reservation in the database (or modify the name) without losing the other records it has queued up.

Thanks,
DR
Comment 2 Danny Auble 2016-06-14 10:20:42 MDT
David, I can recreate this situation. Your analysis is correct: the single quote in the name of the reservation is what is halting the addition. I just added commit de52855461d, which fixes the issue.

All you should have to do is apply the patch, rebuild the plugin, and restart the slurmdbd; the database will then continue on. It doesn't appear that any data will be lost or that anything else is required.

Please let me know if this fixes your issue or not.
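
For anyone reading this ticket later, the failure mode can be reproduced in miniature. The following is only a sketch and is not the code from commit de52855461d: it uses Python's standard-library sqlite3 module as a stand-in for MariaDB, and the two-column resv_table is a hypothetical simplification of the real reservation table. It shows a name containing single quotes breaking a statement that was built by pasting the value into the SQL text, and the same name being stored without trouble when it is passed as a bound parameter.

```python
import sqlite3

# Assumption: SQLite stands in for MariaDB so the sketch runs without a server.
# The table is a hypothetical, cut-down version of the one in the logged query.
conn = sqlite3.connect(":memory:")
conn.execute("create table resv_table (id_resv integer primary key, resv_name text)")

# Reservation name taken from the logged statement in the description.
resv_name = "kp184 'fan speed config error' rs"

# Pasting the name straight into the SQL text: the first embedded quote
# closes the string literal early, so the parser rejects the statement.
unsafe_sql = f"insert into resv_table (id_resv, resv_name) values (175, '{resv_name}')"
try:
    conn.execute(unsafe_sql)
    unsafe_error = None
except sqlite3.OperationalError as exc:
    unsafe_error = str(exc)

# Passing the name as a bound parameter keeps it out of the SQL text entirely.
conn.execute(
    "insert into resv_table (id_resv, resv_name) values (?, ?)",
    (175, resv_name),
)
stored_name = conn.execute(
    "select resv_name from resv_table where id_resv = 175"
).fetchone()[0]

print("unsafe insert error:", unsafe_error)
print("stored name:", stored_name)
```

The first insert fails with a syntax error at the text following the embedded quote, which mirrors the 1064 error in the slurmdbd log. The second insert succeeds and the name is read back unchanged.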
Comment 3 David Richardson 2016-06-14 15:28:49 MDT
Thanks, Danny! It looks good.

I was able to get the patch you made (my git skills are pretty weak, so it took longer than it should have, but that's my problem). It applied and compiled cleanly. After restarting the dbd, I see no more errors on it or the ctld, and my sacct commands return data now.

I think we can call this one done!
Thanks,
DR
Comment 4 Danny Auble 2016-06-14 15:33:40 MDT
No problem David, glad it worked out! I'm sure your got skills will improve with time.  It was super confusing for me at first but I think it's the best now :).
Comment 5 Danny Auble 2016-06-14 15:35:36 MDT
Git skills ;)