Hi,
slurmdbd crashed with this error in log. Any way to fix this?

[2014-12-08T00:45:04.530] error: mysql_query failed: 1205 Lock wait timeout exceeded; try restarting transaction
update "london_last_ran_table" set hourly_rollup=1417786695, daily_rollup=1417786695, monthly_rollup=1417786695
[2014-12-08T00:45:04.530] fatal: mysql gave ER_LOCK_WAIT_TIMEOUT as an error. The only way to fix this is restart the calling program
As the message implies, the database wasn't responding for a long time, normally 15 minutes. When this happens, the only way to fix the issue is to restart the calling program, which is the reason for the fatal. I don't know if there is much we can do here since this is a MySQL issue.

It is highly advised that you run the slurmdbd on top of the database instead of connecting the slurmctld to it directly. In addition to the numerous other advantages the slurmdbd offers, it would prevent the slurmctld from getting a fatal here.

On December 7, 2014 7:39:20 PM PST, bugs@schedmd.com wrote:
> http://bugs.schedmd.com/show_bug.cgi?id=1304
>
> Site: DownUnder GeoSolutions
> Bug ID: 1304
> Summary: slurmdbd crashed on mysql error ER_LOCK_WAIT_TIMEOUT
> Product: SLURM
> Version: 14.03.9
> Hardware: Linux
> OS: Linux
> Status: UNCONFIRMED
> Severity: 2 - High Impact
> Priority: ---
> Component: slurmdbd daemon
> Assignee: david@schedmd.com
> Reporter: akmalm@dugeo.com
> CC: brian@schedmd.com, da@schedmd.com, david@schedmd.com, jette@schedmd.com
Actually, we have been using slurmdbd recently. slurmctld didn't crash, but slurmdbd did. After restarting slurmdbd, it crashed again after a few minutes. Do you have any other suggestions?
It is good that you have switched to the dbd. I am not sure what is happening to your database. Is it under heavy load? Something seems to be different now than before, since this is the first time you are seeing this. In my experience it is a very rare occurrence, usually caused by a query that takes too long to complete. Raising the debug level may give an idea of which query is causing the problem.

I know that in 14.11 there was work done to speed up interactions with the database, which may help. You might consider upgrading at least the slurmdbd node to 14.11 to see if that helps.

Have you tried restarting your database? Is the database used for anything else?

On December 7, 2014 9:05:18 PM PST, bugs@schedmd.com wrote:
> http://bugs.schedmd.com/show_bug.cgi?id=1304
>
> --- Comment #2 from Akmal Madzlan <akmalm@dugeo.com> ---
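One way to act on the suggestion of finding the offending query is to pull the statement out of the slurmdbd log next to each 1205 error. The following is only a minimal sketch: the function name `find_lock_timeouts` is hypothetical, and it assumes the log layout seen in this report, where the SQL statement is printed on the line right after the error line.

```python
import re

# Assumption: the error line looks like the one quoted in this bug report.
ERROR_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] error: mysql_query failed: 1205 ")


def find_lock_timeouts(log_text):
    """Return (timestamp, statement) pairs for each 1205 error in log_text."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = ERROR_RE.match(line)
        if match is None:
            continue
        # Assumption: the statement follows on the next line; a line that
        # starts with "[" is a new timestamped entry, not a statement.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        statement = "" if following.startswith("[") else following.strip()
        hits.append((match.group("ts"), statement))
    return hits


# Log excerpt taken from this report.
SAMPLE = "\n".join([
    "[2014-12-08T00:45:04.530] error: mysql_query failed: 1205 Lock wait "
    "timeout exceeded; try restarting transaction",
    'update "london_last_ran_table" set hourly_rollup=1417786695, '
    "daily_rollup=1417786695, monthly_rollup=1417786695",
    "[2014-12-08T00:45:04.530] fatal: mysql gave ER_LOCK_WAIT_TIMEOUT as an "
    "error. The only way to fix this is restart the calling program",
])

for timestamp, statement in find_lock_timeouts(SAMPLE):
    print(timestamp, statement)
```

On the excerpt from this report, the sketch points at the update of "london_last_ran_table", that is, the rollup bookkeeping.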
Thanks for that suggestion. I'll try that.
Akmal, please check whether anything else was locking up the database. A common cause of this is mysqldump, or something similar, running at the same time. Running mysqldump will sometimes cause all sorts of issues on a live database. If this is the situation, please make sure you use the following mysqldump options:

--single-transaction --quick --lock-tables=false

Without them the database will lock up and you will get these kinds of issues. I am hoping this is the case, or that something like it was locking the tables and messing things up.
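As a rough illustration of the options above, a backup script could assemble the mysqldump command line as in the sketch below. The helper name `build_mysqldump_command` is hypothetical, and the database name `slurm_acct_db` is an assumption (substitute whatever your slurmdbd storage location is); credentials and output redirection are left out.

```python
import shlex

# The three options recommended in this thread for dumping a live database.
SAFE_DUMP_OPTIONS = ["--single-transaction", "--quick", "--lock-tables=false"]


def build_mysqldump_command(database):
    """Return the mysqldump argument list for dumping `database`."""
    return ["mysqldump", *SAFE_DUMP_OPTIONS, database]


# Assumption: the accounting database is called slurm_acct_db.
command = build_mysqldump_command("slurm_acct_db")
print(shlex.join(command))
# To actually run it, pass `command` to subprocess.run() and redirect stdout
# to the backup file.
```

Keeping the options in one list makes it harder for a cron job or wrapper script to drop one of them by accident.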
Akmal, any more on this?
I've managed to get slurmdbd running without crashes. This is actually the first time slurmdbd has been used on that particular cluster, so on initial start it tried to do a lot of rollups, which may have made MySQL very busy. So I let slurmdbd run first and let all the rollups finish before restarting slurmctld. It works fine now.
Thanks for the update. Please reopen if necessary.

David