OS: CentOS 6.6 (Final)
MariaDB 10.0.14
slurm 14.03.10

Extract from slurm log:

Nov 6 01:43:56 testsched slurmctld[6325]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (33, 70284, ' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (33, 70284, 'buterbkl', 36014, 'accre', "folding8core.slurm", 3, 2, 'accre_test',
Could you please send the entire error line? Would it be possible to up the debug to debug3 and send the log around the error?

On November 7, 2014 6:19:37 AM PST, bugs@schedmd.com wrote:
> http://bugs.schedmd.com/show_bug.cgi?id=1246
>
> Site: -Other-
> Bug ID: 1246
> Summary: Failure to insert job data
> Product: SLURM
> Version: 14.03.10
> Hardware: Linux
> OS: Linux
> Status: UNCONFIRMED
> Severity: 6 - No support contract
> Priority: ---
> Component: Database
> Assignee: david@schedmd.com
> Reporter: charles.johnson@accre.vanderbilt.edu
> CC: brian@schedmd.com, da@schedmd.com, david@schedmd.com, jette@schedmd.com
Debug level upped to 5 on both slurmctld and slurmdbd.

slurmctld log, full report:

Nov 7 09:59:43 testsched slurmctld[3095]: sched: _slurm_rpc_step_complete StepId=46.0 usec=229
Nov 7 09:59:43 testsched slurmctld[3095]: completing job 46 status 0
Nov 7 09:59:43 testsched slurmctld[3095]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=46 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov 7 09:59:43 testsched slurmctld[3095]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (46, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (46, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 1, 'accre_test',
Nov 7 09:59:43 testsched slurmctld[3095]: sched: job_complete for JobId=46 successful, exit code=0

slurmdbd log:

[2014-11-07T09:53:58.191] Terminate signal (SIGINT or SIGTERM) received
[2014-11-07T09:54:08.304] auth plugin for Munge (http://code.google.com/p/munge/) loaded
[2014-11-07T09:54:08.431] Accounting storage MYSQL plugin loaded
[2014-11-07T09:54:08.434] error: chdir(/var/log): Permission denied
[2014-11-07T09:54:08.434] chdir to /var/tmp
[2014-11-07T09:54:08.437] slurmdbd version 14.03.10 started
[2014-11-07T09:54:08.439] debug: DBD_INIT: CLUSTER:accre_test VERSION:6912 UID:59229 IP:127.0.0.1 CONN:7
[2014-11-07T09:54:20.389] debug: cluster accre_test has disconnected
[2014-11-07T09:54:20.545] debug: DBD_INIT: CLUSTER:accre_test VERSION:6912 UID:59229 IP:127.0.0.1 CONN:8
Please up it to 7 (debug3) and resend, thanks.
If the slurmctld is writing to mysql then you aren't using the slurmdbd for that particular cluster. I don't think this is what you are expecting. Check your slurm.conf file to make sure you have

AccountingStorageType=accounting_storage/slurmdbd

Only the slurmdbd.conf should have mysql in it.
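The check described above can be sketched as a small script. This is only an illustration under stated assumptions: the helper names (get_setting, find_mysql_settings) and the sample configuration text are hypothetical, and the JobCompType line in the sample is not taken from this thread; it is included only to show what a mysql-backed setting in slurm.conf would look like.

```python
from typing import Dict, Optional


def _settings(conf_text: str):
    """Yield (key, value) pairs from slurm.conf-style text, skipping comments."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing or full-line comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        yield key.strip(), value.strip()


def get_setting(conf_text: str, name: str) -> Optional[str]:
    """Return the value of a setting (case-insensitive key), or None if absent."""
    for key, value in _settings(conf_text):
        if key.lower() == name.lower():
            return value
    return None


def find_mysql_settings(conf_text: str) -> Dict[str, str]:
    """Return every setting whose value mentions mysql."""
    return {k: v for k, v in _settings(conf_text) if "mysql" in v.lower()}


# Hypothetical sample, NOT the reporter's actual slurm.conf.
SAMPLE = """
ClusterName=accre_test
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/mysql   # hypothetical line for illustration
#SlurmctldLogFile=
"""

print("AccountingStorageType:", get_setting(SAMPLE, "AccountingStorageType"))
print("settings mentioning mysql:", find_mysql_settings(SAMPLE))
```

Any setting reported by find_mysql_settings would point at a plugin in slurmctld that talks to the database directly rather than through slurmdbd.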
From /var/log/messages:

Nov 7 10:21:10 testsched slurmctld[10373]: slurmctld version 14.03.10 started on cluster accre_test
Nov 7 10:21:10 testsched slurmctld[10373]: Munge cryptographic signature plugin loaded
Nov 7 10:21:10 testsched slurmctld[10373]: Consumable Resources (CR) Node Selection plugin loaded with argument 276
Nov 7 10:21:10 testsched slurmctld[10373]: preempt/none loaded
Nov 7 10:21:10 testsched slurmctld[10373]: Checkpoint plugin loaded: checkpoint/none
Nov 7 10:21:10 testsched slurmctld[10373]: AcctGatherEnergy NONE plugin loaded
Nov 7 10:21:10 testsched slurmctld[10373]: AcctGatherProfile NONE plugin loaded
Nov 7 10:21:10 testsched slurmctld[10373]: AcctGatherInfiniband NONE plugin loaded
Nov 7 10:21:10 testsched slurmctld[10373]: AcctGatherFilesystem NONE plugin loaded
Nov 7 10:21:10 testsched slurmctld[10373]: Job accounting gather LINUX plugin loaded
Nov 7 10:21:10 testsched slurmctld[10373]: ExtSensors NONE plugin loaded
Nov 7 10:21:10 testsched slurmctld[10373]: switch NONE plugin loaded
Nov 7 10:21:10 testsched slurmctld[10373]: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
Nov 7 10:21:10 testsched slurmctld[10373]: auth plugin for Munge (http://code.google.com/p/munge/) loaded
Nov 7 10:21:10 testsched slurmctld[10373]: slurmdbd: recovered 0 pending RPCs
Nov 7 10:21:10 testsched rsyslogd-2177: imuxsock begins to drop messages from pid 10373 due to rate-limiting
Nov 7 10:21:19 testsched rsyslogd-2177: imuxsock lost 134 messages from pid 10373 due to rate-limiting
Nov 7 10:21:58 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=47 usec=800
Nov 7 10:21:58 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=47 Name=myslurmtest.slurm Began, Queued time 00:00:00
Nov 7 10:21:58 testsched slurmctld[10373]: sched: Allocate JobId=47 NodeList=testdellnode1 #CPUs=2
Nov 7 10:21:58 testsched slurmctld[10373]: sched: _slurm_rpc_job_step_create: StepId=47.0 testdellnode1 usec=786
Nov 7 10:21:59 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=48 usec=796
Nov 7 10:22:00 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=49 usec=627
Nov 7 10:22:01 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=48 Name=myslurmtest.slurm Began, Queued time 00:00:02
Nov 7 10:22:01 testsched slurmctld[10373]: sched: Allocate JobId=48 NodeList=testdellnode1 #CPUs=2
Nov 7 10:22:01 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=49 Name=myslurmtest.slurm Began, Queued time 00:00:01
Nov 7 10:22:01 testsched slurmctld[10373]: sched: Allocate JobId=49 NodeList=testdellnode1 #CPUs=2
Nov 7 10:22:01 testsched rsyslogd-2177: imuxsock begins to drop messages from pid 10373 due to rate-limiting
Nov 7 10:22:03 testsched rsyslogd-2177: imuxsock lost 102 messages from pid 10373 due to rate-limiting
Nov 7 10:22:03 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=52 usec=580
Nov 7 10:22:04 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=50 Name=myslurmtest.slurm Began, Queued time 00:00:03
Nov 7 10:22:04 testsched slurmctld[10373]: sched: Allocate JobId=50 NodeList=testintnode1 #CPUs=2
Nov 7 10:22:04 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=51 Name=myslurmtest.slurm Began, Queued time 00:00:02
Nov 7 10:22:04 testsched slurmctld[10373]: sched: Allocate JobId=51 NodeList=testintnode1 #CPUs=2
Nov 7 10:22:04 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=52 Name=myslurmtest.slurm Began, Queued time 00:00:01
Nov 7 10:22:04 testsched slurmctld[10373]: sched: Allocate JobId=52 NodeList=testintnode1 #CPUs=2
Nov 7 10:22:04 testsched slurmctld[10373]: sched: _slurm_rpc_job_step_create: StepId=50.0 testintnode1 usec=1034
Nov 7 10:22:04 testsched slurmctld[10373]: sched: _slurm_rpc_job_step_create: StepId=51.0 testintnode1 usec=1452
Nov 7 10:22:04 testsched slurmctld[10373]: sched: _slurm_rpc_job_step_create: StepId=52.0 testintnode1 usec=760
Nov 7 10:22:04 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=53 usec=1363
Nov 7 10:22:05 testsched rsyslogd-2177: imuxsock begins to drop messages from pid 10373 due to rate-limiting
Nov 7 10:22:10 testsched rsyslogd-2177: imuxsock lost 100 messages from pid 10373 due to rate-limiting
Nov 7 10:22:28 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=47.0 usec=213
Nov 7 10:22:28 testsched slurmctld[10373]: completing job 47 status 0
Nov 7 10:22:28 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=47 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov 7 10:22:28 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (47, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (47, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov 7 10:22:28 testsched slurmctld[10373]: sched: job_complete for JobId=47 successful, exit code=0
Nov 7 10:22:31 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=48.0 usec=158
Nov 7 10:22:31 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=49.0 usec=193
Nov 7 10:22:31 testsched slurmctld[10373]: completing job 48 status 0
Nov 7 10:22:31 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=48 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov 7 10:22:31 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (48, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (48, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov 7 10:22:31 testsched slurmctld[10373]: sched: job_complete for JobId=48 successful, exit code=0
Nov 7 10:22:31 testsched slurmctld[10373]: completing job 49 status 0
Nov 7 10:22:31 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=49 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov 7 10:22:31 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (49, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (49, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov 7 10:22:31 testsched slurmctld[10373]: sched: job_complete for JobId=49 successful, exit code=0
Nov 7 10:22:34 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=50.0 usec=150
Nov 7 10:22:34 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=51.0 usec=184
Nov 7 10:22:34 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=52.0 usec=183
Nov 7 10:22:34 testsched slurmctld[10373]: completing job 50 status 0
Nov 7 10:22:34 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=50 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov 7 10:22:34 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (50, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (50, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov 7 10:22:34 testsched slurmctld[10373]: sched: job_complete for JobId=50 successful, exit code=0
Nov 7 10:22:34 testsched slurmctld[10373]: completing job 51 status 0
Nov 7 10:22:34 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=51 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov 7 10:22:34 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (51, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (51, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov 7 10:22:34 testsched slurmctld[10373]: sched: job_complete for JobId=51 successful, exit code=0
Nov 7 10:22:34 testsched slurmctld[10373]: completing job 52 status 0
Nov 7 10:22:34 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=52 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov 7 10:22:34 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (52, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (52, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov 7 10:22:34 testsched slurmctld[10373]: sched: job_complete for JobId=52 successful, exit code=0
Nov 7 10:22:38 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=53.0 usec=152
Nov 7 10:22:38 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=54.0 usec=185
Nov 7 10:22:38 testsched slurmctld[10373]: completing job 53 status 0
Nov 7 10:22:38 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=53 Name=myslurmtest.slurm Ended, Run time 00:00:31, COMPLETED, ExitCode 0
Nov 7 10:22:38 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (53, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (53, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 1, 'accre_test',
Nov 7 10:22:38 testsched slurmctld[10373]: sched: job_complete for JobId=53 successful, exit code=0
Nov 7 10:22:38 testsched slurmctld[10373]: completing job 54 status 0
Nov 7 10:22:38 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=54 Name=myslurmtest.slurm Ended, Run time 00:00:31, COMPLETED, ExitCode 0
Nov 7 10:22:38 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (54, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (54, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 1, 'accre_test',
Nov 7 10:22:38 testsched slurmctld[10373]: sched: job_complete for JobId=54 successful, exit code=0
We are using MariaDB, and not MySQL. ~Charles~
Will you set your slurmctld loglevel to debug3, or level 7, and rerun your test?
The debug levels were "7" as requested with the last post.

From the slurm.conf file:

SlurmctldDebug=7
#SlurmctldLogFile=
SlurmdDebug=7

From the slurmdbd.conf file:

DbdHost=testsched
DebugLevel=7
PurgeEventAfter=1month

This is the whole extract from /var/log/messages:

Nov 7 12:32:16 testsched slurmctld[13224]: _slurm_rpc_submit_batch_job JobId=55 usec=761
Nov 7 12:32:17 testsched slurmctld[13224]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=55 Name=myslurmtest.slurm Began, Queued time 00:00:01
Nov 7 12:32:17 testsched slurmctld[13224]: sched: Allocate JobId=55 NodeList=testdellnode1 #CPUs=2
Nov 7 12:32:17 testsched slurmctld[13224]: sched: _slurm_rpc_job_step_create: StepId=55.0 testdellnode1 usec=967
Nov 7 12:32:47 testsched slurmctld[13224]: sched: _slurm_rpc_step_complete StepId=55.0 usec=208
Nov 7 12:32:47 testsched slurmctld[13224]: completing job 55 status 0
Nov 7 12:32:47 testsched slurmctld[13224]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=55 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov 7 12:32:47 testsched slurmctld[13224]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (55, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (55, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov 7 12:32:47 testsched slurmctld[13224]: sched: job_complete for JobId=55 successful, exit code=0
We found that MariaDB was treating "partition" as a keyword. This is fixed in the following commit: https://github.com/SchedMD/slurm/commit/75f4951157401ad08bac47a3397d591b2f327d37 Thanks, Brian
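
For anyone building similar statements by hand, one common way to avoid a clash with a keyword such as partition is to backtick-quote every identifier. The sketch below illustrates that general technique only; it is not taken from the linked commit, and the helper names (quote_identifier, build_insert) are hypothetical. The column list is the one shown in the failing query.

```python
from typing import Sequence


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL/MariaDB identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_insert(table: str, columns: Sequence[str]) -> str:
    """Build a parameterized INSERT in which every identifier is quoted."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"insert into {quote_identifier(table)} ({cols}) values ({placeholders})"


# Columns as they appear in the failing statement from the log.
JOBCOMP_COLUMNS = [
    "jobid", "uid", "user_name", "gid", "group_name", "name", "state",
    "proc_cnt", "partition", "timelimit", "starttime", "endtime",
    "nodecnt", "nodelist",
]

statement = build_insert("jobcomp_table", JOBCOMP_COLUMNS)
print(statement)
```

With the quoting in place, the server reads `partition` as a column name rather than as a keyword. The %s placeholders assume a Python DB-API driver that uses the format paramstyle; values are passed separately instead of being interpolated into the string.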