Ticket 1246 - Failure to insert job data
Summary: Failure to insert job data
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 14.03.10
Hardware: Linux
OS: Linux
Priority: ---
Severity: 6 - No support contract
Assignee: Brian Christiansen
Reported: 2014-11-07 00:19 MST by Charles Johnson
Modified: 2014-11-07 05:50 MST
Site: -Other-
Version Fixed: 14.03.11 14.11.0 15.08.0pre1


Description Charles Johnson 2014-11-07 00:19:37 MST
OS: CentOS 6.6 (Final)
MariaDB 10.0.14
slurm 14.03.10

Extract from slurm log:

Nov  6 01:43:56 testsched slurmctld[6325]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (33, 70284, ' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (33, 70284, 'buterbkl', 36014, 'accre', "folding8core.slurm", 3, 2, 'accre_test',
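A note on reading this message: in a MariaDB/MySQL error 1064, the quoted text after "near" starts at the point where the parser stopped, so its first token is usually the place to look. The sketch below pulls that token out of a log line. The helper name parse_sql_error and the regular expression are illustrative assumptions, not part of Slurm or MariaDB.

```python
import re

# Hypothetical helper: extract the error number and the "near" fragment
# from a slurmctld "mysql_query failed" log line.
ERROR_RE = re.compile(r"mysql_query failed: (\d+) .*? near '(.*?)' at line (\d+)")


def parse_sql_error(line):
    """Return a dict describing the SQL error, or None if the line has none."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    code, fragment, sql_line = match.groups()
    # The fragment begins at the token the parser could not accept.
    first_token = re.split(r"[\s,()]+", fragment.strip(), maxsplit=1)[0]
    return {
        "code": int(code),
        "near": fragment,
        "first_token": first_token,
        "sql_line": int(sql_line),
    }


# Sample text copied from the log extract above (statement body omitted).
sample = (
    "error: mysql_query failed: 1064 You have an error in your SQL syntax; "
    "check the manual that corresponds to your MariaDB server version for "
    "the right syntax to use near 'partition, timelimit, starttime, endtime, "
    "nodecnt, nodelist) values (33, 70284, ' at line 1"
)

info = parse_sql_error(sample)
print(info["code"], info["first_token"])
```

Here the first token of the fragment is the column name partition, which points at that identifier rather than at the values that follow it.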
Comment 1 Danny Auble 2014-11-07 00:58:58 MST
Could you please send the entire error line? Would it be possible to raise the debug level to debug3 and send the log around the error?

Comment 2 Charles Johnson 2014-11-07 02:07:37 MST
Debug level raised to 5 on both slurmctld and slurmdbd.

slurmctld log, full report:

Nov  7 09:59:43 testsched slurmctld[3095]: sched: _slurm_rpc_step_complete StepId=46.0 usec=229
Nov  7 09:59:43 testsched slurmctld[3095]: completing job 46 status 0
Nov  7 09:59:43 testsched slurmctld[3095]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=46 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov  7 09:59:43 testsched slurmctld[3095]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (46, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (46, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 1, 'accre_test',
Nov  7 09:59:43 testsched slurmctld[3095]: sched: job_complete for JobId=46 successful, exit code=0

slurmdbd log:

[2014-11-07T09:53:58.191] Terminate signal (SIGINT or SIGTERM) received
[2014-11-07T09:54:08.304] auth plugin for Munge (http://code.google.com/p/munge/) loaded
[2014-11-07T09:54:08.431] Accounting storage MYSQL plugin loaded
[2014-11-07T09:54:08.434] error: chdir(/var/log): Permission denied
[2014-11-07T09:54:08.434] chdir to /var/tmp
[2014-11-07T09:54:08.437] slurmdbd version 14.03.10 started
[2014-11-07T09:54:08.439] debug:  DBD_INIT: CLUSTER:accre_test VERSION:6912 UID:59229 IP:127.0.0.1 CONN:7
[2014-11-07T09:54:20.389] debug:  cluster accre_test has disconnected
[2014-11-07T09:54:20.545] debug:  DBD_INIT: CLUSTER:accre_test VERSION:6912 UID:59229 IP:127.0.0.1 CONN:8
Comment 3 Danny Auble 2014-11-07 02:09:26 MST
Please up it to 7 (debug3) and resend, thanks.
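For reference, the numeric and symbolic debug levels mentioned in this thread can be related with a small lookup. This is a sketch that assumes Slurm's usual ordering of log levels (quiet = 0 through debug5 = 9). The thread itself only confirms that 7 corresponds to debug3, and the function name level_name is made up for illustration.

```python
# Assumed ordering of Slurm log levels; the list index is the numeric value.
SLURM_LOG_LEVELS = [
    "quiet", "fatal", "error", "info", "verbose",
    "debug", "debug2", "debug3", "debug4", "debug5",
]


def level_name(value):
    """Return the symbolic name for a numeric or symbolic debug setting."""
    text = str(value).strip().lower()
    if text.isdigit():
        number = int(text)
        if number >= len(SLURM_LOG_LEVELS):
            raise ValueError(f"unknown debug level: {value!r}")
        return SLURM_LOG_LEVELS[number]
    if text in SLURM_LOG_LEVELS:
        return text
    raise ValueError(f"unknown debug level: {value!r}")


print(level_name(5))
print(level_name("7"))
```

Under that assumption, the level 5 used in Comment 2 is plain debug, two steps below the requested debug3.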
Comment 4 Danny Auble 2014-11-07 02:13:52 MST
If the slurmctld is writing to mysql then you aren't using the slurmdbd for that particular cluster.

I don't think this is what you are expecting.

Check your slurm.conf file to make sure you have

AccountingStorageType=accounting_storage/slurmdbd

Only the slurmdbd.conf should have mysql in it.
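The check described above can be sketched as a small script that lists every slurm.conf setting whose value mentions mysql. The function name mysql_settings and the sample configuration are invented for illustration and are not taken from the reporter's files. JobCompType appears in the sample only as an example of another setting that can name a mysql plugin.

```python
def mysql_settings(conf_text):
    """Return (key, value) pairs from slurm.conf text whose value mentions mysql."""
    hits = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if "mysql" in value.lower():
            hits.append((key.strip(), value.strip()))
    return hits


# Hypothetical slurm.conf excerpt, not copied from the ticket.
sample_conf = """
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/mysql
#JobCompLoc=
"""

for key, value in mysql_settings(sample_conf):
    print(f"{key}={value}")
```

An empty result would mean no setting in that file refers to mysql directly. Any line that is printed names a plugin that makes the daemon reading slurm.conf talk to the database itself.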
Comment 5 Charles Johnson 2014-11-07 02:27:29 MST
From /var/log/messages:

Nov  7 10:21:10 testsched slurmctld[10373]: slurmctld version 14.03.10 started on cluster accre_test
Nov  7 10:21:10 testsched slurmctld[10373]: Munge cryptographic signature plugin loaded
Nov  7 10:21:10 testsched slurmctld[10373]: Consumable Resources (CR) Node Selection plugin loaded with argument 276
Nov  7 10:21:10 testsched slurmctld[10373]: preempt/none loaded
Nov  7 10:21:10 testsched slurmctld[10373]: Checkpoint plugin loaded: checkpoint/none
Nov  7 10:21:10 testsched slurmctld[10373]: AcctGatherEnergy NONE plugin loaded
Nov  7 10:21:10 testsched slurmctld[10373]: AcctGatherProfile NONE plugin loaded
Nov  7 10:21:10 testsched slurmctld[10373]: AcctGatherInfiniband NONE plugin loaded
Nov  7 10:21:10 testsched slurmctld[10373]: AcctGatherFilesystem NONE plugin loaded
Nov  7 10:21:10 testsched slurmctld[10373]: Job accounting gather LINUX plugin loaded
Nov  7 10:21:10 testsched slurmctld[10373]: ExtSensors NONE plugin loaded
Nov  7 10:21:10 testsched slurmctld[10373]: switch NONE plugin loaded
Nov  7 10:21:10 testsched slurmctld[10373]: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
Nov  7 10:21:10 testsched slurmctld[10373]: auth plugin for Munge (http://code.google.com/p/munge/) loaded
Nov  7 10:21:10 testsched slurmctld[10373]: slurmdbd: recovered 0 pending RPCs
Nov  7 10:21:10 testsched rsyslogd-2177: imuxsock begins to drop messages from pid 10373 due to rate-limiting
Nov  7 10:21:19 testsched rsyslogd-2177: imuxsock lost 134 messages from pid 10373 due to rate-limiting
Nov  7 10:21:58 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=47 usec=800
Nov  7 10:21:58 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=47 Name=myslurmtest.slurm Began, Queued time 00:00:00
Nov  7 10:21:58 testsched slurmctld[10373]: sched: Allocate JobId=47 NodeList=testdellnode1 #CPUs=2
Nov  7 10:21:58 testsched slurmctld[10373]: sched: _slurm_rpc_job_step_create: StepId=47.0 testdellnode1 usec=786
Nov  7 10:21:59 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=48 usec=796
Nov  7 10:22:00 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=49 usec=627
Nov  7 10:22:01 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=48 Name=myslurmtest.slurm Began, Queued time 00:00:02
Nov  7 10:22:01 testsched slurmctld[10373]: sched: Allocate JobId=48 NodeList=testdellnode1 #CPUs=2
Nov  7 10:22:01 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=49 Name=myslurmtest.slurm Began, Queued time 00:00:01
Nov  7 10:22:01 testsched slurmctld[10373]: sched: Allocate JobId=49 NodeList=testdellnode1 #CPUs=2
Nov  7 10:22:01 testsched rsyslogd-2177: imuxsock begins to drop messages from pid 10373 due to rate-limiting
Nov  7 10:22:03 testsched rsyslogd-2177: imuxsock lost 102 messages from pid 10373 due to rate-limiting
Nov  7 10:22:03 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=52 usec=580
Nov  7 10:22:04 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=50 Name=myslurmtest.slurm Began, Queued time 00:00:03
Nov  7 10:22:04 testsched slurmctld[10373]: sched: Allocate JobId=50 NodeList=testintnode1 #CPUs=2
Nov  7 10:22:04 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=51 Name=myslurmtest.slurm Began, Queued time 00:00:02
Nov  7 10:22:04 testsched slurmctld[10373]: sched: Allocate JobId=51 NodeList=testintnode1 #CPUs=2
Nov  7 10:22:04 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=52 Name=myslurmtest.slurm Began, Queued time 00:00:01
Nov  7 10:22:04 testsched slurmctld[10373]: sched: Allocate JobId=52 NodeList=testintnode1 #CPUs=2
Nov  7 10:22:04 testsched slurmctld[10373]: sched: _slurm_rpc_job_step_create: StepId=50.0 testintnode1 usec=1034
Nov  7 10:22:04 testsched slurmctld[10373]: sched: _slurm_rpc_job_step_create: StepId=51.0 testintnode1 usec=1452
Nov  7 10:22:04 testsched slurmctld[10373]: sched: _slurm_rpc_job_step_create: StepId=52.0 testintnode1 usec=760
Nov  7 10:22:04 testsched slurmctld[10373]: _slurm_rpc_submit_batch_job JobId=53 usec=1363
Nov  7 10:22:05 testsched rsyslogd-2177: imuxsock begins to drop messages from pid 10373 due to rate-limiting
Nov  7 10:22:10 testsched rsyslogd-2177: imuxsock lost 100 messages from pid 10373 due to rate-limiting
Nov  7 10:22:28 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=47.0 usec=213
Nov  7 10:22:28 testsched slurmctld[10373]: completing job 47 status 0
Nov  7 10:22:28 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=47 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov  7 10:22:28 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (47, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (47, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov  7 10:22:28 testsched slurmctld[10373]: sched: job_complete for JobId=47 successful, exit code=0
Nov  7 10:22:31 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=48.0 usec=158
Nov  7 10:22:31 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=49.0 usec=193
Nov  7 10:22:31 testsched slurmctld[10373]: completing job 48 status 0
Nov  7 10:22:31 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=48 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov  7 10:22:31 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (48, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (48, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov  7 10:22:31 testsched slurmctld[10373]: sched: job_complete for JobId=48 successful, exit code=0
Nov  7 10:22:31 testsched slurmctld[10373]: completing job 49 status 0
Nov  7 10:22:31 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=49 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov  7 10:22:31 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (49, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (49, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov  7 10:22:31 testsched slurmctld[10373]: sched: job_complete for JobId=49 successful, exit code=0
Nov  7 10:22:34 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=50.0 usec=150
Nov  7 10:22:34 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=51.0 usec=184
Nov  7 10:22:34 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=52.0 usec=183
Nov  7 10:22:34 testsched slurmctld[10373]: completing job 50 status 0
Nov  7 10:22:34 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=50 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov  7 10:22:34 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (50, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (50, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov  7 10:22:34 testsched slurmctld[10373]: sched: job_complete for JobId=50 successful, exit code=0
Nov  7 10:22:34 testsched slurmctld[10373]: completing job 51 status 0
Nov  7 10:22:34 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=51 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov  7 10:22:34 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (51, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (51, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov  7 10:22:34 testsched slurmctld[10373]: sched: job_complete for JobId=51 successful, exit code=0
Nov  7 10:22:34 testsched slurmctld[10373]: completing job 52 status 0
Nov  7 10:22:34 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=52 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov  7 10:22:34 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (52, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (52, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov  7 10:22:34 testsched slurmctld[10373]: sched: job_complete for JobId=52 successful, exit code=0
Nov  7 10:22:38 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=53.0 usec=152
Nov  7 10:22:38 testsched slurmctld[10373]: sched: _slurm_rpc_step_complete StepId=54.0 usec=185
Nov  7 10:22:38 testsched slurmctld[10373]: completing job 53 status 0
Nov  7 10:22:38 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=53 Name=myslurmtest.slurm Ended, Run time 00:00:31, COMPLETED, ExitCode 0
Nov  7 10:22:38 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (53, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (53, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 1, 'accre_test',
Nov  7 10:22:38 testsched slurmctld[10373]: sched: job_complete for JobId=53 successful, exit code=0
Nov  7 10:22:38 testsched slurmctld[10373]: completing job 54 status 0
Nov  7 10:22:38 testsched slurmctld[10373]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=54 Name=myslurmtest.slurm Ended, Run time 00:00:31, COMPLETED, ExitCode 0
Nov  7 10:22:38 testsched slurmctld[10373]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (54, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (54, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 1, 'accre_test',
Nov  7 10:22:38 testsched slurmctld[10373]: sched: job_complete for JobId=54 successful, exit code=0
Comment 6 Charles Johnson 2014-11-07 02:42:34 MST
We are using MariaDB, not MySQL.

~Charles~
Comment 7 Brian Christiansen 2014-11-07 04:11:46 MST
Will you set your slurmctld loglevel to debug3, or level 7, and rerun your test?
Comment 8 Charles Johnson 2014-11-07 04:36:19 MST
The debug levels were "7" as requested with the last post. From the slurm.conf file:

SlurmctldDebug=7
#SlurmctldLogFile=
SlurmdDebug=7

From the slurmdbd.conf file:

DbdHost=testsched
DebugLevel=7
PurgeEventAfter=1month

This is the whole extract from /var/log/messages:

Nov  7 12:32:16 testsched slurmctld[13224]: _slurm_rpc_submit_batch_job JobId=55 usec=761
Nov  7 12:32:17 testsched slurmctld[13224]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=55 Name=myslurmtest.slurm Began, Queued time 00:00:01
Nov  7 12:32:17 testsched slurmctld[13224]: sched: Allocate JobId=55 NodeList=testdellnode1 #CPUs=2
Nov  7 12:32:17 testsched slurmctld[13224]: sched: _slurm_rpc_job_step_create: StepId=55.0 testdellnode1 usec=967
Nov  7 12:32:47 testsched slurmctld[13224]: sched: _slurm_rpc_step_complete StepId=55.0 usec=208
Nov  7 12:32:47 testsched slurmctld[13224]: completing job 55 status 0
Nov  7 12:32:47 testsched slurmctld[13224]: email msg to charles.johnson@accre.vanderbilt.edu: SLURM Job_id=55 Name=myslurmtest.slurm Ended, Run time 00:00:30, COMPLETED, ExitCode 0
Nov  7 12:32:47 testsched slurmctld[13224]: error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'partition, timelimit, starttime, endtime, nodecnt, nodelist) values (55, 110009,' at line 1#012insert into jobcomp_table (jobid, uid, user_name, gid, group_name, name, state, proc_cnt, partition, timelimit, starttime, endtime, nodecnt, nodelist) values (55, 110009, 'johns276', 36014, 'accre', "myslurmtest.slurm", 3, 2, 'accre_test',
Nov  7 12:32:47 testsched slurmctld[13224]: sched: job_complete for JobId=55 successful, exit code=0
Comment 9 Brian Christiansen 2014-11-07 05:50:57 MST
We found that MariaDB was treating "partition" as a reserved keyword, so the unquoted column name in the insert statement failed to parse. This is fixed in the following commit:
https://github.com/SchedMD/slurm/commit/75f4951157401ad08bac47a3397d591b2f327d37

Thanks,
Brian
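The linked commit is not reproduced here. As a general illustration of how a reserved word can be used as a column name in MySQL or MariaDB, the sketch below quotes every identifier with backticks when building the insert statement. The helper names quote_ident and build_insert are hypothetical, and the %s placeholders follow the convention of common Python MySQL drivers. This is not Slurm's own code, which is written in C.

```python
def quote_ident(name):
    """Backtick-quote a MySQL/MariaDB identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_insert(table, columns):
    """Build a parameterized INSERT statement with every identifier quoted."""
    cols = ", ".join(quote_ident(c) for c in columns)
    marks = ", ".join(["%s"] * len(columns))
    return f"insert into {quote_ident(table)} ({cols}) values ({marks})"


# Column names copied from the failing statement in the log.
columns = [
    "jobid", "uid", "user_name", "gid", "group_name", "name", "state",
    "proc_cnt", "partition", "timelimit", "starttime", "endtime",
    "nodecnt", "nodelist",
]

sql = build_insert("jobcomp_table", columns)
print(sql)
```

Quoting every identifier, rather than only the one that failed, also protects against words that become reserved in later server versions.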