Ticket 3229

Summary:	JobCompType content for jobcomp/filetxt vs slurmdbd
Product:	Slurm	Reporter:	Brian Haymore <brian.haymore>
Component:	Accounting	Assignee:	Moe Jette <jette>
Status:	RESOLVED FIXED	QA Contact:
Severity:	5 - Enhancement
Priority:	---	CC:	alex
Version:	15.08.10
Hardware:	Linux
OS:	Linux
Site:	University of Utah	Alineos Sites:	---
Atos/Eviden Sites:	---	Confidential Site:	---
Coreweave sites:	---	Cray Sites:	---
DS9 clusters:	---	HPCnow Sites:	---
HPE Sites:	---	IBM Sites:	---
NOAA SIte:	---	OCF Sites:	---
Recursion Pharma Sites:	---	SFW Sites:	---
SNIC sites:	---	Linux Distro:	---
Machine Name:		CLE Version:
Version Fixed:	17.11.0-pre1	Target Release:	---
DevPrio:	4 - Medium	Emory-Cloud Sites:	---

Description Brian Haymore 2016-11-01 12:48:52 MDT

I'm not sure if David Richardson has yet opened a ticket with you all on some issues we are having with slurmdbd records, but I have a side question to ask that relates.

Let me first give a brief basis for this. We have been using slurmdbd to capture cluster usage for both the purpose of seeding XDMOD as well as for allocation enforcement. We have ALSO been capturing job completion records to the text file that slurm offers via JobCompLoc and JobCompType=jobcomp/filetxt. I left the filetxt option live vs none as it was a "second opinion" if we ever needed to compare slurmdbd records against something else.

We have now run into 3 or so events where something happens between slurmctld and slurmdbd where communication stops and we get syslog messages about a queue filling up and to restart slurmdbd. One event was from a reservationname that had some special character in it that broke things. Another was from having a reservation on nodes that were set to be retired. We retired them out of the slurm.conf file before releasing the reservation and that also got things stuck.

The result of these events is that data about some jobs that ran is incomplete, flat out wrong, or just missing, though it's not fully uniform from what we can see. So we have been leaning on this text file as a place to "trust" when things look out of place.

So now to the question that this ticket is about... We have noticed that as we set up GRES for GPU support, these bits were not landing in the text file from the JobCompType=jobcomp/filetxt setup. After looking more at the slurm job record from slurmdbd we see there are other fields not landing in the text file. I've gone looking in the man pages to see if I can control what fields land in the text file but I'm not finding anything. Is there a way to set up slurm so that all the fields that are in slurmdbd's job records are also in the text file?

Thanks.

Comment 1 Alejandro Sanchez 2016-11-02 07:12:57 MDT

Hi Brian. The jobcomp/filetxt plugin currently records these fields:

JobId, UserId=name(uid), GroupId=name(gid), Name, JobState, Partition, TimeLimit, StartTime, EndTime, NodeList, NodeCnt, ProcCnt, WorkDir.
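
For readers who post-process the text file, the field list above can be turned into a small parser. This is only a sketch: it assumes each record is one line of space-separated Key=Value pairs using the field names listed above, and the sample line, user, group, and paths are invented for illustration. Values that contain spaces (for example a job name with a space) are not handled.

```python
import re

# Hypothetical sample record. The real on-disk layout is ASSUMED to be
# space-separated Key=Value pairs using the field names listed above.
SAMPLE = ("JobId=1234 UserId=alice(1001) GroupId=chpc(2001) Name=train "
          "JobState=COMPLETED Partition=gpu TimeLimit=60 "
          "StartTime=2016-11-01T10:00:00 EndTime=2016-11-01T10:30:00 "
          "NodeList=gpu[01-02] NodeCnt=2 ProcCnt=32 WorkDir=/home/alice")

# Matches the documented name(id) form used by UserId and GroupId.
_NAME_ID = re.compile(r"^(?P<name>.*)\((?P<id>\d+)\)$")


def parse_record(line):
    """Split one record line into a dict of field name -> value."""
    record = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip tokens that are not Key=Value
        record[key] = value
    # Turn "name(id)" into a (name, id) tuple when the value has that shape.
    for field in ("UserId", "GroupId"):
        match = _NAME_ID.match(record.get(field, ""))
        if match:
            record[field] = (match.group("name"), int(match.group("id")))
    return record


if __name__ == "__main__":
    for key, value in parse_record(SAMPLE).items():
        print(key, value)
```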

Other jobcomp plugins (e.g. elasticsearch) record 36+ fields, including job_record->gres_req and job_record->gres_alloc, which you mentioned you are interested in recording.

I can change the bug to a sev-5 (enhancement request) to widen the set of fields recorded by jobcomp/filetxt. Unfortunately our development schedule is pretty full at the moment, and absent an external patch or a customer wanting to sponsor some of this priority work I can't say how soon we'd be able to get to it. (If that's something University of Utah may be interested in pursuing, let me know and we can have that discussion separately.)

Comment 2 Brian Haymore 2016-11-02 12:34:34 MDT

So help me understand a bit. Can I just change my setting from jobcomp/filetxt to elasticsearch and have all 36 of those fields land in the same text file I have now? I'm happy to change to a feature request status if there is reason to do so to accomplish this, but if there is an existing knob that does it I'm happy to explore that too. Guide me on this please.

--
Brian D. Haymore
University of Utah
Center for High Performance Computing
155 South 1452 East RM 405
Salt Lake City, Ut 84112
Phone: 801-558-1150, Fax: 801-585-5366
http://bit.ly/1HO1N2C

Comment 3 Alejandro Sanchez 2016-11-02 13:28:36 MDT

(In reply to Brian Haymore from comment #2)
> So help me understand a bit.  Can I just change my setting from
> jobcomp/filetxt to elasticsearch and have all 36 of those fields land in the
> same text file I have now?  

No. If you change the JobCompType value from jobcomp/filetxt to jobcomp/elasticsearch, the job completion information will be indexed into the Elasticsearch server[1] at the address and port specified by the JobCompLoc parameter.

> I'm happy to change to a feature request status if there is reason to do so to
> accomplish this, but if there is an existing knob that does it I'm happy to
> explore that too.  Guide me on this please.

So if the accounting done by slurmdbd is not enough, and you want to run a Job Completion plugin in parallel with slurmdbd, there are currently 4 JobCompType plugins available: filetxt, mysql, script and elasticsearch. As I said, the filetxt plugin only stores the fields mentioned in my previous comment. The elasticsearch plugin stores 36+ fields, but instead of writing them to a file it indexes the records in an Elasticsearch server. Anyhow, for most customers, doing the accounting with slurmdbd is more than enough. Also, we've done some tests where slurmdbd scales better than the JobCompType plugins in heavy HTC environments. The elasticsearch plugin stores these fields:

jobid, username, user_id, groupname, group_id, @start, @end, elapsed, partition, alloc_node, nodes, total_cpus, total_nodes, derived_exitcode, exitcode, state, cpu_hours, array_job_id, array_task_id, @submit, queue_wait, work_dir, std_err, std_in, std_out, cluster, qos, ntasks, ntasks_per_node, cpus_per_task, orig_dependency, excluded_nodes, time_limit, reservation_name, gres_req, gres_alloc, account, script, parent_accounts.

Note: parent_accounts is the account hierarchy from the job account up to the root in the format /root/accountA/subaccountB/.../subaccountC.

[1] https://www.elastic.co/products/elasticsearch
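
To make the parent_accounts format in the note above concrete, here is a minimal sketch that builds such a path from a child-to-parent mapping. The function name, the mapping, and the account names are hypothetical; this is not how the plugin itself computes the value, only an illustration of the /root/.../account shape.

```python
def parent_accounts_path(account, parent_of):
    """Return '/root/.../account' by walking parent links up to the root.

    parent_of maps each account name to its parent, with None for the root.
    """
    chain = []
    seen = set()
    current = account
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in account hierarchy at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    # The chain was collected leaf-first, so reverse it to start at the root.
    return "/" + "/".join(reversed(chain))


# Hypothetical hierarchy used only for illustration.
PARENTS = {"subaccountB": "accountA", "accountA": "root", "root": None}

if __name__ == "__main__":
    print(parent_accounts_path("subaccountB", PARENTS))
```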

Comment 4 Brian Haymore 2016-11-02 13:45:38 MDT

OK, that helps me understand better. At the root of this is that our current setup using slurmdbd has proven to be susceptible to things that end up losing partial or full job records. So the interest in extending jobcomp/filetxt is mostly to give us a fallback from which we can reconstruct missing info in slurmdbd. I'm still not sure if David Richardson has opened the ticket with you all about the issues we have observed with slurmdbd, to see if there are already measures we can take there to improve things, so right now it's hard to give an overall opinion. I guess my feeling would be to request that we flag this as a feature extension request, understanding that your current queue and load mean it will be a while before it is looked at. Then between now and then, as we look at what David has reported or will report, we can see if we should rethink the feature request. How does that sound to you?
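
The fallback idea described here, using the text file to spot jobs that slurmdbd lost, could be sketched roughly as follows. It assumes each text-file record carries a JobId=<n> token, and that the set of job IDs known to slurmdbd has already been exported by whatever means the site prefers; the function name and sample data are hypothetical.

```python
def job_ids_missing_from_dbd(filetxt_lines, dbd_job_ids):
    """Return job IDs present in the text file but absent from slurmdbd.

    filetxt_lines: iterable of record lines (ASSUMED to contain JobId=<n>).
    dbd_job_ids:   iterable of job IDs exported from accounting.
    """
    known = {str(job_id) for job_id in dbd_job_ids}
    missing = []
    for line in filetxt_lines:
        for token in line.split():
            # startswith avoids matching other keys that merely end in "JobId".
            if token.startswith("JobId="):
                job_id = token[len("JobId="):]
                if job_id not in known:
                    missing.append(job_id)
                break  # one JobId per record is enough
    return missing


if __name__ == "__main__":
    lines = ["JobId=10 Name=a", "JobId=11 Name=b", "JobId=12 Name=c"]
    print(job_ids_missing_from_dbd(lines, {10, 12}))
```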

Comment 5 Alejandro Sanchez 2016-11-03 04:33:13 MDT

(In reply to Brian Haymore from comment #4)
> OK that helps me understand better.   At the root of this is that our
> current setup using slurmdbd has proven to be susceptible to things that end up
> losing partial or full job records.  So the interest in extending the
> jobcomp/filetxt is much to give us a fall back plan that we can reconstruct
> missing info in slurmdbd.  I'm still not sure if David Richardson has yet
> opened the ticket with you all on the front of the issues we have observed
> with slurmdbd yet to see if there are already measures we can take there to
> improve things so right now it's hard to give an overall opinion.  I guess
> my feeling would be to request that we flag this as a feature extension
> request understanding your current queue and load means this is a ways out
> before it will be looked at.  Then between here and then as we look at what
> David has or will report we can see if we should rethink the feature
> request.  How does that sound to you?

I see 4 bugs from David Richardson related to slurmdbd:

Bug 2602 - resolved/infogiven.
Bug 2828 - resolved/fixed.
Bug 2888 - unconfirmed (we're working on it)
Bug 2889 - resolved/infogiven.

I don't know whether you are experiencing other issues, but if so just open a new bug for them. Your proposal sounds good to me; I'll mark this bug as a sev-5. I'd also encourage you to upgrade to the latest 16.05 version, as a lot of issues were fixed since 15.08.

Comment 7 Moe Jette 2017-02-24 13:50:58 MST

I've added quite a few fields to the jobcomp/filetxt plugin. It's not every job field, but it seems to be pretty much everything I imagine would be useful. Here's a list of added fields:

ArrayJobId, ArrayTaskId, ReservationName, Gres, Account, QOS, WcKey, Cluster, SubmitTime, EligibleTime, DerivedExitCode and ExitCode.

Here's the commit with the change:
https://github.com/SchedMD/slurm/commit/41f2c4745929b83e8ef3d3fe577d266aa5b81b0f

This will be available in Slurm version 17.11, but should apply cleanly as a patch to version 17.02 if desired.
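
After applying the change, one way to confirm the new fields are being written is to check a record for them. The sketch below assumes the added fields appear as Key=Value tokens named exactly as in the list above, which should be verified against the commit; the sample line is invented.

```python
# Fields recorded before the change, per comment 1.
ORIGINAL_FIELDS = [
    "JobId", "UserId", "GroupId", "Name", "JobState", "Partition",
    "TimeLimit", "StartTime", "EndTime", "NodeList", "NodeCnt",
    "ProcCnt", "WorkDir",
]

# Fields added by the change, per the list above.
ADDED_FIELDS = [
    "ArrayJobId", "ArrayTaskId", "ReservationName", "Gres", "Account",
    "QOS", "WcKey", "Cluster", "SubmitTime", "EligibleTime",
    "DerivedExitCode", "ExitCode",
]


def missing_fields(line, expected):
    """Return the expected field names that do not appear in the record."""
    present = {tok.partition("=")[0] for tok in line.split() if "=" in tok}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Hypothetical record fragment used only for illustration.
    sample = "JobId=1 Gres=gpu:2 Account=chpc QOS=normal"
    print(missing_fields(sample, ADDED_FIELDS))
```

An empty result for ADDED_FIELDS on a freshly written record would suggest the patched plugin is in use.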