2889 – Cleaning up errors in slurmdbd


Status:	RESOLVED INFOGIVEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	slurmdbd
Version:	15.08.10
Hardware:	Linux

Importance:	4 - Minor Issue
Assignee:	Tim Wickberg

Reported:	2016-07-11 15:52 MDT by David Richardson
Modified:	2016-07-12 08:11 MDT

Site:	University of Utah

Description David Richardson 2016-07-11 15:52:12 MDT

Hi,

Last month, we hit a bug with a single quote character in the reservation name, which made slurmctld unable to talk to slurmdbd. That bug has been fixed (2828). 

We use xdmod to visualize what's happening on our clusters. During the time when we were affected by 2828, many jobs started, ran, and completed. We've found gaps in our graphs, and have traced them to several sacct fields for those jobs being incorrect:
* Partition is blank
* User's group name and gid are set to root/0
* Eligible time is unknown
* Assigned node and cpu counts are 0 (even though nodelist is accurate)
* Requested cpu count is 0.

We know what needs to be done to correct those errors:
* Partition we can extrapolate from account
* Group name/gid can be discovered from user name
* Eligible time we are willing to assume is the same as submission time (fairly accurate, and xdmod barfs if it's left as unknown)
* We can calculate how many nodes and cores were assigned from the nodelist
* We are willing to assume the user requested all the cpus he was assigned (it's not completely accurate, but more so than leaving it set to 0).
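
As a rough illustration of the nodelist step, the sketch below counts the hosts in a compact nodelist string. It is a simplified parser, not Slurm's own hostlist code: it assumes each entry has at most one bracket group at the end of the name, and the host names are made up. Running `scontrol show hostnames` on the nodelist and counting the output lines is a reasonable cross-check.

```python
import re


def split_hostlist(nodelist):
    """Split a nodelist on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in nodelist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def count_nodes(nodelist):
    """Count hosts in a compact nodelist such as 'kp[001-003,010],kp200'.

    Simplification: only a single trailing bracket group per entry is
    understood; anything else is counted as one plain host name.
    """
    total = 0
    for part in split_hostlist(nodelist):
        match = re.fullmatch(r"[^\[\]]+\[([^\[\]]+)\]", part)
        if match is None:
            total += 1  # plain host name, no range
            continue
        for item in match.group(1).split(","):
            if "-" in item:
                low, high = item.split("-")
                total += int(high) - int(low) + 1
            else:
                total += 1
    return total
```

If every node in the list has the same core count, the assigned cpu count is `count_nodes(nodelist)` multiplied by that per-node value; mixed hardware would need a per-node lookup instead.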

The real question I have is how to get those corrections in to slurmdbd. Is there a command to update the record, or will I need to update the records in the database directly? (That's acceptable, but I'll likely need some guidance on where exactly those fields are stored).

Thanks,
DR

Comment 1 Tim Wickberg 2016-07-11 16:09:17 MDT

(In reply to David Richardson from comment #0)
> The real question I have is how to get those corrections in to slurmdbd. Is
> there a command to update the record, or will I need to update the records
> in the database directly? (That's acceptable, but I'll likely need some
> guidance on where exactly those fields are stored).

It's going to have to be done manually in the database unfortunately.

$CLUSTER_job_table is going to have everything of interest here.

To get a quick feel for the format:
MariaDB [slurmdbd_1508]> describe zoidberg_job_table;

and

MariaDB [slurmdbd_1508]> select * from zoidberg_job_table limit 10;

will probably show you most of the relevant fields.

> * Partition is blank

One warning: "partition" is a keyword in some versions of MySQL, so you may need to refer to the column as `partition` (with backticks).
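
A minimal sketch of the backtick quoting, for anyone scripting the fix rather than typing statements by hand. It only builds the statement text. The `id_job` key column and the `%s` placeholder style are assumptions made for illustration, so check the column name against the `describe` output and the placeholder style against whatever database driver is used.

```python
def quote_identifier(name):
    """Wrap a MySQL/MariaDB identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_partition_update(cluster, key_column="id_job"):
    """Build a parameterized UPDATE for the partition column.

    key_column defaults to "id_job", which is an assumed column name;
    verify it against `describe <cluster>_job_table` before use.
    """
    table = quote_identifier(f"{cluster}_job_table")
    return (
        f"UPDATE {table} SET {quote_identifier('partition')} = %s "
        f"WHERE {quote_identifier(key_column)} = %s"
    )
```

For the `zoidberg` cluster used in the examples above, this yields a statement whose table name, `partition` column, and key column are all backtick-quoted, with the new partition value and the job key left as parameters.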

> * User's group name and gid are set to root/0

id_user and id_group are the uid / gid respectively. Usernames are kept elsewhere and shouldn't need to be modified.
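
Since id_user is still correct on the affected rows, one possible way to recover id_group is to look up each uid's primary gid on a host that knows the cluster's users. The sketch below assumes the local passwd database (via Python's standard `pwd` module) reflects those users; the function name and the injectable `lookup` parameter are illustrative only.

```python
import pwd


def gid_fixes(uids, lookup=pwd.getpwuid):
    """Map each distinct uid to its primary gid.

    Uids with no passwd entry (for example, accounts that have since
    been removed) are skipped and need to be handled by hand.
    """
    fixes = {}
    for uid in sorted(set(uids)):
        try:
            fixes[uid] = lookup(uid).pw_gid
        except KeyError:
            continue  # unknown uid, leave it out of the result
    return fixes
```

The resulting mapping gives the id_group value to write for each id_user. Note that this is the user's primary group today, which may differ from the group the job actually ran under.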

> * Eligible time is unknown
> * Assigned node and cpu counts are 0 (even though nodelist is accurate)
> * Requested cpu count is 0.

Two fields to be careful with are tres_alloc and tres_req. You may need to adjust these depending on how the field looks for the affected jobs.

As an example, "1=1,2=1000,4=1" is 1 cpu ("1="), 1000 MB of memory ("2="), and 1 node ("4="). You might be able to get away with copying the tres_req field into tres_alloc for some jobs, or may need to rebuild the tres_alloc field manually.
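
To make the TRES string format concrete, here is a small sketch that parses a string like the one above and rebuilds tres_alloc from tres_req with corrected cpu and node counts. The helper names are hypothetical, and the numeric ids (1 for cpu, 2 for memory, 4 for node) are taken from the example above; other ids are passed through untouched.

```python
TRES_CPU, TRES_MEM, TRES_NODE = 1, 2, 4  # ids as described in the example above


def parse_tres(text):
    """Turn '1=1,2=1000,4=1' into {1: 1, 2: 1000, 4: 1}."""
    result = {}
    for pair in filter(None, text.split(",")):
        tres_id, count = pair.split("=")
        result[int(tres_id)] = int(count)
    return result


def format_tres(tres):
    """Turn {1: 1, 2: 1000, 4: 1} back into '1=1,2=1000,4=1' (sorted by id)."""
    return ",".join(f"{tres_id}={count}" for tres_id, count in sorted(tres.items()))


def rebuild_tres_alloc(tres_req, cpus, nodes):
    """Start from the requested TRES and overwrite the cpu and node counts."""
    tres = parse_tres(tres_req)
    tres[TRES_CPU] = cpus
    tres[TRES_NODE] = nodes
    return format_tres(tres)
```

For example, `rebuild_tres_alloc("1=1,2=1000,4=1", 16, 2)` keeps the memory entry and replaces the cpu and node entries. Whether the memory value should also be adjusted depends on how the affected jobs look, so inspect a few rows before applying anything in bulk.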

Let me know if I can give you any other pointers, but I'm hoping that's enough to point you in the right direction.

- Tim

Comment 2 David Richardson 2016-07-11 16:22:55 MDT

Thanks for the explanation, Tim.

It sounds like it's not too dangerous to modify the table (and like there's not a lot of weird interactions between tables).

I'll just be sure to make a backup of the records I'm changing before I start turning knobs and pushing buttons. :)

Thanks,
DR

Comment 3 Tim Wickberg 2016-07-12 08:11:30 MDT

(In reply to David Richardson from comment #2)
> Thanks for the explanation, Tim.
> 
> It sounds like it's not too dangerous to modify the table (and like there's
> not a lot of weird interactions between tables).

Job records shouldn't impact anything else. (Modifications to reservations or the association structures can cause issues.)

If you want to play it particularly safe, you can always shut down slurmdbd while making these modifications - slurmctld will cache records (up to a reasonable limit - 2*MaxJobCount) and send them over when slurmdbd is started back up.

> I'll just be sure to make a backup of the records I'm changing before I
> start turning knobs and pushing buttons. :)

Always a good idea. :) 

Marking closed, please reopen if there's anything else I can answer.

- Tim