Summary: | Cleaning up errors in slurmdbd | ||
---|---|---|---|
Product: | Slurm | Reporter: | David Richardson <david.richardson> |
Component: | slurmdbd | Assignee: | Tim Wickberg <tim> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | ||
Version: | 15.08.10 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | University of Utah | ||
Description
David Richardson
2016-07-11 15:52:12 MDT
(In reply to David Richardson from comment #0)
> The real question I have is how to get those corrections in to slurmdbd. Is
> there a command to update the record, or will I need to update the records
> in the database directly? (That's acceptable, but I'll likely need some
> guidance on where exactly those fields are stored).

It's going to have to be done manually in the database unfortunately.

$CLUSTER_job_table is going to have everything of interest here. To get a quick feel for the format:

MariaDB [slurmdbd_1508]> describe zoidberg_job_table;

and

MariaDB [slurmdbd_1508]> select * from zoidberg_job_table limit 10;

will probably show you most of the relevant fields.

> * Partition is blank

One warning, "partition" is a keyword in some versions of MySQL, you may need to refer to the column as `partition` (with backticks).

> * User's group name and gid are set to root/0

id_user and id_group are the uid / gid respectively. Usernames are kept elsewhere and shouldn't need to be modified.

> * Eligible time is unknown
> * Assigned node and cpu counts are 0 (even though nodelist is accurate)
> * Requested cpu count is 0.

Two fields to be careful with are tres_alloc and tres_req. You may need to adjust these depending on how the field looks for the affected jobs. As an example, "1=1,2=1000,4=1" is 1 cpu ("1="), 1000 MB of memory ("2="), and 1 node ("4="). You might be able to get away with copying the tres_req field into tres_alloc for some jobs, or may need to rebuild the tres_alloc field manually.

Let me know if I can give you any other pointers, but I'm hoping that's enough to point you in the right direction.

- Tim

Thanks for the explanation, Tim.

It sounds like it's not too dangerous to modify the table (and like there's not a lot of weird interactions between tables).

I'll just be sure to make a backup of the records I'm changing before I start turning knobs and pushing buttons. :)

Thanks,
DR

(In reply to David Richardson from comment #2)
> Thanks for the explanation, Tim.
>
> It sounds like it's not too dangerous to modify the table (and like there's
> not a lot of weird interactions between tables).

Job records shouldn't impact anything else. (Modifications to reservations or the association structures can cause issues.)

If you want to play it particularly safe, you can always shut down slurmdbd while making these modifications - slurmctld will cache records (up to a reasonable limit - 2*MaxJobCount) and send them over when slurmdbd is started back up.

> I'll just be sure to make a backup of the records I'm changing before I
> start turning knobs and pushing buttons. :)

Always a good idea. :)

Marking closed, please reopen if there's anything else I can answer.

- Tim
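
For anyone rebuilding tres_alloc by hand, the "id=count" format described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (parse_tres, format_tres, describe_tres, rebuild_tres_alloc) are hypothetical and not part of Slurm, and the id-to-name mapping covers only the three ids mentioned in this thread (1 = cpu, 2 = memory in MB, 4 = node). It does not touch the database; it only helps build the string you would then write into the column yourself.

```python
from typing import Dict

# TRES ids taken from the example in this thread; other ids may appear in
# real records and are reported by number only.
TRES_NAMES = {1: "cpu", 2: "mem", 4: "node"}


def parse_tres(tres: str) -> Dict[int, int]:
    """Parse a string such as "1=1,2=1000,4=1" into {1: 1, 2: 1000, 4: 1}."""
    counts = {}
    for item in tres.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty field
        key, _, value = item.partition("=")
        counts[int(key)] = int(value)
    return counts


def format_tres(counts: Dict[int, int]) -> str:
    """Render a mapping back into "id=count" form, ordered by id."""
    return ",".join("%d=%d" % (k, v) for k, v in sorted(counts.items()))


def describe_tres(tres: str) -> str:
    """Human-readable view of a TRES string, for checking before an update."""
    parts = []
    for k, v in sorted(parse_tres(tres).items()):
        name = TRES_NAMES.get(k, "id" + str(k))
        parts.append("%s=%d" % (name, v))
    return ", ".join(parts)


def rebuild_tres_alloc(tres_req: str, cpus: int, nodes: int) -> str:
    """Hypothetical helper: start from tres_req, then override cpu and node
    counts with values known from elsewhere (e.g. the job's nodelist)."""
    counts = parse_tres(tres_req)
    counts[1] = cpus
    counts[4] = nodes
    return format_tres(counts)


if __name__ == "__main__":
    print(describe_tres("1=1,2=1000,4=1"))
    print(rebuild_tres_alloc("1=0,2=1000,4=0", cpus=4, nodes=2))
```

Checking the output of describe_tres against what the job actually used, before writing anything back, is a cheap sanity check on top of the record backup mentioned above.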