Summary: | Some scripts using Perl API (e.g. qstat) hang reading config in configless environments | | |
---|---|---|---|
Product: | Slurm | Reporter: | Troy Baer <troy> |
Component: | Configuration | Assignee: | Marcin Stolarek <cinek> |
Status: | RESOLVED FIXED | CC: | Ole.H.Nielsen, tdockendorf |
Severity: | 4 - Minor Issue | Version: | 20.02.2 |
Hardware: | Linux | OS: | Linux |
Site: | Ohio State OSC | Linux Distro: | RHEL |
Version Fixed: | 20.02.6, 20.11pre1 | Attachments: | v2 |
Description
Troy Baer
2020-07-02 12:05:52 MDT
Troy,

I reproduced the issue and prepared a patch that I'm passing to our QA now. The patch introduces changes in contribs - perlapi and torque/qstat, qalter Perl scripts. I can share it with you before the review if you want to give it a try, knowing that it's not yet scheduled for release.

cheers,
Marcin

Thanks. We are not yet in production with Slurm, and in any case this isn't a critical bug, so I'm OK with waiting on QA.

Any updates on this? We changed to Configless Slurm 20.02.4 yesterday, and now a number of users are complaining that the qstat command (from the slurm-torque RPM for CentOS 7) has stopped working, as was described by Troy. We would appreciate it if Marcin's patch could be included in the upcoming 20.02.5 release!

Thanks,
Ole

*** Bug 9804 has been marked as a duplicate of this bug. ***

Ole,

Do you want to apply the patch locally - before QA completion?

cheers,
Marcin

(In reply to Marcin Stolarek from comment #11)
> Ole,
>
> Do you want to apply the patch locally - before QA completion?

The easy workaround is to restore the /etc/slurm/ directory and thus avoid the Configless configuration for the time being. To test the mentioned patch, how does one go about it?

Thanks,
Ole

OSC would be able to test any patches as we have test systems available we can utilize.

Comment on attachment 14990 [details]
v2
Making the patch public. If you can run it and verify it in your environment, feedback is always appreciated.
As mentioned before, the patch didn't pass SchedMD QA and is not yet scheduled for release.
cheers,
Marcin
> As mentioned before, the patch didn't pass SchedMD QA and is not yet scheduled for release.
Can you elaborate on why the patch didn't pass QA?
I applied the patch to our test environment and verified that the past workarounds needed to make Torque wrappers like qstat work in a configless setup are no longer needed. The first time I ran qstat there was a slight delay of about 3-4 seconds when only 1 job was in the queues, but subsequent executions seemed fine.

I tried to make our login nodes configless (bug 9832) and removed the /etc/slurm directory, but immediately I got user complaints that the qstat command is broken as reported above. For some reason the users must use qstat in their automated scripts. It would be really great if priority could be given to getting the patch in this bug report included in the next Slurm release.

Thanks a lot,
Ole

Trey,
Ole,

The issue should be fixed in Slurm 20.02.6 by the following commits:
>commit c888ee827d179f9e54c09c4be8f282cf886e2c11
>Author: Marcin Stolarek <cinek@schedmd.com>
>AuthorDate: Mon Jul 13 11:51:38 2020 +0000
>
>    Perl API - call slurm_conf_init(NULL) before any API calls
>
>    Add slurm_conf_init() in BOOT: section.
>
>    Bug 9330.
>
>commit 34163061104cc2dec4d5e4371359d9af2fb38afb
>Author: Marcin Stolarek <cinek@schedmd.com>
>AuthorDate: Fri Jul 3 14:08:53 2020 +0000
>
>    Perl API - use slurm_conf_init() not slurm_conf_reinit in Slurm::new()
>
>    slurm_conf_reinit() ends with call to _init_slurm_conf(), which should only
>    be used internaly. External tools should call slurm_conf_init to
>    correctly establish configuration source.
>
>    Bug 9330.
that were merged into our public repository.

cheers,
Marcin

Hi Marcin,

Thanks very much for the patch! I'm looking forward to 20.02.6!

Best regards,
Ole

(In reply to Marcin Stolarek from comment #19)
> Trey,
> Ole,
>
> The issue should be fixed in Slurm 20.02.6 by the following commits:
> >commit c888ee827d179f9e54c09c4be8f282cf886e2c11
> >Author: Marcin Stolarek <cinek@schedmd.com>
> >AuthorDate: Mon Jul 13 11:51:38 2020 +0000
> >
> >    Perl API - call slurm_conf_init(NULL) before any API calls
> >
> >    Add slurm_conf_init() in BOOT: section.
> >
> >    Bug 9330.
> >
> >commit 34163061104cc2dec4d5e4371359d9af2fb38afb
> >Author: Marcin Stolarek <cinek@schedmd.com>
> >AuthorDate: Fri Jul 3 14:08:53 2020 +0000
> >
> >    Perl API - use slurm_conf_init() not slurm_conf_reinit in Slurm::new()
> >
> >    slurm_conf_reinit() ends with call to _init_slurm_conf(), which should only
> >    be used internaly. External tools should call slurm_conf_init to
> >    correctly establish configuration source.
> >
> >    Bug 9330.
> that were merged into our public repository.
>
> cheers,
> Marcin