Summary: | slurmd/systemctl can't open pid file | ||
---|---|---|---|
Product: | Slurm | Reporter: | dl_support-hpc |
Component: | slurmd | Assignee: | Tim McMullan <mcmullan> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | isaid |
Version: | 19.05.2 | ||
Hardware: | Linux | ||
OS: | Linux | ||
See Also: | https://bugs.schedmd.com/show_bug.cgi?id=10625 https://bugs.schedmd.com/show_bug.cgi?id=15965 | ||
Site: | WEHI | Alineos Sites: | --- |
Version Fixed: | 20.11.0pre1 | Target Release: | --- |
Description
dl_support-hpc
2020-01-23 16:17:33 MST
Hi Laszlo, This is happening because we create the PID file slightly after systemd tries to read it. For commands where systemd needs to know the PID (e.g. systemctl restart slurmd.service), it will re-read the file (which appears to be getting created properly). From a functional standpoint, this error shouldn't have any impact on systemd or Slurm. That said, I'm testing some changes to how we launch slurmd et al. in unit files that should fix this problem. Is the error reported by systemd causing you any issues? Thank you! --Tim (McMullan)

Hi Tim, I had not observed functional problems at all. If you can fix the timing or find a workaround, that would be great. Cheers, Laszlo

Hey Laszlo, The quickest workaround you could use is to comment out the "PIDFile=*" line in the unit file and do a daemon-reload. Instead of reading the PID file we write out, systemd will "guess" the main PID (and in my tests does so correctly). I'm still testing a more proper solution and will keep you posted here on it! Thanks! --Tim

We've landed a change to the suggested unit files that runs the Slurm daemons in the foreground for systemd. This should work for 19.05 as well, but is currently set for 20.11. I think for now, though, just running with "PIDFile=" commented out in the unit file should be fine! Thanks! --Tim

Thanks Tim, I will try your suggested workaround.

Hi! Just wanted to check in and make sure this was working for you! Thanks! --Tim

I'm closing this for now since the patch has landed. Feel free to re-open if you still have the same problem, or open a new ticket if a new issue arises! Thanks! --Tim

Tested with 20.02 and the problem is still there.

Sorry about that, I should have mentioned it in the comment! The change we decided to make to the suggested unit files was different enough that we chose to put it in 20.11 but not 20.02. For now, I would suggest continuing to run without the "PIDFile=" line or switching to the style suggested for 20.11. Thanks! --Tim
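For readers applying the two suggestions from this thread, here is a minimal sketch of what the `[Service]` section of `slurmd.service` might look like. It is not copied from the shipped unit files. The binary path, the PID file path, the `$SLURMD_OPTIONS` variable, and the use of `Type=simple` for the foreground style are assumptions for illustration, so compare against the unit file installed on your system. The `-D` option is slurmd's documented flag for running in the foreground.

```ini
# Sketch only: paths and option names below are assumptions, not the shipped file.

[Service]
# Workaround from this thread: keep the forking setup, but comment out
# PIDFile= so systemd guesses the main PID instead of reading the file.
Type=forking
ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS
#PIDFile=/var/run/slurmd.pid

# Foreground style (the direction described for 20.11): slurmd stays in the
# foreground via -D, so systemd tracks the process directly and no PID file
# is involved. To try it, comment out the two active lines above and
# uncomment these:
#Type=simple
#ExecStart=/usr/sbin/slurmd -D $SLURMD_OPTIONS
```

After editing the unit file, run `systemctl daemon-reload` and then `systemctl restart slurmd.service` so systemd picks up the change.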