Summary: | SIGUSR2 signal has no impact on logging to job completion log | ||
---|---|---|---|
Product: | Slurm | Reporter: | Jim Long <jlong1s> |
Component: | slurmctld | Assignee: | Oriol Vilarrubi <jvilarru> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | alex, hwj_0201, nate, stuart.barkley |
Version: | 20.02.3 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | NCSA | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA Site: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Linux Distro: | --- |
Machine Name: | | CLE Version: | |
Version Fixed: | 22.05 | Target Release: | --- |
DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Jim Long
2020-06-22 10:33:40 MDT
Thank you for reporting this. I have looked over the code and it seems this has never worked. I have a patch proposal to review with our team. I will let you know the outcome once we look over these code changes.

Per another site's request to avoid confusion, we'd like to note that the issue in this ticket relates to storing accounting information in text files when that has been configured and slurmctld receives a SIGUSR2 (log rotation signal). There is no problem with the Slurm daemons rotating their own logs (slurmctld.log, slurmdbd.log and slurmd.log) when they receive a SIGUSR2.

*** Ticket 10528 has been marked as a duplicate of this ticket. ***

Hi Jim,

We've fixed this in the master branch, so this will ship with Slurm 22.05.

I've also measured how long it takes to handle the SIGUSR2 signal sent by logrotate to Slurm. For the controller, which potentially has the most log files to handle, the worst case in my tests was 130 microseconds. I also ran a test in which I submitted jobs continuously with a script while running logrotate at the same time, and not a single job's information was lost.

I agree with you that delaycompress would be the safest route to go, but it also heavily increases the disk space used by the logs. As said in the documentation, this is just a logrotate configuration sample, and we encourage sites to make whatever modifications they need.

Greetings.

Resolving as fixed.

commit 6355f4cebea36a1285bf59e30abd556521970282
Author: Alejandro Sanchez <alex@schedmd.com>
Date: Wed Feb 23 17:05:33 2022 +0100

    jobcomp/elasticsearch - protect log_url access with location_mutex.

    Bug 9264

commit e2793bc14404ac5642c4e45665fb038b09db4849
Author: Alejandro Sanchez <alex@schedmd.com>
Date: Wed Feb 23 16:58:48 2022 +0100

    jobcomp/elasticsearch - fix log_url potential memory leak.

    If jobcomp_p_set_location() is called more than once, log_url needs to be
    freed before xstrdup()'ing again to it.

    Bug 9264

commit a512ffc99a427630f02c65cbb3bcde9069aeb2fa
Author: Oriol Vilarrubi <jvilarru@schedmd.com>
Date: Fri Feb 18 16:26:27 2022 +0100

    Make slurmctld call the JobCompPlugin set location operation on SIGUSR2.

    A relevant consequence is that the filetxt plugin reopens the file.

    Bug 9264

commit 9b92cb3f9b94d41723951dc41fee624a929d646e
Author: Oriol Vilarrubi <jvilarru@schedmd.com>
Date: Tue Nov 16 11:57:22 2021 +0100

    common/slurm_jobcomp - add jobcomp_g_set_location() function.

    This will be useful to be able to perform the jobcomp_p_set_location()
    plugin operation independently of jobcomp_g_init().

    Bug 9264
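
For readers unfamiliar with the reopen-on-signal pattern that the fix relies on, the following is a minimal Python sketch, not Slurm's actual C implementation. The class `ReopenableLog`, its methods, and `install_sigusr2_handler` are hypothetical names invented for illustration. The sketch also assumes a simplification: the signal handler only sets a flag and the file is reopened on the next write, whereas the commit above has slurmctld call the plugin's set location operation when SIGUSR2 arrives.

```python
import signal
import threading


class ReopenableLog:
    """Append-only text log that can be asked to reopen its file.

    Hypothetical illustration only; this is not Slurm code.
    """

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()
        self._reopen_requested = False
        self._fh = open(path, "a")

    def request_reopen(self):
        # Only sets a flag, so it is cheap to call from a signal handler.
        self._reopen_requested = True

    def write_record(self, line):
        with self._lock:
            if self._reopen_requested:
                # Clear the flag first so a request that arrives during
                # the reopen is handled on the next write, not lost.
                self._reopen_requested = False
                self._fh.close()
                # Opening by path creates a fresh file if the old one
                # was renamed away by the rotation tool.
                self._fh = open(self.path, "a")
            self._fh.write(line + "\n")
            self._fh.flush()

    def close(self):
        with self._lock:
            self._fh.close()


def install_sigusr2_handler(log):
    # Assumption: called from the main thread of a POSIX process.
    signal.signal(signal.SIGUSR2, lambda signum, frame: log.request_reopen())
```

In this sketch, a record written after the file has been renamed but before the reopen still goes to the renamed file, because the old file handle remains open. That window is one reason delaying compression of the just-rotated file can be the more conservative logrotate setting.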