Hello, SLURM Support Team!

This is a continuation of SLURM Bug#.853. A customer has sent us the following questions.

We understand that adding JobAcctGatherType=jobacct_gather/linux to the slurm.conf file and making it take effect requires restarting the Slurm daemons and slurmctld. Is it possible to do this without stopping running jobs? For example, would the following procedure leave running jobs unaffected?

■ Planned SLURM configuration procedure
========================================================
1. Add JobAcctGatherType=jobacct_gather/linux to the slurm.conf file.
2. Restart the slurmctld service on the management node. (Running jobs should not be affected at this point.)
3. Restart slurmd (service slurm restart) on idle nodes. (slurmd on the compute nodes is restarted one node at a time until all nodes are done.)

◎ Is it unnecessary to stop slurmdbd on the management node at this time?
◎ Is it necessary to stop slurmctld and slurmd at the same time?

If there are other points to take into consideration, or a better procedure, please let me know.

Best Regards,
Toru Matsuoka
Hello, SLURM Support!

I am waiting for a reply to this inquiry.

Best Regards,
Toru Matsuoka
The JobAcctGatherType parameter determines the plugin used to collect resource usage information about user jobs. Those jobs are run by the slurmd daemon (on the compute nodes) and the slurmstepd daemon (which starts the application; one is started for each srun command).

While the JobAcctGatherType configuration parameter is not used by the slurmctld daemon, we strongly recommend that the same slurm.conf file be used on every node. If the slurmctld and slurmd daemons are running with different slurm.conf files, the slurmctld will report this error:

    error: Node <name> appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.

The slurmdbd does not use the slurm.conf file and will not need to be restarted.

You can change the slurm.conf file and restart the daemons in any order without losing any running or pending jobs. However, the sstat program will fail until both the slurmd and slurmstepd are running with the desired JobAcctGatherType.

Slurm version 14.03 works better with JobAcctGatherType=jobacct_gather/none (the sstat command just returns zeros).
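As an illustration of the configuration change discussed above, here is a minimal shell sketch that sets JobAcctGatherType in a slurm.conf file. The helper name `set_jobacct_gather` is hypothetical (it is not part of Slurm), and the sketch assumes the key is written exactly as `JobAcctGatherType=` at the start of a line and that the file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm command): set JobAcctGatherType in a
# slurm.conf file, replacing an existing line or appending a new one.
# Assumption: the key is spelled exactly "JobAcctGatherType=" at line start.
# Usage: set_jobacct_gather <conf-file> <plugin>
set_jobacct_gather() {
    conf="$1"
    plugin="$2"
    if grep -q '^JobAcctGatherType=' "$conf"; then
        # Rewrite the existing setting through a temporary file.
        sed "s|^JobAcctGatherType=.*|JobAcctGatherType=$plugin|" "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    else
        # No existing setting: append one (assumes the file ends with a newline).
        printf 'JobAcctGatherType=%s\n' "$plugin" >> "$conf"
    fi
}
```

A call such as `set_jobacct_gather /path/to/slurm.conf jobacct_gather/linux` (the path is a placeholder) would be followed by copying the same file to every node and restarting the daemons, for example with the `service slurm restart` command mentioned in the original question.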
Hello, Slurm Support Team!

> The JobAcctGatherType determines the plugin to be used to collect resource
> usage information about user jobs, which are run by the slurmd daemon (on the
> compute nodes) and the slurmstepd daemon (which starts the application, one is
> started for each srun command).

→ Thank you. I understand.

> While the JobAcctGatherType configuration parameter is not used by the
> slurmctld daemon, we strongly recommend the same slurm.conf file be used on
> every node. If the slurmctld and slurmd daemons are running with different
> slurm.conf files, the slurmctld will report the error
> error: Node <name> appears to have a different slurm.conf than the slurmctld.
> This could cause issues with communication and functionality. Please review
> both files and make sure they are the same. If this is expected ignore, and
> set DebugFlags=NO_CONF_HASH in your slurm.conf.

→ The customer can use the same slurm.conf file on every node, so this should not be a problem.

> The slurmdbd does not use the slurm.conf file and will not need to be
> restarted.

→ Thank you. I understand.

> You can change the slurm.conf file and restart the daemons in any order
> without losing any running or pending jobs, however the sstat program will
> fail until both the slurmd and slurmstepd are running with the desired
> JobAcctGatherType.

→ In this case, can the JobAcctGatherType parameter be set to JobAcctGatherType=jobacct_gather/linux? Also, are both the slurmd and slurmstepd always running?

> Slurm version 14.03 works better with JobAcctGatherType=jobacct_gather/none
> (the sstat command just returns zeros).

→ We cannot upgrade Slurm from 2.6.2 to 14.03. With Slurm version 2.6.2, are there any problems related to the JobAcctGatherType parameter?

Best Regards,
Toru Matsuoka
(In reply to toru matsuoka from comment #3)
> > You can change the slurm.conf file and restart the daemons in any order
> > without losing any running or pending jobs, however the sstat program will
> > fail until both the slurmd and slurmstepd are running with the desired
> > JobAcctGatherType.
>
> → In this case, can the JobAcctGatherType parameter be set to
> JobAcctGatherType=jobacct_gather/linux?

If you want to collect accounting information about jobs on a Linux cluster, then JobAcctGatherType=jobacct_gather/linux must be set.

> Also, are both the slurmd and slurmstepd always running?

The slurmd should always be running on every compute node. A slurmstepd is running whenever a job step is running on the compute node, with one slurmstepd for each job step.

> > Slurm version 14.03 works better with JobAcctGatherType=jobacct_gather/none
> > (the sstat command just returns zeros).
>
> → We cannot upgrade Slurm from 2.6.2 to 14.03.
> With Slurm version 2.6.2, are there any problems related to the
> JobAcctGatherType parameter?

The only problem with Slurm version 2.6 is that sstat reports errors when the JobAcctGatherType value seen by sstat differs from the one the slurmd is running with.

You should at least consider upgrading from version 2.6.2 to 2.6.8. Version 2.6.2 is known to contain about 100 bugs that were fixed in later releases of version 2.6. There will be no loss of jobs and no command changes when upgrading, only bug fixes.

> Best Regards,
> Toru Matsuoka
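Since the sstat errors come from a mismatch in the JobAcctGatherType value, it can help to confirm which value is active after the restart. The sketch below is one possible check, not an official procedure: the helper name `jobacct_type` is hypothetical, and it assumes the `Key = value` line layout printed by `scontrol show config`.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm command): read `scontrol show config`
# output on stdin and print the value of JobAcctGatherType.
# Assumption: each setting is printed as "Key   = value" on its own line.
jobacct_type() {
    awk -F'=' '$1 ~ /^JobAcctGatherType[ \t]*$/ {
        gsub(/[ \t]/, "", $2)   # strip the padding around the value
        print $2
    }'
}

# On a live cluster this would be used as:
#   scontrol show config | jobacct_type
```

If the printed value is not jobacct_gather/linux, the daemons on that node are most likely still running with the old configuration.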
Hello, Slurm Support Team!

I understand the above.

I would like to plan an upgrade from Slurm 2.6.2 to 2.6.8.

If possible, please tell me a simple procedure for upgrading Slurm from 2.6.2 to 2.6.8.

Best Regards,
Toru Matsuoka
(In reply to toru matsuoka from comment #5)
> Hello, Slurm Support Team!
>
> I understand the above.
>
> I would like to plan an upgrade from Slurm 2.6.2 to 2.6.8.
>
> If possible, please tell me a simple procedure for upgrading Slurm from 2.6.2
> to 2.6.8.

Slurm is upgraded the same way as any other Linux package. Just install the new RPMs and restart the daemons. There is some more information here:
http://slurm.schedmd.com/quickstart_admin.html#upgrade
Hello, Slurm Support Team!

Thank you for your support.

I checked the page at this URL and I understand the Slurm upgrade procedure.

However, given the customer's situation, upgrading from 2.6.2 to 2.6.8 will be difficult for the time being.

I would like to ask for your advice on the case where the customer stays on 2.6.2 and does not use sstat.

Best Regards,
Toru Matsuoka
(In reply to toru matsuoka from comment #7)
> Hello, Slurm Support Team!
>
> Thank you for your support.
>
> I checked the page at this URL and I understand the Slurm upgrade procedure.
>
> However, given the customer's situation, upgrading from 2.6.2 to 2.6.8 will
> be difficult for the time being.
>
> I would like to ask for your advice on the case where the customer stays on
> 2.6.2 and does not use sstat.
>
> Best Regards,
> Toru Matsuoka

Then just change the configuration to collect accounting information. Set:
JobAcctGatherType=jobacct_gather/linux
Please open a new ticket if you need more information.
Understood. Please close this case.