Bug 17704 - How to configure acct_gather_energy/IPMI for nodes without IPMI DCMI extensions?
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 23.02.4
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Oscar Hernández
Reported: 2023-09-16 03:59 MDT by Ole.H.Nielsen@fysik.dtu.dk
Modified: 2024-03-22 03:38 MDT
Site: DTU Physics


Description Ole.H.Nielsen@fysik.dtu.dk 2023-09-16 03:59:06 MDT
The acct_gather_energy/IPMI documentation[1] specifies how EnergyIPMIPowerSensors=Node=DCMI may be configured.  The corresponding freeipmi command displays compute node DCMI power readings as expected, for example:

$ ipmi-dcmi --get-system-power-statistics
Current Power                        : 2429 Watts
Minimum Power over sampling duration : 342 watts
Maximum Power over sampling duration : 2925 watts
Average Power over sampling duration : 1752 watts
Time Stamp                           : 09/16/2023 - 09:33:15
Statistics reporting time period     : 1926185000 milliseconds
Power Measurement                    : Active

However, not all BMCs support the IPMI DCMI extensions.  We have some Huawei/Xfusion nodes which report an error instead:

$ ipmi-dcmi --get-system-power-statistics
ipmi_cmd_dcmi_get_power_reading: command invalid or unsupported
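
With mixed hardware it may help to survey which nodes answer the DCMI query before enabling the option. The Python sketch below parses only the two output shapes quoted in this report. The names parse_dcmi_power, SAMPLE_OK and SAMPLE_BAD are hypothetical, and other BMCs may print different text.

```python
import re

# Error text quoted in this report for BMCs without DCMI support.
UNSUPPORTED_MARKER = "command invalid or unsupported"

def parse_dcmi_power(output):
    """Return the current power in watts, or None if DCMI is unsupported
    or the output cannot be parsed."""
    if UNSUPPORTED_MARKER in output:
        return None
    match = re.search(r"^Current Power\s*:\s*(\d+)\s+watts",
                      output, re.MULTILINE | re.IGNORECASE)
    return int(match.group(1)) if match else None

# Sample texts copied from the two transcripts above (shortened).
SAMPLE_OK = (
    "Current Power                        : 2429 Watts\n"
    "Minimum Power over sampling duration : 342 watts\n"
)
SAMPLE_BAD = "ipmi_cmd_dcmi_get_power_reading: command invalid or unsupported\n"

if __name__ == "__main__":
    print(parse_dcmi_power(SAMPLE_OK))
    print(parse_dcmi_power(SAMPLE_BAD))
```

On a real node the text could come from subprocess.run(["ipmi-dcmi", "--get-system-power-statistics"], capture_output=True, text=True), with stdout and stderr concatenated before parsing.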

Before enabling EnergyIPMIPowerSensors=Node=DCMI in our production system, I would like to ask whether slurmd on such nodes is going to handle DCMI errors gracefully, or potentially exit with a fatal error?  I don't fully understand the source code in src/plugins/acct_gather_energy/ipmi/acct_gather_energy_ipmi.c

Next, will it be possible to have a fallback acct_gather_energy plugin such as the RAPL plugin so at least CPU+DIMM power readings will be reported instead of any invalid zero values from DCMI?  I guess the question is whether a list of AcctGatherEnergyType plugins is possible?

Thanks,
Ole

[1] https://slurm.schedmd.com/acct_gather.conf.html#SECTION_acct_gather_energy/IPMI
Comment 2 Oscar Hernández 2023-09-21 03:38:03 MDT
Ole,

> Before enabling EnergyIPMIPowerSensors=Node=DCMI into our production system,
> I would like to ask if slurmd on such nodes are going to handle DCMI errors
> gracefully, or potentially exit with a fatal error?
Looking through the code, the only fatal errors I found are related to misconfiguration of this plugin. All of them terminate slurmd during initialization. These are:

1 - Trying to load ipmi (AcctGatherEnergyType=acct_gather_energy/ipmi) without the library compiled (acct_gather_energy_ipmi.so).

2 - Having a malformed EnergyIPMIPowerSensors in etc/acct_gather.conf

3 - Having a negative value for EnergyIPMIFrequency in etc/acct_gather.conf

All other errors are logged and result in readings with a value of 0 or n/s. Running it on my local machine, for example:

[2023-09-19T17:15:42.179] error: _get_dcmi_power_reading: get DCMI power reading failed

And the node displayed:

CurrentWatts=0 AveWatts=0
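
Since a failed DCMI read only shows up as a log line and a zero reading, a site may want to scan slurmd logs for it. A minimal Python sketch, assuming the log line format is exactly the one quoted above; dcmi_failure_timestamps and SAMPLE_LOG are hypothetical names:

```python
import re

# Matches the error line quoted above; the timestamp sits in square brackets.
DCMI_ERROR = re.compile(
    r"^\[(?P<ts>[^\]]+)\] error: _get_dcmi_power_reading: "
    r"get DCMI power reading failed"
)

def dcmi_failure_timestamps(log_lines):
    """Return the timestamps of all DCMI read failures in the given lines."""
    stamps = []
    for line in log_lines:
        match = DCMI_ERROR.match(line)
        if match:
            stamps.append(match.group("ts"))
    return stamps

# First line is copied from this comment; the second is an unrelated
# made-up line that only shows non-matching input being skipped.
SAMPLE_LOG = [
    "[2023-09-19T17:15:42.179] error: _get_dcmi_power_reading: get DCMI power reading failed",
    "[2023-09-19T17:15:43.000] some other message",
]

if __name__ == "__main__":
    print(dcmi_failure_timestamps(SAMPLE_LOG))
```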

> Next, will it be possible to have a fallback acct_gather_energy plugin such
> as the RAPL plugin so at least CPU+DIMM power readings will be reported
> instead of any invalid zero values from DCMI?  I guess the question is whether
> a list of AcctGatherEnergyType plugins is possible?
This is not currently supported. Only one plugin should be configured for AcctGatherEnergyType.

I have tested setting both in a comma-separated list. Slurm does not complain and simply loads both at the same time. But I expect this to have unpredictable results, since both plugins would overwrite the same data.
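
Because Slurm accepts such a list silently, a small lint step outside Slurm could flag it. The Python sketch below is not part of Slurm; energy_plugins and has_multiple_energy_plugins are hypothetical helpers, and matching the key case-insensitively is an assumption.

```python
def energy_plugins(slurm_conf_text):
    """Return the plugin names given for AcctGatherEnergyType.
    If the key appears more than once, the last occurrence is used."""
    plugins = []
    for raw in slurm_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()       # drop comments
        key, sep, value = line.partition("=")
        # Assumption: the key is matched case-insensitively.
        if sep and key.strip().lower() == "acctgatherenergytype":
            plugins = [p.strip() for p in value.split(",") if p.strip()]
    return plugins

def has_multiple_energy_plugins(slurm_conf_text):
    """True if more than one energy plugin is listed (unsupported setup)."""
    return len(energy_plugins(slurm_conf_text)) > 1

if __name__ == "__main__":
    conf = "AcctGatherEnergyType=acct_gather_energy/ipmi,acct_gather_energy/rapl\n"
    print(energy_plugins(conf), has_multiple_energy_plugins(conf))
```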

The alternative I see for your case would be to set a different etc/acct_gather.conf file for each node. This way, you will be able to configure EnergyIPMIPowerSensors to match each node's capabilities. To use RAPL, though, you would also need a different slurm.conf for each node.

You might already be doing that, but in case you have all configs centralized under a shared directory, you could use a symlink like:

/nfs/etc/slurm/acct_gather.conf -> /local/etc/slurm/acct_gather.conf

But that would mean maintaining a copy of acct_gather.conf on each node (it is not something that should change frequently).

Let me know your thoughts about it,

Kind regards,
Oscar
Comment 3 Ole.H.Nielsen@fysik.dtu.dk 2023-09-21 06:32:47 MDT
Hi Oscar,

Thanks a lot for your detailed answer!  I have a first comment (more later):

(In reply to Oscar Hernández from comment #2)
> > Before enabling EnergyIPMIPowerSensors=Node=DCMI into our production system,
> > I would like to ask if slurmd on such nodes are going to handle DCMI errors
> > gracefully, or potentially exit with a fatal error?
> Looking through the code, the only fatals I found are more related with
> wrong configuration of this plugin. All of them will fatal (terminating
> slurmd) during initialization. These are:
> 
> 1 - Trying to load ipmi (AcctGatherEnergyType=acct_gather_energy/ipmi)
> without the library compiled (acct_gather_energy_ipmi.so).

It seems that if the file acct_gather.conf exists (we use configless), and even if we do NOT yet use AcctGatherEnergyType=acct_gather_energy/ipmi, the slurmd's will read acct_gather.conf and crash on our system :-(  The reason is that our RPM build host didn't yet have the prerequisite freeipmi-devel RPM package installed when the slurmd packages were built - please refer to bug 17706.  We're going to install updated Slurm RPM packages on all compute nodes next Tuesday, so this issue should go away.

My testing discovered that even though we have set AcctGatherEnergyType=acct_gather_energy/rapl (i.e., RAPL and NOT IPMI) in slurm.conf, this caused all slurmd's to crash with error messages:

[2023-09-21T11:52:50.010] error: _parse_next_key: Parsing error at unrecognized key: EnergyIPMIPowerSensors
[2023-09-21T11:52:50.010] error: _parse_next_key: Parsing error at unrecognized key: EnergyIPMIFrequency
[2023-09-21T11:52:50.010] error: _parse_next_key: Parsing error at unrecognized key: EnergyIPMICalcAdjustment
[2023-09-21T11:52:50.010] fatal: Could not open/read/parse acct_gather.conf file /var/spool/slurmd/conf-cache/acct_gather.conf.  Many times this is because you have defined options for plugins that are not loaded.  Please check your slurm.conf file and make sure the plugins for the options listed are loaded.

My /etc/slurm/acct_gather.conf file has this content:

EnergyIPMIPowerSensors=Node=DCMI
EnergyIPMIFrequency=60
EnergyIPMICalcAdjustment=yes

So the Slurm build host REALLY must have the freeipmi-devel RPM package installed!  
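
A pre-flight check could catch this mismatch before slurmd is restarted. The Python sketch below rests on one assumption taken from the log above: EnergyIPMI* keys are only recognized when acct_gather_energy/ipmi is loaded. Other plugins may also accept these keys, so treat the result as a hint. ipmi_option_mismatch and SAMPLE_CONF are hypothetical names.

```python
def ipmi_option_mismatch(energy_type, acct_gather_conf_text):
    """Return the EnergyIPMI* keys that are set although the ipmi energy
    plugin is not the selected AcctGatherEnergyType."""
    if energy_type.strip().lower() == "acct_gather_energy/ipmi":
        return []
    keys = []
    for raw in acct_gather_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()       # drop comments
        key, sep, _ = line.partition("=")         # split at the first '='
        if sep and key.strip().lower().startswith("energyipmi"):
            keys.append(key.strip())
    return keys

# Content of the acct_gather.conf shown in this comment.
SAMPLE_CONF = (
    "EnergyIPMIPowerSensors=Node=DCMI\n"
    "EnergyIPMIFrequency=60\n"
    "EnergyIPMICalcAdjustment=yes\n"
)

if __name__ == "__main__":
    print(ipmi_option_mismatch("acct_gather_energy/rapl", SAMPLE_CONF))
```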

Furthermore, I'm guessing that also the slurmd nodes must have the freeipmi-devel RPM package installed.  What I'm seeing is that /usr/lib64/libfreeipmi.so (a soft-link) is only installed by the freeipmi-devel RPM, whereas the actual library file is part of the freeipmi RPM:

$ rpm -ql freeipmi | grep lib64/libfree
/usr/lib64/libfreeipmi.so.17
/usr/lib64/libfreeipmi.so.17.1.4

$ rpm -ql freeipmi-devel | grep lib64/libfree
/usr/lib64/libfreeipmi.so

$ ls -l /usr/lib64/libfreeipmi.so
lrwxrwxrwx. 1 root root 21 Sep 18 13:34 /usr/lib64/libfreeipmi.so -> libfreeipmi.so.17.1.4

When I want to "yum update" the slurm* RPM packages, only the freeipmi and not the freeipmi-devel RPM dependency gets installed:

================================================================================
 Package      Arch   Version        Repository                             Size
================================================================================
Updating:
 auto_tmpdir  x86_64 1.0.2-23.02.5.el7
                                    /auto_tmpdir-1.0.2-23.02.5.el7.x86_64  35 k
 slurm        x86_64 23.02.5-1.el7  /slurm-23.02.5-1.el7.x86_64            75 M
 slurm-contribs
              x86_64 23.02.5-1.el7  /slurm-contribs-23.02.5-1.el7.x86_64   32 k
 slurm-devel  x86_64 23.02.5-1.el7  /slurm-devel-23.02.5-1.el7.x86_64     372 k
 slurm-pam_slurm
              x86_64 23.02.5-1.el7  /slurm-pam_slurm-23.02.5-1.el7.x86_64 470 k
 slurm-perlapi
              x86_64 23.02.5-1.el7  /slurm-perlapi-23.02.5-1.el7.x86_64   3.1 M
 slurm-slurmd x86_64 23.02.5-1.el7  /slurm-slurmd-23.02.5-1.el7.x86_64    2.4 M
 slurm-torque x86_64 23.02.5-1.el7  /slurm-torque-23.02.5-1.el7.x86_64    390 k
Installing for dependencies:
 freeipmi     x86_64 1.5.7-3.el7    base-niflheim                         2.0 M
 libjwt       x86_64 1.12.1-7.el7   epel                                   24 k
 libyaml      x86_64 0.1.4-11.el7_0 base-niflheim                          55 k

Question:  Can you verify whether slurmd will actually require the link /usr/lib64/libfreeipmi.so to exist?  I'm afraid that slurmd's may crash without it.  If so, the RPM dependency freeipmi-devel will have to be added to the slurm.spec or configure files so that it will get installed automatically.

I realize that these issues are relevant only for RHEL/CentOS systems which are RPM based, but many Slurm sites use this family of OSes.

Thanks a lot for your help,
Ole
Comment 4 Oscar Hernández 2023-09-21 10:27:59 MDT
Ole,

I am sorry for that...

>and even if we do NOT yet use AcctGatherEnergyType=acct_gather_energy/ipmi, the 
>slurmd's will read acct_gather.conf and crash on our system :-(
Yes, I agree that this is pretty inconvenient. The acct_gather.conf file can only be used for the ipmi energy plugin option. However, there are other plugins that may have their configuration options set in that file (e.g: acct_gather_profile/HDF5).

That is the reason we always validate acct_gather.conf, to make sure it makes sense. I will look into it further, but I am not sure we can improve the behavior much here.

With regard to the library, I was doing my tests on Ubuntu, and I also had to install libfreeipmi-dev (and libipmimonitoring-dev) to successfully build the library. So I will try to document this properly, as you mentioned in the other bug.

I understand your concerns about the libraries. I would not expect the dev packages to be needed on compute nodes, since dev packages are mainly needed to provide the header files for compilation; once the plugin is compiled, I see no point in having them. However, I'll check the symlink you mentioned.

For the moment, I tested on my Ubuntu system (where I have a similar symlink). I started Slurm with the symlink removed, and Slurm did not complain in any way; things seemed to work as expected.

Afterwards, I tested removing the libraries themselves:

libfreeipmi.so.17      
libfreeipmi.so.17.2.8

And got the following error:

[2023-09-21T17:44:02.936] error: plugin_load_from_file: dlopen(/home/oscar/Projects/sandbox/17704/install/lib/slurm/acct_gather_energy_ipmi.so): libfreeipmi.so.17: cannot open shared object file: No such file or directory
[2023-09-21T17:44:02.936] error: Couldn't load specified plugin name for acct_gather_energy/ipmi: Dlopen of plugin file failed
[2023-09-21T17:44:02.936] error: cannot create acct_gather_energy context for acct_gather_energy/ipmi
[2023-09-21T17:44:02.936] fatal: can not open the (null) plugin

So, as you can see, it looks for the versioned library, not the unversioned symlink. The dev package does not seem to be needed.

In any case, tomorrow I will test it on CentOS, but I expect similar behavior there.

Cheers,
Oscar
Comment 5 Ole.H.Nielsen@fysik.dtu.dk 2023-09-21 23:26:08 MDT
Hi Oscar,

Thanks for a very detailed analysis!

(In reply to Oscar Hernández from comment #4)
> >and even if we do NOT yet use AcctGatherEnergyType=acct_gather_energy/ipmi, the 
> >slurmd's will read acct_gather.conf and crash on our system :-(
> Yes, I agree that this is pretty inconvenient. The acct_gather.conf file can
> only be used for the ipmi energy plugin option. However, there are other
> plugins that may have their configuration options set in that file (e.g:
> acct_gather_profile/HDF5).
> 
> That is the reason we always validate acct_gather.conf, to make sure it
> makes sense. Will check better, but I am not sure if we can improve much in
> the behavior here.

I think it's fine, now I understand this.

> With regard to the library, I was doing my tests in ubuntu, but also had to
> install libfreeipmi-dev (and libipmimonitoring-dev) to successfully build
> the library. So I will try to properly document this, as you mentioned in
> the other bug.
> 
> I understand your concerns with the libraries. I would not expect the dev
> packages to be needed in compute nodes. Since, dev packages should be mainly
> needed to get the header files for compilation, but once it is compiled, I
> see no point in having them. However, I'll check about the symlink mentioned.
...
> So, as you can see, it looks for the library versioned, not the basic
> symlink. Dev package does not seem to be needed. 

OK, this wasn't obvious.  I'm very glad that you tested without the symlink /usr/lib64/libfreeipmi.so and showed that it actually works.

> In any case, tomorrow I will test it out in CentOS, but I am expecting a
> similar behavior here.

I agree, but it's better to test also CentOS to be 100% sure.  Then we will finally be sure that the required freeipmi libraries are installed as dependencies with the Slurm RPMs, and all should be good :-)

Thanks a lot,
Ole
Comment 6 Ole.H.Nielsen@fysik.dtu.dk 2023-09-21 23:44:51 MDT
Hi Oscar,

(In reply to Oscar Hernández from comment #2)
> > Next, will it be possible to have a fallback acct_gather_energy plugin such
> > as the RAPL plugin so at least CPU+DIMM power readings will be reported
> > instead of any invalid zero values from DCMI?  I guess the question is whether
> > a list of AcctGatherEnergyType plugins is possible?
> This is not currently supported. Only one plugin should be configured for
> AcctGatherEnergyType.
> 
> I have tested setting them both, in a comma separated list. And slurm will
> not complain, will simply load both at the same time. But I expect this to
> have unpredictable results, having them both overwriting same data.

Thanks for testing a comma separated list of plugins.  It's surprising that this undocumented list even works :-)  I understand that the results may become unpredictable, so this setup should not be used at present.  In the future, it might be good to develop a prioritized list of plugins:  If IPMI fails to work, then try RAPL, then try ...

> The alternative I see to handle your case, would be to set a different
> etc/acct_gather.conf file for each node. This way, you will be able to
> configure EnergyIPMIPowerSensors matching the nodes capabilities. To use
> RAPL though, you would also need to have different slurm.conf for each node.

Thanks for the suggestion.  We're using configless and are extremely happy with it, so having a different slurm.conf and acct_gather.conf on each node would be a step backward :-(

> You might already be using that, but in case you have all configs
> centralized under a shared directory, you could do a symlink like:
> 
> /nfs/etc/slurm/acct_gather.conf -> /local/etc/slurm/acct_gather.conf
> 
> But that would imply to maintain a copy of acct_gather.conf in each node (it
> is not something that should change frequently).

I see that this should work, but we would like to stay with configless.

Another idea:  Since slurm.conf now accepts INCLUDE MODIFIERS, we could have a node local file defining the relevant parameter AcctGatherEnergyType=acct_gather_energy/ipmi or AcctGatherEnergyType=acct_gather_energy/rapl.  We could then have a global acct_gather.conf defining the IPMI parameters.

Do you think it is a good idea to have in slurm.conf a line like this pointing to a local config file:

include /local/etc/slurm/AcctGatherEnergyType.conf

This file would have to exist on the slurmctld server as well as on all slurmd nodes.  It is, however, not obvious to me if the include file gets resolved by slurmctld and passed to all slurmd's using configless, or if slurmd will read the file locally?  Can you resolve this question?

Thanks,
Ole
Comment 7 Oscar Hernández 2023-09-22 04:49:24 MDT
Hi Ole,

>I agree, but it's better to test also CentOS to be 100% sure.  Then we will 
>finally be sure that the required freeipmi libraries are installed as 
>dependencies with the Slurm RPMs, and all should be good :-)
Tested on an Alma8 system I had at hand (which uses the same package system).

The packages contain the same files:
$ rpm -ql freeipmi | grep lib64/libfree
/usr/lib64/libfreeipmi.so.17
/usr/lib64/libfreeipmi.so.17.2.7

$ rpm -ql freeipmi-devel | grep lib64/libfree
/usr/lib64/libfreeipmi.so

After running:

$ yum remove freeipmi-devel

Slurm still initializes and loads the plugin. However, after running:

$ yum remove freeipmi

slurmd fails with:

[2023-09-22T09:54:49.691] error: plugin_load_from_file: dlopen(/home/vagrant/slurm-23/install-23.11/lib/slurm/acct_gather_energy_ipmi.so): libipmimonitoring.so.6: cannot open shared object file: No such file or directory
[2023-09-22T09:54:49.691] error: Couldn't load specified plugin name for acct_gather_energy/ipmi: Dlopen of plugin file failed
[2023-09-22T09:54:49.691] error: cannot create acct_gather_energy context for acct_gather_energy/ipmi
[2023-09-22T09:54:49.691] fatal: can not open the (null) plugin

You could also check if it is linked to the versioned lib with ldd:

$ ldd /home/test/slurm-23/install/lib/slurm/acct_gather_energy_ipmi.so
	linux-vdso.so.1 (0x00007ffcabd21000)
	libipmimonitoring.so.6 => not found <-
	libfreeipmi.so.17 => not found      <-
	libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f080ef00000)
        ...
I can confirm that slurmd will not fatal when only the freeipmi package (without freeipmi-devel) is installed.
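
The ldd check above can be automated across nodes. A minimal Python sketch that extracts the libraries ldd reports as "not found"; missing_libraries and SAMPLE_LDD are hypothetical names, and the sample is the ldd output from this comment without the "<-" annotations and the elided lines.

```python
def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition("=>")
        if sep and target.strip().startswith("not found"):
            missing.append(name.strip())
    return missing

SAMPLE_LDD = (
    "\tlinux-vdso.so.1 (0x00007ffcabd21000)\n"
    "\tlibipmimonitoring.so.6 => not found\n"
    "\tlibfreeipmi.so.17 => not found\n"
    "\tlibpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f080ef00000)\n"
)

if __name__ == "__main__":
    print(missing_libraries(SAMPLE_LDD))
```

An empty result for acct_gather_energy_ipmi.so would indicate that the runtime freeipmi libraries are resolvable on that node.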

>In the future, it might be good to develop a prioritized list of plugins:  If 
>IPMI fails to work, then try RAPL, then try ...
I see the idea here. It seems this would also require some rework of acct_gather.conf, since its options may crash slurmd if they do not match the plugin picked from the list. It could also be ambiguous which plugin was finally used on each node. Anyway, these are just some thoughts that came to mind; the suggestion is much appreciated.

>This file would have to exist on the slurmctld server as well as on all slurmd 
>nodes.  It is, however, not obvious to me if the include file gets resolved by 
>slurmctld and passed to all slurmd's using configless, or if slurmd will read 
>the file locally? Can you resolve this question?
When configless is enabled, the Slurm controller shares the configuration files it tracks, which are located in the same folder. This includes any "include" file located in the same folder as slurm.conf. But you can also have includes pointing to files in different local folders, and those will not be sent over, which is convenient for our case.
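
The rule described above can be written down as a small classifier. This Python sketch only encodes the behavior as stated in this comment (same folder as slurm.conf: shipped, any other folder: read locally) and has not been checked against the Slurm source. classify_includes and the file name nodes.conf are hypothetical.

```python
import posixpath

def classify_includes(slurm_conf_path, slurm_conf_text):
    """Map each include target to 'shipped' (same folder as slurm.conf)
    or 'node-local' (any other folder), per the rule described above."""
    conf_dir = posixpath.dirname(slurm_conf_path)
    result = {}
    for raw in slurm_conf_text.splitlines():
        parts = raw.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0].lower() == "include":
            # Relative targets are resolved against the slurm.conf folder.
            target = posixpath.normpath(posixpath.join(conf_dir, parts[1]))
            same_dir = posixpath.dirname(target) == conf_dir
            result[target] = "shipped" if same_dir else "node-local"
    return result

if __name__ == "__main__":
    text = (
        "include nodes.conf\n"
        "include /local/etc/slurm/AcctGatherEnergyType.conf\n"
    )
    print(classify_includes("/etc/slurm/slurm.conf", text))
```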

I have done some testing, and I believe your idea using includes will work. This is the scenario I tested:

Do not set AcctGatherEnergyType in the controller's slurm.conf, since this would be distributed to all nodes. Instead, as you suggested, add a line like the following (just using /etc/custom as an example path here; it should be a node-local path, different from the path where slurm.conf is placed on the controller):

include /etc/custom/AcctGatherEnergyType.conf

Then, in acct_gather.conf, do not set any option referring to energy plugins since, as you already saw, having options for "ipmi" when "rapl" is loaded will cause the daemon to fatal on startup. Instead of the options for the energy plugin, add:

include /etc/custom/EnergyOptions.conf
 
On all your slurmd and slurmctld hosts, you will need a couple of local files (their content stays fixed and only defines the energy-gathering options):

/etc/custom/AcctGatherEnergyType.conf
/etc/custom/EnergyOptions.conf

The contents of those files should be (depending on the node):

#### /etc/custom/AcctGatherEnergyType.conf

AcctGatherEnergyType=acct_gather_energy/ipmi

or

AcctGatherEnergyType=acct_gather_energy/rapl

#### /etc/custom/EnergyOptions.conf (depending on the plugin loaded)
# For ipmi, these options can be tuned for each node.
EnergyIPMIPowerSensors=Node=DCMI
EnergyIPMIFrequency=5

# For rapl, this should be an empty file.

That way, you will be able to have a general configuration distributed via configless, while still having custom energy-gathering configuration for specific nodes. I have tested it, but I would suggest trying it on a small subset of nodes first, to make sure things run as expected.
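
If the two node-local files are generated by a provisioning tool, their contents could be rendered as in this Python sketch. node_local_energy_files is a hypothetical helper; the file names and default option values are the examples from this comment, not required names.

```python
def node_local_energy_files(plugin, sensors="Node=DCMI", frequency=5):
    """Return {file name: content} for the two node-local include files.
    plugin is 'ipmi' or 'rapl'; the defaults are the example values above."""
    if plugin not in ("ipmi", "rapl"):
        raise ValueError("plugin must be 'ipmi' or 'rapl'")
    type_conf = f"AcctGatherEnergyType=acct_gather_energy/{plugin}\n"
    if plugin == "ipmi":
        options = (
            f"EnergyIPMIPowerSensors={sensors}\n"
            f"EnergyIPMIFrequency={frequency}\n"
        )
    else:
        options = ""  # rapl: the options file stays empty
    return {
        "AcctGatherEnergyType.conf": type_conf,
        "EnergyOptions.conf": options,
    }

if __name__ == "__main__":
    for name, content in node_local_energy_files("ipmi").items():
        print(f"#### {name}\n{content}")
```

The plugin choice per node could come from a DCMI capability check run at provisioning time, with "rapl" as the choice for nodes whose BMC rejects the DCMI command.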

Let me know if you have any doubt/question with that suggestion. Or if there is something inconvenient with it.

Kind regards,
Oscar
Comment 8 Ole.H.Nielsen@fysik.dtu.dk 2023-09-22 07:50:23 MDT
Hi Oscar,

Thanks for testing the libraries, so I think we're in a good situation with libfreeipmi and it ought to work.

Your detailed description in comment 7 of exactly how to use include files makes a lot of sense.  I will consider this, once I have configured AcctGatherEnergyType=acct_gather_energy/ipmi instead of our current RAPL.  I will put this on my ToDo list and look at it later.

At this time I have all the necessary information, so you are welcome to close this case.

Thanks for your excellent support!

Ole
Comment 9 Oscar Hernández 2023-09-26 03:59:16 MDT
Great!

Closing then. Just re-open if you have any doubt/issue.

Oscar
Comment 10 Oscar Hernández 2023-09-26 13:45:01 MDT
Hi Ole,

Talking with some colleagues today, they brought to my attention a bug we recently found when using DCMI: Bug 17639.

In brief, based on what Marshall was able to get from the backtrace in Bug 17639 comment 30, it seems that there is currently a limitation in the freeipmi library: it uses select(), which is limited to 1024 file descriptors. So when a device_fd beyond that limit is assigned, as happens in that bug, the slurmd process crashes.
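
To judge whether a running slurmd is near this limit, one can look at its open descriptor numbers. A Python sketch under two assumptions: the fd_set size is the 1024 cited above, and /proc/<pid>/fd is available (Linux). fds_beyond_select and open_fds are hypothetical helpers, not part of Slurm or FreeIPMI.

```python
import os

FD_SETSIZE = 1024  # limit cited in this comment (assumed fd_set size)

def fds_beyond_select(fds, fd_setsize=FD_SETSIZE):
    """Return the descriptors that do not fit into a select() fd_set."""
    return sorted(fd for fd in fds if fd >= fd_setsize)

def open_fds(pid="self"):
    """List the open descriptor numbers of a process via /proc (Linux only).
    Returns an empty list if /proc is unavailable or unreadable."""
    try:
        return [int(name) for name in os.listdir(f"/proc/{pid}/fd")]
    except OSError:
        return []

if __name__ == "__main__":
    print(fds_beyond_select(open_fds()))
```

Passing the slurmd PID to open_fds (with sufficient privileges) shows whether any descriptor number has already reached the range that select() cannot represent.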

Since this is part of an external library, all we can do for now is suggest avoiding it (only DCMI is affected). But we are currently looking for alternatives to handle the crash.

Apologies for bringing up this news only now, but I was not aware of it last week. I consider it relevant given your current intention of switching AcctGatherEnergyType plugins.

Kind regards,
Oscar
Comment 11 Ole.H.Nielsen@fysik.dtu.dk 2023-09-27 01:09:31 MDT
Hi Oscar,

Thanks so much for the important information:

(In reply to Oscar Hernández from comment #10)
> Talking with some colleagues today, they brought to my attention a bug we
> recently found when using DCMI: Bug 17639.
> 
> In brief, thanks to what Marshall was able to get from the backtrace in Bug
> 17639 comment 30. It seems that there is a current limitation in the
> freeipmi lib, it is using select(), which is limited to 1024 file
> descriptors. So when a greater device_fd is assigned, like it is happening
> in the bug, it crashes the slurmd process.
> 
> Since this is part of an external library, all we can do now is suggest to
> avoid using it(only DCMI is affected). But we are currently looking for
> alternatives to handle the crash.

I will put my work on using FreeIPMI power monitoring on hold for the time being.  IMHO, it would be good if SchedMD can find some workaround for the problem.  I may also try to alert the FreeIPMI developer to this issue using their mailing list.

Thanks,
Ole
Comment 12 Oscar Hernández 2023-09-27 07:24:34 MDT
Ole,
> I will put my work on using FreeIPMI power monitoring on hold for the time
> being.  IMHO, it would be good if SchedMD can find some workaround for the
> problem.
Thanks for your understanding.

> I may also try to alert the FreeIPMI developer to this issue using
> their mailing list.
We are also looking into a patch proposal for FreeIPMI.

Cheers,
Oscar