Bug 1611 - New feature: Slurm efficiency script: seff
Summary: New feature: Slurm efficiency script: seff
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 15.08.x
Hardware: Linux Linux
Importance: --- 5 - Enhancement
Assignee: Moe Jette
Reported: 2015-04-20 03:55 MDT by Dennis McRitchie
Modified: 2019-04-09 11:34 MDT
CC List: 7 users

See Also:
Site: Princeton (PICSciE)
Version Fixed: 16.05.0-pre1


Attachments
Tarball for seff and smail utilities (2.89 KB, application/x-gzip)
2015-04-20 03:55 MDT, Dennis McRitchie
Springdale/RedHat 6 binary RPM for seff and smail utilities (6.24 KB, application/octet-stream)
2015-04-20 03:57 MDT, Dennis McRitchie
Springdale/RedHat 6 SRPM for seff and smail utilities (6.77 KB, application/octet-stream)
2015-08-16 08:08 MDT, Dennis McRitchie
Remove Data::Dumper dependency (253 bytes, patch)
2016-06-04 02:56 MDT, Robbert Eggermont
Preserve original subject (333 bytes, patch)
2016-06-04 03:05 MDT, Robbert Eggermont

Description Dennis McRitchie 2015-04-20 03:55:48 MDT
Created attachment 1830 [details]
Tarball for seff and smail utilities

This is a contribution: the Slurm job efficiency report (seff).

Summary:
seff takes a job ID and reports on the efficiency of that job's CPU and memory utilization. The RPM/tarball comes with an 'smail' utility that lets Slurm end-of-job emails include a seff report, so users can see whether they are wasting resources.

> seff 
Usage: seff [Options] <Jobid>
       Options:
       -h    Help menu
       -v    Version
       -d    Debug mode: display raw Slurm data

The seff output is mostly self-explanatory:

> seff 3485050
Job ID: 3485050
Cluster: della
User/Group: dmcr/cses
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 2
CPU Utilized: 00:00:01
CPU Efficiency: 0.40% of 00:04:08 core-walltime
Memory Utilized: 2.04 GB (estimated maximum)
Memory Efficiency: 86.89% of 2.34 GB (1.17 GB/node)

To have the smail utility process end-of-job notifications automatically, add the following to /etc/slurm/slurm.conf:

MailProg=/usr/bin/smail

This script parses the notification subject line, and generates the requested email with seff output as the body.
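
As a rough illustration of the subject-parsing step, here is a minimal Python sketch. It is not the smail script itself: the subject layout ("Job_id=... Name=... <event>, ...") is an assumption about what Slurm passes to MailProg and may differ between versions, and the names parse_subject and wants_seff_report are hypothetical.

```python
import re
from typing import Optional

# Assumed subject layout, e.g.
# "Slurm Job_id=3485050 Name=test Ended, Run time 00:01:02, COMPLETED, ExitCode 0".
# The real wording may vary between Slurm versions.
SUBJECT_RE = re.compile(
    r"Job_id=(?P<jobid>\d+(?:_\d+)?)\s+Name=(?P<name>\S+)\s+(?P<event>\w+)"
)


def parse_subject(subject: str) -> Optional[dict]:
    """Return job ID, job name and event from a notification subject, or None."""
    match = SUBJECT_RE.search(subject)
    if match is None:
        return None
    return match.groupdict()


def wants_seff_report(info: dict) -> bool:
    """Only finished jobs have meaningful efficiency numbers (assumed event names)."""
    return info["event"] in {"Ended", "Failed"}
```

A wrapper along these lines would run `seff <jobid>` for subjects that pass wants_seff_report and use the output as the mail body, forwarding any other notification unchanged.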

Please let me know if you have any questions.

Best regards,
Dennis
Comment 1 Dennis McRitchie 2015-04-20 03:57:48 MDT
Created attachment 1831 [details]
Springdale/RedHat 6 binary RPM for seff and smail utilities
Comment 2 Dennis McRitchie 2015-04-20 04:00:14 MDT
I forgot to mention that this script relies on the jobs_get() functionality of the Perl API, and hence requires Slurm 15.08.

Dennis
Comment 3 Dennis McRitchie 2015-08-16 08:08:16 MDT
Created attachment 2127 [details]
Springdale/RedHat 6 SRPM for seff and smail utilities
Comment 4 Moe Jette 2015-09-25 11:04:22 MDT
I've added this to the "master" branch of Slurm, which will become version 16.05 (May 2016). We pretty much limit changes other than bug fixes to major releases, so I assume you'll manage this as a local patch for now.

I added a copyright notice to both scripts:
# Copyright 2015 Princeton University Research Computing

I also modified the smail script so that it identifies the location of seff rather than assuming it is in /usr/bin, and I put these files in their own RPM (named "slurm-seff"). Let us know if you make other changes going forward.

I think this will prove very helpful to many other Slurm users. Thanks!

Commit here:
https://github.com/SchedMD/slurm/commit/93d9189c35be9d603cfedb09b55c4110a9b5779a
Comment 5 Dennis McRitchie 2015-09-26 08:18:11 MDT
You're welcome Moe.

Your changes look good to me.

Best,
Dennis
Comment 6 Robbert Eggermont 2016-06-04 02:56:13 MDT
Created attachment 3179 [details]
Remove Data::Dumper dependency

seff crashes when Data::Dumper is not installed (it is not listed as a requirement of the slurm-seff RPM). The module is not used, so it doesn't need to be loaded.
Comment 7 Robbert Eggermont 2016-06-04 03:05:55 MDT
Created attachment 3180 [details]
Preserve original subject

Preserving the original subject has its pros, both for backward compatibility and for spotting failed jobs among thousands of successful jobs.
Comment 8 Ole.H.Nielsen@fysik.dtu.dk 2017-09-08 14:41:06 MDT
One of our users asked for "smail" to prepend the ClusterName to the subject line so that mails from Slurm jobs can be more easily identified or filtered.

I made a small change to the smail script that seems to do the trick, see https://github.com/OleHolmNielsen/Slurm_tools/tree/master/smail.

If you like it, please accept this as a contribution.
Comment 9 Moe Jette 2017-09-11 08:24:22 MDT
(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #8)
> One of our users asked for "smail" to prepend the ClusterName to the subject
> line so that mails from Slurm jobs can be more easily identified or filtered.
> 
> I made a small change to the smail script that seems to do the trick, see
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/smail.
> 
> If you like it, please accept this as a contribution.

Thanks for your contribution. It has been added to the version 17.11 code base here:
https://github.com/SchedMD/slurm/commit/d26370f88e437581ed034a20f3ebf56047dad187

I added the install directory path to scontrol (not a problem in your case) here:
https://github.com/SchedMD/slurm/commit/be89d5c0f944e9538c821fda299e9e9c050c0440
Comment 10 Manu 2018-07-02 02:20:34 MDT
(In reply to Dennis McRitchie from comment #0)
Hi Dennis,

I am interested in your work. Which formulas did you use to compute the two efficiencies?

Thanks in advance, 

Manuel.-
Comment 11 Dennis McRitchie 2018-07-05 11:08:13 MDT
Hi Manuel,

I don’t recall exactly, since this was written several years ago and I don’t have the code with me at the moment. However, I believe that CPU efficiency was basically the actual CPU time divided by the product of the number of CPUs and the wallclock time. Memory efficiency was the high-water mark of memory used divided by the memory requested for the job.

The code is not very complicated, so if you download it and take a look, you can probably see exactly what I did.
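
The two formulas as described here can be sketched in a few lines of Python. This is only an illustration of the description above, not the seff source (which is written in Perl); the function names are hypothetical, and the inputs are taken from the sample report in the original description.

```python
def hms_to_seconds(text: str) -> int:
    """Convert a [D-]HH:MM:SS duration string into seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(cpu_seconds: float, ncpus: int, wall_seconds: float) -> float:
    """CPU time used as a percentage of core-walltime (ncpus * wallclock)."""
    core_walltime = ncpus * wall_seconds
    return 100.0 * cpu_seconds / core_walltime if core_walltime else 0.0


def memory_efficiency(max_used: float, requested: float) -> float:
    """Peak memory used as a percentage of the memory requested (same units)."""
    return 100.0 * max_used / requested if requested else 0.0


# Figures from the sample report: 2 nodes x 2 cores, 00:04:08 core-walltime.
ncpus = 2 * 2
wall = hms_to_seconds("00:04:08") / ncpus  # 62 seconds of wallclock time
print(f"{cpu_efficiency(hms_to_seconds('00:00:01'), ncpus, wall):.2f}%")  # 0.40%
print(f"{memory_efficiency(2.04, 2.34):.2f}%")
```

The CPU figure matches the 0.40% in the sample report. The memory figure computed from the rounded GB values comes out at about 87.2% rather than the reported 86.89%, presumably because the report works from unrounded values.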

 

Best,
Dennis