Ticket 3187

Summary: sreport returns reserved 0 for gres
Product: Slurm
Reporter: Albert Gil <albert.gil>
Component: Accounting
Assignee: Jacob Jenson <jacob>
Status: RESOLVED WONTFIX
QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 15.08.9   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Albert Gil 2016-10-18 01:37:34 MDT
It seems that sreport always returns 0 as the Reserved time for GRES values in Cluster Utilization reports.

For example:

$ sreport -T GRES/gpu  -t SecPer cluster utilization Format=TresName,TresCount,Allocated,Reserved,Idle,Down Start=`date -d "last month" +%D` End=now
--------------------------------------------------------------------------------
Cluster Utilization 2016-09-18T00:00:00 - 2016-10-18T08:59:59
Use reported in TRES Seconds/Percentage of Total
--------------------------------------------------------------------------------
     TRES Name TRES Count         Allocated          Reserved              Idle              Down 
-------------- ---------- ----------------- ----------------- ----------------- ----------------- 
      gres/gpu         10  10338753(36.78%)          0(0.00%)  16345037(58.15%)    1423366(5.06%) 


I know for sure that several jobs have been in Reserved while trying to allocate with --gres=gpu.

Please note that the percentages of allocated+reserved+idle+down still add up to 100% (within rounding), so it seems that the reserved time is being wrongly attributed to one of the other states?
(and we cannot deduce the Reserved time from the others! ;-)
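
The arithmetic behind this observation can be checked with a short Python sketch. It parses the data row copied from the report above, sums the per-state TRES seconds, and recomputes each percentage against that sum. The names ROW and parse_row are illustrative only and are not part of Slurm; the sketch assumes the row layout shown above (name, count, then four "seconds(percent%)" cells).

```python
import re

# Data row copied verbatim from the sreport output in this ticket.
ROW = "gres/gpu         10  10338753(36.78%)          0(0.00%)  16345037(58.15%)    1423366(5.06%)"

STATES = ("Allocated", "Reserved", "Idle", "Down")


def parse_row(row):
    """Hypothetical helper: split one sreport row into name, count and per-state values."""
    fields = row.split()
    name, count = fields[0], int(fields[1])
    states = {}
    for state, cell in zip(STATES, fields[2:]):
        # Each cell looks like "10338753(36.78%)": TRES seconds, then percent of total.
        match = re.fullmatch(r"(\d+)\(([\d.]+)%\)", cell)
        if match is None:
            raise ValueError(f"unexpected cell format: {cell!r}")
        states[state] = (int(match.group(1)), float(match.group(2)))
    return name, count, states


name, count, states = parse_row(ROW)

# Sum of the four states, in TRES seconds and in reported percent.
total_seconds = sum(seconds for seconds, _ in states.values())
total_percent = sum(percent for _, percent in states.values())

# Recompute each percentage using the four-state sum as the denominator.
recomputed = {
    state: round(100 * seconds / total_seconds, 2)
    for state, (seconds, _) in states.items()
}

print(f"{name}: total {total_seconds} TRES seconds, reported percentages sum to {total_percent:.2f}%")
for state in STATES:
    print(f"  {state:<9} reported {states[state][1]:6.2f}%  recomputed {recomputed[state]:6.2f}%")
```

The recomputed percentages match the reported ones, which suggests the reported total is exactly Allocated + Idle + Down with Reserved contributing 0 seconds. That is consistent with the reserved time being folded into another state, but the output alone cannot show which one.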

Thanks!

Albert

PS: Although this is probably a minor issue for the general Slurm case, in our case it is becoming important, since GPUs are the most important computing resource in our cluster.
Comment 1 Jacob Jenson 2017-11-03 15:30:30 MDT
Before this ticket can be sent to the support team we need to put a support contract in place for your site. Please let me know if you would like to discuss Slurm support.