Summary: | sacct cannot print GPU usage information | ||
---|---|---|---|
Product: | Slurm | Reporter: | Ole.H.Nielsen <Ole.H.Nielsen> |
Component: | Accounting | Assignee: | Ben Roberts <ben> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 3 - Medium Impact | ||
Priority: | --- | ||
Version: | 20.11.8 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | DTU Physics | Alineos Sites: | --- |
Description
Ole.H.Nielsen@fysik.dtu.dk
2021-07-14 05:40:24 MDT
Hi Ole,

You can have the TRES information recorded in the database. It shows up in the AllocTRES field, but the controller does have to be configured to record that information. You use the AccountingStorageTRES parameter to define which TRES you would like to track in the database.

Here is an example where I have the following in my slurm.conf:

    AccountingStorageTRES=billing,cpu,energy,mem,node,gres/gpu:tesla1

I submit a job that requests 1 'tesla1' gpu and you can see that sacct reports that that gpu was allocated:

    $ sbatch -N1 --gres=gpu:tesla1:1 --wrap='srun sleep 5'
    Submitted batch job 29745

    $ sacct -j 29745 --format=jobid,account,alloctres%45
    JobID        Account    AllocTRES
    ------------ ---------- ---------------------------------------------
    29745        sub1       billing=2,cpu=1,gres/gpu:tesla1=1,mem=100M,n+
    29745.batch  sub1       cpu=1,gres/gpu:tesla1=1,mem=100M,node=1
    29745.extern sub1       billing=2,cpu=1,gres/gpu:tesla1=1,mem=100M,n+
    29745.0      sub1       cpu=1,gres/gpu:tesla1=1,mem=100M,node=1

Let me know if you have any questions about this or if you're seeing different results.

Thanks,
Ben

Hi Ben,

(In reply to Ben Roberts from comment #1)

Thanks for the info! I configured this in slurm.conf:

    AccountingStorageTRES=gres/gpu,gres/gpu:K20Xm,gres/gpu:RTX3090

and restarted slurmctld. Now we do get GPU accounting as desired:

    $ sacct -j 3829403 -p --format=jobid,account,alloctres
    JobID|Account|AllocTRES|
    3829403|ecsstud|billing=28,cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|
    3829403.batch|ecsstud|cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|
    3829403.extern|ecsstud|billing=28,cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|

I guess you may close this case now.

Thanks,
Ole

I'm glad that will get you the information you need going forward, but I'm sorry that you don't have the historical data you were looking for. I'll close this ticket but let us know if there is anything else we can do to help.

Thanks,
Ben
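For anyone who wants to post-process the pipe-separated `sacct -p` output shown in this ticket, here is a minimal Python sketch that extracts the `gres/gpu` count per job record from the AllocTRES field. It is only an illustration: the function names `parse_tres` and `gpu_counts` are hypothetical and not part of Slurm, and it assumes the output was produced with `-p` and includes the `JobID` and `AllocTRES` columns. Records without a `gres/gpu` entry (for example, jobs that ran before AccountingStorageTRES was configured) are reported as 0.

```python
from __future__ import annotations

# Sample input copied from the sacct -p output in this ticket.
SAMPLE = """\
JobID|Account|AllocTRES|
3829403|ecsstud|billing=28,cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|
3829403.batch|ecsstud|cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|
3829403.extern|ecsstud|billing=28,cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|
"""


def parse_tres(tres: str) -> dict[str, str]:
    """Split a TRES string such as 'cpu=14,gres/gpu=1,mem=30G' into a dict.

    Items without '=' (e.g. a truncated 'n+' from a width-limited column)
    are skipped.
    """
    result: dict[str, str] = {}
    for item in tres.split(","):
        name, sep, value = item.partition("=")
        if sep:
            result[name] = value
    return result


def gpu_counts(sacct_output: str) -> dict[str, int]:
    """Map each JobID in pipe-separated sacct output to its gres/gpu count."""
    lines = sacct_output.strip().splitlines()
    header = lines[0].split("|")
    jobid_col = header.index("JobID")
    tres_col = header.index("AllocTRES")
    counts: dict[str, int] = {}
    for line in lines[1:]:
        fields = line.split("|")
        tres = parse_tres(fields[tres_col])
        # Missing gres/gpu means no GPU allocation was recorded for this record.
        counts[fields[jobid_col]] = int(tres.get("gres/gpu", 0))
    return counts


if __name__ == "__main__":
    for jobid, gpus in gpu_counts(SAMPLE).items():
        print(jobid, gpus)
```

In practice the `SAMPLE` string would be replaced by the captured output of the `sacct` command. Using `-p` rather than a fixed-width column such as `alloctres%45` avoids the `n+` truncation seen in the first example.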