Management has asked me to provide cluster usage reports ASAP with specified data for each job, such as that provided by:

$ sacct -o JobID,User,AllocNodes,AllocCPUS

They also need the job walltime, which I can calculate as End-Start. However, the number of GPUs used per job seems to be unavailable to sacct from the database. Jobs that are currently running can be queried like this:

$ squeue -hO tres-per-node -j 3827335
gpu:RTX3090:1

It would seem that the tres-per-node information is not recorded in the database.

Question: Is there some way to read the tres-per-node information from the database?

Question: Is there some way to configure the recording of tres-per-node, if it isn't recorded already?

Thanks,
Ole
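As a side note on the walltime part: sacct can also print an Elapsed field directly, so the End-Start subtraction is only needed when working from raw timestamps. A minimal sketch of that subtraction, assuming GNU date is available (the timestamps below are illustrative, not from a real job):

```shell
# Compute job walltime as End-Start from sacct-style timestamps.
# sacct's Elapsed field gives this directly; this shows the manual
# calculation using GNU date's -d option to parse ISO timestamps.
start="2023-01-05T10:00:00"
end="2023-01-05T12:30:00"
elapsed=$(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
echo "$elapsed seconds"   # 9000 seconds (2h30m)
```

In practice one would feed Start and End values from `sacct -p -o JobID,Start,End` into the same arithmetic.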
Hi Ole,

You can have the TRES information recorded in the database. It shows up in the AllocTRES field, but the controller does have to be configured to record that information. Use the AccountingStorageTRES parameter to define which TRES you would like to track in the database.

Here is an example where I have the following in my slurm.conf:

AccountingStorageTRES=billing,cpu,energy,mem,node,gres/gpu:tesla1

I submit a job that requests one 'tesla1' GPU, and you can see that sacct reports that the GPU was allocated:

$ sbatch -N1 --gres=gpu:tesla1:1 --wrap='srun sleep 5'
Submitted batch job 29745

$ sacct -j 29745 --format=jobid,account,alloctres%45
JobID           Account                                     AllocTRES
------------ ---------- ---------------------------------------------
29745              sub1 billing=2,cpu=1,gres/gpu:tesla1=1,mem=100M,n+
29745.batch        sub1 cpu=1,gres/gpu:tesla1=1,mem=100M,node=1
29745.extern       sub1 billing=2,cpu=1,gres/gpu:tesla1=1,mem=100M,n+
29745.0            sub1 cpu=1,gres/gpu:tesla1=1,mem=100M,node=1

Let me know if you have any questions about this or if you're seeing different results.

Thanks,
Ben
Hi Ben,

(In reply to Ben Roberts from comment #1)
> You can have the TRES information recorded in the database. It shows up in
> the AllocTRES field, but the controller does have to be configured to
> record that information.

Thanks for the info! I configured this in slurm.conf:

AccountingStorageTRES=gres/gpu,gres/gpu:K20Xm,gres/gpu:RTX3090

and restarted slurmctld. Now we do get GPU accounting as desired:

$ sacct -j 3829403 -p --format=jobid,account,alloctres
JobID|Account|AllocTRES|
3829403|ecsstud|billing=28,cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|
3829403.batch|ecsstud|cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|
3829403.extern|ecsstud|billing=28,cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|

I guess you may close this case now.

Thanks,
Ole
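For anyone building the management report from this output: the GPU count can be pulled out of the comma-separated AllocTRES field with standard text tools. A sketch under the assumption that the parsable (-p) format is used; the sample line mirrors the sacct output above:

```shell
# Extract the total gres/gpu count from sacct's parsable AllocTRES field.
# Sample line taken from the sacct -p output shown earlier; in practice,
# pipe `sacct -X -p --format=jobid,alloctres` through the same filter.
line='3829403|ecsstud|billing=28,cpu=14,gres/gpu:rtx3090=1,gres/gpu=1,mem=30G,node=1|'

# Split the TRES list on ',' and '|' so each key=value pair sits on its
# own line, then match the plain "gres/gpu" key (not the typed
# "gres/gpu:rtx3090" variant, which is excluded by the exact comparison).
gpus=$(echo "$line" | tr ',|' '\n\n' | awk -F= '$1 == "gres/gpu" {print $2}')
echo "GPUs: $gpus"   # GPUs: 1
```

Tracking the plain `gres/gpu` TRES alongside the typed variants, as in the AccountingStorageTRES line above, is what makes this single-key lookup possible regardless of GPU model.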
I'm glad that will get you the information you need going forward, but I'm sorry that you don't have the historical data you were looking for. I'll close this ticket, but let us know if there is anything else we can do to help.

Thanks,
Ben