Created attachment 8255 [details]
Patch seff to reflect api change from rss_max

Hi all,

The 18.08.3 version of seff shows warnings and does not calculate memory usage correctly:

Use of uninitialized value $lmem in numeric lt (<) at /usr/bin/seff line 130, <DATA> line 624.

As reported on the mailing list, it looks like the perl API changed from using 'rss_max' to 'tres_usage_in_max' (changed in c72bc449), and seff needs to be updated to match. The attached patch should fix it.

Thanks,
Paddy
Hullo,

this largely works for us, with one addition: the value needs to be divided by 1024 to be in kB (so just a /1024):

my $lmem = Slurmdb::find_tres_count_in_string($step->{'stats'}{'tres_usage_in_max'}, TRES_MEM)/1024;

Can you please merge this (with this tweak)? If it would help, please feel free to associate this bug with the Princeton site - we use this and need it fixed...

Thanks!
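For anyone reading along without the Perl bindings handy, here is a rough sketch of what the lookup plus the /1024 tweak does, written in Python since the Slurmdb Perl module isn't available here. The "id=count" string format and the TRES_MEM id of 2 are assumptions for illustration, as is the stand-in function itself.

```python
# Hypothetical stand-in for Slurmdb::find_tres_count_in_string.
# ASSUMPTIONS: tres_usage_in_max is a comma-separated "id=count" string,
# TRES_MEM has id 2, and the memory count is in bytes (hence the /1024
# to match the kB that seff reports).
TRES_MEM = 2

def find_tres_count_in_string(tres_str, tres_id):
    """Return the count recorded for tres_id, or -1 if it is absent."""
    for pair in tres_str.split(","):
        key, _, value = pair.partition("=")
        if key.strip() == str(tres_id):
            return int(value)
    return -1

usage = "1=120,2=2097152"   # e.g. cpu and mem entries (illustrative values)
lmem_kb = find_tres_count_in_string(usage, TRES_MEM) / 1024
print(lmem_kb)              # 2048.0
```

Without the division, seff would report a value 1024x too large, which matches the symptom described above.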
Created attachment 8695 [details]
Add more constants to perl API

We don't have access to vm_max either, so we need those constants available in perl.
Created attachment 8933 [details]
Updated patch seff to reflect api change from rss_max (fix kb)

Updated the patch to include Josko's fix for converting to kB.
Thanks guys, these are now pushed into 18.08.

I will note we have the idea of adding seff functionality to sacct in new columns such as TRESUtil[in|out] and TRESEff[in|out]. This will land in 19.05 or 20.02, depending on available developers for the task.

I will also note that, while looking at seff in general, it appears it will not handle parallel steps correctly, or jobs where the batch/extern step is doing work as well. I am not sure if this was expected, but wanted to bring it to your attention. It appears this has always been the case. If the job only ever has one step, you get better results.

It is also assuming homogeneous tasks for memory: it takes the max over all tasks and then multiplies that by the number of tasks. This is wrong in every case except when the tasks are all exactly the same. I would suggest using tres_usage_in_ave instead of tres_usage_in_max to get a more correct value here. Of course, if you only ever have one task, things work as expected.

At the moment I am leaving the code the way it is, as it has always been this way. Please reopen if you would like to talk about this in the future.
Hullo,

we are aware of seff's limitations as far as memory efficiency goes (for n>1), but we haven't addressed them, partly due to time limitations and partly because there are multiple answers, and the current one is a good starting point for deciding "what is the minimum memory per task I need to allocate". E.g. if your memory use is asymmetric (as it can be with MPI programs, where rank 0 can consume more, sometimes much more), then you need to take the max as the guide for how much memory you need per task; the average could easily be insufficient. Of course, one could go with heterogeneous job allocations, but that is too messy for most of our audience.

So we would like to see in the per-job report max/task, average and min numbers, and better handling of corner cases (we have seen memory efficiencies of 2000%+...). Anyway, just something to think about.

Thanks for patching it.
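The max-vs-average trade-off above can be made concrete with a toy example (the numbers are hypothetical, chosen to mimic an MPI job with a heavy rank 0):

```python
# Hypothetical per-task peak memory in MB for a 4-rank MPI job.
per_task_mb = [3800, 400, 400, 400]

ave = sum(per_task_mb) / len(per_task_mb)   # 1250.0 MB per task
peak = max(per_task_mb)                     # 3800 MB per task

# With a uniform per-task allocation sized from the average,
# any rank above the average would be killed for exceeding its limit:
oom_ranks = [r for r, m in enumerate(per_task_mb) if m > ave]
print(ave, peak, oom_ranks)                 # 1250.0 3800 [0]
```

So for sizing a uniform per-task request the max is the safe guide, even though (as noted in the previous comment) the max is the wrong statistic for totalling a job's overall usage; both views are useful, which is the argument for reporting max, average and min together.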
*** Ticket 6380 has been marked as a duplicate of this ticket. ***
Josko, thanks for the information. It wasn't clear from the output what was expected. I can see what you are looking for now. Just a heads up on this, we are looking to clear out contribs and send projects like this back to the original author referencing them from our webpage instead of having them directly in contribs. This will mostly happen this year at some point.
Created attachment 9000 [details]
Fix contributors name list

Thanks for the merge, Danny. And good to know about the future plans for contribs.

I looked back at my patch and realised I didn't follow the guidelines for keeping authors in alphabetical order! So a trivial patch is attached, if you want to put things back in order. (The nit-picker in me couldn't just let it go!)
Hullo,

it would be nice if you carefully considered each one of these and either kept the more useful ones or moved them into core. E.g. seff is something we find essential for helping users (and us) gauge how well their jobs are running, and clearly you do too, as you are adding some of that functionality to sacct. I guess if all of this (memory usage stats included) will be easily available in sacct, we can ditch seff and write a simple wrapper around sacct for email/post-job checking.

Thanks,
JP

(In reply to Danny Auble from comment #9)
> Josko, thanks for the information. It wasn't clear from the output what was
> expected. I can see what you are looking for now.
>
> Just a heads up on this, we are looking to clear out contribs and send
> projects like this back to the original author referencing them from our
> webpage instead of having them directly in contribs.
>
> This will mostly happen this year at some point.