Bug 498

Summary: Ticket-based multifactor priority depends on user count in each account?
Product: Slurm
Reporter: Ryan Cox <ryan_cox>
Component: Scheduling
Assignee: Moe Jette <jette>
Status: RESOLVED WORKSFORME
Severity: 4 - Minor Issue
CC: da
Priority: ---    
Version: 2.6.x   
Hardware: Linux   
OS: Linux   
Site: BYU - Brigham Young University

Description Ryan Cox 2013-10-30 05:49:20 MDT
We are using the ticket-based multifactor plugin but seem to have found a problem with it.  For some reason account glh43 can get fairshare numbers that are in between some of dhe's users.  glh43 has much higher usage than anyone else including dhe.

One thing I noticed is that the priority of fslcollab8 (glh43) plummeted after I added myself to the glh43 account and submitted a held job under that account.  fslcollab8's fairshare number in sprio had been 3373 then it dropped to 447.  At that point he was still ahead of a user from dhe but not behind all of them.  It dropped even further when another dhe user had jobs queued.

My understanding is that it doesn't matter how many users in an account have queued up jobs or what their usage is within that group.  I thought ticket-based multifactor was supposed to always rank accounts first then the users inside the account, thus it shouldn't matter if glh43 has 1 user and dhe has 100 or vice versa.  Is my understanding incorrect?

We are on 2.6.2 at commit cd91f0a6488bfac9da5847caa87586b157fceb58.

The numbers below are before I added myself to the glh43 account.

# sshare -aA glh43,dhe -l
Accounts requested:
	: glh43
	: dhe
             Account       User Raw Shares Norm Shares   Raw Usage  Norm Usage Effectv Usage  FairShare  GrpCPUMins      CPURunMins 
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ----------- --------------- 
dhe                                   1024    0.007692  4074644021    0.137712      0.137712   0.000004                     8921718 
 dhe                   andrewdt          1    0.000000           0    0.000000      0.000000   0.406123                           0 
 dhe                     apetit        500    0.000078       46054    0.000002      0.000101   0.406123                           0 
 dhe                   austinb9       1024    0.000159      290417    0.000010      0.000207   0.406123                           0 
 dhe                    bdianal          1    0.000000           0    0.000000      0.000000   0.406123                           0 
 dhe                      bk232       1024    0.000159   427838311    0.014460      0.014460   0.000000                     2702685 
 dhe                   brendang       1024    0.000159      166016    0.000006      0.000207   0.406123                           0 
 dhe                    brianhk       1024    0.000159     7166084    0.000242      0.000242   0.349027                           0 
 dhe                      deepa       8192    0.001276   671945802    0.022710      0.022710   0.000004                      197074 
 dhe                        dhe      16384    0.002552    96215432    0.003252      0.003317   0.406123                           0 
 dhe                 fslcollab+         64    0.000010           0    0.000000      0.000013   0.406123                           0 
 dhe                 fslcollab+         64    0.000010           0    0.000000      0.000013   0.406123                           0 
 dhe                 fslcollab+       1024    0.000159           0    0.000000      0.000207   0.406123                           0 
 dhe                   jaustinj       1024    0.000159     5967056    0.000202      0.000207   0.406123                           0 
 dhe                   jflygare       1024    0.000159    13272068    0.000449      0.000449   0.142346                           0 
 dhe                     jreitz       1024    0.000159     3674755    0.000124      0.000207   0.406123                           0 
 dhe                    kevinsp          1    0.000000        4806    0.000000      0.000000   0.406123                           0 
 dhe                    kredd31          1    0.000000        7057    0.000000      0.000000   0.345936                           0 
 dhe                   mdevonas       1024    0.000159   461633029    0.015602      0.015602   0.000000                     3599884 
 dhe                      mwahl       1024    0.000159        1593    0.000000      0.000207   0.406123                           0 
 dhe                   panders6       1024    0.000159        2495    0.000000      0.000207   0.406123                           0 
 dhe                      pez22          1    0.000000           0    0.000000      0.000000   0.406123                           0 
 dhe                   samparas       1024    0.000159   108982624    0.003683      0.003683   0.000000                           0 
 dhe                   scottm27       1024    0.000159   215065638    0.007269      0.007269   0.000000                           0 
 dhe                    sgustaf       8192    0.001276  1450630340    0.049027      0.049027   0.000000                      708829 
 dhe                     slake4       1024    0.000159       37578    0.000001      0.000207   0.406123                           0 
 dhe                     tccook        500    0.000078    60390454    0.002041      0.002041   0.000000                           0 
 dhe                    ycaibyu        128    0.000020    44234874    0.001495      0.001495   0.000000                           0 
 dhe                    yzhang8       1024    0.000159   507071529    0.017138      0.017138   0.000000                     1713244 
glh43                                 1024    0.007692 10663598743    0.360400      0.360400   0.000000                    17381294 
 glh43                    cade1       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                  chumphe       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                   cobalt       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43               fslcollab8       1024    0.000405  8932994373    0.301911      0.301911   0.000000                    17381294 
 glh43                    glh43       1024    0.000405   382890464    0.012941      0.012941   0.000000                           0 
 glh43                     gww7       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                  harts12       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                  hoggane       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                 jaykisme       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                   kc9joc       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                  legoses       1024    0.000405  1346159674    0.045497      0.045497   0.000000                           0 
 glh43                 mburbidg       1024    0.000405           2    0.000000      0.000526   0.406123                           0 
 glh43                  ostrom2       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                    ppare       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                  profcwr       1024    0.000405     1554228    0.000053      0.000526   0.406123                           0 
 glh43                 sdaltonb       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                   t2rich       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                  tylerbj       1024    0.000405           0    0.000000      0.000526   0.406123                           0 
 glh43                    van32       1024    0.000405           0    0.000000      0.000526   0.406123                           0 

Here is some cleaned up output from sprio.  I omitted a bunch of jobs that are redundant information for debugging purposes.  All of the "fslcolla" below are "fslcollab8" in glh43's account.  mdevonas, yzhang8, and sgustaf are in account dhe.  Note that sgustaf is higher in terms of fairshare than fslcollab8 but the other dhe users are not.

  JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
2302256  sgustaf       8097         27       8070          0          0          0      0
2302255  sgustaf       8097         27       8070          0          0          0      0
2302254  sgustaf       8097         27       8070          0          0          0      0
2303799  sgustaf       8071          1       8070          0          0          0      0
2303798  sgustaf       8071          1       8070          0          0          0      0
2303797  sgustaf       8071          1       8070          0          0          0      0
2303795  sgustaf       8071          1       8070          0          0          0      0
2303927  sgustaf       8070          0       8070          0          0          0      0
1972208 fslcolla       4372       1000       3373          0          0          0      0
1972207 fslcolla       4372       1000       3373          0          0          0      0
1970581 fslcolla       4373       1000       3373          0          0          0      0
1969766 fslcolla       4049        677       3373          0          0          0      0
1969776 fslcolla       4041        668       3373          0          0          0      0
1969778 fslcolla       4040        667       3373          0          0          0      0
1969788 fslcolla       4033        661       3373          0          0          0      0
1969818 fslcolla       4010        638       3373          0          0          0      0
1969823 fslcolla       4007        634       3373          0          0          0      0
1969840 fslcolla       3993        620       3373          0          0          0      0
1969858 fslcolla       3978        606       3373          0          0          0      0
1969868 fslcolla       3972        599       3373          0          0          0      0
1969866 fslcolla       3972        600       3373          0          0          0      0
1969878 fslcolla       3963        591       3373          0          0          0      0
1969897 fslcolla       3949        576       3373          0          0          0      0
1969895 fslcolla       3949        577       3373          0          0          0      0
1969957 fslcolla       3898        525       3373          0          0          0      0
1969981 fslcolla       3879        507       3373          0          0          0      0
1969789 fslcolla       3807        435       3373          0          0          0      0
1970336 fslcolla       3573        200       3373          0          0          0      0
2299766 mdevonas        454         57        396          1          0          0      0
2299765 mdevonas        454         57        396          1          0          0      0
2299764 mdevonas        454         57        396          1          0          0      0
2299763 mdevonas        454         57        396          1          0          0      0
2299678 mdevonas        454         58        396          0          0          0      0
2299677 mdevonas        454         58        396          0          0          0      0
2299676 mdevonas        454         58        396          0          0          0      0
2299675 mdevonas        454         58        396          0          0          0      0
2299674 mdevonas        454         58        396          0          0          0      0
2299673 mdevonas        454         58        396          0          0          0      0
2299672 mdevonas        454         58        396          0          0          0      0
2299671 mdevonas        454         58        396          0          0          0      0
2299670 mdevonas        454         58        396          0          0          0      0
2299669 mdevonas        454         58        396          0          0          0      0
2299668 mdevonas        454         58        396          0          0          0      0
2299667 mdevonas        454         58        396          0          0          0      0
2303855  yzhang8        365          0        361          4          0          0      0
2303853  yzhang8        365          0        361          4          0          0      0
2303852  yzhang8        365          0        361          4          0          0      0
2303851  yzhang8        365          0        361          4          0          0      0
2303850  yzhang8        365          0        361          4          0          0      0
2303849  yzhang8        365          0        361          4          0          0      0
2303848  yzhang8        365          0        361          4          0          0      0
2303847  yzhang8        365          0        361          4          0          0      0
Comment 1 Moe Jette 2013-10-31 08:59:34 MDT
> My understanding is that it doesn't matter how many users in an account have queued up jobs or what their usage is within that group.  I thought ticket-based multifactor was supposed to always rank accounts first then the users inside the account, thus it shouldn't matter if glh43 has 1 user and dhe has 100 or vice versa.  Is my understanding incorrect?

That is incorrect. Tickets are distributed at each level of the hierarchy according to how over- or under-served the child users/accounts are. In the case of account dhe, tickets are distributed in the following ratios (details below show each stage):
Portion of account's tickets
User mdevonas   tickets = (1024 x 0.0102) / (10.44 + 213.01 + 9.52) = 0.0448
User sgustaf    tickets = (8192 x 0.0260) / (10.44 + 213.01 + 9.52) = 0.9143
User yzhang8    tickets = (1024 x 0.0093) / (10.44 + 213.01 + 9.52) = 0.0409

That means user sgustaf should expect a fair share value about 20 times higher than users mdevonas or yzhang8, which matches what we see:
# sprio
  JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE
2302256  sgustaf       8097         27       8070          0
1972208 fslcolla       4372       1000       3373          0
2299766 mdevonas        454         57        396          1
2303855  yzhang8        365          0        361          4
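
As a quick sanity check on the "about 20 times" figure, the within-account ticket portions can be compared with the FAIRSHARE column reported by sprio. This is only an illustrative sketch: the dictionary and function names are made up here, and the numbers are copied from this comment.

```python
# Within-account ticket portions for account dhe, copied from the figures above.
ticket_portion = {"sgustaf": 0.9143, "mdevonas": 0.0448, "yzhang8": 0.0409}

# FAIRSHARE column from the sprio output above.
sprio_fairshare = {"sgustaf": 8070, "mdevonas": 396, "yzhang8": 361}


def advantage_over(values, leader, other):
    """Return how many times larger the leader's value is than the other's."""
    return values[leader] / values[other]


for user in ("mdevonas", "yzhang8"):
    predicted = advantage_over(ticket_portion, "sgustaf", user)
    observed = advantage_over(sprio_fairshare, "sgustaf", user)
    print(f"sgustaf vs {user}: tickets {predicted:.1f}x, sprio {observed:.1f}x")
```

If the ticket model describes what the plugin does, the two ratios printed on each line should be close to one another.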

User fslcollab8's fair share will be about 40% of user sgustaf's, as we see. All the calculations are below and more information is available at the web site:
http://slurm.schedmd.com/priority_multifactor.html
http://slurm.schedmd.com/priority_multifactor2.html

# sshare -aA glh43,dhe -l
Accounts requested:
	: glh43
	: dhe
             Account       User Raw Shares Norm Shares   Raw Usage  Norm Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ----------
dhe                                   1024    0.007692  4074644021    0.137712      0.137712   0.000004
 dhe                   mdevonas       1024    0.000159   461633029    0.015602      0.015602   0.000000
 dhe                    sgustaf       8192    0.001276  1450630340    0.049027      0.049027   0.000000
 dhe                    yzhang8       1024    0.000159   507071529    0.017138      0.017138   0.000000

glh43                                 1024    0.007692 10663598743    0.360400      0.360400   0.000000
 glh43               fslcollab8       1024    0.000405  8932994373    0.301911      0.301911   0.000000

# sprio
  JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
2302256  sgustaf       8097         27       8070          0          0          0      0
1972208 fslcolla       4372       1000       3373          0          0          0      0
2299766 mdevonas        454         57        396          1          0          0      0
2303855  yzhang8        365          0        361          4          0          0      0

Account dhe   Norm Shares = 0.007692
Account glh43 Norm Shares = 0.007692

Account dhe   Effectv Usage = 0.137712
Account glh43 Effectv Usage = 0.360400

fair share = normalized share / effective usage
Account dhe   fair share = 0.05586
Account glh43 fair share = 0.02134

User mdevonas   Norm Shares = 0.000159
User sgustaf    Norm Shares = 0.001276
User yzhang8    Norm Shares = 0.000159
User fslcollab8 Norm Shares = 0.000405

User mdevonas   Effectv Usage = 0.015602
User sgustaf    Effectv Usage = 0.049027
User yzhang8    Effectv Usage = 0.017138
User fslcollab8 Effectv Usage = 0.301911

User mdevonas   fair share = 0.0102
User sgustaf    fair share = 0.0260
User yzhang8    fair share = 0.0093
User fslcollab8 fair share = 0.0013
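
The fair share values listed so far can be reproduced with a few lines of Python. This is a minimal sketch under the formula stated above (fair share = normalized share / effective usage). The `sshare` dictionary and `fair_share` function are names invented for illustration, and the inputs are the Norm Shares and Effectv Usage columns copied from the sshare output.

```python
# (Norm Shares, Effectv Usage) pairs copied from the sshare output in this bug.
sshare = {
    "dhe": (0.007692, 0.137712),
    "glh43": (0.007692, 0.360400),
    "mdevonas": (0.000159, 0.015602),
    "sgustaf": (0.001276, 0.049027),
    "yzhang8": (0.000159, 0.017138),
    "fslcollab8": (0.000405, 0.301911),
}


def fair_share(norm_shares, effective_usage):
    """fair share = normalized share / effective usage, as stated above."""
    return norm_shares / effective_usage


fair = {name: fair_share(*columns) for name, columns in sshare.items()}
for name, value in fair.items():
    print(f"{name:<10} fair share = {value:.5f}")
```

Because sshare prints only six decimal places, the last digit of each result can differ slightly from a value computed inside slurmctld.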

=========================
ticket share = (shares x fair share) / (SUM siblings(share x fair share))
Account dhe   tickets = (1024 x 0.05586) / (1024 x 0.05586 + 1024 x 0.02134) = 0.7236
Account glh43 tickets = (1024 x 0.02134) / (1024 x 0.05586 + 1024 x 0.02134) = 0.2764

Portion of account's tickets
User mdevonas   tickets = (1024 x 0.0102) / (10.44 + 213.01 + 9.52) = 0.0448
User sgustaf    tickets = (8192 x 0.0260) / (10.44 + 213.01 + 9.52) = 0.9143
User yzhang8    tickets = (1024 x 0.0093) / (10.44 + 213.01 + 9.52) = 0.0409
User fslcollab8 tickets = 1.0

Portion of all tickets
User mdevonas   tickets = 0.0448 x 0.7236 = 0.0324
User sgustaf    tickets = 0.9143 x 0.7236 = 0.6616
User yzhang8    tickets = 0.0409 x 0.7236 = 0.0296
User fslcollab8 tickets = 1.0 x 0.2764    = 0.2764
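
The two-stage distribution above can be sketched in Python as follows. This is not the plugin's code, only an illustration of the stated formula (ticket share = shares x fair share, divided by the sum of the same product over siblings). The function name `ticket_shares` is hypothetical, and the inputs are the raw shares and rounded fair share values from this comment, so results may differ slightly from the hand-worked figures.

```python
def ticket_shares(siblings):
    """Split one parent's tickets among its children.

    `siblings` maps a name to (raw_shares, fair_share). Each child receives
    (shares x fair share) / SUM over siblings of (shares x fair share).
    """
    weights = {name: shares * fs for name, (shares, fs) in siblings.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Stage 1: split all tickets between the two accounts.
accounts = ticket_shares({"dhe": (1024, 0.05586), "glh43": (1024, 0.02134)})

# Stage 2: split each account's tickets among its users with queued jobs.
dhe_users = ticket_shares({
    "mdevonas": (1024, 0.0102),
    "sgustaf": (8192, 0.0260),
    "yzhang8": (1024, 0.0093),
})
glh43_users = ticket_shares({"fslcollab8": (1024, 0.0013)})

# Portion of all tickets = portion within the account x the account's portion.
overall = {user: part * accounts["dhe"] for user, part in dhe_users.items()}
overall.update({user: part * accounts["glh43"] for user, part in glh43_users.items()})

for user, part in sorted(overall.items(), key=lambda item: item[1], reverse=True):
    print(f"{user:<10} {part:.4f}")
```

Since fslcollab8 is the only user in glh43 with queued jobs, that user receives the whole of the account's portion regardless of the user-level fair share value.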
Comment 2 Ryan Cox 2013-11-01 06:05:53 MDT
That makes sense.  After following the calculations you provided, the documentation made sense.

Basically, we have a problem where our most active user (fslcollab8) was the only one from his account running jobs for a time.  He was competing with several users from the second most active account (dhe), several of whom have different shares numbers.  Since the calculation at each depth in the tree for shares is (Suser / Ssiblings), fslcollab8 was 1.0 since he was the only active user.  dhe's users were at a disadvantage since there were more of them active and that lowered one component of the shares calculation.  The other components are essentially constant for us since we treat each account equally.  Thus accounts with only one active user have an advantage.
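
The (Suser / Ssiblings) effect described above can be illustrated with the raw shares from the sshare output. This sketch looks only at the shares component and ignores usage. The function name `share_fraction` is made up for illustration, and the second glh43 user with 1024 shares is a hypothetical example, not a value taken from this bug.

```python
def share_fraction(user, active_shares):
    """Suser / Ssiblings, counting only siblings that have queued jobs."""
    return active_shares[user] / sum(active_shares.values())


# Raw shares of the users with queued jobs, from the sshare output.
glh43_active = {"fslcollab8": 1024}
dhe_active = {"mdevonas": 1024, "sgustaf": 8192, "yzhang8": 1024}

print(share_fraction("fslcollab8", glh43_active))  # sole active user in glh43
for user in dhe_active:
    print(user, share_fraction(user, dhe_active))

# HYPOTHETICAL: a second active glh43 user with 1024 raw shares.
glh43_two_active = {"fslcollab8": 1024, "second_user": 1024}
print(share_fraction("fslcollab8", glh43_two_active))
```

A sole active user always gets a fraction of 1.0 of the account's portion, while each active user in a busier account gets less than 1.0.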

We're planning to add a new algorithm to the multifactor plugin that will handle our use case much better.  We've modeled it extensively and may code it up and try it out by early next week.  It will evaluate each depth and make it so that, all else equal, users from an overserved account will never under any circumstances get a higher priority than users from an underserved account.  It assumes a rigid hierarchy and may only be practical for a two level tree (user and account), but that's what we have and it should work well for us.
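
One way to picture the property stated above (users from an overserved account never outrank users from an underserved account) is a two-level sort: accounts first, then users within each account. The sketch below is only an assumption about how such a ranking could look. It is not the algorithm proposed in this comment, whose details are not given here. The `Job` type and `rank_key` function are hypothetical names, and the fair share values are the ones worked out earlier in this bug.

```python
from typing import NamedTuple


class Job(NamedTuple):
    user: str
    account: str


# Fair share values (normalized share / effective usage) from Comment 1.
account_fair_share = {"dhe": 0.05586, "glh43": 0.02134}
user_fair_share = {
    "mdevonas": 0.0102,
    "sgustaf": 0.0260,
    "yzhang8": 0.0093,
    "fslcollab8": 0.0013,
}


def rank_key(job):
    """Compare by account fair share first, then by user fair share."""
    return (account_fair_share[job.account], user_fair_share[job.user])


jobs = [
    Job("fslcollab8", "glh43"),
    Job("yzhang8", "dhe"),
    Job("sgustaf", "dhe"),
    Job("mdevonas", "dhe"),
]

# Highest fair share first: every dhe job sorts ahead of every glh43 job.
ranked = sorted(jobs, key=rank_key, reverse=True)
for position, job in enumerate(ranked, start=1):
    print(position, job.account, job.user)
```

With a tuple key, the number of active users in an account has no influence on where that account's users rank relative to another account's users.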
Comment 3 Moe Jette 2013-11-01 06:13:37 MDT
CEA did some work described in the presentation below that will be included in the version 13.12 release. I'm not sure if this will help your situation though.

http://slurm.schedmd.com/SUG13/fairshare-improvement-0.4.pdf