Bug 9374 - Some users not able to run jobs when sssd enumerate is off on compute nodes
Summary: Some users not able to run jobs when sssd enumerate is off on compute nodes
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: - Unsupported Older Versions
Hardware: Cray CS Linux
Importance: --- 3 - Medium Impact
Assignee: Marcin Stolarek
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2020-07-09 14:33 MDT by Bill Marmagas
Modified: 2020-07-17 00:59 MDT

See Also:
Site: VTech BI
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Bill Marmagas 2020-07-09 14:33:46 MDT
We are using sssd to authenticate users on our clusters.  We recently had to turn off the sssd enumerate option on our compute nodes while implementing our latest cluster, because the large number of nodes (320) caused CPU load issues on the directory server.  The directory server is a custom LDAP server, and it allows certain users to suppress their "uupid", which in our environment is their username (ldap_user_name = uupid).  It is those users with a suppressed uupid who are not able to submit jobs because they do not get their SLURM_JOB_USER set properly.  For example:

[zorba@tcadmin2 ~]$ srun --pty $SHELL

[I have no name!@tc-hm001 ~]$ id
uid=986791 gid=986791 groups=986791

[I have no name!@tc-hm001 ~]$ echo $SLURM_JOB_USER
nobody

This Cray cluster is being implemented by Cray with Bright 8.2 and Slurm 18.08.9.  We have tested this issue on several clusters with different Slurm versions: 17.02.11, 18.08.9 (the current cluster in question), and 19.05.5.  It appears to NOT be a problem on the 19.05.5 cluster, and our research indicates that there was an issue fixed in 19.05 that may be directly related: -- srun - do not continue with job launch if --uid fails. CVE-2019-19728.

We are opening this ticket to get confirmation of whether our issue has been addressed, and in which version, so that we know how to proceed and have a convincing argument if we need to ask Cray to upgrade their Slurm version on this implementation.


NOTE:  Our old support contract has expired, but we purchased SchedMD support with this new cluster:

SL-SLURM-1KS-SUP
Software
SCHEDMD, SLURM, COMPUTE NODE DISTRIBUTION, PER X86 CPU COMPUTE SOCKET, FOR SYSTEMS WITH OVER 100 AND UP TO 1,000 X86 COMPUTE SOCKETS -QUARTERLY SUPPORT
13280

SL-SLURM-1KS
Software
SCHEDMD, SLURM, COMPUTE NODE DISTRIBUTION, PER X86 CPU COMPUTE SOCKET, FOR SYSTEMS WITH OVER 100 AND UP TO 1,000 X86 COMPUTE SOCKETS
664
Comment 3 Bill Marmagas 2020-07-10 16:07:39 MDT
One other difference on the 19.05.5 cluster that does not exhibit the problem is that I have implemented pam_slurm_adopt on it, but not on the others.  Not sure if that has an effect.  I can probably test that on one of the older Slurm systems.
Comment 4 Marcin Stolarek 2020-07-14 03:46:10 MDT
Bill,

>[...]because they do not get their SLURM_JOB_USER set properly
It looks like both SLURM_JOB_USER and the bash prompt are not set correctly because of some issue in the environment.

I have a hypothesis that we can confirm with the following sequence of commands on the compute node:
#getent passwd 986791
#getent passwd USER_NAME
#getent passwd 986791
where you should substitute USER_NAME with an appropriate name. 

The SLURM_JOB_USER variable set when calling srun is based on the uid. The Slurm code is rather simple here: it just calls the glibc getpwuid_r() function, and if that fails to resolve the user name, the variable is set to the hardcoded value "nobody".
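
For illustration, here is a rough shell equivalent of that lookup path (a sketch only, using the uid from your example; Slurm itself does this in C via getpwuid_r() rather than shelling out to getent):

# Resolve a uid to a user name; fall back to "nobody" if resolution fails.
uid=986791                                   # example uid from this report
name=$(getent passwd "$uid" | cut -d: -f1)   # empty if the uid cannot be resolved
echo "${name:-nobody}"                       # Slurm substitutes the literal "nobody"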

>issue fixed in 19.05 that may be directly related: -- srun - do not continue with job launch if --uid fails. CVE-2019-19728.
This is not related to the issue you described.

cheers,
Marcin
Comment 5 Bill Marmagas 2020-07-14 08:05:45 MDT
I logged in as root to a compute node that I just had trouble launching a job on using my regular user account and ran those commands:

[root@tc219 ~]# getent passwd 986791
[root@tc219 ~]# getent passwd zorba
zorba:*:986791:986791:William Gregory Marmagas:/home/zorba:/bin/bash
[root@tc219 ~]# getent passwd 986791
zorba:*:986791:986791:William Gregory Marmagas:/home/zorba:/bin/bash


Thanks.
Comment 6 Marcin Stolarek 2020-07-15 01:45:24 MDT
Bill,

As you can see, uid->username resolution in your configuration works only after the first username->uid call. This is outside of SchedMD's expertise; however, the two most probable reasons for it are:
-> Your backend (IAM) database configuration doesn't correctly handle the queries sssd sends when a specific user is requested directly (enumeration disabled).
-> You're using sssd-ldap with algorithmic ID mapping and users in different slices. In this case, mapping from uid to user name only works after the first username->uid resolution for that slice has been performed, since that is what assigns the slice (see the sssd.conf sketch below).
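
For context, algorithmic ID mapping is the sssd feature controlled by the ldap_id_mapping option; the fragment below is only a hypothetical illustration of where that setting lives, not your actual configuration:

# Hypothetical /etc/sssd/sssd.conf fragment (illustrative only)
[domain/example]
id_provider = ldap
ldap_id_mapping = True    # uids are derived algorithmically and assigned in per-domain "slices"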

Looking back at comment 0:
>We recently had to turn off the sssd enumerate option on our compute nodes while implementing our latest cluster because the large number of nodes
My recommendation would be to try nss_slurm[1], which is a Name Service Switch module that can be used on top of other sources (like sssd) to reduce the load on the backend IAM databases. Simply speaking, it answers "getent passwd"-like queries happening inside the job step from a Slurm-provided cache instead of querying over the network. It was added in the Slurm 19.05 release.

This way you could re-enable enumeration; as I understand it, disabling it was a workaround for the high load rather than an intentional configuration change.
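
As a rough sketch, with nss_slurm installed the compute nodes' /etc/nsswitch.conf would list slurm ahead of the existing sources (the ordering below is just one possible example; see the linked page for the exact setup and any required slurm.conf options):

# /etc/nsswitch.conf (example ordering, assuming sssd is the current source)
passwd: slurm sss files
group:  slurm sss files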

Checking the Slurm code for potential workarounds, I have an idea that may or may not work depending on configuration details, version, your workload specifics, and... the root cause details. Could you please check if SLURM_JOB_USER is correctly set in a job prolog? If it is, you can execute a query like
`getent passwd $SLURM_JOB_USER` in the prolog script, before Slurm execv's the user job process, to "pre-load" the information into sssd. This will effectively require configuring:
>PrologFlags=Alloc
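
For reference, a minimal slurm.conf sketch of that workaround could look like the following; the prolog path is only an assumed example, adjust it to wherever your site keeps the script:

# slurm.conf excerpt (sketch)
Prolog=/etc/slurm/prolog.sh
PrologFlags=Alloc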

cheers,
Marcin

[1]https://slurm.schedmd.com/nss_slurm.html
Comment 7 Bill Marmagas 2020-07-15 13:58:29 MDT
Thanks for the detailed reply.


>-> Your backend (IAM) database configuration doesn't correctly handle the queries sssd sends when a specific user is requested directly (enumeration disabled).

Yes, the issue seems to be that individuals can have their user name suppressed in our IAM directory, and those are the accounts having the issues with sssd enumeration disabled.  I'm not sure why it is not a problem on our Slurm 19.05 cluster.


>My recommendation would be to try nss_slurm[1], which is a Name Service Switch module that can be used on top of other sources (like sssd) to reduce the load on the backend IAM databases.

Thank you for that tip!  I did not know about that new module.  That's a good reason for us to ask the integrator to perform the upgrade to at least 19.05.


>Checking the Slurm code for potential workarounds, I have an idea that may or may not work depending on configuration details, version, your workload specifics, and... the root cause details. Could you please check if SLURM_JOB_USER is correctly set in a job prolog?

Yes, I had that same thought about using a prolog, but SLURM_JOB_USER was getting set to nobody, so it did not work when I first tried it.  However, I did not try it with PrologFlags=Alloc, so I will set up a new test with that configured and let you know how it works.
Comment 8 Bill Marmagas 2020-07-15 14:47:15 MDT
It appears that the prolog with PrologFlags=Alloc set worked!


-> The test prolog:

#!/bin/bash

GETENT=/usr/bin/getent

# Pre-load the user's passwd entry into sssd before the job step starts,
# so that the later uid->name lookup succeeds (requires PrologFlags=Alloc).
if [ -n "$SLURM_JOB_USER" ]
then
        echo "$SLURM_JOB_USER"
        $GETENT passwd "$SLURM_JOB_USER"
else
        echo "The SLURM_JOB_USER variable is empty."
        exit 101
fi


-> The job on a node that had just previously failed:

[zorba@tinkercliffs2 ~]$ srun --pty --reservation=slurm_testing $SHELL

Inactive Modules:
  1) DefaultModules

[zorba@tc001 ~]$ id
uid=986791(zorba) gid=986791(zorba) groups=986791(zorba),16521(arc.arcadm),7228715(arc.openondemand),7863760(arc.sysadmin),7937010(arc.haswell)
[zorba@tc001 ~]$ echo $SLURM_JOB_USER
zorba
[zorba@tc001 ~]$ 



I'm going to do some repeat testing to verify, but I'm pretty sure that was it.


Thanks!
Comment 9 Marcin Stolarek 2020-07-17 00:59:31 MDT
Bill,

I'm going to go ahead and close the case as "info given".

Should you have any questions please don't hesitate to reopen.

cheers,
Marcin