Hi there,

Just started testing Slurm 20.02 on Gerty and have run into a regression when running our ReFrame tests, in this case where they test that they can submit from inside a job on the Cray XC to the Slurm cluster that manages the external compute nodes (for transfer jobs). The way we set this up is to have separate Slurm builds: one with the usual /usr/bin/sbatch etc. for the XC part, and another built with /opt/esslurm/ as its --prefix and with its own config file.

Here's an example of it working on Cori:

csamuel@cori10:~> salloc -q interactive -C haswell
salloc: Pending job allocation 31395060
salloc: job 31395060 queued and waiting for resources
salloc: job 31395060 has been allocated resources
salloc: Granted job allocation 31395060
salloc: Waiting for resource configuration
salloc: Nodes nid00249 are ready for job
csamuel@nid00249:~> module load esslurm
csamuel@nid00249:~> sbatch -q xfer --test --wrap hostname
sbatch: Job 724062 to start at 2020-06-05T16:46:34 using 1 processors on nodes cori01 in partition xfer

Whereas on Gerty (which has the same module file):

csamuel@gert01:~> salloc -q interactive -C haswell
salloc: Granted job allocation 1936622
salloc: Waiting for resource configuration
salloc: Nodes nid00022 are ready for job
csamuel@nid00022:~> module load esslurm
csamuel@nid00022:~> sbatch -q xfer --test --wrap hostname
sbatch: error: Job request does not match any supported policy for gerty

In the second case the giveaway is that the submit filter names the cluster you're trying to submit to when a request doesn't match a rule, so you can see that for some reason it's still trying to submit to the XC.
Here's the output of it run with lots of `-v`'s:

csamuel@nid00442:~> sbatch -vvvvvvvvvvvvvv -q xfer --test --wrap hostname
sbatch: debug4: found jobid = 1936627, stepid = 0
sbatch: debug4: found jobid = 1936627, stepid = 4294967295
sbatch: debug: Leaving stepd_getpw
sbatch: debug3: cli_filter/lua: slurm_lua_loadscript: skipping loading Lua script: /etc/esslurm/cli_filter.lua
sbatch: debug3: cli_filter/lua: slurm_lua_loadscript: skipping loading Lua script: /etc/esslurm/cli_filter.lua
sbatch: defined options
sbatch: -------------------- --------------------
sbatch: qos                 : xfer
sbatch: test-only           : set
sbatch: verbose             : 14
sbatch: wrap                : hostname
sbatch: -------------------- --------------------
sbatch: end of defined options
sbatch: debug2: spank: shifter_slurm.so: init_post_opt = 0
sbatch: debug2: spank: zonesort.so: init_post_opt = 0
sbatch: debug2: spank: sdn_plugin.so: init_post_opt = 0
sbatch: debug2: spank: perf.so: init_post_opt = 0
sbatch: debug: propagating RLIMIT_CPU=18446744073709551615
sbatch: debug: propagating RLIMIT_FSIZE=18446744073709551615
sbatch: debug: propagating RLIMIT_DATA=18446744073709551615
sbatch: debug: propagating RLIMIT_STACK=18446744073709551615
sbatch: debug: propagating RLIMIT_CORE=0
sbatch: debug: propagating RLIMIT_RSS=126701535232
sbatch: debug: propagating RLIMIT_NPROC=2048
sbatch: debug: propagating RLIMIT_NOFILE=4096
sbatch: debug: propagating RLIMIT_MEMLOCK=18446744073709551615
sbatch: debug: propagating RLIMIT_AS=18446744073709551615
sbatch: debug: propagating SLURM_PRIO_PROCESS=0
sbatch: debug3: Trying to load plugin /opt/esslurm/lib64/slurm/auth_munge.so
sbatch: debug: Munge authentication plugin loaded
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /opt/esslurm/lib64/slurm/select_cons_res.so
sbatch: select/cons_res loaded with argument 50
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /opt/esslurm/lib64/slurm/select_cons_tres.so
sbatch: select/cons_tres loaded with argument 50
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /opt/esslurm/lib64/slurm/select_cray_aries.so
sbatch: Cray/Aries node selection plugin loaded
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /opt/esslurm/lib64/slurm/select_linear.so
sbatch: Linear node selection plugin loaded with argument 50
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /opt/esslurm/lib64/slurm/select_cons_res.so
sbatch: select/cons_res loaded with argument 50
sbatch: debug3: Success.
sbatch: error: Job request does not match any supported policy for gerty
allocation failure: Unspecified error
sbatch: debug2: spank: libAtpSLaunch.so: exit = 0
sbatch: debug2: spank: libslurm_notifier.so: exit = 0
sbatch: debug2: (24788) __del__:840 Unloading slurm_notifier

There's something in the environment with Slurm 20.02 that seems to cause it, though; if I clear the environment and start a new shell then it works correctly (so it's not the install):

csamuel@nid00442:~> env - /bin/bash -l
csamuel@nid00442:/global/u2/c/csamuel> module load esslurm
csamuel@nid00442:/global/u2/c/csamuel> sbatch -q xfer --test --wrap hostname
sbatch: Job 973 to start at 2020-06-05T17:55:42 using 1 processors on nodes gert01 in partition xfer

Any ideas?

All the best,
Chris
This is all the esslurm module does:

csamuel@gert01:~> module show esslurm
-------------------------------------------------------------------
/usr/common/software/modulefiles/esslurm:

module-whatis   NERSC es Slurm binary executable module
prepend-path    PATH /opt/esslurm/bin
prepend-path    LD_LIBRARY_PATH /opt/esslurm/lib64
-------------------------------------------------------------------
Chris,

This looks like a side-effect of the config-less approach introduced in Slurm 20.02. Specifically, to make sure that commands use the same cached version of slurm.conf, the SLURM_CONF environment variable gets set.

Although environment modules are outside SchedMD's expertise, I think that adding:

>set OLD_SLURM_CONF [getenv SLURM_CONF]
>unsetenv SLURM_CONF $OLD_SLURM_CONF

to the esslurm module file should work for you. The goal is to unset SLURM_CONF when loading the module, but set it back to the previous value if someone unloads the module to use the local build, e.g. creating a step within an existing allocation on Gerty after submission to xfer.

Let me know if that worked.

cheers,
Marcin
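For reference, a sketch of how that might look in a complete modulefile (Tcl modulefile syntax; the paths match the esslurm module shown above, and the guard around a possibly-unset SLURM_CONF is my addition, not something from this report):

```tcl
#%Module1.0
# Sketch only: esslurm modulefile with the suggested SLURM_CONF handling.
module-whatis "NERSC es Slurm binary executable module"

prepend-path PATH /opt/esslurm/bin
prepend-path LD_LIBRARY_PATH /opt/esslurm/lib64

# Unset SLURM_CONF on load; "unsetenv VAR value" restores the saved value
# on unload, so commands from the XC build keep working after
# "module unload esslurm".
if { [info exists env(SLURM_CONF)] } {
    set OLD_SLURM_CONF $env(SLURM_CONF)
    unsetenv SLURM_CONF $OLD_SLURM_CONF
}
```

The `if` guard avoids a Tcl error when the module is loaded in an environment where SLURM_CONF was never set (e.g. a login shell outside any allocation).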
We also have a number of use cases where calling the executable by absolute path (e.g. /opt/esslurm/bin/scontrol) needs to work, so I do think we need to ensure that an executable can find its own configuration without relying on the environment.
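To illustrate the problem with relying on the absolute path alone (a sketch only; the paths are stand-ins and the lookup is simulated in shell rather than done by a real Slurm binary): a command invoked by absolute path still honours SLURM_CONF if it is present in the environment, so inside an allocation the injected variable wins over the binary's built-in default.

```shell
# Simulate the config lookup a command performs: environment variable
# first, built-in default second (both paths are illustrative stand-ins).
export SLURM_CONF=/etc/slurm/slurm.conf        # value injected into the job env
conf="${SLURM_CONF:-/etc/esslurm/slurm.conf}"  # what /opt/esslurm/bin/scontrol would pick
echo "config used: $conf"                      # the XC config wins, not esslurm's
```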
Doug,

Just to make sure we're on the same page: calling the binary within a clean environment (without the SLURM_CONF variable set) will work as you expect and use the slurm.conf from the binary's default location. The issue here is calling a Slurm binary from inside an existing allocation against a different configuration than the initial salloc/sbatch/srun used. The simple workaround for this case is a wrapper that unsets SLURM_CONF (or sets it appropriately) before calling the Slurm command.

What do you think?

cheers,
Marcin
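A sketch of such a wrapper (hypothetical; the /opt/esslurm prefix matches the build described in this report, but the install location and per-command linking scheme are up to the site). Written to a temporary file here so the sketch is self-contained:

```shell
# Generate a wrapper that strips SLURM_CONF before dispatching to the
# esslurm build of the same command.  In practice one such wrapper (or
# symlink to it) would sit ahead of /usr/bin in PATH for each command
# (sbatch, salloc, scontrol, ...); here we only write out the content.
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/bash
# Drop the SLURM_CONF that salloc exported into the job environment,
# then hand off to the real binary under the esslurm prefix.
unset SLURM_CONF
exec /opt/esslurm/bin/"${0##*/}" "$@"
EOF
chmod +x "$wrapper"
cat "$wrapper"
```

Because the wrapper dispatches on its own basename (`${0##*/}`), a single file can back every command via symlinks.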
Hi Marcin,

Whilst setting up wrappers is technically possible, it's a bit of a hack to work around something that used to work. Looking at _establish_config_source() in src/common/read_config.c, I wonder whether this could be fixed by moving the check for the file specified by SLURM_CONF after the check for default_slurm_config_file rather than before. Given that (at least to my understanding) configless operation is for when that file shouldn't exist (or, if it does, it'll be a symlink to the downloaded config), that would seem to be the right order of operations anyway.

What do you think?

All the best,
Chris
Chris,

The logic is a little different: SLURM_CONF has always taken precedence over the built-in default. It wasn't introduced for config-less operation; rather, it's now used by the config-less option. The new part is that SLURM_CONF gets set by salloc (and other commands), not its precedence over the default location. If we ever decided to change this, such a default could only be changed as part of a major release.

Thinking about it from the perspective of a random site using Slurm, there are sites whose environment modules simply update SLURM_CONF to achieve behaviour similar to yours. Another (maybe the strongest) argument against that change is the "common sense" meaning of a built-in default configuration file, which to me is: use that location if nothing is provided by other mechanisms, such as a run-time option or an environment variable.

Another option to consider, though definitely a bigger change in your environment, is a Slurm multi-cluster setup. Did you consider that?

cheers,
Marcin
Hi Marcin,

Thanks for that - I can understand the reasoning. Would it be possible for SLURM_CONF to be set only when running configless (i.e. when "enable_configless" is set in SlurmctldParameters)? That would seem like a reasonable compromise: the new behaviour would only occur when requested, leaving existing configurations unaffected.

All the best,
Chris
Hi Marcin,

Having poked around the code some more, and tried some patches myself, I've realised the simplest solution is probably to just use a task prolog to unset SLURM_CONF on our Cray XC systems.

csamuel@gert01:~> cat /etc/slurm/taskprolog.sh
#!/bin/bash
# Unset SLURM_CONF on XC to prevent it breaking "module load esslurm"
echo unset SLURM_CONF

results in:

csamuel@gert01:~> srun -q interactive -C haswell bash -c 'env | fgrep -c SLURM_CONF'
srun: job 2224234 queued and waiting for resources
srun: job 2224234 has been allocated resources
0
srun: error: nid00058: task 0: Exited with exit code 1
srun: Terminating job step 2224234.0

which is what we want. Likely the simplest solution all round: no patches, no wrappers, no changes to RPMs, and only deployed where we need it via our Ansible setup.

All the best,
Chris
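As an aside, the "Exited with exit code 1" in that transcript is expected rather than a problem: fgrep -c still exits with status 1 when the count is 0, so the task's exit code reflects the zero match count, not a failure of the prolog. The same check can be reproduced locally (a sketch; `env -u` stands in for the task prolog's unset):

```shell
# Strip SLURM_CONF from the environment (as the task prolog would) and
# count matching lines.  fgrep -c prints 0 and exits with status 1 when
# nothing matches, which is why srun reported exit code 1 above even
# though the printed "0" is exactly the wanted result.
export SLURM_CONF=/etc/slurm/slurm.conf
matches=$(env -u SLURM_CONF env | fgrep -c SLURM_CONF || true)
echo "SLURM_CONF lines after stripping: $matches"
```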
Chris,

We had a short internal discussion on this case and we think that unsetting/overriding SLURM_CONF in the environment module for esslurm is the best approach. Did you check whether the absolute-path case is used within a job allocation (I asked Doug about that in comment 5)? If it's not, then there won't be any change for those.

cheers,
Marcin
Chris,

Just touching base - are you OK with our advice?

cheers,
Marcin
Chris,

Another follow-up: if you don't reply within a week I'll close the report as "information given".

cheers,
Marcin
Chris,

I'm closing the case as "information given". Should you have any questions, please don't hesitate to reopen.

cheers,
Marcin
Hi Marcin,

Sorry, I didn't see your previous replies until this was closed. We can't assume people will load the environment module, so we'll stick with unsetting it in the task prolog; that seems the easiest way to stop it breaking things for us.

All the best,
Chris
Chris,

Understood; however, you have to be aware that we're not running any regression testing with such a disruptive TaskProlog. This may result in some issues on your side in the future that may not be easy to reproduce by us without knowing about the variable being unset.

cheers,
Marcin
(In reply to Marcin Stolarek from comment #18)
> Understood, however, you have to be aware that we're not running any
> regression testing with such a disruptive TaskProlog. This may result in
> some issues on your side in the future that may not be easy to reproduce by
> us without knowing about the variable being unset.

Understood, it's just an unfortunate and unavoidable consequence of the Cray architecture. :-(