Summary: | Job Submit Lua plugin needs access to slurm_errno.h error codes | ||
---|---|---|---|
Product: | Slurm | Reporter: | Ole.H.Nielsen <Ole.H.Nielsen> |
Component: | Other | Assignee: | Skyler Malinowski <skyler> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | ||
Version: | 21.08.8 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | DTU Physics | Alineos Sites: | --- |
Bull/Atos Sites: | --- | Confidential Site: | --- |
Cray Sites: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA SIte: | --- | OCF Sites: | --- |
SFW Sites: | --- | SNIC sites: | --- |
Linux Distro: | --- | Machine Name: | |
CLE Version: | | Version Fixed: | 23.02.0pre1 |
Target Release: | --- | DevPrio: | --- |
Description
Ole.H.Nielsen@fysik.dtu.dk
2022-07-08 05:41:10 MDT
> Using numeric return codes gives the expected error message to the user.
> For example, with "return ESLURM_BAD_TASK_COUNT" (i.e. code 2025) I get:
>
> sbatch: error: Batch job submission failed: Task count specification invalid
>
> But if I try to use the following in my Lua script, by analogy to "return
> slurm.ERROR", then this seems to be equivalent to slurm.SUCCESS:
>
> return slurm.ESLURM_BAD_TASK_COUNT

That is because slurm.ESLURM_BAD_TASK_COUNT is not defined and defaults to 0. slurm.SUCCESS is defined as 0, so this behavior makes sense.

> Question: How can Lua scripts be enabled to return all possible ESLURM*
> error codes from slurm_errno.h using symbolic names instead of numeric
> values?

Looks like slurm_lua.c:_register_slurm_output_functions is responsible for loading the slurm_errno.h enum as a consumable symbol. As of right now, Slurm does not expose all the errno enum entries to Lua, hence there is nothing that can be enabled by conf to expose them all.

I will be working on a patch to expose all the error codes (or at least the ones that make sense).

In the meantime though, you can manually define each slurm.ESLURM_* (as desired) in your Lua script -- as you already have been doing. You can override/extend the slurm Lua object with additional fields at runtime.

E.g.
> -- job_submit.lua
> slurm.ESLURM_INVALID_GRES=2072
> slurm.ESLURM_BAD_TASK_COUNT=2025
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>     ...
>     if (job_desc.num_tasks == slurm.NO_VAL) then
>         return slurm.ESLURM_BAD_TASK_COUNT
>     end
>     ...
>     return slurm.SUCCESS
> end

Or create a new object for the manually defined Slurm error map.

Best,
Skyler

Hi Skyler,

(In reply to Skyler Malinowski from comment #1)
> Looks like slurm_lua.c:_register_slurm_output_functions is responsible for
> loading the slurm_errno.h enum as a consumable symbol. As of right now,
> Slurm does not expose all the errno enum entries to Lua, hence there is
> nothing that can be enabled by conf to expose them all.
>
> I will be working on a patch to expose all the error codes (or at least the
> ones that make sense).

Thanks, I see the ESLURM* etc. symbols defined in slurm_lua.c. It would be great to extend the list of symbols exposed to Lua!

I think that the documentation in job_submit_plugins should also state clearly where the return codes can be looked up. For example, in the section

Returns:
slurm.SUCCESS — Job submission accepted by plugin.

append something like this:

All ESLURM* fields in the source file slurm/slurm_errno.h are available to Lua as slurm.ESLURM* return codes.

> In the meantime though, you can manually define each slurm.ESLURM_* (as
> desired) in your Lua script -- as you already have been doing. You can
> override/extend the slurm Lua object with additional fields at runtime.
>
> E.g.
> > -- job_submit.lua
> > slurm.ESLURM_INVALID_GRES=2072
> > slurm.ESLURM_BAD_TASK_COUNT=2025

I don't really know Lua, but are you saying that in Lua you are allowed to define arbitrary fields such as ESLURM_BAD_TASK_COUNT in the slurm data structure as in your examples? Without overwriting existing fields or overwriting data outside the slurm data structure?

Thanks,
Ole

> I don't really know Lua, but are you saying that in Lua you are allowed to
> define arbitrary fields such as ESLURM_BAD_TASK_COUNT in the slurm data
> structure as in your examples? Without overwriting existing fields or
> overwriting data outside the slurm data structure?

Define arbitrary object fields: yes. Protected against overwriting: no; assigning to a field name that already exists replaces its value. However, manually adding error codes to the Lua slurm object is safe enough to consider and should be convenient once the patch is introduced.
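
One way to sidestep the overwrite concern is the "new object" option mentioned earlier in this thread: keep the manually defined codes in a separate table and leave the slurm object untouched. The following is only a sketch under assumptions. The table name site_errno and its strict lookup are hypothetical and not part of Slurm; the stand-in slurm table exists only so the snippet runs outside slurmctld; the two numeric values are the ones quoted above.

```lua
-- Stand-in for the table that slurmctld provides to job_submit.lua.
-- Only needed so this sketch runs on its own (assumption, not Slurm code).
local slurm = slurm or { SUCCESS = 0, ERROR = -1 }

-- Hypothetical site-local error map, kept separate from the slurm table
-- so that nothing in slurm.* can be overwritten by accident.
local site_errno = {
    ESLURM_BAD_TASK_COUNT = 2025,
    ESLURM_INVALID_GRES   = 2072,
}

-- Make lookups strict: a misspelled or undefined name raises an error
-- instead of silently evaluating to nil.
setmetatable(site_errno, {
    __index = function(_, name)
        error("undefined error code: " .. tostring(name), 2)
    end,
})
```

Inside slurm_job_submit one would then write "return site_errno.ESLURM_BAD_TASK_COUNT". The strict lookup is a design choice aimed at the problem reported here, where an undefined symbolic name ended up behaving like slurm.SUCCESS.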
Hi Ole,
The patch to expose all Slurm error codes in LUA plugins was merged in for 23.02.0pre1.
If interested in the relevant commits, please see the following:
> https://github.com/SchedMD/slurm/compare/fac34e6adc..423585c632
Cheers,
Skyler
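
For a job_submit.lua that has to run on releases both before and after this patch, a guarded definition may be useful: it fills in a numeric fallback only when the symbol is missing. This is a sketch, not something taken from the patch. The helper define_if_missing is hypothetical, the stand-in slurm table exists only so the snippet runs outside slurmctld, and the sketch assumes that an undefined field of the slurm object reads as nil (plain Lua table behaviour, not verified against slurmctld).

```lua
-- Stand-in for the table that slurmctld provides to job_submit.lua.
-- Only needed so this sketch runs on its own (assumption, not Slurm code).
local slurm = slurm or { SUCCESS = 0, ERROR = -1 }

-- Hypothetical helper: add a fallback value only if Slurm has not
-- already defined the symbol, so built-in values are never replaced.
local function define_if_missing(name, value)
    if slurm[name] == nil then
        slurm[name] = value
    end
end

-- Numeric values as quoted earlier in this thread.
define_if_missing("ESLURM_BAD_TASK_COUNT", 2025)
define_if_missing("ESLURM_INVALID_GRES", 2072)
```

On a release that already exposes these names the calls should be no-ops, so the fallback lines can stay in the script until the older release is retired.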