In our job_submit.lua script we would like to check (for a certain partition with 4TB memory nodes) that users submit jobs which *must* specify --mem or --mem-per-cpu, and that the values are above certain minimum limits.

However, I haven't been able to find the Lua job_desc fields describing the --mem or --mem-per-cpu values. I also don't know whether the fields will contain NIL or NO_VAL if the user omitted them.

Could you kindly tell me which fields to use? They might be named something like job_desc.min_mem_per_node and job_desc.min_mem_per_cpu. The kind of code I'm trying is:

  local min_mem_per_node = 256000
  if job_desc.min_mem_per_node ~= nil and job_desc.min_mem_per_node < min_mem_per_node then
    ...
    return slurm.ESLURM_INVALID_TASK_MEMORY
  end

Could you offer some example code which performs this kind of check?

Thanks a lot,
Ole
Hello Ole,

You already have almost everything sorted out, so my contribution will be minimal. If you want to see the complete list of job_desc fields, you can check them in the _get_job_req_field(...) function here [1].

I've created a basic code snippet for demonstration purposes that you can use as your base and also for some quick testing:

***********
function slurm_job_submit(job_desc, part_list, submit_uid)
  -- Verbose checks
  slurm.log_user("Values of the following parameters:")
  slurm.log_user("Min mem per node: %s (%s)", job_desc.min_mem_per_node, type(job_desc.min_mem_per_node))
  slurm.log_user("Min mem per cpu: %s (%s)", job_desc.min_mem_per_cpu, type(job_desc.min_mem_per_cpu))

  if job_desc.min_mem_per_node ~= nil and job_desc.min_mem_per_node < 300 then
    slurm.log_user("User requested less memory than 300M")
  end

  return slurm.SUCCESS
end
***********

Using this, you will quickly see how the values are populated. Here are some examples:

$ sbatch -n1 --mem=100M --wrap="srun hostname"
>>sbatch: Values of the following parameters:
>>sbatch: Min mem per node: 100.0 (number)
>>sbatch: Min mem per cpu: nil (nil)
>>sbatch: User requested less memory than 300M

$ sbatch -n1 --mem-per-cpu=100M --wrap="srun hostname"
>>sbatch: Values of the following parameters:
>>sbatch: Min mem per node: nil (nil)
>>sbatch: Min mem per cpu: 100.0 (number)

So the field for the option the user did not give is nil, and the other one is a number in MB. With some basic changes to my code (or yours) you will have this up and running in no time.

Let me know if you have any other questions regarding this. If not, also let me know so I can mark the ticket as closed.

Best regards,
Ricard.

[1] https://github.com/SchedMD/slurm/blob/master/src/plugins/job_submit/lua/job_submit_lua.c#L493
Hi Ricard,

Thanks for your quick reply:

(In reply to Ricard Zarco Badia from comment #1)
> I've created a basic code snippet for demonstration purposes that you can
> use as your base and also for some quick testing:
>
> ***********
> function slurm_job_submit(job_desc, part_list, submit_uid)
>   -- Verbose checks
>   slurm.log_user("Values of the following parameters:")
>   slurm.log_user("Min mem per node: %s (%s)", job_desc.min_mem_per_node,
> type(job_desc.min_mem_per_node))
>   slurm.log_user("Min mem per cpu: %s (%s)", job_desc.min_mem_per_cpu,
> type(job_desc.min_mem_per_cpu))
>
>   if job_desc.min_mem_per_node ~= nil and job_desc.min_mem_per_node < 300
> then
>     slurm.log_user("User requested less memory than 300M")
>   end
>
>   return slurm.SUCCESS
> end

With Slurm 23.02.7 we're getting errors from slurm.log_user() when an argument has a NIL value. When I add these lines to job_submit.lua:

  slurm.log_user("Values of the following parameters:")
  slurm.log_user("Min mem per node: %s (%s)", job_desc.min_mem_per_node, type(job_desc.min_mem_per_node))
  slurm.log_user("Min mem per cpu: %s (%s)", job_desc.min_mem_per_cpu, type(job_desc.min_mem_per_cpu))

we get this in slurmctld.log:

[2024-04-11T10:39:04.930] error: job_submit/lua: /etc/slurm/job_submit.lua: [string "slurm.user_msg (string.format(unpack({...})))"]:1: bad argument #2 to 'format' (string expected, got nil)

Did you test on a newer Slurm version?

I have to make checks for NIL before calling slurm.log_user(), like:

  if job_desc.min_mem_per_node ~= nil and job_desc.min_mem_per_node < min_mem_per_node then
    slurm.log_user("Big-memory partition %s requires jobs to specify a memory per node of at least %d MB", job_desc.partition, min_mem_per_node)
    slurm.log_user("Your job requested %s MB", job_desc.min_mem_per_node)
    slurm.log_user("See the Wiki page %s", usage_page)
    return slurm.ESLURM_INVALID_TASK_MEMORY
  end

Can you comment on this handling of NIL values?
With appropriate checks, my job_submit.lua now seems to work as intended :-) Thanks, Ole
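For readers looking for the complete check described in the original question, here is a minimal sketch rather than the script actually deployed. It assumes, as the sbatch output earlier in this ticket shows for Slurm 23.02.7, that job_desc.min_mem_per_node and job_desc.min_mem_per_cpu are nil when the user omitted the option and hold a number in MB otherwise. The partition name "bigmem", the per-CPU limit of 4000 MB and the helper name check_bigmem_request are hypothetical and chosen only for illustration. It also assumes job_desc.partition holds a single partition name.

```lua
-- Illustrative values only: adjust to the site's real partition and limits.
local BIGMEM_PARTITION = "bigmem"   -- hypothetical partition name
local MIN_MEM_PER_NODE = 256000     -- MB, the limit used in the question
local MIN_MEM_PER_CPU  = 4000       -- MB, hypothetical limit

-- Hypothetical helper: returns true if the request is acceptable,
-- otherwise false plus a message for the user.
local function check_bigmem_request(job_desc)
    -- Jobs for other partitions (or with no partition given) are not checked.
    if job_desc.partition ~= BIGMEM_PARTITION then
        return true
    end

    local per_node = job_desc.min_mem_per_node
    local per_cpu  = job_desc.min_mem_per_cpu

    -- Both nil means the user gave neither --mem nor --mem-per-cpu.
    if per_node == nil and per_cpu == nil then
        return false, "jobs must specify --mem or --mem-per-cpu"
    end
    if per_node ~= nil and per_node < MIN_MEM_PER_NODE then
        return false, string.format("--mem must be at least %d MB", MIN_MEM_PER_NODE)
    end
    if per_cpu ~= nil and per_cpu < MIN_MEM_PER_CPU then
        return false, string.format("--mem-per-cpu must be at least %d MB", MIN_MEM_PER_CPU)
    end
    return true
end

-- Entry point called by the job_submit/lua plugin; the global "slurm"
-- table is only available when the script runs inside slurmctld.
function slurm_job_submit(job_desc, part_list, submit_uid)
    local ok, msg = check_bigmem_request(job_desc)
    if not ok then
        -- Only non-nil strings are passed here, so this is safe on Lua 5.1.
        slurm.log_user("Partition %s: %s", BIGMEM_PARTITION, msg)
        return slurm.ESLURM_INVALID_TASK_MEMORY
    end
    return slurm.SUCCESS
end
```

Keeping the decision in a plain function that takes a table means it can be exercised with an ordinary Lua interpreter, without a running slurmctld.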
Note added: We have this version of Lua on CentOS 7.9:

# rpm -q lua
lua-5.1.4-15.el7.x86_64
Hello Ole,

I think it is your Lua version; it is a bit outdated. My tests were performed with Slurm 23.02.7 + Lua 5.3.6. I've compared the behavior of Lua 5.1.4 vs 5.3.6 and I think that we have a culprit:

>> Lua 5.3.6 Copyright (C) 1994-2020 Lua.org, PUC-Rio
>> > local var1 = nil
>> > print(string.format("Variable is %s", var1))
>> Variable is nil

vs

>> Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
>> > local var1 = nil
>> > print(string.format("Variable is %s", var1))
>> stdin:1: bad argument #2 to 'format' (string expected, got nil)
>> stack traceback:
>> [C]: in function 'format'
>> stdin:1: in main chunk
>> [C]: ?
>> > var2 = nil
>> > print(string.format("Variable is %s", var2))
>> stdin:1: bad argument #2 to 'format' (string expected, got nil)
>> stack traceback:
>> [C]: in function 'format'
>> stdin:1: in main chunk
>> [C]: ?

I would either upgrade to at least 5.3.X or adapt your code to check that your variables don't contain nil values before using or formatting them.

Best regards,
Ricard.
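One way to adapt the code for Lua 5.1, sketched here as a suggestion rather than something tested in this ticket, is to convert every argument with tostring() before it reaches string.format(). The helper name fmt_safe is hypothetical and not part of the Slurm Lua API, and it is intended for %s placeholders only.

```lua
-- Hypothetical helper: format a message so that nil arguments print as
-- "nil" on Lua 5.1 as well as on newer versions.
local function fmt_safe(fmt, ...)
    local n = select("#", ...)      -- counts trailing nil arguments too
    local args = {}
    for i = 1, n do
        -- tostring(nil) is "nil" in every Lua version
        args[i] = tostring((select(i, ...)))
    end
    local unpack_fn = table.unpack or unpack   -- Lua 5.2+ vs Lua 5.1
    return string.format(fmt, unpack_fn(args, 1, n))
end

-- Usage inside job_submit.lua (only the finished string is passed on):
--   slurm.log_user("%s", fmt_safe("Min mem per node: %s", job_desc.min_mem_per_node))
```

Passing the finished string through a literal "%s" keeps any percent signs in the message from being interpreted as format directives.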
Hi Ricard,

(In reply to Ricard Zarco Badia from comment #4)
> I think it is your Lua version, it is a bit outdated. My tests were
> performed with Slurm 23.02.7 + Lua 5.3.6. I've compared the behavior of Lua
> 5.1.4 vs 5.3.6 and I think that we have a culprit:
...
> I would either upgrade to 5.3.X at least or just adapt your code to check
> your variables to make sure that they don't contain nil values before
> usage/formatting.

Thanks for confirming that the old Lua 5.1.4 from CentOS 7 is causing this issue. Unfortunately, there is no RPM package available for Lua 5.3.X on CentOS 7. Even if we could find an RPM, installing it might break other things in CentOS.

So on CentOS 7 it seems that we must make extra checks for NIL values in job_submit.lua. When we later move our Slurm controller to an EL8 server, we're going to get Lua 5.3.4 by default, and the NIL values will be handled much better.

I think that the present issue is now well understood. You're welcome to close this case now.

Best regards,
Ole
Perfect, I will close the ticket then. Feel free to open it again if something else relevant to the matter comes up. Best regards, Ricard.