Hi, NVIDIA GPUs support the following compute modes, which are set using the nvidia-smi -c X switch, where X is one of:
0 - Default: shared mode, available to multiple processes
1 - Exclusive Thread: only one host thread is allowed to access the GPU for compute
2 - Prohibited: no host threads or processes are allowed to access the GPU for compute
3 - Exclusive Process: only one host process is allowed to access the GPU for compute
On our system the default compute mode has been set for each device to (3) exclusive process. How could we let a user change this access type from Slurm? It requires sudo access; under PBS, a job ran a pre-launch script to set the mode the user requested. Thanks, Josh.
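As an illustration, a prolog helper could translate the mode names above into the numeric values nvidia-smi expects. This is only a sketch: the function name is my own, and actually changing the mode requires root and an NVIDIA driver, so the last line only echoes the command rather than executing it.

```shell
#!/bin/sh
# Map a compute-mode name to the numeric value expected by nvidia-smi -c.
# The names mirror the modes listed above.
mode_number() {
    case "$1" in
        default)           echo 0 ;;
        exclusive_thread)  echo 1 ;;
        prohibited)        echo 2 ;;
        exclusive_process) echo 3 ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the command that would set GPU 0 to exclusive process.
echo "nvidia-smi -i 0 -c $(mode_number exclusive_process)"
```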
Slurm supports a large number of prolog and epilog options. For details see: http://slurm.schedmd.com/prolog_epilog.html
The environment variables are documented in the slurm.conf man page (see the section labeled "Prolog and Epilog Scripts"). They should probably be added to the web page too, but are not there today. See: http://slurm.schedmd.com/slurm.conf.html
Ideally the user's job would specify the GPU mode via a "--constraint" option, which would be passed to the script as an environment variable. I'm not sure we'll be able to add that before the version 15.08 release, but the prolog script could get the job's constraint information today using the squeue or scontrol command; it's just not as scalable as if Slurm passed the job's constraint information directly to the prolog in the launch RPC.
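Until the constraint is passed in directly, a prolog could parse it out of scontrol output as suggested. A rough sketch (the exact field layout of scontrol output, and the helper name, are assumptions here):

```shell
#!/bin/sh
# Extract the Features= (constraint) field from `scontrol show job` output.
# In a real prolog this would be fed by:  scontrol show job "$SLURM_JOB_ID"
get_constraint() {
    # Split space-separated key=value fields onto lines, keep Features=
    tr ' ' '\n' | sed -n 's/^Features=//p'
}

# Demonstration with a sample line of scontrol output:
sample='JobId=26 JobName=hostname Features=gpu_ex Gres=gpu:2'
echo "$sample" | get_constraint
```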
Hi Moe, thanks for the information. We currently have a script that could be used as a prolog target. It ran in the prolog on our previous PBS Torque system and used the PBS_GPUFILE variable to determine which GPUs had been allocated on each node. How would I determine which GPU has been allocated by Slurm? Is CUDA_VISIBLE_DEVICES set for each allocated node before the prolog script runs? And if so, how do I access this information? Thanks again, Josh.
Prolog does not currently get the CUDA env var, but the info is available and it could be set.
So, how could I get that information from within my prolog script then? It is currently a bash script.
We'd have to modify the Slurm code to pass the prolog the same CUDA env vars as the app.
Is that possible? Will it require recompiling Slurm? And will our system admins be able to update the system easily?
It would definitely require rebuilding Slurm, which is simple. At this point I'm not sure how difficult the code changes would be. We'll get back to you later in the week on that.
Ok, I have warned our system admins. Happy to hear of any potential solution to the problem. I am surprised it has not been encountered previously. Thanks, Josh.
And just to make clear how the prolog script worked: I would need a list of allocated nodes (host names preferably) and a list of GPU devices, e.g. something like:
<hostname>,<GPUNUM>,<GPUNUM>,...
n002,1,2
n034,0,1
n035,0,2
etc.
The script then ssh's onto each node and uses nvidia-smi to set the access mode. I also need a variable containing the access mode the user requested, for example:
--gres=gpu:2:exclusive_process
or
--gres=gpu:2:normal
or
--gres=gpu:2:exclusive_thread
So I'd need access to the requested access mode. Is all of that possible? Thanks, Josh
Wouldn't it be better to just run directly on each compute node rather than using ssh from the head node? See the prolog/epilog web page previously cited.
nvidia-smi requires root privileges to change the access mode, so I would be running as SlurmdUser from the compute or front-end node (i.e. the first row of the first table on this page: http://slurm.schedmd.com/prolog_epilog.html). I would then have to either:
A) ssh to any other allocated nodes and run nvidia-smi, or
B) use 'srun' as SlurmdUser/root to run nvidia-smi on the allocated nodes (if this is possible).
I would have to ensure that PrologFlags=Alloc is not set, as I could otherwise be modifying the GPU access mode while previous jobs are still running.
The prolog on each compute node runs as root before the app gets launched. I was thinking it could be passed the allocated GPUs on that node using a CUDA env var, which we already have to launch the app.
Well, if that is the case, then that is exactly what is needed, and it will simplify the required prolog script.
This will be addressed to the extent possible for now in version 14.11.5. Specifically, the Prolog (run as root on each allocated compute node) will now have the environment variable SLURM_JOB_GPUS passed to it. This will be the same as CUDA_VISIBLE_DEVICES on that specific compute node unless the device is bound to the job using cgroups (in that case, SLURM_JOB_GPUS will be the global GPU index, while CUDA_VISIBLE_DEVICES always starts at 0).

The changes are here if you are anxious to try this:
https://github.com/SchedMD/slurm/commit/2e95c20b3bf9bcddd9b0fe0048e222fb8306c90b

Note that this will not work if you have PrologFlags=alloc in slurm.conf (the necessary information is not available for that to work). Also note that you will need to use the squeue or scontrol command to get a job's "constraint" specification (i.e. Exclusive Thread, Prohibited, Exclusive Process, or Default). Fixing those two things will need to wait until Slurm version 15.08, as changes to the RPCs are required.
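With SLURM_JOB_GPUS available in the prolog environment, the per-node script reduces to a simple loop over the listed indices. A minimal sketch, with the helper name my own invention; nvidia-smi commands are echoed rather than executed, since a real prolog would run them as root on real hardware:

```shell
#!/bin/sh
# Apply a compute mode to each GPU in a comma-separated index string of the
# form SLURM_JOB_GPUS provides (e.g. "0,1"). Commands are echoed (dry run);
# a real prolog, running as root, would execute nvidia-smi directly.
set_gpu_modes() {
    gpus="$1"   # e.g. "0,1"
    mode="$2"   # e.g. 3 for exclusive process
    for gpu in $(echo "$gpus" | tr ',' ' '); do
        echo "nvidia-smi -i $gpu -c $mode"
    done
}

set_gpu_modes "${SLURM_JOB_GPUS:-0,1}" 3
```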
(In reply to Moe Jette from comment #14)
> Note that this (previous patch) will not work if you have in slurm.conf
> PrologFlags=alloc (the necessary information is not available for that
> to work).
>
> Also note, that you will need to use the squeue or scontrol command to get a
> job's "constraint" specification (i.e. Exclusive Thread, Prohibited,
> Exclusive Process, or Default).
>
> Fixing those two things will need to wait until Slurm version 15.08 as
> changes to the RPCs are required.

Here are the fixes to the two problems described above. The changes are fairly extensive and several RPCs changed, so they will not be released until Slurm version 15.08, but you should be able to work around the shortcomings as described above.
https://github.com/SchedMD/slurm/commit/6966f77ee747685d4858dc80b8ea35f61872ee72
https://github.com/SchedMD/slurm/commit/06db2ded72ae192bb55f74f10cdb51610b14a8fb

You will have to define node features as desired in slurm.conf, something like this:
NodeName=tux[0-123] Features=gpu_ex,gpu_pro,gpu_ex_pro,gpu_def ...

The user can then specify the desired behaviour, something like this:
sbatch -C gpu_ex --gres=gpu:2 ...

The prolog in Slurm v15.08 will see these env vars:
SLURM_JOB_GPUS=0,1
SLURM_JOB_CONSTRAINTS=gpu_ex

In v14.11, with the previously mentioned commit, you will see:
SLURM_JOB_GPUS=0,1
and can use scontrol or squeue to get the constraints, like this:
$ scontrol show job $SLURM_JOB_ID
JobId=26 JobName=hostname
...
Features=gpu_ex Gres=gpu:2 Reservation=(null)

Please re-open if you encounter more difficulties.
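Putting the pieces together, a v15.08-style prolog could map the feature names above to compute modes and apply them to the allocated GPUs. The mapping of gpu_ex to exclusive thread (and so on) is my assumption about what those names would mean, since the thread does not pin it down; the nvidia-smi calls are echoed so the logic can be exercised without root or a GPU:

```shell
#!/bin/sh
# Map a SLURM_JOB_CONSTRAINTS value, using the feature names suggested
# above, to an nvidia-smi compute mode; then emit one command per GPU in
# SLURM_JOB_GPUS. Note the case ordering: gpu_ex_pro must be tested before
# gpu_pro and gpu_ex, since those patterns would also match it.
constraint_to_mode() {
    case "$1" in
        *gpu_ex_pro*) echo 3 ;;  # exclusive process
        *gpu_pro*)    echo 2 ;;  # prohibited
        *gpu_ex*)     echo 1 ;;  # exclusive thread (assumed meaning)
        *)            echo 0 ;;  # default
    esac
}

apply_modes() {
    mode=$(constraint_to_mode "$1")
    for gpu in $(echo "$2" | tr ',' ' '); do
        echo "nvidia-smi -i $gpu -c $mode"
    done
}

apply_modes "${SLURM_JOB_CONSTRAINTS:-gpu_ex}" "${SLURM_JOB_GPUS:-0,1}"
```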
Hi Moe, that is great news. I have passed this on to our systems people, so I hope they can have it implemented in the near future. Regards, Josh.