Hi folks,

[Couldn't tell if this should be configuration, scheduling, slurmd or slurmctld, sorry!]

Currently it appears that GRES are allocated per node, meaning that a job that does:

#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --tasks-per-node=2
#SBATCH -c 4
#SBATCH --gres=gpu:2

srun --gres=gpu:1 nvidia-smi -L

results in the job being allocated 2 GPUs, but the individual tasks then end up being allocated the same GPU rather than each getting a different one.

$ cat slurm-51744.out
GPU 0: Tesla P100-PCIE-12GB (UUID: GPU-74d38c15-d0b4-c3e5-c825-8db93c583c01)
GPU 0: Tesla P100-PCIE-12GB (UUID: GPU-74d38c15-d0b4-c3e5-c825-8db93c583c01)
$ uniq -c slurm-51744.out
      2 GPU 0: Tesla P100-PCIE-12GB (UUID: GPU-74d38c15-d0b4-c3e5-c825-8db93c583c01)

It would be nice if there were a --gres-per-task option, in the same way that there is a --cpus-per-task option, so that codes which would benefit from each task having a dedicated GPU could take advantage of it. As a side issue, it'd be nice to be able to request memory per task too, for symmetry. :-)

All the best,
Chris
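In the meantime, a common interim workaround (not from this thread, and the wrapper shown is hypothetical) is a small per-task wrapper that masks CUDA_VISIBLE_DEVICES using the node-local task rank that srun exports as SLURM_LOCALID, so each task only sees "its" GPU:

```shell
#!/bin/bash
# Hypothetical per-task wrapper sketch: bind each task on a node to one
# GPU by masking CUDA_VISIBLE_DEVICES with the task's node-local rank.
# SLURM_LOCALID is set by srun for each spawned task; we default it to 0
# here so the sketch also runs outside of Slurm.
localid=${SLURM_LOCALID:-0}
export CUDA_VISIBLE_DEVICES=$localid
echo "task $localid sees CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"

# Hand off to the real task command, e.g.:
#   srun --gres=gpu:2 ./wrapper.sh nvidia-smi -L
exec "$@"
```

This assumes the GRES device files on the node are indexed 0..N-1 in the same order as SLURM_LOCALID, which holds for the simple one-task-per-GPU layout above but is not guaranteed in general.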
Chris -

Retagging as an enhancement request. I believe we do have some similar work in progress for a future release, but can't quite comment on that at the moment.

- Tim
On 26/03/18 16:53, bugs@schedmd.com wrote:
> Retagging as an enhancement request.

I selected that, I was sure I had! Need more caffeine..

> I believe we do have some similar work in progress for a future
> release, but can't quite comment on that at the moment.

Not a problem, thanks for letting me know. Or not. :-)

cheers,
Chris
(In reply to Christopher Samuel from comment #2)
> On 26/03/18 16:53, bugs@schedmd.com wrote:
> > Retagging as an enhancement request.
>
> I selected that, I was sure I had! Need more caffeine..

... I need to document it somewhere, but I punt Sev5 requests by customers to Sev4 to make sure they get some initial triage before being relegated to an enhancement request.

> > I believe we do have some similar work in progress for a future
> > release, but can't quite comment on that at the moment.
>
> Not a problem, thanks for letting me know. Or not. :-)

I'll try to update this as more details are available. Expect some news on this front by SLUG'18 at the latest. :)
This issue is critical to our adoption of Slurm. Tasks should never be given colliding GPU assignments. We run our GPUs in Exclusive Process compute mode to allow proper use of GPU memory, so GPU collisions cause jobs to fail outright.

More generally, tasks should run wherever there are resources: spread out across nodes if necessary, or sharing nodes if that works better. Frankly, the idea of per-node resource requests is completely unhelpful to us. We have a variety of GPU servers, some with 2, 4, or even 8 GPUs, and we need to be able to schedule jobs that take maximum advantage of heterogeneous hardware. Splitting the work into small tasks, which the scheduler can fit onto a smaller server, or fit several of onto a larger server, is a core part of our workflow.
This functionality is in Slurm version 19.05. Its official release date is in May. There should be a new pre-release (unsupported) next week.
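For reference, with the 19.05 per-task GPU options the original job could be written roughly as follows (a sketch; check the sbatch man page for your build, since option availability depends on the release you run):

```shell
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --tasks-per-node=2
#SBATCH -c 4
#SBATCH --gpus-per-task=1

# Each task should now be bound to its own dedicated GPU,
# so the two tasks print two different UUIDs.
srun nvidia-smi -L
```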
Thanks Moe, much appreciated. I've been keeping an eye on master. :-)