Summary: | Setup MPS on multi-GPU nodes | ||
---|---|---|---|
Product: | Slurm | Reporter: | Misha Ahmadian <misha.ahmadian> |
Component: | GPU | Assignee: | Marcin Stolarek <cinek> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | ||
Version: | 20.11.7 | ||
Hardware: | Linux | ||
OS: | Linux | ||
See Also: | https://bugs.schedmd.com/show_bug.cgi?id=7834 | ||
Site: | TTU | ||
Description
Misha Ahmadian
2021-06-23 12:15:09 MDT
Misha,

I found an older bug discussing a very similar, if not exactly the same, topic: Bug 7834. Could you please take a look at it? Let me know if it makes things clear for you.

cheers,
Marcin

Did you find the answers you were looking for in the referenced bug? Is there anything else I can help you with?

cheers,
Marcin

Hi Marcin,

Sorry for the delay in my response, and thanks for your quick reply. I think Bug 7834 makes almost everything clear to me. Actually, I'm not impressed with the way Nvidia handles MPS, and it seems to me that it could leave users confused when they submit their jobs.

I'd also suggest looking into the Multi-Instance GPU (MIG) feature that comes with the A100. I think that could be a great option to implement in Slurm.

Best Regards,
Misha

> I'd also suggest looking into the Multi-Instance GPU (MIG) feature that comes with the A100. I think that could be a great option to implement in Slurm.

You may add yourself to CC in Bug 10970, which is public and deals with the enhancement request from NVIDIA.

> I think Bug 7834 makes almost everything clear to me.

I'll take it as confirmation that I can close the ticket. If you need any help here (to remove 'almost' from the sentence above), please reopen.

cheers,
Marcin