Summary: | Slurm with Open MPI | |
---|---|---|---
Product: | Slurm | Reporter: | Hadrian <hxd58>
Component: | Scheduling | Assignee: | Tim Wickberg <tim>
Status: | RESOLVED TIMEDOUT | QA Contact: |
Severity: | 3 - Medium Impact | Priority: | ---
Version: | 17.02.2 | Hardware: | Linux
OS: | Linux | Site: | Case
Description
Hadrian 2017-05-11 09:59:02 MDT

Hi,

We have a couple of questions about using Open MPI with Slurm 17.02.2.

1) We ran "rpmbuild -ta --with pmix /usr/local/src/slurm/slurm-17.02.2.tar.bz2" hoping to enable the pmix plugin, but it does not show up as a plugin in the installed Slurm, and "srun --mpi=pmix" reports "Couldn't find the specified plugin name for mpi/pmix".

2) We have an issue running Rmpi (an R package) under Slurm. This is a rather long story, so hopefully you can follow it and suggest a solution. The R script we are trying to run (slurm_test.r) is quite simple: it has a loop with three iterations, and each iteration calls a function (foo) that applies a certain operation using threads by creating and stopping a "cluster".
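A quick diagnostic for 1), as a sketch: srun can list the MPI plugin types it actually finds, so a missing pmix entry means the plugin never made it into the installation.

```sh
# List the MPI plugin types available to this Slurm installation;
# "pmix" should appear in the output once the plugin is built and
# installed.
srun --mpi=list
```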
Comment 1 Tim Wickberg

(In reply to Hadrian from comment #0)
> 1) We ran "rpmbuild -ta --with pmix ..." hoping to enable the pmix plugin,
> but it does not show up as a plugin in the installed Slurm.

Do you have the PMIx development headers available on the system, and in a location that would be detected automatically? I believe you can pass a location in as an argument there to add that to the search path.

There should be a warning in the rpmbuild logs indicating that it could not find PMIx, and is not building those modules.
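A sketch of that rebuild, assuming PMIx is installed under /opt/pmix (a placeholder prefix) and that the bundled slurm.spec accepts extra configure options through the _with_pmix macro; verify both against your local setup:

```sh
# Rebuild the Slurm RPMs, pointing configure at the PMIx install
# tree. /opt/pmix is a placeholder: use the prefix that actually
# contains include/pmix.h and the PMIx libraries.
rpmbuild -ta --define '_with_pmix --with-pmix=/opt/pmix' \
    /usr/local/src/slurm/slurm-17.02.2.tar.bz2
```

After installing the rebuilt RPMs, "srun --mpi=list" should include pmix.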
> 2) We have an issue running Rmpi (an R package) under Slurm. [...]

R integration falls outside the scope of what we support; it's unclear to me where the problem is here, but I do not believe it's within Slurm.

My best guess is that the MPI it is using under the covers is not being made aware of the resources that have been assigned to the job. Exactly how to ensure that the MPI stack knows about the additional CPUs and nodes available is something you'll need to investigate further.

- Tim
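One way to approach that, as a sketch only (the pmix plugin must be working first, and the resource counts below are placeholders, not values from this report): launch the R script through srun so that Open MPI is handed the full allocation via PMIx instead of discovering only the local node. Whether Rmpi's cluster creation then picks up all workers depends on the script itself.

```sh
#!/bin/bash
#SBATCH --nodes=2            # placeholder sizes; match the real job
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1

# Start the R script under srun so the MPI stack underneath Rmpi
# learns about every assigned node and CPU through PMIx, rather
# than seeing only the submission node.
srun --mpi=pmix Rscript slurm_test.r
```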
Comment 2 Tim Wickberg

Hey Hadrian -

Were you able to get things up and running correctly? I was expecting an answer to my questions from comment 1; if you've sorted this out, please let me know if I can close this out.

- Tim

Comment 3 Tim Wickberg

(In reply to Tim Wickberg from comment #1)

Marking this closed as 'timedout'. Please reopen if you'd like to continue to discuss this, and please respond to the questions from comment 1.

- Tim