| Summary: | MPI allocation regression with mpirun | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
| Component: | Scheduling | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | CC: | cinek, uemit.seren |
| Version: | 20.11.0 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Stanford | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | Sherlock | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Kilian Cavalotti 2020-12-09 19:02:24 MST
Kilian,
I think you know that's kind of out of scope for us, but could you please check an Open MPI build with the following patch applied to the prrte component:
>./orte/mca/plm/slurm/plm_slurm_module.c
>@@ -267,6 +267,11 @@ static void launch_daemons(int fd, short args, void *cbdata)
> /* start one orted on each node */
> opal_argv_append(&argc, &argv, "--ntasks-per-node=1");
>
>+ /* add all CPUs to this task */
>+ cpus_on_node = getenv("SLURM_CPUS_ON_NODE");
>+ asprintf(&tmp, "--cpus-per-task=%s", cpus_on_node);
>+ opal_argv_append(&argc, &argv, tmp);
>+
> if (!orte_enable_recovery) {
> /* kill the job if any orteds die */
> opal_argv_append(&argc, &argv, "--kill-on-bad-exit");
The above snippet shows the final location of the file in the Open MPI tar.gz (after the prrte subproject's inclusion).
cheers,
Marcin
Hi Marcin,

(In reply to Marcin Stolarek from comment #3)
> I think you know that's kind of out of scope for us, but could you please
> check openmpi build with the following patch applied to prrte component:

Thanks for the suggestion! I opened this bug here because that same Open MPI installation (4.0.3) worked fine with either srun or mpirun in Slurm 20.02, but it doesn't anymore in Slurm 20.11. If a bug were present in Open MPI, it would certainly present itself the same way in Slurm 20.02 and 20.11, right? Since it didn't in 20.02, I'd tend to think that something changed in 20.11 in this regard, and that's what I'm curious about.

Cheers,
--
Kilian

Hi Marcin,

I gave your suggestion a try (adding --cpus-per-task to the srun command that starts the orted daemons on the nodes), and it looks like it works:

$ time mpirun --mca plm_slurm_args '--cpus-per-task=4' bash -c 'printf "%s | CPU: %s (pid: %s)\n" $(hostname) $(ps -h -o psr,pid $$)'
sh03-01n72.int | CPU: 2 (pid: 27828)
sh03-01n72.int | CPU: 1 (pid: 27827)
sh03-01n71.int | CPU: 0 (pid: 17352)
sh03-01n72.int | CPU: 0 (pid: 27826)
sh03-01n71.int | CPU: 1 (pid: 17353)
sh03-01n71.int | CPU: 2 (pid: 17354)
sh03-01n72.int | CPU: 3 (pid: 27829)
sh03-01n71.int | CPU: 3 (pid: 17355)

real    0m46.684s
user    0m0.013s
sys     0m0.040s

But recompiling Open MPI to add this workaround won't really fly with our users, I'm afraid. :\ So I'm still looking for what in 20.11 may have introduced that behavior change.

Thanks!
--
Kilian

Kilian,

The thing is that we changed the default for steps to be '--exclusive'. The patch I shared is not only a workaround, it got merged into prrte[1].

Please take a look at Bug 10383 comment 15 - it covers the complexity of the case.

I'll go ahead and close the bug as duplicate now.

cheers,
Marcin

[1] https://github.com/openpmix/prrte/commit/0288ebbc15c36e1d3c32f6d12c47237053e06101

*** This ticket has been marked as a duplicate of ticket 10383 ***

(In reply to Marcin Stolarek from comment #6)
> Kilian,
>
> The thing is that we changed the default for steps being '--exclusive'.

Ah, so that's coming from a Slurm change, indeed. :) But wow, that's a pretty steep direction reversal. I did find the mention of that change (4eccd2f9e) in the RELEASE_NOTES, but didn't realize the kind of impact it would have. Out of curiosity, what was the rationale for that change of the default behavior?

> The patch I shared is not only a workaround, it got merged to prrte[1]
>
> Please take a look at Bug 10383 comment 15 - it covers the complexity of
> the case.
>
> I'll go ahead and close the bug as duplicate now.

Got it, thanks for pointing all this out!

Cheers,
--
Kilian