Bug 1634

Summary: Trying to understand what sbatch does
Product: Slurm
Component: Other
Version: 14.11.0
Hardware: Linux
OS: Linux
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Priority: ---
Reporter: Marios Hadjieleftheriou <marioh>
Assignee: David Bigagli <david>
CC: bsantos
URL: https://wp.me/dk5d2
Site: Lion Cave Capital

Description Marios Hadjieleftheriou 2015-05-01 06:26:25 MDT
I have this simple sbatch script:
1 #!/bin/bash
2 #SBATCH --time=0
3 #SBATCH -n 23
4 srun bash -c "sleep 1; hostname"

When I run it, I expect it to request 23 cores, but then run a single job and simply print a hostname. Nevertheless, when I execute it, srun launches the command 23 times.

Actually, I have a simple problem that I cannot solve. I have 1000 individual tasks (one core per task) that I am trying to run. They can all run in parallel. There is no interaction between tasks. Currently, I launch the jobs as 1000 individual sruns. Is there a better way of doing this, so that the jobs execute as a group, with a single submission to slurm, so that the whole batch allocation happens at once? I am trying to avoid having such batch submissions from multiple users interleave with each other.
Comment 1 David Bigagli 2015-05-01 06:47:57 MDT
Hi,
   the -n,--ntasks option makes sbatch request an allocation for the number of requested tasks, in your case 23, and then srun launches those 23 tasks.
See here: http://slurm.schedmd.com/sbatch.html

For your problem you can run one sbatch requesting -n 1000 tasks and then run
the tasks. If there is a different binary for different tasks you can use
the --multi-prog srun option; see here: http://slurm.schedmd.com/srun.html.
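
For example, a rough sketch (untested; the file names and programs are placeholders, and %t in the config is replaced by the task number):

batch.sh:
#!/bin/bash
#SBATCH -n 1000                 # one allocation for all 1000 tasks
srun --multi-prog tasks.conf

tasks.conf:
0       ./prog_a input_a
1       ./prog_b input_b
2-999   ./prog_c input_%t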

David
Comment 2 Moe Jette 2015-05-01 07:06:27 MDT
Something else Slurm can do for you is manage the resources in your allocation to run a bunch of job steps. For example:

#!/bin/bash
#SBATCH --time=100            # You probably don't want 0
#SBATCH -n 4                  # 4 CPUs
srun -n1 --exclusive app1 &   # 8 serial applications
srun -n1 --exclusive app2 &
srun -n1 --exclusive app3 &
srun -n1 --exclusive app4 &
srun -n1 --exclusive app5 &
srun -n1 --exclusive app6 &
srun -n1 --exclusive app7 &
srun -n1 --exclusive app8 &
wait

That will give you 4 CPUs to start with and then run the 8 apps (each on its own CPU), with new applications being allocated a CPU as one becomes available.
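
The same approach scales to many more applications than CPUs with a loop (rough sketch; "./work" is a placeholder):

#!/bin/bash
#SBATCH --time=100
#SBATCH -n 4                             # 4 CPUs
for i in $(seq 1 1000); do
    srun -n1 --exclusive ./work "$i" &   # each step waits for a free CPU in the allocation
done
wait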
Comment 3 Marios Hadjieleftheriou 2015-05-01 07:36:26 MDT
This is great.

I am trying multi-prog. Here is my script:
0 bash -c 'hostname'
1 bash -c 'hostname'
2 bash -c 'hostname'
3 bash -c 'hostname'
4 bash -c 'hostname'

Then I run:                                      
srun -v --multi-prog multi --mem 30000 my_script.

I expect slurm to allocate each job to a different node, because each node has 48GB. Nevertheless, they all run on the same node. How can I specify a memory limit per job?

Also, I tried setting the memory limit to a huge value and srun ran without complaints. But if I do:
srun --mem 49000 hostname
I get:
srun: error: Memory specification can not be satisfied
srun: error: Unable to allocate resources: Requested node configuration is not available

So it seems that --multi-prog ignores --mem.
Comment 4 David Bigagli 2015-05-01 08:52:54 MDT
If you want to distribute the tasks across hosts, use the --ntasks-per-node option.
This tells the scheduler how you want to lay out the tasks.

For example:

sbatch -o here -n 2 --ntasks-per-node=1 ./conf
cat conf
#!/bin/sh
srun --multi-prog uu.conf
cat uu.conf 
0 bash -c 'hostname'
1 bash -c 'hostname'

The --mem option indicates the memory usage per task.

David
Comment 5 Marios Hadjieleftheriou 2015-05-01 11:20:54 MDT
I am not trying to distribute them manually. I want slurm to distribute them. I want to run a multi-prog srun with 1000 jobs. Every job has exactly the same memory constraint (20GBs).

If I run 1000 individual sruns --mem 20GB X, then slurm knows that each one needs 20GB and it will distribute two jobs per 48GB node (we have accounting enabled). This works great for us, but we do need to submit 1000 sruns which seems unnecessary since they all belong to the same task.

I was expecting the same behavior from srun --mem 20GB --multi-prog, but instead srun now runs 12 jobs on each node, completely ignoring the memory constraint. In fact, --multi-prog seems to ignore all other arguments to srun.

I don't think I should be using sbatch. I have different binaries that I need to run. They all run in parallel, one core per binary. srun multi-prog seems the natural choice for that.
Comment 6 Moe Jette 2015-05-01 12:00:47 MDT
(In reply to Marios Hadjieleftheriou from comment #5)
> I am not trying to distribute them manually. I want slurm to distribute
> them. I want to run a multi-prog srun with 1000 jobs. Every job has exactly
> the same memory constraint (20GBs).
> 
> If I run 1000 individual sruns --mem 20GB X, then slurm knows that each one
> needs 20GB and it will distribute two jobs per 48GB node (we have accounting
> enabled). This works great for us, but we do need to submit 1000 sruns which
> seems unnecessary since they all belong to the same task.
> 
> I was expecting the same behavior from srun --mem 20GB --multi-prog, but
> instead srun now runs 12 jobs on each node, completely ignoring the memory
> constraint. In fact, --multi-prog seems to ignore all other arguments to
> srun.

the --mem=20gb option means that you want 20gb per node, regardless of the task count. See --mem-per-cpu.

What exactly is your execute line (for srun, and also for salloc or sbatch if run within an existing allocation)?

> I don't think I should be using sbatch. I have different binaries that I
> need to run. They all run in parallel, one core per binary. srun multi-prog
> seems the natural choice for that.

Sure, but sbatch or salloc can be used to submit the job, and then srun --multi-prog can be used to launch the different tasks.
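
For example, interactively with salloc (rough sketch; tasks.conf is a placeholder):

salloc -n 1000                  # one job allocation for everything
srun --multi-prog tasks.conf    # launch the different tasks inside it
exit                            # release the allocation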
Comment 7 Marios Hadjieleftheriou 2015-05-01 12:10:55 MDT
(In reply to Moe Jette from comment #6)
> (In reply to Marios Hadjieleftheriou from comment #5)
> > I am not trying to distribute them manually. I want slurm to distribute
> > them. I want to run a multi-prog srun with 1000 jobs. Every job has exactly
> > the same memory constraint (20GBs).
> > 
> > If I run 1000 individual sruns --mem 20GB X, then slurm knows that each one
> > needs 20GB and it will distribute two jobs per 48GB node (we have accounting
> > enabled). This works great for us, but we do need to submit 1000 sruns which
> > seems unnecessary since they all belong to the same task.
> > 
> > I was expecting the same behavior from srun --mem 20GB --multi-prog, but
> > instead srun now runs 12 jobs on each node, completely ignoring the memory
> > constraint. In fact, --multi-prog seems to ignore all other arguments to
> > srun.
> 
> the --mem=20gb option means that you want 20gb per node, regardless of the
> task count. See --mem-per-cpu.

Even if that's the case, as I mentioned earlier, I can run "srun --mem 1billion GBs --multi-prog cmnd" and it will run fine. But if I do "srun --mem 1billion GBs cmnd" then it complains.

I was under the impression that with multi-prog, each line in the multi-prog config is its own independent task. Otherwise multi-prog is the same as sbatch, right?

> 
> What exactly is your execute line (for srun, and also for salloc or sbatch if run
> within an existing allocation)?

Our sruns look like this:
srun -J blah --mem 20GB work -X -Y -Z date1
srun -J blah --mem 20GB work -X -Y -Z date2
srun -J blah --mem 20GB work -X -Y -Z date3
srun -J blah --mem 20GB work -X -Y -Z date4
...

This works great, but I was hoping that I could do the same and have only one job named "blah" when I run squeue. Also, I hate having to suddenly send 1000 sruns to the controller. It seems completely unnecessary. We also have to manage 1000 active sruns on the client side, which is also not great.
Comment 8 David Bigagli 2015-05-04 05:53:42 MDT
Do you specify the memory requirement after the --multi-prog option?
Whatever is specified after the configuration file is assumed to be an argument to the commands themselves.

srun -n 4 --mem=50GB  --multi-prog multi.conf 
srun: error: Memory specification can not be satisfied
srun: Force Terminated job 375
srun: error: Unable to allocate resources: Requested node configuration is not available

versus

srun -n 4 --multi-prog multi.conf --mem=50GB 

which runs even if I don't have 50GB.

Multi-prog is indeed similar to sbatch; in both cases you have one jobid in the
squeue output.

David
Comment 9 Marios Hadjieleftheriou 2015-05-04 07:21:09 MDT
Let me try this a different way.

I have servers with 48GB of memory and I have this set of sruns that I run:
#!/bin/bash
srun -J X --mem=20GB hostname &
srun -J X --mem=20GB hostname &
srun -J X --mem=20GB hostname &
srun -J X --mem=20GB hostname &
wait

Output:
s1
s1
s2
s2

All run in parallel, two on each machine, given the memory constraints of each job (or job step rather, since all of these belong to the same job from our point of view).

I would like a single srun or sbatch command that is equivalent to what happens above. 

Here is what I have tried so far.

srun --mem=20GB --multi-prog multi
multi:
0 hostname
1 hostname
2 hostname
3 hostname

Output:
s1
s1
s1
s1

The memory constraint clearly applies to the srun itself, not each individual job of multi-prog.

sbatch batch
#!/bin/bash
#SBATCH --time=100
#SBATCH -n 4
srun --mem=20GB hostname &
srun --mem=20GB hostname &
srun --mem=20GB hostname &
srun --mem=20GB hostname &
wait

Here I see the batch script in squeue. It is assigned to host s1. I run top on host s1 and see four "srun --mem=20GB hostname" commands running. The commands run indefinitely, never producing any output. I actually have to manually cancel the batch script.
Comment 10 David Bigagli 2015-05-04 08:54:03 MDT
If you want to apply the limit to each task in an allocation, use the
--mem-per-cpu option.

David
Comment 11 Marios Hadjieleftheriou 2015-05-05 05:56:06 MDT
Ok, so now I tried this batch file:
#!/bin/bash
#SBATCH --time=100            # You probably don't want 0
#SBATCH -n 4                  # 4 CPUs
#SBATCH --mem-per-cpu 20GB
srun hostname &
srun hostname &
srun hostname & 
srun hostname &
wait

Here is the output:
lccs8
lccs10
lccs9
lccs9
srun: Job step creation temporarily disabled, retrying
srun: Job step creation temporarily disabled, retrying
lccs8
lccs10
lccs9
lccs9
srun: Job step created
lccs8
lccs10
lccs9
lccs9
srun: Job step created
lccs8
lccs10
lccs9
lccs9

I was expecting to see only 4 hostname invocations, but clearly each invocation is called 4 times, for a total of 16.

I would greatly appreciate it if you could provide something that replicates the scenario I mentioned in my last post:

I have servers with 48GB of memory and a set of sruns that I run:
myscript.sh:
#!/bin/bash
srun -J X --mem=20GB hostname &
srun -J X --mem=20GB hostname &
srun -J X --mem=20GB hostname &
srun -J X --mem=20GB hostname &
wait

> ./myscript.sh
s1
s1
s2
s2
>

It is a very simple use case. There must be a way to do this without having to run individual sruns (because in our case we have 1000s of sruns per script).
Comment 12 Marios Hadjieleftheriou 2015-05-05 06:01:26 MDT
Let me also reiterate that each srun can execute a different binary, or have different arguments. But the only two things they always share are the job name (-J X) and the memory limit (--mem 20GB).
Comment 13 David Bigagli 2015-05-05 06:23:29 MDT
This will do what you want:

srun -n 4 --mem-per-cpu=20GB -J x --multi-prog multi.conf 

I have 2 hosts with 32GB each and I want to run 2 tasks per host, like you.

$ cat multi.conf 
0 hostname
1 hostname
2 hostname
3 hostname

srun -n 4 --mem-per-cpu=15GB --multi-prog multi.conf 
prometeo
prometeo
dario
dario

Your script ran hostname 16 times because every srun inherited the allocation's 4 tasks, so each one launched 4 copies.

David
Comment 14 Marios Hadjieleftheriou 2015-05-05 07:20:33 MDT
Thank you!

So, here is the next hurdle.

I try your example using 2000 commands inside the multi-config file, like this:
srun -n 1000 --mem-per-cpu 20GB --multi-prog multi 
srun: error: Invalid task id, 1000 >= ntasks
srun: error: Line 1001 of configuration file multi invalid

What I need is to execute at most 1000 at a time. But I have 2000 of them. In fact, I have more than the total number of cores available in the system.

Notice that my approach of submitting 2000 individual sruns will work fine in this case. I will instantly get all the cores filled, and extra sruns will be queued, waiting.
Comment 15 David Bigagli 2015-05-05 07:53:56 MDT
The numbering in the --multi-prog file starts from 0. To run with -n 4, these are the entries in the file:

cat multi.conf 
0 hostname
1 hostname
2 hostname
3 hostname

I think that your method of submitting individual sruns is just fine. You cannot ask for more tasks on the srun command line than you have cores, so you would have to submit the sruns
in groups of at most the number of cores.
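
A rough sketch of that (untested; assumes a file "all_commands" listing one command per line, without task numbers):

split -l 1000 -d all_commands chunk.                          # at most 1000 tasks per chunk
for f in chunk.*; do
    awk '{ printf "%d %s\n", NR-1, $0 }' "$f" > "$f.conf"     # number the tasks from 0
    srun -n "$(wc -l < "$f.conf")" --mem-per-cpu=20GB -J x --multi-prog "$f.conf"
done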

David
Comment 16 Marios Hadjieleftheriou 2015-05-05 08:02:50 MDT
Great! At least we know that we are not doing something totally silly. Feel free to close the request.
Comment 17 David Bigagli 2015-05-05 08:07:10 MDT
Ok. Let us know if you have any other questions.

David
Comment 18 Marios Hadjieleftheriou 2015-05-08 03:43:44 MDT
So, today a user tried to start a job that resulted in 10000 concurrent sruns. The problem is that after some time queueing sruns he got this error:
srun: error: Error binding slurm stream socket: Address already in use

Also, most sruns eventually got terminated, but we still ended up with 112 zombie sruns. They appear in the queue occupying resources, but there is no job running on the designated server.

This would not be an issue if we could run all these sruns by using a single srun or sbatch invocation, as long as we could replicate the behavior of 10000 distinct sruns.

Should I file this as a feature request? Is there another way around it?
Comment 19 Moe Jette 2015-05-08 04:04:04 MDT
(In reply to Marios Hadjieleftheriou from comment #18)
> So, today a user tried to start a job that resulted in 10000 concurrent
> sruns. The problem is that after some time queueing sruns he got this error:
> srun: error: Error binding slurm stream socket: Address already in use

Are these sruns within a job allocation created by salloc or sbatch, rather than each one being a separate job allocation?


> Also, most sruns eventually got terminated, but we still ended up with 112
> zombie sruns. They appear in the queue occupying resources, but there is no
> job running on the designated server.

What about the mechanism listed in comment #2?


> This would not be an issue if we could run all these sruns by using a single
> srun or sbatch invocation, as long as we could replicate the behavior of
> 10000 distinct sruns.
> 
> Should I file this as a feature request?

If you want them all to run independently, then they need to all run as separate srun commands, either as job steps within an existing job (see comment #2) or as separate jobs. That is exactly what srun is designed to do. You might also look into job arrays, which are a lightweight mechanism to manage a large number of similar jobs (e.g. "sbatch --array=1-10000 ...").
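
For example (rough sketch; "array.sh" and "./work" are placeholders):

$ cat array.sh
#!/bin/bash
#SBATCH -J X
#SBATCH -n 1
#SBATCH --mem=20GB
./work -X -Y -Z "date${SLURM_ARRAY_TASK_ID}"

$ sbatch --array=1-10000 array.sh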


>  Is there another way around it?

The "Address already in use" error is likely due to exhausting your Linux pool of sockets. There is a description of how to configure your system for high-throughput computing here:
http://slurm.schedmd.com/high_throughput.html
Make particular note of the section "System configuration".
Comment 20 Marios Hadjieleftheriou 2015-05-08 06:31:36 MDT
(In reply to Moe Jette from comment #19)
> (In reply to Marios Hadjieleftheriou from comment #18)
> > So, today a user tried to start a job that resulted in 10000 concurrent
> > sruns. The problem is that after some time queueing sruns he got this error:
> > srun: error: Error binding slurm stream socket: Address already in use
> 
> Are these srun's within a job allocation created by salloc or sbatch rather
> than each being a separate job allocation?

Each srun is a separate job allocation, but they all share the same job name.

> 
> 
> > Also, most sruns eventually got terminated, but we still ended up with 112
> > zombie sruns. They appear in the queue occupying resources, but there is no
> > job running on the designated server.
> 
> What about the mechanism listed in comment #2?

Comments 9 to 15 explain why none of the methods offered in this thread (other than running thousands of independent sruns) covers our use case.

> 
> 
> > This would not be an issue if we could run all these sruns by using a single
> srun or sbatch invocation, as long as we could replicate the behavior of
> > 10000 distinct sruns.
> > 
> > Should I file this as a feature request?
> 
> If you want them all to run independently, then they need to all run as
> separate srun commands, either as job steps within an existing job (see
> comment #2) or as separate jobs. That is exactly what srun is designed to
> do. You might also look into job arrays, which are a lightweight mechanism
> to manage a large number of similar jobs (e.g. "sbatch --array=1-10000 ...").

If you can give me a way to replicate exactly what is described in comments 9 and 14, I would appreciate it. But my take-away from this thread was that slurm does not directly support our use case. The issue is that we want to queue a lot more steps than physical cores, and have each step start executing the moment a core is available, or wait in the queue otherwise; in other words, if my job has 10000 steps and needs 10000 cores, I don't want it to wait until 10000 cores are available (because we don't have that many anyway) before the first step starts executing.

> 
> 
> >  Is there another way around it?
> 
> The "Address already in use" error is likely due to exhausting your Linux
> pool of sockets. There is a description of how to configure your system for
> high-throughput computing here:
> http://slurm.schedmd.com/high_throughput.html
> Make particular note of the section "System configuration".

This is great. I will see if I can increase the limits.
Comment 21 Moe Jette 2015-05-08 06:42:49 MDT
(In reply to Marios Hadjieleftheriou from comment #20)
> > >  Is there another way around it?
> > 
> > The "Address already in use" error is likely due to exhausting your Linux
> > pool of sockets. There is a description of how to configure your system for
> > high-throughput computing here:
> > http://slurm.schedmd.com/high_throughput.html
> > Make particular note of the section "System configuration".
> 
> This is great. I will see if I can increase the limits.

I would also suggest looking at using sbatch with job arrays rather than using srun. That should make it easier to manage this set of jobs as a single entity (e.g. cancel all of them with a single command).
Comment 22 Marios Hadjieleftheriou 2015-05-19 06:12:02 MDT
We are still struggling with this.

I cannot use job arrays. Let me repeat my use case: I have a job with 1000 independent tasks. I want to launch the job as a single entity. Suppose I have 1000 total cores, and only 500 of them are available at the moment. I want to have 500 tasks start executing immediately, and the other 500 get queued in slurm, so that they can get priority. Once a new core is available, I would like the next task to run. I do not want to have to specify how many tasks should run at a time. I want to run as many of them as possible at any given moment. So, if for example a different job suddenly finishes and releases an extra 200 cores, I would like to have another 200 tasks start (for a total of 700 now).

If my understanding is correct, sbatch needs to know how many cores it should get, because it requests the entire allocation up-front to get all the resources that the job needs. The job will not start until all the requested resources are available. In my case I do not want to request resources. I want to use as many of them as are available, dynamically.

Now, it is ok for us to run 1000s of sruns. That is not a problem. But we are hitting some limits lately. Specifically, we configured our controller server based on your high throughput recommendations. Nevertheless, we are unable to queue more than about 9900 jobs in the slurm queue. srun returns with error:
"srun: error: Slurm controller not responding, sleeping and retrying."

We tried running the slurmctld daemon with ulimit 16000, but that did not help.

Do you have any ideas of what could be causing this strange queue length limit?

Thanks!
Comment 23 David Bigagli 2015-05-19 07:05:46 MDT
Hi,
   the message indicates that slurmctld is busy, so srun is timing out.
Could you send us your slurmctld log file, the slurm.conf, and the output of
sdiag? This will give us a better indication of what is going on.

As Moe said in his previous message, we suggest using job arrays. You can configure a large MaxArraySize so you can submit a very large array.
Each element of the job array will start running as soon as resources
become available.

David
Comment 24 Marios Hadjieleftheriou 2015-05-19 09:10:14 MDT
Great. I am trying the job arrays now.

I looked through the controller logs and I saw this:
error: _create_job_record: MaxJobCount reached (10000)
In our config we do not specify this parameter, so I guess the default is 10000.

That takes care of that!
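
For reference, raising it looks roughly like this in slurm.conf (example values only):

MaxJobCount=50000      # default is 10000
MaxArraySize=20001     # allows array indices up to 20000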
Comment 25 David Bigagli 2015-05-19 09:25:07 MDT
MaxArraySize can be up to 4000001. In your batch script you have to execute different binaries based on SLURM_ARRAY_TASK_ID.
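
For example, roughly (the binaries and the mapping are placeholders):

#!/bin/bash
#SBATCH -J X
#SBATCH --mem=20GB
case "$SLURM_ARRAY_TASK_ID" in
    1) ./binary_a -X -Y -Z date1 ;;
    2) ./binary_b -X -Y -Z date2 ;;
    *) ./work -X -Y -Z "date${SLURM_ARRAY_TASK_ID}" ;;
esac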

David
Comment 26 Marios Hadjieleftheriou 2015-05-21 05:15:45 MDT
The array job seems to be working great so far.

Is there an easy way to know when the batch is complete? Ideally I would like the sbatch command to block until all jobs are done, but that does not seem possible.

The only way I have found so far is to launch an extra job with a --depend=afterany option, but that seems like a hack.
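
Roughly (untested sketch):

jobid=$(sbatch --array=1-1000 array.sh | awk '{print $4}')   # parses "Submitted batch job NNN"
srun -n1 --dependency=afterany:"$jobid" true                 # pends until the whole array is done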
Comment 27 David Bigagli 2015-05-21 05:24:27 MDT
Another possibility is to get notified by email using the --mail-type
option of sbatch.

David
Comment 28 Marios Hadjieleftheriou 2015-05-21 06:11:00 MDT
I need to script this. Once the batch is done, there is a whole lot of other processing that needs to happen before a user can be notified (outside of slurm).

Generally speaking, if there was an option for sbatch to block and wait until the batch is complete, it would be extremely convenient.

So, I tried using afterok and afternotok, but I don't see how these are useful. If the batch is successful, the afternotok script hangs for eternity, and vice versa if the batch is not successful.

The only way I currently have to know that a batch is done, with success or failure, is to do an scontrol show job=X and collect the exit statuses. But even for that, it appears that after some time (5 minutes?) the controller purges the status of older jobs.
Comment 29 David Bigagli 2015-05-21 06:32:09 MDT
Agreed that having a blocking option would be useful.
Did you try using afterany and specifying the job array id? That would start the dependent job after the entire array is done, regardless of the exit status.

The time to clean up the finished jobs can be configured using the MinJobAge
parameter in slurm.conf.
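
For example (example value only):

MinJobAge=600          # keep finished job records around for 10 minutes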

David
Comment 30 Marios Hadjieleftheriou 2015-05-21 06:39:59 MDT
Yes, afterany works well. But when it executes, how do I know whether I can proceed with processing or should notify the user of an error? I would have to scan through the output files for errors.

"scontrol show job" seems like an easier option, because I get the exit code of each job directly without having to also check the output files per job. So I just have a script that wakes up every 5 minutes and checks the state.

Alternatively, I can put a trap in my batch script to echo 0 or 1 into the error output file of each job. Then I can have a process that wakes up every 5 minutes and checks all error files for a 0 or 1.
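
The trap version would look roughly like this (untested; "./work" is a placeholder):

#!/bin/bash
#SBATCH -J X
trap 'echo $? > "exit.${SLURM_ARRAY_TASK_ID}"' EXIT   # record the script's exit status
./work -X -Y -Z "date${SLURM_ARRAY_TASK_ID}"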

None of these sound robust to me. More like hacks.

Should I submit a request for a blocking sbatch flag?
Comment 31 David Bigagli 2015-05-21 06:48:28 MDT
You basically need a workflow manager. One possibility would be having the epilog
for each job save the exit code in a directory, using the jobid as the file name; I am sure there could be other solutions as well, like yours.
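
For example, a minimal EpilogSlurmctld sketch (untested; the directory is just an example, and it assumes SLURM_JOB_ID and SLURM_JOB_EXIT_CODE are available to the epilog):

#!/bin/bash
mkdir -p /shared/exit_codes
echo "$SLURM_JOB_EXIT_CODE" > "/shared/exit_codes/$SLURM_JOB_ID"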

You can log an enhancement request to have a blocking sbatch.

David
Comment 32 Marios Hadjieleftheriou 2015-05-21 06:56:11 MDT
I have a workflow manager. But the manager has no idea when the sbatch part of the job finishes, because slurm does not have any other way to notify it, except essentially by sending an email. So, my workflow manager now has to poll (in any way possible) to figure out when the batch has finished.

The sbatch job array already saves one error and one output file per job_task_id. In my script I trap EXIT and save the exit code in the error file (which is the same as what you suggested). It is just not convenient having to poll over 1000 files to figure out when all of them have written their exit code. In addition, what if jobs print other stuff in stderr? E.g., what if a job prints "1" or "0" in stderr (outside of the trap)? In that case polling will get confused. It will think that the job has finished, when it actually hasn't. So I need to be very careful what I run in these batch scripts. So, scontrol actually seems like the safest option.

I will ask for an enhancement. It seems reasonable to have this feature.
Comment 33 David Bigagli 2015-05-21 06:59:35 MDT
I see your points. What would be the exit status of the blocking sbatch?

David
Comment 34 Marios Hadjieleftheriou 2015-05-21 07:09:47 MDT
I would say the largest exit status across all jobs (if an array), like elsewhere in slurm.

If not an array, then the actual exit status of the script.
Comment 35 Marios Hadjieleftheriou 2015-05-21 07:10:52 MDT
By the way, I would consider it a bug that an afternotok dependency does not get canceled after a successful job completion (and vice versa for afterok).
Comment 36 David Bigagli 2015-05-21 08:14:05 MDT
You can specify the kill_invalid_depend parameter in SchedulerParameters 
in slurm.conf to accomplish this.
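
For example, in slurm.conf (append to any existing comma-separated SchedulerParameters values):

SchedulerParameters=kill_invalid_depend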
Comment 37 Steve 2018-12-11 12:53:34 MST
For the sake of anyone else that comes across this (e.g. from Google), you really need to be using a workflow manager that can submit to SLURM for this purpose; I would recommend Nextflow:

https://www.nextflow.io/ 

https://www.nextflow.io/docs/latest/executor.html#slurm

https://www.nextflow.io/docs/latest/config.html#scope-executor

All the difficulties described here would be easily solved with this workflow script (in Nextflow):

process run_hostname {
    executor = 'slurm'
    queue = "cpu_dev"
    clusterOptions = '--ntasks-per-node=1'
    cpus = 1
    memory = "20 GB"
    echo true

    input:
    val(x) from Channel.from(1..5) // change this to 1000, etc

    script:
    """
    hostname
    """
}