Bug 3739 - Job rejected with Invalid job array specification
Summary: Job rejected with Invalid job array specification
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 16.05.10
Hardware: Linux
Priority: ---
Severity: 4 - Minor Issue
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-04-26 14:27 MDT by David Backeberg
Modified: 2021-05-25 10:40 MDT
CC List: 1 user

See Also:
Site: Yale
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description David Backeberg 2017-04-26 14:27:50 MDT
[root@grace2.grace ~]# scontrol show config | grep Max
MaxArraySize            = 10001
MaxJobCount             = 100000
MaxJobId                = 2147418112
MaxMemPerNode           = UNLIMITED
MaxStepCount            = 40000
MaxTasksPerNode         = 512
PriorityMaxAge          = 14-00:00:00

[mw564@c01n02 ~]$ sbatch -vvv runSlurm.sh
sbatch: debug2: Found in script, argument "--partition=pi_ohern"
sbatch: debug2: Found in script, argument "--job-name=Dmin"
sbatch: debug2: Found in script, argument "--array=1-1152"
sbatch: debug2: Found in script, argument "--ntasks=1"
sbatch: debug2: Found in script, argument "--mem-per-cpu=4000"
sbatch: debug2: Found in script, argument "--time=24:00:00"
sbatch: debug2: Found in script, argument "--output=Job_%A.out"
sbatch: debug2: Found in script, argument "--error=Job_%A.err"
sbatch: defined options for program `sbatch'
sbatch: ----------------- ---------------------
sbatch: user : `mw564'
sbatch: uid : 10925
sbatch: gid : 10017
sbatch: cwd : /gpfs/home/fas/ohern/mw564
sbatch: ntasks : 1 (set)
sbatch: nodes : 1 (default)
sbatch: jobid : 4294967294 (default)
sbatch: partition : pi_ohern
sbatch: profile : `NotSet'
sbatch: job name : `Dmin'
sbatch: reservation : `(null)'
sbatch: wckey : `(null)'
sbatch: distribution : unknown
sbatch: verbose : 3
sbatch: immediate : false
sbatch: overcommit : false
sbatch: time_limit : 1440
sbatch: nice : -2
sbatch: account : (null)
sbatch: comment : (null)
sbatch: dependency : (null)
sbatch: qos : (null)
sbatch: constraints : mem-per-cpu=4000M
sbatch: geometry : (null)
sbatch: reboot : yes
sbatch: rotate : no
sbatch: network : (null)
sbatch: array : 1-1152
sbatch: cpu_freq_min : 4294967294
sbatch: cpu_freq_max : 4294967294
sbatch: cpu_freq_gov : 4294967294
sbatch: mail_type : NONE
sbatch: mail_user : (null)
sbatch: sockets-per-node : -2
sbatch: cores-per-socket : -2
sbatch: threads-per-core : -2
sbatch: ntasks-per-node : 0
sbatch: ntasks-per-socket : -2
sbatch: ntasks-per-core : -2
sbatch: mem_bind : default
sbatch: plane_size : 4294967294
sbatch: propagate : NONE
sbatch: switches : -1
sbatch: wait-for-switches : -1
sbatch: core-spec : NA
sbatch: burst_buffer : `(null)'
sbatch: remote command : `/gpfs/home/fas/ohern/mw564/runSlurm.sh'
sbatch: power :
sbatch: wait : yes
sbatch: debug: propagating SLURM_PRIO_PROCESS=0
sbatch: debug: propagating UMASK=0022
sbatch: debug: auth plugin for Munge (http://code.google.com/p/munge/) loaded
sbatch: Linear node selection plugin loaded with argument 20
sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 20
sbatch: Cray node selection plugin loaded
sbatch: Serial Job Resource Selection plugin loaded with argument 20
sbatch: error: Batch job submission failed: Invalid job array specification
sbatch: debug2: spank: x11.so: exit = 0
Comment 1 Tim Wickberg 2017-04-26 14:52:37 MDT
(In reply to David Backeberg from comment #0)
> [root@grace2.grace ~]# scontrol show config | grep Max
> MaxArraySize            = 10001
> MaxJobCount             = 100000
> MaxJobId                = 2147418112
> MaxMemPerNode           = UNLIMITED
> MaxStepCount            = 40000
> MaxTasksPerNode         = 512
> PriorityMaxAge          = 14-00:00:00

Did you happen to change MaxArraySize recently, then update it with 'scontrol reconfigure'?

If so, restarting slurmctld completely should clear this up - that value is cached internally and requires a full restart to change (although 'scontrol show config' will already report the new value even while the older limit is still in force).
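
A minimal sketch of that restart-and-verify sequence, assuming slurmctld on this cluster is managed by systemd (the service name is an assumption; use the site's own init mechanism if it differs):

# 'scontrol reconfigure' alone does not apply a MaxArraySize change;
# restart the controller so the new value from slurm.conf takes effect:
systemctl restart slurmctld

# Confirm the limit now in force, then retry the array submission:
scontrol show config | grep MaxArraySize
sbatch --array=1-1152 runSlurm.sh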
Comment 2 Tim Wickberg 2017-05-23 18:57:10 MDT
Marking resolved/infogiven; please reopen if there is anything further I can help with on this.

cheers,
- Tim