Ticket 3739

Summary: Job rejected with "Invalid job array specification"
Product: Slurm    Reporter: David Backeberg <david.backeberg>
Component: Scheduling    Assignee: Tim Wickberg <tim>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue    
Priority: --- CC: natchiagenter
Version: 16.05.10   
Hardware: Linux   
OS: Linux   
Site: Yale

Description David Backeberg 2017-04-26 14:27:50 MDT
[root@grace2.grace ~]# scontrol show config | grep Max
MaxArraySize            = 10001
MaxJobCount             = 100000
MaxJobId                = 2147418112
MaxMemPerNode           = UNLIMITED
MaxStepCount            = 40000
MaxTasksPerNode         = 512
PriorityMaxAge          = 14-00:00:00

[mw564@c01n02 ~]$ sbatch -vvv runSlurm.sh
sbatch: debug2: Found in script, argument "--partition=pi_ohern"
sbatch: debug2: Found in script, argument "--job-name=Dmin"
sbatch: debug2: Found in script, argument "--array=1-1152"
sbatch: debug2: Found in script, argument "--ntasks=1"
sbatch: debug2: Found in script, argument "--mem-per-cpu=4000"
sbatch: debug2: Found in script, argument "--time=24:00:00"
sbatch: debug2: Found in script, argument "--output=Job_%A.out"
sbatch: debug2: Found in script, argument "--error=Job_%A.err"
sbatch: defined options for program `sbatch'
sbatch: ----------------- ---------------------
sbatch: user : `mw564'
sbatch: uid : 10925
sbatch: gid : 10017
sbatch: cwd : /gpfs/home/fas/ohern/mw564
sbatch: ntasks : 1 (set)
sbatch: nodes : 1 (default)
sbatch: jobid : 4294967294 (default)
sbatch: partition : pi_ohern
sbatch: profile : `NotSet'
sbatch: job name : `Dmin'
sbatch: reservation : `(null)'
sbatch: wckey : `(null)'
sbatch: distribution : unknown
sbatch: verbose : 3
sbatch: immediate : false
sbatch: overcommit : false
sbatch: time_limit : 1440
sbatch: nice : -2
sbatch: account : (null)
sbatch: comment : (null)
sbatch: dependency : (null)
sbatch: qos : (null)
sbatch: constraints : mem-per-cpu=4000M
sbatch: geometry : (null)
sbatch: reboot : yes
sbatch: rotate : no
sbatch: network : (null)
sbatch: array : 1-1152
sbatch: cpu_freq_min : 4294967294
sbatch: cpu_freq_max : 4294967294
sbatch: cpu_freq_gov : 4294967294
sbatch: mail_type : NONE
sbatch: mail_user : (null)
sbatch: sockets-per-node : -2
sbatch: cores-per-socket : -2
sbatch: threads-per-core : -2
sbatch: ntasks-per-node : 0
sbatch: ntasks-per-socket : -2
sbatch: ntasks-per-core : -2
sbatch: mem_bind : default
sbatch: plane_size : 4294967294
sbatch: propagate : NONE
sbatch: switches : -1
sbatch: wait-for-switches : -1
sbatch: core-spec : NA
sbatch: burst_buffer : `(null)'
sbatch: remote command : `/gpfs/home/fas/ohern/mw564/runSlurm.sh'
sbatch: power :
sbatch: wait : yes
sbatch: debug: propagating SLURM_PRIO_PROCESS=0
sbatch: debug: propagating UMASK=0022
sbatch: debug: auth plugin for Munge (http://code.google.com/p/munge/) loaded
sbatch: Linear node selection plugin loaded with argument 20
sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 20
sbatch: Cray node selection plugin loaded
sbatch: Serial Job Resource Selection plugin loaded with argument 20
sbatch: error: Batch job submission failed: Invalid job array specification
sbatch: debug2: spank: x11.so: exit = 0
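For reference, the MaxArraySize documentation says the highest usable task index is one less than the configured value, so a quick sanity check against the numbers above suggests --array=1-1152 should be comfortably inside the 10001 limit:

[mw564@c01n02 ~]$ scontrol show config | grep MaxArraySize    # reports 10001, as above
[mw564@c01n02 ~]$ echo $(( 1152 < 10001 ))                    # highest requested index vs. limit
1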
Comment 1 Tim Wickberg 2017-04-26 14:52:37 MDT
(In reply to David Backeberg from comment #0)
> [root@grace2.grace ~]# scontrol show config | grep Max
> MaxArraySize            = 10001
> MaxJobCount             = 100000
> MaxJobId                = 2147418112
> MaxMemPerNode           = UNLIMITED
> MaxStepCount            = 40000
> MaxTasksPerNode         = 512
> PriorityMaxAge          = 14-00:00:00

Did you happen to change MaxArraySize recently, and then apply the change with 'scontrol reconfigure'?

If so, fully restarting slurmctld should clear this up. That value is cached internally and only takes effect after a full restart of the daemon, even though 'scontrol show config' will already report the new value while the older limit is still in force.
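Concretely, something like the following sequence on the controller node should pick up the new limit (a sketch assuming a systemd-managed slurmctld; adapt to however the daemon is started at your site):

# Make sure slurm.conf on the controller already carries the new value, e.g.:
#   MaxArraySize=10001
# Then fully restart the controller; for this parameter 'scontrol reconfigure' is not enough.
[root@grace2.grace ~]# systemctl restart slurmctld
[root@grace2.grace ~]# scontrol show config | grep MaxArraySize
# Once the restarted daemon reports the intended limit, the resubmission should go through:
[mw564@c01n02 ~]$ sbatch --array=1-1152 runSlurm.sh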
Comment 2 Tim Wickberg 2017-05-23 18:57:10 MDT
Marking resolved/infogiven; please reopen if there is anything further I can help with on this.

cheers,
- Tim