Ticket 630 - Jobs Blocking Primary Scheduler
Summary: Jobs Blocking Primary Scheduler
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 2.6.6
Hardware: Linux Linux
Importance: --- 3 - Medium Impact
Assignee: David Bigagli
Reported: 2014-03-06 04:32 MST by Moe Jette
Modified: 2014-03-19 13:38 MDT
Site: Harvard University
Version Fixed: 14.03.0-pre7


Attachments
attachment-18387-0.html (1.50 KB, text/html)
2014-03-19 13:38 MDT, Moe Jette

Description Moe Jette 2014-03-06 04:32:23 MST
Looks like there is another variant on this bug. Namely, if jobs are pending for a reservation and the reservation is full, the primary scheduler stops and does not schedule any other jobs for that partition.

This may be a variation on bug 595.
Comment 1 Moe Jette 2014-03-16 08:29:14 MDT
Fixed in commit 08f0f57cc18824d0984bbbbba39ad27d4e796a62.
Comment 2 Moe Jette 2014-03-16 08:30:37 MDT
One more note: these changes are more extensive than what I would want to put into version 2.6, which we want to keep really stable, so they will only be in version 14.03. The same goes for the fixes that keep the node CPU load average current.
Comment 3 John Morrissey 2014-03-19 07:44:16 MDT
On Sun, Mar 16, 2014 at 08:30:37PM +0000, bugs@schedmd.com wrote:
> --- Comment #2 from Moe Jette <jette@schedmd.com> ---
> One more note, these changes are more extensive than what I would want to
> put into version 2.6, which we want to keep really stable, so they will
> only be in version 14.03. Same as fixes for keeping the node CPU load
> average current.

I was just about to backport this patch so it would apply cleanly, when I
realized we've already got all the patches this depends on in our local
build of 2.6.5, so it applies cleanly already. :-)

Is there a bug open for the stale node load average problem? I poked through
Bugzilla but didn't see one.

Thanks, Moe.

john
Comment 4 David Bigagli 2014-03-19 08:39:09 MDT
Hi John, 
        I don't recall any problem with stale node load average. Could you please log a new bug?

Thanks,
        David
Comment 5 Moe Jette 2014-03-19 13:38:23 MDT
Created attachment 704
attachment-18387-0.html

The CPU load issue is fixed in version 14.03; the changes are too extensive for v2.6. Not sure if I opened a bug on it, though.

On March 19, 2014 4:39:09 PM EDT, bugs@schedmd.com wrote:
>http://bugs.schedmd.com/show_bug.cgi?id=630
>
>--- Comment #4 from David Bigagli <david@schedmd.com> ---
>
>Hi John, 
>     I don't recall any problem with stale node load average. Could you
>please log a new bug.
>
>Thanks,
>        David