Ticket 2885 - slurmctld crash on startup
Summary: slurmctld crash on startup
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 16.05.2
Hardware: Linux Linux
Importance: --- 2 - High Impact
Assignee: Dominik Bartkiewicz
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2016-07-11 05:46 MDT by Paul Edmon
Modified: 2016-07-13 09:00 MDT

See Also:
Site: Harvard University
Version Fixed: 16.05.3


Attachments
crash log (50.07 KB, text/x-log)
2016-07-11 05:47 MDT, Paul Edmon
slurmctld backtrace (14.83 KB, text/x-log)
2016-07-11 06:00 MDT, Paul Edmon

Description Paul Edmon 2016-07-11 05:46:45 MDT
This is on our test cluster, so while it is a high-severity bug in that it causes a crash, it is not on our production servers and is not currently causing us problems.  When restarting slurmctld in 16.05.2, the ctld crashes on start.  This may be due to a specific faulty job, but Slurm shouldn't crash on that.  I've included the log and I have a core dump.  The ctld host is running CentOS 7.

Let me know what other info you need from the core dump.  Thanks.

-Paul Edmon-
Comment 1 Paul Edmon 2016-07-11 05:47:17 MDT
Created attachment 3292
crash log
Comment 2 Dominik Bartkiewicz 2016-07-11 05:56:58 MDT
Can you get a full backtrace from all threads?

thread apply all bt full
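
For anyone collecting the same data non-interactively, the request above can be sketched as a small shell helper. The function name capture_backtrace and the paths in the usage note are hypothetical; only the gdb command "thread apply all bt full" comes from this ticket. It assumes gdb and matching debug symbols are installed on the slurmctld host.

```shell
# Sketch: dump a full backtrace of every thread from a core file.
# capture_backtrace is a hypothetical helper name, not part of Slurm.
capture_backtrace() {
    binary="$1"    # path to the slurmctld binary that produced the core
    core="$2"      # path to the core dump
    out="$3"       # file to write the backtrace to
    # -batch makes gdb exit after running its commands; -ex runs one gdb command.
    gdb -batch -ex "thread apply all bt full" "$binary" "$core" > "$out" 2>&1
}
```

A call might look like "capture_backtrace /usr/sbin/slurmctld /path/to/core slurmctld-backtrace.log", where both paths are placeholders for the local install and core file location. The resulting log can then be attached to the ticket.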

Dominik
Comment 3 Paul Edmon 2016-07-11 06:00:11 MDT
Created attachment 3293
slurmctld backtrace
Comment 6 Dominik Bartkiewicz 2016-07-11 07:54:42 MDT
This bug was fixed in commit 65b4f283ef2  (https://github.com/SchedMD/slurm/commit/65b4f283ef2a908b6e3e8921acf62dad73528f00.patch).
This patch will also be included in the 16.05.3 release.
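
For sites that cannot wait for 16.05.3, one possible way to try the patch against a 16.05.2 source tree is sketched below. The patch URL is the one given in this comment; the helper name apply_slurm_patch, the default file name, and the source directory are assumptions. It relies on curl and GNU patch, and whether the patch applies cleanly to a given tree is not confirmed here (hunks touching files such as NEWS may need manual attention).

```shell
# URL taken from the comment above.
PATCH_URL="https://github.com/SchedMD/slurm/commit/65b4f283ef2a908b6e3e8921acf62dad73528f00.patch"

# apply_slurm_patch is a hypothetical helper name, not part of Slurm.
apply_slurm_patch() {
    src_dir="$1"                          # unpacked Slurm source tree (assumed location)
    patch_file="${2:-65b4f283ef2.patch}"  # where to save the downloaded patch
    curl -fsSL -o "$patch_file" "$PATCH_URL" || return 1
    # Dry run first so a mismatch does not leave the tree half-patched.
    patch -d "$src_dir" -p1 --dry-run < "$patch_file" || return 1
    patch -d "$src_dir" -p1 < "$patch_file"
}
```

After a successful apply, slurmctld would still need to be rebuilt and reinstalled from the patched tree before restarting the daemon.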

Dominik
Comment 7 Paul Edmon 2016-07-11 07:58:15 MDT
Thanks!

-Paul Edmon-

Comment 9 Dominik Bartkiewicz 2016-07-13 09:00:54 MDT
Please reopen if the problem still exists after this commit.

Dominik