OSDs keep coring
Issue
- OSDs on the machine will not stay up. Some crash immediately; others crash after tens of minutes of runtime. They crash hard, leaving behind multi-gigabyte core files.
- Multiple rounds of OSD restarts and another soft reboot of the machine have been ineffective, as has starting the OSDs with long waits between each one. They keep falling over again. Currently 9 of the 36 OSDs on the machine are up.
- The stack trace looks like this:
0> 2015-03-09 15:05:00.482720 7f21a4658700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f21a4658700 time 2015-03-09 15:05:00.481864
common/Thread.cc: 129: FAILED assert(ret == 0)
ceph version 0.80.8-3-g399e8dc (399e8dcef0142b46b83f702342ca63420118d5b7)
1: (Thread::create(unsigned long)+0x8a) [0xa673da]
2: (SimpleMessenger::add_accept_pipe(int)+0x6c) [0xa5ddfc]
3: (Accepter::entry()+0x218) [0xb16ac8]
4: (()+0x7e9a) [0x7f21baee1e9a]
5: (clone()+0x6d) [0x7f21b98994bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2015-03-09 15:05:00.620454 7f21a4658700 -1 *** Caught signal (Aborted) **
in thread 7f21a4658700
ceph version 0.80.8-3-g399e8dc (399e8dcef0142b46b83f702342ca63420118d5b7)
1: /usr/bin/ceph-osd() [0x99d83a]
2: (()+0xfcb0) [0x7f21baee9cb0]
3: (gsignal()+0x35) [0x7f21b97dd445]
4: (abort()+0x17b) [0x7f21b97e0bab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f21ba12b69d]
6: (()+0xb5846) [0x7f21ba129846]
7: (()+0xb5873) [0x7f21ba129873]
8: (()+0xb596e) [0x7f21ba12996e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0xa7fbff]
10: (Thread::create(unsigned long)+0x8a) [0xa673da]
11: (SimpleMessenger::add_accept_pipe(int)+0x6c) [0xa5ddfc]
12: (Accepter::entry()+0x218) [0xb16ac8]
13: (()+0x7e9a) [0x7f21baee1e9a]
14: (clone()+0x6d) [0x7f21b98994bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
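When many OSDs on one host are crashing, it can help to confirm that they all fail at the same assertion and along the same call path (here, `Thread::create` reached from `SimpleMessenger::add_accept_pipe`). The following is a minimal sketch for pulling the failed assertion and the numbered frames out of a log excerpt like the one above. The function name `parse_crash` and the regular expressions are illustrative assumptions based only on the line format shown in this trace; they are not part of any Ceph tooling.

```python
import re

# Hypothetical helper, not a Ceph utility. The patterns assume the exact
# line shapes seen in the trace above.

# e.g. "common/Thread.cc: 129: FAILED assert(ret == 0)"
ASSERT_RE = re.compile(r"^\s*(\S+):\s*(\d+):\s*FAILED assert\((.*)\)\s*$")

# e.g. " 1: (Thread::create(unsigned long)+0x8a) [0xa673da]"
# or   " 1: /usr/bin/ceph-osd() [0x99d83a]"
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*)\s+\[(0x[0-9a-f]+)\]\s*$")


def parse_crash(text):
    """Return (assert_info, frames) extracted from a crash log excerpt.

    assert_info is a dict with 'file', 'line' and 'expr', or None if no
    failed assertion line is present. frames is a list of
    (frame_number, symbol_text, address) tuples in log order.
    """
    assert_info = None
    frames = []
    for line in text.splitlines():
        m = ASSERT_RE.match(line)
        if m:
            assert_info = {
                "file": m.group(1),
                "line": int(m.group(2)),
                "expr": m.group(3),
            }
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3)))
    return assert_info, frames


# A few lines copied from the trace above, used as sample input.
SAMPLE = """common/Thread.cc: 129: FAILED assert(ret == 0)
 1: (Thread::create(unsigned long)+0x8a) [0xa673da]
 2: (SimpleMessenger::add_accept_pipe(int)+0x6c) [0xa5ddfc]
 3: (Accepter::entry()+0x218) [0xb16ac8]
"""

if __name__ == "__main__":
    info, frames = parse_crash(SAMPLE)
    print(info)
    for number, symbol, address in frames:
        print(number, symbol, address)
```

Running the same function over each OSD's log and comparing the `file`, `line`, and top frames is one way to check whether the crashes share a single failure point.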
Environment
- Red Hat Ceph Storage