Ceph - QEMU crash when Ceph pool pg_num is changed
Issue
When changing pg_num on a Ceph pool, QEMU VMs using a PG that is affected by the pg_num change crash with the following assertion failure:
osd/osd_types.cc: In function 'bool pg_t::is_split(unsigned int, unsigned int,
std::set<pg_t>*) const' thread 7fd5e7fff700 time 2015-08-20 15:38:42.272380
osd/osd_types.cc: 459: FAILED assert(m_seed < old_pg_num)
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
1: (()+0x15376b) [0x7fd5fd94576b]
2: (()+0x222f11) [0x7fd5fda14f11]
3: (()+0x222fed) [0x7fd5fda14fed]
4: (()+0xc5379) [0x7fd5fd8b7379]
5: (()+0xdc4bc) [0x7fd5fd8ce4bc]
6: (()+0xdcd0a) [0x7fd5fd8ced0a]
7: (()+0xde272) [0x7fd5fd8d0272]
8: (()+0xe3fef) [0x7fd5fd8d5fef]
9: (()+0x2c3ba9) [0x7fd5fdab5ba9]
10: (()+0x2f15cd) [0x7fd5fdae35cd]
11: (()+0x8182) [0x7fd5f946c182]
12: (clone()+0x6d) [0x7fd5f919947d]
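For reference, the pg_num change that triggers this crash is the standard pool resize operation issued with the ceph CLI; a minimal sketch follows (the pool name "rbd" and the target value 256 are illustrative, not taken from this ticket):

# Increase the placement-group count on an existing pool.
ceph osd pool set rbd pg_num 256
# pgp_num is normally raised to match so that data placement actually rebalances.
ceph osd pool set rbd pgp_num 256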
I found a Ceph bug (http://tracker.ceph.com/issues/10399) that describes this exact issue, and it looks like a fix was just merged to master. This ticket is to express our interest in seeing that fix backported to Hammer quickly.
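The version in the trace (0.94.2) is a Hammer release, consistent with the environment below. Assuming standard ceph CLI behavior, the running version and a pool's current pg_num can be checked before planning any further changes (pool name again illustrative):

# Report the version of the local Ceph binaries; 0.94.x corresponds to Hammer.
ceph --version
# Read back the pool's current pg_num.
ceph osd pool get rbd pg_num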
Environment
- Red Hat Ceph Storage 1.3