Ceph - Librbd-backed QEMU instances spontaneously crashing with the following failure: FAILED assert(m_ictx->owner_lock.is_locked())

Solution Verified - Updated

Issue

  • Since upgrading from upstream Hammer 0.94.2 to 0.94.4, librbd-backed QEMU instances are spontaneously crashing with the following failure: FAILED assert(m_ictx->owner_lock.is_locked()).
  • Logs from the QEMU instance report the following (see the note after the backtraces below on resolving the unsymbolized frames):
librbd/LibrbdWriteback.cc: In function 'virtual ceph_tid_t librbd::LibrbdWriteback::write(const object_t&, const object_locator_t&, uint64_t, uint64_t, const SnapContext&, const bufferlist&, utime_t, uint64_t, __u32, Context*)' thread 7f28edffb700 time 2015-10-20 11:49:08.120786
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
 ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
 1: (()+0x17258b) [0x7f291798858b]
 2: (()+0xa9573) [0x7f29178bf573]
 3: (()+0x3a90ca) [0x7f2917bbf0ca]
 4: (()+0x3b583d) [0x7f2917bcb83d]
 5: (()+0x7212c) [0x7f291788812c]
 6: (()+0x9590f) [0x7f29178ab90f]
 7: (()+0x969a3) [0x7f29178ac9a3]
 8: (()+0x4782a) [0x7f291785d82a]
 9: (()+0x56599) [0x7f291786c599]
 10: (()+0x7284e) [0x7f291788884e]
 11: (()+0x162b7e) [0x7f2917978b7e]
 12: (()+0x163c10) [0x7f2917979c10]
 13: (()+0x8182) [0x7f2910e66182]
 14: (clone()+0x6d) [0x7f2910b9347d]
  • The core dump from the crashed QEMU process has the following backtrace (see the notes after the backtrace on the cache writeback path involved):
#0  0x00007fb7f95cbcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fb7f95cf0d8 in __GI_abort () at abort.c:89
#2  0x00007fb7f7d12535 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fb7f7d106d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fb7f7d10703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fb7f7d10922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fb800484778 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=160, 
    func=0x7fb80072ae80 <librbd::LibrbdWriteback::write(object_t const&, object_locator_t const&, unsigned long, unsigned long, SnapContext const&, ceph::buffer::list const&, utime_t, unsigned long, unsigned int, Context*)::__PRETTY_FUNCTION__> "virtual ceph_tid_t librbd::LibrbdWriteback::write(const object_t&, const object_locator_t&, uint64_t, uint64_t, const SnapContext&, const bufferlist&, utime_t, uint64_t, __u32, Context*)") at common/assert.cc:77
#7  0x00007fb8003bb573 in librbd::LibrbdWriteback::write (this=0x7fb805415600, oid=..., oloc=..., off=off@entry=1048576, len=len@entry=4096, snapc=..., bl=..., mtime=..., 
    trunc_size=trunc_size@entry=0, trunc_seq=trunc_seq@entry=0, oncommit=oncommit@entry=0x7fb7d00aecc0) at librbd/LibrbdWriteback.cc:160
#8  0x00007fb8006bb0ca in ObjectCacher::bh_write (this=this@entry=0x7fb805415ff0, bh=bh@entry=0x7fb7d00312c0) at osdc/ObjectCacher.cc:847
#9  0x00007fb8006c783d in ObjectCacher::_readx (this=0x7fb805415ff0, rd=0x7fb7d0077480, oset=0x7fb805416a80, onfinish=0x7fb7d006db20, external_call=true)
    at osdc/ObjectCacher.cc:1108
#10 0x00007fb80038412c in librbd::ImageCtx::aio_read_from_cache (this=this@entry=0x7fb805414740, o=..., object_no=object_no@entry=0, bl=bl@entry=0x7fb7d00013d0, 
    len=len@entry=4096, off=off@entry=1671168, onfinish=onfinish@entry=0x7fb7d006db20, fadvise_flags=fadvise_flags@entry=0) at librbd/ImageCtx.cc:614
#11 0x00007fb8003a790f in librbd::aio_read (ictx=ictx@entry=0x7fb805414740, image_extents=..., buf=buf@entry=0x7fb80547a600 "\350= ", pbl=pbl@entry=0x0, 
    c=c@entry=0x7fb80558d690, op_flags=op_flags@entry=0) at librbd/internal.cc:3627
#12 0x00007fb8003a89a3 in librbd::aio_read (ictx=0x7fb805414740, off=1671168, len=4096, buf=0x7fb80547a600 "\350= ", bl=0x0, c=0x7fb80558d690, op_flags=0)
    at librbd/internal.cc:3491
#13 0x00007fb80035982a in (anonymous namespace)::C_AioReadWQ::finish (this=<optimized out>, r=<optimized out>) at librbd/librbd.cc:67
#14 0x00007fb800368599 in Context::complete (this=0x7fb8054e0e70, r=<optimized out>) at ./include/Context.h:65
#15 0x00007fb80038484e in ThreadPool::WorkQueueVal<std::pair<Context*, int>, std::pair<Context*, int> >::_void_process (this=0x7fb805417220, handle=...)
    at ./common/WorkQueue.h:191
#16 0x00007fb800474b7e in ThreadPool::worker (this=0x7fb805416c80, wt=0x7fb805416f90) at common/WorkQueue.cc:128
#17 0x00007fb800475c10 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:318
#18 0x00007fb7f9962182 in start_thread (arg=0x7fb7d6ffd700) at pthread_create.c:312
#19 0x00007fb7f968f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
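
The numbered frames in the librbd log above are printed as raw offsets into the containing shared library (for example (()+0x3a90ca)) because the symbols were not resolved when the assert fired. If the matching debug symbols are installed on the client, such an offset can usually be mapped back to a function and source line with addr2line. The library path below is only an assumption based on the Debian/Ubuntu-style paths visible in the core dump; adjust it to wherever librbd.so.1 lives on the affected host:

# Map an unresolved frame offset to a function/source line
# (requires librbd debug symbols; the library path is an example and may differ on your system)
addr2line -C -f -e /usr/lib/x86_64-linux-gnu/librbd.so.1 0x3a90ca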
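
Both traces show the failure inside the librbd writeback cache path: a cached read (ObjectCacher::_readx) triggers a writeback via ObjectCacher::bh_write, which calls LibrbdWriteback::write, and that function asserts that the image's owner_lock is held. Purely as a diagnostic sketch (not necessarily the supported resolution), the client-side librbd cache can be disabled to confirm whether the crash is tied to this writeback path. The snippet below assumes the standard client configuration file location; note that the QEMU drive cache= setting can also influence whether the librbd cache is enabled:

# /etc/ceph/ceph.conf on the librbd client - diagnostic sketch only
[client]
rbd cache = false

Disabling the cache changes the guest's I/O and performance characteristics, so it is only useful for confirming the code path involved while the underlying defect is addressed.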

Environment

  • Ceph Hammer v0.94.4
  • Ceph v9.x
  • librbd clients
