Ceph - OSDs assert during snap trim with ReplicatedPG::RepGather* ReplicatedPG::trim_object
Issue
- One or more OSDs assert during snap trim.
- In the OSD log (/var/log/ceph/ceph-osd.*.log) the following stack is logged when the OSD asserts:
2016-06-27 08:08:16.909337 7f19777c0700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f19777c0700 time 2016-06-27 08:08:16.903355
osd/ReplicatedPG.cc: 2655: FAILED assert(0)
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
2: (ReplicatedPG::trim_object(hobject_t const&)+0x1e4) [0x85bb64]
3: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x427) [0x85e287]
4: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb4) [0x8bf1f4]
5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x5f) [0x8ab92f]
6: (ReplicatedPG::snap_trimmer()+0x52c) [0x82f7fc]
7: (OSD::SnapTrimWQ::_process(PG*)+0x1a) [0x6c43aa]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xba2a0e]
9: (ThreadPool::WorkThread::entry()+0x10) [0xba3ab0]
10: (()+0x8182) [0x7f199ee58182]
11: (clone()+0x6d) [0x7f199d3c347d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
- The log of the OSD that asserted shows the following sequence during snap trimming:
Snap trimming starts for the object rbd_data.b77eb164a531e5.0000000000004fdf:
SnapTrimmer state<Trimming/TrimmingObjects>: TrimmingObjects react trimming 0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e
While trimming, the OSD attempts to obtain the object info and is unable to find it. It is trying to getattr OI_ATTR, which is the object info attribute:
-3> 2016-06-27 08:11:06.781905 7f08f1268700 10 osd.234 pg_epoch: 547937 pg[0.1ef1( v 547936'1181836 (543861'1178763,547936'1181836] local-les=547936 n=71899 ec=1 les/c 547936/547936 547935/547935/547935) [234,259,19] r=0 lpr=547935 crt=547932'1181831 lcod 547932'1181834 mlcod 547932'1181834 active+clean
get_object_context: obc NOT found in cache: 0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e
The OSD reports that no object info can be obtained:
-2> 2016-06-27 08:11:06.784323 7f08f1268700 10 osd.234 pg_epoch: 547937 pg[0.1ef1( v 547936'1181836 (543861'1178763,547936'1181836] local-les=547936 n=71899 ec=1 les/c 547936/547936 547935/547935/547935) [234,259,19] r=0 lpr=547935 crt=547932'1181831 lcod 547932'1181834 mlcod 547932'1181834 active+clean
get_object_context: no obc for soid 0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e and !can_create
The snap trim on the object then fails:
-1> 2016-06-27 08:11:06.786646 7f08f1268700 -1 osd.234 pg_epoch: 547937 pg[0.1ef1( v 547936'1181836 (543861'1178763,547936'1181836] local-les=547936 n=71899 ec=1 les/c 547936/547936 547935/547935/547935) [234,259,19] r=0 lpr=547935 crt=547932'1181831 lcod 547932'1181834 mlcod 547932'1181834 active+clean
trim_objectcould not find coid 0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e
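
To confirm that an OSD log matches this issue, the two signatures shown above (the FAILED assert(0) line from osd/ReplicatedPG.cc and the "could not find coid" message) can be searched for with a short script. The following is a minimal sketch, not a supported tool. The names Coid, parse_coid and scan are hypothetical, and the field order pool/hash/object name/snapshot ID (hexadecimal) is an assumption inferred from the log lines in this article.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple, Union


class Coid(NamedTuple):
    """Hypothetical container for the parts of a clone object ID."""
    pool: int
    hash_hex: str
    name: str
    snap_id: int


# Message text exactly as it appears in the log (no space after "trim_object").
COID_RE = re.compile(r"trim_objectcould not find coid (\S+)")
# Assert line as logged above; the source line number may differ per build.
ASSERT_RE = re.compile(r"osd/ReplicatedPG\.cc: \d+: FAILED assert\(0\)")


def parse_coid(text: str) -> Coid:
    """Split 'pool/hash/name/snap', assuming that field order.

    The snapshot ID is assumed to be hexadecimal, as in '1e73e'.
    """
    pool, hash_hex, rest = text.split("/", 2)
    name, snap = rest.rsplit("/", 1)
    return Coid(int(pool), hash_hex, name, int(snap, 16))


def scan(lines: Iterable[str]) -> Iterator[Tuple[str, Union[Coid, str]]]:
    """Yield ('missing_clone', Coid) or ('assert', line) for matching lines."""
    for line in lines:
        match = COID_RE.search(line)
        if match:
            yield ("missing_clone", parse_coid(match.group(1)))
        elif ASSERT_RE.search(line):
            yield ("assert", line.strip())


# Sample input copied from the log excerpts in this article.
sample = [
    "osd/ReplicatedPG.cc: 2655: FAILED assert(0)",
    "trim_objectcould not find coid "
    "0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e",
]

for kind, detail in scan(sample):
    print(kind, detail)
```

To check a real log, the open file object for one of the /var/log/ceph/ceph-osd.*.log files can be passed to scan() in place of the sample list. The object name and snapshot ID it reports identify the clone that the snap trimmer could not find.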
Environment
- Red Hat Enterprise Linux 7
- Red Hat Ceph Storage 1.3.x