Ceph - OSDs assert during snap trim in ReplicatedPG::RepGather* ReplicatedPG::trim_object
Issue
- One or more OSDs assert during snap trim.
- In the OSD log (/var/log/ceph/ceph-osd.*.log), the following stack trace is logged when the OSD asserts:
2016-06-27 08:08:16.909337 7f19777c0700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f19777c0700 time 2016-06-27 08:08:16.903355
osd/ReplicatedPG.cc: 2655: FAILED assert(0)
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
2: (ReplicatedPG::trim_object(hobject_t const&)+0x1e4) [0x85bb64]
3: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x427) [0x85e287]
4: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb4) [0x8bf1f4]
5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x5f) [0x8ab92f]
6: (ReplicatedPG::snap_trimmer()+0x52c) [0x82f7fc]
7: (OSD::SnapTrimWQ::_process(PG*)+0x1a) [0x6c43aa]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xba2a0e]
9: (ThreadPool::WorkThread::entry()+0x10) [0xba3ab0]
10: (()+0x8182) [0x7f199ee58182]
11: (clone()+0x6d) [0x7f199d3c347d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
- The log of the OSD that asserted shows the following sequence during snap trimming:
Snap trimming starts for the object rbd_data.b77eb164a531e5.0000000000004fdf:
SnapTrimmer state<Trimming/TrimmingObjects>: TrimmingObjects react trimming 0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e
-3> 2016-06-27 08:11:06.781905 7f08f1268700 10 osd.234 pg_epoch: 547937 pg[0.1ef1( v 547936'1181836 (543861'1178763,547936'1181836] local-les=547936 n=71899 ec=1 les/c 547936/547936 547935/547935/547935) [234,259,19] r=0 lpr=547935 crt=547932'1181831 lcod 547932'1181834 mlcod 547932'1181834 active+clean
While trimming, the OSD attempts to obtain the object info and cannot find it. It tries to getattr OI_ATTR, which is the object info attribute:
get_object_context: obc NOT found in cache: 0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e
-2> 2016-06-27 08:11:06.784323 7f08f1268700 10 osd.234 pg_epoch: 547937 pg[0.1ef1( v 547936'1181836 (543861'1178763,547936'1181836] local-les=547936 n=71899 ec=1 les/c 547936/547936 547935/547935/547935) [234,259,19] r=0 lpr=547935 crt=547932'1181831 lcod 547932'1181834 mlcod 547932'1181834 active+clean
The OSD then reports that no object info can be obtained:
get_object_context: no obc for soid 0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e and !can_create
-1> 2016-06-27 08:11:06.786646 7f08f1268700 -1 osd.234 pg_epoch: 547937 pg[0.1ef1( v 547936'1181836 (543861'1178763,547936'1181836] local-les=547936 n=71899 ec=1 les/c 547936/547936 547935/547935/547935) [234,259,19] r=0 lpr=547935 crt=547932'1181831 lcod 547932'1181834 mlcod 547932'1181834 active+clean
The snap trim on the object fails and the OSD asserts:
trim_objectcould not find coid 0/bddc9ef1/rbd_data.b77eb164a531e5.0000000000004fdf/1e73e
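To check whether other OSD logs show the same signature, the two characteristic lines above can be searched for programmatically. The following is a minimal sketch, not a supported tool: the function names `scan_osd_log` and `scan_osd_log_file` are hypothetical, and the patterns are derived only from the log excerpts in this article (including the missing space in `trim_objectcould`), so they may need adjusting for other Ceph versions.

```python
import re

# Patterns taken from the log excerpts in this article (assumption: other
# Ceph versions may word these messages differently).
ASSERT_RE = re.compile(r"In function '[^']*ReplicatedPG::trim_object\(")
COID_RE = re.compile(r"trim_objectcould not find coid (\S+)")


def scan_osd_log(lines):
    """Count trim_object asserts and collect the clone ids reported missing.

    `lines` is any iterable of log lines (a list or an open file).
    Returns (assert_count, missing_coids), with duplicates removed and
    first-seen order preserved.
    """
    asserts = 0
    coids = []
    for line in lines:
        if ASSERT_RE.search(line):
            asserts += 1
        match = COID_RE.search(line)
        if match and match.group(1) not in coids:
            coids.append(match.group(1))
    return asserts, coids


def scan_osd_log_file(path):
    """Hypothetical convenience wrapper: scan a single OSD log file."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(path, errors="replace") as handle:
        return scan_osd_log(handle)
```

For example, calling `scan_osd_log_file` on each file matching /var/log/ceph/ceph-osd.*.log would return, per OSD, how many times the assert was logged and which clone objects could not be found.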
Environment
- Red Hat Enterprise Linux 7
- Red Hat Ceph Storage 1.3.x
