Why do OSDs crash frequently in RHCS?

Solution Verified - Updated -

Issue

  • Why do OSDs crash intermittently in the SafeTimer::timer_thread call?
  • Why does frequent rebalancing happen within the cluster when multiple OSDs are crashing?
  • Why does ceph-mgr report an unknown health metric for a crashing OSD?
# grep -r "osd,10" /var/log/ceph/ceph-mgr.<mgr-id>.log
2018-06-01 03:06:16.528955 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:18.622329 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:20.759340 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:22.902219 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
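
When several OSDs are affected, it can help to see how many of these reports each daemon produced. The following is a minimal sketch, not part of the original solution: the function name `count_unknown_metric_reports` is hypothetical, and it assumes the ceph-mgr log lines have the same layout as the sample output above (daemon name immediately before "sent me an unknown health metric").

```shell
# Hypothetical helper: count "unknown health metric" reports per daemon.
# Reads ceph-mgr log lines on stdin and prints "<count> <daemon>" per daemon.
# Assumes the line layout shown in the sample log output above.
count_unknown_metric_reports() {
    grep 'sent me an unknown health metric' \
        | sed -E 's/.*send_report ([^ ]+) sent me an unknown health metric.*/\1/' \
        | cut -d. -f1 \
        | sort | uniq -c | awk '{print $1, $2}'
}

# Usage (substitute the real mgr id for the placeholder in the file name):
#   count_unknown_metric_reports < /var/log/ceph/ceph-mgr.<mgr-id>.log
```

The `cut -d. -f1` step drops the address-like suffix after the daemon name (for example, `osd,10.0x...` becomes `osd,10`) so that repeated reports from the same OSD are grouped together.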

Environment

  • Red Hat Enterprise Linux 7.x
  • Red Hat Ceph Storage 3.x
