Why do OSDs crash frequently in RHCS?

Solution Verified

Issue

  • Why do OSDs crash intermittently in the SafeTimer::timer_thread call?
  • Why does the cluster rebalance frequently when multiple OSDs are crashing?
  • Why does ceph-mgr report an unknown health metric for a crashing OSD?
# grep -r "osd,10" /var/log/ceph/ceph-mgr.<mgr-id>.log
2018-06-01 03:06:16.528955 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:18.622329 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:20.759340 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:22.902219 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
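
When several OSDs are affected, it can help to see how many of these messages each one produced. The following is a minimal sketch, assuming the log lines have the same format as the excerpt above; the function name `count_unknown_metrics` is hypothetical and not part of Ceph.

```shell
# Hypothetical helper: count "unknown health metric" messages per OSD
# in one or more ceph-mgr log files.
# Assumes each message names the sender as "osd,<id>", as in the excerpt above.
count_unknown_metrics() {
    grep -h "unknown health metric" "$@" |   # keep only the relevant messages
        grep -o 'osd,[0-9][0-9]*' |          # extract the "osd,<id>" token
        sort | uniq -c | sort -rn |          # count per OSD, highest count first
        awk '{ print $2, $1 }'               # print as "<osd> <count>"
}
```

For example, `count_unknown_metrics /var/log/ceph/ceph-mgr.<mgr-id>.log` prints one line per OSD, with the OSDs that sent the most messages listed first.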

Environment

  • Red Hat Enterprise Linux 7.x
  • Red Hat Ceph Storage 3.x
