Why do OSDs crash frequently in RHCS?
Issue
- Why do OSDs crash intermittently in the SafeTimer::timer_thread call?
- Why does frequent rebalancing happen within the cluster when multiple OSDs are crashing?
- Why does ceph-mgr report an unknown health metric for a crashing OSD?
# grep -r "osd,10" /var/log/ceph/ceph-mgr.<mgr-id>.log
2018-06-01 03:06:16.528955 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:18.622329 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:20.759340 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:22.902219 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
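When several OSDs are affected, it may help to summarize the messages above per daemon instead of grepping for one OSD at a time. The following is a minimal sketch, not part of the original article: the function name count_unknown_metrics is hypothetical, and it assumes the log line layout shown above, where the token immediately before "sent me an unknown health metric" begins with the daemon name (for example osd,10) followed by a dot.

```shell
# Hypothetical helper: count "unknown health metric" messages per daemon.
# Reads ceph-mgr log lines on stdin and prints "<daemon> <count>" per line.
# Assumes the reporting token looks like "osd,10.<suffix>", as in the sample log.
count_unknown_metrics() {
    # Keep only matching lines and extract the token preceding "sent me ...".
    sed -n 's/.* \([^ ]*\) sent me an unknown health metric.*/\1/p' \
        | cut -d. -f1 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'   # reorder "count daemon" to "daemon count"
}
```

To use it, redirect the ceph-mgr log file from the grep command above into the function's standard input. A daemon that appears with a high count over a short time window is a reasonable candidate to check first for crashes.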
Environment
- Red Hat Enterprise Linux 7.x
- Red Hat Ceph Storage 3.x