Why do OSDs crash frequently in RHCS?
Issue
- Why do OSDs crash intermittently in the SafeTimer::timer_thread call?
- Why does frequent rebalancing happen within the cluster when multiple OSDs are crashing?
- Why does ceph-mgr report an unknown health metric for a crashing OSD?
# grep -r "osd,10" /var/log/ceph/ceph-mgr.<mgr-id>.log
2018-06-01 03:06:16.528955 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:18.622329 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:20.759340 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
2018-06-01 03:06:22.902219 abdababdabda -1 mgr.server send_report send_report osd,10.0xbabdbabdbabab sent me an unknown health metric: P
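To see which daemons trigger this message and how often, the matching log lines can be tallied. The following is a minimal sketch, not part of the original solution: the function name count_unknown_metrics and the regular expression are assumptions derived only from the sample log lines above, and may need adjusting for other log formats. It accepts any iterable of log lines, for example an open file handle on the ceph-mgr log.

```python
import re
from collections import Counter

# Assumed line shape, taken from the sample output above:
#   "... send_report <type>,<id>.<address> sent me an unknown health metric: <name>"
PATTERN = re.compile(
    r"send_report (?P<type>\w+),(?P<id>\d+)\.\S+ "
    r"sent me an unknown health metric: (?P<metric>\S+)"
)


def count_unknown_metrics(lines):
    """Count 'unknown health metric' messages per (daemon type, id, metric)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match["type"], int(match["id"]), match["metric"])
            counts[key] += 1
    return counts
```

A daemon whose count keeps growing between runs is still sending the unrecognized metric.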
Environment
- Red Hat Enterprise Linux 7.x
- Red Hat Ceph Storage 3.x
