How to handle inconsistent placement groups in Ceph?
Environment
- Red Hat Ceph Storage 1.2.3
- Red Hat Ceph Storage 1.3
- Red Hat Ceph Storage 2.x
- Red Hat Ceph Storage 3.x
Issue
- A "ceph status" or "ceph -s" reports inconsistent PGs. How can this be handled?
# ceph -s
30 active+clean+inconsistent
- A detailed probe with "ceph health detail" shows more information:
# ceph health detail
....
pg 11.eeef is active+clean+inconsistent, acting [106,427,854]
pg 5.ee92 is active+clean+inconsistent, acting [247,183,125]
.......
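- If many PGs are flagged, their IDs can be pulled out of the same output. A minimal sketch, assuming the "pg <pg.id> is active+clean+inconsistent ..." line format shown above:
# ceph health detail | awk '/^pg .*inconsistent/ {print $2}'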
Resolution
- Ceph offers the ability to repair inconsistent PGs with the "ceph pg repair" command. Before doing so, it is important to know exactly why the PGs are inconsistent, since some cases are dangerous to repair with this tool.
- Trigger a deep-scrub on the placement group:
# ceph pg deep-scrub 11.eeef
instructing pg 11.eeef on osd.106 to deep-scrub
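- To queue a deep-scrub on every inconsistent PG in one pass, the same extraction can feed a loop. A sketch, assuming the health detail line format shown earlier:
# for pg in $(ceph health detail | awk '/^pg .*inconsistent/ {print $2}'); do ceph pg deep-scrub $pg; done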
- Watch the Ceph cluster log for the result of the scrub:
# ceph -w | grep 11.eeef
2015-02-26 01:35:36.778215 osd.106 [ERR] 11.eeef deep-scrub stat mismatch, got 636/635 objects, 0/0 clones, 0/0 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 1855455/1854371 bytes.
2015-02-26 01:35:36.788334 osd.106 [ERR] 11.eeef deep-scrub 1 errors
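- If "ceph -w" was not running while the scrub completed, the same [ERR] lines are recorded in the cluster log on the monitor nodes, by default at /var/log/ceph/ceph.log:
# grep 11.eeef /var/log/ceph/ceph.log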
- The following errors are safe to repair, as of Firefly 0.80.7:
<pg.id> shard <osd>: soid <object> missing attr _, missing attr <attr type>
<pg.id> shard <osd>: soid <object> digest 0 != known digest <digest>, size 0 != known size <size>
<pg.id> shard <osd>: soid <object> size 0 != known size <size>
<pg.id> deep-scrub stat mismatch, got <mismatch>
<pg.id> shard <osd>: soid <object> candidate had a read error, digest 0 != known digest <digest>
- The read errors shown above can be repaired, as the HDD will reallocate the failing sector on the next write. Check the SMART (Self-Monitoring, Analysis and Reporting Technology) data on the affected disks to see whether they are growing additional bad sectors; disks that are should be failed preemptively.
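- One way to check for reallocated or pending sectors is with smartctl; /dev/sdX below is a placeholder for the OSD's backing device:
# smartctl -A /dev/sdX | grep -i -e reallocated -e pending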
- An example of repairing a placement group:
# ceph pg repair 11.eeef
instructing pg 11.eeef on osd.106 to repair
# ceph -w | grep 11.eeef
2015-02-26 01:49:28.164677 osd.106 [ERR] 11.eeef repair stat mismatch, got 636/635 objects, 0/0 clones, 0/0 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 1855455/1854371 bytes.
2015-02-26 01:49:28.164957 osd.106 [ERR] 11.eeef repair 1 errors, 1 fixed
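- Once the repair reports its errors fixed, the PG should return to active+clean. One way to confirm is to re-check the health detail; no output here means the PG is no longer flagged:
# ceph health detail | grep 11.eeef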
- Here are examples of errors that should not be repaired with the "ceph pg repair" utility, since the repair cannot always determine the authoritative copy of the object and may propagate a corrupt replica:
<pg.id> shard <osd>: soid <object> digest <digest> != known digest <digest>
<pg.id> shard <osd>: soid <object> omap_digest <digest> != known omap_digest <digest>
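- On Red Hat Ceph Storage 2.x and later, the "rados list-inconsistent-obj" command can show which shards disagree before any repair decision is made; the PG must have been deep-scrubbed recently for results to be available:
# rados list-inconsistent-obj 11.eeef --format=json-pretty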
