Chapter 13. Self-heal does not complete

If a self-heal operation never completes, the cause could be a Gluster File ID (GFID) mismatch, where the copies of a file on different bricks of a replicated volume have different GFIDs.

13.1. Gluster File ID mismatch

Diagnosis

  1. Check self-heal state.

    Run the following command several times over a few minutes. Note the entries that are shown.

    # gluster volume heal <volname> info

    If the same entries are shown each time, these entries may have a GFID mismatch.

  2. Check the GFID of each entry on each host.

    On each host, run the following command for each entry:

    # getfattr -d -m. -ehex <backend_path> -h

    The <backend_path> for an entry is the brick path followed by the entry path. For example, if the brick for the engine volume has the path /gluster_bricks/engine/engine and heal info shows the entry 58d392a6-e5b1-4aed-9bbc-952210a7137d/ha_agent/hosted-engine.metadata, use the backend_path /gluster_bricks/engine/engine/58d392a6-e5b1-4aed-9bbc-952210a7137d/ha_agent/hosted-engine.metadata.

  3. Compare the output from each host.

    If the trusted.gfid for an entry is not the same on all hosts, there is a GFID mismatch.
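Step 1 can be scripted. The following is a minimal sketch, assuming a POSIX shell on a host where the gluster command is available. The function names heal_entries, common_lines and sample_heal_info, and the default interval (60 seconds) and sample count (5), are hypothetical and not part of Gluster. The sketch keeps only the entry lines of each heal info run and prints the entries that appear in every sample.

```shell
#!/bin/sh
# Minimal sketch for step 1 (assumes a POSIX shell and the gluster CLI).
# Function names and default values are hypothetical.

# Read `gluster volume heal <volname> info` output on stdin and print only
# the entry lines, sorted and de-duplicated.
heal_entries() {
    grep -v -e '^Brick ' -e '^Status:' -e '^Number of entries' -e '^[[:space:]]*$' | sort -u
}

# Print the lines that occur in every file given as an argument.
common_lines() {
    result=$(sort -u "$1")
    shift
    for f in "$@"; do
        # Keep only the lines of $result that also appear, whole, in $f.
        result=$(printf '%s\n' "$result" | grep -Fx -f "$f")
    done
    [ -n "$result" ] && printf '%s\n' "$result"
}

# Take COUNT samples of heal info for VOLNAME, INTERVAL seconds apart, and
# print the entries present in every sample.
# Usage: sample_heal_info VOLNAME [INTERVAL] [COUNT]
sample_heal_info() {
    vol=$1
    interval=${2:-60}
    count=${3:-5}
    dir=$(mktemp -d) || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        gluster volume heal "$vol" info | heal_entries > "$dir/sample.$i"
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
    common_lines "$dir"/sample.*
    rm -rf "$dir"
}
```

For example, sample_heal_info engine 60 5 takes five samples one minute apart for the engine volume; any entry it prints is a candidate for the GFID checks in steps 2 and 3.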
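Steps 2 and 3 can also be sketched in shell. This sketch assumes passwordless root ssh from one host to every host that holds a brick for the volume, and that the brick path is the same on every host, as in the engine example. backend_path, extract_gfid, compare_gfids and check_entry are hypothetical helper names. extract_gfid keeps only the trusted.gfid line of the getfattr output, so trusted.gfid2path.* attributes are ignored.

```shell
#!/bin/sh
# Minimal sketch for steps 2 and 3. Assumes passwordless root ssh to each
# host and the same brick path on every host. Helper names are hypothetical.

# Join the brick path and the heal info entry without doubling the "/".
backend_path() {
    brick=${1%/}
    entry=${2#/}
    printf '%s/%s\n' "$brick" "$entry"
}

# Read `getfattr -d -m. -ehex` output on stdin and print the trusted.gfid
# value (the exact attribute name, not trusted.gfid2path.*).
extract_gfid() {
    sed -n 's/^trusted\.gfid=//p'
}

# Print "match" if all arguments are identical, otherwise print "mismatch"
# and return 1.
compare_gfids() {
    first=$1
    for value in "$@"; do
        if [ "$value" != "$first" ]; then
            echo mismatch
            return 1
        fi
    done
    echo match
}

# Usage: check_entry BRICK_PATH ENTRY HOST [HOST ...]
# Prints each host's trusted.gfid for the entry, then match or mismatch.
# A host that reports no trusted.gfid is recorded as "none".
check_entry() {
    path=$(backend_path "$1" "$2")
    shift 2
    gfids=""
    for host in "$@"; do
        # Path is single-quoted for the remote shell; assumes it has no quotes.
        g=$(ssh "root@$host" "getfattr -d -m. -ehex '$path' -h" 2>/dev/null | extract_gfid)
        printf '%s: %s\n' "$host" "${g:-no trusted.gfid found}"
        gfids="$gfids ${g:-none}"
    done
    # Left unquoted on purpose: each hex value becomes one argument.
    compare_gfids $gfids
}
```

For example, check_entry /gluster_bricks/engine/engine 58d392a6-e5b1-4aed-9bbc-952210a7137d/ha_agent/hosted-engine.metadata host1 host2 host3 prints the trusted.gfid seen on each host followed by match or mismatch; host1, host2 and host3 are placeholders for the real host names.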

Solution

  1. Resolve the mismatch in favor of the copy of the entry with the most recent modification time (mtime). Specify the entry as its path from the root of the volume:

    # gluster volume heal <volname> split-brain latest-mtime <entry>

    For example:

    # gluster volume heal engine split-brain latest-mtime /58d392a6-e5b1-4aed-9bbc-952210a7137d/ha_agent/hosted-engine.metadata

  2. Manually trigger a heal on the volume.

    # gluster volume heal <volname>
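The two solution steps can be combined into one helper, sketched below under the assumption that it runs on a host where the gluster command is available. volume_path and resolve_gfid_mismatch are hypothetical names; only the gluster commands come from the steps above. The helper adds the leading / when an entry was copied without one, so that it is a path from the volume root as in the example above. It resolves each entry with latest-mtime, then triggers the heal, and stops at the first entry that fails to resolve.

```shell
#!/bin/sh
# Minimal sketch of the solution steps (assumes the gluster CLI is available).
# volume_path and resolve_gfid_mismatch are hypothetical helper names.

# Print the entry as a path from the volume root, adding a leading "/" if
# it is missing.
volume_path() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *) printf '/%s\n' "$1" ;;
    esac
}

# Usage: resolve_gfid_mismatch VOLNAME ENTRY [ENTRY ...]
# Resolves each entry in favor of the copy with the latest mtime, then
# triggers a heal on the volume. Stops at the first entry that fails.
resolve_gfid_mismatch() {
    vol=$1
    shift
    for entry in "$@"; do
        gluster volume heal "$vol" split-brain latest-mtime "$(volume_path "$entry")" || return 1
    done
    gluster volume heal "$vol"
}
```

After the helper finishes, running gluster volume heal <volname> info again, as in the diagnosis, should show whether the entries have cleared.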