How to reduce number of heals from client fuse mount on replica volumes
Environment
- Red Hat Gluster Storage 3.X
Issue
- A high number of heals is observed and is not progressing. Gluster marks those entries as possibly damaged due to inconsistent data across the copies on the bricks. Here is an example of the output, omitting the file names:
#> gluster volume heal kcsvol info | egrep "Number of entries|Brick"
Brick 551371.dc02.its.kcs.net:/opt/gluster/kcsvol/brick2
Number of entries: 578
Brick 551444.dc02.its.kcs.net:/opt/gluster/kcsvol/brick2
Number of entries: 578
Brick 551414.dc02.its.kcs.net:/opt/gluster/kcsvol/brick2
Number of entries: 0
Resolution
We can try to clear the pending heals from a client on the mounted Gluster volume. On replicated volumes, accessing the data forces Gluster to heal the entries, but some volume options must be set for this to be effective:
- cluster.data-self-heal: Specifies whether proactive data self-healing on replicated volumes is activated.
#> gluster volume set $VOLNAME cluster.data-self-heal on
- cluster.metadata-self-heal: Specifies whether proactive metadata self-healing on replicated volumes is activated.
#> gluster volume set $VOLNAME cluster.metadata-self-heal on
- cluster.entry-self-heal: Specifies whether proactive self-healing of the contents of a directory on replicated volumes is activated.
#> gluster volume set $VOLNAME cluster.entry-self-heal on
- cluster.self-heal-daemon: Specifies whether proactive self-healing on replicated volumes is activated.
#> gluster volume set $VOLNAME cluster.self-heal-daemon on
(Where $VOLNAME is the name of the affected Gluster volume.)
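As an illustration, the four options can be set in one pass with a small shell loop. This is a minimal sketch; kcsvol is only used here as an example volume name:
#> for opt in cluster.data-self-heal cluster.metadata-self-heal cluster.entry-self-heal cluster.self-heal-daemon; do gluster volume set kcsvol "$opt" on; done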
After these settings are applied or confirmed, run the first or second approach below on a client where the volume is FUSE-mounted:
FIRST APPROACH
- Recursively listing the mounted volume forces the Gluster self-heal daemon to check every file that the command accesses. If a file returns an "I/O" error, that file is damaged.
#> ls -R <GLUSTER_FUSE_CLIENT_MOUNTED_VOLUME>
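To keep a record of which files returned I/O errors while crawling the mount, the listing can be redirected. This is only a sketch; the mount point /mnt/glusterVol and the log path are placeholders for your environment:
#> ls -R /mnt/glusterVol > /dev/null 2> /tmp/heal-io-errors.log
#> cat /tmp/heal-io-errors.log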
SECOND APPROACH
As a second attempt, if it is observed that the affected content belongs to one specific folder, for example /glusterVol/data/folder1, we can move or copy the related data into a temporary folder on the Gluster volume:
- Create a temporary folder:
#> mkdir /glusterVol/data/folder1.bak
- If the data is not being accessed, you can move it:
#> mv -v /glusterVol/data/folder1/* /glusterVol/data/folder1.bak/
- If the data is in use at this time, it can be copied instead:
#> cp -vR /glusterVol/data/folder1/* /glusterVol/data/folder1.bak/
Afterwards, the result can be checked with another heal info run, as shown below.
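For example, repeating the same command used in the Issue section (with $VOLNAME replaced by the affected volume name):
#> gluster volume heal $VOLNAME info | egrep "Number of entries|Brick"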
Root Cause
Gluster detects mismatches in the file timestamp metadata (access, modify, or creation time). The content matches in size, fragments, and GFIDs, but the self-heal daemon still flags those files as not healthy.
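If needed, the mismatch can be inspected directly on the bricks by comparing the timestamps and the extended attributes of the same file on each replica. This is a sketch; the brick and file paths below are only examples:
#> stat /opt/gluster/kcsvol/brick2/path/to/file
#> getfattr -d -m . -e hex /opt/gluster/kcsvol/brick2/path/to/file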
Diagnostic Steps
- A number of heals is observed that is not progressing. Check the type of the volume; here the Type is Replicate:
#> gluster volume info kcsvol
Volume Name: kcsvol
Type: Replicate
Volume ID: 7777777-4444-4f0f-b0c3-fb744f35bf33
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 551460.dc02.its.kcs.net:/opt/gluster/kcsvol/brick2
Brick2: 551465.dc02.its.kcs.net:/opt/gluster/kcsvol/brick2
Brick3: 551463.dc02.its.kcs.net:/opt/gluster/kcsvol/brick2 (arbiter)
Options Reconfigured:
performance.quick-read: off
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
server.ssl: on
client.ssl: on
performance.open-behind: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
auth.ssl-allow: {$LIST_OF_NODES}
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.read-only: on
cluster.server-quorum-ratio: 51%
cluster.enable-shared-storage: enable
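Recent Gluster releases also provide a per-brick summary of pending heals, which can help confirm whether the counters are decreasing over time. This is a sketch; availability of the summary sub-command depends on the installed version:
#> gluster volume heal kcsvol info summary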
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.