Why Are Some Gluster Files Being Constantly Healed?


Issue

  • Running the command gluster v heal test-vol info shows a stuck entry that needs to be healed (see the polling sketch after the output below):

    gluster volume heal test-vol info
    
    Brick node1:/brick1/brick
    file1
    Status: Connected
    Number of entries: 1
    
    Brick node2:/brick2/brick
    Status: Connected
    Number of entries: 0
    
    Brick node3:/brick3/arbiter
    file1
    Status: Connected
    Number of entries: 1
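
    One quick way to confirm that this is a genuinely stuck entry, rather than different files briefly passing through the heal queue, is to poll heal info a few times and check whether the same path keeps reappearing. A minimal sketch, reusing the volume name test-vol and the file name file1 from the output above:

    for i in 1 2 3; do
        date
        gluster volume heal test-vol info | grep -E 'Brick|Number of entries|file1'
        sleep 60
    done

    If the entry count never drops to zero and the same file is listed on every pass, the entry is stuck.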
    
  • When the extended attributes of this entry are collected, two of the nodes blame the third one as the copy that needs to be healed:

    [root@node1 ~]# getfattr -d -m . -e hex /brick1/brick/file1
    
    getfattr: Removing leading '/' from absolute path names
    # file: /brick1/brick/file1
    trusted.afr.dirty=0x000000000000000000000000
    trusted.afr.test-vol-client-1=0x0000001a0000000000000000
    trusted.gfid=0xd2dc2973dfff404a8e7b23e4d6e7b83b
    trusted.glusterfs.shard.block-size=0x0000000000400000
    trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000
    
    [root@node2 ~]# getfattr -d -m . -e hex /brick2/brick/file1
    
    getfattr: Removing leading '/' from absolute path names
    # file: /brick2/brick/file1
    trusted.afr.dirty=0x000000000000000000000000
    trusted.afr.test-vol-client-0=0x000000000000000000000000
    trusted.afr.test-vol-client-2=0x000000000000000000000000
    trusted.gfid=0xd2dc2973dfff404a8e7b23e4d6e7b83b
    trusted.glusterfs.shard.block-size=0x0000000000400000
    trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000
    
    [root@node3 ~]# getfattr -d -m . -e hex /brick3/arbiter/file1
    
    getfattr: Removing leading '/' from absolute path names
    # file: /brick3/arbiter/file1
    trusted.afr.dirty=0x000000000000000000000000
    trusted.afr.test-vol-client-1=0x0000001d0000000000000000
    trusted.gfid=0xd2dc2973dfff404a8e7b23e4d6e7b83b
    trusted.glusterfs.shard.block-size=0x0000000000400000
    trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000
    

    From the above, nodes 1 and 3 (clients 0 and 2 for the AFR xlator) agree that node 2 (client 1) is the one that needs to be healed: both of them carry a non-zero trusted.afr.test-vol-client-1 attribute, while node 2 blames neither of its peers. The self-heal daemon log at /var/log/glusterfs/glustershd.log confirms this; the file is being healed from those two nodes in a loop (a sketch for decoding these pending counters follows the log excerpt):

    [2020-06-29 12:11:25.034839] I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-test-vol-replicate-0: Completed data selfheal on d2dc2973-dfff-404a-8e7b-23e4d6e7b83b. sources=[0] 2  sinks=1
    [2020-06-29 12:11:27.819586] I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-test-vol-replicate-0: Completed data selfheal on d2dc2973-dfff-404a-8e7b-23e4d6e7b83b. sources=[0] 2  sinks=1
    [2020-06-29 12:41:32.100811] I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-test-vol-replicate-0: Completed data selfheal on d2dc2973-dfff-404a-8e7b-23e4d6e7b83b. sources=[0] 2  sinks=1
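
    The pending counters in the trusted.afr attributes shown earlier can be read directly: each trusted.afr.<volume>-client-<N> value is 12 bytes, made of three big-endian 32-bit counters for pending data, metadata and entry operations, in that order. A minimal sketch that decodes the value node1 reports for test-vol-client-1 (the hex string is copied from the getfattr output above):

    # Split the 12-byte changelog value into its three 32-bit counters:
    # pending data, metadata and entry operations.
    val=0000001a0000000000000000
    printf 'data=%d metadata=%d entry=%d\n' "0x${val:0:8}" "0x${val:8:8}" "0x${val:16:8}"
    # Prints: data=26 metadata=0 entry=0

    Node 3 reports 0x0000001d0000000000000000 for the same client, i.e. 29 pending data operations, consistent with the sources=[0] 2  sinks=1 lines above.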
    
  • Why is this occurring, and how can it be fixed?

Environment

Red Hat Gluster Storage version 3.x
