Red Hat Gluster Storage: Memory consumption of the Self-Heal daemon increases when toggling the "cluster.self-heal-daemon" volume set option

Solution Verified

Environment

  • Red Hat Gluster Storage 3.X

Issue

  • Memory consumption of the Self-Heal daemon increases when toggling the "cluster.self-heal-daemon" volume set option.
  • Why is the Self-Heal daemon consuming very high memory?
  • Self-Heal daemon memory consumption is very high.

Resolution

  • In general, disabling the self-heal daemon is not recommended, and toggling it very frequently is strictly prohibited.
  • If the recommendation above is followed, the chance of being affected by the identified leak is very low.
  • If you do hit the issue, apply the following workaround:

    • Find the PID of the Self-Heal daemon (glustershd):
    # ps aux | grep glustershd
    
    • Kill the glustershd process:
    # kill -9 <glustershd_pid>
    
    • Restart the volume with the "force" option:
    # gluster volume start <volname> force
    

    Note: The workaround above does not affect ongoing I/O or management traffic, so it is safe to apply. It is a generic workaround and can be used in any situation where high memory consumption is observed for the glustershd process.
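
    To confirm that the workaround took effect, you can check that glustershd came back under a new PID and that the self-heal daemon is shown as online in the volume status. VOLNAME below is a placeholder for one of your volume names:

    # ps aux | grep glustershd
    # gluster volume status VOLNAME

    The PID should differ from the one killed earlier, and the "Self-heal Daemon" entries in the status output should show "Y" in the Online column.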

Root Cause

  • This is a known issue: there is a memory leak in the graph-switch code path.
  • The issue is tracked in Bug 1529501, and work to fix the leak is in progress.
  • For the current status of the bug, follow it in Bugzilla or contact Red Hat Support.
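
  • If deeper confirmation is needed, glusterfs processes write a statedump when they receive SIGUSR1, and the dump can show which allocations grow across graph switches. A minimal sketch, assuming the default dump directory /var/run/gluster (it can differ depending on the server.statedump-path setting):

    # kill -USR1 $(pgrep -f glustershd)
    # ls -lt /var/run/gluster/glusterdump.*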

Diagnostic Steps

  • Run the volume set command for "cluster.self-heal-daemon" in a loop and observe the memory consumption of the glustershd process.

For example:

 # for i in {1..300}; do gluster volume set VOLNAME$i cluster.self-heal-daemon off; sleep 3; done

The 300 volumes (VOLNAME1 through VOLNAME300) must be created before running the above loop; a sketch follows below.
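
A minimal sketch for creating the test volumes, assuming three servers (server1 to server3, hypothetical hostnames) with brick directories already prepared; adjust the replica count and brick paths to your environment:

 # for i in {1..300}; do gluster volume create VOLNAME$i replica 3 server1:/bricks/brick$i/vol server2:/bricks/brick$i/vol server3:/bricks/brick$i/vol; gluster volume start VOLNAME$i; done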
  • After all the volume set operations, the self-heal daemon occupies almost 4.6 GiB of resident memory (RSS), as shown in the "ps" output below.

BEFORE VOL SET :

# ps aux|grep glus
root      8078 12.4  2.6 28807468 1315220 ?    Ssl  05:13   0:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/e32d8903c5b60efed5cc4e725235c143.socket --xlator-option *replicate*.node-uuid=cedc8e7d-d3a0-47f2-a50e-ebe12fe964bc

AFTER VOL SET :

# ps aux|grep glustershd
root      8078  3.0  9.4 31756588 4677648 ?    Ssl  05:13   3:56 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/e32d8903c5b60efed5cc4e725235c143.socket --xlator-option *replicate*.node-uuid=cedc8e7d-d3a0-47f2-a50e-ebe12fe964bc
  • The memory consumption keeps increasing with each toggle of the self-heal daemon option.
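
To watch the growth in real time while the toggles run, the resident set size (RSS) of glustershd can be sampled periodically; a minimal sketch (the 30-second interval is arbitrary):

 # while true; do ps -o pid,rss,vsz -p $(pgrep -f glustershd); sleep 30; done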
