What is the best method to improve self heal performance on replicated or disperse volumes on Red Hat Gluster Storage?

Environment

  • Red Hat Gluster Storage - 3.2+

Issue

  • Is there any way to run multiple self-heal daemons to speed up healing?
  • What is the best method to speed up healing when there is a large amount of data to be healed between Red Hat Gluster Storage replicate volume bricks?

Resolution

In the Red Hat Gluster Storage (RHGS) 3.2 release, a feature called "Multi-Threaded Self-Heal" was implemented. Prior to this release the self-heal daemon was single threaded, and that single-threaded architecture led to a CPU bottleneck that slowed down healing. Making the self-heal daemon multi-threaded distributes the load across multiple threads and CPU cores, so it can take better advantage of the multiple cores available in today's hardware and removes the single-thread bottleneck.

There are two tunables associated with Multi-Threaded Self-Heal (MTSH). Note that the tunables for disperse (erasure coded) and replicated (replica 2, replica 3, replica 3 + arbiter) volumes are different.

The tunables for replicated volumes are:

  1. SHD max threads:
    Option: cluster.shd-max-threads
    Default Value: 1
    Description: Maximum number of parallel heals SHD can do per local brick. This can substantially lower heal times, but can also crush your bricks if you don't have the storage hardware to support this.

  2. SHD wait queue length:
    Option: cluster.shd-wait-qlength
    Default Value: 1024
    Description: This option can be used to control the number of heals that can wait in SHD per subvolume
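
For example, a minimal sketch of checking the current values of both options on a replicated volume before tuning (the volume name is a placeholder):

gluster volume get <your volume> cluster.shd-max-threads
gluster volume get <your volume> cluster.shd-wait-qlength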

The tunables for disperse volumes are:

  1. Option: disperse.shd-max-threads
    Default Value: 1
    Description: Maximum number of parallel heals SHD can do per local brick. This can substantially lower heal times, but can also crush your bricks if you don't have the storage hardware to support this.

  2. Option: disperse.shd-wait-qlength
    Default Value: 1024
    Description: This option can be used to control the number of heals that can wait in SHD per subvolume
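
For a disperse volume the same pattern applies with the disperse.* option names. As a sketch (the volume name is a placeholder and the value of 4 is only an example), checking and then raising the thread count looks like:

gluster volume get <your volume> disperse.shd-max-threads
gluster volume set <your volume> disperse.shd-max-threads 4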

Multi-Threaded Self-Heal is documented in the Red Hat Gluster Storage Administration Guide, in the section:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/sect-managing_split-brain#Triggering_Self-Healing_on_Replicated_Volumes

After much testing, I found that the largest performance impact was realized when switching from 1 to 2 threads. The improvement continued to be significant from 2 to 4 threads, but the gains started to diminish at around 8 threads. Beyond 8 threads I saw significantly diminishing returns, and almost no improvement between 16 and 64 threads. This could be a case of limited resources, as my hardware appeared to be bottlenecked at the bricks (note the description above about crushing the bricks with too many threads). Given enough CPUs and fast enough storage, going over 8 threads may be helpful, but on the test setup I was using (8 CPUs, 12-disk RAID 6, 7.2k RPM SAS, 10G NIC) I did not see much improvement past 8 threads.

It should be noted that using a large number of SHD threads will have some impact on client I/O, and this should be taken into account when tuning SHD max threads. I normally recommend leaving SHD max threads at 2 the majority of the time. If I hit a situation where a lot of data needs to be healed, I tune SHD threads up to 4 or 8. If client-side I/O impact is a problem, I recommend tuning SHD threads up during times of lower access (say overnight or over a weekend) and back down when client access is heavy. Here is an example of tuning SHD max threads on a replica volume:

gluster volume set <your volume> cluster.shd-max-threads 4
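
If you raise the thread count for an off-hours heal window as described above, one way to return to the default afterwards is with gluster volume reset. As a sketch (the volume name is a placeholder and the value of 8 is only an example), the first command would be run at the start of the low-access window and the second once client access picks back up:

gluster volume set <your volume> cluster.shd-max-threads 8
gluster volume reset <your volume> cluster.shd-max-threads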

The second tunable, shd-wait-qlength, I normally leave at the default. I have not seen any situation where tuning this improved performance; I found the default of 1024 to be quite adequate. If there is a really large number of small files to heal, it may make sense to tune this up a bit, but given my hardware I never found a scenario where it was really helpful. Here is an example of tuning SHD wait qlength on a disperse volume:

gluster volume set <your volume> disperse.shd-wait-qlength 2048

Root Cause

  • The Gluster self-heal daemon was bottlenecked due to its single-threaded architecture.

Diagnostic Steps

Verify files need healing:

# gluster volume heal <your volume> info
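
If the heal info output lists a very large number of entries, a per-brick count can be easier to read. On replicated volumes this is available through the heal statistics sub-command (shown as a sketch, with the volume name as a placeholder):

# gluster volume heal <your volume> statistics heal-count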

Check current SHD thread count:

# gluster volume get <your volume> cluster.shd-max-threads

After validating the number of threads, follow the advice in the "Resolution" section above.
