What is the best method to improve self heal performance on replicated or disperse volumes on Red Hat Gluster Storage?
Environment
- Red Hat Gluster Storage - 3.2+
Issue
- Is there any way to run multiple self-heal daemon to speed up healing?
- What is the best method to speed up healing when there is huge data to be healed between Red Hat Gluster Storage replicate volume bricks?
Resolution
In the Red Hat Gluster Storage (RHGS) 3.2 release a feature called "Multi Threaded Self Heal" was implemented. Prior to this release the self-heal daemon was single-threaded, and this single-threaded architecture led to a CPU bottleneck that slowed down healing. Making the self-heal daemon multi-threaded distributes the load across multiple threads and CPU cores. This multi-threaded architecture takes better advantage of the multiple CPU cores available in today's hardware, removing the single-thread bottleneck.
There are two different tunables associated with Multi Threaded Self Heal (MTSH). Note that the tunables for disperse (erasure-coded) and replicated (replica 2, replica 3, replica 3 + arbiter) volumes are different.
The tunables for replicated volumes are listed below; an example of checking their current values follows the list.
- SHD max threads:
  Option: cluster.shd-max-threads
  Default Value: 1
  Description: Maximum number of parallel heals SHD can do per local brick. This can substantially lower heal times, but can also crush your bricks if you don't have the storage hardware to support this.
- SHD wait queue length:
  Option: cluster.shd-wait-qlength
  Default Value: 1024
  Description: This option can be used to control the number of heals that can wait in SHD per subvolume.
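Before raising either option, it is worth confirming the current values on the volume. This is a minimal check; <your volume> is a placeholder for the actual volume name:
gluster volume get <your volume> cluster.shd-max-threads
gluster volume get <your volume> cluster.shd-wait-qlength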
The tunables for disperse volumes are listed below; an example of setting the thread count on a disperse volume follows the list.
- SHD max threads:
  Option: disperse.shd-max-threads
  Default Value: 1
  Description: Maximum number of parallel heals SHD can do per local brick. This can substantially lower heal times, but can also crush your bricks if you don't have the storage hardware to support this.
- SHD wait queue length:
  Option: disperse.shd-wait-qlength
  Default Value: 1024
  Description: This option can be used to control the number of heals that can wait in SHD per subvolume.
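Setting the thread count on a disperse volume works the same way as on a replicated volume, just with the disperse.* option name; the value of 4 below is only illustrative:
gluster volume set <your volume> disperse.shd-max-threads 4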
Multi-threaded self-heal is documented in the following section of the Red Hat Gluster Storage Administration Guide:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/sect-managing_split-brain#Triggering_Self-Healing_on_Replicated_Volumes
After much testing I found that the largest performance impact was realized when switching from 1 to 2 threads. The improvement continued to be significant from 2 to 4 threads, but the performance boost started to diminish at about 8 threads. Beyond 8 threads I saw significantly diminishing returns, and almost no improvement between 16 and 64 threads. This could be a case of limited resources, as my hardware looked to be bottlenecked at the bricks (note the option description about crushing the bricks with too many threads). Given enough CPUs and fast enough storage, going over 8 threads may be helpful, but on the test setup I was using (8 CPUs, 12-disk RAID 6, 7.2k RPM SAS, 10G NIC) I didn't see much improvement past 8 threads.
It should be noted that using a large number of SHD threads will have some impact on your client I/O, and this should be taken into account when tuning SHD max threads. I normally recommend leaving SHD max threads at 2 the majority of the time. If I hit a situation where lots of data needs to be healed, I tune SHD threads up to 4 or 8. If client-side I/O impact is a problem, I recommend tuning SHD threads up during times of lower access (say, overnight or over a weekend) and back down when client access is heavy. Here is an example of tuning SHD max threads on a replica volume:
gluster volume set <your volume> cluster.shd-max-threads 4
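If the thread count is raised only temporarily (for example, overnight as described above), it can be tuned back down, or reset to the volume default, once the heal backlog clears. The volume name and value below are placeholders:
gluster volume set <your volume> cluster.shd-max-threads 2
gluster volume reset <your volume> cluster.shd-max-threads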
The second tunable, shd-wait-qlength, I normally leave at the default. I haven't seen any situation where tuning this improved performance; I found the default of 1024 to be quite adequate. If there is a really large number of small files it may make sense to tune this up a bit, but given my hardware I never found a scenario where it was really helpful. Here is an example of tuning SHD wait queue length on a disperse volume:
gluster volume set <your volume> disperse.shd-wait-qlength 2048
Root Cause
- The Gluster self-heal daemon was bottlenecked due to its single-threaded architecture.
Diagnostic Steps
Verify files need healing:
# gluster volume heal <your volume> info
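Depending on the RHGS version in use, a per-brick count of entries pending heal can also be pulled with the heal statistics command, which is often quicker to read than the full file listing:
# gluster volume heal <your volume> statistics heal-count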
Check current SHD thread count:
# gluster volume get <your volume> cluster.shd-max-threads
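Optionally, all SHD-related settings can be reviewed in one pass by filtering the full option list (assuming grep is available on the node):
# gluster volume get <your volume> all | grep shd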
After validating the number of threads, follow the advice found in the "Resolution" section.