Software-Emulated Watchdog Known Limitations
Overview
In a Red Hat Enterprise Linux (RHEL) High Availability cluster that uses, for example, storage-based fencing (fence_sbd), it is recommended that a watchdog is enabled. This watchdog ensures that a node being fenced is able to reboot itself, preventing resources from running on more than one node at the same time, which could cause data integrity issues.
The watchdog timer device used by SBD should not be one emulated by software, as is done by the softdog driver. Such an emulated watchdog operates within the limitations and available resources of the kernel itself, and therefore cannot be guaranteed to carry out the necessary reboot action if the operating system is malfunctioning or starved of resources. Other nodes in the cluster may then assume that the node has been fenced and rebooted when in fact it has not, which leads to an inconsistent cluster state or possible data loss.
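As a rough illustration, the following shell sketch shows one way to check whether a node's SBD configuration points at a software-emulated watchdog; the device names, output, and file locations shown here are assumptions and may differ between RHEL releases and sbd versions.

    # Show which driver backs the watchdog device known to the kernel
    wdctl /dev/watchdog0

    # If the softdog module is loaded, the watchdog is software-emulated
    lsmod | grep softdog

    # Check which watchdog device SBD is configured to use
    grep SBD_WATCHDOG_DEV /etc/sysconfig/sbd

When the softdog driver backs the device, wdctl typically reports an identity along the lines of "Software Watchdog" rather than a hardware controller.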
This article discusses known limitations of software-emulated watchdogs and configurations in which their use is strongly discouraged, so that end users can evaluate whether the risks of such a configuration are acceptable within a particular installation.
Known Issues
Software Lockup
When a node uses a software-emulated watchdog, the action of rebooting itself in response to a fence event is executed in the same environment as all other processes on the node. In the case of a software lockup, or when the system is starved of resources, the kernel is unable to schedule any processes and is therefore also no longer able to carry out its own reboot.
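To make the limitation concrete, the sketch below (an illustrative assumption, not a recommended configuration) shows how the softdog module is typically loaded. Both its timer and the reboot it triggers on expiry are executed by the kernel itself, so a lockup that prevents the kernel from running also prevents the emulated watchdog from acting. Parameter names and defaults may vary by kernel version.

    # Load the software-emulated watchdog with a 15 second margin (illustrative value)
    modprobe softdog soft_margin=15

    # The emulated device appears alongside any hardware watchdogs
    ls -l /dev/watchdog*

    # Kernel log confirms the timer is implemented inside the kernel
    dmesg | grep -i softdog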
Virtualized Platforms
When cluster nodes run as virtual machines on a hypervisor, they may be impacted by, for example, Live Migration or suspend/resume activities. The guest OS has no control over when and how often a Live Migration occurs. During such an activity (after the Live Migration preparation), the guest OS is temporarily suspended so that the migration can complete and operations can continue on a different hypervisor. If a fence event occurs during the Live Migration, the software watchdog may be affected by this temporary suspension and may therefore not be able to complete the self-fencing and execute a reboot.
Particularly Sensitive Workloads
Shared Data Access
In a setup where cluster nodes have access to Resilient Storage (GFS2, SMB/CIFS), it is particularly important to ensure a consistent state across all cluster nodes. In (but not limited to) one of the circumstances mentioned above, where a cluster node is not able to reboot itself after a fence event, already running processes may continue to write to the shared storage. The other node(s) in the cluster trust that the node has been fenced and that its processes are therefore no longer running and no longer writing to the shared storage. If, because of the failed reboot, those processes are still able to write to the shared storage, this may lead to data inconsistency or even data loss. This situation differs from shared-storage setups with, for example, SCSI fencing configured, which forcibly prevents the fenced node from writing to the shared storage.
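For comparison, a storage-side fencing method such as fence_scsi relies on SCSI-3 persistent reservations that the storage itself enforces, so a misbehaving node is blocked from writing regardless of whether it manages to reboot. As a hedged sketch (the device path is a placeholder and option spellings may vary by sg3_utils version), the registrations on a shared LUN can be inspected like this:

    # Show which node keys are currently registered on the shared device
    sg_persist --in --read-keys --device=/dev/disk/by-id/example-shared-lun

    # Show the active reservation; a fenced node's key is removed so its writes are rejected
    sg_persist --in --read-reservation --device=/dev/disk/by-id/example-shared-lun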
Mission-Critical Workloads
Because of the scenarios described above, Red Hat does not recommend using a software-emulated watchdog with SBD fencing for production or mission-critical workloads. Red Hat Enterprise Linux High Availability clusters should be designed to minimize the risk of external events affecting the availability and data integrity of the cluster. In contrast to a hardware watchdog or other fencing mechanisms, a software-emulated watchdog is less reliable and more likely to compromise availability and data integrity.