32.4. Detecting and Recovering from Successive Crashed Nodes

Red Hat JBoss Data Grid is unable to distinguish whether a node left the cluster because of a process or machine crash, or because of a network failure.
If a single node exits the cluster and the value of numOwners is greater than 1, the cluster remains available and JBoss Data Grid attempts to create new replicas of the lost data. However, if additional nodes crash during this rebalancing process, all copies of some entries may have left the cluster, in which case those entries cannot be recovered.
The recommended way to protect the data grid against successive crashed nodes is to enable partition handling (see Section 32.6, “Configure Partition Handling” for instructions) and to set numOwners to an appropriately high value. With enough owners per entry, at least one copy of each entry is likely to survive even if a large number of nodes leave the cluster in rapid succession, and JBoss Data Grid is able to rebalance the cluster to restore the lost replicas.
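
The effect of numOwners on recoverability can be illustrated with a minimal sketch. It is a simplified model, not JBoss Data Grid code: the class CrashSketch, the method isRecoverable, and the node names are hypothetical, and the sketch assumes that no rebalancing completes between the crashes.

```java
import java.util.List;
import java.util.Set;

public class CrashSketch {

    // Hypothetical helper, not a JBoss Data Grid API: an entry stays
    // recoverable as long as at least one of its owners is still running.
    public static boolean isRecoverable(List<String> owners, Set<String> crashed) {
        for (String owner : owners) {
            if (!crashed.contains(owner)) {
                return true; // a surviving copy exists, so it can be re-replicated
            }
        }
        return false; // every owner crashed before new replicas were created
    }

    public static void main(String[] args) {
        // Two nodes crash in rapid succession (assumed node names).
        Set<String> crashed = Set.of("nodeA", "nodeB");

        // The same entry stored with numOwners = 2 and with numOwners = 3.
        List<String> twoOwners = List.of("nodeA", "nodeB");
        List<String> threeOwners = List.of("nodeA", "nodeB", "nodeC");

        System.out.println("numOwners=2 recoverable: " + isRecoverable(twoOwners, crashed));
        System.out.println("numOwners=3 recoverable: " + isRecoverable(threeOwners, crashed));
    }
}
```

In this model the entry with two owners is lost when both of its owners crash, while the entry with three owners keeps one copy on nodeC and can be replicated again. A real cluster assigns owners through its consistent hash, so the sketch only shows the loss condition, not how owners are chosen.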