Automatically Taking Sites Offline with Asynchronous Cross-Site Replication in a JDG Cluster
Environment
- Red Hat JBoss Data Grid (JDG)
  - 7.3.2+
Issue
- How to know if the site is being taken offline automatically when configuring the after-failures parameter?
- How to know if the site is being taken offline automatically when configuring the min-wait parameter?
Resolution
Data Grid applies the take-offline configuration when using Cross-Site replication capabilities.
The following configuration is an example that takes the backup site offline automatically 20 seconds (min-wait="20000" milliseconds) after the first failed backup operation:
<backups>
    <backup site="site01" strategy="ASYNC">
        <take-offline after-failures="-1" min-wait="20000"/>
    </backup>
</backups>
- after-failures - the number of failed backup operations after which this site should be taken offline. Defaults to 0 (never). A negative value means that the site is taken offline once the min-wait time (minTimeToWait) has passed.
- min-wait - the number of milliseconds during which a site is not marked offline even if it has been unreachable after-failures number of times. If smaller than or equal to 0, then only after-failures is considered.
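As an illustration of how the two parameters combine, the variant below sets both. The values 5 and 20000 are assumptions chosen for this example, not recommendations. Going by the descriptions above, the site would be taken offline only once at least 5 backup operations have failed and 20 seconds have passed.

```xml
<backups>
    <backup site="site01" strategy="ASYNC">
        <!-- Illustrative values only: offline after at least 5 failed backup
             operations, but not before 20000 ms have elapsed -->
        <take-offline after-failures="5" min-wait="20000"/>
    </backup>
</backups>
```

Setting min-wait="0" in this sketch would leave only the failure count in play.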
NOTE: Automatically taking sites offline with strategy="ASYNC" is only available in JDG 7.3.2 and later. In earlier releases, it applies only to strategy="SYNC".
Diagnostic Steps
Enable TRACE level log messages for class org.infinispan.xsite.OfflineStatus.
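One possible way to enable this is to add a logger entry inside the logging subsystem of the server configuration file. This is a sketch that assumes a server-mode installation; the file (for example clustered.xml) depends on how the server is started.

```xml
<!-- Sketch: place inside the existing logging subsystem element;
     adjust to your own configuration -->
<logger category="org.infinispan.xsite.OfflineStatus">
    <level name="TRACE"/>
</logger>
```

The handler that writes server.log must also accept TRACE level messages for these lines to appear.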
When using the after-failures parameter, search for "min failures reached" in the server.log file as follows:
2019-07-12 16:08:09,654 TRACE [org.infinispan.xsite.OfflineStatus] (jgroups-45,jdg-d-cachesrv-01) Site is failed: min failures reached.
2019-07-12 16:08:09,654 INFO [org.infinispan.CLUSTER] (jgroups-45,jdg-d-cachesrv-01) [Context=api-general-filestore][Context=jdg-d-cachesrv-01]ISPN100006: Site 'site02' is offline.
When using the min-wait parameter, search for "The minTimeToWait has passed" in the server.log file as follows:
2019-07-15 15:11:36,371 TRACE [org.infinispan.xsite.OfflineStatus] (HotRod-ServerHandler-7-56) The minTimeToWait has passed: minTime=20000, timeSinceFirstFailure=38378
Note that Infinispan only updates the site status when it needs to replicate data (for example, a put operation) to the backup site, so these messages appear only after a write is attempted.