Chapter 13. Setting up cross-site replication

Ensure availability with Data Grid Operator by configuring geographically distributed clusters as a unified service.

You can configure clusters to perform cross-site replication with:

  • Connections that Data Grid Operator manages.
  • Connections that you configure and manage.

Note

You can use both managed and manual connections for Data Grid clusters in the same Infinispan CR. You must ensure that Data Grid clusters establish connections in the same way at each site.

13.1. Cross-site replication expose types

You can use a NodePort service, a LoadBalancer service, or an OpenShift Route to handle network traffic for backup operations between Data Grid clusters. Before you set up cross-site replication, determine which expose types are available for your Red Hat OpenShift cluster. In some cases, an administrator must provision services before you can configure an expose type.

NodePort

A NodePort is a service that accepts network traffic at a static port, in the 30000 to 32767 range, on an IP address that is available externally to the OpenShift cluster.

To use a NodePort as the expose type for cross-site replication, an administrator must provision external IP addresses for each OpenShift node. In most cases, an administrator must also configure DNS routing for those external IP addresses.

LoadBalancer

A LoadBalancer is a service that directs network traffic to the correct node in the OpenShift cluster.

Whether you can use a LoadBalancer as the expose type for cross-site replication depends on the host platform. AWS supports network load balancers (NLB), while some other cloud platforms do not. To use a LoadBalancer service, an administrator must first create an ingress controller backed by an NLB.

Route

An OpenShift Route allows Data Grid clusters to connect with each other through a public secure URL.

Data Grid uses TLS with the SNI header to send backup requests between clusters through an OpenShift Route. To do this, you must add a keystore with TLS certificates so that Data Grid can encrypt network traffic for cross-site replication.

When you specify Route as the expose type for cross-site replication, Data Grid Operator creates a route with TLS passthrough encryption for each Data Grid cluster that it manages. You can specify a hostname for the Route, but you cannot specify a Route that you have already created.
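Each expose type corresponds to a small expose fragment under spec.service.sites.local in the Infinispan CR. The following bash sketch prints that fragment for the three expose types in this section and rejects NodePort values outside the 30000 to 32767 range. The function name xsite_expose_fragment is hypothetical, and the hostname in the usage example is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the expose fragment that goes under
# spec.service.sites.local for the NodePort, LoadBalancer, or Route types.
# Usage: xsite_expose_fragment <NodePort|LoadBalancer|Route> [port-or-hostname]
xsite_expose_fragment() {
  local type=$1 value=${2:-} extra=""
  case "$type" in
    NodePort)
      if [ -n "$value" ]; then
        # A static NodePort must fall within the default 30000-32767 range.
        if ! [[ "$value" =~ ^[1-9][0-9]*$ ]] || (( value < 30000 || value > 32767 )); then
          echo "error: nodePort $value is outside 30000-32767" >&2
          return 1
        fi
        extra="nodePort: $value"
      fi
      ;;
    LoadBalancer)
      # If no port is set, the LoadBalancer service uses port 7900.
      [ -n "$value" ] && extra="port: $value"
      ;;
    Route)
      # If no hostname is set, OpenShift generates one for the Route.
      [ -n "$value" ] && extra="routeHostName: $value"
      ;;
    *)
      echo "error: unsupported expose type '$type'" >&2
      return 1
      ;;
  esac
  printf 'expose:\n  type: %s\n' "$type"
  [ -n "$extra" ] && printf '  %s\n' "$extra"
  return 0
}
```

For example, xsite_expose_fragment Route xsite-lon.apps.example.com prints a fragment with type: Route and routeHostName: xsite-lon.apps.example.com, which you could paste under spec.service.sites.local.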

13.2. Managed cross-site replication

Data Grid Operator can discover Data Grid clusters running in different data centers to form global clusters.

When you configure managed cross-site connections, Data Grid Operator creates router pods in each Data Grid cluster. Data Grid pods use the <cluster_name>-site service to connect to these router pods and send backup requests.

Router pods maintain a record of all pod IP addresses and parse RELAY message headers to forward backup requests to the correct Data Grid cluster. If a router pod crashes, all Data Grid pods use any other available router pod until OpenShift restores it.

Important

To manage cross-site connections, Data Grid Operator uses the Kubernetes API. Each OpenShift cluster must have network access to the remote Kubernetes API and a service account token for each backup cluster.

Note

Data Grid clusters do not start running until Data Grid Operator discovers all backup locations that you configure.

13.2.1. Creating service account tokens for managed cross-site connections

Generate service account tokens on OpenShift clusters that allow Data Grid Operator to automatically discover Data Grid clusters and manage cross-site connections.

Prerequisites

  • Ensure all OpenShift clusters have access to the Kubernetes API.
    Data Grid Operator uses this API to manage cross-site connections.

    Note

    Data Grid Operator does not modify remote Data Grid clusters. The service account tokens provide read-only access through the Kubernetes API.

Procedure

  1. Log in to an OpenShift cluster.
  2. Create a service account.

    For example, create a service account at LON:

    oc create sa -n <namespace> lon
  3. Add the view role to the service account with the following command:

    oc policy add-role-to-user view -n <namespace> -z lon
  4. If you use a NodePort service to expose Data Grid clusters on the network, you must also add the cluster-reader role to the service account:

    oc adm policy add-cluster-role-to-user cluster-reader -z lon -n <namespace>
  5. Repeat the preceding steps on your other OpenShift clusters.
  6. Exchange service account tokens on each OpenShift cluster.
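Steps 2 to 4 can be wrapped in a script that you run once per cluster. The following bash sketch issues the same oc commands against the cluster that you are logged in to; the function name create_xsite_sa and its optional nodeport argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper that runs steps 2 to 4 on the cluster that oc is
# currently logged in to.
# Usage: create_xsite_sa <namespace> <service-account> [nodeport]
create_xsite_sa() {
  local ns=$1 sa=$2 expose=${3:-}
  oc create sa -n "$ns" "$sa" || return 1
  # The view role gives the token read-only access through the Kubernetes API.
  oc policy add-role-to-user view -n "$ns" -z "$sa" || return 1
  if [ "$expose" = "nodeport" ]; then
    # The NodePort expose type also requires the cluster-reader role.
    oc adm policy add-cluster-role-to-user cluster-reader -z "$sa" -n "$ns" || return 1
  fi
}
```

For example, create_xsite_sa rhdg-namespace lon nodeport performs steps 2 to 4 at LON for a NodePort deployment.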

13.2.2. Exchanging service account tokens

Generate service account tokens on your OpenShift clusters and add them to secrets at each backup location. The tokens that you generate in this procedure do not expire. For bound service account tokens, see Exchanging bound service account tokens.

Prerequisites

  • You have created a service account.

Procedure

  1. Log in to your OpenShift cluster.
  2. Create a service account token secret file as follows:

    sa-token.yaml

    apiVersion: v1
    kind: Secret
    metadata:
      name: ispn-xsite-sa-token # Name of the secret
      annotations:
        kubernetes.io/service-account.name: "<service-account>" # Name of the service account
    type: kubernetes.io/service-account-token
  3. Create the secret in your OpenShift cluster:

    oc -n <namespace> create -f sa-token.yaml
  4. Retrieve the service account token:

    oc -n <namespace> get secrets ispn-xsite-sa-token -o jsonpath="{.data.token}" | base64 -d

    The command prints the token in the terminal.

  5. Copy the token for deployment in the backup OpenShift cluster.
  6. Log in to the backup OpenShift cluster.
  7. Add the service account token for a backup location:

    oc -n <namespace> create secret generic <token-secret> --from-literal=token=<token>

    Replace <token-secret> with the name of the secret that you set in the spec.service.sites.locations.secretName field of the Infinispan CR on this cluster.

Next steps

  • Repeat the preceding steps on your other OpenShift clusters.
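If your kubeconfig file has a context for each OpenShift cluster, you can combine steps 4 to 7 of the procedure above without copying the token by hand. The following bash sketch assumes hypothetical context names passed with the oc --context option, the same namespace on both clusters, and the ispn-xsite-sa-token secret from step 3. The function name exchange_sa_token is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copies a non-expiring service account token from the
# cluster that owns the service account to a backup cluster.
# Usage: exchange_sa_token <src-context> <dst-context> <namespace> <token-secret>
exchange_sa_token() {
  local src=$1 dst=$2 ns=$3 token_secret=$4 token
  # Read and decode the token from the ispn-xsite-sa-token secret (step 4).
  token=$(oc --context "$src" -n "$ns" get secrets ispn-xsite-sa-token \
    -o jsonpath="{.data.token}" | base64 --decode) || return 1
  if [ -z "$token" ]; then
    echo "error: no token found in ispn-xsite-sa-token" >&2
    return 1
  fi
  # Same "token" key as --from-literal=token=<token> in step 7.
  oc --context "$dst" -n "$ns" create secret generic "$token_secret" \
    --from-literal=token="$token"
}
```

For example, exchange_sa_token nyc-ctx lon-ctx rhdg-namespace nyc-token reads the NYC token and stores it at LON as nyc-token, the secretName used in the LON example CR.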

13.2.3. Exchanging bound service account tokens

Create service account tokens with a limited lifespan and add them to secrets at each backup location. You must refresh the token periodically to prevent Data Grid Operator from losing access to the remote OpenShift cluster. For non-expiring tokens, see Exchanging service account tokens.

Prerequisites

  • You have created a service account.

Procedure

  1. Log in to your OpenShift cluster.
  2. Create a bound token for the service account:

    oc -n <namespace> create token <service-account>

    Note

    By default, service account tokens are valid for one hour. Use the --duration option to request a different lifespan. The value needs a time unit, for example --duration=3600s or --duration=24h.

    The command prints the token in the terminal.

  3. Copy the token for deployment in the backup OpenShift clusters.
  4. Log in to the backup OpenShift cluster.
  5. Add the service account token for a backup location:

    oc -n <namespace> create secret generic <token-secret> --from-literal=token=<token>

    Replace <token-secret> with the name of the secret that you set in the spec.service.sites.locations.secretName field of the Infinispan CR on this cluster.

  6. Repeat the preceding steps on your other OpenShift clusters.

Deleting expired tokens

When a token expires, delete the expired token secret, and then repeat the procedure to generate and exchange a new one.

  1. Log in to the backup OpenShift cluster.
  2. Delete the expired secret <token-secret>:

    oc -n <namespace> delete secrets <token-secret>
  3. Repeat the procedure to create a new token and generate a new <token-secret>.
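Because bound tokens expire, you might script the whole refresh cycle: create a bound token, delete the expired secret, and create a new one. The following bash sketch combines the bound token procedure with the deletion steps above. The context names, the function name rotate_bound_token, and the 24h default lifespan are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: issues a new bound token on the source cluster and
# replaces the token secret on the backup cluster.
# Usage: rotate_bound_token <src-context> <dst-context> <namespace> \
#          <service-account> <token-secret> [duration]
rotate_bound_token() {
  local src=$1 dst=$2 ns=$3 sa=$4 token_secret=$5 duration=${6:-24h} token
  token=$(oc --context "$src" -n "$ns" create token "$sa" --duration="$duration") || return 1
  # Remove the expired secret; --ignore-not-found avoids failing on a first run.
  oc --context "$dst" -n "$ns" delete secret "$token_secret" --ignore-not-found || return 1
  oc --context "$dst" -n "$ns" create secret generic "$token_secret" \
    --from-literal=token="$token"
}
```

Run it before the current token expires, for example from a scheduled job, so that Data Grid Operator does not lose access to the remote cluster.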

13.2.4. Configuring managed cross-site connections

Configure Data Grid Operator to establish cross-site views with Data Grid clusters.

Prerequisites

  • Determine a suitable expose type for cross-site replication.
    If you use an OpenShift Route, you must add a keystore with TLS certificates and secure cross-site connections.
  • Create and exchange Red Hat OpenShift service account tokens for each Data Grid cluster.

Procedure

  1. Create an Infinispan CR for each Data Grid cluster.
  2. Specify the name of the local site with spec.service.sites.local.name.
  3. Configure the expose type for cross-site replication.

    1. Set the value of the spec.service.sites.local.expose.type field to one of the following:

      • NodePort
      • LoadBalancer
      • Route
    2. Optionally specify a port or custom hostname with the following fields:

      • spec.service.sites.local.expose.nodePort if you use a NodePort service.
      • spec.service.sites.local.expose.port if you use a LoadBalancer service.
      • spec.service.sites.local.expose.routeHostName if you use an OpenShift Route.
  4. Specify the number of pods that can send RELAY messages with the spec.service.sites.local.maxRelayNodes field.

    Tip

    Configure all pods in your cluster to send RELAY messages for better performance. If all pods send backup requests directly, then no pods need to forward backup requests.

  5. Provide the name, URL, and secret for each Data Grid cluster that acts as a backup location with spec.service.sites.locations.
  6. If Data Grid cluster names or namespaces at the remote site do not match the local site, specify those values with the clusterName and namespace fields.

    The following are example Infinispan CR definitions for LON and NYC:

    • LON

      apiVersion: infinispan.org/v1
      kind: Infinispan
      metadata:
        name: infinispan
      spec:
        replicas: 3
        version: <Data Grid_version>
        service:
          type: DataGrid
          sites:
            local:
              name: LON
              expose:
                type: LoadBalancer
                port: 65535
              maxRelayNodes: 1
            locations:
              - name: NYC
                clusterName: <nyc_cluster_name>
                namespace: <nyc_cluster_namespace>
                url: openshift://api.rhdg-nyc.openshift-aws.myhost.com:6443
                secretName: nyc-token
        logging:
          categories:
            org.jgroups.protocols.TCP: error
            org.jgroups.protocols.relay.RELAY2: error
    • NYC

      apiVersion: infinispan.org/v1
      kind: Infinispan
      metadata:
        name: nyc-cluster
      spec:
        replicas: 2
        version: <Data Grid_version>
        service:
          type: DataGrid
          sites:
            local:
              name: NYC
              expose:
                type: LoadBalancer
                port: 65535
              maxRelayNodes: 1
            locations:
              - name: LON
                clusterName: infinispan
                namespace: rhdg-namespace
                url: openshift://api.rhdg-lon.openshift-aws.myhost.com:6443
                secretName: lon-token
        logging:
          categories:
            org.jgroups.protocols.TCP: error
            org.jgroups.protocols.relay.RELAY2: error

      Important

      Be sure to adjust logging categories in your Infinispan CR to decrease log levels for JGroups TCP and RELAY2 protocols. This prevents log files from consuming large amounts of container storage.

      spec:
        logging:
          categories:
            org.jgroups.protocols.TCP: error
            org.jgroups.protocols.relay.RELAY2: error
  7. Configure your Infinispan CRs with any other Data Grid service resources and then apply the changes.
  8. Verify that Data Grid clusters form a cross-site view.

    1. Retrieve the Infinispan CR.

      oc get infinispan -o yaml
    2. Check for the type: CrossSiteViewFormed condition.

Next steps

If your clusters have formed a cross-site view, you can start adding backup locations to caches.

13.3. Manually configuring cross-site connections

You can specify static network connection details to perform cross-site replication with Data Grid clusters running outside OpenShift. Manual cross-site connections are necessary in any scenario where access to the Kubernetes API is not available outside the OpenShift cluster where Data Grid runs.

Prerequisites

  • Determine a suitable expose type for cross-site replication.
    If you use an OpenShift Route, you must add a keystore with TLS certificates and secure cross-site connections.
  • Ensure you have the correct host names and ports for each Data Grid cluster and each <cluster-name>-site service.

    Manually connecting Data Grid clusters to form cross-site views requires predictable network locations for Data Grid services, which means you need to know the network locations before they are created.

Procedure

  1. Create an Infinispan CR for each Data Grid cluster.
  2. Specify the name of the local site with spec.service.sites.local.name.
  3. Configure the expose type for cross-site replication.

    1. Set the value of the spec.service.sites.local.expose.type field to one of the following:

      • NodePort
      • LoadBalancer
      • Route
    2. Optionally specify a port or custom hostname with the following fields:

      • spec.service.sites.local.expose.nodePort if you use a NodePort service.
      • spec.service.sites.local.expose.port if you use a LoadBalancer service.
      • spec.service.sites.local.expose.routeHostName if you use an OpenShift Route.
  4. Provide the name and static URL for each Data Grid cluster that acts as a backup location with spec.service.sites.locations, for example:

    • LON

      apiVersion: infinispan.org/v1
      kind: Infinispan
      metadata:
        name: infinispan
      spec:
        replicas: 3
        version: <Data Grid_version>
        service:
          type: DataGrid
          sites:
            local:
              name: LON
              expose:
                type: LoadBalancer
                port: 65535
              maxRelayNodes: 1
            locations:
              - name: NYC
                url: infinispan+xsite://infinispan-nyc.myhost.com:7900
        logging:
          categories:
            org.jgroups.protocols.TCP: error
            org.jgroups.protocols.relay.RELAY2: error
    • NYC

      apiVersion: infinispan.org/v1
      kind: Infinispan
      metadata:
        name: infinispan
      spec:
        replicas: 2
        version: <Data Grid_version>
        service:
          type: DataGrid
          sites:
            local:
              name: NYC
              expose:
                type: LoadBalancer
                port: 65535
              maxRelayNodes: 1
            locations:
              - name: LON
                url: infinispan+xsite://infinispan-lon.myhost.com
        logging:
          categories:
            org.jgroups.protocols.TCP: error
            org.jgroups.protocols.relay.RELAY2: error

      Important

      Be sure to adjust logging categories in your Infinispan CR to decrease log levels for JGroups TCP and RELAY2 protocols. This prevents log files from consuming large amounts of container storage.

      spec:
        logging:
          categories:
            org.jgroups.protocols.TCP: error
            org.jgroups.protocols.relay.RELAY2: error
  5. Configure your Infinispan CRs with any other Data Grid service resources and then apply the changes.
  6. Verify that Data Grid clusters form a cross-site view.

    1. Retrieve the Infinispan CR.

      oc get infinispan -o yaml
    2. Check for the type: CrossSiteViewFormed condition.
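Manual connections depend on static URLs in the infinispan+xsite://<hostname>:<port> format, where the port defaults to 7900. The following bash sketch checks that a URL has this shape and prints the host and port that it resolves to, so you can compare them with the host names and ports that you gathered as a prerequisite. The function name parse_xsite_url is hypothetical, and the check covers syntax only, not reachability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validates a static cross-site URL and prints
# "<hostname> <port>", applying the default port 7900 when none is given.
# Usage: parse_xsite_url <url>
parse_xsite_url() {
  local url=$1 re='^infinispan\+xsite://([A-Za-z0-9.-]+)(:([1-9][0-9]{0,4}))?$'
  if ! [[ "$url" =~ $re ]]; then
    echo "error: expected infinispan+xsite://<hostname>[:<port>], got '$url'" >&2
    return 1
  fi
  local host=${BASH_REMATCH[1]} port=${BASH_REMATCH[3]:-7900}
  # TCP ports stop at 65535.
  if (( port < 1 || port > 65535 )); then
    echo "error: port $port is out of range" >&2
    return 1
  fi
  echo "$host $port"
}
```

For example, parse_xsite_url infinispan+xsite://infinispan-lon.myhost.com prints infinispan-lon.myhost.com 7900.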

Next steps

If your clusters have formed a cross-site view, you can start adding backup locations to caches.

13.4. Allocating CPU and memory for the Gossip router pod

Allocate CPU and memory resources to the Data Grid Gossip router.

Prerequisite

  • Have the Gossip router enabled. The spec.service.sites.local.discovery.launchGossipRouter field must be set to true, which is the default value.

Procedure

  1. Allocate the number of CPU units using the spec.service.sites.local.discovery.cpu field.
  2. Allocate the amount of memory, in bytes, using the spec.service.sites.local.discovery.memory field.

    The cpu and memory fields have values in the format of <limit>:<requests>. For example, cpu: "2000m:1000m" limits pods to a maximum of 2000m of CPU and requests 1000m of CPU for each pod at startup. Specifying a single value sets both the limit and request.

  3. Apply your Infinispan CR.

spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        discovery:
          launchGossipRouter: true
          memory: "2Gi:1Gi"
          cpu: "2000m:1000m"
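The <limit>:<requests> values follow the limit and request semantics of Kubernetes container resources. The following bash sketch splits a value as described in step 2, with a single value used for both the limit and the request. The function name split_limit_request is hypothetical, and it does not verify that the limit is at least the request or that the quantities are well formed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: splits a "<limit>:<requests>" value such as "2Gi:1Gi"
# and prints "limit=<limit> request=<requests>". A value without a colon
# sets both the limit and the request.
# Usage: split_limit_request <value>
split_limit_request() {
  local value=$1 limit request
  if [[ "$value" == *:* ]]; then
    limit=${value%%:*}   # text before the first colon
    request=${value#*:}  # text after the first colon
  else
    limit=$value
    request=$value
  fi
  if [ -z "$limit" ] || [ -z "$request" ] || [[ "$request" == *:* ]]; then
    echo "error: expected <limit>:<requests> or a single value, got '$value'" >&2
    return 1
  fi
  echo "limit=$limit request=$request"
}
```

For example, split_limit_request 2000m:1000m prints limit=2000m request=1000m.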

13.5. Disabling local Gossip router and service

Data Grid Operator starts a Gossip router at each site, but you need only a single Gossip router to manage traffic between the Data Grid cluster members. You can disable the additional Gossip routers to save resources.

For example, if you have Data Grid clusters at the LON and NYC sites, the following procedure shows how to disable the Gossip router at the LON site and connect to the NYC site, where the Gossip router remains enabled.

Procedure

  1. Create an Infinispan CR for each Data Grid cluster.
  2. Specify the name of the local site with the spec.service.sites.local.name field.
  3. For the LON cluster, set false as the value for the spec.service.sites.local.discovery.launchGossipRouter field.
  4. For the LON cluster, specify the URL of the NYC site with the spec.service.sites.locations.url field.
  5. In the NYC configuration, do not specify the spec.service.sites.locations.url.

    LON

    apiVersion: infinispan.org/v1
    kind: Infinispan
    metadata:
      name: infinispan
    spec:
      replicas: 3
      service:
        type: DataGrid
        sites:
          local:
            name: LON
            discovery:
              launchGossipRouter: false
          locations:
            - name: NYC
              url: infinispan+xsite://infinispan-nyc.myhost.com:7900

    NYC

    apiVersion: infinispan.org/v1
    kind: Infinispan
    metadata:
      name: infinispan
    spec:
      replicas: 3
      service:
        type: DataGrid
        sites:
          local:
            name: NYC
          locations:
            - name: LON
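If the LON cluster already runs with a Gossip router, you might apply step 3 in place instead of re-creating the CR. The following bash sketch applies a JSON merge patch with oc patch. The function name set_gossip_router and the default CR name infinispan are assumptions. It changes only launchGossipRouter, so you still need to set the NYC URL from step 4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sets spec.service.sites.local.discovery.launchGossipRouter
# on an existing Infinispan CR with a JSON merge patch.
# Usage: set_gossip_router <namespace> <true|false> [infinispan-cr-name]
set_gossip_router() {
  local ns=$1 enabled=$2 name=${3:-infinispan}
  case "$enabled" in
    true|false) ;;
    *) echo "error: expected true or false, got '$enabled'" >&2; return 1 ;;
  esac
  # A merge patch merges nested objects, so other site settings are kept.
  oc -n "$ns" patch infinispan "$name" --type=merge \
    -p "{\"spec\":{\"service\":{\"sites\":{\"local\":{\"discovery\":{\"launchGossipRouter\":$enabled}}}}}}"
}
```

Patching only this field is deliberate: JSON merge patches replace arrays instead of merging them, so a patch that also included spec.service.sites.locations would overwrite the whole list.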

Important

If you have three or more sites, keep the Gossip router enabled at all remote sites. When you have multiple Gossip routers and one of them becomes unavailable, the remaining routers continue exchanging messages. If only a single Gossip router is defined and it becomes unavailable, the connection between the remote sites breaks.

Next steps

If your clusters have formed a cross-site view, you can start adding backup locations to caches.


13.6. Resources for configuring cross-site replication

The following tables provide fields and descriptions for cross-site resources.

Table 13.1. service.type

service.type: DataGrid
  Data Grid supports cross-site replication with Data Grid service clusters only.

Table 13.2. service.sites.local

service.sites.local.name
  Names the local site where a Data Grid cluster runs.

service.sites.local.maxRelayNodes
  Specifies the maximum number of pods that can send RELAY messages for cross-site replication. The default value is 1.

service.sites.local.discovery.launchGossipRouter
  If false, the cross-site services and the Gossip router pod are not created at the local site. The default value is true.

service.sites.local.discovery.memory
  Allocates the amount of memory in bytes, in the format <limit>:<requests>, for example "2Gi:1Gi".

service.sites.local.discovery.cpu
  Allocates the number of CPU units, in the format <limit>:<requests>, for example "2000m:1000m".

service.sites.local.expose.type
  Specifies the network service for cross-site replication. Data Grid clusters use this service to communicate and perform backup operations. You can set the value to NodePort, LoadBalancer, or Route.

service.sites.local.expose.nodePort
  Specifies a static port within the default range of 30000 to 32767 if you expose Data Grid through a NodePort service. If you do not specify a port, the platform selects an available one.

service.sites.local.expose.port
  Specifies the network port for the service if you expose Data Grid through a LoadBalancer service. The default port is 7900.

service.sites.local.expose.routeHostName
  Specifies a custom hostname if you expose Data Grid through an OpenShift Route. If you do not set a value, OpenShift generates a hostname.

Table 13.3. service.sites.locations

service.sites.locations
  Provides connection information for all backup locations.

service.sites.locations.name
  Specifies the name of a backup location. The value must match spec.service.sites.local.name in the Infinispan CR at that location.

service.sites.locations.url
  Specifies the URL of the Kubernetes API for managed connections or a static URL for manual connections.
  Use openshift:// to specify the URL of the Kubernetes API for an OpenShift cluster. The Kubernetes API at the openshift:// URL must present a valid, CA-signed certificate; you cannot use self-signed certificates.
  Use the infinispan+xsite://<hostname>:<port> format for static hostnames and ports. The default port is 7900.

service.sites.locations.secretName
  Specifies the secret that contains the service account token for the backup site.

service.sites.locations.clusterName
  Specifies the cluster name at the backup location if it is different from the cluster name at the local site.

service.sites.locations.namespace
  Specifies the namespace of the Data Grid cluster at the backup location if it does not match the namespace at the local site.

Managed cross-site connections

spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: LoadBalancer
        maxRelayNodes: 1
      locations:
      - name: NYC
        clusterName: <nyc_cluster_name>
        namespace: <nyc_cluster_namespace>
        url: openshift://api.site-b.devcluster.openshift.com:6443
        secretName: nyc-token

Manual cross-site connections

spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: LoadBalancer
          port: 65535
        maxRelayNodes: 1
      locations:
      - name: NYC
        url: infinispan+xsite://infinispan-nyc.myhost.com:7900

13.7. Securing cross-site connections

Add keystores and trust stores so that Data Grid clusters can secure cross-site replication traffic.

You must add a keystore to use an OpenShift Route as the expose type for cross-site replication. Securing cross-site connections is optional if you use a NodePort or LoadBalancer as the expose type.

Note

Cross-site replication does not support the OpenShift CA service. You must provide your own certificates.

Prerequisites

  • Have a PKCS12 keystore that Data Grid can use to encrypt and decrypt RELAY messages.

    You must provide a keystore for relay pods and router pods to secure cross-site connections.
    The keystore can be the same for relay pods and router pods or you can provide separate keystores for each.
    You can also use the same keystore for each Data Grid cluster or a unique keystore for each cluster.

  • Have a PKCS12 trust store that contains part of the certificate chain or root CA certificate that verifies public certificates for Data Grid relay pods and router pods.

Procedure

  1. Create cross-site encryption secrets.

    1. Create keystore secrets.
    2. Create trust store secrets.
  2. Modify the Infinispan CR for each Data Grid cluster to specify the secret names in the spec.service.sites.local.encryption.transportKeyStore.secretName and spec.service.sites.local.encryption.routerKeyStore.secretName fields.
  3. Configure any other fields to encrypt RELAY messages as required and then apply the changes.

    apiVersion: infinispan.org/v1
    kind: Infinispan
    metadata:
      name: infinispan
    spec:
      replicas: 2
      version: <Data Grid_version>
      expose:
        type: LoadBalancer
      service:
        type: DataGrid
        sites:
          local:
            name: SiteA
            # ...
            encryption:
              protocol: TLSv1.3
              transportKeyStore:
                secretName: transport-tls-secret
                alias: transport
                filename: keystore.p12
              routerKeyStore:
                secretName: router-tls-secret
                alias: router
                filename: keystore.p12
              trustStore:
                secretName: truststore-tls-secret
                filename: truststore.p12
          locations:
            # ...
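Step 1 does not show a command for creating the secrets. Assuming you already have PKCS12 files, one option is oc create secret generic, which stores the keystore, password, and type under the keys described in Cross-site encryption secrets. The function name create_xsite_tls_secret is hypothetical, and the file paths and password in the usage example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates a cross-site encryption secret from a PKCS12 file.
# Usage: create_xsite_tls_secret <namespace> <secret-name> <path-to-p12> <password> [key-name]
create_xsite_tls_secret() {
  local ns=$1 name=$2 path=$3 password=$4 key=${5:-keystore.p12}
  if [ ! -f "$path" ]; then
    echo "error: $path not found" >&2
    return 1
  fi
  # The key name must match the filename field in the Infinispan CR.
  oc -n "$ns" create secret generic "$name" \
    --from-file="$key=$path" \
    --from-literal=password="$password" \
    --from-literal=type=pkcs12
}
```

For example, create_xsite_tls_secret rhdg-namespace transport-tls-secret ./transport.p12 changeme creates the transport-tls-secret used in the CR above, and passing truststore.p12 as the last argument creates a trust store secret that matches the default trust store filename. Passing a password on the command line can leave it in shell history, so adapt this for real credentials.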

13.7.1. Resources for configuring cross-site encryption

The following tables provide fields and descriptions for encrypting cross-site connections.

Table 13.4. service.sites.local.encryption

service.sites.local.encryption.protocol
  Specifies the TLS protocol to use for cross-site connections. The default value is TLSv1.2, but you can set TLSv1.3 if required.

service.sites.local.encryption.transportKeyStore
  Configures a keystore secret for relay pods.

service.sites.local.encryption.routerKeyStore
  Configures a keystore secret for router pods.

service.sites.local.encryption.trustStore
  Configures a trust store secret for relay pods and router pods.

Table 13.5. service.sites.local.encryption.transportKeyStore

secretName
  Specifies the secret that contains a keystore that relay pods can use to encrypt and decrypt RELAY messages. This field is required.

alias
  Optionally specifies the alias of the certificate in the keystore. The default value is transport.

filename
  Optionally specifies the filename of the keystore. The default value is keystore.p12.

Table 13.6. service.sites.local.encryption.routerKeyStore

secretName
  Specifies the secret that contains a keystore that router pods can use to encrypt and decrypt RELAY messages. This field is required.

alias
  Optionally specifies the alias of the certificate in the keystore. The default value is router.

filename
  Optionally specifies the filename of the keystore. The default value is keystore.p12.

Table 13.7. service.sites.local.encryption.trustStore

secretName
  Specifies the secret that contains a trust store to verify public certificates for relay pods and router pods. This field is required.

filename
  Optionally specifies the filename of the trust store. The default value is truststore.p12.

13.7.2. Cross-site encryption secrets

Cross-site encryption secrets add the keystores and trust stores that secure cross-site connections.

Cross-site encryption secrets

apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
type: Opaque
stringData:
  password: changeme
  type: pkcs12
data:
  <file-name>: "MIIKDgIBAzCCCdQGCSqGSIb3DQEHA..."

stringData.password
  Specifies the password for the keystore or trust store.

stringData.type
  Optionally specifies the keystore or trust store type. The default value is pkcs12.

data.<file-name>
  Adds a base64-encoded keystore or trust store. The key must match the filename value in the Infinispan CR.
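If you prefer to review the secret as a manifest before you apply it, you can build one from a local PKCS12 file. The following bash sketch prints a Secret with the layout shown in Cross-site encryption secrets. The function name xsite_tls_secret_manifest is hypothetical, and to stay short it rejects passwords that contain double quotes or backslashes instead of escaping them for YAML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a cross-site encryption Secret manifest.
# Usage: xsite_tls_secret_manifest <secret-name> <path-to-p12> <password> [file-name]
xsite_tls_secret_manifest() {
  local name=$1 path=$2 password=$3 file=${4:-keystore.p12} encoded
  if [ ! -f "$path" ]; then
    echo "error: $path not found" >&2
    return 1
  fi
  case "$password" in
    *'"'*|*'\'*)
      echo "error: password must not contain double quotes or backslashes" >&2
      return 1
      ;;
  esac
  # Encode the binary store and strip the line wrapping that base64 adds.
  encoded=$(base64 < "$path" | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $name
type: Opaque
stringData:
  password: "$password"
  type: pkcs12
data:
  $file: "$encoded"
EOF
}
```

For example, xsite_tls_secret_manifest router-tls-secret ./router.p12 changeme > router-tls-secret.yaml writes a manifest that you can create with oc create -f router-tls-secret.yaml.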

13.8. Configuring sites in the same OpenShift cluster

For evaluation and demonstration purposes, you can configure Data Grid to back up between pods in the same OpenShift cluster.

Important

Using ClusterIP as the expose type for cross-site replication is intended for demonstration purposes only. Use this expose type only for temporary proof-of-concept deployments, such as on a laptop.

Procedure

  1. Create an Infinispan CR for each Data Grid cluster.
  2. Specify the name of the local site with spec.service.sites.local.name.
  3. Set ClusterIP as the value of the spec.service.sites.local.expose.type field.
  4. Provide the site name and the Data Grid cluster name of the backup location with spec.service.sites.locations.name and spec.service.sites.locations.clusterName.
  5. If both Data Grid clusters have the same name, specify the namespace of the backup location with spec.service.sites.locations.namespace.

    apiVersion: infinispan.org/v1
    kind: Infinispan
    metadata:
      name: example-clustera
    spec:
      replicas: 1
      expose:
        type: LoadBalancer
      service:
        type: DataGrid
        sites:
          local:
            name: SiteA
            expose:
              type: ClusterIP
            maxRelayNodes: 1
          locations:
            - name: SiteB
              clusterName: example-clusterb
              namespace: cluster-namespace
  6. Configure your Infinispan CRs with any other Data Grid service resources and then apply the changes.
  7. Verify that Data Grid clusters form a cross-site view.

    1. Retrieve the Infinispan CR.

      oc get infinispan -o yaml
    2. Check for the type: CrossSiteViewFormed condition.
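The procedure shows only the SiteA CR. A matching CR for SiteB swaps the local site and the backup location. The following bash sketch prints such a CR; the SiteA namespace is not given above, so it is an argument, and the function name siteb_cr is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an Infinispan CR for SiteB that backs up to
# SiteA (example-clustera) in the same OpenShift cluster through ClusterIP.
# Usage: siteb_cr <siteA-namespace>
siteb_cr() {
  local sitea_ns=$1
  if [ -z "$sitea_ns" ]; then
    echo "error: SiteA namespace required" >&2
    return 1
  fi
  cat <<EOF
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-clusterb
spec:
  replicas: 1
  expose:
    type: LoadBalancer
  service:
    type: DataGrid
    sites:
      local:
        name: SiteB
        expose:
          type: ClusterIP
        maxRelayNodes: 1
      locations:
        - name: SiteA
          clusterName: example-clustera
          namespace: $sitea_ns
EOF
}
```

Apply the output in cluster-namespace, the namespace that the SiteA CR specifies for SiteB, for example with siteb_cr <siteA-namespace> | oc -n cluster-namespace apply -f -.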