Chapter 2. The Embedded CacheManager

The CacheManager is Red Hat Data Grid’s main entry point. You use a CacheManager to

  • configure and obtain caches
  • manage and monitor your nodes
  • execute code across a cluster
  • and more

Depending on whether you are embedding Red Hat Data Grid in your application or you are using it remotely, you will be dealing with either an EmbeddedCacheManager or a RemoteCacheManager. While they share some methods and properties, be aware that there are semantic differences between them. The following chapters focus mostly on the embedded implementation. For details on the remote implementation refer to Hot Rod Java Client.

CacheManagers are heavyweight objects, and we foresee no more than one CacheManager being used per JVM (unless specific setups require more than one; either way, the number of instances should be minimal and finite).

The simplest way to create a CacheManager is:

EmbeddedCacheManager manager = new DefaultCacheManager();

which starts the most basic, local mode, non-clustered cache manager with no caches. CacheManagers have a lifecycle, and the default constructors also call start(). Overloaded versions of the constructors are available that do not start the CacheManager, but keep in mind that a CacheManager needs to be started before it can be used to create Cache instances.
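For example, a minimal sketch of constructing a manager without starting it immediately (this assumes your version provides the constructor overload that takes a boolean start flag):

// Construct the manager but do not start it yet (overload with a boolean 'start' flag).
EmbeddedCacheManager manager = new DefaultCacheManager(false);

// ... any setup that must happen before startup ...

// The manager must be started before caches can be obtained.
manager.start();
Cache<Object, Object> cache = manager.getCache();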

Once constructed, a CacheManager should be made available to any component that needs to interact with it via some application-wide scope such as JNDI, a ServletContext, or another mechanism such as an IoC container.

When you are done with a CacheManager, you must stop it so that it can release its resources:

manager.stop();

This ensures that all caches within its scope are properly stopped and that thread pools are shut down. If the CacheManager was clustered, it also leaves the cluster gracefully.
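If your version's EmbeddedCacheManager implements AutoCloseable (an assumption worth checking), a try-with-resources block is a convenient way to guarantee the manager is stopped:

try (EmbeddedCacheManager manager = new DefaultCacheManager()) {
   Cache<String, String> cache = manager.getCache();
   cache.put("key", "value");
} // close() stops the manager, releasing caches, thread pools and cluster membership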

2.1. Configuration

Red Hat Data Grid offers both declarative and programmatic configuration.

2.1.1. Configuring caches declaratively

Declarative configuration comes in the form of an XML document that adheres to the provided Red Hat Data Grid configuration XML schema.

Every aspect of Red Hat Data Grid that can be configured declaratively can also be configured programmatically. In fact, declarative configuration, behind the scenes, invokes the programmatic configuration API as the XML configuration file is being processed. You can even combine these approaches. For example, you can read static XML configuration files and then programmatically tune that same configuration at runtime, or use a static configuration defined in XML as a starting point or template for defining additional configurations at runtime.

There are two main configuration abstractions in Red Hat Data Grid: global and cache.

Global configuration

Global configuration defines global settings shared among all cache instances created by a single EmbeddedCacheManager. Shared resources like thread pools, serialization/marshalling settings, transport and network settings, JMX domains are all part of global configuration.

Cache configuration

Cache configuration is specific to the actual caching domain itself: it specifies eviction, locking, transaction, clustering, persistence etc. You can specify as many named cache configurations as you need. One of these caches can be indicated as the default cache, which is the cache returned by the CacheManager.getCache() API, whereas other named caches are retrieved via the CacheManager.getCache(String name) API.

Named caches, wherever they are specified, inherit settings from the default cache, and additional behavior can be specified or overridden. Red Hat Data Grid also provides a very flexible inheritance mechanism, where you can define a hierarchy of configuration templates, allowing multiple caches to share the same settings or override specific parameters as necessary.

Note

Embedded and Server configuration use different schemas, but we strive to maintain them as compatible as possible so that you can easily migrate between the two.

One of the major goals of Red Hat Data Grid is to aim for zero configuration. A simple XML configuration file containing nothing more than a single infinispan element is enough to get you started. The configuration file listed below provides sensible defaults and is perfectly valid.

infinispan.xml

<infinispan />

However, that would only give you the most basic, local mode, non-clustered cache manager with no caches. Non-basic configurations are very likely to use customized global and default cache elements.

Declarative configuration is the most common approach to configuring Red Hat Data Grid cache instances. To read XML configuration files, you typically construct an instance of DefaultCacheManager by pointing it at an XML file containing the Red Hat Data Grid configuration. Once the configuration file is read, you can obtain a reference to the default cache instance.

EmbeddedCacheManager manager = new DefaultCacheManager("my-config-file.xml");
Cache defaultCache = manager.getCache();

or any other named instance specified in my-config-file.xml.

Cache someNamedCache = manager.getCache("someNamedCache");

The name of the default cache is defined in the <cache-container> element of the XML configuration file, and additional caches can be configured using the <local-cache>, <distributed-cache>, <invalidation-cache> or <replicated-cache> elements.

The following example shows the simplest possible configuration for each of the cache types supported by Red Hat Data Grid:

<infinispan>
   <cache-container default-cache="local">
      <transport cluster="mycluster"/>
      <local-cache name="local"/>
      <invalidation-cache name="invalidation" mode="SYNC"/>
      <replicated-cache name="repl-sync" mode="SYNC"/>
      <distributed-cache name="dist-sync" mode="SYNC"/>
   </cache-container>
</infinispan>

2.1.1.1. Cache configuration templates

As mentioned above, Red Hat Data Grid supports the notion of configuration templates. These are full or partial configuration declarations which can be shared among multiple caches or used as the basis for more complex configurations.

The following example shows how a configuration named local-template is used to define a cache named local.

<infinispan>
   <cache-container default-cache="local">
      <!-- template configurations -->
      <local-cache-configuration name="local-template">
         <expiration interval="10000" lifespan="10" max-idle="10"/>
      </local-cache-configuration>

      <!-- cache definitions -->
      <local-cache name="local" configuration="local-template" />
   </cache-container>
</infinispan>

Templates can inherit from previously defined templates, augmenting and/or overriding some or all of the configuration elements:

<infinispan>
   <cache-container default-cache="local">
      <!-- template configurations -->
      <local-cache-configuration name="base-template">
         <expiration interval="10000" lifespan="10" max-idle="10"/>
      </local-cache-configuration>

      <local-cache-configuration name="extended-template" configuration="base-template">
         <expiration lifespan="20"/>
         <memory>
            <object size="2000"/>
         </memory>
      </local-cache-configuration>

      <!-- cache definitions -->
      <local-cache name="local" configuration="base-template" />
      <local-cache name="local-bounded" configuration="extended-template" />
   </cache-container>
</infinispan>

In the above example, base-template defines a local cache with a specific expiration configuration. The extended-template configuration inherits from base-template, overriding just a single parameter of the expiration element (all other attributes are inherited) and adds a memory element. Finally, two caches are defined: local which uses the base-template configuration and local-bounded which uses the extended-template configuration.

Warning

Be aware that for multi-valued elements (such as properties) the inheritance is additive, i.e. the child configuration is the result of merging its own properties with those of the parent.

2.1.1.2. Cache configuration wildcards

An alternative way to apply templates to caches is to use wildcards in the template name, e.g. basecache*. Any cache whose name matches the template wildcard will inherit that configuration.

<infinispan>
    <cache-container>
        <local-cache-configuration name="basecache*">
            <expiration interval="10500" lifespan="11" max-idle="11"/>
        </local-cache-configuration>
        <local-cache name="basecache-1"/>
        <local-cache name="basecache-2"/>
    </cache-container>
</infinispan>

Above, caches basecache-1 and basecache-2 will use the basecache* configuration. The configuration will also be applied when retrieving undefined caches programmatically.
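For example, with the configuration above, retrieving a cache that is not declared in the file but whose name matches the wildcard still picks up the template (a small sketch; the cache name is hypothetical):

// "basecache-3" is not defined in the XML, but its name matches "basecache*",
// so the cache is created with the wildcard template's expiration settings.
Cache<String, String> cache = manager.getCache("basecache-3");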

Note

If a cache name matches multiple wildcards, i.e. it is ambiguous, an exception will be thrown.

2.1.1.3. Declarative configuration reference

For more details on the declarative configuration schema, refer to the configuration reference. If you are using XML editing tools for configuration writing you can use the provided Red Hat Data Grid schema to assist you.

2.1.2. Configuring caches programmatically

Programmatic Red Hat Data Grid configuration is centered around the CacheManager and ConfigurationBuilder API. Although every aspect of Red Hat Data Grid configuration can be set programmatically, the most common approach is to create a starting point in the form of an XML configuration file and then, if needed, programmatically tune a specific configuration at runtime to best suit the use case.

EmbeddedCacheManager manager = new DefaultCacheManager("my-config-file.xml");
Cache defaultCache = manager.getCache();

Let’s assume that a new synchronously replicated cache is to be configured programmatically. First, a fresh Configuration object is created using the ConfigurationBuilder helper, with the cache mode set to synchronous replication. Finally, the configuration is defined/registered with the manager.

Configuration c = new ConfigurationBuilder().clustering().cacheMode(CacheMode.REPL_SYNC).build();

String newCacheName = "repl";
manager.defineConfiguration(newCacheName, c);
Cache<String, String> cache = manager.getCache(newCacheName);

The default cache configuration (or any other cache configuration) can be used as a starting point for creating a new cache. For example, let's say that infinispan-config-file.xml specifies a replicated cache as the default, and that a distributed cache with a specific L1 lifespan is desired while retaining all other aspects of the default cache. The starting point is therefore to read the default Configuration object and use ConfigurationBuilder to construct a new Configuration object with the cache mode and L1 lifespan modified. As a final step, the configuration is defined/registered with the manager.

EmbeddedCacheManager manager = new DefaultCacheManager("infinispan-config-file.xml");
Configuration dcc = manager.getDefaultCacheConfiguration();
Configuration c = new ConfigurationBuilder().read(dcc).clustering().cacheMode(CacheMode.DIST_SYNC).l1().lifespan(60000L).build();
 
String newCacheName = "distributedWithL1";
manager.defineConfiguration(newCacheName, c);
Cache<String, String> cache = manager.getCache(newCacheName);

As long as the base configuration is the default named cache, the previous code works perfectly fine. However, at other times the base configuration might be another named cache. So, how can new configurations be defined based on other defined caches? Take the previous example and imagine that instead of taking the default cache as the base, a named cache called "replicatedCache" is used as the base. The code would look something like this:

EmbeddedCacheManager manager = new DefaultCacheManager("infinispan-config-file.xml");
Configuration rc = manager.getCacheConfiguration("replicatedCache");
Configuration c = new ConfigurationBuilder().read(rc).clustering().cacheMode(CacheMode.DIST_SYNC).l1().lifespan(60000L).build();
 
String newCacheName = "distributedWithL1";
manager.defineConfiguration(newCacheName, c);
Cache<String, String> cache = manager.getCache(newCacheName);

Refer to the CacheManager, ConfigurationBuilder, Configuration, and GlobalConfiguration javadocs for more details.

2.1.2.1. ConfigurationBuilder Programmatic Configuration API

While the above section shows how to combine declarative and programmatic configuration, starting from an XML configuration is completely optional. The ConfigurationBuilder fluent interface makes programmatic configuration easier to write and more readable. This approach can be used for both global and cache-level configuration. GlobalConfiguration objects are constructed using GlobalConfigurationBuilder, while Configuration objects are built using ConfigurationBuilder. Let's look at some examples of configuring both global and cache-level options with this API:

2.1.2.2. Configuring the transport

One of the most commonly configured global options is the transport layer, where you indicate how a Red Hat Data Grid node will discover the others:

GlobalConfiguration globalConfig = new GlobalConfigurationBuilder().transport()
        .defaultTransport()
        .clusterName("qa-cluster")
        .addProperty("configurationFile", "jgroups-tcp.xml")
        .machineId("qa-machine").rackId("qa-rack")
        .build();

2.1.2.3. Using a custom JChannel

If you want to construct the JGroups JChannel by yourself, you can do so.

Note

The JChannel must not already be connected.

GlobalConfigurationBuilder global = new GlobalConfigurationBuilder();
JChannel jchannel = new JChannel();
// Configure the jchannel to your needs.
JGroupsTransport transport = new JGroupsTransport(jchannel);
global.transport().transport(transport);
new DefaultCacheManager(global.build());

2.1.2.4. Enabling JMX MBeans and statistics

Sometimes you might also want to enable collection of global JMX statistics at the cache manager level or get information about the transport. To enable global JMX statistics, simply do:

GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
  .globalJmxStatistics()
  .enable()
  .build();

Please note that by not enabling (or by explicitly disabling) global JMX statistics you are only turning off statistics collection. The corresponding MBean is still registered and can be used to manage the cache manager in general, but the statistics attributes do not return meaningful values.

Further options at the global JMX statistics level allow you to configure the cache manager name, which comes in handy when you have multiple cache managers running on the same system, or how to locate the JMX MBean server:

GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
  .globalJmxStatistics()
    .cacheManagerName("SalesCacheManager")
    .mBeanServerLookup(new JBossMBeanServerLookup())
  .build();

2.1.2.5. Configuring the thread pools

Some Red Hat Data Grid features are powered by a group of thread pool executors, which can also be tweaked at this global level. For example:

GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
   .replicationQueueThreadPool()
     .threadPoolFactory(ScheduledThreadPoolExecutorFactory.create())
  .build();

In addition to global, cache manager level options, you can also configure cache-level options such as the cluster mode:

Configuration config = new ConfigurationBuilder()
  .clustering()
    .cacheMode(CacheMode.DIST_SYNC)
    .sync()
    .l1().lifespan(25000L)
    .hash().numOwners(3)
  .build();

Or you can configure eviction and expiration settings:

Configuration config = new ConfigurationBuilder()
  .memory()
    .size(20000)
  .expiration()
    .wakeUpInterval(5000L)
    .maxIdle(120000L)
  .build();

2.1.2.6. Configuring transactions and locking

An application might also want to interact with a Red Hat Data Grid cache within the boundaries of JTA; to do that, you need to configure the transaction layer and optionally tweak the locking settings. When interacting with transactional caches, you might want to enable recovery to deal with transactions that finished with a heuristic outcome, and if you do, you will often want to enable JMX management and statistics gathering too:

Configuration config = new ConfigurationBuilder()
  .locking()
    .concurrencyLevel(10000).isolationLevel(IsolationLevel.REPEATABLE_READ)
    .lockAcquisitionTimeout(12000L).useLockStriping(false).writeSkewCheck(true)
    .versioning().enable().scheme(VersioningScheme.SIMPLE)
  .transaction()
    .transactionManagerLookup(new GenericTransactionManagerLookup())
    .recovery()
  .jmxStatistics()
  .build();

2.1.2.7. Configuring cache stores

Configuring Red Hat Data Grid with chained cache stores is simple too:

Configuration config = new ConfigurationBuilder()
   .persistence().passivation(false)
   .addSingleFileStore().location("/tmp").async().enable()
   .preload(false).shared(false).threadPoolSize(20).build();

2.1.2.8. Advanced programmatic configuration

The fluent configuration can also be used to configure more advanced or exotic options, such as advanced externalizers:

GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
  .serialization()
    .addAdvancedExternalizer(998, new PersonExternalizer())
    .addAdvancedExternalizer(999, new AddressExternalizer())
  .build();

Or, add custom interceptors:

Configuration config = new ConfigurationBuilder()
  .customInterceptors().addInterceptor()
    .interceptor(new FirstInterceptor()).position(InterceptorConfiguration.Position.FIRST)
    .interceptor(new LastInterceptor()).position(InterceptorConfiguration.Position.LAST)
    .interceptor(new FixPositionInterceptor()).index(8)
    .interceptor(new AfterInterceptor()).after(NonTransactionalLockingInterceptor.class)
    .interceptor(new BeforeInterceptor()).before(CallInterceptor.class)
  .build();

For information on the individual configuration options, please check the configuration guide.

2.1.3. Configuration Migration Tools

The configuration format of Red Hat Data Grid has changed since schema version 6.0 in order to align the embedded schema with the one used by the server. For this reason, when upgrading to schema 7.x or later, you should use the configuration converter included in the all distribution. Simply invoke it from the command line, passing the old configuration file as the first parameter and the name of the converted file as the second.

To convert on Unix/Linux/macOS:

bin/config-converter.sh oldconfig.xml newconfig.xml

on Windows:

bin\config-converter.bat oldconfig.xml newconfig.xml

Tip

If you wish to help write conversion tools from other caching systems, please contact infinispan-dev.

2.1.4. Clustered Configuration

Red Hat Data Grid uses JGroups for network communications when in clustered mode. Red Hat Data Grid ships with pre-configured JGroups stacks that make it easy for you to jump-start a clustered configuration.

2.1.4.1. Using an external JGroups file

If you are configuring your cache programmatically, all you need to do is:

GlobalConfiguration gc = new GlobalConfigurationBuilder()
   .transport().defaultTransport()
   .addProperty("configurationFile", "jgroups.xml")
   .build();

and if you happen to use an XML file to configure Red Hat Data Grid, just use:

<infinispan>
  <jgroups>
     <stack-file name="external-file" path="jgroups.xml"/>
  </jgroups>
  <cache-container default-cache="replicatedCache">
    <transport stack="external-file" />
    <replicated-cache name="replicatedCache"/>
  </cache-container>

  ...

</infinispan>

In both cases above, Red Hat Data Grid looks for jgroups.xml first in your classpath, and then for an absolute path name if not found in the classpath.

2.1.4.2. Use one of the pre-configured JGroups files

Red Hat Data Grid ships with a few different JGroups files (packaged in infinispan-core.jar) which means they will already be on your classpath by default. All you need to do is specify the file name, e.g., instead of jgroups.xml above, specify /default-configs/default-jgroups-tcp.xml.
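For example, the programmatic transport configuration shown earlier can point at one of the bundled stacks instead of an external file (a sketch; the resource path follows the naming above, but verify it against the jar contents of your version):

GlobalConfiguration gc = new GlobalConfigurationBuilder()
   .transport().defaultTransport()
   .addProperty("configurationFile", "default-configs/default-jgroups-tcp.xml")
   .build();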

The configurations available are:

  • default-jgroups-udp.xml - Uses UDP as a transport, and UDP multicast for discovery. Usually suitable for larger (over 100 nodes) clusters or if you are using replication or invalidation. Minimises opening too many sockets.
  • default-jgroups-tcp.xml - Uses TCP as a transport and UDP multicast for discovery. Better for smaller clusters (under 100 nodes), and only if you are using distribution, as TCP is more efficient as a point-to-point protocol.
  • default-jgroups-ec2.xml - Uses TCP as a transport and S3_PING for discovery. Suitable on Amazon EC2 nodes where UDP multicast isn’t available.
  • default-jgroups-kubernetes.xml - Uses TCP as a transport and KUBE_PING for discovery. Suitable on Kubernetes and OpenShift nodes where UDP multicast is not always available.

2.1.4.2.1. Tuning JGroups settings

The settings above can be further tuned without editing the XML files themselves. Passing certain system properties to your JVM at startup can affect the behaviour of some of these settings. The tables below show which settings can be configured in this way. For example:

$ java -cp ... -Djgroups.tcp.port=1234 -Djgroups.tcp.address=10.11.12.13

Table 2.1. default-jgroups-udp.xml

  • jgroups.udp.mcast_addr: IP address to use for multicast (both for communications and discovery). Must be a valid Class D IP address, suitable for IP multicast. Default: 228.6.7.8. Required: No.
  • jgroups.udp.mcast_port: Port to use for the multicast socket. Default: 46655. Required: No.
  • jgroups.udp.ip_ttl: Time-to-live (TTL) for IP multicast packets; the value is the number of network hops a packet is allowed to make before it is dropped. Default: 2. Required: No.

Table 2.2. default-jgroups-tcp.xml

  • jgroups.tcp.address: IP address to use for the TCP transport. Default: 127.0.0.1. Required: No.
  • jgroups.tcp.port: Port to use for the TCP socket. Default: 7800. Required: No.
  • jgroups.udp.mcast_addr: IP address to use for multicast (for discovery). Must be a valid Class D IP address, suitable for IP multicast. Default: 228.6.7.8. Required: No.
  • jgroups.udp.mcast_port: Port to use for the multicast socket. Default: 46655. Required: No.
  • jgroups.udp.ip_ttl: Time-to-live (TTL) for IP multicast packets; the value is the number of network hops a packet is allowed to make before it is dropped. Default: 2. Required: No.

Table 2.3. default-jgroups-ec2.xml

  • jgroups.tcp.address: IP address to use for the TCP transport. Default: 127.0.0.1. Required: No.
  • jgroups.tcp.port: Port to use for the TCP socket. Default: 7800. Required: No.
  • jgroups.s3.access_key: The Amazon S3 access key used to access an S3 bucket. No default. Required: No.
  • jgroups.s3.secret_access_key: The Amazon S3 secret key used to access an S3 bucket. No default. Required: No.
  • jgroups.s3.bucket: Name of the Amazon S3 bucket to use. Must be unique and must already exist. No default. Required: No.

Table 2.4. default-jgroups-kubernetes.xml

  • jgroups.tcp.address: IP address to use for the TCP transport. Default: eth0. Required: No.
  • jgroups.tcp.port: Port to use for the TCP socket. Default: 7800. Required: No.

2.1.4.3. Further reading

JGroups also supports more system property overrides, details of which can be found on this page: SystemProps

In addition, the JGroups configuration files shipped with Red Hat Data Grid are intended as a jumping-off point for getting something up and running. More often than not, though, you will want to fine-tune your JGroups stack further to extract every ounce of performance from your network equipment. For this, your next stop should be the JGroups manual, which has a detailed section on configuring each of the protocols you see in a JGroups configuration file.

2.2. Obtaining caches

After you configure the CacheManager, you can obtain and control caches.

Invoke the getCache() method to obtain caches, as follows:

Cache<String, String> myCache = manager.getCache("myCache");

The preceding operation creates a cache named myCache, if it does not already exist, and returns it.

Using the getCache() method creates the cache only on the node where you invoke the method. In other words, it performs a local operation that must be invoked on each node across the cluster. Typically, applications deployed across multiple nodes obtain caches during initialization to ensure that caches are symmetric and exist on each node.

Invoke the createCache() method to create caches dynamically across the entire cluster, as follows:

Cache<String, String> myCache = manager.administration().createCache("myCache", "myTemplate");

The preceding operation also automatically creates caches on any nodes that subsequently join the cluster.

Caches that you create with the createCache() method are ephemeral by default. If the entire cluster shuts down, the cache is not automatically created again when it restarts.

Use the PERMANENT flag to ensure that caches can survive restarts, as follows:

Cache<String, String> myCache = manager.administration().withFlags(AdminFlag.PERMANENT).createCache("myCache", "myTemplate");

For the PERMANENT flag to take effect, you must enable global state and set a configuration storage provider.

For more information about configuration storage providers, see GlobalStateConfigurationBuilder#configurationStorage().
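A hedged sketch of enabling global state programmatically so that PERMANENT caches can be persisted; the persistent location is hypothetical and the OVERLAY storage option is an assumption to verify against your version:

GlobalConfiguration global = new GlobalConfigurationBuilder()
   .globalState().enable()
      .persistentLocation("/data/datagrid/state")          // where global state files are kept (hypothetical path)
      .configurationStorage(ConfigurationStorage.OVERLAY)  // persist runtime-created cache configurations
   .build();
EmbeddedCacheManager manager = new DefaultCacheManager(global);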

2.3. Clustering Information

The EmbeddedCacheManager has quite a few methods that provide information about how the cluster is operating. The following methods only really make sense when used in a clustered environment (that is, when a Transport is configured).

2.3.1. Member Information

When you are using a cluster, it is very important to be able to find information about membership in the cluster, including which node is the coordinator of the cluster.

getMembers()

The getMembers() method returns all of the nodes in the current cluster.

getCoordinator()

The getCoordinator() method will tell you which of the members is the coordinator of the cluster. For most purposes you shouldn't need to care who the coordinator is. You can also use the isCoordinator() method directly to see if the local node is the coordinator.
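A brief sketch of using these methods together:

List<Address> members = manager.getMembers();    // all nodes currently in the cluster
Address coordinator = manager.getCoordinator();  // the current cluster coordinator
boolean isCoord = manager.isCoordinator();       // is the local node the coordinator?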

2.3.2. Other methods

getTransport()

This method provides access to the underlying Transport that is used to send messages to other nodes. In most cases you would never need to go to this level, but if you want Transport-specific information (in this case, JGroups) you can use this mechanism.
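For example, if you need the underlying JGroups channel you can downcast the Transport (a sketch; the cast assumes the default JGroups transport is in use):

Transport transport = manager.getTransport();
if (transport instanceof JGroupsTransport) {
   JChannel channel = ((JGroupsTransport) transport).getChannel();
   // JGroups-specific details, e.g. the current view
   System.out.println(channel.getView());
}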

getStats()

The statistics provided here are coalesced from all of the active caches in this manager. They can be useful for seeing if something is going wrong with your cluster overall.
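A minimal sketch of reading a few of the aggregated statistics (this assumes statistics are enabled and that your version exposes CacheContainerStats with these getters):

CacheContainerStats stats = manager.getStats();
long hits = stats.getHits();        // aggregated hits across all running caches
long misses = stats.getMisses();    // aggregated misses
long entries = stats.getCurrentNumberOfEntries();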