Chapter 4. Eviction and Data Container

Red Hat Data Grid supports eviction of entries, such that you do not run out of memory. Eviction is typically used in conjunction with a cache store, so that entries are not permanently lost when evicted, since eviction only removes entries from memory and not from cache stores or the rest of the cluster.

Red Hat Data Grid supports storing data in a few different formats. Data can be stored as the object itself, as binary in a byte[], or off-heap, which stores the byte[] in native memory.

Tip

Passivation is also a popular option when using eviction, so that only a single copy of an entry is maintained - either in memory or in a cache store, but not both. The main benefit of using passivation over a regular cache store is that updates to entries which exist in memory are cheaper since the update doesn’t need to be made to the cache store as well.
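The single-copy rule can be illustrated with a small sketch that uses only the JDK. This is not Red Hat Data Grid code: the class name PassivationSketch and the two maps standing in for the in-memory container and the cache store are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of passivation: an entry lives in memory or in the store, never both. */
final class PassivationSketch {
    private final Map<String, String> memory = new HashMap<>();
    // Hypothetical stand-in for a cache store.
    private final Map<String, String> store = new HashMap<>();

    /** Writes go to memory only, so updating an in-memory entry never touches the store. */
    void put(String key, String value) {
        store.remove(key); // drop any passivated copy so only one copy exists
        memory.put(key, value);
    }

    /** Eviction passivates: the entry moves from memory to the store. */
    void evict(String key) {
        String value = memory.remove(key);
        if (value != null) {
            store.put(key, value);
        }
    }

    /** Reading a passivated entry activates it: it moves back into memory. */
    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = store.remove(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    boolean inMemory(String key) { return memory.containsKey(key); }

    boolean inStore(String key) { return store.containsKey(key); }
}
```

In this model a put on an entry that is already in memory costs one map write, which mirrors the cheaper-update benefit described in the tip.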

Important

Eviction occurs on a local basis, and is not cluster-wide. Each node runs an eviction thread to analyse the contents of its in-memory container and decide what to evict. Eviction does not use the amount of free memory in the JVM as a threshold to start evicting entries. You must set the size attribute of the eviction element to a value greater than zero to turn eviction on. If size is too large, you can run out of memory. The size attribute will probably need some tuning for each use case.

4.1. Enabling Eviction

Eviction is configured by adding the <memory /> element to your <*-cache /> configuration sections, or programmatically by using the MemoryConfigurationBuilder API.

All cache entries are evicted by piggybacking on user threads that are hitting the cache.

4.1.1. Eviction strategy

Strategies control how the eviction is handled.

The possible choices are

NONE

Eviction is not enabled and it is assumed that the user will not invoke evict directly on the cache. If passivation is enabled, this will cause a warning message to be emitted. This is the default strategy.

MANUAL

This strategy is just like NONE except that it assumes the user will be invoking evict directly. This way, if passivation is enabled, no warning message is logged.

REMOVE

This strategy will actually evict "old" entries to make room for incoming ones.

Eviction is handled by Caffeine, which uses the TinyLFU algorithm with an additional admission window. This algorithm was chosen because it provides a high hit rate while requiring low memory overhead: it gives a better hit ratio than LRU and requires less memory than LIRS.

EXCEPTION

This strategy prevents new entries from being created by throwing a ContainerFullException. It only works with transactional caches that always run with two-phase commit; one-phase commit and synchronization optimizations are not allowed.
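The difference between REMOVE and EXCEPTION can be sketched with a toy bounded container built on the JDK alone. The names StrategySketch and FullException are hypothetical. The sketch picks its victim with plain LRU ordering rather than TinyLFU, and it ignores the transactional requirement of EXCEPTION, so it only illustrates the visible behaviour when the container is full.

```java
import java.util.LinkedHashMap;

/** Toy bounded container contrasting the REMOVE and EXCEPTION strategies. */
final class StrategySketch {
    enum Strategy { REMOVE, EXCEPTION }

    /** Hypothetical stand-in for the ContainerFullException named in the text. */
    static final class FullException extends RuntimeException {
        FullException(String message) { super(message); }
    }

    private final int size;
    private final Strategy strategy;
    // Access-ordered map: iteration starts at the least recently used entry.
    // This is plain LRU, NOT the TinyLFU policy described above.
    private final LinkedHashMap<String, String> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    StrategySketch(int size, Strategy strategy) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be greater than zero");
        }
        this.size = size;
        this.strategy = strategy;
    }

    void put(String key, String value) {
        boolean isNew = !entries.containsKey(key);
        if (isNew && entries.size() >= size) {
            if (strategy == Strategy.EXCEPTION) {
                // EXCEPTION: refuse the new entry instead of evicting an old one.
                throw new FullException("container already holds " + size + " entries");
            }
            // REMOVE: drop the least recently used entry to make room.
            String eldest = entries.keySet().iterator().next();
            entries.remove(eldest);
        }
        entries.put(key, value);
    }

    boolean contains(String key) { return entries.containsKey(key); }

    int count() { return entries.size(); }
}
```

With a size of 2, a third put silently displaces the oldest entry under REMOVE, while under EXCEPTION the same put fails and the container keeps its two existing entries.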

4.1.2. Eviction types

Eviction type applies only when the size is set to something greater than 0. The eviction type below determines when the container will decide to remove entries.

COUNT

This type of eviction will remove entries based on how many there are in the cache. Once the count of entries has grown larger than the size then an entry will be removed to make room.

MEMORY

This type of eviction will estimate how much each entry will take up in memory and will remove an entry when the total size of all entries is larger than the configured size. This type does not work with OBJECT storage type below.
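To make the MEMORY type concrete, the following JDK-only sketch bounds a container by estimated bytes instead of entry count. The class MemoryBoundSketch and its estimate (key bytes plus value bytes) are assumptions for illustration; the real estimate also accounts for overhead that this sketch ignores, and the sketch evicts in insertion order for simplicity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy container bounded by estimated bytes (MEMORY) rather than entry count (COUNT). */
final class MemoryBoundSketch {
    private final long maxBytes;
    private long usedBytes;
    // Insertion-ordered, so the oldest entry is evicted first (a simplification).
    private final Map<String, byte[]> entries = new LinkedHashMap<>();

    MemoryBoundSketch(long maxBytes) { this.maxBytes = maxBytes; }

    /** Estimate: key bytes plus value bytes; per-entry overhead is ignored. */
    private static long estimate(String key, byte[] value) {
        return key.getBytes(StandardCharsets.UTF_8).length + value.length;
    }

    void put(String key, byte[] value) {
        byte[] old = entries.remove(key);
        if (old != null) {
            usedBytes -= estimate(key, old);
        }
        entries.put(key, value);
        usedBytes += estimate(key, value);
        // Evict oldest entries until the total is within the configured size.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> e = it.next();
            usedBytes -= estimate(e.getKey(), e.getValue());
            it.remove();
        }
    }

    boolean contains(String key) { return entries.containsKey(key); }

    int count() { return entries.size(); }

    long usedBytes() { return usedBytes; }
}
```

The estimate works here because values are already byte arrays, which is also why this type of eviction depends on a storage type that holds entries as bytes.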

4.1.3. Storage type

Red Hat Data Grid allows the user to configure in what form their data is stored. Each form supports the same features of Red Hat Data Grid; however, eviction can be limited for some forms. Red Hat Data Grid currently provides three storage formats:

OBJECT

Stores the keys and values as objects in the Java heap. Only the COUNT eviction type is supported.

BINARY

Stores the keys and values as a byte[] in the Java heap. This will use the configured marshaller for the cache if there is one. Both COUNT and MEMORY eviction types are supported.

OFF-HEAP

Stores the keys and values in native memory outside of the Java heap as bytes. The configured marshaller will be used if the cache has one. Both COUNT and MEMORY eviction types are supported.

Warning

Both BINARY and OFF-HEAP change equality semantics: equals and hashCode are dictated by the byte[] that is generated for an object instead of by the object instance.
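A short sketch shows what this means in practice. The key class and the marshal method below are hypothetical: marshal simply returns UTF-8 bytes as a stand-in for whatever marshaller the cache is configured with.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

/** Shows how byte-based equality can disagree with an object's own equals(). */
final class BinaryEqualitySketch {
    /** Key whose equals() ignores case, so "ID" and "id" are equal as objects. */
    static final class CaseInsensitiveKey {
        final String text;

        CaseInsensitiveKey(String text) { this.text = text; }

        @Override
        public boolean equals(Object o) {
            return o instanceof CaseInsensitiveKey
                    && ((CaseInsensitiveKey) o).text.equalsIgnoreCase(text);
        }

        @Override
        public int hashCode() { return text.toLowerCase(Locale.ROOT).hashCode(); }
    }

    /** Hypothetical stand-in for a marshaller: the UTF-8 bytes of the text. */
    static byte[] marshal(CaseInsensitiveKey key) {
        return key.text.getBytes(StandardCharsets.UTF_8);
    }

    /** Equality as OBJECT storage sees it. */
    static boolean equalAsObjects(CaseInsensitiveKey a, CaseInsensitiveKey b) {
        return a.equals(b);
    }

    /** Equality as byte-based storage sees it. */
    static boolean equalAsBytes(CaseInsensitiveKey a, CaseInsensitiveKey b) {
        return Arrays.equals(marshal(a), marshal(b));
    }
}
```

Two keys that compare equal as objects can therefore map to different entries once they are stored as bytes, so custom equals implementations should not be relied on with these storage types.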

4.1.4. More defaults

By default when no <memory /> element is specified, no eviction takes place, OBJECT storage type is used, and a strategy of NONE is assumed.

If there is a memory element, this table describes the behaviour of eviction based on the information provided in the XML configuration ("-" in the Supplied size column means that the attribute wasn’t supplied).

| Supplied size | Example | Eviction behaviour |
|---|---|---|
| - | <memory /> | no eviction; entries are stored as objects |
| - | <memory> <object strategy="MANUAL" /> </memory> | no eviction; entries are stored as objects, and no warning is logged if passivation is enabled |
| > 0 | <memory> <object size="100" /> </memory> | eviction takes place; entries are stored as objects |
| > 0 | <memory> <binary size="100" eviction="MEMORY"/> </memory> | eviction takes place; entries are stored as binary and are removed to make sure memory doesn’t go higher than 100 |
| > 0 | <memory> <off-heap size="100" /> </memory> | eviction takes place; entries are stored off-heap |
| > 0 | <memory> <off-heap size="100" strategy="EXCEPTION" /> </memory> | entries are stored off-heap; if 100 entries are in the container, exceptions are thrown for additional entries |
| 0 | <memory> <object size="0" /> </memory> | no eviction |
| < 0 | <memory> <object size="-1" /> </memory> | no eviction |

4.2. Expiration

Expiration is related to eviction but works differently. Expiration allows you to attach lifespan and/or maximum idle times to entries. Entries that exceed these times are treated as invalid and are removed. Unlike evicted entries, expired entries are not passivated when they are removed, even if passivation is turned on.

Tip

Unlike eviction, expired entries are removed globally - from memory, cache stores, and cluster-wide.

By default entries created are immortal and do not have a lifespan or maximum idle time. Using the cache API, mortal entries can be created with lifespans and/or maximum idle times. Further, default lifespans and/or maximum idle times can be configured by adding the <expiration /> element to your <*-cache /> configuration sections.

When an entry expires it resides in the data container or cache store until it is accessed again by a user request. An expiration reaper is also available to check for expired entries and remove them at a configurable interval of milliseconds.

You can enable the expiration reaper declaratively with the reaper-interval attribute or programmatically with the enableReaper method in the ExpirationConfigurationBuilder class.

Note
  • The expiration reaper cannot be disabled when a cache store is present.
  • When using a maximum idle time in a clustered cache, you should always enable the expiration reaper. For more information, see Clustered Max Idle.

4.2.1. Difference between Eviction and Expiration

Both eviction and expiration are means of cleaning the cache of unused entries and thus guarding the heap against OutOfMemory exceptions. The difference between them is as follows.

With eviction, you set the maximum number of entries you want to keep in the cache. If this limit is exceeded, candidate entries are selected for removal according to the eviction algorithm (see Eviction strategy). Eviction can be set up to work with passivation, which is eviction to a cache store.

With expiration you set time criteria for entries to specify how long you want to keep them in the cache.

lifespan
Specifies how long entries can remain in the cache before they expire. The default value is -1, which is unlimited time.
maximum idle time
Specifies how long entries can remain idle before they expire. An entry in the cache is idle when no operation is performed with the key. The default value is -1, which is unlimited time.
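
The two time criteria can be expressed as a single check. The following is a minimal sketch, not the product's implementation: the class and parameter names are hypothetical, times are plain millisecond values, and treating an entry as expired exactly at the limit (rather than just after it) is an assumption.

```java
/** Toy expiration check combining lifespan and maximum idle time. */
final class ExpirationSketch {
    /**
     * A negative lifespan or maxIdle means unlimited, matching the -1 defaults.
     * createdMs and lastUsedMs are timestamps; nowMs is the current time.
     */
    static boolean isExpired(long createdMs, long lastUsedMs,
                             long lifespanMs, long maxIdleMs, long nowMs) {
        // Lifespan counts from creation and is unaffected by later access.
        boolean lifespanExceeded = lifespanMs >= 0 && nowMs - createdMs >= lifespanMs;
        // Max idle counts from the most recent operation on the key.
        boolean idleExceeded = maxIdleMs >= 0 && nowMs - lastUsedMs >= maxIdleMs;
        return lifespanExceeded || idleExceeded;
    }
}
```

An entry with both values left at -1 never expires, which is what makes entries immortal by default.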

4.3. Expiration details

  1. Expiration is a top-level construct, represented in the configuration as well as in the cache API.
  2. While eviction is local to each cache instance, expiration is cluster-wide. Expiration lifespan and maxIdle values are replicated along with the cache entry.
  3. Maximum idle times for cache entries require additional network messages in clustered environments. For this reason, setting maxIdle in clustered caches can result in slower operation times.
  4. Expiration lifespan and maxIdle are also persisted in CacheStores, so this information survives eviction/passivation.

4.3.1. Maximum Idle Expiration

Maximum idle expiration has different behavior in local and clustered cache environments.

Important

Maximum idle expiration, max-idle, does not currently work with entries stored in off-heap memory. Likewise, max-idle does not work if caches use cache stores as a persistence layer.

4.3.1.1. Local Max Idle

In local cache mode, Red Hat Data Grid expires entries with the maxIdle configuration when:

  • accessed directly (Cache.get()).
  • iterated upon (Cache.size()).
  • the expiration reaper thread runs.

4.3.1.2. Clustered Max Idle

In clustered cache modes, when clients read entries that have max-idle expiration values, Red Hat Data Grid sends touch commands to all owners. This ensures that the entries have the same relative access time across the cluster.

When nodes detect that an entry reaches the maximum idle time, Red Hat Data Grid removes it from the cache and does not return the entry to the client that requested it.

Before using max-idle with clustered cache modes, you should review the following points:

  • Cache.get() does not return until the touch commands complete. This synchronous behavior increases latency of client requests.
  • Clustered max-idle also updates the "recently accessed" metadata for cache entries on all owners, which Red Hat Data Grid uses for eviction.
  • Iteration across a clustered cache returns entries that might be expired with the maximum idle time. This behavior ensures performance because no remote invocations are performed during the iteration. However, this does not refresh any expired entries, which are removed by the expiration reaper or when accessed directly (Cache.get()).

Important
  • Clustered caches should always use the expiration reaper with the maxIdle configuration.
  • When using maxIdle expiration with exception-based eviction, entries that are expired but not removed from the cache count towards the size of the data container.
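
The touch behaviour described in this section can be modelled with a toy, single-process sketch. Everything in it is an assumption made for illustration: each owner is a plain map of last access times, the first owner is consulted for the idle check, and there is no network, so the latency cost of waiting for touch commands is not visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of clustered max-idle: a read touches the entry on every owner. */
final class TouchSketch {
    // One map per owner node: key -> last access time in ms.
    private final List<Map<String, Long>> owners = new ArrayList<>();
    private final long maxIdleMs;

    TouchSketch(int ownerCount, long maxIdleMs) {
        for (int i = 0; i < ownerCount; i++) {
            owners.add(new HashMap<>());
        }
        this.maxIdleMs = maxIdleMs;
    }

    void write(String key, long nowMs) {
        for (Map<String, Long> owner : owners) {
            owner.put(key, nowMs);
        }
    }

    /** Returns true if the entry is live; an idle-expired entry is removed and not returned. */
    boolean read(String key, long nowMs) {
        Long last = owners.get(0).get(key);
        if (last == null) {
            return false;
        }
        if (nowMs - last >= maxIdleMs) {
            for (Map<String, Long> owner : owners) {
                owner.remove(key);
            }
            return false;
        }
        // "Touch" every owner so that all copies share the same access time.
        for (Map<String, Long> owner : owners) {
            owner.put(key, nowMs);
        }
        return true;
    }

    /** Last access time recorded on one owner, or null if the entry is gone. */
    Long lastAccess(int ownerIndex, String key) {
        return owners.get(ownerIndex).get(key);
    }
}
```

Because every successful read updates all owners, a read through any node keeps the entry alive everywhere, at the cost of extra work per read.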

4.3.2. Configuration

Eviction and Expiration may be configured using the programmatic or declarative XML configuration. This configuration is on a per-cache basis. Valid eviction/expiration-related configuration elements are:

<!-- Eviction -->
<memory>
   <object size="2000"/>
</memory>
<!-- Expiration -->
<expiration lifespan="1000" max-idle="500" interval="1000" />

Programmatically, the same would be defined using:

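import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

Configuration c = new ConfigurationBuilder()
               .memory().size(2000)
               .expiration().wakeUpInterval(1000L).lifespan(1000L).maxIdle(500L)
               .build();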

4.3.3. Memory Based Eviction Configuration

Memory based eviction may require some additional configuration options if you are using your own custom types (as Red Hat Data Grid is normally used). In this case Red Hat Data Grid cannot estimate the memory usage of your classes, so you must store entries in binary form (the BINARY or OFF-HEAP storage type) when memory based eviction is used.

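<!-- Enable memory based eviction with 1 GB -->
<memory>
   <binary size="1000000000" eviction="MEMORY"/>
</memory>

import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.StorageType;
import org.infinispan.eviction.EvictionType;

Configuration c = new ConfigurationBuilder()
               .memory()
               .storageType(StorageType.BINARY)
               .evictionType(EvictionType.MEMORY)
               .size(1_000_000_000)
               .build();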

4.3.4. Default values

Eviction is disabled by default. If you do not specify a value, the following default applies:

  • size: -1 is used if not specified, which means unlimited entries.
  • A size of 0 also results in no eviction; set size to a value greater than zero to turn eviction on.

Expiration lifespan and maxIdle both default to -1, which means that entries will be created immortal by default. This can be overridden per entry with the API.

4.3.5. Using expiration

Expiration allows you to set either a lifespan or a maximum idle time on each key/value pair stored in the cache. This can either be set cache-wide using the configuration, as described above, or it can be defined per key/value pair using the Cache interface. Any values defined per key/value pair override the cache-wide default for the specific entry in question.

For example, assume the following configuration:

<expiration lifespan="1000" />

// this entry will expire in 1000 millis
cache.put("pinot noir", pinotNoirPrice);

// this entry will expire in 2000 millis
cache.put("chardonnay", chardonnayPrice, 2, TimeUnit.SECONDS);

// this entry will expire 1000 millis after it is last accessed
cache.put("pinot grigio", pinotGrigioPrice, -1,
          TimeUnit.SECONDS, 1, TimeUnit.SECONDS);

// this entry will expire 1000 millis after it is last accessed, or
// in 5000 millis, which ever triggers first
cache.put("riesling", rieslingPrice, 5,
          TimeUnit.SECONDS, 1, TimeUnit.SECONDS);
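
The arithmetic behind the comments above can be checked with a small sketch. It assumes, as the comments imply, that a value passed with the put call replaces the cache-wide default, and that an explicit -1 means no limit. The names PerEntryOverrideSketch and USE_DEFAULT are hypothetical and are not part of the Cache API.

```java
/** Toy resolution of per-entry expiration values against cache-wide defaults. */
final class PerEntryOverrideSketch {
    /** Hypothetical marker meaning "no per-entry value was supplied". */
    static final long USE_DEFAULT = Long.MIN_VALUE;

    /** A supplied per-entry value, including -1, overrides the cache-wide default. */
    static long effective(long cacheDefaultMs, long perEntryMs) {
        return perEntryMs == USE_DEFAULT ? cacheDefaultMs : perEntryMs;
    }

    /**
     * Earliest time, in ms after creation, at which an entry that is never
     * accessed again can expire; -1 if it never expires.
     */
    static long firstExpiry(long lifespanMs, long maxIdleMs) {
        if (lifespanMs < 0) {
            return maxIdleMs; // may itself be -1, meaning never
        }
        if (maxIdleMs < 0) {
            return lifespanMs;
        }
        return Math.min(lifespanMs, maxIdleMs);
    }
}
```

For the "riesling" entry, the lifespan is 5000 ms and the maximum idle time is 1000 ms, so an untouched entry expires after 1000 ms, whichever limit triggers first.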

4.4. Expiration designs

Central to expiration is an ExpirationManager.

The purpose of the ExpirationManager is to drive the expiration thread, which periodically purges items from the DataContainer. If the expiration thread is disabled (wakeUpInterval set to -1), expiration can be kicked off manually using ExpirationManager.processExpiration(), for example from another maintenance thread that may run periodically in your application.

The expiration manager processes expirations in the following manner:

  1. Causes the data container to purge expired entries
  2. Causes cache stores (if any) to purge expired entries
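
The two steps can be sketched as follows. This is a toy model rather than the ExpirationManager itself: the class ReaperSketch is hypothetical, both the data container and the cache store are reduced to maps from key to an absolute expiry time in milliseconds, and the caller supplies the current time so that the behaviour is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one expiration pass over a data container and a cache store. */
final class ReaperSketch {
    // key -> absolute expiry time in ms (a simplification of entry metadata)
    final Map<String, Long> dataContainer = new HashMap<>();
    final Map<String, Long> cacheStore = new HashMap<>();

    /** Purges the data container first, then the store; returns how many entries were removed. */
    int processExpiration(long nowMs) {
        return purge(dataContainer, nowMs) + purge(cacheStore, nowMs);
    }

    private static int purge(Map<String, Long> entries, long nowMs) {
        int before = entries.size();
        // Removing through the values view also removes the mapping from the map.
        entries.values().removeIf(expiresAt -> expiresAt <= nowMs);
        return before - entries.size();
    }
}
```

A maintenance thread in an application could call a method like processExpiration on a schedule, which is the manual alternative described above for when the expiration thread is disabled.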