Chapter 9. Configuring and Cleaning the Data Container

Configure Data Grid to evict and expire cache entries, keeping only recently active entries in memory and protecting the size of the data container.

9.1. Eviction and Expiration Overview

Eviction and expiration both clean the data container by removing old, unused entries. Although the two strategies are similar, they differ in important ways that you should take into account when planning your configuration.

  • Eviction prevents Data Grid from exceeding the maximum size of the data container. Data Grid performs eviction when you add entries to the cache.
  • Expiration limits the amount of time entries can exist. Data Grid uses a scheduler to periodically remove expired entries. Entries that are expired but not yet removed are immediately removed on access; in this case get() calls for expired entries return "null" values.
  • Eviction is local to Data Grid nodes.
  • Expiration takes place across Data Grid clusters.
  • You can use eviction and expiration together or independently of each other.
  • You can configure eviction and expiration declaratively in infinispan.xml to apply cache-wide defaults for entries.
  • You can explicitly define expiration settings for specific entries but you cannot define eviction on a per-entry basis.
  • You can manually evict entries and manually trigger expiration.

Data containers

In the context of eviction and expiration, the term "data container" refers to where in-memory data is stored, which is either on or off the JVM heap.

9.2. Eviction

Eviction provides a way to manage how much memory Data Grid uses.

Data Grid lets you configure the maximum size of the data container. Eviction removes entries from the cache to ensure that Data Grid does not exceed that maximum size.

Note

Eviction removes entries from memory but not from persistent cache stores.

9.2.1. How Eviction Works

Data Grid eviction removes entries from the data container to make space when adding new entries.

Important

To prevent data loss, you should always configure a persistent cache store if you enable eviction.

Data Grid eviction relies on two configurations:

  • Size of the data container.
  • Eviction strategy.

Calculating data container size

You configure the maximum size of the data container and specify if Data Grid stores cache entries as:

  • Object in the Java heap.
  • Binary byte[] in the Java heap.
  • Bytes in native memory (off-heap).

Storage type    Size of the data container is calculated as
Object          Number of entries.
Binary          Number of entries, if the eviction type is COUNT.
                Amount of memory, in bytes, if the eviction type is MEMORY.
Off-heap        Number of entries, if the eviction type is COUNT.
                Amount of memory, in bytes, if the eviction type is MEMORY.

Note

When using MEMORY, Data Grid can determine only an approximate size of data containers, which is optimized for the HotSpot JVM.

When using MEMORY with off-heap storage, the calculation is a closer approximation than on heap.

Evicting cache entries

When an entry is added or modified in the data container, Data Grid compares the current eviction size to the maximum size. If the current size exceeds the maximum, Data Grid evicts entries.

Eviction happens immediately in the thread that adds an entry that exceeds the maximum size.

For example, consider the following configuration:

<memory>
  <object size="50" />
</memory>

In this case, entries are stored as objects and the data container has a maximum size of 50 entries.

If 50 entries are in the data container, and a put() request attempts to create a new entry, Data Grid performs eviction.
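
The following is a minimal sketch of count-based eviction that uses only the JDK. It is not how Data Grid implements eviction: BoundedContainer is a hypothetical class, and it evicts the least recently accessed entry through LinkedHashMap rather than using TinyLFU. It only illustrates that eviction runs in the thread that adds the entry that exceeds the maximum size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a count-bounded data container; not a Data Grid class.
class BoundedContainer<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedContainer(int maxEntries) {
        // accessOrder = true keeps the least recently accessed entry first.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // LinkedHashMap calls this after each insertion, in the calling thread.
    // Returning true removes the eldest (least recently accessed) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With new BoundedContainer<>(50), the 51st put() leaves the container at 50 entries and drops the entry that was accessed least recently.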

Eviction strategies

Strategies control how Data Grid performs eviction. You can either perform eviction manually or configure Data Grid to do one of the following:

  • Remove old entries to make space for new ones.
  • Throw ContainerFullException and prevent new entries from being created.

    The exception eviction strategy works only with transactional caches that use two-phase commit; it does not work with one-phase commit or synchronization optimizations.
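
As a rough illustration of the second option, the sketch below rejects new entries once a container is full instead of evicting old ones. RejectingContainer and ContainerFullSketchException are hypothetical names used only here. The sketch ignores transactions entirely, and it assumes that updating an existing key is still allowed because no new entry is created.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for illustration; this is not Data Grid's exception class.
class ContainerFullSketchException extends RuntimeException {
    ContainerFullSketchException(String message) {
        super(message);
    }
}

// Hypothetical container that refuses new entries once it reaches its maximum size.
class RejectingContainer<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final int maxEntries;

    RejectingContainer(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    V put(K key, V value) {
        // Assumption: only keys that would grow the container are rejected.
        if (!entries.containsKey(key) && entries.size() >= maxEntries) {
            throw new ContainerFullSketchException(
                    "container already holds " + maxEntries + " entries");
        }
        return entries.put(key, value);
    }

    int size() {
        return entries.size();
    }
}
```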

Note

Data Grid includes the Caffeine caching library that implements a variation of the Least Frequently Used (LFU) cache replacement algorithm known as TinyLFU. For off-heap storage, Data Grid uses a custom implementation of the Least Recently Used (LRU) algorithm.

9.2.2. Eviction Examples

You configure eviction in infinispan.xml as part of your cache definition.

Default memory configuration

Eviction is not enabled, which is the default configuration. Data Grid stores cache entries as objects in the data container.

<memory />

Passivation with eviction

Passivation persists data to cache stores when Data Grid evicts entries. You should always enable eviction if you enable passivation.

<persistence passivation="true">
  ...
</persistence>

<memory />

Manual eviction

Data Grid stores cache entries as objects. Automatic eviction is not enabled; you perform eviction manually by calling the evict() method.

<memory>
  <object strategy="MANUAL" />
</memory>

Object storage with eviction

Data Grid stores cache entries as objects. Eviction happens when there are 100 entries in the data container and Data Grid gets a request to create a new entry:

<memory>
  <object size="100" />
</memory>

Binary storage with memory-based eviction

Data Grid stores cache entries as bytes. Eviction happens when the size of the data container reaches 100 bytes and Data Grid gets a request to create a new entry:

<memory>
  <binary size="100" eviction="MEMORY"/>
</memory>

Off-heap storage with count-based eviction

Data Grid stores cache entries as bytes in native memory. Eviction happens when there are 100 entries in the data container and Data Grid gets a request to create a new entry:

<memory>
  <off-heap size="100" />
</memory>

Off-heap storage with the exception strategy

Data Grid stores cache entries as bytes in native memory. When there are 100 entries in the data container, and Data Grid gets a request to create a new entry, it throws an exception and does not allow the new entry:

<memory>
  <off-heap size="100" strategy="EXCEPTION" />
</memory>

9.2.3. Custom Classes with Memory-Based Eviction

To use custom classes with memory-based eviction, you must use binary or off-heap storage, as in the following examples:

Declarative configuration

<!-- Enable memory-based eviction with 1 GB -->
<memory>
   <binary size="1000000000" eviction="MEMORY"/>
</memory>

Programmatic configuration

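import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.StorageType;
import org.infinispan.eviction.EvictionType;

// Store entries as binary and evict when the container reaches about 1 GB.
Configuration c = new ConfigurationBuilder()
               .memory()
               .storageType(StorageType.BINARY)
               .evictionType(EvictionType.MEMORY)
               .size(1_000_000_000)
               .build();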

9.3. Expiration

Expiration removes entries from caches when they reach one of the following time limits:

Lifespan
Sets the maximum amount of time that entries can exist.

Maximum idle
Specifies how long entries can remain idle. If operations do not occur for entries, they become idle.

Important

Maximum idle expiration does not currently support cache configurations with persistent cache stores.

When using expiration with an exception-based eviction policy, entries that are expired but not yet removed from the cache count towards the size of the data container.

9.3.1. How Expiration Works

When you configure expiration, Data Grid stores keys with metadata that determines when entries expire.

  • Lifespan uses a creation timestamp and the value for the lifespan configuration property.
  • Maximum idle uses a last used timestamp and the value for the max-idle configuration property.

Data Grid checks if lifespan or maximum idle metadata is set and then compares the values with the current time.

If (creation + lifespan < currentTime) or (lastUsed + maxIdle < currentTime), Data Grid detects that the entry is expired.

Expiration occurs whenever entries are accessed or found by the expiration reaper.

For example, k1 reaches the maximum idle time and a client makes a Cache.get(k1) request. In this case, Data Grid detects that the entry is expired and removes it from the data container. The Cache.get() returns null.
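
The check and the removal on access can be sketched with plain JDK classes. ExpiringStore is a hypothetical class, not part of Data Grid, and the injected clock exists only to make the behavior easy to follow. An entry counts as expired once its creation time plus lifespan, or its last-used time plus maximum idle, lies before the current time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical illustration of expiration metadata; not Data Grid's internal classes.
class ExpiringStore<K, V> {
    private static final class Entry<V> {
        final V value;
        final long created;  // creation timestamp, in ms
        final long lifespan; // in ms, -1 disables lifespan
        final long maxIdle;  // in ms, -1 disables maximum idle
        long lastUsed;       // last access timestamp, in ms

        Entry(V value, long now, long lifespan, long maxIdle) {
            this.value = value;
            this.created = now;
            this.lastUsed = now;
            this.lifespan = lifespan;
            this.maxIdle = maxIdle;
        }

        boolean isExpired(long now) {
            boolean lifespanPassed = lifespan >= 0 && created + lifespan < now;
            boolean idlePassed = maxIdle >= 0 && lastUsed + maxIdle < now;
            return lifespanPassed || idlePassed;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final LongSupplier clock;

    ExpiringStore(LongSupplier clock) {
        this.clock = clock;
    }

    void put(K key, V value, long lifespanMs, long maxIdleMs) {
        entries.put(key, new Entry<>(value, clock.getAsLong(), lifespanMs, maxIdleMs));
    }

    // Removes an expired entry on access and returns null for it.
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (entry.isExpired(now)) {
            entries.remove(key);
            return null;
        }
        entry.lastUsed = now;
        return entry.value;
    }

    int size() {
        return entries.size();
    }
}
```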

Data Grid also expires entries from cache stores, but only with lifespan expiration. Maximum idle expiration does not work with cache stores. In the case of cache loaders, Data Grid cannot expire entries because loaders can only read from external storage.

Note

Data Grid adds expiration metadata as long primitive data types to cache entries. This can increase the size of keys by as much as 32 bytes.

9.3.2. Expiration Reaper

Data Grid uses a reaper thread that runs periodically to detect and remove expired entries. The expiration reaper ensures that expired entries that are no longer accessed are removed.

The Data Grid ExpirationManager interface handles the expiration reaper and exposes the processExpiration() method.

In some cases, you can disable the expiration reaper and manually expire entries by calling processExpiration(); for instance, if you are using local cache mode with a custom application where a maintenance thread runs periodically.
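
To make the idea of manual expiration concrete, here is a JDK-only sketch of a sweep that an application's maintenance thread might call. ManualReaperStore and its processExpiration() method are hypothetical; the method borrows the name from the ExpirationManager interface but does not call it. The sketch handles lifespan only.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical lifespan-only store with a manual sweep; not Data Grid's ExpirationManager.
class ManualReaperStore<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt; // creation time + lifespan, in ms

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    ManualReaperStore(LongSupplier clock) {
        this.clock = clock;
    }

    void put(K key, V value, long lifespanMs) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + lifespanMs));
    }

    // Removes every expired entry and returns how many were removed.
    int processExpiration() {
        long now = clock.getAsLong();
        int removed = 0;
        Iterator<Entry<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            if (it.next().expiresAt < now) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```

Without a reaper, expired entries that nobody reads stay in memory until something calls the sweep, which is why a periodic maintenance task is needed in this setup.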

Important

If you use clustered cache modes, you should never disable the expiration reaper.

Data Grid always uses the expiration reaper when using cache stores. In this case you cannot disable it.

9.3.3. Maximum Idle and Clustered Caches

Because maximum idle expiration relies on the last access time for cache entries, it has some limitations with clustered cache modes.

With lifespan expiration, the creation time for cache entries provides a value that is consistent across clustered caches. For example, the creation time for k1 is always the same on all nodes.

For maximum idle expiration with clustered caches, last access time for entries is not always the same on all nodes. To ensure that entries have the same relative access times across clusters, Data Grid sends touch commands to all owners when keys are accessed.

The touch commands that Data Grid sends have the following considerations:

  • Cache.get() requests do not return until all touch commands complete. This synchronous behavior increases latency of client requests.
  • The touch command also updates the "recently accessed" metadata for cache entries on all owners, which Data Grid uses for eviction.
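
The synchronous touch behavior can be modeled in a few lines of plain Java. OwnerNode and TouchingCluster are hypothetical classes; in a real cluster the touch is a remote command, whereas here it is a local method call. The point of the sketch is that get() does not return until every owner has recorded the same last access time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an owner node; real touch commands are remote invocations.
class OwnerNode {
    final Map<String, String> values = new HashMap<>();
    final Map<String, Long> lastUsed = new HashMap<>();

    void touch(String key, long now) {
        if (values.containsKey(key)) {
            lastUsed.put(key, now);
        }
    }
}

class TouchingCluster {
    private final List<OwnerNode> owners = new ArrayList<>();

    TouchingCluster(int ownerCount) {
        for (int i = 0; i < ownerCount; i++) {
            owners.add(new OwnerNode());
        }
    }

    void put(String key, String value, long now) {
        for (OwnerNode node : owners) {
            node.values.put(key, value);
            node.lastUsed.put(key, now);
        }
    }

    // Reads from one owner, then touches every owner before returning.
    String get(String key, long now) {
        String value = owners.get(0).values.get(key);
        if (value != null) {
            for (OwnerNode node : owners) {
                node.touch(key, now);
            }
        }
        return value;
    }

    long lastUsedOn(int owner, String key) {
        return owners.get(owner).lastUsed.get(key);
    }
}
```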

Additional information

  • Maximum idle expiration does not work with invalidation mode.
  • Iteration across a clustered cache can return expired entries that have exceeded the maximum idle time limit. This behavior ensures performance because no remote invocations are performed during the iteration. Also note that iteration does not refresh any expired entries.

9.3.4. Expiration Examples

When you configure Data Grid to expire entries, you can set lifespan and maximum idle times for:

  • All entries in a cache (cache-wide). You can configure cache-wide expiration in infinispan.xml or programmatically using the ConfigurationBuilder.
  • Per entry, which takes priority over cache-wide expiration values. You configure expiration for specific entries when you create them.

Note

When you explicitly define lifespan and maximum idle time values for cache entries, Data Grid replicates those values across the cluster along with the cache entries. Likewise, Data Grid persists expiration values along with the entries if you configure cache stores.

Configuring expiration for all cache entries

Expire all cache entries after 2 seconds:

<expiration lifespan="2000" />

Expire all cache entries 1 second after last access time:

<expiration max-idle="1000" />

Disable the expiration reaper with the interval attribute and manually expire entries 1 second after last access time:

<expiration max-idle="1000" interval="-1" />

Expire all cache entries after 5 seconds or 1 second after the last access time, whichever happens first:

<expiration lifespan="5000" max-idle="1000" />

Configuring expiration when creating cache entries

The following example shows how to configure lifespan and maximum idle values when creating cache entries:

// Use the cache-wide expiration configuration.
cache.put("pinot noir", pinotNoirPrice); // (1)

// Define a lifespan value of 2 seconds.
cache.put("chardonnay", chardonnayPrice, 2, TimeUnit.SECONDS); // (2)

// Define a lifespan value of -1 (disabled) and a max-idle value of 1 second.
cache.put("pinot grigio", pinotGrigioPrice,
          -1, TimeUnit.SECONDS, 1, TimeUnit.SECONDS); // (3)

// Define a lifespan value of 5 seconds and a max-idle value of 1 second.
cache.put("riesling", rieslingPrice,
          5, TimeUnit.SECONDS, 1, TimeUnit.SECONDS); // (4)

If the Data Grid configuration defines a lifespan value of 1000 milliseconds for all entries, the preceding Cache.put() requests cause the entries to expire:

(1) After 1 second.
(2) After 2 seconds.
(3) 1 second after last access time.
(4) After 5 seconds or 1 second after the last access time, whichever happens first.
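
The "whichever happens first" rule amounts to taking the earlier of two deadlines. The helper below is a hypothetical sketch, not part of the Data Grid API; it assumes times in milliseconds and treats a negative value as disabled.

```java
// Hypothetical helper; the names are illustrative and not part of the Data Grid API.
class ExpirationDeadline {
    static final long DISABLED = -1;

    // Returns the time in ms at which the entry expires,
    // or Long.MAX_VALUE if neither limit is set.
    static long expiresAt(long created, long lastUsed, long lifespanMs, long maxIdleMs) {
        long byLifespan = lifespanMs < 0 ? Long.MAX_VALUE : created + lifespanMs;
        long byMaxIdle = maxIdleMs < 0 ? Long.MAX_VALUE : lastUsed + maxIdleMs;
        return Math.min(byLifespan, byMaxIdle);
    }
}
```

For an entry created at time 0 with a lifespan of 5000 and a maximum idle of 1000, expiresAt(0, 0, 5000, 1000) gives 1000 if the entry is never read, and expiresAt(0, 4500, 5000, 1000) gives 5000 if it was last read at 4500, because the lifespan limit is reached first.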