Chapter 5. Configuring JVM memory usage

Control how Data Grid stores data in JVM memory by:

  • Managing JVM memory usage with eviction that automatically removes data from caches.
  • Adding lifespan and maximum idle times to expire entries and prevent stale data.
  • Configuring Data Grid to store data in off-heap, native memory.

5.1. Default memory configuration

By default Data Grid stores cache entries as objects in the JVM heap. Over time, as applications add entries, the size of caches can exceed the amount of memory that is available to the JVM. Likewise, if Data Grid is not the primary data store, then entries become out of date which means your caches contain stale data.

XML

<distributed-cache>
  <memory storage="HEAP"/>
</distributed-cache>

JSON

{
  "distributed-cache": {
    "memory" : {
      "storage": "HEAP"
    }
  }
}

YAML

distributedCache:
  memory:
    storage: "HEAP"

5.2. Eviction and expiration

Eviction and expiration are two strategies for cleaning the data container by removing old, unused entries. Although eviction and expiration are similar, they have some important differences.

  • ✓ Eviction lets Data Grid control the size of the data container by removing entries when the container becomes larger than a configured threshold.
  • ✓ Expiration limits the amount of time entries can exist. Data Grid uses a scheduler to periodically remove expired entries. Entries that are expired but not yet removed are immediately removed on access; in this case get() calls for expired entries return "null" values.
  • ✓ Eviction is local to Data Grid nodes.
  • ✓ Expiration takes place across Data Grid clusters.
  • ✓ You can use eviction and expiration together or independently of each other.
  • ✓ You can configure eviction and expiration declaratively in infinispan.xml to apply cache-wide defaults for entries.
  • ✓ You can explicitly define expiration settings for specific entries but you cannot define eviction on a per-entry basis.
  • ✓ You can manually evict entries and manually trigger expiration.

5.3. Eviction with Data Grid caches

Eviction lets you control the size of the data container by removing entries from memory in one of two ways:

  • Total number of entries (max-count).
  • Maximum amount of memory (max-size).

Eviction drops one entry from the data container at a time and is local to the node on which it occurs.

Important

Eviction removes entries from memory but not from persistent cache stores. To ensure that entries remain available after Data Grid evicts them, and to prevent inconsistencies with your data, you should configure persistent storage.

When you configure memory, Data Grid approximates the current memory usage of the data container. When entries are added or modified, Data Grid compares the current memory usage of the data container to the maximum size. If the size exceeds the maximum, Data Grid performs eviction.

Eviction happens immediately in the thread that adds an entry that exceeds the maximum size.

5.3.1. Eviction strategies

When you configure Data Grid eviction you specify:

  • The maximum size of the data container.
  • A strategy for removing entries when the cache reaches the threshold.

You can either perform eviction manually or configure Data Grid to do one of the following:

  • Remove old entries to make space for new ones.
  • Throw ContainerFullException and prevent new entries from being created.

    The exception eviction strategy works only with transactional caches that use 2 phase commits; not with 1 phase commits or synchronization optimizations.

Refer to the schema reference for more details about the eviction strategies.

Note

Data Grid includes the Caffeine caching library that implements a variation of the Least Frequently Used (LFU) cache replacement algorithm known as TinyLFU. For off-heap storage, Data Grid uses a custom implementation of the Least Recently Used (LRU) algorithm.

5.3.2. Configuring maximum count eviction

Limit the size of Data Grid caches to a total number of entries.

Procedure

  1. Open your Data Grid configuration for editing.
  2. Specify the total number of entries that caches can contain before Data Grid performs eviction with either the max-count attribute or maxCount() method.
  3. Set one of the following as the eviction strategy to control how Data Grid removes entries with the when-full attribute or whenFull() method.

    • REMOVE Data Grid performs eviction. This is the default strategy.
    • MANUAL You perform eviction manually for embedded caches.
    • EXCEPTION Data Grid throws an exception instead of evicting entries.
  4. Save and close your Data Grid configuration.
Maximum count eviction

In the following example, Data Grid removes an entry when the cache contains a total of 500 entries and a new entry is created:

XML

<distributed-cache>
  <memory max-count="500" when-full="REMOVE"/>
</distributed-cache>

JSON

{
  "distributed-cache" : {
    "memory" : {
      "max-count" : "500",
      "when-full" : "REMOVE"
    }
  }
}

YAML

distributedCache:
  memory:
    maxCount: "500"
    whenFull: "REMOVE"

ConfigurationBuilder

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.memory().maxCount(500).whenFull(EvictionStrategy.REMOVE);

5.3.3. Configuring maximum size eviction

Limit the size of Data Grid caches to a maximum amount of memory.

Procedure

  1. Open your Data Grid configuration for editing.
  2. Specify application/x-protostream as the media type for cache encoding.

    You must specify a binary media type to use maximum size eviction.

  3. Configure the maximum amount of memory, in bytes, that caches can use before Data Grid performs eviction with the max-size attribute or maxSize() method.
  4. Optionally specify a byte unit of measurement.

    The default is B (bytes). Refer to the configuration schema for supported units.

  5. Set one of the following as the eviction strategy to control how Data Grid removes entries with either the when-full attribute or whenFull() method.

    • REMOVE Data Grid performs eviction. This is the default strategy.
    • MANUAL You perform eviction manually for embedded caches.
    • EXCEPTION Data Grid throws an exception instead of evicting entries.
  6. Save and close your Data Grid configuration.
Maximum size eviction

In the following example, Data Grid removes an entry when the size of the cache reaches 1.5 GB (gigabytes) and a new entry is created:

XML

<distributed-cache>
  <encoding media-type="application/x-protostream"/>
  <memory max-size="1.5GB" when-full="REMOVE"/>
</distributed-cache>

JSON

{
  "distributed-cache" : {
    "encoding" : {
      "media-type" : "application/x-protostream"
    },
    "memory" : {
      "max-size" : "1.5GB",
      "when-full" : "REMOVE"
    }
  }
}

YAML

distributedCache:
  encoding:
    mediaType: "application/x-protostream"
  memory:
    maxSize: "1.5GB"
    whenFull: "REMOVE"

ConfigurationBuilder

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.encoding().mediaType("application/x-protostream")
       .memory()
         .maxSize("1.5GB")
         .whenFull(EvictionStrategy.REMOVE);

5.3.4. Manual eviction

If you choose the manual eviction strategy, Data Grid does not perform eviction. You must do so manually with the evict() method.

You should use manual eviction with embedded caches only. For remote caches, you should always configure Data Grid with the REMOVE or EXCEPTION eviction strategy.

Note

This configuration prevents a warning message when you enable passivation but do not configure eviction.

XML

<distributed-cache>
  <memory max-count="500" when-full="MANUAL"/>
</distributed-cache>

JSON

{
  "distributed-cache" : {
    "memory" : {
      "max-count" : "500",
      "when-full" : "MANUAL"
    }
  }
 }

YAML

distributedCache:
  memory:
    maxCount: "500"
    whenFull: "MANUAL"

ConfigurationBuilder

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.encoding().mediaType("application/x-protostream")
       .memory()
         .maxSize("1.5GB")
         .whenFull(EvictionStrategy.REMOVE);

5.3.5. Passivation with eviction

Passivation persists data to cache stores when Data Grid evicts entries. You should always enable eviction if you enable passivation, as in the following examples:

XML

<distributed-cache>
  <persistence passivation="true">
    <!-- Persistent storage configuration. -->
  </persistence>
  <memory max-count="100"/>
</distributed-cache>

JSON

{
  "distributed-cache": {
    "memory" : {
      "max-count" : "100"
    },
    "persistence" : {
      "passivation" : true
    }
  }
}

YAML

distributedCache:
  memory:
    maxCount: "100"
  persistence:
    passivation: "true"

ConfigurationBuilder

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.memory().maxCount(100);
builder.persistence().passivation(true); //Persistent storage configuration

5.4. Expiration with lifespan and maximum idle

Expiration configures Data Grid to remove entries from caches when they reach one of the following time limits:

Lifespan
Sets the maximum amount of time that entries can exist.
Maximum idle
Specifies how long entries can remain idle. If operations do not occur for entries, they become idle.
Important

Maximum idle expiration does not currently support caches with persistent storage.

Note

If you use expiration and eviction with the EXCEPTION eviction strategy, entries that are expired, but not yet removed from the cache, count towards the size of the data container.

5.4.1. How expiration works

When you configure expiration, Data Grid stores keys with metadata that determines when entries expire.

  • Lifespan uses a creation timestamp and the value for the lifespan configuration property.
  • Maximum idle uses a last used timestamp and the value for the max-idle configuration property.

Data Grid checks if lifespan or maximum idle metadata is set and then compares the values with the current time.

If (creation + lifespan < currentTime) or (lastUsed + maxIdle < currentTime) then Data Grid detects that the entry is expired.

Expiration occurs whenever entries are accessed or found by the expiration reaper.

For example, k1 reaches the maximum idle time and a client makes a Cache.get(k1) request. In this case, Data Grid detects that the entry is expired and removes it from the data container. The Cache.get(k1) request returns null.

Data Grid also expires entries from cache stores, but only with lifespan expiration. Maximum idle expiration does not work with cache stores. In the case of cache loaders, Data Grid cannot expire entries because loaders can only read from external storage.

Note

Data Grid adds expiration metadata as long primitive data types to cache entries. This can increase the size of keys by as much as 32 bytes.

5.4.2. Expiration reaper

Data Grid uses a reaper thread that runs periodically to detect and remove expired entries. The expiration reaper ensures that expired entries that are no longer accessed are removed.

The Data Grid ExpirationManager interface handles the expiration reaper and exposes the processExpiration() method.

In some cases, you can disable the expiration reaper and manually expire entries by calling processExpiration(); for instance, if you are using local cache mode with a custom application where a maintenance thread runs periodically.

Important

If you use clustered cache modes, you should never disable the expiration reaper.

Data Grid always uses the expiration reaper when using cache stores. In this case you cannot disable it.

5.4.3. Maximum idle and clustered caches

Because maximum idle expiration relies on the last access time for cache entries, it has some limitations with clustered cache modes.

With lifespan expiration, the creation time for cache entries provides a value that is consistent across clustered caches. For example, the creation time for k1 is always the same on all nodes.

For maximum idle expiration with clustered caches, last access time for entries is not always the same on all nodes. To ensure that entries have the same relative access times across clusters, Data Grid sends touch commands to all owners when keys are accessed.

The touch commands that Data Grid send have the following considerations:

  • Cache.get() requests do not return until all touch commands complete. This synchronous behavior increases latency of client requests.
  • The touch command also updates the "recently accessed" metadata for cache entries on all owners, which Data Grid uses for eviction.
  • With scattered cache mode, Data Grid sends touch commands to all nodes, not just primary and backup owners.

Additional information

  • Maximum idle expiration does not work with invalidation mode.
  • Iteration across a clustered cache can return expired entries that have exceeded the maximum idle time limit. This behavior ensures performance because no remote invocations are performed during the iteration. Also note that iteration does not refresh any expired entries.

5.4.4. Configuring lifespan and maximum idle times for caches

Set lifespan and maximum idle times for all entries in a cache.

Procedure

  1. Open your Data Grid configuration for editing.
  2. Specify the amount of time, in milliseconds, that entries can stay in the cache with the lifespan attribute or lifespan() method.
  3. Specify the amount of time, in milliseconds, that entries can remain idle after last access with the max-idle attribute or maxIdle() method.
  4. Save and close your Data Grid configuration.
Expiration for Data Grid caches

In the following example, Data Grid expires all cache entries after 5 seconds or 1 second after the last access time, whichever happens first:

XML

<replicated-cache>
  <expiration lifespan="5000" max-idle="1000" />
</replicated-cache>

JSON

{
  "replicated-cache" : {
    "expiration" : {
      "lifespan" : "5000",
      "max-idle" : "1000"
    }
  }
}

YAML

replicatedCache:
  expiration:
    lifespan: "5000"
    maxIdle: "1000"

ConfigurationBuilder

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.expiration().lifespan(5000, TimeUnit.MILLISECONDS)
                    .maxIdle(1000, TimeUnit.MILLISECONDS);

5.4.5. Configuring lifespan and maximum idle times per entry

Specify lifespan and maximum idle times for individual entries. When you add lifespan and maximum idle times to entries, those values take priority over expiration configuration for caches.

Note

When you explicitly define lifespan and maximum idle time values for cache entries, Data Grid replicates those values across the cluster along with the cache entries. Likewise, Data Grid writes expiration values along with the entries to persistent storage.

Procedure

  • For remote caches, you can add lifespan and maximum idle times to entries interactively with the Data Grid Console.

    With the Data Grid Command Line Interface (CLI), use the --max-idle= and --ttl= arguments with the put command.

  • For both remote and embedded caches, you can add lifespan and maximum idle times with cache.put() invocations.

    //Lifespan of 5 seconds.
    //Maximum idle time of 1 second.
    cache.put("hello", "world", 5, TimeUnit.SECONDS, 1, TimeUnit.SECONDS);
    
    //Lifespan is disabled with a value of -1.
    //Maximum idle time of 1 second.
    cache.put("hello", "world", -1, TimeUnit.SECONDS, 1, TimeUnit.SECONDS);

5.5. JVM heap and off-heap memory

Data Grid stores cache entries in JVM heap memory by default. You can configure Data Grid to use off-heap storage, which means that your data occupies native memory outside the managed JVM memory space.

The following diagram is a simplified illustration of the memory space for a JVM process where Data Grid is running:

Figure 5.1. JVM memory space

This diagram depicts the JVM memory space divided into heap and off-heap memory.

JVM heap memory

The heap is divided into young and old generations that help keep referenced Java objects and other application data in memory. The GC process reclaims space from unreachable objects, running more frequently on the young generation memory pool.

When Data Grid stores cache entries in JVM heap memory, GC runs can take longer to complete as you start adding data to your caches. Because GC is an intensive process, longer and more frequent runs can degrade application performance.

Off-heap memory

Off-heap memory is native available system memory outside JVM memory management. The JVM memory space diagram shows the Metaspace memory pool that holds class metadata and is allocated from native memory. The diagram also represents a section of native memory that holds Data Grid cache entries.

Off-heap memory:

  • Uses less memory per entry.
  • Improves overall JVM performance by avoiding Garbage Collector (GC) runs.

One disadvantage, however, is that JVM heap dumps do not show entries stored in off-heap memory.

5.5.1. Off-heap data storage

When you add entries to off-heap caches, Data Grid dynamically allocates native memory to your data.

Data Grid hashes the serialized byte[] for each key into buckets that are similar to a standard Java HashMap. Buckets include address pointers that Data Grid uses to locate entries that you store in off-heap memory.

Important

Even though Data Grid stores cache entries in native memory, run-time operations require JVM heap representations of those objects. For instance, cache.get() operations read objects into heap memory before returning. Likewise, state transfer operations hold subsets of objects in heap memory while they take place.

Object equality

Data Grid determines equality of Java objects in off-heap storage using the serialized byte[] representation of each object instead of the object instance.

Data consistency

Data Grid uses an array of locks to protect off-heap address spaces. The number of locks is twice the number of cores and then rounded to the nearest power of two. This ensures that there is an even distribution of ReadWriteLock instances to prevent write operations from blocking read operations.

5.5.2. Configuring off-heap memory

Configure Data Grid to store cache entries in native memory outside the JVM heap space.

Procedure

  1. Open your Data Grid configuration for editing.
  2. Set OFF_HEAP as the value for the storage attribute or storage() method.
  3. Set a boundary for the size of the cache by configuring eviction.
  4. Save and close your Data Grid configuration.
Off-heap storage

Data Grid stores cache entries as bytes in native memory. Eviction happens when there are 100 entries in the data container and Data Grid gets a request to create a new entry:

XML

<replicated-cache>
  <memory storage="OFF_HEAP" max-count="500"/>
</replicated-cache>

JSON

{
  "replicated-cache" : {
    "memory" : {
      "storage" : "OBJECT",
      "max-count" : "500"
    }
  }
}

YAML

replicatedCache:
  memory:
    storage: "OFF_HEAP"
    maxCount: "500"

ConfigurationBuilder

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.memory().storage(StorageType.OFF_HEAP).maxCount(500);