Chapter 1. Configuring cache encoding

Find out how to configure Data Grid caches with different media types and how encoding affects the ways you can use Data Grid.

1.1. Cache encoding

Encoding is the format, identified by a media type, that Data Grid uses to store entries (key/value pairs) in caches.

Remote caches

Data Grid Server stores entries in remote caches with the encoding that is set in the cache configuration.

Hot Rod and REST clients include a media type with each request they make to Data Grid Server. To handle multiple clients making read and write requests with different media types, Data Grid Server converts data on demand to and from the media type that is set in the cache configuration.

If the remote cache does not have any encoding configuration, Data Grid Server stores keys and values as generic byte[] without any media type information, which can lead to unexpected results when converting data for clients that request different formats.

Use ProtoStream encoding

Data Grid Server returns an error when client requests include a media type that it cannot convert to or from the media type that is set in the cache configuration.

Data Grid recommends always configuring cache encoding with the application/x-protostream media type if you want to use multiple clients, such as Data Grid Console or CLI, Hot Rod, or REST. ProtoStream encoding also lets you use server-side tasks and perform indexed queries on remote caches.

Embedded caches

Data Grid stores entries in embedded caches as Plain Old Java Objects (POJOs) by default.

For clustered embedded caches, Data Grid needs to marshall any POJOs to a byte array that can be replicated between nodes and then unmarshalled back into POJOs. This means you must ensure that Data Grid can serialize your POJOs with the ProtoStream marshaller if you do not configure another marshaller.

Note

If you store mutable POJOs in embedded caches, you should always update values using new POJO instances. For example, if you store a HashMap as a key/value pair, the other members of the Data Grid cluster do not see any local modifications to the Map. Additionally, it is possible that a ConcurrentModificationException could occur if the Map instance is updated at the same time that Data Grid is marshalling the object.
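
The following is a minimal sketch of the "new instance" pattern that this note describes. It uses only JDK classes, and a ConcurrentHashMap stands in for an embedded cache, so it illustrates the update pattern rather than Data Grid replication itself. The class name, the withEntry helper, and the keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a ConcurrentHashMap stands in for an embedded Data Grid cache.
class CopyOnUpdateExample {

    // Hypothetical helper: returns a new map with one extra entry instead of
    // mutating the instance that is already stored in the cache.
    static Map<String, String> withEntry(Map<String, String> stored, String field, String value) {
        Map<String, String> copy = new HashMap<>(stored); // new instance, original untouched
        copy.put(field, value);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();
        cache.put("user-1", new HashMap<>(Map.of("name", "alan")));

        // Avoid: cache.get("user-1").put("surname", "partridge");
        // Prefer: build a new instance and put it back into the cache.
        Map<String, String> updated = withEntry(cache.get("user-1"), "surname", "partridge");
        cache.put("user-1", updated);

        System.out.println(cache.get("user-1"));
    }
}
```

With a real embedded cache, the explicit put() of the new instance is the operation that other cluster members can observe, and the stored instance is never modified while it is being marshalled.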

1.2. Protobuf cache encoding

Protocol Buffers (Protobuf) is a lightweight binary media type for structured data. As a cache encoding, Protobuf gives you excellent performance as well as interoperability between client applications in different programming languages for both Hot Rod and REST endpoints.

Data Grid uses the ProtoStream library to encode caches as Protobuf with the application/x-protostream media type.

The following example shows a Protobuf message that describes a Person object:

message Person {
    optional int32 id = 1;
    optional string name = 2;
    optional string surname = 3;
    optional Address address = 4;
    repeated PhoneNumber phoneNumbers = 5;
    optional uint32 age = 6;
    enum Gender {
        MALE = 0;
        FEMALE = 1;
    }
}
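
To give a sense of why Protobuf is compact, the following sketch encodes a single int32 field, such as the id field of the Person message, using the standard Protobuf wire format: a tag byte followed by a base-128 varint. It uses only JDK classes, and the class and method names are hypothetical. It is not the ProtoStream implementation, and the bytes that Data Grid actually stores can differ because ProtoStream can add its own wrapping around messages.

```java
import java.io.ByteArrayOutputStream;

// Sketch of the Protobuf wire format for one varint field; not ProtoStream code.
class ProtobufVarintSketch {

    // Writes a value as a base-128 varint: 7 bits per byte, least significant
    // group first, with the high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes an int32 field: the tag is (fieldNumber << 3) | wireType,
    // where wire type 0 means "varint".
    static byte[] encodeInt32Field(int fieldNumber, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | 0);
        writeVarint(out, value); // int is sign-extended to long, as Protobuf does for int32
        return out.toByteArray();
    }
}
```

For example, encodeInt32Field(1, 150) produces the three bytes 08 96 01: one byte for the tag of field 1 and two bytes for the value.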

Interoperability

Because it is language neutral, Protobuf encoding means Data Grid can handle requests from client applications written in Java, C++, C#, Python, Go, and more.

Protobuf also enables clients on different remote endpoints, Hot Rod or REST, to operate on the same data. Because Data Grid Console uses the REST API, you can also use it to access and work with Protobuf-encoded caches.

Note

You cannot use Data Grid Console with any binary encoding other than application/x-protostream.

You should always use Protobuf cache encoding with the application/x-protostream media type for integration with any Red Hat technology because it allows communication between applications and services.

Queries

Data Grid needs a structured representation of data in caches for fast and reliable queries. To search caches with the Ickle query language, you register Protobuf schemas that describe your objects.

Custom types

Data Grid includes an implementation of the ProtoStream API with native support for frequently used types, including String and Integer. If you want to store custom types in your caches, use ProtoStream marshalling to generate and register serialization contexts with Data Grid so that it can marshall your objects.

1.2.1. Encoding caches as ProtoStream

Configure Data Grid to use the ProtoStream library to store cache entries as Protocol Buffers (Protobuf).

Procedure

  • Specify the application/x-protostream media type for keys and values.

Declarative

<distributed-cache>
   <encoding>
      <key media-type="application/x-protostream"/>
      <value media-type="application/x-protostream"/>
   </encoding>
</distributed-cache>

Programmatic

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;

// Create a cache configuration that encodes keys and values as ProtoStream
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.clustering().cacheMode(CacheMode.DIST_SYNC)
       .encoding().key().mediaType("application/x-protostream")
       .encoding().value().mediaType("application/x-protostream");

Alternatively, you can use the same encoding for keys and values:

Declarative

<encoding media-type="application/x-protostream"/>

Programmatic

.encoding().mediaType("application/x-protostream");

1.3. Text-based cache encoding

Text-based encoding is human-readable content such as plain text. The classic "Hello World" example entry could be stored in a cache as follows:

key=hello
value=world

If you encode caches with the text/plain media type, Data Grid can convert to and from the following media types:

  • application/xml
  • application/json
  • application/x-protostream

The following example configuration encodes keys and values with the text/plain; charset=UTF-8 media type:

<distributed-cache>
   <encoding>
      <key media-type="text/plain; charset=UTF-8"/>
      <value media-type="text/plain; charset=UTF-8"/>
   </encoding>
</distributed-cache>

1.3.1. Clients and text-based encoding

If you configure encoding to store keys and values with a text-based media type, then you also need to configure clients to operate on those caches.

Hot Rod clients

Data Grid uses the ProtoStream library to handle String and byte[] types natively. If you configure cache encoding with the text/plain media type, Hot Rod clients do not necessarily require any marshaller configuration to perform cache operations.

For other text-based media types, such as JSON or XML, Hot Rod clients can use the org.infinispan.commons.marshall.UTF8StringMarshaller marshaller that converts to and from the text/plain media type.

REST clients

REST clients must include the media type for caches in the request headers.

For example, if you configure cache encoding as text/plain; charset=UTF-8, then REST clients should send the following headers:

  • Accept: text/plain; charset=UTF-8 for read operations.
  • Content-Type: text/plain; charset=UTF-8 or Key-Content-Type: text/plain; charset=UTF-8 for write operations.
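
As an illustration of these headers, the following sketch builds read and write requests with the JDK HTTP client (Java 11 or later). The server address, the port, the /rest/v2/caches/ path, and the cache name mycache are assumptions for this example. The sketch only builds the requests, does not send them, and leaves out authentication.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: REST requests for a cache encoded as text/plain; charset=UTF-8.
class TextPlainRestRequests {

    // Assumed values: local server, default-looking port, hypothetical cache "mycache".
    static final String CACHE_URL = "http://localhost:11222/rest/v2/caches/mycache/";
    static final String TEXT_PLAIN = "text/plain; charset=UTF-8";

    // Read operation: the Accept header names the format the client wants back.
    static HttpRequest read(String key) {
        return HttpRequest.newBuilder(URI.create(CACHE_URL + key))
                .header("Accept", TEXT_PLAIN)
                .GET()
                .build();
    }

    // Write operation: Content-Type describes the value in the request body and
    // Key-Content-Type describes the key. This sketch sets both.
    static HttpRequest write(String key, String value) {
        return HttpRequest.newBuilder(URI.create(CACHE_URL + key))
                .header("Content-Type", TEXT_PLAIN)
                .header("Key-Content-Type", TEXT_PLAIN)
                .PUT(HttpRequest.BodyPublishers.ofString(value))
                .build();
    }
}
```

To send a request, pass it to java.net.http.HttpClient, for example write("hello", "world") followed by read("hello").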

1.4. Marshalled Java objects

Data Grid stores marshalled Java objects in caches as byte arrays. For example, the following is a simple representation of a Person object stored as a value in memory:

value=[61 6c 61 6e 0a 70 61 72 74 72 69 64 67 65]

To store marshalled objects in caches, you should use the ProtoStream marshaller unless you have a strict requirement for a different marshaller. For example, when migrating client applications from older versions of Data Grid, you might need to temporarily use JBoss marshalling with your Hot Rod Java clients.

Data Grid stores marshalled Java objects as byte arrays with the following media types:

  • application/x-protostream
  • application/x-jboss-marshalling
  • application/x-java-serialized-object

Note

When storing unmarshalled Java objects, Data Grid uses the object implementation of equals() and hashCode(). When storing marshalled objects, the marshalled bytes are compared for equality and hashed instead.
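
The following sketch illustrates what comparing and hashing marshalled bytes means in practice. It uses the example byte array shown earlier in this section and JDK utilities only. The class and method names are hypothetical, and the sketch shows content-based comparison in general, not the Data Grid implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: equality and hashing based on marshalled bytes instead of equals()/hashCode().
class MarshalledEqualitySketch {

    // The example value from this section, written as a Java byte array.
    static final byte[] VALUE = {
        0x61, 0x6c, 0x61, 0x6e, 0x0a,
        0x70, 0x61, 0x72, 0x74, 0x72, 0x69, 0x64, 0x67, 0x65
    };

    // Two entries are treated as equal when their bytes match exactly.
    static boolean sameMarshalledForm(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // The hash is derived from the byte content, not from the original object.
    static int marshalledHash(byte[] bytes) {
        return Arrays.hashCode(bytes);
    }

    public static void main(String[] args) {
        byte[] copy = VALUE.clone();
        System.out.println(sameMarshalledForm(VALUE, copy));
        System.out.println(marshalledHash(VALUE) == marshalledHash(copy));
        // The example bytes happen to be readable as UTF-8 text.
        System.out.println(new String(VALUE, StandardCharsets.UTF_8));
    }
}
```

One consequence is that two objects that are equal according to equals() but marshal to different bytes are treated as different keys when the cache stores them in marshalled form.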

1.4.1. Clients and marshalled objects

When you configure Hot Rod Java clients to use a marshaller, you must configure your cache with the encoding for that marshaller.

Each marshaller uses a different media type to produce byte[] content that the client can transmit to Data Grid Server. When reading from the server, the client marshaller performs the opposite operation, using the media type to produce data from byte[] content.

Your cache encoding must be compatible with the Hot Rod client marshaller. For example, if you configure a cache encoding as application/x-protostream, you can use the ProtoStream marshaller with your clients to operate on that cache. However, if the client marshaller uses an encoding that Data Grid cannot convert to and from application/x-protostream, Data Grid returns an error.

If you use JavaSerializationMarshaller or GenericJBossMarshaller, you should encode caches with the application/x-java-serialized-object or application/x-jboss-marshalling media type, respectively.

ProtoStream to JSON conversion

Data Grid converts keys and values encoded with the application/x-protostream media type to application/json.

This allows REST clients to include the JSON media type in request headers and perform operations on caches that use ProtoStream encoding:

  • Accept: application/json for read operations.
  • Content-Type: application/json for write operations.

1.5. Plain Old Java Objects (POJO)

For best performance, Data Grid recommends storing unmarshalled POJOs in embedded caches only. However, you can configure keys and values with the following media type:

  • application/x-java-object

1.5.1. Clients and POJOs

Even though Data Grid does not recommend doing so, clients can operate on caches that store unmarshalled POJOs with the application/x-java-object media type.

Hot Rod clients

Hot Rod client marshallers must be available to Data Grid Server so it can deserialize your Java objects. By default, the ProtoStream and Java Serialization marshallers are available on the server.

REST clients

REST clients must use either JSON or XML for keys and values so Data Grid can convert to and from POJOs.

Note

Data Grid requires you to add Java classes to the deserialization allowlist to convert XML to and from POJOs.

1.6. Adding JARs to Data Grid Server installations

Make custom JAR files available to Data Grid Server by adding them to the classpath.

Important

  • Data Grid loads JAR files during startup only.

    You should bring all nodes in the cluster down gracefully and make any JAR files available to each node before bringing the cluster back up.

  • You should add custom JAR files to the $RHDG_HOME/server/lib directory only.

    The $RHDG_HOME/lib directory is reserved for Data Grid JAR files.

Procedure

  1. Stop Data Grid Server if it is running.
  2. Add JAR files to the server/lib directory, for example:

    ├── server
    │   └── lib
    │       └── UserObjects.jar