Appendix C. Binary Values

The hierarchical database can handle binary values that are larger than available memory. It never loads the whole value onto the heap; instead, it streams the value to and from the persistent store. You can also configure where the hierarchical database stores binary values, independently of where the rest of the content is stored.
The hierarchical database stores all binary content by its SHA-1 hash. The SHA-1 cryptographic hash function is not used here for security purposes. It is used because the hash can be computed reliably from the content alone, and because, for all practical purposes, two binary contents have the same SHA-1 only if they are identical. The SHA-1 hash of some binary content therefore serves as an excellent key for storing and referencing that content.
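To illustrate how a SHA-1 key can be derived purely from content, the following is a minimal sketch that uses only the standard Java `MessageDigest` API. It reads the input in fixed-size chunks so the whole value never has to sit on the heap. The class name `Sha1Sketch`, the method `hexSha1`, and the 8 KB buffer size are hypothetical choices for this example. They are not part of the hierarchical database API, and the lowercase hexadecimal formatting is an assumption rather than a statement about what the repository returns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of the repository API.
final class Sha1Sketch {

    // Streams the content through the digest in 8 KB chunks and returns
    // the SHA-1 as a lowercase hexadecimal string.
    static String hexSha1(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        // prints a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(hexSha1(in));
    }
}
```

Because the hash depends only on the bytes, any client that can read the content can compute the same key without contacting the repository.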
Using the SHA-1 hash as the identifier for the binary content also means that the hierarchical database never needs to store a given binary content more than once, no matter how many nodes or properties refer to it. It also means that if your JCR client already knows (or can compute) the SHA-1 of a large value, the JCR client can use APIs specific to the hierarchical database to easily determine if that value has already been stored in the repository.
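The deduplication property described above can be sketched with a toy, in-memory content-addressed store. This is not how the hierarchical database implements its binary storage; the class `ContentAddressedStoreSketch` and its methods are hypothetical and exist only to show why keying by SHA-1 means a value is kept once, and why a client that knows the hash can check for the value without sending it.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy illustration only; a real binary store streams to persistent storage.
final class ContentAddressedStoreSketch {

    // Maps the hexadecimal SHA-1 of each value to a single stored copy.
    private final Map<String, byte[]> valuesByHash = new HashMap<>();

    // Stores the content if it is not already present and returns its key.
    String store(byte[] content) {
        String key = sha1Hex(content);
        valuesByHash.putIfAbsent(key, content.clone());
        return key;
    }

    // Lets a caller that already knows the SHA-1 ask whether the value exists.
    boolean contains(String sha1Hex) {
        return valuesByHash.containsKey(sha1Hex);
    }

    // Number of distinct values currently held.
    int size() {
        return valuesByHash.size();
    }

    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Storing the same bytes twice returns the same key and leaves a single entry in the map, regardless of how many callers refer to that key.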

C.1. Extended Binary Interface

The hierarchical database public API defines the org.modeshape.jcr.api.Binary interface as a simple extension to the standard javax.jcr.Binary interface. The extension adds methods to get the SHA-1 hash (as a byte array and as a hexadecimal string) and the MIME type of the content:
@Immutable
public interface Binary extends javax.jcr.Binary {

    /**
     * Get the SHA-1 hash of the contents. This hash can be used to determine whether two
     * Binary instances contain the same content.
     *
     * Repeatedly calling this method should generally be efficient, as most implementations
     * will compute the hash only once.
     *
     * @return the hash of the contents as a byte array, or an empty array if the hash could
     * not be computed.
     * @see #getHexHash()
     */
    byte[] getHash();

    /**
     * Get the hexadecimal form of the SHA-1 hash of the contents. This hash can be used to
     * determine whether two Binary instances contain the same content.
     *
     * Repeatedly calling this method should generally be efficient, as most implementations
     * will compute the hash only once.
     *
     * @return the hexadecimal form of the getHash(), or a null string if the hash could
     * not be computed or is not known
     * @see #getHash()
     */
    String getHexHash();

    /**
     * Get the MIME type for this binary value.
     *
     * @return the MIME type, or null if it cannot be determined (e.g., the Binary is empty)
     * @throws IOException if there is a problem reading the binary content
     * @throws RepositoryException if an error occurs.
     */
    String getMimeType() throws IOException, RepositoryException;

    /**
     * Get the MIME type for this binary value.
     *
     * @param name the name of the binary value, useful in helping to determine the MIME type
     * @return the MIME type, or null if it cannot be determined (e.g., the Binary is empty)
     * @throws IOException if there is a problem reading the binary content
     * @throws RepositoryException if an error occurs.
     */
    String getMimeType( String name ) throws IOException, RepositoryException;
}
All javax.jcr.Binary values returned by the hierarchical database implement this public interface, so you can cast a returned value to org.modeshape.jcr.api.Binary to gain access to the additional methods.
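The cast can be sketched as follows. So that the example compiles without the JCR and repository JARs on the classpath, it declares two simplified stand-in interfaces: `JcrBinary` plays the role of javax.jcr.Binary and `ExtendedBinary` plays the role of org.modeshape.jcr.api.Binary. Both stand-ins, the class `BinaryCastSketch`, and the helper `hexHashOf` are hypothetical. In real code you would import the actual interfaces and cast the value you obtained from the repository.

```java
// Illustration of the cast pattern with simplified stand-in types.
final class BinaryCastSketch {

    // Stand-in for javax.jcr.Binary (reduced to one method for this sketch).
    interface JcrBinary {
        long getSize();
    }

    // Stand-in for org.modeshape.jcr.api.Binary, which extends the standard type.
    interface ExtendedBinary extends JcrBinary {
        String getHexHash();
    }

    // Returns the hexadecimal SHA-1 if the value exposes the extended
    // interface, or null if it is only a plain binary value.
    static String hexHashOf(JcrBinary value) {
        if (value instanceof ExtendedBinary) {
            return ((ExtendedBinary) value).getHexHash();
        }
        return null;
    }
}
```

The `instanceof` guard is a defensive choice for code that may also receive binary values from other sources; for values returned by the hierarchical database, the text above indicates that the cast is expected to succeed.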