3.3. Architecture

3.3.1. The Hierarchical Database Engine

Perhaps the most important component in the hierarchical database is the engine, which is responsible for managing and making available all of the configured repositories. When the database is embedded into an application, the application should instantiate the org.modeshape.jcr.ModeShapeEngine class itself and explicitly invoke the start(), deployRepository(...), and shutdown() methods at appropriate places within the application's own lifecycle. Note that repository configurations can be updated even while the repository is running and in use. The hierarchical database can also be deployed to a server (e.g., JBoss EAP or Tomcat) so that the server manages the lifecycle of the engine.
Every repository in a ModeShapeEngine instance has a unique name, and applications can use the engine to get a particular repository by that name. When used within an environment that provides JNDI, the hierarchical database also registers each repository in JNDI so that applications can look it up there. See the documentation for all the ways to find a repository.
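To make the embedded lifecycle concrete without depending on ModeShape's own classes, the sketch below models it in plain Java: start the engine, deploy repositories under unique names, look one up by name, and shut down. The `Engine` and `Repo` types and their methods are hypothetical stand-ins, not the `org.modeshape.jcr.ModeShapeEngine` API; consult that class's Javadoc for the real signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical, simplified model of an engine's lifecycle; NOT the ModeShapeEngine API.
class EngineLifecycleSketch {

    /** Stand-in for a deployed repository; only its name matters here. */
    static final class Repo {
        final String name;
        Repo(String name) { this.name = name; }
    }

    /** Hypothetical engine: repositories are registered under unique names. */
    static final class Engine {
        private final Map<String, Repo> repositories = new LinkedHashMap<>();
        private boolean running;

        void start() { running = true; }

        Repo deploy(String name) {
            if (!running) throw new IllegalStateException("engine not started");
            if (repositories.containsKey(name)) {
                throw new IllegalArgumentException("duplicate repository name: " + name);
            }
            Repo repo = new Repo(name);
            repositories.put(name, repo);
            return repo;
        }

        /** Lookup by the repository's unique name. */
        Repo getRepository(String name) {
            Repo repo = repositories.get(name);
            if (repo == null) throw new NoSuchElementException("no repository named " + name);
            return repo;
        }

        void shutdown() {
            repositories.clear();
            running = false;
        }

        boolean isRunning() { return running; }
    }

    public static void main(String[] args) {
        Engine engine = new Engine();
        engine.start();                 // when the application starts
        engine.deploy("sample");
        Repo repo = engine.getRepository("sample");
        System.out.println("found " + repo.name);
        engine.shutdown();              // when the application stops
    }
}
```

Tying `start()` and `shutdown()` to the application's own startup and shutdown hooks is the point of the embedded model; in a server deployment the server makes those calls instead.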

3.3.2. Repository Configuration

Each repository is configured separately with a file that conforms to the JSON format. (Note that when installed into JBoss EAP, configuring the hierarchical database is done through EAP's configuration system.) The configuration files can be read with the org.modeshape.jcr.RepositoryConfiguration class, and the resulting RepositoryConfiguration instances can be passed to the ModeShapeEngine.deployRepository(...) and ModeShapeEngine.updateRepository(...) methods.
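Because a configuration can be changed while its repository is running, it can help to picture an update as replacing one complete configuration snapshot with another, so that readers never observe a half-applied change. The sketch below models that idea with an `AtomicReference`. `Config`, `RunningRepository`, and the `allowWorkspaceCreation` field are hypothetical and do not mirror `RepositoryConfiguration` or the exact semantics of `updateRepository(...)`; the rule that an update may not rename the repository is likewise an assumption of this sketch.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch: swapping an immutable configuration on a running repository.
// Not ModeShape's RepositoryConfiguration API.
class ConfigUpdateSketch {

    /** Immutable configuration snapshot (fields are illustrative only). */
    static final class Config {
        final String name;
        final boolean allowWorkspaceCreation;

        Config(String name, boolean allowWorkspaceCreation) {
            this.name = name;
            this.allowWorkspaceCreation = allowWorkspaceCreation;
        }

        Config withAllowWorkspaceCreation(boolean allow) {
            return new Config(name, allow);
        }
    }

    /** A "running" repository whose configuration can be replaced at any time. */
    static final class RunningRepository {
        private final AtomicReference<Config> config;

        RunningRepository(Config initial) { config = new AtomicReference<>(initial); }

        /** Readers always see one complete snapshot. */
        Config currentConfig() { return config.get(); }

        /** Applies a change atomically; the repository name must stay the same (assumption). */
        Config update(UnaryOperator<Config> change) {
            return config.updateAndGet(old -> {
                Config next = change.apply(old);
                if (!next.name.equals(old.name)) {
                    throw new IllegalArgumentException("repository name cannot change");
                }
                return next;
            });
        }
    }

    public static void main(String[] args) {
        RunningRepository repo = new RunningRepository(new Config("sample", false));
        repo.update(c -> c.withAllowWorkspaceCreation(true));
        System.out.println(repo.currentConfig().allowWorkspaceCreation); // true
    }
}
```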

3.3.3. Clustering

The hierarchical database can be clustered at the repository level. This means that a repository with the same name is deployed to multiple engines (typically in separate processes), and those repository instances are aware of each other, so that events originating in one repository instance are forwarded to all other repository instances in the cluster. The Infinispan cache(s) used by each repository should also be clustered, so that Infinispan can coordinate changes to the data stored in the cache(s).
There are two other important aspects of clustering: storage and indexing.
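The event-forwarding rule can be sketched as an in-process simulation: a change made through one member is delivered locally and forwarded once to every peer, while events received from peers are delivered but not forwarded again (otherwise they would loop around the cluster). The `Member` class and `connectAll` helper are hypothetical, and direct method calls stand in for the network messaging a real cluster would use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-process simulation of repository-level clustering.
// Real clusters use network messaging; this sketch uses direct method calls instead.
class ClusterEventSketch {

    static final class Member {
        final String name;
        final List<Member> peers = new ArrayList<>();
        final List<String> observedEvents = new ArrayList<>();

        Member(String name) { this.name = name; }

        /** A change made through this member's own sessions. */
        void localChange(String event) {
            deliver(event);
            for (Member peer : peers) {
                peer.receiveRemote(event); // forward once to each peer
            }
        }

        /** Events from peers are delivered locally but never re-forwarded. */
        void receiveRemote(String event) {
            deliver(event);
        }

        private void deliver(String event) {
            observedEvents.add(name + " saw " + event);
        }
    }

    /** Wires every member to every other member (same repository name on each engine). */
    static void connectAll(List<Member> members) {
        for (Member m : members) {
            for (Member other : members) {
                if (m != other) m.peers.add(other);
            }
        }
    }

    public static void main(String[] args) {
        Member a = new Member("A"), b = new Member("B"), c = new Member("C");
        connectAll(List.of(a, b, c));
        a.localChange("node /cars added");
        System.out.println(b.observedEvents); // [B saw node /cars added]
    }
}
```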

3.3.4. Clustering: Storage

If the Infinispan caches use cache stores to persist content to the file system, a database, cloud storage, and so forth, then this storage must be compatible with clustering. For example, if the cache store persists content on the file system, then the cache used by each repository instance must have its own non-shared directory in which to persist information. (Infinispan clustering uses network messaging to ensure that the multiple instances that "own" a particular piece of data are all kept in sync.) Some of Red Hat JBoss Data Grid's cache stores are sharable, which means that multiple instances can all share a single store.
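For a non-sharable file-system store, the "own non-shared directory" rule can be checked before starting the cluster. The sketch below derives one directory per instance and verifies that no two instances resolve to the same normalized path. The helper names and the `base/instanceId` layout are hypothetical; they are not Infinispan or JBoss Data Grid configuration options.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical configuration check for a NON-sharable file-system cache store:
// every repository instance must persist into its own directory.
class StoreDirectorySketch {

    /** Derives a per-instance directory such as /data/store/node-1 (illustrative layout). */
    static Path storeDirFor(Path baseDir, String instanceId) {
        return baseDir.resolve(instanceId);
    }

    /** True if no two instances resolve to the same directory after normalization. */
    static boolean allDistinct(Map<String, Path> dirsByInstance) {
        Set<Path> seen = new HashSet<>();
        for (Path dir : dirsByInstance.values()) {
            if (!seen.add(dir.toAbsolutePath().normalize())) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Path base = Paths.get("/data/store");
        Map<String, Path> dirs = new HashMap<>();
        dirs.put("node-1", storeDirFor(base, "node-1"));
        dirs.put("node-2", storeDirFor(base, "node-2"));
        System.out.println(allDistinct(dirs)); // true
    }
}
```

Normalizing paths catches aliases such as `/data/store/../store` that would otherwise hide a shared directory. A sharable store would skip this check entirely.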

3.3.5. Clustering: Indexing

Each repository instance uses indexes to help answer queries. When clustering a repository, the repository has to know whether it owns the indexes (in which case the repository updates the indexes to reflect all changes, whether they originate in the local or in remote repository instances) or whether the indexes are shared (in which case the repository updates the indexes only for changes made through that repository instance). Note that even in the shared case, the index files might be local copies that are periodically cloned from a master set.
Local indexes are much easier to configure, but the disadvantage is that every repository instance updates its own indexes for every change, so the same indexing work is duplicated across the cluster. This can overwhelm a write-heavy system.
Shared indexes are more difficult to configure (they require the use and proper configuration of JMS and/or JGroups), but they are generally better able to handle a large volume of updates.
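The difference between the two modes comes down to one decision per change: does this instance update its indexes? The hypothetical sketch below encodes that rule and counts the total index updates a single change triggers across a cluster, which makes the duplicated work of locally owned indexes explicit. `IndexMode` and the helper methods are invented for illustration and are not ModeShape configuration values.

```java
// Hypothetical model of the two index-ownership modes; not ModeShape configuration.
class IndexPolicySketch {

    enum IndexMode { LOCAL_OWNED, SHARED }

    /** Whether one instance should update its indexes for a given change. */
    static boolean shouldIndex(IndexMode mode, boolean changeOriginatedLocally) {
        switch (mode) {
            case LOCAL_OWNED: return true;                    // local AND remote changes
            case SHARED:      return changeOriginatedLocally; // only changes made here
            default: throw new AssertionError(mode);
        }
    }

    /** Total index updates across a cluster of clusterSize instances for one change. */
    static int indexUpdatesPerChange(IndexMode mode, int clusterSize) {
        int updates = 0;
        for (int i = 0; i < clusterSize; i++) {
            boolean originatedHere = (i == 0); // instance 0 made the change
            if (shouldIndex(mode, originatedHere)) updates++;
        }
        return updates;
    }

    public static void main(String[] args) {
        for (IndexMode mode : IndexMode.values()) {
            System.out.println(mode + ": " + indexUpdatesPerChange(mode, 4) + " index updates");
        }
        // LOCAL_OWNED: 4 index updates
        // SHARED: 1 index updates
    }
}
```

With locally owned indexes the indexing cost of each change grows with the cluster size, which is why a write-heavy cluster can be overwhelmed.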

3.3.6. Public APIs

  • javax.jcr - This is the standard JCR 2.0 API. It is not part of the hierarchical database's codebase, but it is available in Maven. It has no dependencies.
  • modeshape-jcr-api - the hierarchical database's small extension to the standard JCR 2.0 API. This public API is intended for client applications that already use the JCR API, but it is entirely optional. Many of its interfaces extend standard JCR interfaces, so clients can work with the standard interfaces most of the time and cast standard JCR instances to these interfaces only when they need a method specific to the hierarchical database. A few interfaces represent new concepts that clients might need to access. The module depends only on the JCR API JAR. Note that the public API will only ever be modified in a backward-compatible fashion: some methods might be deprecated at any time (though we do not anticipate doing so), but changes that are not backward compatible (e.g., removal of deprecated methods) will only occur in major releases. This module also defines the Sequencer SPI, since sequencer implementations need only the JCR API and this public API.
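The "cast only when needed" pattern can be sketched with stand-in types: code is written against the standard interface, and it checks for and casts to the extension interface only at the point where an implementation-specific method is required, so it still works against a plain implementation. `StandardSession`, `ExtendedSession`, `SessionImpl`, and their methods are invented for illustration; in practice the standard type would be a `javax.jcr` interface and the extension would come from modeshape-jcr-api.

```java
// Hypothetical stand-ins for a standard interface and an implementation-specific extension.
class ExtensionCastSketch {

    /** Plays the role of a standard JCR interface. */
    interface StandardSession {
        String getUserID();
    }

    /** Plays the role of an extension interface that adds one extra method. */
    interface ExtendedSession extends StandardSession {
        String describeExtension();
    }

    static final class SessionImpl implements ExtendedSession {
        public String getUserID() { return "admin"; }
        public String describeExtension() { return "extension method called"; }
    }

    /** Uses only the standard API, casting only where the extension is required. */
    static String doWork(StandardSession session) {
        String user = session.getUserID(); // portable, standard call
        if (session instanceof ExtendedSession) {
            return user + ": " + ((ExtendedSession) session).describeExtension();
        }
        return user + ": standard API only";
    }

    public static void main(String[] args) {
        System.out.println(doWork(new SessionImpl())); // admin: extension method called
    }
}
```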

3.3.7. Sequencers

All of the sequencer artifacts are named in a similar way: modeshape-sequencer-<name>. For example, the DDL sequencer is in the modeshape-sequencer-ddl module, while the WSDL sequencer is in the modeshape-sequencer-wsdl module.
The use of sequencers in a repository is entirely optional. And because nearly all of the sequencers depend upon third-party libraries, we've put each sequencer into a separate artifact so that only the required dependencies are included.

3.3.8. Core Modules

  • modeshape-common - A simple set of domain-independent utilities and classes that are available for use in any other module. Some of these might be similar to those available in other third-party libraries, but they were created and are maintained here to help minimize third-party dependencies (especially when only small fractions of the third-party libraries would be used). This includes the hierarchical database's framework for internationalization (I18n) and the logging framework, which is a thin facade on top of several other logging systems, including SLF4J, Log4J, Logback, and JDK logging. SLF4J is already a logging abstraction framework, but using our own abstraction makes it easier for developers to hook up the hierarchical database's logging to their preferred framework (include the appropriate logging JAR on the classpath, or fall back to JDK logging), and it also allows the hierarchical database to enforce the use of internationalized logging messages only (except for debug and trace, which take string messages). Therefore, this module has no required dependencies, but it will use one of the logging frameworks if one is available on the classpath.
  • modeshape-schematic - A library for working with JSON and BSON documents, for storing them inside Infinispan, and for editing them in a way that records the edits as a set of changes to the documents and then applies those changes atomically. (The latter is what distinguishes this library from other JSON or BSON libraries.) It supports reading a document from JSON and/or BSON, and writing a document to JSON and/or BSON. The hierarchical database stores each node as a document inside Infinispan, and this library encapsulates all of the domain-independent logic for doing so. The module depends on several Infinispan artifacts.
  • modeshape-jcr - The primary module, which contains the hierarchical database engine and the implementations of both the standard JCR API and the hierarchical database's public API. It also defines several SPIs, including the Connector SPI (for federation) and the BinaryStore SPI (for storing binary values). It contains the file system connector and the CND sequencer, since neither depends on any other libraries and both are too simple to warrant separate artifacts.
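The modeshape-common logging facade picks a backend at runtime depending on what is on the classpath, and falls back to JDK logging otherwise. The sketch below shows one way such a facade can probe the classpath without initializing the probed classes. The probe order (which follows the order the frameworks are listed above) and the choice of marker classes are assumptions of this sketch, not ModeShape's actual detection logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of runtime logging-backend selection; not ModeShape's actual code.
class LoggingBackendSketch {

    /** Backend label -> a class that exists only when that framework's JAR is present. */
    static final Map<String, String> PROBES = new LinkedHashMap<>();
    static {
        PROBES.put("SLF4J", "org.slf4j.LoggerFactory");
        PROBES.put("Log4J", "org.apache.log4j.Logger");
        PROBES.put("Logback", "ch.qos.logback.classic.Logger");
    }

    /** Returns the first available backend in probe order, else JDK logging. */
    static String selectBackend(Predicate<String> isOnClasspath) {
        for (Map.Entry<String, String> probe : PROBES.entrySet()) {
            if (isOnClasspath.test(probe.getValue())) return probe.getKey();
        }
        return "JDK"; // java.util.logging is always available
    }

    /** Real classpath check that does not run the probed class's static initializers. */
    static boolean classPresent(String className) {
        try {
            Class.forName(className, false, LoggingBackendSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Using " + selectBackend(LoggingBackendSketch::classPresent));
    }
}
```

Taking the availability check as a `Predicate` keeps the selection logic testable without manipulating the real classpath.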
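The modeshape-schematic idea of recording edits as a set of changes and applying them atomically can be sketched with a plain `Map` as the document: an editor queues changes without touching the live document, then applies them all to a copy and publishes the copy in one step. `DocumentHolder` and `Editor` are hypothetical names; this is not the modeshape-schematic API and does not involve Infinispan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of "record changes, then apply atomically"; a document is just a Map.
class DocumentEditorSketch {

    /** Holds the current document; readers see either the old or the new version. */
    static final class DocumentHolder {
        private volatile Map<String, Object> current;

        DocumentHolder(Map<String, Object> initial) {
            current = Collections.unmodifiableMap(new HashMap<>(initial));
        }

        Map<String, Object> get() { return current; }

        Editor edit() { return new Editor(this); }
    }

    /** Records edits without touching the live document. */
    static final class Editor {
        private final DocumentHolder target;
        private final List<Consumer<Map<String, Object>>> changes = new ArrayList<>();
        private final List<String> descriptions = new ArrayList<>();

        Editor(DocumentHolder target) { this.target = target; }

        Editor set(String field, Object value) {
            changes.add(doc -> doc.put(field, value));
            descriptions.add("set " + field);
            return this;
        }

        Editor remove(String field) {
            changes.add(doc -> doc.remove(field));
            descriptions.add("remove " + field);
            return this;
        }

        /** The recorded change set, in order. */
        List<String> recordedChanges() { return Collections.unmodifiableList(descriptions); }

        /** Applies every recorded change to a copy, then publishes the copy at once. */
        void apply() {
            synchronized (target) {
                Map<String, Object> copy = new HashMap<>(target.current);
                for (Consumer<Map<String, Object>> change : changes) {
                    change.accept(copy);
                }
                target.current = Collections.unmodifiableMap(copy);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> initial = new HashMap<>();
        initial.put("name", "car");
        initial.put("year", 2008);
        DocumentHolder holder = new DocumentHolder(initial);
        Editor editor = holder.edit().set("year", 2009).remove("name");
        System.out.println(holder.get().get("year")); // 2008: nothing applied yet
        editor.apply();
        System.out.println(holder.get());             // {year=2009}
    }
}
```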

3.3.9. Connectors

All of the connector artifacts are named in a similar way: modeshape-connector-<name>. For example, the Git connector is in the modeshape-connector-git module, while the CMIS connector is in the modeshape-connector-cmis module.
The use of federation (and thus connectors) in a repository is entirely optional. And because nearly all of the connectors depend upon third-party libraries, we've put each connector into a separate artifact so that only the required dependencies are included.
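Because sequencers and connectors follow the same naming convention, the artifact for a given extension can be derived mechanically. The hypothetical helper below encodes that convention; it assumes the upstream `org.modeshape` Maven group ID and leaves the version to the caller, since product builds may use their own version strings.

```java
import java.util.Locale;

// Hypothetical helper encoding the modeshape-<kind>-<name> artifact naming convention.
// Assumes the upstream org.modeshape group ID; pass the version you actually use.
class ArtifactNamingSketch {

    enum Kind { SEQUENCER, CONNECTOR }

    /** e.g. (SEQUENCER, "ddl") -> "modeshape-sequencer-ddl" */
    static String artifactId(Kind kind, String name) {
        return "modeshape-" + kind.name().toLowerCase(Locale.ROOT)
                + "-" + name.toLowerCase(Locale.ROOT);
    }

    /** Maven coordinates in groupId:artifactId:version form. */
    static String coordinates(Kind kind, String name, String version) {
        return "org.modeshape:" + artifactId(kind, name) + ":" + version;
    }

    public static void main(String[] args) {
        System.out.println(artifactId(Kind.SEQUENCER, "wsdl")); // modeshape-sequencer-wsdl
        System.out.println(artifactId(Kind.CONNECTOR, "git"));  // modeshape-connector-git
    }
}
```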

3.3.10. Web APIs

The hierarchical database has a number of web-based APIs that may optionally be used by remote clients to interact with one or more repositories.
  • REST Service - a RESTful service that enables navigating, searching, modifying, and deleting nearly any content in the repositories (see the detailed API documentation in the REST Service 3.x section). All representations are in JSON, XML, or text form. Each operation creates a new session, fulfills the request, and then closes the session; sessions that span more than a single request are not possible. Versioned content can be manipulated: if it is changed, it is checked out, modified, saved, and checked back in. However, the rest of the JCR functionality is not available. The WAR file is named modeshape-web-jcr-rest-war-<version>.war.
  • WebDAV Service - exposes content via WebDAV, enabling WebDAV clients and operating systems to mount the repositories as network disk drives. This service exposes only a small portion of the hierarchical database's functionality, allowing clients to navigate, download, and upload files and folders. The WAR file is named modeshape-web-jcr-webdav-war-<version>.war.
  • CMIS Service - exposes an API that conforms to CMIS. The CMIS functionality exposes the ability to navigate, download, and upload folders and CMIS documents. The WAR file is named modeshape-web-jcr-cmis-war-<version>.war.
Each of these services can be deployed independently to a web or application server in which the hierarchical database is also running. Each service talks to a single (local) ModeShapeEngine instance (typically found via JNDI) and works with all of the repositories deployed to that engine.
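The REST service's session-per-request behavior (open a session, fulfill the request, close the session) maps naturally onto try-with-resources. The sketch below models it with a hypothetical `Session` stand-in and `handle` method; neither is a JCR nor a ModeShape type, and no HTTP handling is shown.

```java
import java.util.function.Function;

// Hypothetical model of session-per-request handling; Session is NOT javax.jcr.Session.
class SessionPerRequestSketch {

    static final class Session implements AutoCloseable {
        private boolean open = true;

        String read(String path) {
            if (!open) throw new IllegalStateException("session closed");
            return "content of " + path;
        }

        @Override
        public void close() { open = false; }

        boolean isOpen() { return open; }
    }

    /** Each request gets a fresh session that is closed before the response is returned. */
    static <T> T handle(Function<Session, T> operation) {
        try (Session session = new Session()) {
            return operation.apply(session);
        }
    }

    public static void main(String[] args) {
        String body = handle(s -> s.read("/cars/Hybrid"));
        System.out.println(body); // content of /cars/Hybrid
    }
}
```

Because no session outlives its request, nothing like a multi-request transaction can be expressed through this service, which matches the limitation described for the REST service.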

3.3.11. JDBC Driver

The hierarchical database supports several query languages to allow client applications to find content independent of its hierarchical location. The JCR-SQL2 language is by far the most powerful, and the hierarchical database provides a JDBC driver that applications can use to query a repository (running in the same process or in a remote process where the REST service is available). The driver JAR is self-contained, making it pretty easy to incorporate into existing JDBC-aware applications.
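A JDBC-aware application can issue a JCR-SQL2 query through the standard `java.sql` interfaces, as in the sketch below. The URL shape produced by the hypothetical `remoteUrl` helper (including the `/modeshape-rest/` context path and the `repositoryName` parameter) is an assumption to verify against the driver's documentation, credentials are omitted, and the driver JAR must be on the classpath for `getConnection` to succeed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Sketch of querying a repository through JDBC with JCR-SQL2 using only java.sql.
// The URL format is an ASSUMPTION; check the driver documentation before relying on it.
class JcrJdbcQuerySketch {

    /** Hypothetical URL builder for a repository reached through the REST service. */
    static String remoteUrl(String restBaseUrl, String repositoryName) {
        return "jdbc:jcr:" + restBaseUrl + "?repositoryName=" + repositoryName;
    }

    /** Runs a JCR-SQL2 query and collects the first column of every row. */
    static List<String> queryPaths(Connection connection, String jcrSql2) throws SQLException {
        List<String> paths = new ArrayList<>();
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery(jcrSql2)) {
            while (rs.next()) {
                paths.add(rs.getString(1));
            }
        }
        return paths;
    }

    public static void main(String[] args) throws SQLException {
        String url = remoteUrl("http://localhost:8080/modeshape-rest/", "sample");
        String query = "SELECT [jcr:path] FROM [nt:unstructured]";
        try (Connection conn = DriverManager.getConnection(url)) {
            queryPaths(conn, query).forEach(System.out::println);
        }
    }
}
```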