Red Hat Training

A Red Hat training course is available for Red Hat JBoss Data Virtualization

15.4. Custom Connectors

15.4.1. The Connector Framework

A connector is actually a plain old Java object (POJO). To create a connector, create a Java class that extends one of the following abstract classes:
  • ReadOnlyConnector - extend this class when the hierarchical database clients will never be able to manipulate, create or remove any content exposed by the connector.
  • WritableConnector - extend this class when the hierarchical database clients may be able to manipulate, create, and/or remove content exposed by the connector. Note that each instance of this connector can still be configured to be read-only.
A connector operates by accessing an external system and dynamically creating nodes that represent information in that external system. The nodes must form a single tree, although how that tree is structured and what the nodes actually look like is completely up to the connector implementation.
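To illustrate the idea that a connector's nodes form a single tree, here is a small, self-contained sketch that does not use the actual connector SPI. The ToyExternalNode class and its methods are purely illustrative, showing how an external path such as "/a/b" might be resolved segment by segment against an external system's content:

```java
import java.util.HashMap;
import java.util.Map;

// A toy stand-in for an external system whose content forms a single tree.
class ToyExternalNode {
    final String name;
    final Map<String, ToyExternalNode> children = new HashMap<>();

    ToyExternalNode(String name) {
        this.name = name;
    }

    ToyExternalNode addChild(String childName) {
        ToyExternalNode child = new ToyExternalNode(childName);
        children.put(childName, child);
        return child;
    }

    // Resolve an external path such as "/a/b" by walking the tree from this node,
    // returning null if any segment does not exist -- analogous to a connector's
    // getDocumentId(externalPath) returning null for an unknown path.
    ToyExternalNode resolve(String externalPath) {
        if (externalPath.equals("/")) return this;
        ToyExternalNode current = this;
        for (String segment : externalPath.substring(1).split("/")) {
            current = current.children.get(segment);
            if (current == null) return null;
        }
        return current;
    }
}
```

How the tree is populated and what each node exposes is, as noted above, entirely up to the connector implementation.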

15.4.2. Documents

While a connector conceptually exposes nodes, technically it exchanges representations of nodes (and other information, like sublists of children). These representations take the form of Java Document objects that are semantically like JSON and BSON documents. The connector SPI does this for a number of reasons:
  • Firstly, the hierarchical database stores its own internal (non-federated) nodes as Documents, so connectors work with the same kind of internal Document instances that the hierarchical database uses.
  • Secondly, a Document is easily converted to and from JSON (and BSON), making it potentially very easy to write a connector that accesses a remote system.
  • Thirdly, constructs other than nodes can be represented as documents; for example, a connector can be pageable, meaning it breaks the list of child node references into multiple pages that are read with separate requests, allowing the connector to efficiently expose large numbers of children under a single node.
  • Finally, the node's identifier, properties, child node references, and other information specific to the hierarchical database are stored in specific fields within a Document, but additional fields can be used by the connector and hidden from hierarchical database clients. Though this makes little sense for a read-only connector, a writable connector might include such hidden fields when reading nodes so that when the document comes back to the connector those hidden fields are still available.
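To make this concrete, a node exposed by a connector might be represented by a document along these lines when rendered as JSON. Note that the field names shown here are illustrative assumptions, not the hierarchical database's exact internal field names; the "connectorData" field sketches the kind of hidden, connector-specific field mentioned above:

```json
{
  "key" : "category/Americana",
  "properties" : {
    "jcr:primaryType" : "lib:category",
    "lib:description" : "Classic American literature"
  },
  "children" : [
    { "key" : "book/0486280616", "name" : "The Adventures of Huckleberry Finn" }
  ],
  "connectorData" : { "lastSync" : "2013-06-01T12:00:00Z" }
}
```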
Before looking at Documents in more detail, consider the methods your connector implementation will need to implement.

15.4.3. Read Only Connector

The following code fragment shows the methods that a ReadOnlyConnector subclass must implement.
package org.modeshape.jcr.federation.spi;

import java.io.IOException;
import java.util.Collection;
import javax.jcr.NamespaceRegistry;
import javax.jcr.RepositoryException;
import org.infinispan.schematic.document.Document;
import org.modeshape.jcr.api.nodetype.NodeTypeManager;

public abstract class ReadOnlyConnector extends Connector {

    ...

   /**
     * Initialize the connector. This is called automatically by ModeShape once for each Connector instance,
     * and should not be called by the connector. By the time this method is called, ModeShape will have
     * already set the {@code ExecutionContext}, {@code Logger}, connector name, and repository name,
     * as well as any fields that match configuration properties for the connector.
     *
     * By default this method does nothing, so it should be overridden by implementations to do a one-time
     * initialization of any internal components. For example, connectors can use the supplied {@code registry}
     * and {@code nodeTypeManager} parameters to register custom namespaces and node types used by the exposed nodes.
     *
     * This is also an excellent place for a connector to validate the connector-specific fields set by ModeShape
     * via reflection during instantiation.
     *
     * @param registry the namespace registry that can be used to register custom namespaces; never null
     * @param nodeTypeManager the node type manager that can be used to register custom node types; never null
     * @throws RepositoryException if operations on the {@link NamespaceRegistry} or {@link NodeTypeManager} fail
     * @throws IOException if any stream based operations fail (like importing cnd files)
     */
    public void initialize( NamespaceRegistry registry,
                            NodeTypeManager nodeTypeManager ) throws RepositoryException, IOException {
    }

    /**
     * Returns the id of an external node located at the given external path within the connector's
     * exposed tree of content.
     *
     * @param externalPath a non-null string representing an external path, or "/" for the top-level
     *        node exposed by the connector
     * @return either the id of the document or null
     */
    public abstract String getDocumentId( String externalPath );

    /**
     * Returns a Document instance representing the document with a given id. The document should have
     * a "proper" structure for it to be usable by ModeShape.
     *
     * @param id a {@code non-null} string
     * @return either an {@link Document} instance or {@code null}
     */
    public abstract Document getDocumentById( String id );

    /**
     * Return the path(s) of the external node with the given identifier. The resulting paths are from the
     * point of view of the connector. For example, the "root" node exposed by the connector will have a
     * path of "/".
     *
     * @param id a non-null string
     * @return the connector-specific path(s) of the node, or an empty document if there is no such
     * document; never null
     */
    public abstract Collection<String> getDocumentPathsById( String id );

    /**
     * Checks if a document with the given id exists in the end-source.
     *
     * @param id a non-null string.
     * @return {@code true} if such a document exists, {@code false} otherwise.
     */
    public abstract boolean hasDocument( String id );

    ...
}
Not shown are the fields, getters, and other implemented methods that your methods will almost certainly use. For example, a Document is a read-only representation of a JSON document; Documents can be created by calling the newDocument(id) method with the document's identifier, using the resulting DocumentWriter to set/remove/add fields (and nested documents), and then calling the writer's document() method to obtain the read-only Document instance.
The DocumentWriter interface provides dozens of methods for getting and setting node properties and child node references. Here's some code that uses a document writer to construct a node representation with a few properties:
   String id = ...
   DocumentWriter writer = newDocument(id);
   writer.setPrimaryType("lib:book");
   writer.addMixinType("lib:tagged");
   writer.addProperty("lib:isbn", "0486280616");
   writer.addProperty("lib:format", "paperback");
   writer.addProperty("lib:author", "Mark Twain");
   writer.addProperty("lib:title", "The Adventures of Huckleberry Finn");
   writer.addProperty("lib:tags", "fiction", "classic", "americana");
   // Add a single child named 'tableOfContents' with its own identifier
   writer.addChild(id + "/toc","tableOfContents");
   Document doc = writer.document();
As you can see, creating documents is pretty straightforward.
Identifiers of documents are simple strings that are expected to uniquely and durably identify a document. However, the content of that string is entirely up to the connector implementation. If the external system already has a notion of unique identifiers, it might be easiest to reuse a string representation of those identifiers. For example, a database might have a unique key within a given table, whereas a Git repository uses SHA-1 hashes as identifiers of commits, branches, tags, etc. Some external systems (like file systems) do not have a concept of unique identifiers, and in such cases the connector should devise its own identifier mechanism that is durable and reliable.
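As a sketch of such a scheme for a file-system-like source (purely illustrative, not the actual file system connector's implementation), the identifier could simply be the path relative to the connector's configured root directory:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative identifier scheme for a file-system-like external source:
// the id is the path relative to the connector's configured root directory,
// which is durable as long as the file is not moved or renamed.
class ToyFileIds {
    static String idFor(String rootDir, String absolutePath) {
        Path root = Paths.get(rootDir);
        Path file = Paths.get(absolutePath);
        // "/" identifies the top-level node exposed by the connector
        String relative = root.relativize(file).toString().replace('\\', '/');
        return relative.isEmpty() ? "/" : "/" + relative;
    }

    static String externalPathFor(String rootDir, String id) {
        // invert the mapping: the id is the connector-relative path
        return id.equals("/") ? rootDir : rootDir + id;
    }
}
```

Because the mapping is invertible, the same scheme can serve both getDocumentId(externalPath) and getDocumentPathsById(id).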

15.4.4. Properties, Paths, Names, and Values

Most of the time, you can use string property names and property values that are String, Calendar, URL, or Numeric instances, and the hierarchical database will convert to an internal object representation. However, the hierarchical database provides object definitions of JCR names, paths, values, and properties. These classes are often much easier to work with than the String names and paths, and they're easy to create using namespace-aware factories. The ValueFactories interface is a container for type-specific factories accessible with various getter methods. Here's an example of creating a Path value from a string and then using the Path methods to get at the already-parsed segments of the path:
   String str = "/a/b/c/cust:d";
   PathFactory pathFactory = factories().getPathFactory();
   Path path = pathFactory.create(str);
   for ( Segment segment : path ) {
       Name name = segment.getName();
       String localName = name.getLocalName();
       String namespaceUri = name.getNamespaceUri();
       if ( segment.hasIndex() ) {
            int snsIndex = segment.getIndex();
       }
   }
   Path parentPath = path.getParent();
   ...
The process of using a factory to create Name, Binary, DateTime, and all other JCR-compliant values is similar.
Properties are slightly different, since they are a bit more structured. The hierarchical database provides a PropertyFactory that can create single- or multi-valued Property instances given a name and one or more values. Here's some simple code that shows how to create a single-valued property:
    PropertyFactory propFactory = propertyFactory();
    Name propName = nameFactory().create("lib:title");
     String propValue = factories().getStringFactory().create("The Adventures of Huckleberry Finn");
     Property prop = propFactory.create(propName, propValue);
All Property, Name, Path, DateTime, and Binary instances are immutable, meaning you can pass them around without worrying about whether the receiver might modify them. Also, the factories will often pick implementation classes that are tailored for the specific value. For example, there are separate implementations for the root path, single-segment paths, paths created from a parent path, single-valued properties, empty properties, and multi-valued properties.

15.4.5. Writable Connector

The following code fragment shows the methods that a WritableConnector subclass must implement.
package org.modeshape.jcr.federation.spi;

import java.io.IOException;
import java.util.Collection;
import javax.jcr.NamespaceRegistry;
import javax.jcr.RepositoryException;
import org.infinispan.schematic.document.Document;
import org.modeshape.jcr.api.nodetype.NodeTypeManager;

public abstract class WritableConnector extends Connector {

    ...

   /**
     * Initialize the connector. This is called automatically by ModeShape once for each Connector instance,
     * and should not be called by the connector. By the time this method is called, ModeShape will have
     * already set the {@code ExecutionContext}, {@code Logger}, connector name, and repository name,
     * as well as any fields that match configuration properties for the connector.
     *
     * By default this method does nothing, so it should be overridden by implementations to do a one-time
     * initialization of any internal components. For example, connectors can use the supplied {@code registry}
     * and {@code nodeTypeManager} parameters to register custom namespaces and node types used by the exposed nodes.
     *
     * This is also an excellent place for a connector to validate the connector-specific fields set by ModeShape
     * via reflection during instantiation.
     *
     * @param registry the namespace registry that can be used to register custom namespaces; never null
     * @param nodeTypeManager the node type manager that can be used to register custom node types; never null
     * @throws RepositoryException if operations on the {@link NamespaceRegistry} or {@link NodeTypeManager} fail
     * @throws IOException if any stream based operations fail (like importing cnd files)
     */
    public void initialize( NamespaceRegistry registry,
                            NodeTypeManager nodeTypeManager ) throws RepositoryException, IOException {
    }

    /**
     * Returns the id of an external node located at the given external path within the connector's
     * exposed tree of content.
     *
     * @param externalPath a non-null string representing an external path, or "/" for the top-level
     *        node exposed by the connector
     * @return either the id of the document or null
     */
    public abstract String getDocumentId( String externalPath );

    /**
     * Returns a Document instance representing the document with a given id. The document should have
     * a "proper" structure for it to be usable by ModeShape.
     *
     * @param id a {@code non-null} string
     * @return either an {@link Document} instance or {@code null}
     */
    public abstract Document getDocumentById( String id );

    /**
     * Return the path(s) of the external node with the given identifier. The resulting paths are
     * from the point of view of the connector. For example, the "root" node exposed by the connector
     * will have a path of "/".
     *
     * @param id a non-null string
     * @return the connector-specific path(s) of the node, or an empty document if there is no such
     *         document; never null
     */
    public abstract Collection<String> getDocumentPathsById( String id );

    /**
     * Checks if a document with the given id exists in the end-source.
     *
     * @param id a non-null string.
     * @return {@code true} if such a document exists, {@code false} otherwise.
     */
    public abstract boolean hasDocument( String id );

    /**
     * Removes the document with the given id.
     *
     * @param id a non-null string.
     * @return {@code true} if the document was removed, or {@code false} if there was no document with the
     *         given id
     */
    public abstract boolean removeDocument( String id );

    /**
     * Stores the given document.
     *
     * @param document a non-null Document instance.
     * @throws DocumentAlreadyExistsException if there is already a document with the same identifier
     * @throws DocumentNotFoundException if one of the modified documents was removed by another session
     */
    public abstract void storeDocument( Document document );

    /**
     * Updates a document using the provided changes.
     *
     * @param documentChanges a non-null DocumentChanges object which contains
     *        granular information about all the changes.
     */
    public abstract void updateDocument( DocumentChanges documentChanges );

    /**
     * Generates an identifier which will be assigned when a new document (i.e., a child) is created under an
     * existing document (i.e., the parent). This method should be implemented only by connectors which support
     * writing.
     *
     * @param parentId a non-null string which represents the identifier of the parent under which the new
     *        document will be created.
     * @param newDocumentName a non-null Name which represents the name that will be given
     *        to the child document
     * @param newDocumentPrimaryType a non-null Name which represents the child document's
     *        primary type.
     * @return either a non-null string which will be assigned as the new identifier, or null which means
     *         that no "special" id format is required. In this last case, the repository will
     *         auto-generate a random id.
     * @throws org.modeshape.jcr.cache.DocumentStoreException if the connector is readonly.
     */
    public abstract String newDocumentId( String parentId,
                                          Name newDocumentName,
                                          Name newDocumentPrimaryType );
    ...
}
A WritableConnector has to implement all of the read-related methods that a ReadOnlyConnector must implement and a handful of write-related methods for removing, updating, and storing new documents (nodes).
In the same way that the hierarchical database provides a DocumentWriter, there is also a DocumentReader that has methods to easily read properties, primary type, mixin types, and child references:
    Document doc = ...
    DocumentReader reader = readDocument(doc);
    String id = reader.getDocumentId();
    String primaryType = reader.getPrimaryTypeName();
    Map<Name, Property> properties = reader.getProperties();
    // Get the ordered list of child references ...
    LinkedHashMap<String,Name> childReferences = reader.getChildrenMap();
    for ( Map.Entry<String,Name> childRef : childReferences.entrySet() ) {
        String key = childRef.getKey();
        Name name = childRef.getValue();
    }

15.4.6. Extra Properties

The hierarchical database provides a framework for storing "extra properties" that cannot be stored in the external system. For example, the "file system connector" cannot naturally map arbitrary properties to file attributes, and instead uses a variety of techniques to store these extra properties.
By default, the hierarchical database can store the extra properties inside the same Infinispan cache where the repository's own internal (non-federated) content is stored. However, this may not be ideal, and so a connector can provide its own implementation of the ExtraPropertiesStore interface:
package org.modeshape.jcr.federation.spi;

/**
 * Store for extra properties, which a {@link Connector} implementation can use to store and retrieve
 * "extra" properties on a node that cannot be persisted in the external system. Generally, a connector
 * should store as much as possible in the external system. However, not all systems are capable of
 * persisting any and all properties that a JCR client may put on a node. In such cases, the connector
 * can store these "extra" properties (that it does not persist) in this properties store.
 */
public interface ExtraPropertiesStore {

    static final Map<Name, Property> NO_PROPERTIES = Collections.emptyMap();

    /**
     * Store the supplied extra properties for the node with the supplied ID. This will overwrite any properties
     * that were previously stored for the node with the specified ID.
     *
     * @param id the identifier for the node; may not be null
     * @param properties the extra properties for the node that should be stored in this storage area, keyed by
     *        their name
     */
    void storeProperties( String id,
                          Map<Name, Property> properties );

    /**
     * Update the supplied extra properties for the node with the supplied ID.
     *
     * @param id the identifier for the node; may not be null
     * @param properties the extra properties for the node that should be stored in this storage area, keyed by
     *        their name; any entry that contains a null Property will define a property that should be removed
     */
    void updateProperties( String id,
                           Map<Name, Property> properties );

    /**
     * Retrieve the extra properties that were stored for the node with the supplied ID.
     *
     * @param id the identifier for the node; may not be null
     * @return the map of properties keyed by their name; may be empty, but never null
     */
    Map<Name, Property> getProperties( String id );

    /**
     * Remove all of the extra properties that were stored for the node with the supplied ID.
     *
     * @param id the identifier for the node; may not be null
     * @return true if there were properties stored for the node and now removed, or false if there were no extra
     *         properties stored for the node with the supplied key
     */
    boolean removeProperties( String id );
}
Then, to use the extra properties store, call the setExtraPropertiesStore(ExtraPropertiesStore store) method in your connector's initialize(...) method, passing an instance of your custom store. Then, in your storeDocument(Document) and updateDocument(DocumentChanges) methods, record these extra properties. There are multiple ways of doing this, but here are a few:
    ExtraProperties extraProperties = extraPropertiesFor(id, false);
    // Add a single property ...
    Property p1 = ...
    extraProperties.add(p1);
    // Or add multiple properties at once ...
    Map<Name,Property> properties = ...
    extraProperties.addAll(properties).except("jcr:primaryType", "jcr:created");
    extraProperties.save();
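For illustration, a simplified, self-contained in-memory analogue of such a store might look like the following. It uses plain String names and Object values instead of the actual Name and Property types, so it is a sketch of the contract rather than a real ExtraPropertiesStore implementation:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified in-memory analogue of an extra-properties store, keyed by node id.
class ToyExtraPropertiesStore {
    private final Map<String, Map<String, Object>> storage = new ConcurrentHashMap<>();

    void storeProperties(String id, Map<String, Object> properties) {
        // overwrite whatever was previously stored for this node
        storage.put(id, new HashMap<>(properties));
    }

    void updateProperties(String id, Map<String, Object> properties) {
        // merge into the existing properties; a null value means "remove this property"
        Map<String, Object> existing = storage.computeIfAbsent(id, k -> new HashMap<>());
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            if (entry.getValue() == null) existing.remove(entry.getKey());
            else existing.put(entry.getKey(), entry.getValue());
        }
    }

    Map<String, Object> getProperties(String id) {
        return storage.getOrDefault(id, Collections.emptyMap());
    }

    boolean removeProperties(String id) {
        return storage.remove(id) != null;
    }
}
```

A real implementation would persist the properties somewhere durable (a sidecar file, a database table, etc.) rather than holding them in memory.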

15.4.7. Pageable Connectors

A Document that represents a node will contain references to all the children of that node. These references are relatively small (the ID and name of the child), and for many connectors this is sufficient and fast enough. However, when the number of children under a node starts to increase, building the list of child references for a parent node can become noticeable and even burdensome, especially when few (if any) of the child references may ultimately be resolved into their node representations.
A pageable connector is one that exposes the children of nodes in a "page by page" fashion, where the parent node only contains the first page of child references and subsequent pages are loaded only if needed. This turns out to be quite effective, since when clients navigate a specific path (or ask for a specific child of a parent by its name) the hierarchical database does not need to use the child references in a node's document and can instead have the connector resolve such (relative or absolute external) paths into an identifier and then ask for the document with that ID.
Therefore, the only time the child references are needed is when clients iterate over the children of a node. A pageable connector will only be asked for as many pages as needed to handle the client's iteration, making it very efficient for exposing a node structure that can contain nodes with numerous children.
To make your ReadOnlyConnector or WritableConnector support paging, implement the Pageable interface:
package org.modeshape.jcr.federation.spi;

public interface Pageable {

    /**
     * Return a document which represents a page of children. The document for the parent node
     * should include as many children as desired, and then include a reference to the next
     * page of children with the {@link PageWriter#addPage(String, String, long, long)} method.
     * Each page returned by this method should also include a reference to the next page.
     *
     * @param pageKey a non-null {@link PageKey} instance, which offers information about the
     *                page that should be retrieved.
     * @return either a non-null page document or {@code null} indicating that such a page
     *         doesn't exist
     */
    Document getChildren( PageKey pageKey );
}
The hierarchical database then knows that the document for the parent will contain only some of the children and how to access each page of children as needed.
For example, here is some code that might be used in a connector's getDocumentById(...) method to include some of the children in the parent node's document along with a reference to a second page of children. This uses an imaginary Book class that is presumed to represent information about a book in a library:
   String id = "category/Americana";
   DocumentWriter writer = newDocument(id);
   writer.setPrimaryType("lib:category");
   writer.addProperty("lib:description", "Classic American literature");
   // Get the books in this category ...
   Collection<Book> books = getBooksInCategory("Americana");
   // Put just 20 in this document ...
   int count = 0;
   for ( Book book : books ) {
       writer.addChild(book.getId(),book.getTitle());
       if ( ++count == 20 ) break;
   }
   if ( count == 20 ) {
       // There were more than 20 books, so add a reference to the next page
       // that starts with the 20th book ...
       writer.addPage(id, 20, 20, books.size());
   }
   Document doc = writer.document();
Then, the connector's getChildren(...) method would implement getting the child references for a particular page:
public Document getChildren( PageKey pageKey ) {
   String parentId = pageKey.getParentId();
   int offset = pageKey.getOffsetInt();
   String category = parentId.substring(9);  // we assume the id is "category/{categoryName}"
   DocumentWriter writer = newDocument(parentId);
   // Get all of the books in this category (no error checking here!) ...
   List<Book> books = getBooksInCategory(category);
   // Add up to 20 books, starting at this page's offset ...
   int count = 0;
   for ( Book book : books.subList(offset, books.size()) ) {
       writer.addChild(book.getId(), book.getTitle());
       if ( ++count == 20 ) break;
   }
   if ( offset + 20 < books.size() ) {
       // There are more books, so add a reference to the next page
       // that starts after these 20 books ...
       writer.addPage(parentId, offset + 20, 20, books.size());
   }
   return writer.document();
}
As you can see, the logic of getPage(...) is actually very similar to the logic that adds children in the getDocumentById(...) method, and your connector might find it useful to abstract this into a single helper method.
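The shared logic could be factored into a small helper. Here is a self-contained sketch of just the paging arithmetic, using plain lists instead of the DocumentWriter API; the ToyPager name and its fields are illustrative assumptions:

```java
import java.util.List;

// Illustrative helper for the shared paging arithmetic: given the full child
// list, a page offset, and a page size, expose the children for that page and
// report whether another page follows (and at what offset it starts).
class ToyPager<T> {
    final List<T> pageChildren;
    final boolean hasNextPage;
    final int nextOffset;

    ToyPager(List<T> allChildren, int offset, int pageSize) {
        int end = Math.min(offset + pageSize, allChildren.size());
        this.pageChildren = allChildren.subList(offset, end);
        this.hasNextPage = end < allChildren.size();
        this.nextOffset = end;
    }
}
```

In a real connector, the same helper would be called from both getDocumentById(...) (with offset 0) and getChildren(PageKey), with hasNextPage deciding whether to call writer.addPage(...).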