Chapter 10. Built-in Connectors

The hierarchical database comes with several connectors so that you can set up repositories that federate data from external systems.

10.1. File System Connector

This connector exposes files and folders on the file system as nt:file and nt:folder nodes in the repository. To use it, configure an external source for a given file system (or area of the file system). Each external source can be set up as read-only (to expose only the file system's existing files and folders) or as writable (to allow JCR clients to create, update, and delete files and folders on the file system).
The File System Connector maps nt:file and nt:folder properties directly to the attributes of the file system's files and folders. Any other properties set on these nodes have no file system equivalent and are treated as "extra" properties. By default, the hierarchical database stores these extra properties in the same Infinispan cache where the normal content is stored, though they will be lost if files and folders are moved or renamed outside of the hierarchical database. Several other options are possible, including storing these extra properties on the file system using "sidecar" files that are named similarly to and stored adjacent to the target file or folder. See the extraPropertyStorage attribute description below for more detail.
The connector does not currently monitor the file system for newly created files or folders, and therefore no events are created for such changes. However, navigation will always expose the current file and folder nodes within a folder. The hierarchical database can index the content so that the projected nt:file, nt:folder, and nt:resource nodes can be queried, but this must be done manually via the Workspace API's reindex methods.

Note

As of this release, the file system connector is pageable, which means it can efficiently expose folders that contain large numbers of items. Paging is a tradeoff between loading the parent node faster (by having smaller numbers of child references) and having to go back to the connector more frequently. By default, the connector includes only 20 items per page, but the page size can be adjusted to best suit your application's needs.
The connector classname is org.modeshape.connector.filesystem.FileSystemConnector, and there are several attributes that should be configured on each external source:
Attribute Name
Description
directoryPath
The path to the file or folder that is to be accessed by this connector.
extraPropertyStorage
An optional string flag that specifies how this source handles "extra" properties that are not stored via file system attributes. The value should be one of the following:
  • store - Any extra properties are stored in the same Infinispan cache where the content is stored. This is the default and is used if the actual value does not match any of the other accepted values.
  • json - Any extra properties are stored in a JSON file next to the file or directory.
  • legacy - Any extra properties are stored in a file next to the file or directory. This is generally discouraged unless you were using a previous version of the hierarchical database and have a directory structure that already contains these files.
  • none - An exception is thrown if the nodes contain any extra properties.
inclusionPattern
Optional property that specifies a regular expression that is used to help determine which files and folders in the underlying file system are exposed through this connector. The connector will expose only those files and folders with a name that matches the provided regular expression (as long as they are not also excluded by the exclusionPattern). If no inclusion pattern is specified, then the connector will include all files and folders that are not excluded via the exclusionPattern.
exclusionPattern
Optional property that specifies a regular expression that is used to help determine which files and folders in the underlying file system are not exposed through this connector. Files and folders with a name that matches the provided regular expression will not be exposed by this source.
addMimeTypeMixin
A boolean flag that specifies whether this connector should add the mix:mimeType mixin to the nt:resource nodes to include the jcr:mimeType property. If set to true, the MIME type is computed immediately when the nt:resource node is accessed, which might be expensive for larger files. This is false by default.
readOnly
A boolean flag that specifies whether this source is prevented from creating, modifying, or removing files and directories on the file system to reflect changes in the JCR content. When set to true, the source only exposes existing files and folders. By default, sources are not read-only.
cacheTtlSeconds
Optional property that specifies the default maximum number of seconds (i.e., time to live) that a node returned by this connector should be cached in the workspace cache before being expired. By default, the connector will not set a special value, and the repository will determine how long the node is cached in the workspace cache.
isQueryable
Optional property that specifies whether or not the content exposed by this connector should be indexed by the repository. This acts as a global flag, allowing a specific connector to mark its entire content as non-queryable. By default, all content exposed by a connector is queryable.
pageSize
(Added in this release) Optional property that controls the number of children that the connector should include in a single page; the default is 20. For example, if a folder contains 200 items (e.g., files or folders) and the page size is 20, then the connector will include in the document representing this folder only the properties of the folder and the first 20 items (that are readable, that satisfy the inclusion pattern, and that do not match the exclusion pattern). As additional children are needed (e.g., as the hierarchical database client navigates or accesses the folder's child nodes), the hierarchical database will request additional pages, each with up to 20 items.
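The paging arithmetic described for pageSize can be illustrated with a small sketch. This is not the connector's implementation; the class PagingSketch and its methods are hypothetical names used only to show how 200 children split into pages of 20.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of paging; not the connector's actual code. */
final class PagingSketch {

    /** Number of pages needed to expose childCount children at pageSize per page. */
    static int pageCount(int childCount, int pageSize) {
        if (childCount < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("childCount must be >= 0 and pageSize > 0");
        }
        return (childCount + pageSize - 1) / pageSize; // ceiling division
    }

    /** Returns the children that fall on the given zero-based page. */
    static <T> List<T> page(List<T> children, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        int from = Math.min(pageIndex * pageSize, children.size());
        int to = Math.min(from + pageSize, children.size());
        return new ArrayList<>(children.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            names.add("file-" + i + ".txt");
        }
        System.out.println(pageCount(names.size(), 20)); // prints 10
        System.out.println(page(names, 0, 20).size());   // prints 20
    }
}
```

A smaller page size keeps the folder's own document small but increases the number of round trips to the connector when a client iterates over all children.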
By default, the file system connector will expose all of the files and folders that are underneath the specified directory and readable by the Java process, and it will allow hierarchical database clients using the JCR API to change, remove, or even create new files and folders. Additionally, any "extra properties" (that is, properties other than those the connector derives from the file system, such as jcr:primaryType, jcr:created, jcr:lastModified, and jcr:data) will be stored not on the file system but in the same Infinispan cache in which the repository's own internal (non-federated) content is stored. The connector will also use pages to work efficiently with folders that contain large numbers of items.
If other behavior is desired, set the connector's properties to non-default values. For example, if hierarchical database clients are not allowed to modify, create, or remove file and folder nodes, then the connector should be configured with readOnly set to true. Or, if only certain files and folders are to be exposed, set the inclusionPattern and exclusionPattern to regular expressions that the connector can use to decide whether to include or exclude files and folders by name. Note that a file or folder will only be exposed by the connector when it is readable and when its name satisfies the inclusionPattern and does not satisfy the exclusionPattern.
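The exposure rule above (readable, matches the inclusion pattern, does not match the exclusion pattern) can be sketched with java.util.regex. The class NameFilterSketch is hypothetical, and the use of whole-name matching via Matcher.matches() is an assumption; the connector's actual matching semantics are not stated here.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of the inclusion/exclusion rule; not the connector's code. */
final class NameFilterSketch {
    private final Pattern inclusion; // null means "no inclusion pattern configured"
    private final Pattern exclusion; // null means "no exclusion pattern configured"

    NameFilterSketch(String inclusionRegex, String exclusionRegex) {
        this.inclusion = inclusionRegex == null ? null : Pattern.compile(inclusionRegex);
        this.exclusion = exclusionRegex == null ? null : Pattern.compile(exclusionRegex);
    }

    /** Assumption: a pattern applies only when it matches the entire file or folder name. */
    boolean isExposed(String name, boolean readable) {
        if (!readable) {
            return false; // unreadable items are never exposed
        }
        if (inclusion != null && !inclusion.matcher(name).matches()) {
            return false; // an inclusion pattern is set and the name does not satisfy it
        }
        return exclusion == null || !exclusion.matcher(name).matches();
    }

    public static void main(String[] args) {
        NameFilterSketch filter = new NameFilterSketch(null, ".+[.]tmp$");
        System.out.println(filter.isExposed("report.txt", true)); // prints true
        System.out.println(filter.isExposed("report.tmp", true)); // prints false
    }
}
```

Testing candidate names against a pattern in this way before deploying a configuration can help catch expressions that exclude more than intended.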
The connector is often used to expose existing files and folders on the file system as content in a repository. Since the connector does not access any OS-specific file attributes, it maps each existing file and folder as follows:
  • A folder is represented in the hierarchical database as a node with a primary type of nt:folder, no mixin types, and the jcr:created timestamp set to the last modified timestamp given by the file system. The node will contain a child for each file and folder that is to be exposed (as discussed above).
  • A file is represented in the hierarchical database as a node with a primary type of nt:file, no mixin types, and the jcr:created timestamp set to the last modified timestamp given by the file system. The node will contain a single child node named jcr:content that represents the content of the file, and which has a primary type of nt:resource and the jcr:lastModified timestamp set to the file system's last modified timestamp for the file. If the connector is configured with addMimeTypeMixin set to true, then the hierarchical database will also attempt to determine the MIME type for the file's content and, if determined, add the mix:mimeType mixin and the jcr:mimeType property to the jcr:content node.
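The mapping in the list above can be summarized as a small sketch that builds a plain map of property names to values. NodeMappingSketch is a hypothetical class that uses only the Java standard library; it does not use the JCR API, and it omits child nodes of folders, the file's binary content, and MIME type detection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the file-to-node mapping; not the connector's code. */
final class NodeMappingSketch {

    /** Builds the mapped properties from a directory flag and a last-modified timestamp. */
    static Map<String, Object> describe(boolean isDirectory, Instant lastModified) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("jcr:primaryType", isDirectory ? "nt:folder" : "nt:file");
        // Both folders and files take jcr:created from the last modified timestamp.
        node.put("jcr:created", lastModified);
        if (!isDirectory) {
            // A file has a single jcr:content child of type nt:resource.
            Map<String, Object> content = new LinkedHashMap<>();
            content.put("jcr:primaryType", "nt:resource");
            content.put("jcr:lastModified", lastModified);
            node.put("jcr:content", content);
        }
        return node;
    }

    /** Convenience overload that reads the two inputs from the file system. */
    static Map<String, Object> describe(Path path) {
        try {
            Instant lastModified = Files.getLastModifiedTime(path).toInstant();
            return describe(Files.isDirectory(path), lastModified);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the mapped properties for the path given on the command line, if any.
        if (args.length > 0) {
            System.out.println(describe(Path.of(args[0])));
        }
    }
}
```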
Here is a sample configuration that projects the /a/b/c directory onto a node in the repository at /files, with the above (default) behavior:
{
    ...
    "externalSources" : {
        "local-git-repo" : {
            "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
            "directoryPath" : "/a/b/c/",
            "projections" : [ "/files" ]
        }
    }
    ...
}
Here is a slightly different configuration that is read-only, that excludes any files or folders with names that end with ".tmp" (and have at least one character before this suffix), and that includes the automatically-detected MIME type:
{
    ...
    "externalSources" : {
        "local-git-repo" : {
            "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
            "directoryPath" : "/a/b/c/",
            "projections" : [ "/files" ],
            "readOnly" : true,
            "addMimeTypeMixin" : true,
            "exclusionPattern" : ".+[.]tmp$"
        }
    }
    ...
}
Of course, some applications may want to set additional properties and/or mixins. When the connector is writable (i.e., not read-only), the connector can store these properties in one of several places, based upon the extraPropertyStorage configuration property. By default, these extra properties are stored in the same Infinispan cache where the hierarchical database repository stores the rest of its internal (non-federated) content. This is convenient, but can lead to orphaned documents in the Infinispan cache should files and folders be removed outside of the hierarchical database.
Alternatively, the connector can store these extra properties on the file system. Any extra properties on a file or folder will be stored in a "sidecar" next to the corresponding file or folder and named similarly to the corresponding file or folder but with a special suffix. If stored as a JSON file, the suffix will be .modeshape.json, or if stored as a text file the suffix will be .modeshape. (The text format is the same as that used in the previous release, but is provided only for backward compatibility. Where possible, choose the JSON format.) Extra properties on the jcr:content child of nt:file nodes are stored in a different sidecar file, named similarly to the corresponding file but with the .content.modeshape.json or .content.modeshape suffix. Note that these sidecar files are never exposed as nodes by the connector.
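The sidecar naming scheme can be sketched as follows. The suffixes come from the paragraph above. The class SidecarNameSketch is hypothetical, and it assumes that "named similarly" means the suffix is appended to the complete name of the target file or folder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of sidecar naming; not the connector's code. */
final class SidecarNameSketch {
    static final String JSON_SUFFIX = ".modeshape.json";
    static final String LEGACY_SUFFIX = ".modeshape";
    static final String CONTENT_INFIX = ".content"; // used for the jcr:content child of nt:file

    /**
     * Returns the sidecar path next to the target (the target must have a name, so not a root).
     * Assumption: the suffix is appended to the target's full file name.
     */
    static Path sidecarFor(Path target, boolean json, boolean forContentNode) {
        String suffix = (forContentNode ? CONTENT_INFIX : "") + (json ? JSON_SUFFIX : LEGACY_SUFFIX);
        return target.resolveSibling(target.getFileName().toString() + suffix);
    }

    /** True if the name carries one of the sidecar suffixes (such files are never exposed). */
    static boolean isSidecar(String fileName) {
        return fileName.endsWith(JSON_SUFFIX) || fileName.endsWith(LEGACY_SUFFIX);
    }

    public static void main(String[] args) {
        Path target = Paths.get("a", "b", "c", "report.pdf");
        System.out.println(sidecarFor(target, true, false).getFileName());  // report.pdf.modeshape.json
        System.out.println(sidecarFor(target, false, true).getFileName());  // report.pdf.content.modeshape
    }
}
```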
It is even possible to prevent updating or creating files and folders with extra properties. To do this, configure the connector with the extraPropertyStorage property set to none.
Here is another sample configuration for a connector that works the same as the earlier configuration except that it is now storing extra properties in a JSON sidecar:
{
    ...
    "externalSources" : {
        "local-git-repo" : {
            "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
            "directoryPath" : "/a/b/c/",
            "projections" : [ "/files" ],
            "readOnly" : true,
            "addMimeTypeMixin" : true,
            "exclusionPattern" : ".+[.]tmp$",
            "extraPropertyStorage" : "json"
        }
    }
    ...
}