Appendix B. eXo JCR

B.1. Introduction and Support Scope

eXo JCR usage

The portal uses the JCR API to store information for its own internal use. Red Hat does not support the JCR as an application information store.
The information in this appendix is provided to help users understand low-level details of how the portal works and how it can be fine-tuned.
The term JCR refers to the Java Content Repository. The JCR is the data store of the portal. All content is stored and managed using the JCR.
The eXo JCR included with the portal is a JSR-170 compliant implementation of the JCR 1.0 specification. The JCR provides the following features:
  • Revision control
  • Textual search
  • Access control
  • Content event monitoring
  • Text and binary data for internal portal usage
The back-end storage for the JCR can be provisioned as a file system or a database.

B.2. Concepts

B.2.1. Repository and Workspace

Repository
A repository is a form of data storage device. A repository differs from a database in the nature of the information contained. While a database holds hard data in rigid tables, a repository may access the data on a database by using less rigid meta-data. So, a repository operates as an interpreter between the database(s) and the user.

Note

The data model for the interface (the repository) is rarely the same as the data model used by the repository's underlying storage subsystems (such as a database) but the repository is able to make persistent data changes in the storage subsystem.
Workspace
The eXo JCR uses workspaces as the main data abstraction in its data model. The content is stored in a workspace as a hierarchy of items and each workspace has its own hierarchy of items.
Repositories access one or more workspaces. Persistent JCR workspaces consist of a directed acyclic graph of items where the edges represent the parent-child relation.

B.2.2. Items

An item is either a node or a property. Properties contain the data. Data can be simple values or binary data. The nodes give structure to a workspace and the properties hold the data.
Nodes
Nodes are identified using accepted namespacing conventions. Changed nodes are versioned through an associated version graph. This type of versioning helps to preserve data integrity.
Nodes can have various properties or child nodes associated with them.
Properties
Properties hold data as values of predefined types, such as String, Binary, Long, Boolean, Double, Date, Reference and Path.
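As a rough illustration of the item model described above, the following sketch models a workspace as a tree of nodes that carry named properties. The classes ItemModelSketch and SketchNode are hypothetical and exist only for this example; they are not part of the JCR API or of eXo JCR, where the corresponding interfaces are javax.jcr.Node and javax.jcr.Property.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of JCR items; this is not the javax.jcr API.
class ItemModelSketch {

    static class SketchNode {
        final String name;
        final SketchNode parent; // null for the root node
        final Map<String, SketchNode> children = new LinkedHashMap<>();
        final Map<String, Object> properties = new LinkedHashMap<>();

        SketchNode(String name, SketchNode parent) {
            this.name = name;
            this.parent = parent;
        }

        // Nodes give structure: adding a child extends the hierarchy.
        SketchNode addNode(String childName) {
            SketchNode child = new SketchNode(childName, this);
            children.put(childName, child);
            return child;
        }

        // Properties hold the data (for example String, Long or Boolean values).
        void setProperty(String propertyName, Object value) {
            properties.put(propertyName, value);
        }

        // Absolute path of this node, built by walking up to the root.
        String getPath() {
            if (parent == null) {
                return "/";
            }
            String parentPath = parent.getPath();
            return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
        }
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("", null);
        SketchNode page = root.addNode("pages").addNode("home");
        page.setProperty("title", "Home");
        page.setProperty("visits", 42L);
        System.out.println(page.getPath() + " " + page.properties);
    }
}
```

The sketch keeps only the parent-child relation and the property map; versioning, namespaces and typed values are deliberately left out.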

B.2.3. The Data Model

The core of any Content Repository is the data model. The data model defines the data elements such as fields, columns, attributes, and the relationships between these elements. These data elements are stored in the Content Repository.
Data elements can be singular pieces of information, for example the value 3.14, or compound values, for example pi = 3.14. A data model uses concepts like nodes, arrays and links to define relationships between data elements.
The use and structure of these elements form the content repository's data model.

B.2.4. Data Abstraction

Data abstraction describes the separation between the abstract and concrete properties of data stored in a repository.
The concrete properties of the data implementation may be changed without affecting the abstract properties of the data, which are read by the data client.
Consider the presentation of data in a list, graph or table. While the information implementation may change, the data itself is unaffected, and readers to whom the data is presented can perform a mental abstraction to interpret it correctly, regardless of the implementation.

B.3. eXo JCR Repository Service Components

The eXo JCR implementation consists of an ExoContainer. The ExoContainer consists of the Repository Service, the Repository and the Workspace.
Diagram explaining the relationships between the eXo Repository service components, which are described below.

Figure B.1. eXo JCR Repository Service Components

Repository Service Component Definitions

ExoContainer
A subclass of org.exoplatform.container.ExoContainer (usually org.exoplatform.container.PortalContainer) that holds a reference to the Repository Service.
Repository Service
This contains information about repositories. eXo JCR is able to manage many Repositories.
Repository
An implementation of javax.jcr.Repository. It holds references to one or more Workspace(s).
Workspace
Container of a single rooted tree of Items.

Note

An ExoContainer Workspace is not the same as a javax.jcr.Workspace: it is not a per-session object.
The usual JCR application use-case includes two initial steps:
  1. Obtain a Repository object from the Repository Service, or through a JNDI lookup if an eXo repository is bound to the naming context.
  2. Create a javax.jcr.Session object by calling Repository.login(..).

B.3.1. Workspace Data Model

The Workspace working model shown here illustrates the components of the eXo JCR implementation that are used in the data flow to perform operations specified in the JCR API.
Diagram explaining which components of an eXo JCR implementation are used in a data flow to perform operations specified in the JCR API.

Figure B.2. Workspace Data Model

The Workspace Data Model has four levels. The levels are distinguished by data isolation and by their value from the perspective of the JCR model.

JCR API
The eXo JCR core implements the JCR API interfaces, such as Item, Node and Property. This level contains the JCR logical view of the stored data.
Session Level
The Session Level isolates transient data that is visible inside one JCR Session, and interacts with the API level through the eXo JCR internal API.
Session Data Manager
The Session Data Manager maintains transient session data. Along with the data access, modification and validation logic, it contains the Modified Items Storage, which holds the data changed between subsequent save() calls, and the Session Items Cache.
Transaction Data Manager
The Transaction Data Manager maintains session data between save() and the transaction commit or rollback, if the current session is part of a transaction.
Workspace Level
The Workspace Level operates on data shared within a particular workspace. It contains per-workspace objects.
Workspace Storage Data Manager
The Workspace Storage Data Manager maintains workspace data, including final validation, event firing and caching.
Workspace Data Container
The Workspace Data Container implements physical data storage. It allows different types of back end, such as a relational database (RDB) or file system (FS) files, to be used as storage for JCR data. Along with the main Data Container, other storages for persisted Property Values can be configured and used.
Indexer
Indexer maintains workspace data indexing for further queries.
Storage Level
Storage Level provides persistent storages for:
  • JCR Data
  • Indexes (Apache Lucene)
  • Values for BLOBs if they are different from the main Data Container

B.4. Template for JCR configuration file

The JCR configuration is defined in an XML file which is constructed as per the DTD shown here:
<!ELEMENT repository-service (repositories)>
<!ATTLIST repository-service default-repository NMTOKEN #REQUIRED>
<!ELEMENT repositories (repository)>
<!ELEMENT repository (security-domain,access-control,session-max-age,authentication-policy,workspaces)>
<!ATTLIST repository
  default-workspace NMTOKEN #REQUIRED
  name NMTOKEN #REQUIRED
  system-workspace NMTOKEN #REQUIRED>
<!ELEMENT security-domain (#PCDATA)>
<!ELEMENT access-control (#PCDATA)>
<!ELEMENT session-max-age (#PCDATA)>
<!ELEMENT authentication-policy (#PCDATA)>
<!ELEMENT workspaces (workspace+)>
<!ELEMENT workspace (container,initializer,cache,query-handler)>
<!ATTLIST workspace name NMTOKEN #REQUIRED>
<!ELEMENT container (properties,value-storages)>
<!ATTLIST container class NMTOKEN #REQUIRED>
<!ELEMENT value-storages (value-storage+)>
<!ELEMENT value-storage (properties,filters)>
<!ATTLIST value-storage class NMTOKEN #REQUIRED>
<!ELEMENT filters (filter+)>
<!ELEMENT filter EMPTY>
<!ATTLIST filter property-type NMTOKEN #REQUIRED>
<!ELEMENT initializer (properties)>
<!ATTLIST initializer class NMTOKEN #REQUIRED>
<!ELEMENT cache (properties)>
<!ATTLIST cache
  enabled NMTOKEN #REQUIRED
  class NMTOKEN #REQUIRED>
<!ELEMENT query-handler (properties)>
<!ATTLIST query-handler class NMTOKEN #REQUIRED>
<!ELEMENT access-manager (properties)>
<!ATTLIST access-manager class NMTOKEN #REQUIRED>
<!ELEMENT lock-manager (time-out,persister)>
<!ELEMENT time-out (#PCDATA)>
<!ELEMENT persister (properties)>
<!ELEMENT properties (property+)>
<!ELEMENT property EMPTY>

B.4.1. Portal Configuration for JCR

JCR services are registered in the Portal container.
The following is an example configuration from the file jcr-configuration.xml. The configuration file is located at $JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/.
<component>
  <key>org.exoplatform.services.jcr.RepositoryService</key>
  <type>org.exoplatform.services.jcr.impl.RepositoryServiceImpl</type>
  <component-plugins>
    <component-plugin>
      <name>add.namespaces</name>
      <set-method>addPlugin</set-method>
      <type>org.exoplatform.services.jcr.impl.AddNamespacesPlugin</type>
      <init-params>
	<properties-param>
	  <name>namespaces</name>
	  <property name="test" value="http://www.apache.org/jackrabbit/test"/>
	  <property name="exojcrtest" value="http://www.exoplatform.org/jcr/test/1.0"/>
	  <property name="rma" value="http://www.rma.com/jcr/"/>
	  <property name="metadata" value="http://www.exoplatform.com/jcr/metadata/1.1/"/>
	  <property name="dc" value="http://purl.org/dc/elements/1.1/"/>
	  <property name="publication" value="http://www.exoplatform.com/jcr/publication/1.1/"/>
	</properties-param>
      </init-params>
    </component-plugin>
    <component-plugin>
      <name>add.nodeType</name>
      <set-method>addPlugin</set-method>
      <type>org.exoplatform.services.jcr.impl.AddNodeTypePlugin</type>
      <init-params>
	<values-param>
	  <name>autoCreatedInNewRepository</name>
	  <description>Node types configuration file</description>
	  <value>jar:/conf/test/nodetypes-tck.xml</value>
	  <value>jar:/conf/test/nodetypes-impl.xml</value>
	  <value>jar:/conf/test/nodetypes-usecase.xml</value>
	  <value>jar:/conf/test/nodetypes-config.xml</value>
	  <value>jar:/conf/test/nodetypes-config-extended.xml</value>  
	  <value>jar:/conf/test/wcm-nodetypes.xml</value>
	  <value>jar:/conf/test/nodetypes-publication-config.xml</value>
	  <value>jar:/conf/test/publication-plugins-nodetypes-config.xml</value>          
	</values-param>
	
	<values-param>
	  <name>testInitNodeTypesRepository</name>
	  <description>
	    Node types configuration file for repository with name testInitNodeTypesRepository
	  </description>
	  <value>jar:/conf/test/nodetypes-test.xml</value>
	</values-param>
	
	<values-param>
	  <name>testInitNodeTypesRepositoryTest2</name>
	  <description>
	    Node types configuration file for repository with name testInitNodeTypesRepositoryTest2
	  </description>
	  <value>jar:/conf/test/nodetypes-test2.xml</value>
	</values-param>
	
	<!--values-param>
	<name>testInitNodeTypesRepositoryTest3</name>
	<description>Node types from ext. Needed because core starts up earlier than ext</description>
	<value>jar:/conf/test/nodetypes-test3_ext.xml</value>
	</values-param-->
	
      </init-params>
    </component-plugin>
  </component-plugins>
</component>

<component>
  <key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key>
  <type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type>
  <init-params>
    <value-param>
      <name>conf-path</name>
      <description>JCR configuration file</description>
      <value>jar:/conf/standalone/test-jcr-config-jbc.xml</value>
    </value-param>
    <properties-param>
      <name>working-conf</name>
      <description>working-conf</description>
      <property name="dialect" value="auto" />
      <property name="source-name" value="jdbcjcr"/>
      <property name="persister-class-name" value="org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister"/>
    </properties-param>
  </init-params>
</component>

B.4.1.1. JCR Workspace Configuration

The JCR Service can use multiple Repositories and each repository can have multiple Workspaces. To configure the workspaces, you need to edit the repository configuration file.
Configure the workspaces by locating the workspace you need to modify. The workspace is found in $JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml.
The repository configuration supports human-readable values, which are not case-sensitive.
Complete the appropriate element fields using the following value formats:
Number formats:
  • K or KB for kilobytes.
  • M or MB for megabytes.
  • G or GB for gigabytes.
  • T or TB for terabytes.
  • Examples: 200K or 200KB; 4M or 4MB; 1.4G or 1.4GB; 10T or 10TB.
Time formats:
  • ms for milliseconds.
  • s for seconds.
  • m for minutes.
  • h for hours.
  • d for days.
  • w for weeks.
  • The default time unit is seconds.
  • Examples: 500ms or 500 milliseconds; 20, 20s or 20 seconds; 30m or 30 minutes; 12h or 12 hours; 5d or 5 days; 4w or 4 weeks.
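To make the value formats above concrete, here is a minimal sketch of how such strings could be turned into bytes and milliseconds. The class HumanReadableValues and its methods are hypothetical and are not the parser used by eXo JCR. The sketch assumes that K, M, G and T are binary multiples (1K = 1024 bytes), which is consistent with max-buffer-size appearing in this appendix both as 200k and as 204800, and it handles only the short unit suffixes, not spelled-out forms such as "20 seconds".

```java
import java.util.Locale;

// Hypothetical helper that mirrors the formats listed above; not the eXo JCR parser.
class HumanReadableValues {

    // Parses a size such as "200K", "4MB" or "10t" into bytes.
    // Assumption: K, M, G and T are binary multiples (1K = 1024 bytes).
    static long parseSizeBytes(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        if (s.endsWith("B")) {
            s = s.substring(0, s.length() - 1); // "KB" -> "K", "MB" -> "M", ...
        }
        long factor = 1L;
        if (!s.isEmpty()) {
            int unit = "KMGT".indexOf(s.charAt(s.length() - 1));
            if (unit >= 0) {
                factor = 1L << (10 * (unit + 1)); // 2^10, 2^20, 2^30, 2^40
                s = s.substring(0, s.length() - 1);
            }
        }
        return Math.round(Double.parseDouble(s.trim()) * factor);
    }

    // Parses a time such as "500ms", "20", "30m" or "4w" into milliseconds.
    // A bare number is read as seconds, the documented default unit.
    static long parseTimeMillis(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("Empty time value");
        }
        if (s.endsWith("ms")) {
            return Math.round(Double.parseDouble(s.substring(0, s.length() - 2).trim()));
        }
        // Milliseconds per unit, in the same order as the suffixes s, m, h, d, w.
        long[] unitMillis = {1000L, 60_000L, 3_600_000L, 86_400_000L, 604_800_000L};
        int unit = "smhdw".indexOf(s.charAt(s.length() - 1));
        long factor = 1000L;
        if (unit >= 0) {
            factor = unitMillis[unit];
            s = s.substring(0, s.length() - 1);
        }
        return Math.round(Double.parseDouble(s.trim()) * factor);
    }

    public static void main(String[] args) {
        System.out.println(parseSizeBytes("200K"));
        System.out.println(parseSizeBytes("4mb"));
        System.out.println(parseTimeMillis("20"));
        System.out.println(parseTimeMillis("12h"));
    }
}
```

Input is normalized to a single case before matching, which reflects the statement that the values are not case-sensitive.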

B.4.1.2. JCR Repository Service Configuration

The JCR Repository service configuration includes declaring repository names, workspace names, security domain name, access control policy, authentication policy name, and so on.
<!-- Comment #1 -->
<repository-service default-repository="repository">
  <!-- The list of repositories. -->
  <repositories>
    <!-- Comment #2. -->
    <repository name="repository" system-workspace="system" default-workspace="portal-system">
      <!-- Comment #3 -->
      <security-domain>gatein-domain</security-domain>
      <!-- Comment #4 -->
      <access-control>optional</access-control>
      <!-- Comment #5 -->
      <authentication-policy>org.exoplatform.services.jcr.impl.core.access.JAASAuthenticator</authentication-policy>
      ...
      <!-- Comment #6 -->
      <workspaces>
      ...
        <!-- 	Comment #7 -->
        <workspace name="portal-system">
          <!-- Comment #8 --> 
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="${gatein.jcr.datasource.name}${container.name.suffix}"/>
              <property name="dialect" value="${gatein.jcr.datasource.dialect}"/>
              <property name="multi-db" value="false"/>
              <property name="update-storage" value="true"/>
              <property name="max-buffer-size" value="204800"/>
              <property name="swap-directory" value="${gatein.jcr.data.dir}/swap/portal-system${container.name.suffix}"/>
            </properties> 
            <value-storages>
              <value-storage id="portal-system" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
                <properties>
                  <property name="path" value="${gatein.jcr.storage.data.dir}/portal-system${container.name.suffix}"/>
                </properties>
                <filters>
                  <filter property-type="Binary"/>
                </filters>
              </value-storage>
            </value-storages>
          </container>
          <!-- Comment #9 -->
          <initializer class="org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer">
            <properties>
              <property name="root-nodetype" value="nt:unstructured"/>
              <property name="root-permissions" value="*:/platform/administrators read;*:/platform/administrators add_node;*:/platform/administrators set_property;*:/platform/administrators remove"/>
            </properties>
          </initializer>
          <!-- Comment #10 -->
          <cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache">
            <properties>
              <property name="jbosscache-configuration" value="${gatein.jcr.cache.config}" />
              <property name="jgroups-configuration" value="${gatein.jcr.jgroups.config}" />
              <property name="jgroups-multiplexer-stack" value="true" />
              <property name="jbosscache-cluster-name" value="jcr-${container.name.suffix}-portal-system" />
            </properties>
          </cache>
          <!-- Comment #11 -->
          <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
            <properties>
              <property name="index-dir" value="${gatein.jcr.index.data.dir}/portal-system${container.name.suffix}"/>
              <property name="changesfilter-class" value="${gatein.jcr.index.changefilterclass}" />
              <property name="jbosscache-configuration" value="${gatein.jcr.index.cache.config}" />
              <property name="jgroups-configuration" value="${gatein.jcr.jgroups.config}" />
              <property name="jgroups-multiplexer-stack" value="true" />
              <property name="jbosscache-cluster-name" value="jcrindexer-${container.name.suffix}-portal-system" />
              <property name="max-volatile-time" value="60" />
            </properties>
          </query-handler>
          <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
            <properties>
              <!-- Comment #12 -->
              <property name="time-out" value="15m" />
              <property name="jbosscache-configuration" value="${gatein.jcr.lock.cache.config}" />
              <property name="jgroups-configuration" value="${gatein.jcr.jgroups.config}" />
              <property name="jgroups-multiplexer-stack" value="true" />
              <property name="jbosscache-cluster-name" value="jcrlock-${container.name.suffix}-portal-system" />
              <property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlock_portal_system" />
              <property name="jbosscache-cl-cache.jdbc.table.create" value="true" />
              <property name="jbosscache-cl-cache.jdbc.table.drop" value="false" />
              <property name="jbosscache-cl-cache.jdbc.table.primarykey" value="pk" />
              <property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn" />
              <property name="jbosscache-cl-cache.jdbc.node.column" value="node" />
              <property name="jbosscache-cl-cache.jdbc.parent.column" value="parent" />
              <property name="jbosscache-cl-cache.jdbc.datasource" value="${gatein.jcr.datasource.name}${container.name.suffix}" />
            </properties>
          </lock-manager>
        </workspace>
Comment #1
The name of the default repository, the one returned by RepositoryService.getRepository().
Comment #2
The parameters are the name of the repository, the name of the workspace where the /jcr:system node is placed, and the name of the workspace obtained using the Repository's login() or login(Credentials) methods (the ones without an explicit workspace name).
Comment #3
The name of a security domain for JAAS authentication.
Comment #4
The name of an access control policy. There are three types: optional (the default), where an ACL is created on demand; disable, where there is no access control; and mandatory, where an ACL is created for each added node (not supported yet).
Comment #5
The name of an authentication policy class.
Comment #6
The list of workspaces.
Comment #7
The name of the workspace.
Comment #8
Workspace data container (physical storage) configuration.
Comment #9
Workspace initializer configuration.
Comment #10
Workspace storage cache configuration.
Comment #11
Query handler configuration.
Comment #12
The amount of time before the unused global lock is removed.

session-max-age

This parameter is optional and is therefore not shown in the example file above. It sets the time after which an idle session is removed (logged out). If it is not set, an idle session is never removed.
lock-remover-max-threads
The number of threads that can serve LockRemover tasks. The default value is 1. Each workspace has a LockManager. JCR supports locks with a defined lifetime, and the LockRemover removes a lock when it expires. The LockRemover is not an independent timer thread; it is a task that is executed every 30 seconds. This task is served by a ThreadPoolExecutor, which may use a different number of threads.
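The idea of a lock-removal task served by a thread pool, rather than by a dedicated timer thread, can be sketched as follows. This is only an illustration under stated assumptions: the class LockRemoverSketch and its methods are hypothetical and do not reflect the eXo JCR implementation. Only the 30-second period and the default of one thread are taken from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; class and method names are not eXo JCR APIs.
class LockRemoverSketch {
    // Locked node identifier -> expiration time in milliseconds since the epoch.
    private final Map<String, Long> expirations = new ConcurrentHashMap<>();

    void addLock(String nodeId, long expiresAtMillis) {
        expirations.put(nodeId, expiresAtMillis);
    }

    int size() {
        return expirations.size();
    }

    // One sweep: drop every lock whose lifetime has ended and return how many were removed.
    int removeExpired(long nowMillis) {
        int before = expirations.size();
        expirations.values().removeIf(expiresAt -> expiresAt <= nowMillis);
        return before - expirations.size();
    }

    public static void main(String[] args) throws InterruptedException {
        LockRemoverSketch locks = new LockRemoverSketch();
        locks.addLock("node-a", System.currentTimeMillis() - 1);       // already expired
        locks.addLock("node-b", System.currentTimeMillis() + 60_000L); // expires in one minute

        int maxThreads = 1; // stands in for the default of lock-remover-max-threads
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(maxThreads);
        // The sweep is a task scheduled on the pool, not a dedicated timer thread.
        executor.scheduleWithFixedDelay(
            () -> locks.removeExpired(System.currentTimeMillis()), 0, 30, TimeUnit.SECONDS);

        Thread.sleep(200);      // give the first sweep time to run
        executor.shutdownNow(); // stop the periodic task so the JVM can exit
        System.out.println("locks left: " + locks.size());
    }
}
```

Raising the pool size in such a design would let sweeps for several workspaces run concurrently, which is presumably the purpose of making the thread count configurable.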

B.4.1.3. Workspace Configuration Parameters

name
The name of a workspace.
auto-init-root-nodetype
DEPRECATED in JCR 1.9 (use initializer). The node type for root node initialization.
container
Workspace data container (physical storage) configuration.
initializer
Workspace initializer configuration.
cache
Workspace storage cache configuration.
query-handler
Query handler configuration.
auto-init-permissions
DEPRECATED in JCR 1.9 (use initializer). Default permissions of the root node. They are defined as a set of semicolon-delimited permissions, each consisting of a space-delimited identity and permission type.
For example, any read; *:/admin read; *:/admin add_node; *:/admin set_property; *:/admin remove means that users from the group admin have all permissions and other users have only the read permission.

B.4.1.4. Workspace Data Container Configuration Parameters

class
A workspace data container class name.
properties
The list of properties (name-value pairs) for the concrete Workspace data container.

Parameter Description

trigger_events_for_descendents_on_rename
Indicates whether events are triggered for descendants on rename. Setting it to false improves the performance of rename operations, but Observation is then not notified about the descendants. The default value is true.
lazy-node-iterator-page-size
The page size for the lazy iterator. Indicates how many nodes can be retrieved from storage per request. The default value is 100.
acl-bloomfilter-false-positive-probability
The desired false positive probability of the ACL Bloom filter, in the range [0..1]. The default value is 0.1.
acl-bloomfilter-elements-number
The expected number of ACL elements in the Bloom filter. The default value is 1000000.

Note

Bloom filters are not supported by all cache implementations; so far, only the Infinispan implementation supports them.
value-storage
The list of value storage plug-ins.
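The two acl-bloomfilter parameters described above together determine how large a Bloom filter needs to be. The sketch below applies the standard textbook sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, to the default values. Whether the cache implementation sizes its filter in exactly this way is an assumption, and the class BloomFilterSizing is hypothetical.

```java
// Hypothetical sketch using the standard Bloom filter sizing formulas;
// the actual cache implementation may size its filter differently.
class BloomFilterSizing {

    // Number of bits needed for n expected elements at false positive probability p.
    static long optimalBits(long expectedElements, double falsePositiveProbability) {
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(
            -expectedElements * Math.log(falsePositiveProbability) / (ln2 * ln2));
    }

    // Number of hash functions that minimizes the false positive rate for the given size.
    static int optimalHashFunctions(long expectedElements, long bits) {
        long rounded = Math.round((double) bits / expectedElements * Math.log(2.0));
        return (int) Math.max(1L, rounded);
    }

    public static void main(String[] args) {
        long elements = 1_000_000L; // default of acl-bloomfilter-elements-number
        double probability = 0.1;   // default of acl-bloomfilter-false-positive-probability
        long bits = optimalBits(elements, probability);
        System.out.println("bits: " + bits);
        System.out.println("hash functions: " + optimalHashFunctions(elements, bits));
    }
}
```

Under these formulas, lowering the false positive probability or raising the expected number of elements increases the memory the filter needs, which is the trade-off to keep in mind when tuning the two parameters.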

B.4.1.5. Value Storage plug-in Configuration Parameters

Note

The value-storage element is optional. If you do not include it, the values will be stored as BLOBs inside the database.
value-storage
Optional value Storage plug-in definition.
class
A value storage plug-in class name (attribute).
properties
The list of properties (name-value pairs) for a concrete Value Storage plug-in.
filters
The list of filters defining conditions when this plug-in is applicable.

B.4.1.6. Initializer Configuration Parameters

class
Initializer implementation class.
properties
The list of properties (name-value pairs).
root-nodetype
The node type for root node initialization.
root-permissions
Default permissions of the root node. They are defined as a set of semicolon-delimited permissions, each consisting of a space-delimited identity (such as a user or group) and the type of permission. For example, any read; *:/admin read; *:/admin add_node; *:/admin set_property; *:/admin remove means that users from the group admin have all permissions and other users have only the read permission.
The configurable initializer adds the capability to override the initial workspace startup procedure, which is used for clustering. It also replaces the workspace element parameters auto-init-root-nodetype and auto-init-permissions with root-nodetype and root-permissions.
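The root-permissions format can be read as a list of identity and permission pairs. The following sketch splits such a string into a map from identity to granted permissions. The class RootPermissionsSketch is hypothetical and is not the parser used by eXo JCR; the identity *:/platform/administrators is the one used in the repository configuration example in this appendix.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of reading a root-permissions value; not the eXo JCR parser.
class RootPermissionsSketch {

    // Splits "identity permission;identity permission;..." into identity -> permissions.
    static Map<String, Set<String>> parse(String rootPermissions) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String entry : rootPermissions.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing semicolon
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                    "Expected 'identity permission' but got: " + trimmed);
            }
            result.computeIfAbsent(parts[0], key -> new LinkedHashSet<>()).add(parts[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String value = "any read;"
            + "*:/platform/administrators read;"
            + "*:/platform/administrators add_node;"
            + "*:/platform/administrators set_property;"
            + "*:/platform/administrators remove";
        for (Map.Entry<String, Set<String>> entry : parse(value).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Grouping by identity makes it easy to see at a glance which identities hold the full set of read, add_node, set_property and remove permissions.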

B.4.1.7. Cache Configuration Parameters

enabled
Specifies whether the workspace cache is enabled.
class
Cache implementation class. The default value is org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl. This parameter is optional from JCR version 1.9.
The cache can be configured to use a concrete implementation of the WorkspaceStorageCache interface. The JCR core has two implementations:
  • LinkedWorkspaceStorageCacheImpl, which is the default implementation, with configurable read behavior and statistics.
  • WorkspaceStorageCacheImpl, which was used before version 1.9 and can still be used.
properties
The list of properties (name-value pairs) for Workspace cache.
max-size
Cache maximum size (maxSize prior to v.1.9).
live-time
Cached item live time (liveTime prior to v.1.9).
From version 1.9, LinkedWorkspaceStorageCacheImpl supports the following additional optional parameters.
statistic-period
Period (in time format) of the cache statistics thread execution. The default value is 5 minutes.
statistic-log
If true, cache statistics are printed to the default logger (log.info).
statistic-clean
If true, cache statistics are cleared after they are gathered.
cleaner-period
Time period for removing the oldest items. The default value is 20 minutes.
blocking-users-count
Number of concurrent users allowed to read cache storage. The default value is 0 which means unlimited.

B.4.1.8. Query Handler Configuration Parameters

class
A Query Handler class name.
properties
The list of properties (name-value pairs) for a Query Handler (indexDir).
Properties and advanced features are described in Section B.6, Configuring Search.

B.4.1.9. Lock Manager Configuration Parameters

time-out
Time after which the unused global lock will be removed.
persister
A class for storing lock information for future use, for example to remove a lock after a JCR restart.
path
A lock folder. Each workspace has its own lock folder.

B.5. Multi-language Support

Whenever a relational database is used to store multilingual text data in the eXo Java Content Repository, the configuration must be adapted to support UTF-8 encoding. The dialect is detected automatically for certified databases. You can still set it explicitly if detection fails.
The following sections describe enabling UTF-8 support with various databases.

Note

  • The configuration file to be modified for these changes is $JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml.
  • The datasource jdbcjcr used in the following examples can be configured via the InitialContextInitializer component.

B.5.1. Oracle

To run a multilingual JCR on an Oracle back end, a Unicode character set must be applied to the database. Other Oracle globalization parameters do not have any effect. The property to modify is NLS_CHARACTERSET.
The NLS_CHARACTERSET = AL32UTF8 entry has been successfully tested with many European and Asian languages.
Example of database configuration:
NLS_LANGUAGE             AMERICAN
NLS_TERRITORY            AMERICA
NLS_CURRENCY             $
NLS_ISO_CURRENCY         AMERICA
NLS_NUMERIC_CHARACTERS   .,
NLS_CHARACTERSET         AL32UTF8
NLS_CALENDAR             GREGORIAN
NLS_DATE_FORMAT          DD-MON-RR
NLS_DATE_LANGUAGE        AMERICAN
NLS_SORT                 BINARY
NLS_TIME_FORMAT          HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT     DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT       HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT  DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY        $
NLS_COMP                 BINARY
NLS_LENGTH_SEMANTICS     BYTE
NLS_NCHAR_CONV_EXCP      FALSE
NLS_NCHAR_CHARACTERSET   AL16UTF16
Create the database with Unicode encoding and use the oracle dialect for the Workspace Container:
<workspace name="collaboration">
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="jdbcjcr" />
              <property name="dialect" value="oracle" />
              <property name="multi-db" value="false" />
              <property name="max-buffer-size" value="200k" />
              <property name="swap-directory" value="target/temp/swap/ws" />
            </properties>
          .....

Warning

JCR does not use NVARCHAR columns, so the value of the parameter NLS_NCHAR_CHARACTERSET does not matter for JCR.

B.5.2. DB2

DB2 Universal Database (DB2 UDB) supports UTF-8 and UTF-16/UCS-2. When a Unicode database is created, CHAR, VARCHAR and LONG VARCHAR data are stored in UTF-8 form.
This enables JCR multi-lingual support.
Below is an example of creating a UTF-8 database using the db2 dialect for a workspace container with DB2 version 9 and higher:
DB2 CREATE DATABASE dbname USING CODESET UTF-8 TERRITORY US
<workspace name="collaboration">
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="jdbcjcr" />
              <property name="dialect" value="db2" />
              <property name="multi-db" value="false" />
              <property name="max-buffer-size" value="200k" />
              <property name="swap-directory" value="target/temp/swap/ws" />
            </properties>
          .....

Note

For DB2 version 8.x support, change the dialect property to db2v8.

B.5.3. MySQL

Using JCR with a MySQL database requires the mysql-utf8 dialect for internationalization support.
The default charset for the database is latin1. Latin1 makes effective use of the limited index space, which is, for example, 1000 bytes for the MyISAM engine and 767 bytes for InnoDB.

Note

If the default charset of the database is multibyte, you will face a JCR database initialization error concerning an index creation failure.
JCR works with any single-byte default database charset, with UTF8 supported by the MySQL server. However, it has only been tested using the latin1 charset.
An example:
<workspace name="collaboration">
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="jdbcjcr" />
              <property name="dialect" value="mysql-utf8" />
              <property name="multi-db" value="false" />
              <property name="max-buffer-size" value="200k" />
              <property name="swap-directory" value="target/temp/swap/ws" />
            </properties>
          .....
You need to indicate the charset name either at server level using the server parameter --character-set-server or at datasource configuration level by adding a new property as shown here:

	<property name="connectionProperties" value="useUnicode=yes;characterEncoding=utf8;characterSetResults=UTF-8;" />

B.5.4. PostgreSQL

Multilingual support can be enabled with PostgreSQL in the following ways:
  1. Use the locale features of the operating system to provide locale-specific collation order, number formatting, translated messages, and so on.
    UTF-8 is widely used on Linux distributions and can be useful in such a scenario.
  2. Use the different character sets defined in the PostgreSQL server, including multiple-byte character sets, to support storing text in any language and to provide character set translation between client and server.
    Using a UTF-8 database charset is recommended, as it allows any-to-any conversion and makes this issue transparent for the JCR.
Example of a database with UTF-8 encoding using PgSQL dialect for the Workspace Container:
<workspace name="collaboration">
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="jdbcjcr" />
              <property name="dialect" value="pgsql" />
              <property name="multi-db" value="false" />
              <property name="max-buffer-size" value="200k" />
              <property name="swap-directory" value="target/temp/swap/ws" />
            </properties>
          .....

B.6. Configuring Search

The search function in JCR can be configured to perform in specific ways. This section discusses configuring the search function to improve search performance and results.
The JCR index configuration file is located at $JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml.
A code example is listed here with a list of the configuration parameters shown below:
<repository-service default-repository="db1">
  <repositories>
    <repository name="db1" system-workspace="ws" default-workspace="ws">
       ....
      <workspaces>
        <workspace name="ws">
       ....
          <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
            <properties>
              <property name="index-dir" value="${java.io.tmpdir}/temp/index/db1/ws" />
              <property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" />
              <property name="synonymprovider-config-path" value="/synonyms.properties" />
              <property name="indexing-config-path" value="/indexing-configuration.xml" />
              <property name="query-class" value="org.exoplatform.services.jcr.impl.core.query.QueryImpl" />
            </properties>
          </query-handler>
        ... 
        </workspace>
     </workspaces>
    </repository>        
  </repositories>
</repository-service>
The following table describes some of the configuration parameters, together with the default setting and the associated eXo JCR version for each.

Table B.1. Configuration parameters

Parameter | Default Setting | Description | JCR Version
index-dir | none | The location of the index directory. This parameter is mandatory. It is called "indexDir" in versions prior to eXo JCR version 1.9. | 1.0
use-compoundfile | true | Advises Lucene to use compound files for the index files. | 1.9
min-merge-docs | 100 | The minimum number of nodes in an index until segments are merged. | 1.9
volatile-idle-time | 3 | Idle time in seconds until the volatile index part is moved to a persistent index even though min-merge-docs is not reached. | 1.9
max-merge-docs | Integer.MAX_VALUE | The maximum number of nodes in segments that will be merged. The default value changed to Integer.MAX_VALUE in eXo JCR version 1.9. | 1.9
merge-factor | 10 | Determines how often segment indices are merged. | 1.9
max-field-length | 10000 | The maximum number of words per property that are full-text indexed. | 1.9
cache-size | 1000 | Size of the document number cache. This cache maps UUIDs to Lucene document numbers. | 1.9
force-consistencycheck | false | Runs a consistency check on every start up. If false, a consistency check is only performed when the search index detects a prior forced shutdown. | 1.9
auto-repair | true | Errors detected by a consistency check are automatically repaired. If false, errors are only written to the log. | 1.9
query-class | QueryImpl | Class name that implements the javax.jcr.query.Query interface. This class must also extend org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl. | 1.9
document-order | true | If true and the query does not contain an 'order by' clause, result nodes will be in document order. For better performance, set to 'false' when queries return many nodes. | 1.9
result-fetch-size | Integer.MAX_VALUE | The number of results fetched when a query is executed. | 1.9
excerptprovider-class | DefaultXMLExcerpt | The name of the class that implements org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptProvider. This class is used for the rep:excerpt() function in a query. | 1.9
support-highlighting | false | If set to true, additional information is stored in the index to support highlighting using the rep:excerpt() function. | 1.9
synonymprovider-class | none | The name of a class that implements org.exoplatform.services.jcr.impl.core.query.lucene.SynonymProvider. The default value is null. | 1.9
synonymprovider-config-path | none | The path to the synonym provider configuration file. This path is interpreted relative to the path parameter. If there is a path element inside the SearchIndex element, then this path is interpreted relative to the root path of the path. Whether this parameter is mandatory depends on the synonym provider implementation. The default value is null. | 1.9
indexing-configuration-path | none | The path to the indexing configuration file. | 1.9
indexing-configuration-class | IndexingConfigurationImpl | The name of the class that implements org.exoplatform.services.jcr.impl.core.query.lucene.IndexingConfiguration. | 1.9
enable-consistencycheck | false | If set to true, a consistency check is performed depending on the force-consistencycheck parameter. If set to false, no consistency check is performed on start up, even if a redo log has been applied. | 1.9
spellchecker-class | none | The name of a class that implements org.exoplatform.services.jcr.impl.core.query.lucene.SpellChecker. | 1.9
errorlog-size | 50 (KB) | The default size of the error log file in KB. | 1.9
upgrade-index | false | Allows JCR to convert an existing index into the new format. This property can also be set as a system property. Indexes created prior to eXo JCR 1.12 do not run with eXo JCR 1.12, so an automatic migration must be run by starting eXo JCR with -Dupgrade-index=true. The old index format is then converted into the new index format, and the new format is used after the conversion. On subsequent starts this option is no longer needed. The old index is replaced and a back conversion is not possible, so it is recommended that a backup of the index be made before conversion. (Only for migrations from JCR 1.9 and later.) | 1.12
analyzer | org.apache.lucene.analysis.standard.StandardAnalyzer | Class name of a Lucene analyzer to use for full-text indexing of text. | 1.12

B.6.1. Global Search Index

eXo JCR uses the Lucene standard analyzer to index content. The standard analyzer applies a chain of standard filters to analyze the content.

Example B.1. Standard Analyzed Filters

public TokenStream tokenStream(String fieldName, Reader reader) {
    StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
    tokenStream.setMaxTokenLength(maxTokenLength);
    // Comment #1
    TokenStream result = new StandardFilter(tokenStream);
    // Comment #2
    result = new LowerCaseFilter(result);
    // Comment #3
    result = new StopFilter(result, stopSet);
    return result;
  }
Comment #1: The first filter (StandardFilter) removes possessive apostrophes ('s) from the end of words and removes periods (.) from acronyms.
Comment #2: The second filter (LowerCaseFilter) normalizes token text to lower case.
Comment #3: The third filter (StopFilter) removes stop words from a token stream. The stop set is defined in the analyzer.
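The effect of the three filters can be approximated without Lucene. The following plain-Java sketch is only an illustration of the behavior described in the comments above: it splits on whitespace instead of using StandardTokenizer, removes every dot in a token rather than only acronym dots, and uses a hypothetical stop set. The class and method names are not part of Lucene or eXo JCR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class FilterChainSketch {

    // Hypothetical stop set; the real stop set is defined in the analyzer.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "a", "of", "and"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw;
            // Step 1 (StandardFilter-like): drop a possessive 's and dots in acronyms.
            if (token.endsWith("'s") || token.endsWith("'S")) {
                token = token.substring(0, token.length() - 2);
            }
            token = token.replace(".", "");
            // Step 2 (LowerCaseFilter-like): normalize to lower case.
            token = token.toLowerCase(Locale.ROOT);
            // Step 3 (StopFilter-like): skip stop words and empty tokens.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [ibm, engineer, notes]
        System.out.println(analyze("The I.B.M. engineer's Notes"));
    }
}
```

Because the same chain runs at indexing time and at query time, a query for "IBM" and stored text containing "I.B.M." end up as the same term.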
The global search index is configured in the $JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml configuration file within the "query-handler" tag.
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
The same analyzer must be used for indexing and for querying in Lucene; otherwise, results may be unpredictable. eXo JCR does this automatically. The StandardAnalyzer is the default analyzer. You can replace it with another analyzer.

B.6.1.1. Customized Search Indexes and Analyzers

As described in Section B.6.1, "Global Search Index", eXo JCR indexes content with the Lucene standard analyzer, which applies the StandardFilter, LowerCaseFilter, and StopFilter filters shown in Example B.1.
Additional filters are used in specific cases. For example, the ISOLatin1AccentFilter filter replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents.

Note

The ISOLatin1AccentFilter is not present in the Lucene version currently used by eXo.

B.6.1.2. Creating a Customized Query Handler

To use a different filter, a new analyzer must be created to apply the filter, and a new search index must be created to use the analyzer. These classes are packaged into a JAR file, which is then deployed with the application.

Procedure B.1. Create a new filter, analyzer and search index

  1. Create a new filter.
    The filter must override the following method:
    public final Token next(final Token reusableToken) throws java.io.IOException
    
    This method defines how characters are read and used by the filter.
  2. Create the analyzer.
    The analyzer must extend org.apache.lucene.analysis.standard.StandardAnalyzer and override the following method so that it applies the new filters:
    public TokenStream tokenStream(String fieldName, Reader reader)
    
  3. To create the new search index, extend org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex and write the constructor to set the correct analyzer.
    Use the method below to return your analyzer:
    public Analyzer getAnalyzer() {
        // MyAnalyzer is the analyzer class created in step 2.
        return new MyAnalyzer();
    }
    

Note

In eXo JCR version 1.12 and later, the analyzer can be set directly in the configuration. With these versions, it is not necessary to create a new SearchIndex in order to use a new analyzer.

B.6.1.3. Configuring an application to use the new SearchIndex

To configure an application to use a new SearchIndex, replace the following code:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">

in the $JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml file with the new class:
<query-handler class="mypackage.indexation.MySearchIndex">

To configure an application to use a new analyzer, add the analyzer parameter to each query-handler configuration in $JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml file:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      ...
      <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
      ...
   </properties>
</query-handler>
The new SearchIndex starts to index content with the specified filters once JCR is restarted.

B.6.2. Indexing Configuration

From JCR version 1.9 onwards, the default search index implementation in JCR allows the user to control which properties of a node are indexed. Different analyzers can also be set for different nodes.
The configuration parameter for indexing is called indexing-configuration-path. It is not set by default, which means that all properties of a node are indexed.
To configure the indexing behavior, add the following parameter to the properties of the query-handler element in your configuration file:
<property name="indexing-configuration-path" value="/indexing_configuration.xml"/>

B.6.2.1. Node Scope Limit

You can limit the node scope so that only certain properties of a node type are indexed. This is useful to optimize the index size.
The configuration shown here indexes only the property named Text for nodes of type nt:unstructured. This configuration also applies to all node types that extend nt:unstructured.
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>

Namespace Prefixes

Every namespace prefix that is used anywhere in the XML file must be declared in the configuration element.

B.6.2.2. Configuring Index Boost Value

JCR allows you to configure a boost value for the nodes that match the index rule. The default boost value is 1.0. Higher boost values in the range of 1.0 to 5.0 yield a higher score value, so the matching nodes appear as more relevant.

Example B.2. Configuring index boost value

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0">
    <property>Text</property>
  </index-rule>
</configuration>
You can also provide a boost value for individual properties instead of for the whole rule:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property boost="3.0">Title</property>
    <property boost="1.5">Text</property>
  </index-rule>
</configuration>

B.6.2.3. Adding Condition to Index Rules

You can add a condition to the index rule and have multiple rules with the same nodeType. The first index rule that matches will apply and all remaining ones are ignored:

Example B.3. Adding condition to index nodes

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0"
              condition="@priority = 'high'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>
In the above example, the first rule applies if the nt:unstructured node has a priority property with the value high. The condition syntax only supports the equals operator and a string literal.
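The "first matching rule wins" behavior can be sketched in a few lines of Java. This is a simplified illustration, not the eXo JCR implementation: the Rule class and its fields are hypothetical, conditions are reduced to a single property equality test, and node types are compared by exact name without considering sub types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class IndexRuleSketch {

    /** Hypothetical, simplified stand-in for one index-rule element. */
    static final class Rule {
        final String nodeType;
        final String conditionProperty; // null when the rule has no condition
        final String conditionValue;
        final double boost;

        Rule(String nodeType, String conditionProperty, String conditionValue, double boost) {
            this.nodeType = nodeType;
            this.conditionProperty = conditionProperty;
            this.conditionValue = conditionValue;
            this.boost = boost;
        }

        boolean matches(String type, Map<String, String> properties) {
            if (!nodeType.equals(type)) {
                return false; // simplification: exact type name only
            }
            return conditionProperty == null
                || conditionValue.equals(properties.get(conditionProperty));
        }
    }

    /** Returns the first rule that matches, or null when no rule applies. */
    static Rule firstMatch(List<Rule> rules, String type, Map<String, String> properties) {
        for (Rule rule : rules) {
            if (rule.matches(type, properties)) {
                return rule; // all remaining rules are ignored
            }
        }
        return null;
    }

    /** The two rules of the example above, in document order. */
    static List<Rule> exampleRules() {
        return Arrays.asList(
            new Rule("nt:unstructured", "priority", "high", 2.0),
            new Rule("nt:unstructured", null, null, 1.0)); // default boost
    }

    public static void main(String[] args) {
        Rule high = firstMatch(exampleRules(), "nt:unstructured",
            Collections.singletonMap("priority", "high"));
        Rule other = firstMatch(exampleRules(), "nt:unstructured",
            Collections.singletonMap("priority", "low"));
        System.out.println(high.boost);  // 2.0
        System.out.println(other.boost); // 1.0
    }
}
```

The order of the rules matters: if the unconditional rule were listed first, it would match every nt:unstructured node and the conditional rule would never be reached.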

Example B.4. Referencing properties

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0"
              condition="ancestor::*/@priority = 'high'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured"
              boost="0.5"
              condition="parent::foo/@priority = 'low'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured"
              boost="1.5"
              condition="bar/@priority = 'medium'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>
The indexing configuration also allows you to specify the type of a node in the condition.

Example B.5. Specify node type in the condition

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0"
              condition="element(*, nt:unstructured)/@priority = 'high'">
    <property>Text</property>
  </index-rule>
</configuration>

Note

The type match of the node must be exact. It does not consider sub types of the specified node type.

B.6.2.4. Exclusion from the Node Scope Index

By default, all configured properties of type string are full-text indexed and included in the node scope index.
A node scope search therefore normally finds all nodes that match an index rule. For example, jcr:contains(., 'foo') returns all nodes that have a string property containing the word 'foo'. A property can be explicitly excluded from the node scope index.

Example B.6. Excluding properties explicitly from the node scope index

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property nodeScopeIndex="false">Text</property>
  </index-rule>
</configuration>

B.6.2.5. Characteristics of Node Scope Searches

When analyzers are set for specific properties, searching within a property can behave differently from searching within the node scope, because the node scope always uses the global analyzer.

Example B.7. Query

For example, the property "mytext" contains the text "testing my analyzers", no analyzers have been configured for this property, and the default analyzer in SearchIndex has not been changed.
xpath = "//*[jcr:contains(mytext,'analyzer')]"
This xpath query does not return a result for the node, because the default analyzer does not reduce "analyzers" to "analyzer".

Example B.8. Searching in node scope

A search on the node scope does not return a result either.
xpath = "//*[jcr:contains(.,'analyzer')]"
Specific analyzers can only be set on a node property. Node scope indexing and analysis is always done with the globally defined analyzer in the SearchIndex element.

Example B.9. Changing the analyzer from the Global Analyzer to German Analyzer

<analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
  <property>mytext</property>
</analyzer>
When the analyzer used to index the property mytext is changed from the global analyzer to a specific analyzer, for example the German analyzer, the search query returns a result because of word stemming.

Example B.10. Search query after changing the analyzer

This search query returns a result because of word stemming: the indexed word "analyzers" is reduced to "analyzer".
xpath = "//*[jcr:contains(mytext,'analyzer')]"

Example B.11. Search query after changing the analyzer

This search query does not return a result, because the node scope is indexed with the global analyzer, which does not apply word stemming.
xpath = "//*[jcr:contains(.,'analyzer')]"

Warning

While using analyzers for specific properties, a result may be found in a property for certain search text, but the same search text in the node scope of the property may not find a result.
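The difference between a property search and a node scope search can be reproduced with a toy model. In the following Java sketch, the stem method is a deliberately naive stand-in that only strips a trailing "s". It is not the stemming algorithm of any Lucene analyzer, and all names are hypothetical. The sketch only shows why the same search text can match in a property index that stems words and miss in a node scope index that does not.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

final class AnalyzerScopeSketch {

    // Toy stemmer: strips a trailing "s" from words longer than three letters.
    static String stem(String token) {
        boolean plural = token.length() > 3 && token.endsWith("s");
        return plural ? token.substring(0, token.length() - 1) : token;
    }

    // Builds a set of index terms by lower-casing and analyzing each word.
    static Set<String> index(String text, UnaryOperator<String> analyzer) {
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(analyzer.apply(token));
            }
        }
        return terms;
    }

    // The query word is analyzed the same way as the indexed text.
    static boolean contains(Set<String> index, String word, UnaryOperator<String> analyzer) {
        return index.contains(analyzer.apply(word.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String text = "testing my analyzers";
        // Property index: analyzed with the property-specific (stemming) analyzer.
        Set<String> propertyIndex = index(text, AnalyzerScopeSketch::stem);
        // Node scope index: analyzed with the global analyzer (no stemming here).
        Set<String> nodeScopeIndex = index(text, UnaryOperator.identity());

        System.out.println(contains(propertyIndex, "analyzer", AnalyzerScopeSketch::stem)); // true
        System.out.println(contains(nodeScopeIndex, "analyzer", UnaryOperator.identity())); // false
    }
}
```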

Note

Both index rules and index aggregates influence how content is indexed in JCR. If the configuration is changed, the existing content is not automatically re-indexed according to the new rules.
Content must be manually re-indexed when the configuration is changed.

B.6.3. Advanced features

eXo JCR supports some advanced features, which are not specified in JSR 170:
  • Get a text excerpt with highlighted words that match the query.
  • Search a term and its synonyms.
  • Search similar nodes.
  • Check spelling of a full text query statement.
  • Define index aggregates and rules for indexing configuration.

B.7. Configuring the JDBC Data Container

B.7.1. Introduction

The eXo JCR persistent data container can work in two configuration modes:
  • Multi-database: One database for each workspace (used in standalone eXo JCR service mode).
  • Single-database: All workspaces persist in one database (used in embedded eXo JCR service mode, for example, eXo portal).
The data container uses a JDBC driver to communicate with the actual database software. Any JDBC-enabled data storage can therefore be used with the eXo JCR implementation.

B.7.2. Supported Databases

The data container is tested with the following RDBMS:

Table B.2. Supported databases

Database | Driver | Version
IBM DB2 9.7 (FP5) | IBM DB2 JDBC Universal Driver Architecture | 4.13.80
Oracle 11g R1 (11.1.0.7.0) | Oracle JDBC Driver | 11.1.0.7
Oracle 11g R1 RAC (11.1.0.7.0) | Oracle JDBC Driver | 11.1.0.7
Oracle 11g R2 (11.2.0.3.0) | Oracle JDBC Driver | v11.2.0.3.0
Oracle 11g R2 RAC (11.2.0.3.0) | Oracle JDBC Driver | v11.2.0.3.0
MySQL 5.1 | MySQL Connector/J | 5.1.21
MySQL 5.5 | MySQL Connector/J | 5.1.21
Microsoft SQL Server 2008 | Microsoft SQL Server JDBC Driver | 3.0.1301.101, 4.0.2206.100
Microsoft SQL Server 2008 R2 | Microsoft SQL Server JDBC Driver | 3.0.1301.101, 4.0.2206.100
PostgreSQL 8.4.8 | JDBC4 Postgresql Driver | 8.4-703
PostgreSQL 9.1.0 | JDBC4 Postgresql Driver | 9.1-903
Sybase ASE 15.7 | Sybase jConnect JDBC driver | v7

Isolation Levels

JCR requires at least the READ_COMMITTED isolation level. Other RDBMS configurations can cause side effects and issues, so ensure that the proper isolation level is configured on the database server side.
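A quick way to reason about the isolation requirement is to compare a level against the constants defined in java.sql.Connection. The following Java sketch assumes the requirement is read as "READ_COMMITTED or stricter". The class and method names are hypothetical and are not part of eXo JCR.

```java
import java.sql.Connection;

final class IsolationLevelSketch {

    // Assumption: READ_COMMITTED and every stricter standard level are acceptable.
    static boolean meetsJcrRequirement(int isolationLevel) {
        switch (isolationLevel) {
            case Connection.TRANSACTION_READ_COMMITTED:
            case Connection.TRANSACTION_REPEATABLE_READ:
            case Connection.TRANSACTION_SERIALIZABLE:
                return true;
            default:
                // TRANSACTION_NONE, TRANSACTION_READ_UNCOMMITTED, or an unknown value
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(meetsJcrRequirement(Connection.TRANSACTION_READ_COMMITTED));   // true
        System.out.println(meetsJcrRequirement(Connection.TRANSACTION_READ_UNCOMMITTED)); // false
    }
}
```

With a live connection, the value to check would come from Connection.getTransactionIsolation(), which returns one of these constants.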

Note

A mandatory JCR requirement for underlying databases is case sensitive collation. Microsoft SQL Server 2005 and 2008 customers must configure their server with collation corresponding to operational requirements, while honoring case sensitivity. See the Microsoft SQL Server documentation page titled Selecting a SQL Server Collation at http://msdn.microsoft.com/en-us/library/ms144250.aspx

Warning

JCR does not support the MyISAM storage engine for the MySQL relational database management system.

B.7.3. Configuring the database using SQL-script

Each database product supports the ANSI SQL standards and also has its own specifics. Therefore, each database has its own configuration setting in eXo JCR, selected through the database dialect parameter.
SQL script files allow a detailed configuration of the database. The SQL scripts are located in the conf/storage/ directory of the $JPP_HOME/modules/org/gatein/lib/main/exo.jcr.component.core-1.15.3.jar file.
The following tables show the correspondence between the scripts and databases:

Table B.3. Single-database

Database | Script
MySQL DB | jcr-sjdbc.mysql.sql
MySQL DB with utf-8 | jcr-sjdbc.mysql-utf8.sql
PostgreSQL | jcr-sjdbc.pgsql.sql
Oracle DB | jcr-sjdbc.ora.sql
DB2 9.7 | jcr-sjdbc.db2.sql
Microsoft SQL Server | jcr-sjdbc.mssql.sql
Sybase | jcr-sjdbc.sybase.sql
HSQLDB | jcr-sjdbc.sql

Table B.4. Multi-database

Database | Script
MySQL DB | jcr-mjdbc.mysql.sql
MySQL DB with utf-8 | jcr-mjdbc.mysql-utf8.sql
PostgreSQL | jcr-mjdbc.pgsql.sql
Oracle DB | jcr-mjdbc.ora.sql
DB2 9.7 | jcr-mjdbc.db2.sql
Microsoft SQL Server | jcr-mjdbc.mssql.sql
Sybase | jcr-mjdbc.sybase.sql
HSQLDB | jcr-mjdbc.sql

B.7.4. Multilanguage support database configuration

If a non-ANSI node name is used, you must use a database with multilanguage support. Some JDBC drivers need additional parameters to establish a Unicode-friendly connection. For example, with MySQL it is necessary to add an additional parameter for the JDBC driver at the end of the JDBC URL:

Example B.12. Parameter

jdbc:mysql://exoua.dnsalias.net/portal?characterEncoding=utf8
There are pre-configured configuration files for HSQLDB located in the /conf/portal/ and /conf/standalone/ folders of the exo.jcr.component.core-1.15.3.jar file, or within the source distribution of the eXo JCR implementation.
By default, the configuration files are located in the service JARs: /conf/portal/configuration.xml (eXo services, including the JCR Repository Service) and exo-jcr-config.xml (repositories configuration).
In the portal, the JCR is configured in the portal web application: portal/WEB-INF/conf/jcr/jcr-configuration.xml (JCR Repository Service and related services) and repository-configuration.xml (repositories configuration).

B.7.5. Isolated-database Configuration

Isolated-database configuration uses a single database for the repository, but separate database tables for each workspace.

Procedure B.2. Configuring isolated databases

  1. Configure the data container in the org.exoplatform.services.naming.InitialContextInitializer service. It is the JNDI context initializer, which registers (binds) naming resources (DataSources) for data containers.
    
      <external-component-plugins>
        <target-component>org.exoplatform.services.naming.InitialContextInitializer</target-component>
        <component-plugin>
          <name>bind.datasource</name>
          <set-method>addPlugin</set-method>
          <type>org.exoplatform.services.naming.BindReferencePlugin</type>
          <init-params>
            <value-param>
              <name>bind-name</name>
              <value>jdbcjcr</value>
            </value-param>
            <value-param>
              <name>class-name</name>
              <value>javax.sql.DataSource</value>
            </value-param>
            <value-param>
              <name>factory</name>
              <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
            </value-param>
            <properties-param>
              <name>ref-addresses</name>
              <description>ref-addresses</description>
              <property name="driverClassName" value="org.postgresql.Driver"/>
              <!-- MVCC configured to prevent possible deadlocks when a global Tx is active -->
              <property name="url" value="jdbc:postgresql://exoua.dnsalias.net/portal"/>
              <property name="username" value="exoadmin"/>
              <property name="password" value="exo12321"/>
            </properties-param>
          </init-params>
        </component-plugin>
       
      </external-component-plugins>
    
    The following database connection parameters are configured:
    • driverClassName. For example: "org.hsqldb.jdbcDriver", "com.mysql.jdbc.Driver", "org.postgresql.Driver"
    • url. For example: "jdbc:hsqldb:file:target/temp/data/portal", "jdbc:mysql://exoua.dnsalias.net/jcr"
    • username. For example: "sa", "exoadmin"
    • password. For example: "exo12321"
  2. Configure the repository service.
    Each workspace is configured for the same data container.
    In this step, you are configuring two workspaces which will persist in different database tables.
    
    <workspaces>
      <workspace name="ws" >
      
        <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
        <properties>
          <property name="source-name" value="jdbcjcr"/>
          <property name="db-structure-type" value="isolated"/>
          
        </properties>
        </container>
        
      </workspace>
      <workspace name="ws1" >
        <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
        <properties>
          <property name="source-name" value="jdbcjcr"/>
          <property name="db-structure-type" value="isolated"/>
            
        </properties>
        </container>
        
      </workspace>
    </workspaces>
    
    

B.7.6. Multi-database Configuration

You need to configure each workspace in a repository as part of multi-database configuration.
Databases may reside on remote servers as required.

Procedure B.3. Multi-database configuration

This procedure configures two workspaces that persist in two different databases: ws in HSQLDB and ws1 in MySQL.
  1. Configure the data containers in the org.exoplatform.services.naming.InitialContextInitializer service.
    It's the JNDI context initializer which registers (binds) naming resources (DataSources) for data containers. For example, two data containers jdbcjcr - local HSQLDB, and jdbcjcr1 - remote MySQL are shown here.
    
      <external-component-plugins>
        <target-component>org.exoplatform.services.naming.InitialContextInitializer</target-component>
        <component-plugin>
          <name>bind.datasource</name>
          <set-method>addPlugin</set-method>
          <type>org.exoplatform.services.naming.BindReferencePlugin</type>
          <init-params>
            <value-param>
              <name>bind-name</name>
              <value>jdbcjcr</value>
            </value-param>
            <value-param>
              <name>class-name</name>
              <value>javax.sql.DataSource</value>
            </value-param>
            <value-param>
              <name>factory</name>
              <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
            </value-param>
            <properties-param>
              <name>ref-addresses</name>
              <description>ref-addresses</description>
              <property name="driverClassName" value="${all.driverClassName:org.hsqldb.jdbcDriver}"/>
              <!-- MVCC configured to prevent possible deadlocks when a global Tx is active -->
              <property name="url" value="${jdbcjcr.url:jdbc:hsqldb:file:target/temp/data/portal;hsqldb.tx=mvcc}"/>
              <property name="username" value="${jdbcjcr.username:sa}"/>
              <property name="password" value="${jdbcjcr.password:}"/>
            </properties-param>
          </init-params>
        </component-plugin>
        <component-plugin>
          <name>bind.datasource</name>
          <set-method>addPlugin</set-method>
          <type>org.exoplatform.services.naming.BindReferencePlugin</type>
          <init-params>
            <value-param>
              <name>bind-name</name>
              <value>jdbcjcr1</value>
            </value-param>
            <value-param>
              <name>class-name</name>
              <value>javax.sql.DataSource</value>
            </value-param>
            <value-param>
              <name>factory</name>
              <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
            </value-param>
            <properties-param>
              <name>ref-addresses</name>
              <description>ref-addresses</description>
              <property name="driverClassName" value="${all.driverClassName:org.hsqldb.jdbcDriver}"/>
              <property name="url" value="${jdbcjcr1.url:jdbc:hsqldb:file:target/temp/data/jcr}"/>
              <property name="username" value="${jdbcjcr1.username:sa}"/>
              <property name="password" value="${jdbcjcr1.password:}"/>
            </properties-param>
          </init-params>
        </component-plugin>
        <!-- Unnecessary plugins not relevant to this section removed for clarity -->
      </external-component-plugins>
    
    1. Configure the following database connection parameters:
      • driverClassName, for example, "org.hsqldb.jdbcDriver", "com.mysql.jdbc.Driver", "org.postgresql.Driver"
      • url, for example, "jdbc:hsqldb:file:target/temp/data/portal", "jdbc:mysql://exoua.dnsalias.net/jcr"
      • username, for example, "sa", "exoadmin"
      • password, for example, "", "exo12321"
    Connection pool configuration parameters supported by Apache DBCP (org.apache.commons.dbcp.BasicDataSourceFactory) can also be set here.
  2. Configure the repository service.
    Each workspace will be configured for its own data container. For example, two workspaces ws - jdbcjcr and ws1 - jdbcjcr1 are configured.
    
    <workspaces>
      <workspace name="ws" auto-init-root-nodetype="nt:unstructured">
        <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
        <properties>
          <property name="source-name" value="jdbcjcr"/>
          <property name="dialect" value="hsqldb"/>
          <property name="multi-db" value="true"/>
          <property name="max-buffer-size" value="200K"/>
          <property name="swap-directory" value="target/temp/swap/ws"/>   
        </properties>
        </container>
        <cache enabled="true">
          <properties>
            <property name="max-size" value="10K"/><!-- 10Kbytes -->
            <property name="live-time" value="30m"/><!-- 30 min -->
          </properties>
        </cache>
        <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
        <properties>
          <property name="index-dir" value="target/temp/index"/>
        </properties>
        </query-handler>
        <lock-manager>
        <time-out>15m</time-out><!-- 15 min -->
        <persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister">
          <properties>
          <property name="path" value="target/temp/lock/ws"/>
          </properties>
        </persister>
        </lock-manager>
      </workspace>
      <workspace name="ws1" auto-init-root-nodetype="nt:unstructured">
        <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
        <properties>
          <property name="source-name" value="jdbcjcr1"/>
          <property name="dialect" value="mysql"/>
          <property name="multi-db" value="true"/>
          <property name="max-buffer-size" value="200K"/>
          <property name="swap-directory" value="target/temp/swap/ws1"/>   
        </properties>
        </container>
        <cache enabled="true">
          <properties>
            <property name="max-size" value="10K"/>
            <property name="live-time" value="5m"/>
          </properties>
        </cache>
        <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
        <properties>
          <property name="index-dir" value="target/temp/index"/>
        </properties>
        </query-handler>
        <lock-manager>
        <time-out>15m</time-out><!-- 15 min -->
        <persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister">
          <properties>
          <property name="path" value="target/temp/lock/ws1"/>
          </properties>
        </persister>
        </lock-manager>
      </workspace>
    </workspaces>

    Parameters and their description

    source-name
    A javax.sql.DataSource name configured in the InitialContextInitializer component (named sourceName prior to JCR 1.9).
    dialect
    A database dialect, one of hsqldb, mysql, mysql-utf8, pgsql, oracle, oracle-oci, mssql, sybase, derby, db2, db2v8 or auto for dialect auto detection.
    multi-db
    Enable multi-database container with this parameter (set value "true").
    max-buffer-size
    Threshold value, in bytes, after which the javax.jcr.Value content is swapped to a file in temporary storage, for example when swapping pending changes.
    swap-directory
    A path in the file system used to swap the pending changes.

B.7.7. Single-database Configuration

Configuring a single-database data container is easier than configuring a multi-database data container as only one naming resource must be configured.

Example B.13. jdbcjcr Data Container


<external-component-plugins>
    <target-component>org.exoplatform.services.naming.InitialContextInitializer</target-component>
    <component-plugin>
        <name>bind.datasource</name>
        <set-method>addPlugin</set-method>
        <type>org.exoplatform.services.naming.BindReferencePlugin</type>
        <init-params>
          <value-param>
            <name>bind-name</name>
            <value>jdbcjcr</value>
          </value-param>
          <value-param>
            <name>class-name</name>
            <value>javax.sql.DataSource</value>
          </value-param>
          <value-param>
            <name>factory</name>
            <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
          </value-param>
          <properties-param>
            <name>ref-addresses</name>
            <description>ref-addresses</description>
            <property name="driverClassName" value="org.postgresql.Driver"/>
            <property name="url" value="jdbc:postgresql://exoua.dnsalias.net/portal"/>
            <property name="username" value="exoadmin"/>
            <property name="password" value="exo12321"/>
            <property name="maxActive" value="50"/>
            <property name="maxIdle" value="5"/>
            <property name="initialSize" value="5"/>
          </properties-param>
        </init-params>
    </component-plugin>
  </external-component-plugins>
To configure repository workspaces with one database, the multi-db parameter must be set to false.
For example, the following configuration maps two workspaces, ws and ws1, to the same jdbcjcr data source:

Example B.14. Setting two workspaces in a single database

This example configures two persistent workspaces in one database (PostgreSQL).

<workspaces>
  <workspace name="ws" auto-init-root-nodetype="nt:unstructured">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
    <properties>
      <property name="source-name" value="jdbcjcr"/>
      <property name="dialect" value="pgsql"/>
      <property name="multi-db" value="false"/>
      <property name="max-buffer-size" value="200K"/>
      <property name="swap-directory" value="target/temp/swap/ws"/>
    </properties>
    </container>
    <cache enabled="true">
    <properties>
      <property name="max-size" value="10K"/>
      <property name="live-time" value="30m"/>
    </properties>
    </cache>
    <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
    <properties>
      <property name="index-dir" value="../temp/index"/>
    </properties>
    </query-handler>
    <lock-manager>
    <time-out>15m</time-out>
    <persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister">
      <properties>
      <property name="path" value="target/temp/lock/ws"/>
      </properties>
    </persister>
    </lock-manager>
  </workspace>
  <workspace name="ws1" auto-init-root-nodetype="nt:unstructured">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
    <properties>
      <property name="source-name" value="jdbcjcr"/>
      <property name="dialect" value="pgsql"/>
      <property name="multi-db" value="false"/>
      <property name="max-buffer-size" value="200K"/>
      <property name="swap-directory" value="target/temp/swap/ws1"/>
    </properties>
    </container>
    <cache enabled="true">
    <properties>
      <property name="max-size" value="10K"/>
      <property name="live-time" value="5m"/>
    </properties>
    </cache>
    <lock-manager>
    <time-out>15m</time-out>
    <persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister">
      <properties>
      <property name="path" value="target/temp/lock/ws1"/>
      </properties>
    </persister>
    </lock-manager>
  </workspace>
</workspaces>

B.7.7.1. Configuration Without DataSource

It is possible to configure the repository without binding javax.sql.DataSource in the JNDI service if you have a dedicated JDBC driver implementation with special features such as XA transactions or statement and connection pooling.

Procedure B.4. Configuring the repository without the data source

  1. Remove the configuration for your database from InitialContextInitializer and configure the connection directly in the workspace container.
  2. Remove the source-name parameter and add the properties shown in the example that follows, supplying your own values for the JDBC driver, database URL, and username.

Connection Pooling

Ensure the JDBC driver provides connection pooling. Connection pooling is strongly recommended for use with JCR to prevent a database overload.
<workspace name="ws" auto-init-root-nodetype="nt:unstructured">
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
    <properties>
      <property name="dialect" value="hsqldb"/>
      <property name="driverClassName" value="org.hsqldb.jdbcDriver"/>
      <property name="url" value="jdbc:hsqldb:file:target/temp/data/portal"/>
      <property name="username" value="su"/>
      <property name="password" value=""/> 
      ......

B.7.7.2. Dynamic Workspace Creation

Workspaces can be added dynamically during runtime.
This can be performed in two steps:

Procedure B.5. Adding workspace at runtime

  1. Register a new configuration in RepositoryContainer and create a WorkspaceContainer by calling ManageableRepository.configWorkspace(WorkspaceEntry wsConfig).
  2. Create the new workspace by calling ManageableRepository.createWorkspace(String workspaceName).

B.7.8. Simple and Complex queries

eXo JCR provides two ways to interact with the database:

JDBCStorageConnection
This method uses simple queries. Simple queries do not use sub-queries or left or right joins. They are implemented to support the maximum number of database dialects.
CQJDBCStorageConnection
This method uses complex queries. Complex queries are optimized to reduce the number of database calls.
Simple queries are used if you choose org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer.
<workspaces>
  <workspace name="ws" auto-init-root-nodetype="nt:unstructured">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
    ...
  </workspace>
</workspaces>
Complex queries are used if you choose org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer.
<workspaces>
  <workspace name="ws" auto-init-root-nodetype="nt:unstructured">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
    ...
  </workspace>
</workspaces>

B.7.9. Force Query Hints

Some databases, such as Oracle and MySQL, support hints to increase query performance. The eXo JCR has a separate Complex Query implementation for the Oracle database dialect, which uses query hints to increase performance for a few important queries.
To enable this option, use the following configuration property:
<workspace name="ws" auto-init-root-nodetype="nt:unstructured">
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
    <properties>
      <property name="dialect" value="oracle"/>
      <property name="force.query.hints" value="true" />
      ......

Note

Query hints are only used for Complex Queries with the Oracle dialect. For all other dialects this parameter is ignored.

B.7.10. Notes for Microsoft Windows Users

The current configuration of eXo JCR uses the Apache DBCP connection pool (org.apache.commons.dbcp.BasicDataSourceFactory).
It is possible to set a high value for the maxActive parameter in the configuration.xml file. A high value causes the pool (for example, the JDBC driver) to open a large number of TCP/IP ports on the client machine. As a result, the data container can throw exceptions such as Address already in use.
To solve this problem, you must configure the client machine's networking software to use shorter timeouts for open TCP/IP ports.
The solution is to edit two registry keys within the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters node. Both of these keys are unset by default.

Procedure B.6. Set the registry keys

  1. Set the MaxUserPort registry key to =dword:00001b58. This sets the maximum number of open ports to 7000 (the default is 5000).
  2. Set the TcpTimedWaitDelay registry key to =dword:0000001e. This sets the TIME_WAIT parameter to 30 seconds (the default is 240).

Example B.15. Sample Registry File

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"MaxUserPort"=dword:00001b58
"TcpTimedWaitDelay"=dword:0000001e
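
The dword values in the registry file are written in hexadecimal. As a quick sanity check, the following Java sketch decodes them into decimal. The class and method names are illustrative only and are not part of eXo JCR or of any Windows tooling.

```java
// Illustrative helper (not part of eXo JCR): decodes the hexadecimal
// payload of a .reg "dword:" value into its decimal equivalent.
final class RegistryDwordCheck {

    static long decodeDword(String regValue) {
        String prefix = "dword:";
        if (!regValue.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a dword value: " + regValue);
        }
        // A dword is an unsigned 32-bit number written as hexadecimal digits.
        return Long.parseLong(regValue.substring(prefix.length()), 16);
    }

    public static void main(String[] args) {
        System.out.println("MaxUserPort = " + decodeDword("dword:00001b58"));
        System.out.println("TcpTimedWaitDelay = " + decodeDword("dword:0000001e") + " seconds");
    }
}
```

Running the class should report 7000 for MaxUserPort and 30 for TcpTimedWaitDelay, which matches the values described in Procedure B.6.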

B.8. External Value Storages

JCR values are stored in the Workspace Data container by default. The eXo JCR offers an additional option of storing JCR values separately from the Workspace Data container which can help keep Binary Large Objects (BLOBs) separate.
Tree-based storage is recommended in most cases. If you run an application on Amazon EC2, the S3 option may suit the architecture. Simple flat storage is fast at creating and deleting values, so it may be a reasonable compromise for small storages.
Value storage configuration is a part of Repository configuration.

B.8.1. Tree File Value Storage

Tree File Value Storage holds values in files arranged as a tree in the file system. The path property points to the root directory where the files are stored.
This is the recommended type of external storage because it can contain a large number of files, limited only by the free space on the disk or volume.
However, value deletion can be slower with Tree File Value Storage because unused tree nodes must also be removed.

Example B.16. Tree File Value Storage Configuration

     <workspace name="backup">
      <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
       <properties>
        <property name="source-name" value="jdbcjcr" />
        <property name="multi-db" value="false" />
        <property name="update-storage" value="false" />
        <property name="max-buffer-size" value="200k" />
        <property name="swap-directory" value="../temp/swap/backup" />
       </properties>
       <value-storages>
        <value-storage id="draft" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
         <!-- Comment #1 -->
         <properties>
          <property name="path" value="../temp/values/backup" />
         </properties>
         <!-- Comment #2 -->
         <filters>
          <filter property-type="Binary" />
         </filters>
        </value-storage>
       </value-storages>
      </container>
      <initializer class="org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer">
       <properties>
        <property name="root-nodetype" value="nt:unstructured" />
       </properties>
      </initializer>
      <cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl">
       <properties>
        <property name="max-size" value="10k" />
        <property name="live-time" value="1h" />
       </properties>
      </cache>
      <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
       <properties>
        <property name="index-dir" value="../temp/jcrlucenedb/backup" />
       </properties>
      </query-handler>
     </workspace>
    </workspaces>
   </repository>
  </repositories>
</repository-service>
Comment #1: id denotes the value storage unique identifier. It is used for linking with properties stored in a workspace container.
Comment #2: path denotes the location of value files.
Each file value storage has filters for incoming values. A filter can match values by property-type, property-name, and ancestor-path. It can also match the size of stored values by min-value-size, given in bytes.
The example above uses a filter with property-type only, so the storage holds all binary values. Adding min-value-size="1M" to the filter would restrict the storage to binary values larger than 1 MB.

Note

It is recommended that properties with large values are stored in file value storage only.

Example B.17. Value storage for large files

This example shows value storages with a separate location for large files, selected by a filter with a min-value-size of 20 MB.
A value storage uses ORed logic in the process of filter selection. This means the first filter in the list is called first and if it is not matched the next filter is called, and so on.
In this example, a value that matches the 20 MB min-value-size filter is stored in the path data/20Mvalues. All other values that match the 1 MB filter are stored in data/values.
<value-storages>
  <value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
    <properties>
      <property name="path" value="data/20Mvalues"/>
    </properties>
    <filters>
      <filter property-type="Binary" min-value-size="20M"/>
    </filters>
  </value-storage>
  <value-storage id="Storage #2" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
    <properties>
      <property name="path" value="data/values"/>
    </properties>
    <filters>
      <filter property-type="Binary" min-value-size="1M"/>
    </filters>
  </value-storage>
</value-storages>
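
The first-match selection described for Example B.17 can be sketched in Java as follows. This is a simplified model under stated assumptions, not the eXo JCR implementation: the Filter and Storage types are hypothetical stand-ins for the XML elements, "M" is assumed to mean 1024 * 1024 bytes, and min-value-size is assumed to mean "strictly greater than".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Simplified model of value-storage selection. These types are hypothetical
// and only mirror the XML attributes of Example B.17.
final class ValueStorageSelectionSketch {

    static final long MB = 1024L * 1024L; // assumption: "M" is 1024 * 1024 bytes

    // One <filter> element: a property type plus a minimum size in bytes.
    static final class Filter {
        final String propertyType;
        final long minValueSize;

        Filter(String propertyType, long minValueSize) {
            this.propertyType = propertyType;
            this.minValueSize = minValueSize;
        }

        boolean matches(String type, long size) {
            // Assumption: a value matches only if it is larger than min-value-size.
            return propertyType.equals(type) && size > minValueSize;
        }
    }

    // One <value-storage> element: a path and its filters.
    static final class Storage {
        final String path;
        final List<Filter> filters;

        Storage(String path, Filter... filters) {
            this.path = path;
            this.filters = Arrays.asList(filters);
        }
    }

    // Walks the storages in declaration order; the first matching filter wins.
    static Optional<String> selectPath(List<Storage> storages, String type, long size) {
        for (Storage storage : storages) {
            for (Filter filter : storage.filters) {
                if (filter.matches(type, size)) {
                    return Optional.of(storage.path);
                }
            }
        }
        // No filter matched: the value stays in the workspace data container.
        return Optional.empty();
    }

    // The two storages of Example B.17, in the same order.
    static List<Storage> exampleStorages() {
        return Arrays.asList(
            new Storage("data/20Mvalues", new Filter("Binary", 20 * MB)),
            new Storage("data/values", new Filter("Binary", MB)));
    }

    public static void main(String[] args) {
        System.out.println(selectPath(exampleStorages(), "Binary", 25 * MB));
        System.out.println(selectPath(exampleStorages(), "Binary", 5 * MB));
        System.out.println(selectPath(exampleStorages(), "Binary", 1024));
    }
}
```

In this model a 25 MB binary value is routed to data/20Mvalues, a 5 MB binary value to data/values, and a 1 KB value matches no filter. Declaring the storages in the opposite order would send every binary value above 1 MB to data/values, which is why the more restrictive filter is listed first.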

B.8.2. Disabling Value Storage

The JCR allows you to disable value storage by adding the following property into its configuration.
<property name="enabled" value="false" />

Warning

It is recommended that this functionality be used for internal and testing purposes only, and with caution, because all stored values become inaccessible.

B.9. Workspace Data Container

Each Workspace of the JCR has its own persistent storage to hold that workspace's item data. The eXo JCR can be configured so that it can use one or more workspaces that are logical units of the repository content.
The physical data storage mechanism is configured using the mandatory container element. The type of container is given in the class attribute by specifying the fully qualified name of the org.exoplatform.services.jcr.storage.WorkspaceDataContainer subclass.

Example B.18. Physical Data Storage Configuration

<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
  <properties>
    <property name="source-name" value="jdbcjcr1"/>
    <property name="dialect" value="hsqldb"/>
    <property name="multi-db" value="true"/>
    <property name="max-buffer-size" value="200K"/>
    <property name="swap-directory" value="target/temp/swap/ws"/>
    <property name="lazy-node-iterator-page-size" value="50"/>
    <property name="acl-bloomfilter-false-positive-probability" value="0.1d"/>
    <property name="acl-bloomfilter-elements-number" value="1000000"/>
    <property name="check-sns-new-connection" value="false"/>
    <property name="batch-size" value="1000"/>
  </properties>
</container>

Standard Workspace Data Container Properties

max-buffer-size
Default value is 200K.
The maximum buffer size for the container. If a value is larger than this setting, it is spooled to a temporary file in the location defined by swap-directory.
swap-directory
Default value is the java.io.tmpdir system property.
Specifies the location where a value is spooled if no value storage is configured and max-buffer-size is exceeded.
lazy-node-iterator-page-size
Default value is 100.
Specifies the page size, that is, the number of nodes retrieved from persistent storage at once.
acl-bloomfilter-false-positive-probability
Default value is 0.1d.
Specifies the ACL Bloom-filter desired false positive probability. Range [0..1].
acl-bloomfilter-elements-number
Default value is 1000000.
Specifies the expected number of ACL-elements in the Bloom-filter.

Note

Bloom filters are used to avoid reading nodes that do not have an ACL. Infinispan is the only cache implementation that currently supports Bloom filters. See http://en.wikipedia.org/wiki/Bloom_filter for an overview of Bloom filters.
check-sns-new-connection
Default value is false.
Specifies whether to create a connection for checking if an older same-name sibling exists.
max-descendant-nodes-allowed-on-move
Default value is 100.
Specifies the maximum amount of descendant nodes allowed before determining whether descendant items are included into the changes log. This allows best possible performance, regardless of the total amount of sub-nodes.
This parameter is only used when trigger-events-for-descendants-on-move or trigger-events-for-descendants-on-rename is not set.
trigger-events-for-descendants-on-rename
Specifies whether each descendant item must be included in the changes log in case of a rename.
When this parameter is not set, the application will rely on the max-descendant-nodes-allowed-on-move parameter to handle whether or not descendant items are added to the changes log. If this parameter is not set but the parameter trigger-events-for-descendants-on-move is set, it will have the same value.
If set to false, performance on rename operations will increase on source parent nodes with a large number of sub-nodes but will decrease on parent nodes with a small number of sub-nodes.
If set to true, performance will decrease for a large number of sub-nodes and increase for a small number of sub-nodes.
trigger-events-for-descendants-on-move
Specifies whether each descendant item must be included in the changes log in case of a move.
When this parameter is not set, the application will rely on the max-descendant-nodes-allowed-on-move parameter to handle whether or not descendant items are added to the changes log.
If set to false, performance on move operations will increase on source parent nodes with a large number of sub-nodes but will decrease on parent nodes with a small number of sub-nodes.
If set to true, performance will decrease for a large number of sub-nodes and increase for a small number of sub-nodes.
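
To give a feel for how the acl-bloomfilter-false-positive-probability and acl-bloomfilter-elements-number properties described above interact, the following worked calculation uses the textbook Bloom filter sizing formulas. Whether the cache implementation sizes its filter exactly this way is an assumption and is not confirmed by this guide. Here n is the expected number of elements, p the desired false positive probability, m the number of bits in the filter, and k the number of hash functions.

```latex
% Textbook Bloom filter sizing (assumed, not taken from the eXo JCR sources)
m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n} \ln 2

% With the default values n = 1000000 and p = 0.1:
m \approx \frac{1000000 \times 2.3026}{0.4805} \approx 4.79 \times 10^{6} \text{ bits} \approx 585 \text{ KiB}

k \approx 4.79 \times 0.6931 \approx 3.3
```

Under this model, lowering the false positive probability or raising the expected number of elements increases the memory used by the filter, so both values are worth matching to the real number of ACL-bearing nodes.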
The eXo JCR provides a JDBC-based Workspace Data Container for relational databases, which is production ready.

JDBC Workspace Data Container Properties

source-name
Mandatory parameter, which specifies the JDBC data source name (registered in JNDI by InitialContextInitializer).
dialect
Default value is auto.
The database dialect can be one of the following values: "auto", "hsqldb", "h2", "mysql", "mysql-myisam", "mysql-utf8", "mysql-myisam-utf8", "pgsql", "pgsql-scs", "oracle", "oracle-oci", "mssql", "sybase", "derby", "db2", "db2-mys", "db2v8".
db-structure-type

Note

This parameter supersedes multi-db.
Mandatory parameter, which specifies the structure of the database container. Supported values are "isolated", "multi", and "single".
db-tablename-suffix
Specifies the workspace name appended to tables. If db-structure-type is set to isolated, the tables used by the repository service have the following names:
  • JCR_I${db-tablename-suffix} for items.
  • JCR_V${db-tablename-suffix} for values.
  • JCR_R${db-tablename-suffix} for references.
The Workspace Data Container may support external storages for javax.jcr.Value (for example, BLOB values) using the optional element value-storages.
The Data Container attempts to read or write a value using the underlying value storage plug-in if the filter criteria match the current property.

Example B.19. External Value Storage Configuration

<value-storages>
  <value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
    <properties>
      <property name="path" value="data/values"/>
    </properties>
    <filters>
     <filter property-type="Binary" min-value-size="1M"/><!-- Values greater than 1Mbyte -->
    </filters>
<!-- content removed for readability -->
</value-storages>
value-storage: The class attribute specifies a subclass of org.exoplatform.services.jcr.storage.value.ValueStoragePlugin, and properties are optional plug-in specific parameters.
filters: Each file value storage can have one or more filters for incoming values. If a filter has several criteria, they must all match (AND condition).
A filter can match values by property type (property-type), property name (property-name), ancestor path (ancestor-path), and the size of values stored (min-value-size, for example 1M, 4.2G, or 100 for a value in bytes).
The code sample above uses a filter with property-type and min-value-size only. This means that the storage is only for binary values whose size is greater than 1 MB.
It is recommended that you store properties with large values in a file value storage only.
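
The size notation used by min-value-size and max-buffer-size (for example 100, 200K, 1M, or 4.2G) can be illustrated with a small Java sketch. This is a hypothetical parser written for illustration, not the one used by eXo JCR, and it assumes that K, M, and G are powers of 1024 and that fractional bytes are truncated.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Hypothetical parser for size strings such as "100", "200K", "1M" or "4.2G".
// Assumption: K, M and G are multiples of 1024, 1024^2 and 1024^3 bytes.
final class SizeValueSketch {

    static long toBytes(String text) {
        String value = text.trim().toUpperCase(Locale.ROOT);
        if (value.isEmpty()) {
            throw new IllegalArgumentException("Empty size value");
        }
        long multiplier;
        switch (value.charAt(value.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default:  multiplier = 1L;       break; // plain number of bytes
        }
        // Strip the suffix letter, if there was one, before parsing the number.
        String number = multiplier == 1L ? value : value.substring(0, value.length() - 1);
        // BigDecimal keeps fractional sizes such as "4.2G" exact until truncation.
        return new BigDecimal(number).multiply(BigDecimal.valueOf(multiplier)).longValue();
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"100", "200K", "1M", "4.2G"}) {
            System.out.println(sample + " = " + toBytes(sample) + " bytes");
        }
    }
}
```

Under these assumptions, 1M corresponds to 1048576 bytes and 200K to 204800 bytes, which gives a concrete threshold to compare against the size of a stored value.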

B.10. Configuring the Cluster

B.10.1. Launching Cluster

B.10.1.1. Configuring JCR to use external configuration

  1. Create a new configuration file, for example, exo-jcr-configuration.xml as follows:

    Example B.20. Creating an external Configuration file

    <repository-service default-repository="repository1">
       <repositories>
          <repository name="repository1" system-workspace="ws1" default-workspace="ws1">
             <security-domain>exo-domain</security-domain>
             <access-control>optional</access-control>
             <authentication-policy>org.exoplatform.services.jcr.impl.core.access.JAASAuthenticator</authentication-policy>
             <workspaces>
                <workspace name="ws1">
                   <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
                      <properties>
                         <property name="source-name" value="jdbcjcr" />
                         <property name="dialect" value="oracle" />
                         <property name="multi-db" value="false" />
                         <property name="update-storage" value="false" />
                         <property name="max-buffer-size" value="200k" />
                         <property name="swap-directory" value="../temp/swap/production" />
                      </properties>
                      <value-storages>
                      <!-- Comment #1 -->
                      </value-storages>
                   </container>
                   <initializer class="org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer">
                      <properties>
                         <property name="root-nodetype" value="nt:unstructured" />
                      </properties>
                   </initializer>
                   <cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache">
                   <!-- Comment #2 -->     
                   </cache>
                   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
                   <!-- Comment #3 -->
                   </query-handler>
                   <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
                   <!-- Comment #4 --> 
                   </lock-manager>
                </workspace>
                <workspace name="ws2">
                            ...
                </workspace>
                <workspace name="wsN">
                            ...
                </workspace>
             </workspaces>
          </repository>
       </repositories>
    </repository-service>
    Comment #1: Configure the Value Storage
    Comment #2: Configure cache
    Comment #3: Configure Indexer
    Comment #4: Configure Lock Manager
  2. Update the RepositoryServiceConfiguration configuration in the exo-configuration.xml to reference your file.
    <component>
       <key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key>
       <type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type>
       <init-params>
          <value-param>
             <name>conf-path</name>
             <description>JCR configuration file</description>
             <value>exo-jcr-configuration.xml</value>
          </value-param>
       </init-params>
    </component>

B.10.2. Requirements

B.10.2.1. Environment requirements

  • Every node of the cluster must have the same mounted Network File System (NFS) with read and write permissions.
  • Every node of the cluster must use the same database.
  • The same clusters on different nodes must have the same names.
    For example, if the Indexer cluster in the production workspace on the first node is named production_indexer_cluster, then indexer clusters in the production workspace on all other nodes must also be named production_indexer_cluster.

B.10.2.2. Configuration Guidelines

The configuration of every workspace in the repository must contain the following elements:

Example B.21. Value Storage configuration

<value-storages>
   <value-storage id="system" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
      <properties>
         <property name="path" value="/mnt/tornado/temp/values/production" />    <!-- path within NFS where ValueStorage will hold its data -->
      </properties>
      <filters>
         <filter property-type="Binary" />
      </filters>
   </value-storage>
</value-storages>

Example B.22. Cache Configuration

<cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache">
   <properties>
      <property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-data.xml" />     <!--    path to JBoss Cache configuration for data storage -->
      <property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />                     <!--    path to JGroups configuration -->
      <property name="jbosscache-cluster-name" value="JCR_Cluster_cache_production" />                   <!--    JBoss Cache data storage cluster name -->
      <property name="jgroups-multiplexer-stack" value="true" />
   </properties>
</cache>

Example B.23. Indexer Configuration

<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
      <property name="index-dir" value="/mnt/tornado/temp/jcrlucenedb/production" />                       <!--    path within NFS where the indexer will hold its data -->
      <property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-indexer.xml" />    <!--    path to JBoss Cache configuration for indexer -->
      <property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />                       <!--    path to JGroups configuration -->
      <property name="jbosscache-cluster-name" value="JCR_Cluster_indexer_production" />                   <!--    JBoss Cache indexer cluster name -->
      <property name="jgroups-multiplexer-stack" value="true" />
   </properties>
</query-handler>

Example B.24. Lock Manager Configuration

<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
   <properties>
      <property name="time-out" value="15m" />
      <property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-lock.xml" />       <!--    path to JBoss Cache configuration for lock manager -->
      <property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />                       <!--    path to JGroups configuration -->
      <property name="jgroups-multiplexer-stack" value="true" />
      <property name="jbosscache-cluster-name" value="JCR_Cluster_lock_production" />                      <!--    JBoss Cache locks cluster name -->
                     
      <property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks_production"/>                   <!--    the name of the DB table where lock data will be stored -->
      <property name="jbosscache-cl-cache.jdbc.table.create" value="true"/>
      <property name="jbosscache-cl-cache.jdbc.table.drop" value="false"/>
      <property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_production_pk"/>
      <property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn"/>
      <property name="jbosscache-cl-cache.jdbc.node.column" value="node"/>
      <property name="jbosscache-cl-cache.jdbc.parent.column" value="parent"/>
      <property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr"/>
   </properties>
</lock-manager>

B.11. Configuring JBoss Cache

B.11.1. Indexer lock manager and data container configuration

The indexer, lock manager, and data container each use an instance of JBoss Cache for caching in a clustered environment. Each element therefore has its own transport, which must be configured correctly.
Workspaces have similar configurations with different cluster names and parameters. The simplest approach is to define a configuration file for each component in each workspace.
<property name="jbosscache-configuration" value="conf/standalone/test-jbosscache-lock-db1-ws1.xml" />
To simplify workspace configuration, eXo JCR offers a template-based configuration for JBoss Cache instances. You can have one template each for the Lock Manager, Indexer, and Data Container.
To use these templates, place ${jbosscache-<parameter name>} placeholders inside the XML template, and define the corresponding values in the JCR configuration file, just below the jbosscache-configuration property.

Example B.25.  Template for configuring workspaces

...
<clustering mode="replication" clusterName="${jbosscache-cluster-name}">
  <stateRetrieval timeout="20000" fetchInMemoryState="false" />
...

Example B.26.  JCR configuration file

...
<property name="jbosscache-configuration" value="jar:/conf/portal/jbosscache-lock.xml" />
<property name="jbosscache-cluster-name" value="JCR-cluster-locks-db1-ws" />
...
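
The relationship between Example B.25 and Example B.26 can be sketched as a simple placeholder substitution. The Java code below is an illustrative model only and is not how eXo JCR is known to implement templates: the class name and the choice to leave unknown placeholders untouched are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative model of template substitution (not the eXo JCR implementation):
// each ${jbosscache-<parameter name>} placeholder in the template is replaced by
// the value of the property with the same name from the JCR configuration.
final class CacheTemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(jbosscache-[^}]+)\\}");

    static String resolve(String template, Map<String, String> properties) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = properties.get(matcher.group(1));
            // Assumption: unknown placeholders are left as-is rather than guessed.
            String replacement = value != null ? value : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("jbosscache-cluster-name", "JCR-cluster-locks-db1-ws");
        String template = "<clustering mode=\"replication\" clusterName=\"${jbosscache-cluster-name}\">";
        System.out.println(resolve(template, properties));
    }
}
```

With the property from Example B.26, the clusterName attribute of the template line from Example B.25 resolves to JCR-cluster-locks-db1-ws, so one template file can serve several workspaces that differ only in their property values.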

B.11.2. JGroups configuration

JGroups is used by JBoss Cache for network communication and transport in a clustered environment. If the jgroups-configuration property is defined in the component configuration, it is injected into the JBoss Cache instance on start up.
<property name="jgroups-configuration" value="your/path/to/modified-udp.xml" />
The lock manager, data container, and query handler components of each workspace each require their own clustered environment with a unique name.
Each cluster would have to perform multicasts on a separate port, which leads to considerable unnecessary overhead on the cluster. JGroups provides a multiplexer feature that allows a set of clusters to share a single channel.
The multiplexer reduces network overhead and increases the performance and stability of the application. To enable the multiplexer stack, define an appropriate configuration file (udp-mux.xml is pre-shipped with eXo JCR) and set jgroups-multiplexer-stack to true.
<property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />
<property name="jgroups-multiplexer-stack" value="true" />

B.11.3. Sharing JBoss Cache instances

A single JBoss Cache instance consumes considerable resources, and the default setup has one instance each for the indexer, the lock manager and the data container on each workspace. So, an environment that uses multiple workspaces can benefit from sharing a JBoss Cache instance between several components of the same type, for example, the lock managers.
The sharing feature is disabled by default. To enable sharing at the component configuration level, set the jbosscache-shareable property to true:

Example B.27. Configuring sharing between JBoss cache instances

<property name="jbosscache-shareable" value="true" />
This feature allows the JBoss Cache instance that is used by a component to be reused by other components of the same type that have the same JBoss Cache configuration. So, all the parameters of type jbosscache-<PARAM_NAME> must be identical between the components of the same type in different workspaces. Therefore, if you use the same values for the parameters in each workspace, you need only three JBoss Cache instances running at once: one each for the indexer, the lock manager and the data container. This approach frees resources significantly.
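
The effect of sharing on the number of cache instances can be sketched in Python. This is only a model of the rule described above, under the assumption that two components share an instance exactly when their type and all their jbosscache-<PARAM_NAME> values match. The function name, the component labels and the workspace names are hypothetical.

```python
def count_cache_instances(components, shareable):
    """Count cache instances for a list of (workspace, type, params) tuples.

    Assumption for this sketch: with sharing enabled, components of the same
    type with identical parameters reuse one instance.
    """
    if not shareable:
        return len(components)
    shared = {
        (ctype, tuple(sorted(params.items())))
        for _workspace, ctype, params in components
    }
    return len(shared)


workspaces = ["ws1", "ws2", "ws3"]
types = ["indexer", "lock-manager", "data-container"]
components = [
    (ws, ctype, {"jbosscache-cluster-name": "JCR-cluster-" + ctype})
    for ws in workspaces
    for ctype in types
]

without_sharing = count_cache_instances(components, shareable=False)
with_sharing = count_cache_instances(components, shareable=True)
print(without_sharing, with_sharing)
```

In this model, three workspaces need nine instances without sharing and three with sharing. Changing a single parameter value in one workspace adds one more instance for that component type.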

Note

In eviction configuration, reusing JBoss cache instance is handled differently.

B.11.4. Shipped JBoss Cache configuration templates

The eXo JCR implementation is shipped with ready-to-use JBoss Cache configuration templates for the JCR components. They are located in the $JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/jbosscache directory, inside either the cluster or local directory.

B.11.4.1. Data container template

The data container template is jbosscache-data.xml.

Example B.28. Data container template

<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">

   <locking useLockStriping="false" concurrencyLevel="50000" lockParentForChildInsertRemove="false"
      lockAcquisitionTimeout="20000" />

   <clustering mode="replication" clusterName="${jbosscache-cluster-name}">
      <stateRetrieval timeout="20000" fetchInMemoryState="false" />
      <jgroupsConfig multiplexerStack="jcr.stack" />
      <sync />
   </clustering>

   <!-- Eviction configuration -->
   <eviction wakeUpInterval="5000">
      <default algorithmClass="org.jboss.cache.eviction.LRUAlgorithm"
         actionPolicyClass="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.ParentNodeEvictionActionPolicy"
         eventQueueSize="1000000">
         <property name="maxNodes" value="1000000" />
         <property name="timeToLive" value="120000" />
      </default>
   </eviction>
</jbosscache>

Template Variables

jbosscache-cluster-name
Unique cluster name.

B.11.4.2. Lock manager template

The lock manager template is jbosscache-lock.xml.

Example B.29. Lock manager template

<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">

   <locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false"
      lockAcquisitionTimeout="20000" />

   <loaders passivation="false" shared="true">
      <!-- All the data of the JCR locks needs to be loaded at startup -->
      <preload>
         <node fqn="/" />
      </preload>
      <!--
      For another cache-loader class you should use another template with
      cache-loader specific parameters
      -->
      <loader class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader" async="false" fetchPersistentState="false"
         ignoreModifications="false" purgeOnStartup="false">
         <properties>
            cache.jdbc.table.name=${jbosscache-cl-cache.jdbc.table.name}
            cache.jdbc.table.create=${jbosscache-cl-cache.jdbc.table.create}
            cache.jdbc.table.drop=${jbosscache-cl-cache.jdbc.table.drop}
            cache.jdbc.table.primarykey=${jbosscache-cl-cache.jdbc.table.primarykey}
            cache.jdbc.fqn.column=${jbosscache-cl-cache.jdbc.fqn.column}
            cache.jdbc.fqn.type=${jbosscache-cl-cache.jdbc.fqn.type}
            cache.jdbc.node.column=${jbosscache-cl-cache.jdbc.node.column}
            cache.jdbc.node.type=${jbosscache-cl-cache.jdbc.node.type}
            cache.jdbc.parent.column=${jbosscache-cl-cache.jdbc.parent.column}
            cache.jdbc.datasource=${jbosscache-cl-cache.jdbc.datasource}
         </properties>
      </loader>
   </loaders>
</jbosscache>

Note

To prevent inconsistency related to the lock data, ensure that your cache loader is org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader and your database engine is transactional.

Template Variables

jbosscache-cluster-name
Unique cluster name.
jbosscache-cl-cache.jdbc.table.name
Table name.
jbosscache-cl-cache.jdbc.table.create
Indicates whether to create the table during startup. Value can be true or false. If true, the table is created if it doesn't already exist. The default value is true.
jbosscache-cl-cache.jdbc.table.drop
Indicates whether to drop the table during shutdown. Value can be true or false. The default value is true.
jbosscache-cl-cache.jdbc.table.primarykey
The name of the primary key for the table.
jbosscache-cl-cache.jdbc.fqn.column
FQN column name. The default value is 'fqn'.
jbosscache-cl-cache.jdbc.fqn.type
FQN column type. The default value is varchar(255).
jbosscache-cl-cache.jdbc.node.column
Node contents column name. The default value is node.
jbosscache-cl-cache.jdbc.node.type
Node contents column type. The default value is blob. This type must specify a valid binary data type for the database used.
jbosscache-cl-cache.jdbc.parent.column
Parent column name. The default value is parent.
jbosscache-cl-cache.jdbc.datasource
JNDI name of the DataSource.

B.11.4.3. Query Handler Template

The query handler template is called jbosscache-indexer.xml.

Example B.30. Indexer template

<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">

   <locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false"
      lockAcquisitionTimeout="20000" />
   <!-- Configure the TransactionManager -->
   <transaction transactionManagerLookupClass="org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup" />

   <clustering mode="replication" clusterName="${jbosscache-cluster-name}">
      <stateRetrieval timeout="20000" fetchInMemoryState="false" />
      <sync />
   </clustering>
</jbosscache>

Template Variable

jbosscache-cluster-name
Unique cluster name.

B.12. LockManager

The LockManager stores lock objects. It can lock or release objects as required. It is also responsible for removing stale locks. The interval for removing stale locks is configured with the time-out property.
The LockManager in the portal is implemented with org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl.

B.12.1. CacheableLockManagerImpl

CacheableLockManagerImpl stores lock objects in JBoss Cache, which uses JDBCCacheLoader to persist the locks in a database. Locks can be replicated, so they can affect an entire cluster rather than a single node.
You can enable LockManager by adding lock-manager-configuration to workspace-configuration.

Example B.31. Enabling LockManager

<workspace name="ws">
   ...
   <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
      <properties>
         <property name="time-out" value="15m" />
         ...
      </properties>
   </lock-manager>               
   ...
</workspace>
The time-out parameter sets the interval at which expired locks are removed. LockRemover is a separate thread that periodically asks the LockManager to remove stale locks.

B.12.2. JBoss Cache Configuration

A simple way to configure the LockManager is to put the path of a JBoss Cache configuration file in the CacheableLockManagerImpl configuration. This method is useful for configuring a single LockManager for a specific purpose.

Note

This is not an efficient method for configuring the LockManager as it requires a JBoss Cache configuration file for each LockManager configuration in each workspace of each repository. The configuration set up can subsequently become quite difficult to manage.

B.12.3. Configuration of JBoss Cache for LockManager

A simple LockManager configuration is shown here.

Example B.32. LockManager Configuration file test-jbosscache-lock.xml

<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">

   <locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false"
      lockAcquisitionTimeout="20000" />

   <clustering mode="replication" clusterName="${jbosscache-cluster-name}">
      <stateRetrieval timeout="20000" fetchInMemoryState="false" />
      <sync />
   </clustering>

   <loaders passivation="false" shared="true">
      <!-- All the data of the JCR locks needs to be loaded at startup -->
      <preload>
         <node fqn="/" />
      </preload>  
      <!--
      For another cache-loader class you should use another template with
      cache-loader specific parameters
      -->
      <loader class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader" async="false" fetchPersistentState="false"
         ignoreModifications="false" purgeOnStartup="false">
         <properties>
            cache.jdbc.table.name=${jbosscache-cl-cache.jdbc.table.name}
            cache.jdbc.table.create=${jbosscache-cl-cache.jdbc.table.create}
            cache.jdbc.table.drop=${jbosscache-cl-cache.jdbc.table.drop}
            cache.jdbc.table.primarykey=${jbosscache-cl-cache.jdbc.table.primarykey}
            cache.jdbc.fqn.column=${jbosscache-cl-cache.jdbc.fqn.column}
            cache.jdbc.fqn.type=${jbosscache-cl-cache.jdbc.fqn.type}
            cache.jdbc.node.column=${jbosscache-cl-cache.jdbc.node.column}
            cache.jdbc.node.type=${jbosscache-cl-cache.jdbc.node.type}
            cache.jdbc.parent.column=${jbosscache-cl-cache.jdbc.parent.column}
            cache.jdbc.datasource=${jbosscache-cl-cache.jdbc.datasource}
         </properties>
      </loader>
   </loaders>
</jbosscache>

Note

To prevent any consistency issue regarding the lock data, ensure that your cache loader is org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader and your database engine is transactional.

B.12.4. LockManager Configuration Template

In the template, all configurable parameters are placeholders that are filled in from the LockManager configuration parameters:

Example B.33. LockManager configuration template

<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.infinispan.ISPNCacheableLockManagerImpl">
   <properties>
      <property name="time-out" value="15m" />
      <property name="infinispan-configuration" value="conf/standalone/cluster/test-infinispan-lock.xml" />
      <property name="jgroups-configuration" value="udp-mux.xml" />
      <property name="infinispan-cluster-name" value="JCR-cluster" />
      <property name="infinispan-cl-cache.jdbc.table.name" value="lk" />
      <property name="infinispan-cl-cache.jdbc.table.create" value="true" />
      <property name="infinispan-cl-cache.jdbc.table.drop" value="false" />
      <property name="infinispan-cl-cache.jdbc.id.column" value="id" />
      <property name="infinispan-cl-cache.jdbc.data.column" value="data" />
      <property name="infinispan-cl-cache.jdbc.timestamp.column" value="timestamp" />
      <property name="infinispan-cl-cache.jdbc.datasource" value="jdbcjcr" />
      <property name="infinispan-cl-cache.jdbc.connectionFactory" value="org.exoplatform.services.jcr.infinispan.ManagedConnectionFactory" />
   </properties>
</lock-manager>

Configuration requirements

  • infinispan-cl-cache.jdbc.id.type, infinispan-cl-cache.jdbc.data.type and infinispan-cl-cache.jdbc.timestamp.type are injected into the Infinispan configuration as the idColumnType, dataColumnType and timestampColumnType properties, respectively.
    You can set the data types according to your database type, set them to AUTO, or leave them unset. In the last two cases, the data type is detected automatically.

B.12.5. Creating udp-mux.xml

The JGroups configuration is moved to a separate configuration file, udp-mux.xml.
The udp-mux.xml file is a common JGroups configuration file for all components, such as QueryHandler, Cache, and LockManager, but you can create your own configuration.

Example B.34. udp-mux.xml

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.2.xsd">
    <UDP
         singleton_name="JCR-cluster" 
         mcast_port="${jgroups.udp.mcast_port:45588}"
         tos="8"
         ucast_recv_buf_size="20M"
         ucast_send_buf_size="640K"
         mcast_recv_buf_size="25M"
         mcast_send_buf_size="640K"
         loopback="true"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         ip_ttl="${jgroups.udp.ip_ttl:8}"
         enable_bundling="true"
         enable_diagnostics="true"
         thread_naming_pattern="cl"

         timer_type="old"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"

         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="10000"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="Run"/>

    <PING timeout="2000"
            num_initial_members="20"/>
    <MERGE2 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="1000"
                    xmit_table_num_rows="100"
                    xmit_table_msgs_per_row="2000"
                    xmit_table_max_compaction_time="30000"
                    max_msg_batch_size="500"
                    use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST  xmit_interval="2000"
              xmit_table_num_rows="100"
              xmit_table_msgs_per_row="2000"
              xmit_table_max_compaction_time="60000"
              conn_expiry_timeout="60000"
              max_msg_batch_size="500"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
    <RSVP resend_interval="2000" timeout="10000"/>
    <pbcast.STATE_TRANSFER />
    <!-- pbcast.FLUSH  /-->
</config>

B.12.6. FQN type and node type in different Databases

Table B.5. Data Types in Different Databases

Database name   Node data type    FQN data type
default         BLOB              VARCHAR(512)
HSQL            OBJECT            VARCHAR(512)
MySQL           LONGBLOB          VARCHAR(512)
ORACLE          BLOB              VARCHAR2(512)
PostgreSQL      bytea             VARCHAR(512)
MSSQL           VARBINARY(MAX)    VARCHAR(512)
DB2             BLOB              VARCHAR(512)
Sybase          IMAGE             VARCHAR(512)
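
As a rough illustration of how the table can be used, the following Python sketch looks up the column types for a database and turns them into the jbosscache-cl-cache.jdbc.node.type and jbosscache-cl-cache.jdbc.fqn.type values. The dictionary keys, the function names and the fallback to the default row for unknown databases are assumptions made for this example. Only a few rows of the table are copied.

```python
# A few rows copied from the table: database -> (node data type, FQN data type).
# The lowercase keys are illustrative labels, not eXo JCR dialect names.
COLUMN_TYPES = {
    "default": ("BLOB", "VARCHAR(512)"),
    "mysql": ("LONGBLOB", "VARCHAR(512)"),
    "oracle": ("BLOB", "VARCHAR2(512)"),
    "postgresql": ("bytea", "VARCHAR(512)"),
    "mssql": ("VARBINARY(MAX)", "VARCHAR(512)"),
    "sybase": ("IMAGE", "VARCHAR(512)"),
}


def column_types(database):
    """Return (node type, FQN type); unknown names fall back to the default row."""
    return COLUMN_TYPES.get(database.lower(), COLUMN_TYPES["default"])


def lock_table_properties(database):
    """Build the two type-related template values for the given database."""
    node_type, fqn_type = column_types(database)
    return {
        "jbosscache-cl-cache.jdbc.node.type": node_type,
        "jbosscache-cl-cache.jdbc.fqn.type": fqn_type,
    }


print(lock_table_properties("PostgreSQL"))
```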

B.12.7. Lock Migration

There are three methods for lock migration.

Lock Migration Methods

New Shareable Cache feature is not used and all locks are kept after migration.

Procedure B.7. Shareable Cache is not used and locks are kept

  1. Ensure that the same lock tables are used in configuration.
  2. Start the server.
New Shareable Cache feature is not used and all locks are removed after migration.

Procedure B.8. Shareable Cache is not used and locks are removed

  1. Ensure that the same lock tables are used in configuration.
  2. Start the server with the system property -Dorg.exoplatform.jcr.locks.force.remove=true.
  3. Stop the server.
  4. Start the server without the system property -Dorg.exoplatform.jcr.locks.force.remove.
New Shareable Cache feature is used and all locks are removed after migration.

Procedure B.9. Shareable Cache is used

  1. Start the server with the system property -Dorg.exoplatform.jcr.locks.force.remove=true.
  2. Stop the server.
  3. Start the server without the system property -Dorg.exoplatform.jcr.locks.force.remove.
  4. Optional: manually remove the old lock tables.

B.13. JCR Indexing

JCR offers indexing strategies for standalone and clustered environments. This ensures that JCR uses the advantages of running in a single JVM, and makes efficient use of the resources available in a cluster.
JCR uses the Lucene library as its underlying search and indexing engine, but Lucene has several limitations that restrict how well the advantages of a cluster can be used. So, eXo JCR offers two clustered strategies, each suited to different use cases: clustered with a shared index and clustered with local indexes.

B.13.1. Standalone Index

Standalone strategy provides a stack of indexes for greater performance within a single JVM.
Diagram explaining the Standalone Index.

Figure B.3. Standalone Index Diagram

The Standalone Index combines an in-memory buffer index directory with delayed file-system flushing. This index is called Volatile and it is consulted in searches. Under specific conditions, the volatile index is flushed to persistent storage as a new index directory. This gives high performance for write operations.
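
The following Python sketch models the idea of a volatile buffer with delayed flushing. It is not the eXo JCR or Lucene implementation. It assumes that the flush conditions are a size limit and an age limit, loosely modeled on the max-volatile-size and max-volatile-time properties, and that max-volatile-time is expressed in seconds. The class name and its methods are hypothetical.

```python
class VolatileIndexSketch:
    """Toy model: an in-memory buffer that is flushed to 'directories'."""

    def __init__(self, max_volatile_size=1048576, max_volatile_time=60):
        self.max_size = max_volatile_size   # bytes (assumed flush condition)
        self.max_time = max_volatile_time   # seconds (assumed unit)
        self.buffer = []                    # documents held in memory
        self.buffer_bytes = 0
        self.oldest = None                  # time the first buffered document arrived
        self.persisted = []                 # each flush becomes a new index directory

    def add(self, doc, now):
        if self.oldest is None:
            self.oldest = now
        self.buffer.append(doc)
        self.buffer_bytes += len(doc)
        too_big = self.buffer_bytes >= self.max_size
        too_old = now - self.oldest >= self.max_time
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            self.persisted.append(list(self.buffer))
        self.buffer, self.buffer_bytes, self.oldest = [], 0, None

    def search(self, term):
        # Searches combine the persisted directories with the volatile buffer.
        persisted_hits = [d for directory in self.persisted for d in directory if term in d]
        volatile_hits = [d for d in self.buffer if term in d]
        return persisted_hits + volatile_hits


index = VolatileIndexSketch(max_volatile_size=10, max_volatile_time=60)
index.add(b"alpha", now=0)
index.add(b"beta!!", now=1)    # buffer reaches 11 bytes, so it is flushed
index.add(b"gamma", now=2)     # stays in memory
hits = index.search(b"a")
print(len(index.persisted), hits)
```

Writes only touch memory until a limit is reached, which is why this layout favors write performance, while searches still see documents that have not been flushed yet.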

B.13.2. Local Index

The clustered implementation with local indexes combines an in-memory buffer index directory with delayed file-system flushing. This index is called Volatile and is consulted in searches. Under specific conditions, the volatile index is flushed to persistent storage (the file system) as a new index directory. This enables high performance for write operations.
Diagram explaining the Local Index, which has a local file system for each JCR.

Figure B.4. Local Index Diagram

The clustered index is designed for a clustered environment. It has additional mechanisms for data delivery within the cluster.
Text extraction is done on the same node that performs the content operation, such as a write operation. The prepared documents (a Lucene term that means a block of data ready for indexing) are replicated within the cluster nodes and processed by the local indexes. So each cluster instance has the same index content. When a new node joins the cluster, its index must be created.

Warning

You can create the index by copying it manually, but this is not the intended approach.
If no initial index is found, JCR uses automated scenarios. They are controlled by the index-recovery-mode configuration parameter, which selects either re-indexing from the database or copying the index from another cluster node.

Note

Keeping a separate copy of the index on each instance can be costly. In that case, a shared index can be used instead.

B.13.3. Shared Index

Diagram explaining a shared index, which has a shared file system across all JCR instances.

Figure B.5. Shared Index Diagram

Shared indexing combines the advantages of an in-memory index and a shared persistent index, providing near real-time search capabilities. This strategy allows nodes to index data in their own volatile (in-memory) indexes, while the persistent indexes are managed by a single coordinator node.
Each cluster instance has read access to the shared index and performs queries by combining those results with the results found in its in-memory index. A shared folder, for example a mounted NFS folder, must be configured in your system environment.
In rare instances, the volatile indexes of different cluster instances can differ for a fraction of time. Within a few seconds the index is updated.
The shared index is consistent and stable, but slow. The local index is fast, but takes time to re-synchronize when a cluster node leaves the cluster for a short period of time. The RSync-based index solves this problem while keeping the speed advantages of a local file system.

B.13.4. RSync-based Index

Diagram explaining an RSync index.

Figure B.6. RSync-based Index Diagram

RSync-based indexing is the same as shared indexing, but stores the actual data on the local file system instead of a shared one. A synchronization job, which works at the level of file blocks, synchronizes the modified data.
The coordinator node in the cluster modifies the index files. When data is persisted, the corresponding command is invoked and synchronization jobs start across the cluster.

B.13.5. Query-handler configuration

Example B.35. Sample configuration file

<workspace name="ws">
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="index-dir" value="shareddir/index/db1/ws" />
         <property name="changesfilter-class"
            value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
         <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
         <property name="jgroups-configuration" value="udp-mux.xml" />
         <property name="jgroups-multiplexer-stack" value="true" />
         <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
         <property name="max-volatile-time" value="60" />
         <property name="rdbms-reindexing" value="true" />
         <property name="reindexing-page-size" value="1000" />
         <property name="index-recovery-mode" value="from-coordinator" />
         <property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
         <property name="indexing-thread-pool-size" value="16" />
      </properties>
   </query-handler>
</workspace>

B.13.5.1. Configuration properties

Table B.6. 

Property name              Description
index-dir                  Path to the index.
changesfilter-class        Class that selects the index change-handling strategy, for example shared index or local index.
jbosscache-configuration   Template of the JBoss Cache configuration for all query handlers in the repository.
jgroups-configuration      Template JGroups configuration for all components (search, cache, locks).
jgroups-multiplexer-stack  Set to true to enable the JGroups multiplexer stack.
jbosscache-cluster-name    Cluster name (must be unique).
max-volatile-time          Maximum time to live for the Volatile Index.
rdbms-reindexing           Indicates that the RDBMS re-indexing mechanism is used if possible. The default value is true.
reindexing-page-size       Maximum number of nodes that can be retrieved from storage for re-indexing. The default value is 100.
index-recovery-mode        If set to from-indexing, a full re-indexing is launched automatically (default behavior). If set to from-coordinator, the index is retrieved from the coordinator.
index-recovery-filter      Defines the implementation class or classes of RecoveryFilters, the index synchronization mechanism for the Local Index strategy.
async-reindexing           Controls re-indexing on JCR startup. If this flag is set, indexing is launched asynchronously, without blocking the JCR. The default value is false.
indexing-thread-pool-size  Defines the total number of indexing threads.
max-volatile-size          The maximum volatile index size in bytes until it is written to disk. The default value is 1048576 (1MB).

B.13.5.2. Improve Query Performance with PostgreSQL and rdbms-reindexing

With PostgreSQL, the performance of the queries used while indexing can be improved by tuning the database parameters below and setting the rdbms-reindexing parameter to true.

Procedure B.10. Improve query performance

  1. Set the parameter enable_seqscan to off.
    OR
    Set default_statistics_target to at least 50.
  2. Restart the database server and analyze the JCR_SVALUE or JCR_MVALUE table.

B.13.5.3. Improve Query Performance with DB2 and rdbms-reindexing

The performance of the queries used while indexing can be improved by using DB2 and setting rdbms-reindexing to true.

Procedure B.11. Improve query performance

  • Collect statistics on tables by running the following query for JCR_SITEM (or JCR_MITEM) and JCR_SVALUE (or JCR_MVALUE) tables:
    RUNSTATS ON TABLE <scheme>.<table> WITH DISTRIBUTION AND INDEXES ALL

B.13.5.4. Cluster-ready indexing for shared index

For cluster-ready implementations, the JBoss Cache, JGroups and Changes Filter values must be defined.
A shared index requires a remote or shared file system, for example NFS or SMB. The index-dir value must point to a directory on that file system.

Example B.36. Enable shared indexing

To enable shared index implementation, set changesfilter-class to org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter
<workspace name="ws">
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" />
         <property name="changesfilter-class"
            value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
         <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
         <property name="jgroups-configuration" value="udp-mux.xml" />
         <property name="jgroups-multiplexer-stack" value="true" />
         <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
         <property name="max-volatile-time" value="60" />
         <property name="rdbms-reindexing" value="true" />
         <property name="reindexing-page-size" value="1000" />
         <property name="index-recovery-mode" value="from-coordinator" />
      </properties>
   </query-handler>
</workspace>

B.13.5.5. System Requirements for RSync Index

  • The RSync-based indexing strategy requires an installed and properly configured RSync utility.
  • The utility must be accessible by calling "rsync" without specifying its full path.
  • Each cluster node must have a running RSync Server supporting the "rsync://" protocol.
  • The path to the index for each workspace must be the same across the cluster, for example /var/data/index/repository-name/workspace-name.
  • The RSync Server configuration must share one of the index's parent folders, for example /var/data/index. In other words, the index is stored inside the RSync Server shared folder.

B.13.5.6. RSync Index Configuration

The RSync configuration is similar to the Shared Index configuration, and only requires some additional parameters for the RSync options. If they are present, JCR switches from the shared index to the RSync-based index.

Example B.37. RSync configuration

<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      <property name="index-dir" value="/var/data/index/repository1/production" />
      <property name="changesfilter-class"
         value="org.exoplatform.services.jcr.impl.core.query.ispn.ISPNIndexChangesFilter" />
      <property name="infinispan-configuration" value="jar:/conf/portal/cluster/infinispan-indexer.xml" />
      <property name="jgroups-configuration" value="jar:/conf/portal/cluster/udp-mux.xml" />
      <property name="infinispan-cluster-name" value="JCR-cluster" />
      <property name="max-volatile-time" value="60" /> 
      <property name="rsync-entry-name" value="index" />
      <property name="rsync-entry-path" value="/var/data/index" />
      <property name="rsync-port" value="8085" />
      <property name="rsync-user" value="rsyncexo" />
      <property name="rsync-password" value="exo" />
   </properties>
</query-handler>
RSync uses rsync-user and rsync-password for authentication.
They are optional and can be omitted if the RSync Server is configured to accept anonymous access.

Example B.38. RSync Server configuration

uid = nobody
gid = nobody
use chroot = no
port = 8085
log file = rsyncd.log
pid file = rsyncd.pid
[index]
        path = /var/data/index
        comment = indexes
        read only = true
        auth users = rsyncexo
        secrets file= rsyncd.secrets
This sample configuration shares the folder /var/data/index as an entry named index. The parameters should match the rsync-entry-name, rsync-entry-path, and rsync-port properties in the JCR configuration.

Note

index-dir is a descendant folder of RSync shared folder and those paths are the same on each cluster node.
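
The relationship between index-dir and the RSync entry can be illustrated with a small Python sketch. It assumes that the remote location of an index is obtained by replacing the rsync-entry-path prefix of index-dir with the entry name. This derivation, the function name and the host name node1.example.com are assumptions made for the example, not documented eXo JCR behavior.

```python
import posixpath


def rsync_source_url(host, port, entry_name, entry_path, index_dir, user=None):
    """Build an rsync:// URL for index_dir as seen through the shared entry.

    Hypothetical helper: it only illustrates why index-dir must be located
    inside the folder shared by the RSync Server.
    """
    relative = posixpath.relpath(index_dir, entry_path)
    if relative == ".." or relative.startswith("../"):
        raise ValueError("index-dir must be inside rsync-entry-path")
    auth = user + "@" if user else ""
    suffix = "" if relative == "." else "/" + relative
    return "rsync://{}{}:{}/{}{}".format(auth, host, port, entry_name, suffix)


url = rsync_source_url(
    host="node1.example.com",            # hypothetical cluster node
    port=8085,                           # rsync-port
    entry_name="index",                  # rsync-entry-name
    entry_path="/var/data/index",        # rsync-entry-path
    index_dir="/var/data/index/repository1/production",  # index-dir
    user="rsyncexo",                     # rsync-user
)
print(url)
```

Because every node uses the same paths, the same relative location is valid on each cluster node.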

B.13.5.7. Cluster-ready indexing for local index

Example B.39. Enable local indexing

To use cluster-ready strategy based on local indexes, when each node owns a copy of index on local file system, the indexing directory must point to any folder on local file system.
The changesfilter-class must be set to org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter.
<workspace name="ws">
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" />
         <property name="changesfilter-class"
            value="org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter" />
         <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
         <property name="jgroups-configuration" value="udp-mux.xml" />
         <property name="jgroups-multiplexer-stack" value="true" />
         <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
         <property name="max-volatile-time" value="60" />
         <property name="rdbms-reindexing" value="true" />
         <property name="reindexing-page-size" value="1000" />
         <property name="index-recovery-mode" value="from-coordinator" />
      </properties>
   </query-handler>
</workspace>

B.13.5.8. Local Index Recovery Filters

All nodes that are joining a cluster for the first time or nodes joining after downtime, must be in a synchronized state.
When using shared value storages, databases and indexes, cluster nodes are synchronized at any given time. But this is not the case in a local index strategy.
If a new node joins a cluster without an index, the index is retrieved or recreated. A node can also be restarted, in which case its index is not empty. By default, an existing index is treated as up to date, even though it can be outdated.
The portal JCR offers a mechanism called RecoveryFilters that automatically retrieves the index for a joining node on startup. This feature uses a set of filters that are defined in the QueryHandler configuration.

Example B.40. QueryHandler configuration

<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
The number of filters is not limited, so they can be combined:
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
    <property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.SystemPropertyRecoveryFilter" />
If any one of the filters fires, the index is re-synchronized. This feature uses the standard index recovery mode defined by the index-recovery-mode parameter described previously, which can be "from-indexing" (default) or "from-coordinator":
<property name="index-recovery-mode" value="from-coordinator" />
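The combination rule can be modelled in plain Java. The following is a minimal sketch, not the eXo JCR implementation: the RecoveryFilterChain class is hypothetical, and each configured filter is represented by a BooleanSupplier stand-in.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical model of combined recovery filters; not the eXo JCR API.
final class RecoveryFilterChain {

    // Returns true when at least one configured filter requests re-synchronization.
    static boolean needsResync(List<BooleanSupplier> filters) {
        for (BooleanSupplier filter : filters) {
            if (filter.getAsBoolean()) {
                return true; // one firing filter is enough
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-ins for DocNumberRecoveryFilter and SystemPropertyRecoveryFilter.
        BooleanSupplier docNumberDiffers = () -> false;
        BooleanSupplier forcedBySystemProperty = () -> true;
        List<BooleanSupplier> filters = List.of(docNumberDiffers, forcedBySystemProperty);
        System.out.println("Re-synchronize index: " + needsResync(filters));
    }
}
```

With no filter configured, or when every filter returns false, the sketch leaves the existing index untouched.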

B.13.5.9. Filter Implementations

There are multiple filter implementations.
org.exoplatform.services.jcr.impl.core.query.lucene.DummyRecoveryFilter
Always returns true. Use this filter when the index must be forcibly resynchronized each time.
org.exoplatform.services.jcr.impl.core.query.lucene.SystemPropertyRecoveryFilter
Returns the value of the system property org.exoplatform.jcr.recoveryfilter.forcereindexing. Index recovery can therefore be controlled from the top, using system properties, without changing the configuration.
org.exoplatform.services.jcr.impl.core.query.lucene.ConfigurationPropertyRecoveryFilter
Returns the value of the QueryHandler configuration property index-recovery-filter-forcereindexing. Index recovery is therefore controlled from the configuration, separately for each workspace. For example:
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.ConfigurationPropertyRecoveryFilter" />
    <property name="index-recovery-filter-forcereindexing" value="true" />
org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter
Checks the number of documents in the index on the coordinator side and on the local side. It returns true if the counts differ.
The advantage of this filter is that it skips reindexing for workspaces where the index is not modified.
For example, if there are ten repositories with three workspaces each and only one is heavily used in the cluster, this filter reindexes only those workspaces that have changed, without affecting other indexes.
This reduces start up time.
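The document-count comparison can be sketched as follows. This is an illustrative model only: the DocNumberCheck class, the workspace names and the counts are hypothetical, and the real filter obtains its counts from the Lucene indexes on the coordinator and on the joining node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the document-count comparison; not the eXo JCR implementation.
final class DocNumberCheck {

    // Returns the workspaces whose local document count differs from the coordinator's.
    static List<String> workspacesToReindex(Map<String, Long> coordinatorCounts,
                                            Map<String, Long> localCounts) {
        List<String> result = new ArrayList<>();
        // A TreeMap copy gives a stable, sorted iteration order.
        for (Map.Entry<String, Long> entry : new TreeMap<>(coordinatorCounts).entrySet()) {
            Long local = localCounts.get(entry.getKey());
            // A missing local index counts as a difference.
            if (local == null || !local.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> coordinator = Map.of("production", 143018L, "collaboration", 500L);
        Map<String, Long> local = Map.of("production", 143018L, "collaboration", 480L);
        System.out.println(workspacesToReindex(coordinator, local));
    }
}
```

Only the workspace with a differing count is selected, which is why unchanged workspaces do not add to the start-up time.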

B.13.5.10. JBoss-Cache template configuration

The JBoss-Cache template configuration for the query handler is the same for both clustered strategies. The configuration file is jbosscache-indexer.xml.

Example B.41. JBoss-Cache configuration for query handler

<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">
   <locking useLockStriping="false" concurrencyLevel="50000" lockParentForChildInsertRemove="false"
      lockAcquisitionTimeout="20000" />
   <!-- Configure the TransactionManager -->
   <transaction transactionManagerLookupClass="org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup" />
   <clustering mode="replication" clusterName="${jbosscache-cluster-name}">
      <stateRetrieval timeout="20000" fetchInMemoryState="false" />
      <jgroupsConfig multiplexerStack="jcr.stack" />
      <sync />
   </clustering>
   <!-- Eviction configuration -->
   <eviction wakeUpInterval="5000">
      <default algorithmClass="org.jboss.cache.eviction.FIFOAlgorithm" eventQueueSize="1000000">
         <property name="maxNodes" value="10000" />
         <property name="minTimeToLive" value="60000" />
      </default>
   </eviction>
</jbosscache>

B.13.6. Asynchronous Re-indexing

Managing a large data set using the JCR in a production environment requires special operations on indexes. These operations include re-creation of the index, which is called re-indexing.
Re-indexing is required to recover from hardware faults, hard restarts, data corruption, migrations, and JCR updates that bring new index-related features. Index re-creation is requested on server startup or during runtime.

B.13.6.1. On startup indexing

eXo JCR supports RDBMS re-indexing, which is faster than ordinary indexing. To configure RDBMS reindexing, the QueryHandler parameter rdbms-reindexing is set to true.

B.13.6.2. Asynchronous Indexing on startup

By default, server startup is blocked while the indexing process is in progress. Asynchronous indexing on startup removes this block.
In asynchronous indexing on startup, all indexing operations are performed in the background without blocking the repository.
To configure asynchronous indexing on startup, set the value of the async-reindexing parameter in the QueryHandler configuration to true.
With asynchronous indexing active, the JCR starts without active indexes. You can execute queries on the JCR without exceptions, but no results are returned until index creation is completed.
To verify the state of the index, use QueryManagerImpl; when the index is online, the return value of isOnline() is true. State changes are logged as follows:
  • The OFFLINE state indicates that the index is currently being re-created. When the state changes, a corresponding log event is printed.
    • When the background index task starts, the index is switched to OFFLINE, with the following log event:
      [INFO] Setting index OFFLINE (repository/production[system]).
  • When the indexing process is completed, the following two events are logged:
    [INFO] Created initial index for 143018 nodes (repository/production[system]).
    [INFO] Setting index ONLINE (repository/production[system]).
    These two log lines indicate the end of the process for the workspace named system.
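The log events above can be followed programmatically, for example by a monitoring script. The sketch below is an assumption-laden illustration: the IndexStateLogTracker class is hypothetical, and the regular expression is derived only from the sample log lines shown here, so the actual log prefix and layout may differ in a given installation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that tracks index state from log lines; not part of eXo JCR.
final class IndexStateLogTracker {

    // Matches e.g. "Setting index OFFLINE (repository/production[system])."
    private static final Pattern STATE_EVENT =
        Pattern.compile("Setting index (ONLINE|OFFLINE) \\(([^)]+)\\)");

    // Last known state per index, keyed by the text between the parentheses.
    private final Map<String, String> states = new LinkedHashMap<>();

    // Feeds one log line; lines that are not state events are ignored.
    void accept(String logLine) {
        Matcher matcher = STATE_EVENT.matcher(logLine);
        if (matcher.find()) {
            states.put(matcher.group(2), matcher.group(1));
        }
    }

    // Returns true only if the last state event seen for the index was ONLINE.
    boolean isOnline(String index) {
        return "ONLINE".equals(states.get(index));
    }

    public static void main(String[] args) {
        IndexStateLogTracker tracker = new IndexStateLogTracker();
        tracker.accept("[INFO] Setting index OFFLINE (repository/production[system]).");
        tracker.accept("[INFO] Created initial index for 143018 nodes (repository/production[system]).");
        tracker.accept("[INFO] Setting index ONLINE (repository/production[system]).");
        System.out.println(tracker.isOnline("repository/production[system]"));
    }
}
```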

B.13.6.3. Hot Asynchronous Workspace Re-indexing using JMX

The index can become corrupted because of hard system faults, upgrade errors, migration issues and so on. Hot Asynchronous Workspace Re-indexing allows service administrators to launch the re-indexing process in the background without stopping or blocking the application. Hot Asynchronous Workspace Re-indexing is started from a JMX-compatible console.
The JMX Jconsole, with the SystemSearchManager Operations MBean displayed. The mouse is clicking the void reindex option.

Figure B.7. JMX Jconsole

The server can continue working as expected while the index is recreated.
This depends on the allow queries flag passed to the reindex operation through the JMX interface. If the flag is set, the application continues working.

Important

In the hot asynchronous workspace re-indexing method, the index is frozen while the background task is running.
This means that queries are performed on the version of the index present at the moment the indexing task started, and that data written into the repository after that moment is not available through search until the process completes.
Data added during re-indexing is also indexed, but it becomes available only when re-indexing is complete. The JCR makes a snapshot of the indexes at the invocation of the asynchronous indexing task and uses that snapshot for searches.
When the operation is finished, the stale index is replaced by the newly created index, which includes any newly added data.
If the allow queries flag is set to false, all queries throw an exception while the task is running. The current state can be acquired using the following JMX operation:
  • getHotReindexingState() - returns information about latest invocation: start time, if in progress or finish time if done.

B.13.6.4. Notices

You cannot launch hot re-indexing using JMX if the index is in offline mode. Offline mode means that the index is currently busy with other operations, such as re-indexing at startup or being copied to another node in the cluster.
Hot asynchronous re-indexing using JMX and re-indexing on startup are different features. You cannot get the state of startup re-indexing using the getHotReindexingState operation in the JMX interface.

Common JMX operations

  • getIOMode - returns current index IO mode (READ_ONLY / READ_WRITE), belongs to clustered configuration states.
  • getState - returns current state, (ONLINE / OFFLINE).

B.13.7. Lucene tuning

JCR indexing is based on the Lucene indexing library. It uses directories to store the index and manages access to the index through lock factories.
By default, the JCR implementation uses an optimal combination of Directory and LockFactory implementations.
  • SimpleFSDirectory is used in Windows environments and the NIOFSDirectory implementation is used in non-Windows systems.
  • NativeFSLockFactory is an optimal solution for a wide variety of cases including clustered environment with NFS shared resources.
You can override the default settings in the system properties.
  • The properties org.exoplatform.jcr.lucene.store.FSDirectoryLockFactoryClass and org.exoplatform.jcr.lucene.FSDirectory.class control the default behavior.
    • org.exoplatform.jcr.lucene.store.FSDirectoryLockFactoryClass defines the implementation of abstract Lucene LockFactory class.
    • org.exoplatform.jcr.lucene.FSDirectory.class sets the implementation class for FSDirectory instances.
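The default selection and the override through system properties can be sketched as follows. This is a model under stated assumptions, not the eXo JCR code: the LuceneStoreDefaults class is hypothetical, Windows detection through the os.name value is an assumption, and so is the expectation that the property values are fully qualified class names.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical model of the default/override selection; not the eXo JCR implementation.
final class LuceneStoreDefaults {

    static final String DIRECTORY_PROPERTY = "org.exoplatform.jcr.lucene.FSDirectory.class";
    static final String LOCK_FACTORY_PROPERTY =
        "org.exoplatform.jcr.lucene.store.FSDirectoryLockFactoryClass";

    // SimpleFSDirectory on Windows, NIOFSDirectory elsewhere (detection by os.name is assumed).
    static String defaultDirectoryClass(String osName) {
        boolean windows = osName != null
            && osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows
            ? "org.apache.lucene.store.SimpleFSDirectory"
            : "org.apache.lucene.store.NIOFSDirectory";
    }

    // An explicitly set property wins over the platform default.
    static String effectiveDirectoryClass(Properties props, String osName) {
        return props.getProperty(DIRECTORY_PROPERTY, defaultDirectoryClass(osName));
    }

    // NativeFSLockFactory is the default lock factory unless overridden.
    static String effectiveLockFactoryClass(Properties props) {
        return props.getProperty(LOCK_FACTORY_PROPERTY,
            "org.apache.lucene.store.NativeFSLockFactory");
    }

    public static void main(String[] args) {
        Properties system = System.getProperties();
        System.out.println(effectiveDirectoryClass(system, System.getProperty("os.name")));
        System.out.println(effectiveLockFactoryClass(system));
    }
}
```

On a JVM, such properties are typically passed on the command line with -D, for example -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.SimpleFSDirectory.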

Important

JCR allows you to change the implementation classes of Lucene internals but it does not guarantee the stability and functionality of the changes.
For more information, see the Lucene documentation located at http://lucene.apache.org/core/documentation.html.

B.13.7.1. JBossTransactionsService

JBossTransactionsService implements the eXo TransactionService and provides access to the JBoss Transaction Service (JBossTS) JTA implementation using an eXo container dependency.
TransactionService is used in the JCR cache implementation org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache.

Example B.42. JBossTransactionsService Configuration

  <component>
    <key>org.exoplatform.services.transaction.TransactionService</key>
    <type>org.exoplatform.services.transaction.jbosscache.JBossTransactionsService</type>
    <init-params>
      <value-param>
        <name>timeout</name>
        <value>3000</value>
      </value-param>
    </init-params>   
  </component>
timeout - XA transaction timeout in seconds.

B.13.7.2. JCR Query Use-cases

The JCR supports two query languages: SQL and XPath. A query, whether XPath or SQL, specifies a subset of nodes within a workspace, called the result set. The result set consists of all the nodes in the workspace that meet the constraints stated in the query.

Example B.43. SQL Query Creation and Execution

// get QueryManager
QueryManager queryManager = workspace.getQueryManager(); 
// make SQL query
Query query = queryManager.createQuery("SELECT * FROM nt:base ", Query.SQL);
// execute query
QueryResult result = query.execute();

Example B.44. XPath Query Creation and Execution

// get QueryManager
QueryManager queryManager = workspace.getQueryManager(); 
// make XPath query
Query query = queryManager.createQuery("//element(*,nt:base)", Query.XPATH);
// execute query
QueryResult result = query.execute();

Example B.45. Query Result Processing

// fetch query result
QueryResult result = query.execute();
To fetch the nodes:
NodeIterator it = result.getNodes();
The results can be formatted in a table:
// get column names
String[] columnNames = result.getColumnNames();
// get column rows
RowIterator rowIterator = result.getRows();
while(rowIterator.hasNext()){
   // get next row
   Row row = rowIterator.nextRow();
   // get all values of row
   Value[] values = row.getValues();
}

B.13.8. Searching Repository Content

The JCR configuration file is located at $JPP_HOME/gatein/gatein.ear/portal.war/portal/WEB-INF/conf/jcr/repository-configuration.xml.
You can search the JCR content repository using various search techniques such as bi-directional range iteration, fuzzy search, synonym search and so on.

B.13.8.1. Bi-directional RangeIterator

The NodeIterator returned by QueryResult.getNodes() is bi-directional: it also implements the TwoWayRangeIterator interface.
The TwoWayRangeIterator interface is shown in the following example.

Example B.46. TwoWayRangeIterator interface

/**
 * Skip a number of elements in the iterator.
 * 
 * @param skipNum the non-negative number of elements to skip
 * @throws java.util.NoSuchElementException if skipped past the first element
 *           in the iterator.
 */
public void skipBack(long skipNum);

Example B.47. Usage of TwoWayRangeIterator

NodeIterator iter = queryResult.getNodes();
while (iter.hasNext()) {
  if (skipForward) {
    iter.skip(10); // Skip 10 nodes in forward direction
  } else if (skipBack) {
    TwoWayRangeIterator backIter = (TwoWayRangeIterator) iter; 
    backIter.skipBack(10); // Skip 10 nodes back 
  }
  .......
}
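The skip semantics can be illustrated with a small list-backed model. The TwoWayListIterator class below is hypothetical and is not the eXo JCR iterator; it only mirrors the contract described above, where skipping past either end raises NoSuchElementException.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical list-backed model of skip() and skipBack(); not the eXo JCR class.
final class TwoWayListIterator<T> {

    private final List<T> items;
    private int position; // index of the element the next call to next() returns

    TwoWayListIterator(List<T> items) {
        this.items = items;
    }

    boolean hasNext() {
        return position < items.size();
    }

    T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items.get(position++);
    }

    // Skips forward; fails if that would pass the last element.
    void skip(long skipNum) {
        if (skipNum < 0) {
            throw new IllegalArgumentException("skipNum must be non-negative");
        }
        if (position + skipNum > items.size()) {
            throw new NoSuchElementException();
        }
        position += (int) skipNum;
    }

    // Skips backward; fails if that would pass the first element.
    void skipBack(long skipNum) {
        if (skipNum < 0) {
            throw new IllegalArgumentException("skipNum must be non-negative");
        }
        if (skipNum > position) {
            throw new NoSuchElementException();
        }
        position -= (int) skipNum;
    }

    public static void main(String[] args) {
        TwoWayListIterator<String> it = new TwoWayListIterator<>(List.of("a", "b", "c", "d", "e"));
        it.skip(3);
        System.out.println(it.next());
        it.skipBack(2);
        System.out.println(it.next());
    }
}
```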

Note

Bi-directional NodeIterator is not supported in two cases:
  1. SQL query: select * from nt:base
  2. XPath query: //* .

B.13.8.2. Fuzzy Searches

JCR supports Lucene Fuzzy Searches. The following query q performs a fuzzy search:
QueryManager qman = session.getWorkspace().getQueryManager();
Query q = qman.createQuery("select * from nt:base where contains(field, 'ccccc~')", Query.SQL);
QueryResult res = q.execute();

B.13.8.3. Synonym Search

Searching with synonyms is integrated in the jcr:contains() function and uses the same syntax as synonym searches in web search engines such as Google. If a search term is prefixed by a tilde symbol ( ~ ), synonyms of the search term are taken into consideration.

Example B.48. Usage of the tilde symbol in search

SQL: select * from nt:resource where contains(., '~parameter')

XPath: //element(*, nt:resource)[jcr:contains(., '~parameter')]

Example B.49. Enabling Synonym Search

To enable synonym search you need to add a configuration parameter to the query-handler element in your JCR configuration file.
<param  name="synonymprovider-config-path" value="..you path to configuration file....."/>
<param  name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider"/>

Example B.50. Synonym Provider Interface

/**
 * <code>SynonymProvider</code> defines an interface for a component that
 * returns synonyms for a given term.
 */
public interface SynonymProvider {

   /**
    * Initializes the synonym provider and passes the file system resource to
    * the synonym provider configuration defined by the configuration value of
    * the <code>synonymProviderConfigPath</code> parameter. The resource may be
    * <code>null</code> if the configuration parameter is not set.
    *
    * @param fsr the file system resource to the synonym provider
    *            configuration.
    * @throws IOException if an error occurs while initializing the synonym
    *                     provider.
    */
   public void initialize(InputStream fsr) throws IOException;

   /**
    * Returns an array of terms that are considered synonyms for the given
    * <code>term</code>.
    *
    * @param term a search term.
    * @return an array of synonyms for the given <code>term</code> or an empty
    *         array if no synonyms are known.
    */
   public String[] getSynonyms(String term);
}
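A custom provider can be written against this interface. The following sketch is illustrative only: the SimplePropertiesSynonymProvider class is hypothetical, the interface is redeclared so that the example compiles on its own, and the file format (term=synonym1,synonym2, treated as bidirectional) is an assumption rather than the format used by the shipped PropertiesSynonymProvider.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Redeclared from the example above so that this sketch is self-contained.
interface SynonymProvider {
    void initialize(InputStream fsr) throws IOException;
    String[] getSynonyms(String term);
}

// Hypothetical provider; the "term=syn1,syn2" format is an assumption.
final class SimplePropertiesSynonymProvider implements SynonymProvider {

    private final Map<String, Set<String>> synonyms = new HashMap<>();

    @Override
    public void initialize(InputStream fsr) throws IOException {
        synonyms.clear();
        if (fsr == null) {
            return; // no configuration: every lookup returns an empty array
        }
        Properties props = new Properties();
        props.load(fsr);
        for (String key : props.stringPropertyNames()) {
            for (String value : props.getProperty(key).split(",")) {
                String synonym = value.trim();
                if (!synonym.isEmpty()) {
                    link(key, synonym); // synonyms are treated as bidirectional
                    link(synonym, key);
                }
            }
        }
    }

    private void link(String from, String to) {
        synonyms.computeIfAbsent(from.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                .add(to.toLowerCase(Locale.ROOT));
    }

    @Override
    public String[] getSynonyms(String term) {
        Set<String> found = synonyms.getOrDefault(
            term.toLowerCase(Locale.ROOT), Collections.emptySet());
        return found.toArray(new String[0]);
    }

    // Convenience factory for trying the provider with in-memory definitions.
    static SimplePropertiesSynonymProvider fromString(String definitions) {
        SimplePropertiesSynonymProvider provider = new SimplePropertiesSynonymProvider();
        try {
            provider.initialize(new ByteArrayInputStream(
                definitions.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return provider;
    }
}
```

For example, SimplePropertiesSynonymProvider.fromString("quick=fast,rapid") returns fast and rapid for quick, and quick for fast.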

B.13.9. Highlighting

An ExcerptProvider retrieves text excerpts for a node in the query result and marks up the words in the text that match the query terms.
By default, match highlighting is disabled because it requires that additional information is written to the search index.
To enable this feature, you need to add a configuration parameter to the query-handler element in your JCR configuration file:
<param name="support-highlighting" value="true"/>
Additionally, there is a parameter that controls the format of the excerpt created. In JCR 1.9, the default is set to org.exoplatform.services.jcr.impl.core.query.lucene.DefaultHTMLExcerpt. The configuration parameter for this setting is:
<param name="excerptprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.DefaultXMLExcerpt"/>

B.13.9.1. DefaultXMLExcerpt

This excerpt provider creates an XML fragment of the following form:
<excerpt>
    <fragment>
        <highlight>exoplatform</highlight> implements both the mandatory
        XPath and optional SQL <highlight>query</highlight> syntax.
    </fragment>
    <fragment>
        Before parsing the XPath <highlight>query</highlight> in
        <highlight>exoplatform</highlight>, the statement is surrounded
    </fragment>
</excerpt>

B.13.9.2. DefaultHTMLExcerpt

This excerpt provider creates an HTML fragment of the following form:
<div>
    <span>
        <strong>exoplatform</strong> implements both the mandatory XPath
        and optional SQL <strong>query</strong> syntax.
    </span>
    <span>
        Before parsing the XPath <strong>query</strong> in
        <strong>exoplatform</strong>, the statement is surrounded
    </span>
</div>

B.13.9.3. Usage

If you are using XPath, you must use the rep:excerpt() function in the last location step, just like you would select properties:
QueryManager qm = session.getWorkspace().getQueryManager();
Query q = qm.createQuery("//*[jcr:contains(., 'exoplatform')]/(@Title|rep:excerpt(.))", Query.XPATH);
QueryResult result = q.execute();
for (RowIterator it = result.getRows(); it.hasNext(); ) {
   Row r = it.nextRow();
   Value title = r.getValue("Title");
   Value excerpt = r.getValue("rep:excerpt(.)");
}
The above code searches for nodes that contain the word exoplatform and then gets the value of the Title property and an excerpt for each resultant node.
It is also possible to use a relative path in the call Row.getValue() while the query statement still remains the same. Also, you may use a relative path to a string property. The returned value will then be an excerpt based on string value of the property.
Both available excerpt providers will create fragments of about 150 characters and up to three fragments.
In SQL, the function is called excerpt() without the rep prefix, but the column in the RowIterator will nonetheless be labelled rep:excerpt(.).
QueryManager qm = session.getWorkspace().getQueryManager();
Query q = qm.createQuery("select excerpt(.) from nt:resource where contains(., 'exoplatform')", Query.SQL);
QueryResult result = q.execute();
for (RowIterator it = result.getRows(); it.hasNext(); ) {
   Row r = it.nextRow();
   Value excerpt = r.getValue("rep:excerpt(.)");
}

B.13.10. SpellChecker

The Lucene-based query handler implementation supports a pluggable spell-checker mechanism. By default, spell checking is not available, so you have to configure it.
The JCR currently provides an implementation class which uses the lucene-spellchecker.
The dictionary is derived from the fulltext, indexed content of the workspace and updated periodically. You can configure the refresh interval by selecting the available inner classes of org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker:

Inner Classes available in SpellChecker

  • OneMinuteRefreshInterval
  • FiveMinutesRefreshInterval
  • ThirtyMinutesRefreshInterval
  • OneHourRefreshInterval
  • SixHoursRefreshInterval
  • TwelveHoursRefreshInterval
  • OneDayRefreshInterval
For example, for a refresh interval of six hours, the class name is org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$SixHoursRefreshInterval.
If you use org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker, the refresh interval will be one hour.
The spell checker dictionary is stored as a Lucene index under <index-dir>/spellchecker. If this index does not exist, a background thread will create it on start up. Similarly, the dictionary refresh is also done in a background thread to avoid blocking regular queries.

B.13.10.1. Spell check Usage

You can spell check a fulltext statement either with an XPath or a SQL query:

Example B.51. Spell check using XPath

// rep:spellcheck('explatform') will always evaluate to true
Query query = qm.createQuery("/jcr:root[rep:spellcheck('explatform')]/(rep:spellcheck())", Query.XPATH);
RowIterator rows = query.execute().getRows();
// the above query will always return the root node no matter what string we check
Row r = rows.nextRow();
// get the result of the spell checking
Value v = r.getValue("rep:spellcheck()");
if (v == null) {
   // no suggestion returned, the spelling is correct or the spell checker
   // does not know how to correct it.
} else {
   String suggestion = v.getString();
}

Example B.52. Spell check using SQL

// SPELLCHECK('explatform') will always evaluate to true
Query query = qm.createQuery("SELECT rep:spellcheck() FROM nt:base WHERE jcr:path = '/' AND SPELLCHECK('explatform')", Query.SQL);
RowIterator rows = query.execute().getRows();
// the above query will always return the root node no matter what string we check
Row r = rows.nextRow();
// get the result of the spell checking
Value v = r.getValue("rep:spellcheck()");
if (v == null) {
   // no suggestion returned, the spelling is correct or the spell checker
   // does not know how to correct it.
} else {
   String suggestion = v.getString();
}

B.13.11. Similarity

JCR, version 1.12 and onwards, allows you to search nodes that are similar to an existing node.
Similarity is determined by looking up terms that are common to nodes. There are conditions that must be met for a term to be considered. This is required to limit the number of relevant terms.
For a term to be considered relevant, the term must meet the following conditions.
  • The term must be at least four characters long.
  • The term must occur at least twice in the source node.
  • The term must occur in at least five other nodes.
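The three conditions above translate directly into a predicate. The sketch below is a plain-Java illustration with hypothetical names (SimilarityTermRule and its parameters); the JCR derives the actual counts from the Lucene index.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the term relevance rule; not the eXo JCR implementation.
final class SimilarityTermRule {

    static final int MIN_LENGTH = 4;             // at least four characters
    static final int MIN_SOURCE_OCCURRENCES = 2; // at least twice in the source node
    static final int MIN_OTHER_NODES = 5;        // in at least five other nodes

    static boolean isRelevant(String term, int occurrencesInSource, int otherNodesWithTerm) {
        return term.length() >= MIN_LENGTH
            && occurrencesInSource >= MIN_SOURCE_OCCURRENCES
            && otherNodesWithTerm >= MIN_OTHER_NODES;
    }

    // Filters the terms of a source node down to the relevant ones.
    static Set<String> relevantTerms(Map<String, Integer> sourceCounts,
                                     Map<String, Integer> otherNodeCounts) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : sourceCounts.entrySet()) {
            int others = otherNodeCounts.getOrDefault(entry.getKey(), 0);
            if (isRelevant(entry.getKey(), entry.getValue(), others)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> source = Map.of("portal", 3, "jcr", 4, "index", 1, "lucene", 2);
        Map<String, Integer> others = Map.of("portal", 7, "jcr", 9, "index", 8, "lucene", 2);
        System.out.println(relevantTerms(source, others));
    }
}
```

In this example only portal qualifies: jcr is too short, index occurs once in the source node, and lucene occurs in too few other nodes.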

Note

The similarity function requires that highlighting support is enabled. You must have the following parameter set for the query handler in your workspace.xml.
<param name="support-highlighting" value="true"/>
The functions (rep:similar() in XPath and similar() in SQL) have two arguments:
relativePath
A relative path to a descendant node or a period (.) for the current node.
absoluteStringPath
A string literal that contains the path to the node for which you are finding similar nodes.

Warning

Relative path is not supported yet.

Example B.53. Query to find similar nodes

The following query finds nt:resource nodes that are similar to the node at path /parentnode/node.txt/jcr:content.
//element(*, nt:resource)[rep:similar(., '/parentnode/node.txt/jcr:content')]

B.14. Full Text Search And Affecting Settings

Each indexable property of a node is processed with the Lucene analyzer and stored in the Lucene index. This is called indexing of a property. It allows fulltext searching of the indexed properties.

B.14.1. Lucene Analyzers

The purpose of analyzers is to transform all strings stored in the index into a well-defined condition. The same analyzer(s) is/are used when searching in order to adapt the query string to the index reality.
Therefore, performing the same query using different analyzers can return different results.
This example illustrates how the same string is transformed by different analyzers.

Table B.7. "The quick brown fox jumped over the lazy dogs"

Analyzer | Parsed
org.apache.lucene.analysis.WhitespaceAnalyzer | [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
org.apache.lucene.analysis.SimpleAnalyzer | [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
org.apache.lucene.analysis.StopAnalyzer | [quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
org.apache.lucene.analysis.standard.StandardAnalyzer | [quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
org.apache.lucene.analysis.snowball.SnowballAnalyzer | [quick] [brown] [fox] [jump] [over] [lazi] [dog]
org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop word - jcr default analyzer) | [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]

Table B.8. "XY&Z Corporation - xyz@example.com"

Analyzer | Parsed
org.apache.lucene.analysis.WhitespaceAnalyzer | [XY&Z] [Corporation] [-] [xyz@example.com]
org.apache.lucene.analysis.SimpleAnalyzer | [xy] [z] [corporation] [xyz] [example] [com]
org.apache.lucene.analysis.StopAnalyzer | [xy] [z] [corporation] [xyz] [example] [com]
org.apache.lucene.analysis.standard.StandardAnalyzer | [xy&z] [corporation] [xyz@example] [com]
org.apache.lucene.analysis.snowball.SnowballAnalyzer | [xy&z] [corpor] [xyz@exampl] [com]
org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop word - jcr default analyzer) | [xy&z] [corporation] [xyz@example] [com]
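The first three rows of each table can be reproduced with a few lines of plain Java. The sketch below does not use Lucene: the AnalyzerSketch class is a hypothetical approximation of WhitespaceAnalyzer (split on whitespace), SimpleAnalyzer (split on non-letters, then lower-case) and StopAnalyzer (SimpleAnalyzer plus stop word removal), and the stop word list is an assumed English list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical approximation of three Lucene analyzers; it does not use Lucene.
final class AnalyzerSketch {

    // Assumed English stop word list, for illustration only.
    private static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
        "the", "their", "then", "there", "these", "they", "this", "to", "was",
        "will", "with");

    // Splits on whitespace only; case and punctuation are preserved.
    static List<String> whitespace(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Splits on every non-letter character and lower-cases the tokens.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Same as simple(), with stop words removed.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumped over the lazy dogs";
        System.out.println(whitespace(text));
        System.out.println(simple(text));
        System.out.println(stop(text));
    }
}
```

Because the index and the query are processed by the same analyzer, a query for the term the can only match when the analyzer keeps that token.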

Note

StandardAnalyzer is the default analyzer in the portal JCR search engine. But it does not use stop words.

B.14.2. Property Indexing

Different properties are indexed in different ways, and this affects whether a property can be searched using fulltext search by property.
Two property types are indexed as fulltext searchable, which are STRING and BINARY.

Table B.9. Fulltext search by different properties

Property Type | Fulltext search by all properties | Fulltext search by exact property
STRING | YES | YES
BINARY | YES | NO
For example, the jcr:data property (which is BINARY) is not found with a query structured as follows, because BINARY is not searchable by fulltext search by exact property.
SELECT * FROM nt:resource WHERE CONTAINS(jcr:data, 'some string')
However, the following query returns results, provided the node contains the targeted data.
SELECT * FROM nt:resource WHERE CONTAINS( * , 'some string')

B.14.3. Different Analyzers

This topic shows the effect of different analyzers. To create examples to analyze, first populate the repository with nodes of mixin type mix:title and different values of the jcr:description property.
root
  ├── document1 (mix:title) jcr:description = "The quick brown fox jumped over the lazy dogs"
  ├── document2 (mix:title) jcr:description = "Brown fox live in forest."
  └── document3 (mix:title) jcr:description = "Fox is a nice animal."

Example B.54. Usage of analyzer

The first instance uses the base JCR settings, so the string "The quick brown fox jumped over the lazy dogs" is transformed to the set {[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]}.
// make SQL query
QueryManager queryManager = workspace.getQueryManager();
String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'the')";
// create query
Query query = queryManager.createQuery(sqlStatement, Query.SQL);
// execute query and fetch result
QueryResult result = query.execute();
The NodeIterator returns document1.
If the default analyzer is changed to org.apache.lucene.analysis.StopAnalyzer and the repository is populated again (the new analyzer must process the node properties), the same query returns nothing, because stop words such as "the" are excluded from the parsed string set.

B.15. WebDAV

The WebDAV protocol enables you to use external tools to communicate with hierarchical content servers over HTTP. It is possible to add and remove documents or a set of documents from a path on the server.
DeltaV is an extension of the WebDAV protocol that allows managing document versioning. The locking feature guarantees protection against concurrent access when writing resources. The ordering support allows changing the position of a resource in a list and sorting a directory so that the directory tree can be viewed conveniently. The full-text search makes it easy to find the necessary documents. You can search by using two languages: SQL and XPath.
In eXo JCR, the WebDAV layer is plugged on top of your JCR implementation. This setup enables you to browse a workspace using external tools regardless of the operating system. You can use a Java WebDAV client, such as DAVExplorer, or Internet Explorer using File > Open as a Web Folder.
WebDAV is an extension of the REST service. To get the WebDAV server ready, you must deploy the REST application. Then, you can access any workspace of your repository by using a URL such as http://localhost:8080/rest/jcr/repository/production, which addresses the production workspace. To access another workspace, substitute production with its name, for example collaboration.
Enter your login credentials. These are checked using the organization service, which can be implemented using a dummy InMemory module, a DB module or an LDAP module. The JCR user session is created with the correct JCR credentials.
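A client request against such a URL can be sketched with the JDK HTTP client (Java 11 or later). This is an illustration under assumptions: the WebDavRequestSketch class, the user name and the password are hypothetical, and the request is only built here, not sent. PROPFIND with a Depth header of 1 is the standard WebDAV way to list a collection and its direct children.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical client-side sketch; class name and credentials are placeholders.
final class WebDavRequestSketch {

    // Builds <base>/rest/jcr/<repository>/<workspace>, following the URL shown above.
    static URI workspaceUri(String baseUrl, String repository, String workspace) {
        return URI.create(baseUrl + "/rest/jcr/" + repository + "/" + workspace);
    }

    // Builds a PROPFIND request with HTTP Basic credentials.
    static HttpRequest propfind(URI uri, String user, String password) {
        String token = Base64.getEncoder().encodeToString(
            (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(uri)
            .method("PROPFIND", HttpRequest.BodyPublishers.noBody())
            .header("Depth", "1")
            .header("Authorization", "Basic " + token)
            .build();
    }

    public static void main(String[] args) {
        URI uri = workspaceUri("http://localhost:8080", "repository", "collaboration");
        HttpRequest request = propfind(uri, "user", "secret");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); a successful PROPFIND normally answers with status 207 (Multi-Status).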

Note

If you try the "in ECM" option, add "@ecm" to the user's password. You can modify jaas.conf by adding the domain=ecm option as follows:
exo-domain {
     org.exoplatform.services.security.jaas.BasicLoginModule required domain=ecm;
};

B.15.1. WebDAV Configuration

The WebDAV configuration file:
<component>
  <key>org.exoplatform.services.webdav.WebDavServiceImpl</key>
  <type>org.exoplatform.services.webdav.WebDavServiceImpl</type>
  <init-params>

    <!-- this parameter indicates the default login and password values
         used as credentials for accessing the repository -->
    <!-- value-param>
      <name>default-identity</name>
      <value>admin:admin</value>    
    </value-param -->

    <!-- this is the value of WWW-Authenticate header -->
    <value-param>
      <name>auth-header</name>
      <value>Basic realm="eXo-Platform Webdav Server 1.6.1"</value>
    </value-param>

    <!-- default node type which is used for the creation of collections -->
    <value-param>
      <name>def-folder-node-type</name>
      <value>nt:folder</value>
    </value-param>

    <!-- default node type which is used for the creation of files -->
    <value-param>
      <name>def-file-node-type</name>
      <value>nt:file</value>
    </value-param>

    <!-- if MimeTypeResolver can't find the required mime type, 
         which conforms with the file extension, and the mimeType header is absent
         in the HTTP request header, this parameter is used 
         as the default mime type-->
    <value-param>
      <name>def-file-mimetype</name>
      <value>application/octet-stream</value>
    </value-param>

    <!-- This parameter indicates one of the three cases when you update the content of the resource by PUT command.
         In case of "create-version", PUT command creates the new version of the resource if this resource exists.
         In case of "replace" - if the resource exists, PUT command updates the content of the resource and its last modification date.
         In case of "add", the PUT command tries to create the new resource with the same name (if the parent node allows same-name siblings).-->

    <value-param>
      <name>update-policy</name>
      <value>create-version</value>
      <!--value>replace</value -->
      <!-- value>add</value -->
    </value-param>

    <!--
        This parameter determines how service responds to a method that attempts to modify file content.
        In case of "checkout-checkin" value, when a modification request is applied to a checked-in version-controlled resource, the request is automatically preceded by a checkout and followed by a checkin operation.
        In case of "checkout" value, when a modification request is applied to a checked-in version-controlled resource, the request is automatically preceded by a checkout operation.
    -->         
    <value-param>
      <name>auto-version</name>
      <value>checkout-checkin</value>
      <!--value>checkout</value -->
    </value-param>

    <!--
        This parameter is responsible for managing Cache-Control header value which will be returned to the client.
        You can use patterns like "text/*", "image/*" or wildcard to define the type of content.
    -->  
    <value-param>
      <name>cache-control</name>
      <value>text/xml,text/html:max-age=3600;image/png,image/jpg:max-age=1800;*/*:no-cache;</value>
    </value-param>
    
    <!--
        This parameter determines the absolute path to the folder icon file, which is shown
        during WebDAV view of the contents
    -->
    <value-param>
      <name>folder-icon-path</name>
      <value>/absolute/path/to/file</value>
    </value-param>

  </init-params>
</component>
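The cache-control value in this configuration is a list of entries of the form media types:directive, separated by semicolons. The sketch below shows one way such a value could be interpreted; it is not the eXo WebDAV implementation. The CacheControlSketch class is hypothetical, and the rule that the first matching entry in declaration order wins is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interpretation of the cache-control parameter; not the eXo WebDAV code.
final class CacheControlSketch {

    // Parses "type1,type2:directive;type3:directive;" into ordered pattern -> directive pairs.
    static Map<String, String> parse(String config) {
        Map<String, String> rules = new LinkedHashMap<>();
        for (String entry : config.split(";")) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                continue; // skip malformed or empty entries
            }
            String directive = entry.substring(colon + 1).trim();
            for (String type : entry.substring(0, colon).split(",")) {
                rules.put(type.trim(), directive);
            }
        }
        return rules;
    }

    // Returns the directive of the first matching pattern, or null if none matches.
    static String resolve(Map<String, String> rules, String contentType) {
        String[] actual = contentType.split("/", 2);
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String[] pattern = rule.getKey().split("/", 2);
            if (pattern.length == 2 && actual.length == 2
                    && matches(pattern[0], actual[0])
                    && matches(pattern[1], actual[1])) {
                return rule.getValue();
            }
        }
        return null;
    }

    private static boolean matches(String pattern, String value) {
        return pattern.equals("*") || pattern.equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        Map<String, String> rules = parse(
            "text/xml,text/html:max-age=3600;image/png,image/jpg:max-age=1800;*/*:no-cache;");
        System.out.println(resolve(rules, "text/html"));
        System.out.println(resolve(rules, "application/pdf"));
    }
}
```

Under this reading, the catch-all */* entry must come last, otherwise it would shadow the more specific entries.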

B.15.2. WebDAV and JCR Actions

Table B.10. Corresponding WebDAV and JCR Actions

WebDAV JCR
COPY Workspace.copy(...)
DELETE Node.remove()
GET Node.getProperty(...); Property.getValue()
HEAD Node.getProperty(...); Property.getLength()
MKCOL Node.addNode(...)
MOVE Session.move(...) or Workspace.move(...)
PROPFIND Session.getNode(...); Node.getNode(...); Node.getNodes(...); Node.getProperties()
PROPPATCH Node.setProperty(...); Node.getProperty(...).remove()
PUT Node.addNode("node","nt:file"); Node.setProperty("jcr:data", "data")
CHECKIN Node.checkin()
CHECKOUT Node.checkout()
REPORT Node.getVersionHistory(); VersionHistory.getAllVersions(); Version.getProperties()
RESTORE Node.restore(...)
UNCHECKOUT Node.restore(...)
VERSION-CONTROL Node.addMixin("mix:versionable")
LOCK Node.lock(...)
UNLOCK Node.unlock()
ORDERPATCH Node.orderBefore(...)
SEARCH Workspace.getQueryManager(); QueryManager.createQuery(); Query.execute()
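
To make the client side of this mapping concrete, the following sketch builds two of the WebDAV requests listed in the table using the standard java.net.http API (Java 11 or later). The class name WebDavRequestSketch and any URL passed to it are hypothetical. Authentication is omitted, and the requests are only constructed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds WebDAV requests with the JDK HTTP client API; illustrative only. */
final class WebDavRequestSketch {

    /** MKCOL creates a collection; the table maps it to Node.addNode(...). */
    static HttpRequest mkcol(String collectionUrl) {
        return HttpRequest.newBuilder(URI.create(collectionUrl))
                .method("MKCOL", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** PROPFIND with "Depth: 1" asks for the collection and its direct children. */
    static HttpRequest propfind(String collectionUrl) {
        return HttpRequest.newBuilder(URI.create(collectionUrl))
                .header("Depth", "1")
                .method("PROPFIND", HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

A request built this way can be sent with java.net.http.HttpClient.send(...) once the real WebDAV URL of the repository and suitable credentials are known.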

B.15.3. WebDAV Limitation on Windows

When you attempt to set up a web folder through Add a Network Location or Map a Network Drive in My Computer, you may encounter one of the following error messages: The folder you entered does not appear to be valid. Please choose another, or Windows cannot access … Check the spelling of the name. Otherwise, there might be ….
These errors can appear with both SSL and non-SSL connections.
To fix the error, perform the following steps:
  1. Open the Windows Registry Editor.
  2. Find the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WebClient\Parameters\BasicAuthLevel.
  3. Change the value to 2.

B.15.4. WebDAV Limitation for Microsoft Office 2010

If Microsoft Office 2007/2010 applications are installed on a client computer that connects to a web server configured for Basic authentication over a connection that does not use Secure Sockets Layer (SSL), you might experience the following symptoms when you try to open or download an MS Office file that is stored on the remote server:
  • The Office file does not open or download.
  • You do not receive a Basic authentication password prompt when you try to open or to download the file.
  • You do not receive an error message when you try to open the file. The associated Office application starts. However, the selected file does not open.
These outcomes can be circumvented by enabling Basic authentication on the client machine.

Procedure B.12. Enabling Basic Authentication on the Client Computer

  1. Click Start, type regedit in the Start Search box, and then press Enter.
  2. Locate and then click the following registry subkey:
    HKEY_CURRENT_USER\Software\Microsoft\Office\14.0\Common\Internet
  3. On the Edit menu, point to New, and then click DWORD Value.
  4. Type BasicAuthLevel, and then press Enter.
  5. Right-click BasicAuthLevel, and then click Modify.
  6. In the Value data box, type 2, and then click OK.

B.16. FTP

The JCR-FTP Server operates as an FTP server with access to content stored in JCR repositories in the form of nt:file/nt:folder nodes or their successors. Any FTP client can connect to the running server. The FTP server ships with a standard configuration, which can be changed as required.

Configuration Parameters

command-port:
<value-param>
   <name>command-port</name>
   <value>21</value>
</value-param>
The port of the command channel. The default value is 21.
If another FTP server is already installed on your system, or if the port is protected, change this parameter (to 2121, for example) to avoid conflicts.
data-min-port and data-max-port
<value-param>
   <name>data-min-port</name>
   <value>52000</value>
</value-param>
<value-param>
   <name>data-max-port</name>
   <value>53000</value>
</value-param>
These two parameters define the minimum and maximum values of the range of ports used by the server for the data channel. The FTP protocol requires this additional data channel to transfer file contents and directory listings. No other server program should be listening on the ports in this range.
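
Before settling on a range, it can help to confirm that nothing else is listening on it. The following is a minimal sketch of such a check, run on the server host. The class name FtpPortRangeCheck is hypothetical, and the range 52000 to 53000 is simply the one from the sample configuration above.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

/** Reports which ports in a range cannot be bound at the moment; illustrative only. */
final class FtpPortRangeCheck {

    /** Returns true if a listening socket can currently be bound to the port. */
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // already bound by another program, or binding is not permitted
        }
    }

    /** Lists the ports in [min, max] that cannot be bound right now. */
    static List<Integer> busyPorts(int min, int max) {
        List<Integer> busy = new ArrayList<>();
        for (int port = min; port <= max; port++) {
            if (!isFree(port)) {
                busy.add(port);
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        System.out.println("Busy ports: " + busyPorts(52000, 53000));
    }
}
```

The check is only a snapshot: a port that is free now can still be taken by another program later, so the range should also be reserved at the operating system or firewall level where possible.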
system
<value-param>
   <name>system</name>
   <value>Windows_NT</value>
   <!-- value>UNIX Type: L8</value -->
</value-param>
The format used for directory listings. The supported values are Windows_NT and UNIX Type: L8.
client-side-encoding
<value-param>
   <name>client-side-encoding</name>
   <value>windows-1251</value>
   <!-- value>KOI8-R</value -->
</value-param>
This parameter specifies the character encoding used to communicate with the client, for example windows-1251 or KOI8-R.
def-folder-node-type
<value-param>
   <name>def-folder-node-type</name>
   <value>nt:folder</value>
</value-param>
This parameter specifies the node type used when a folder is created through FTP.
def-file-node-type
<value-param>
   <name>def-file-node-type</name>
   <value>nt:file</value>
</value-param>
This parameter specifies the node type used when a file is created through FTP.
def-file-mime-type
<value-param>
   <name>def-file-mime-type</name>                 
   <value>application/zip</value>
</value-param>
The MIME type of a created file is chosen from its file extension. If the server cannot find a corresponding MIME type, this value is used.
cache-folder-name
<value-param>
   <name>cache-folder-name</name>
   <value>../temp/ftp_cache</value>
</value-param>
The path of the cache folder.
upload-speed-limit
<value-param>
   <name>upload-speed-limit</name>           
   <value>20480</value>
</value-param>
The upload speed limit, measured in bytes.
download-speed-limit
<value-param>
   <name>download-speed-limit</name>
   <value>20480</value>          
</value-param>
The download speed limit, measured in bytes.
timeout
<value-param>
   <name>timeout</name>
   <value>60</value>
</value-param>
Defines the value of a timeout.

B.17. Use External Backup Tool

B.17.1. Repository Suspending

To keep the repository content consistent with the search index and value storage, the repository must be suspended. This means that all working threads are suspended until a resume operation is performed and the index is flushed.
The JCR provides the ability to suspend the repository through JMX.
The JMX console with the RepositorySuspendController MBean options displayed.

Figure B.8. Repository Suspend Controller

To suspend the repository, invoke the suspend() operation. The returned result is "suspended" if all components were suspended successfully.
An "undefined" result means that not all components were suspended successfully. Review the stack traces in the console to identify the cause.

B.17.2. Backup Considerations

You can back up your content manually or by using third-party software. You should back up:
  • Database.
  • Lucene index.
  • Value storage (if configured).

B.17.3. Repository Resuming

Once the backup is complete, invoke the resume() operation to switch the repository back on-line. The returned result will be "on-line".
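
The suspend, back up, resume sequence can also be driven programmatically through the standard javax.management API. The following is a minimal sketch, not a supported tool. The class name SuspendedBackupSketch is hypothetical, the ObjectName of the RepositorySuspendController MBean is not given in this guide and must be copied from your JMX console, and the choice to call resume() even after a partial suspend is an assumption.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Runs a backup between suspend() and resume(); illustrative only. */
final class SuspendedBackupSketch {

    /**
     * Suspends the repository, runs the backup, and always resumes afterwards.
     * The controller name is an input because this guide does not state it.
     */
    static void backupWhileSuspended(MBeanServerConnection jmx, ObjectName controller,
                                     Runnable backup) throws Exception {
        Object state = jmx.invoke(controller, "suspend", new Object[0], new String[0]);
        try {
            if (!"suspended".equals(String.valueOf(state))) {
                throw new IllegalStateException("Repository not fully suspended: " + state);
            }
            backup.run(); // copy the database, Lucene index and value storage here
        } finally {
            // Assumption: resume even if the suspend was partial or the backup failed.
            Object resumed = jmx.invoke(controller, "resume", new Object[0], new String[0]);
            System.out.println("Repository state after resume: " + resumed);
        }
    }
}
```

Placing resume() in a finally block means that a failed backup does not leave the repository suspended.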

B.18. eXo JCR statistics

B.18.1. Statistics on the Database Access Layer

The statistics on the database access layer help you to identify what is slow in this layer, and to diagnose and fix the problem.
Statistics on the time spent in the database access layer are available when you use either org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer or org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer as the WorkspaceDataContainer.

B.18.1.1. Database Access Layer Data

The database access layer is represented by the methods of the interface org.exoplatform.services.jcr.storage.WorkspaceStorageConnection. The following data is collected for each method defined in this interface:
  • The minimum time spent in the method.
  • The maximum time spent in the method.
  • The average time spent in the method.
  • The total amount of time spent in the method.
  • The total number of times the method has been called.
These figures are also available globally for all the methods, which gives the global behavior of this layer.

B.18.1.2. Enabling Statistics for Database Access Layer

To enable the statistics, set the JVM parameter called JDBCWorkspaceDataContainer.statistics.enabled to true. The corresponding CSV file is StatisticsJDBCStorageConnection-${creation-timestamp}.csv.
The format of each column header is ${method-alias}-${metric-alias}. The metric aliases are described in the statistics manager section.
The name of the category of statistics corresponding to these statistics is JDBCStorageConnection. This name is mostly needed to access the statistics through JMX.

B.18.1.3. Database Access Layer Methods and their alias

Table B.11. Method Alias

global This is the alias for all the methods.
getItemDataById This is the alias for the method getItemData(String identifier).
getItemDataByNodeDataNQPathEntry This is the alias for the method getItemData(NodeData parentData, QPathEntry name).
getChildNodesData This is the alias for the method getChildNodesData(NodeData parent).
getChildNodesCount This is the alias for the method getChildNodesCount(NodeData parent).
getChildPropertiesData This is the alias for the method getChildPropertiesData(NodeData parent).
listChildPropertiesData This is the alias for the method listChildPropertiesData(NodeData parent).
getReferencesData This is the alias for the method getReferencesData(String nodeIdentifier).
commit This is the alias for the method commit().
addNodeData This is the alias for the method add(NodeData data).
addPropertyData This is the alias for the method add(PropertyData data).
updateNodeData This is the alias for the method update(NodeData data).
updatePropertyData This is the alias for the method update(PropertyData data).
deleteNodeData This is the alias for the method delete(NodeData data).
deletePropertyData This is the alias for the method delete(PropertyData data).
renameNodeData This is the alias for the method rename(NodeData data).
rollback This is the alias for the method rollback().
isOpened This is the alias for the method isOpened().
close This is the alias for the method close().

B.18.2. Statistics on the JCR API Accesses

To understand how eXo JCR is used, you can register all the JCR API access requests. This makes it possible to create a real-life test scenario based on pure JCR calls and to tune the JCR to better fit operational requirements.
The Load-time Weaving provided by AspectJ allows you to specify which parts of eXo JCR to monitor without applying any changes to existing code. To enable this feature, you must add the required jar files to your classpath.

B.18.2.1. Enabling Statistics on the JCR API Accesses

Add the JVM parameter -javaagent:${pathto}/aspectjweaver-1.6.8.jar to the command line. See http://www.eclipse.org/aspectj/doc/released/devguide/ltw-configuration.html for more information.
By default, the configuration collects statistics on all methods of the internal interfaces org.exoplatform.services.jcr.core.ExtendedSession and org.exoplatform.services.jcr.core.ExtendedNode, and the JCR API interface javax.jcr.Property.
To add or remove monitored interfaces, two configuration files bundled into the jar exo.jcr.component.statistics-X.Y.Z.jar must be changed. The files are conf/configuration.xml and META-INF/aop.xml.

Example B.55. Configuration.xml

The content of conf/configuration.xml is shown below. To add or remove monitored interfaces, edit the list of fully qualified interface names in the values of the init param called targetInterfaces.
<configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.exoplaform.org/xml/ns/kernel_1_2.xsd http://www.exoplaform.org/xml/ns/kernel_1_2.xsd"
 xmlns="http://www.exoplaform.org/xml/ns/kernel_1_2.xsd">

 <component>
   <type>org.exoplatform.services.jcr.statistics.JCRAPIAspectConfig</type>
   <init-params>
     <values-param>
       <name>targetInterfaces</name>
       <value>org.exoplatform.services.jcr.core.ExtendedSession</value>
       <value>org.exoplatform.services.jcr.core.ExtendedNode</value>
       <value>javax.jcr.Property</value>
     </values-param>
   </init-params>
  </component>
</configuration>

Example B.56. aop.xml

The content of META-INF/aop.xml is shown below. To add or remove monitored interfaces, edit the fully qualified interface names in the expression filter of the pointcut called JCRAPIPointcut.
By default, only JCR API calls from the exoplatform packages are taken into account. This filter can be modified to add other package names.
<aspectj>
  <aspects>
    <concrete-aspect name="org.exoplatform.services.jcr.statistics.JCRAPIAspectImpl" extends="org.exoplatform.services.jcr.statistics.JCRAPIAspect">
      <pointcut name="JCRAPIPointcut"
        expression="(target(org.exoplatform.services.jcr.core.ExtendedSession) || target(org.exoplatform.services.jcr.core.ExtendedNode) || target(javax.jcr.Property)) &amp;&amp; call(public * *(..))" />
    </concrete-aspect>
  </aspects>
  <weaver options="-XnoInline">
    <include within="org.exoplatform..*" />
  </weaver>
</aspectj>

Warning

This feature will affect the performance of eXo JCR. It must be used with caution.

B.18.2.2. CSV files used in Statistics

The corresponding CSV files are named Statistics${interface-name}-${creation-timestamp}.csv.
The format of each column header is ${method-alias}-${metric-alias}. The method alias is of type ${method-name}(semicolon-delimited-list-of-parameter-types-to-be-compatible-with-the-CSV-format).
The name of the category of statistics corresponding to these statistics is the name of the monitored interface, for example, ExtendedSession for org.exoplatform.services.jcr.core.ExtendedSession. This name is required to access statistics through JMX.

B.18.3. Statistics Manager

The statistics manager manages all the statistics provided by eXo JCR. It is responsible for printing the data to the CSV files and for exposing the statistics through JMX or a REST interface.
The statistics manager creates a CSV file for each category of statistics. The files are named Statistics${category-name}-${creation-timestamp}.csv.
The CSV (Comma-Separated Values) files are created in the user directory or the temporary directory. A new line is added at a regular interval (every 5 seconds by default), and one last line is added at JVM exit. Each line is composed of the five figures described below, for each method and globally for all the methods.

Table B.12. Metric Alias

Min The minimum time spent in the method, expressed in milliseconds.
Max The maximum time spent in the method, expressed in milliseconds.
Total The total amount of time spent in the method, expressed in milliseconds.
Avg The average time spent in the method, expressed in milliseconds.
Times The total number of times the method has been called.
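
When post-processing the CSV files, each column header of the form ${method-alias}-${metric-alias} has to be split back into its two parts. The following is a minimal sketch of one way to do that. The class name StatisticsColumn is hypothetical. Splitting at the last hyphen is a design choice that relies on the metric aliases in the table above containing no hyphen.

```java
/** Splits a statistics CSV column header into its two aliases; illustrative only. */
final class StatisticsColumn {
    final String methodAlias;
    final String metricAlias;

    private StatisticsColumn(String methodAlias, String metricAlias) {
        this.methodAlias = methodAlias;
        this.metricAlias = metricAlias;
    }

    /** Parses "${method-alias}-${metric-alias}" by splitting at the last hyphen. */
    static StatisticsColumn parse(String header) {
        int dash = header.lastIndexOf('-');
        if (dash <= 0 || dash == header.length() - 1) {
            throw new IllegalArgumentException("Not a statistics column header: " + header);
        }
        return new StatisticsColumn(header.substring(0, dash), header.substring(dash + 1));
    }
}
```

For example, parsing getItemDataById-Avg yields the method alias getItemDataById and the metric alias Avg.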

Note

You can disable the persistence of the statistics by setting the JVM parameter called JCRStatisticsManager.persistence.enabled to false. It is set to true by default.
You can define the time interval between each record by setting the JVM parameter called JCRStatisticsManager.persistence.timeout to your expected value in milliseconds. It is set to 5000 by default.
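
The following sketch shows how these two settings and their documented defaults could be read, assuming that "JVM parameter" here means a Java system property passed on the command line with -D (for example, -DJCRStatisticsManager.persistence.timeout=10000). The class name StatisticsFlags is hypothetical, and the code does not reproduce how eXo JCR itself reads the values.

```java
/** Reads the statistics persistence settings as system properties; illustrative only. */
final class StatisticsFlags {

    /** Documented default is true, so only an explicit "false" disables persistence. */
    static boolean persistenceEnabled() {
        String value = System.getProperty("JCRStatisticsManager.persistence.enabled");
        return !"false".equalsIgnoreCase(value);
    }

    /** Interval between two records in milliseconds; documented default is 5000. */
    static long persistenceIntervalMillis() {
        return Long.getLong("JCRStatisticsManager.persistence.timeout", 5000L);
    }

    public static void main(String[] args) {
        System.out.println("persistence enabled: " + persistenceEnabled());
        System.out.println("interval (ms): " + persistenceIntervalMillis());
    }
}
```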

B.18.3.1. Accessing Statistics using JMX

You can access the statistics through JMX. The available methods are:

Table B.13. JMX Methods

getMin Returns the minimum time spent in the method for the given category name and statistics name. The expected arguments are the name of the category of statistics (for example, JDBCStorageConnection) and the name of the expected method, or global for the global value.
getMax Returns the maximum time spent in the method for the given category name and statistics name. The expected arguments are the name of the category of statistics and the name of the expected method, or global for the global value.
getTotal Returns the total amount of time spent in the method for the given category name and statistics name. The expected arguments are the name of the category of statistics and the name of the expected method, or global for the global value.
getAvg Returns the average time spent in the method for the given category name and statistics name. The expected arguments are the name of the category of statistics and the name of the expected method, or global for the global value.
getTimes Returns the total number of times the method has been called for the given category name and statistics name. The expected arguments are the name of the category of statistics (for example, JDBCStorageConnection) and the name of the expected method, or global for the global value.
reset Resets the statistics for the given category name and statistics name. The expected arguments are the name of the category of statistics and the name of the expected method, or global for the global value.
resetAll Resets all the statistics for the given category name. The expected argument is the name of the category of statistics (for example, JDBCStorageConnection).
The full name of the related MBean is exo:service=statistic, view=jcr.
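
The operations in the table can be invoked from a JMX client through the standard javax.management API. The following is a minimal sketch under several assumptions: the class name JcrStatisticsClient is hypothetical, the MBean is assumed to be registered as exo:service=statistic,view=jcr (without whitespace, which ObjectName treats as significant), and both arguments are assumed to be of type java.lang.String. Verify the exact name and signatures in your JMX console.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Invokes a statistics operation over JMX; illustrative only. */
final class JcrStatisticsClient {

    // Assumed registration name; check it in your JMX console.
    static final String MBEAN_NAME = "exo:service=statistic,view=jcr";

    /**
     * Calls an operation such as getMin or getAvg.
     * Assumes both arguments are declared as java.lang.String on the MBean.
     */
    static Object read(MBeanServerConnection jmx, String operation,
                       String category, String statistic) throws Exception {
        return jmx.invoke(
                new ObjectName(MBEAN_NAME),
                operation,
                new Object[] {category, statistic},
                new String[] {String.class.getName(), String.class.getName()});
    }
}
```

For example, read(jmx, "getAvg", "JDBCStorageConnection", "global") would request the global average for the database access layer. The MBeanServerConnection can be obtained with javax.management.remote.JMXConnectorFactory once the JMX service URL of your server is known.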

B.19. Checking Repository Integrity and Consistency

B.19.1. JMX-based consistency tool

It is important to check the integrity and consistency of the system regularly, especially if backups are missing or stale. The portal JCR implementation offers an innovative JMX-based complex checking tool.
During an inspection, the tool checks every major JCR component, such as the persistent data layer and the index. The persistent layer includes the JDBC Data Container and Value Storages, if they are configured.
The database is verified using the set of complex specialized domain-specific queries. The Value Storage tool checks the existence of, and access to, each file.
Access to the check tool is exposed via the JMX interface, with the following operations available:

Table B.14. Available methods

checkRepositoryDataConsistency() Inspect full repository data (db, value storage and search index)
checkRepositoryDataBaseConsistency() Inspect only DB
checkRepositoryValueStorageConsistency() Inspect only ValueStorage
checkRepositorySearchIndexConsistency() Inspect only SearchIndex
All inspection activities and corrupted data details are stored in a file in the app directory, named according to the following convention: report-<repository name>-dd-MMM-yy-HH-mm.txt.
The path to the file is also returned in the result message at the end of the inspection.
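
As an illustration of the naming convention, the following sketch builds a report file name from a repository name and a timestamp. The class name ConsistencyReportName is hypothetical, and the use of an English locale for the month abbreviation is an assumption. The tool itself may use the default locale of the server.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Builds a report file name following the documented convention; illustrative only. */
final class ConsistencyReportName {

    // Assumption: English month abbreviations (for example "Mar").
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("dd-MMM-yy-HH-mm", Locale.ENGLISH);

    /** Returns report-<repository name>-dd-MMM-yy-HH-mm.txt for the given time. */
    static String forRepository(String repositoryName, LocalDateTime startedAt) {
        return "report-" + repositoryName + "-" + STAMP.format(startedAt) + ".txt";
    }
}
```

Under these assumptions, an inspection of a repository named repository at 09:07 on 5 March 2014 would be reported in report-repository-05-Mar-14-09-07.txt.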

Note

There are three types of inconsistency (Warning, Error, and Index). Two of them (Error and Index) are critical:
  • Index faults are marked as "Reindex" and can be fixed by re-indexing the workspace.
  • Errors can only be fixed manually.
  • Warnings can be normal in some cases, and the production system usually remains fully functional.

B.20. JCR Performance Tuning Guide

This section will show you various ways of improving JCR performance.
It is intended for Administrators and others who want to use the JCR features more efficiently.

B.20.1. Cluster configuration

Table B.15, “EC2 network: 1Gbit” contains details about the configuration of the cluster used in benchmark testing.

Note

The NFS server and the statistics (cacti snmp) server were located on the same physical server.

Table B.15. EC2 network: 1Gbit

Servers hardware Specification
RAM 7.5 GB
Processors 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
Storage 850 GB (2×420 GB plus 10 GB root partition)
Architecture 64-bit
I/O Performance High
API name m1.large
The configuration used to achieve this benchmark is as follows: JAVA_OPTS: -Dprogram.name=run.sh -server -Xms4g -Xmx4g -XX:MaxPermSize=512m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -XX:+UseParallelGC -Djava.net.preferIPv4Stack=true

B.20.2. JCR Clustered Performance

The benchmark used WebDAV (a complex read/write load test) with the same 20K file. To obtain per-operation results, custom output from the test case threads was written to a CSV file.
Read operation:
Warm-up iterations: 100
Run iterations: 2000
Background writing threads: 25
Reading threads: 225
Bar graph, displaying the cluster size on the horizontal axis, and Threads Per Second (TPS) indicated in increments of 200 on the vertical axis, with a top value of 2800 TPS.

Figure B.9. EC2 Performance Results - 225/25 threads

Table B.16. EC2 Performance Results - 225/25 Bar Graph Details

Nodes count TPS Responses >2s Responses >4s
1 523 6.87% 1.27%
2 1754 0.64% 0.08%
3 2388 0.49% 0.09%
4 2706 0.46% 0.1%
Read operation with more threads:
Warm-up iterations: 100
Run iterations: 2000
Background writing threads: 50
Reading threads: 450
Bar graph, displaying the cluster size on the horizontal axis, and Threads Per Second (TPS) indicated in increments of 200 on the vertical axis, with a top value of 2800 TPS.

Figure B.10. EC2 Performance Results - 450/50 threads

Table B.17. EC2 Performance Results - 450/50 Bar Graph Details

Nodes count TPS Responses >2s Responses >4s
1 116 ? ?
2 1558 6.1% 0.6%
3 2242 3.1% 0.38%
4 2756 2.2% 0.41%

B.20.3. JBoss Enterprise Application Platform 6 Tuning

You can use the maxThreads parameter to increase the maximum number of threads that can be launched in the AS instance. This can improve performance if you need a high level of concurrency. You can also use the -XX:+UseParallelGC JVM option to enable the parallel garbage collector.

Note

Beware of setting maxThreads too high, because this can cause an OutOfMemoryError. The error occurred during testing with maxThreads=1250 on the following machine:
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage (2×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High
API name: m1.large
java -Xmx4g

B.20.4. JCR Cache Tuning

Cache size
The JCR cluster implementation is built using JBoss Cache as a distributed, replicated cache. However, the speed of the remove operation depends on the current size of the cache: the more nodes the cache holds, the longer it takes to remove a particular node (subtree) from it.
Eviction
Changing the eviction wakeUpInterval value does not affect performance. Performance results with values from 500 up to 3000 are approximately equal.
Transaction Timeout
Using a short timeout for long transactions, such as Export/Import or the removal of a huge subtree, may cause a TransactionTimeoutException.

B.20.5. Clustering Tuning

For better performance, place the load balancer, the DB server, and the shared NFS on different computers. If, for some reason, one node receives more load than the others, you can decrease this load using the load value in the load balancer.
JGroups configuration
It is recommended to use the "multiplexer stack" feature present in JGroups. It is set by default in eXo JCR and offers higher performance in a cluster while using fewer network connections. If there are two or more clusters in your network, check that they use different ports and different cluster names.
Write performance in cluster
The eXo JCR implementation uses the Lucene indexing engine to provide search capabilities. However, Lucene imposes a limitation on write operations: it can perform indexing in only one thread. That is why write performance in a cluster is not higher than in a singleton environment. Data is indexed on the coordinator node, so increasing the write load on the cluster may lead to a ReplicationTimeout exception. It occurs because writing threads queue in the indexer, and under high load the timeout for replication to the coordinator is exceeded.
Taking this into consideration, it is recommended to increase the replTimeout value in the cache configurations in case of high write load.
Replication timeout
Some operations may take a long time. If you get a ReplicationTimeoutException, try increasing the replication timeout:
   <clustering mode="replication" clusterName="${jbosscache-cluster-name}">
      ...
      <sync replTimeout="60000" />
   </clustering>
The value is set in milliseconds.

B.20.6. Declaring the Datasources in the Application Server

To declare the datasources using a JBoss application server, deploy a ds file (XXX-ds.xml) into the deploy directory of the appropriate server profile (/server/PROFILE/deploy).

Example B.57. Declaring datasources

The file ds.xml configures all the datasources that the portal requires. Four datasources are configured: jdbcjcr_portal, jdbcjcr_sample-portal, jdbcidm_portal and jdbcidm_sample-portal.
<?xml version="1.0" encoding="UTF-8"?>
<datasources>
   <no-tx-datasource>
      <jndi-name>jdbcjcr_portal</jndi-name>
      <connection-url>jdbc:hsqldb:${jboss.server.data.dir}/data/jdbcjcr_portal</connection-url>
      <driver-class>org.hsqldb.jdbcDriver</driver-class>
      <user-name>sa</user-name>
      <password></password>
   </no-tx-datasource>

   <no-tx-datasource>
      <jndi-name>jdbcjcr_sample-portal</jndi-name>
      <connection-url>jdbc:hsqldb:${jboss.server.data.dir}/data/jdbcjcr_sample-portal</connection-url>
      <driver-class>org.hsqldb.jdbcDriver</driver-class>
      <user-name>sa</user-name>
      <password></password>
   </no-tx-datasource>

   <no-tx-datasource>
      <jndi-name>jdbcidm_portal</jndi-name>
      <connection-url>jdbc:hsqldb:${jboss.server.data.dir}/data/jdbcidm_portal</connection-url>
      <driver-class>org.hsqldb.jdbcDriver</driver-class>
      <user-name>sa</user-name>
      <password></password>
   </no-tx-datasource>

   <no-tx-datasource>
      <jndi-name>jdbcidm_sample-portal</jndi-name>
      <connection-url>jdbc:hsqldb:${jboss.server.data.dir}/data/jdbcidm_sample-portal</connection-url>
      <driver-class>org.hsqldb.jdbcDriver</driver-class>
      <user-name>sa</user-name>
      <password></password>
   </no-tx-datasource>
</datasources>

B.20.6.1. Binding Datasources

Warning

Do not let the portal explicitly bind datasources.
To prevent the portal from binding datasources explicitly, perform the following steps:
  1. Edit the $JPP_HOME/standalone/configuration/gatein/configuration.properties file and comment out the following lines in the JCR section:
    #gatein.jcr.datasource.driver=org.hsqldb.jdbcDriver
    #gatein.jcr.datasource.url=jdbc:hsqldb:file:${gatein.db.data.dir}/data/jdbcjcr_${name}
    #gatein.jcr.datasource.username=sa
    #gatein.jcr.datasource.password=
  2. Comment out the following lines in the IDM section:
    #gatein.idm.datasource.driver=org.hsqldb.jdbcDriver
    #gatein.idm.datasource.url=jdbc:hsqldb:file:${gatein.db.data.dir}/data/jdbcidm_${name}
    #gatein.idm.datasource.username=sa
    #gatein.idm.datasource.password=
  3. Open the jcr-configuration.xml and idm-configuration.xml files and comment out references to the plug-in InitialContextInitializer.
    <!-- Commented because, Datasources are declared and bound by AS, not in eXo -->
    <!--
    <external-component-plugins>
        [...]
    </external-component-plugins>
    -->