Chapter 7. Query Language Grammars
The hierarchical database supports multiple query languages, including all four languages defined in the JCR 1.0 and JCR 2.0 specifications.
7.1. JCR-SQL2
The JCR-SQL2 query language is defined by the JCR 2.0 specification as a way to express queries using strings that are similar to SQL. This query language is an improvement over the earlier JCR-SQL language, providing among other things far richer specifications of joins and criteria.
7.1.1. Extensions to JCR-SQL2
The hierarchical database includes full support for the complete JCR-SQL2 query language defined by the specification. However, there are several extensions provided to make it even more powerful:
- Support for the FULL OUTER JOIN and CROSS JOIN join types, in addition to the LEFT OUTER JOIN, RIGHT OUTER JOIN and INNER JOIN types defined by JCR-SQL2. Note that JOIN is a shorthand for INNER JOIN. For detail, see the grammar for "joins".
- Support for the UNION, INTERSECT, and EXCEPT set operations on multiple result sets to form a single result set. As with standard SQL, the result sets being combined must have the same columns. The UNION operator combines the rows from two result sets, the INTERSECT operator returns the rows that are common to two result sets, and the EXCEPT operator returns the difference between two result sets (the rows of the first that do not appear in the second). Duplicate rows are removed unless the operator is followed by the ALL keyword. For detail, see the grammar for "set queries".
- Removal of duplicate rows in the results, using the SELECT DISTINCT ... expression. For detail, see the grammar for "queries".
- Limiting the number of rows in the result set with the LIMIT count clause, where count is the maximum number of rows that should be returned. This clause may optionally be followed by the OFFSET number clause to specify the number of initial rows that should be skipped. For detail, see the grammar for "limits and offsets".
- Additional dynamic operands DEPTH(selectorName) and PATH(selectorName) that enable placing constraints on the node depth and path, respectively. These dynamic operands can be used in a manner similar to NAME(selectorName) and LOCALNAME(selectorName), which are defined by JCR-SQL2. Note that in each of these cases, the selectorName is optional if there is only one selector in the query. For detail, see the grammar for "dynamic operands".
- Additional dynamic operands REFERENCE(selectorName.propertyName) and REFERENCE(selectorName) that enable placing constraints on one or any of the reference properties, respectively, and which can be used in a manner similar to the standard dynamic operand PropertyValue(selectorName.propertyName). Note that in each of these cases, the selectorName is optional if there is only one selector in the query, and that the propertyName can be excluded if the constraint should apply to all reference properties. For detail, see the grammar for "dynamic operands".
- Support for the IN and NOT IN clauses to more easily and concisely supply multiple discrete static operands. For example, WHERE ... [my:type].[prop1] IN (3,5,7,10,11,50) .... For detail, see the grammar for "set constraints".
- Support for the BETWEEN clause to more easily and concisely supply a range of discrete operands. For example, WHERE ... [my:type].[prop1] BETWEEN 3 EXCLUSIVE AND 10 .... For detail, see the grammar for "between constraints".
- Support for simple arithmetic in numeric-based criteria and order-by clauses. For example, ... WHERE SCORE(type1) + SCORE(type2) > 1.0 or ... ORDER BY (SCORE(type1) * SCORE(type2)) ASC, LENGTH(type2.property1) DESC. For detail, see the grammar for "order-by clauses".
- Support for (non-correlated) subqueries in the WHERE clause, wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., IN (...)), then all of the subquery's rows will be used. For example, the expression WHERE ... [my:type].[prop1] IN ( SELECT [my:prop2] FROM [my:type2] WHERE [my:prop3] < '1000' ) AND ... will use the results of the subquery as the literal values in the IN clause. See the "subqueries" section for more information.
- Support for several pseudo-columns (jcr:path, jcr:score, jcr:name, mode:localName, and mode:depth) that can be used in the SELECT, equi-join, and WHERE clauses. These pseudo-columns make it possible to return location-related and score information within the QueryResult's rows. They also make queries look more like SQL, and thus may be more friendly and easier to use in existing SQL-aware client applications. See the "pseudo-columns" section for more information.
- Support for NOT LIKE as an operator in comparison criteria, which is equivalent to wrapping a LIKE comparison criterion in a NOT(...) clause.
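The BETWEEN example in the list above (BETWEEN 3 EXCLUSIVE AND 10) can be read as a pair of bound checks. The following Java sketch illustrates that reading; it is not the hierarchical database's implementation. It assumes that each bound is inclusive unless it is immediately followed by EXCLUSIVE, which is how the position of the keyword in the grammar suggests it works. The class and method names are hypothetical.

```java
final class BetweenSketch {

    /**
     * Returns true when value lies between lower and upper.
     * Assumption: a bound is inclusive unless flagged as exclusive.
     */
    static boolean between(long value,
                           long lower, boolean lowerExclusive,
                           long upper, boolean upperExclusive) {
        boolean aboveLower = lowerExclusive ? value > lower : value >= lower;
        boolean belowUpper = upperExclusive ? value < upper : value <= upper;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Models: [prop1] BETWEEN 3 EXCLUSIVE AND 10
        for (long candidate : new long[] {3, 4, 10, 11}) {
            System.out.println(candidate + " -> " + between(candidate, 3, true, 10, false));
        }
    }
}
```

Under this assumption the NOT BETWEEN form is simply the negation of the same check.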
7.1.2. Extended JCR-SQL2 Grammar
The full grammar for the hierarchical database's extended JCR-SQL2 support is a strict superset of that defined by the JCR 2.0 specification. In other words, any JCR-SQL2 query that uses the standard grammar is supported, as are queries that make use of the provided extensions.
7.1.2.1. Queries
The top-level rule for the extended JCR-SQL2 grammar is QueryCommand, which is either a Query or a SetQuery:
    QueryCommand ::= Query | SetQuery

    SetQuery ::= Query ('UNION'|'INTERSECT'|'EXCEPT') ['ALL'] Query
                 { ('UNION'|'INTERSECT'|'EXCEPT') ['ALL'] Query }

    Query ::= 'SELECT' ['DISTINCT'] columns
              'FROM' Source
              ['WHERE' Constraint]
              ['ORDER BY' orderings]
              [Limit]
The hierarchical database adds the concept of a set query, which is a query that performs a union, intersection, or complement of the results of two other queries. Set queries are common in SQL (which is essentially a set manipulation language) and are a very useful tool that would otherwise require significant processing of the results of multiple queries by the application. By supporting set queries, the application merely needs to declare the set operation to be performed, and the hierarchical database will perform all the work before returning the results.
There is also the ability to use SELECT DISTINCT, which eliminates duplicate rows in a manner similar to SQL.
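The effect of the set operators on two result sets can be pictured with ordinary Java collections. This is a minimal sketch for illustration only: it models each row as a single string value, and the class and method names are hypothetical rather than part of any product API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SetQuerySketch {

    /** UNION: rows from both result sets, duplicates removed. */
    static <T> List<T> union(List<T> left, List<T> right) {
        Set<T> rows = new LinkedHashSet<>(left);
        rows.addAll(right);
        return new ArrayList<>(rows);
    }

    /** UNION ALL: rows from both result sets, duplicates kept. */
    static <T> List<T> unionAll(List<T> left, List<T> right) {
        List<T> rows = new ArrayList<>(left);
        rows.addAll(right);
        return rows;
    }

    /** INTERSECT: only the rows present in both result sets. */
    static <T> List<T> intersect(List<T> left, List<T> right) {
        Set<T> rows = new LinkedHashSet<>(left);
        rows.retainAll(right);
        return new ArrayList<>(rows);
    }

    /** EXCEPT: rows of the left result set that are absent from the right. */
    static <T> List<T> except(List<T> left, List<T> right) {
        Set<T> rows = new LinkedHashSet<>(left);
        rows.removeAll(right);
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        List<String> first = List.of("a", "b", "b");
        List<String> second = List.of("b", "c");
        System.out.println("UNION     " + union(first, second));
        System.out.println("UNION ALL " + unionAll(first, second));
        System.out.println("INTERSECT " + intersect(first, second));
        System.out.println("EXCEPT    " + except(first, second));
    }
}
```

With a set query the same work is done by the query engine, so the application never has to hold both result sets in memory.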
7.1.2.2. Source
A source is a named set of tuples, which in the hierarchical database corresponds to the nodes of a particular named node type. In other words, a source is equivalent to a table in a relational database. The available columns of a source are the named properties declared on the node type.
In the JCR-SQL2 grammar, a source is either a selector (a named node type) or a join specification:
    Source ::= Selector | Join
    Selector ::= nodeTypeName ['AS' selectorName]
    nodeTypeName ::= Name
    selectorName ::= /* A string that contains only SQL-legal characters, and which
                        can be used elsewhere in the query to refer to the selector. */
7.1.2.3. Joins
The JCR 2.0 specification does include joins in the standard JCR-SQL2 grammar, though the only types of joins it defines are inner, left outer, and right outer joins. Because SQL also defines the useful full outer and cross join types, the hierarchical database adds support for these.
    Join ::= left [JoinType] 'JOIN' right 'ON' JoinCondition
             /* If JoinType is omitted INNER is assumed. */
    left ::= Source
    right ::= Source
    JoinType ::= Inner | LeftOuter | RightOuter | FullOuter | Cross
    Inner ::= 'INNER' ['JOIN']
    LeftOuter ::= 'LEFT JOIN' | 'OUTER JOIN' | 'LEFT OUTER JOIN'
    RightOuter ::= 'RIGHT OUTER' ['JOIN']
    FullOuter ::= 'FULL OUTER' ['JOIN']
    Cross ::= 'CROSS' ['JOIN']
    JoinCondition ::= EquiJoinCondition |
                      SameNodeJoinCondition |
                      ChildNodeJoinCondition |
                      DescendantNodeJoinCondition
Each of the four kinds of join conditions is described below.
- Equi-join condition
- An equi-join is a join that uses only equality comparisons in the join predicate (or join condition). Using any other operators (e.g., '<' or '!=') in the join condition disqualifies a query from being an equi-join. Therefore, the rules for the equi-join condition are as follows:
    EquiJoinCondition ::= selector1Name'.'property1Name '=' selector2Name'.'property2Name
    selector1Name ::= selectorName
    selector2Name ::= selectorName
    property1Name ::= propertyName
    property2Name ::= propertyName
    propertyName ::= Name
  where the node type referenced by the selector identified in the query with the selector1Name must contain the property given by the property1Name literal, and similarly the node type referenced by the selector identified in the query with the selector2Name must contain the property given by the property2Name literal. See also the "name rule".
- Same-node join condition
- An identity join is a special case of an equi-join, where the compared properties are node identifiers. Thus the join condition of an identity join constrains the node on one side of the join to be the same node as on the other side of the join. The standard JCR-SQL2 grammar defines a special function that makes this a little easier to use:
    SameNodeJoinCondition ::= 'ISSAMENODE(' selector1Name ',' selector2Name [',' selector2Path] ')'
    selector1Name ::= selectorName
    selector2Name ::= selectorName
    selector2Path ::= Path
  See also the "path rule".
- Child-node join condition
- A child-node join is one where the join condition constrains the node on the left side of the join to be a child of the node on the right side of the join. The standard JCR-SQL2 grammar defines a special function that makes it easier to specify such join conditions:
    ChildNodeJoinCondition ::= 'ISCHILDNODE(' childSelectorName ',' parentSelectorName ')'
    childSelectorName ::= selectorName
    parentSelectorName ::= selectorName
- Descendant-node join condition
- A descendant-node join is one where the join condition constrains the node on the left side of the join to be a descendant of the node on the right side of the join. The standard JCR-SQL2 grammar defines a special function that makes it easier to specify such join conditions:
    DescendantNodeJoinCondition ::= 'ISDESCENDANTNODE(' descendantSelectorName ',' ancestorSelectorName ')'
    descendantSelectorName ::= selectorName
    ancestorSelectorName ::= selectorName
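The child-node and descendant-node join conditions above relate the nodes of two selectors by their position in the hierarchy. One way to picture the relationships is as a comparison of node paths. The following sketch is illustrative only: it assumes absolute, normalized paths with no trailing slash and no same-name-sibling indexes, and its class and method names are hypothetical.

```java
final class PathRelationSketch {

    /** True when the node at childPath is a direct child of the node at parentPath. */
    static boolean isChildNode(String childPath, String parentPath) {
        if (childPath.equals("/")) {
            return false; // the root node has no parent
        }
        int lastSlash = childPath.lastIndexOf('/');
        String actualParent = lastSlash == 0 ? "/" : childPath.substring(0, lastSlash);
        return actualParent.equals(parentPath);
    }

    /** True when the node at descendantPath lies anywhere below the node at ancestorPath. */
    static boolean isDescendantNode(String descendantPath, String ancestorPath) {
        if (descendantPath.equals(ancestorPath)) {
            return false; // a node is not its own descendant
        }
        String prefix = ancestorPath.equals("/") ? "/" : ancestorPath + "/";
        return descendantPath.startsWith(prefix);
    }

    public static void main(String[] args) {
        System.out.println(isChildNode("/foo/bar", "/foo"));
        System.out.println(isDescendantNode("/foo/bar/baz", "/foo"));
        System.out.println(isDescendantNode("/foobar", "/foo"));
    }
}
```

Appending the separator before the prefix test keeps a sibling such as /foobar from being mistaken for a descendant of /foo.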
7.1.2.4. Constraints
The "query rule" included a WHERE clause that can define multiple constraints on the nodes included in the results. The standard JCR-SQL2 grammar defines several such constraints, including and, or, not, comparison, property existence, full-text search, same-node, child-node, and descendant-node constraints. The hierarchical database supports all of these, but adds two others: between and set constraints.
    Constraint ::= ConstraintItem | '(' ConstraintItem ')'
    ConstraintItem ::= And | Or | Not | Comparison | Between | PropertyExistence |
                       SetConstraint | FullTextSearch | SameNode | ChildNode | DescendantNode
Each of these types of constraints is described below.
- And constraint
- An and constraint stipulates that a node (or record or tuple) is included only if two other constraints are both true.
    And ::= constraint1 'AND' constraint2
    constraint1 ::= Constraint
    constraint2 ::= Constraint
- Or constraint
- An or constraint stipulates that a node (or record or tuple) is included if either of two other constraints is true.
    Or ::= constraint1 'OR' constraint2
    constraint1 ::= Constraint
    constraint2 ::= Constraint
- Not constraint
- The not qualifier negates another constraint, requiring that a node (or record or tuple) is included only if the other constraint is not true.
    Not ::= 'NOT' constraint
    constraint ::= Constraint
- Comparison constraint
- A comparison constraint requires that the value for a node described by the dynamic operand on the left side of the operator be compared to a static literal value on the right side. The term "dynamic operand" is used in the JCR-SQL2 grammar because its value can only be determined during query evaluation.
    Comparison ::= DynamicOperand Operator StaticOperand
    Operator ::= '=' | '!=' | '<' | '<=' | '>' | '>=' | 'LIKE' | 'NOT LIKE'
  The behavior of the operators is dictated by the JCR 2.0 specification and matches how Value objects are compared:
  - If the DynamicOperand evaluates to null, the constraint is not satisfied.
  - If the '=' operator is used, the value that the DynamicOperand evaluates to must equal the StaticOperand value for the constraint to be satisfied.
  - If the '!=' operator is used, the value that the DynamicOperand evaluates to must not equal the StaticOperand value for the constraint to be satisfied.
  - If the '<' operator is used, the value that the DynamicOperand evaluates to must be less than the StaticOperand value for the constraint to be satisfied.
  - If the '<=' operator is used, the value that the DynamicOperand evaluates to must be less than or equal to the StaticOperand value for the constraint to be satisfied.
  - If the '>' operator is used, the value that the DynamicOperand evaluates to must be greater than the StaticOperand value for the constraint to be satisfied.
  - If the '>=' operator is used, the value that the DynamicOperand evaluates to must be greater than or equal to the StaticOperand value for the constraint to be satisfied.
  - If the 'LIKE' operator is used, the constraint is only satisfied if the value that the DynamicOperand evaluates to matches the pattern specified by the string literal StaticOperand, where in the pattern:
    - the character '%' matches zero or more characters, and
    - the character '_' (underscore) matches exactly one character, and
    - the string '\x' matches the character 'x', and
    - all other characters match themselves.
  - If the 'NOT LIKE' operator is used, the constraint is only satisfied if the value that the DynamicOperand evaluates to does not match the pattern specified by the string literal StaticOperand, where the pattern follows the same rules as for 'LIKE'.
  Also, note that, unlike SQL, the standard JCR-SQL2 grammar does not allow the left-hand and right-hand sides of a comparison constraint to be swapped.
- Between constraint
- The between constraint is one of the extensions defined by the hierarchical database, and allows a query to more easily represent a range of static values than using only the constraints available in the standard JCR-SQL2 grammar. The between constraint is based on the similar expression in SQL.
    Between ::= DynamicOperand ['NOT'] 'BETWEEN' lowerBound ['EXCLUSIVE'] 'AND' upperBound ['EXCLUSIVE']
    lowerBound ::= StaticOperand
    upperBound ::= StaticOperand
- Property existence constraint
- A property existence constraint stipulates that a property does indeed exist on a node that is of the node type specified by the named selector. The hierarchical database also allows the NOT qualifier to be omitted (IS NULL), which turns the constraint into a stipulation that the property does not exist on the node.
    PropertyExistence ::= [selectorName'.']propertyName 'IS' ['NOT'] 'NULL'
                          /* If only one selector exists in this query, explicit specification of
                             the selectorName preceding the propertyName is optional */
- Set constraint
- Like the "between constraint", the set constraint is an extension to the standard JCR-SQL2 grammar that allows what would normally be a complicated combination of standard JCR-SQL2 constraints to be more easily represented with a single, simple expression. Again, this constraint is patterned after the similar expression in SQL.
    SetConstraint ::= [selectorName '.']propertyName ['NOT'] 'IN'
                      '(' firstStaticOperand {',' additionalStaticOperand } ')'
                      /* If only one selector exists in this query, explicit specification of
                         the selectorName preceding the propertyName is optional */
    firstStaticOperand ::= StaticOperand
    additionalStaticOperand ::= StaticOperand
  Note that multiple static operands can be included in the comma-separated list. Although this rule seems complicated, it is actually very straightforward. The following query selects all the properties defined on the acme:taggable node type, returning only those "taggable" nodes with an acme:tagName value of "tag1", "tag2", "tag3", or "tag4":
    SELECT * FROM [acme:taggable] AS tagged
    WHERE tagged.[acme:tagName] IN ('tag1','tag2','tag3','tag4')
  Even this trivial query is quite a bit simpler and easier to understand than if the query had used only the constraints defined by the standard JCR-SQL2 grammar:
    SELECT * FROM [acme:taggable] AS tagged
    WHERE tagged.[acme:tagName] = 'tag1'
       OR tagged.[acme:tagName] = 'tag2'
       OR tagged.[acme:tagName] = 'tag3'
       OR tagged.[acme:tagName] = 'tag4'
  Imagine how complicated a query might be with multiple joins, multiple criteria, and many values to be compared for one or several different properties.
- Full-text search constraint
-
    FullTextSearch ::= 'CONTAINS(' ([selectorName'.']propertyName | selectorName'.*') ','
                       ''' fullTextSearchExpression''' ')'
                       /* If only one selector exists in this query, explicit specification of
                          the selectorName preceding the propertyName is optional */
    fullTextSearchExpression ::= FulltextSearch
  The full-text search expression is a string literal that adheres to the "full-text search" grammar described below. An example query selects all the properties defined on the acme:taggable node type, returning only those "taggable" nodes with an acme:tagName value that contains the "foo" term within the value:
    SELECT * FROM [acme:taggable] AS tagged
    WHERE CONTAINS(tagged.[acme:tagName],'foo')
- Same-node constraint
- The same-node constraint stipulates that the node appearing in the selector with the given name has a path that matches the literal path provided.
    SameNode ::= 'ISSAMENODE(' [selectorName ','] Path ')'
                 /* If only one selector exists in this query, explicit specification of
                    the selectorName is optional */
  Because this standard constraint clause is not really like traditional SQL, the hierarchical database defines a jcr:path "pseudo-column" that can be used in "comparison constraints" and that allows for using other comparison operators, including LIKE.
- Child-node constraint
- The child-node constraint stipulates that the node appearing in the selector with the given name is a child of a node with a path that matches the literal path provided.
    ChildNode ::= 'ISCHILDNODE(' [selectorName ','] Path ')'
                  /* If only one selector exists in this query, explicit specification of
                     the selectorName is optional */
  See also the jcr:path "pseudo-column" that can be used in "comparison constraints" and that allows for using other comparison operators, including LIKE. Because the right-hand side (i.e., static operand) of a LIKE expression can involve wildcards, it may be easier and more understandable to use the pseudo-column.
- Descendant-node constraint
- The descendant-node constraint stipulates that the node appearing in the selector with the given name is a descendant of a node with a path that matches the literal path provided.
    DescendantNode ::= 'ISDESCENDANTNODE(' [selectorName ','] Path ')'
                       /* If only one selector exists in this query, explicit specification of
                          the selectorName is optional */
  See also the jcr:path "pseudo-column" that can be used in "comparison constraints" and that allows for using other comparison operators, including LIKE. Because the right-hand side (i.e., static operand) of a LIKE expression can involve wildcards, it may be easier and more understandable to use the pseudo-column.
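The LIKE pattern rules listed under the comparison constraint ('%', '_', and the '\x' escape) map naturally onto a regular expression. The sketch below shows one possible translation for illustration; it is not the matching code used by the hierarchical database, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class LikePatternSketch {

    /** Translates a LIKE pattern into an equivalent java.util.regex.Pattern. */
    static Pattern toRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%') {
                regex.append(".*");                 // zero or more characters
            } else if (c == '_') {
                regex.append('.');                  // exactly one character
            } else if (c == '\\' && i + 1 < likePattern.length()) {
                // '\x' matches the character 'x' literally
                regex.append(Pattern.quote(String.valueOf(likePattern.charAt(++i))));
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // matches itself
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** LIKE: the whole value must match the pattern. */
    static boolean like(String value, String likePattern) {
        return toRegex(likePattern).matcher(value).matches();
    }

    /** NOT LIKE: the negation of LIKE. */
    static boolean notLike(String value, String likePattern) {
        return !like(value, likePattern);
    }

    public static void main(String[] args) {
        System.out.println(like("/foo/bar/baz", "/foo/%"));
        System.out.println(like("tag12", "tag_"));
        System.out.println(like("100%", "100\\%"));
    }
}
```

Every literal character is quoted before it reaches the regular expression, so characters such as '.' or '[' in a path or name keep their literal meaning.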
7.1.2.5. Path and Name
Many of the rules above have used paths and names, and the rules for these are defined as follows:
    Name ::= '[' quotedName ']' | '[' simpleName ']' | simpleName
    quotedName ::= /* A JCR Name (see the JCR specification) */
    simpleName ::= /* A JCR Name that contains only SQL-legal characters
                      (namely letters, digits, and underscore) */
    Path ::= '[' quotedPath ']' | '[' simplePath ']' | simplePath
    quotedPath ::= /* A JCR Path that contains non-SQL-legal characters */
    simplePath ::= /* A JCR Path (rather Name) that contains only SQL-legal
                      characters (namely letters, digits, and underscore) */
Note that JCR-SQL2 surrounds identifiers with square brackets (i.e., '[' and ']'), allowing names to contain the ':' character needed with namespaced names. If the names or paths only contain valid SQL characters, then they do not need to be quoted.
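An application that builds query strings can apply the quoting rule mechanically. The helper below is a minimal sketch, assuming "SQL-legal characters" means letters, digits, and underscore as stated in the grammar comments; the class and method names are hypothetical.

```java
final class NameQuotingSketch {

    /** True when the name contains only letters, digits, and underscores. */
    static boolean isSimpleName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_')) {
                return false;
            }
        }
        return true;
    }

    /** Wraps the name in square brackets unless it is a simple name. */
    static String quoteIfNeeded(String name) {
        return isSimpleName(name) ? name : "[" + name + "]";
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("prop1"));
        System.out.println(quoteIfNeeded("acme:tagName"));
    }
}
```

Because the grammar also accepts a bracketed simple name, always adding the brackets would be an equally valid and simpler choice.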
7.1.2.6. Static Operand
In the standard JCR-SQL2 grammar, a static operand appears on the right-hand side of an operator, and represents an expression whose value can be determined by static analysis of the query (e.g., when the query is parsed). In particular, a static operand in the standard JCR-SQL2 grammar is either a literal value or a variable.
In SQL, however, the expression that appears on the right-hand side of an operator is not always able to be determined at query parse time. An example is a subquery, which appears on the right hand side but obviously can only be evaluated into values during query execution time. Since standard JCR-SQL2 does not include any such features, the term "static operand" is technically valid.
In addition to literal values and variables, the hierarchical database also supports "subqueries" appearing on the right-hand side of an operator. So this grammar continues to use the "static operand" term for easy comparison with the standard JCR-SQL2 grammar, but the term has a different (and expanded) semantic than in the standard grammar.
Therefore, the rules for what the hierarchical database allows on the right-hand side of an operator in a constraint are as follows:
    StaticOperand ::= Literal | BindVariableValue | Subquery
    Literal ::= CastLiteral | UncastLiteral
    CastLiteral ::= 'CAST(' UncastLiteral ' AS ' PropertyType ')'
    PropertyType ::= 'STRING' | 'BINARY' | 'DATE' | 'LONG' | 'DOUBLE' | 'DECIMAL' |
                     'BOOLEAN' | 'NAME' | 'PATH' | 'REFERENCE' | 'WEAKREFERENCE' | 'URI'
    UncastLiteral ::= UnquotedLiteral | ''' UnquotedLiteral ''' | '"' UnquotedLiteral '"'
    UnquotedLiteral ::= /* String form of a JCR Value, as defined in the JCR specification */
- Bind variable
- The standard JCR-SQL2 grammar supports using variable names within a query, where the values for those variables are bound to the Query object before execution. In the query, the variable names are prefixed with a '$' character and are otherwise normal JCR names:
    BindVariableValue ::= '$'bindVariableName
    bindVariableName ::= /* A string that conforms to the JCR Name syntax, though the prefix
                            does not need to be a registered namespace prefix. */
  So, consider this simple query that selects all the properties defined on the acme:taggable node type, and that returns only those "taggable" nodes with an acme:tagName value that matches the value of the tagValue variable:
    SELECT * FROM [acme:taggable] AS tagged
    WHERE tagged.[acme:tagName] = $tagValue
  This query could be evaluated using the JCR API as follows:
    // Obtain the query manager for the session via the workspace ...
    javax.jcr.Session session = // ...
    javax.jcr.query.QueryManager mgr = session.getWorkspace().getQueryManager();

    // Create a query object ...
    String language = ...
    String expression = ...
    javax.jcr.query.Query query = mgr.createQuery(expression, language);

    // Bind a value to the variable ...
    javax.jcr.Value tag = session.getValueFactory().createValue("foo");
    query.bindValue("tagValue", tag);

    // Execute the query and get the results ...
    javax.jcr.query.QueryResult result = query.execute();
  Multiple variables can be used in a query expression, but a value must be bound to every variable before the Query object can be executed.
- Subquery
- The standard JCR-SQL2 grammar does not support subqueries. Because subqueries are such a useful feature, the hierarchical database supports using multiple subqueries within a single query. In fact, a subquery is nothing more than a QueryCommand, which is the top-level rule in the grammar. That means that subqueries can be any query, and you can even include subqueries within a subquery.
    Subquery ::= '(' QueryCommand ')' | QueryCommand
  Strictly speaking, the hierarchical database only supports non-correlated subqueries, which means that they can be evaluated independently (outside the context of the containing query). Additionally, because subqueries appear on the right-hand side of an operator, all subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., IN (...)), then all of the subquery's rows will be used. For example, in the following query fragment, the first value in each row of the subquery's results will be used within the IN clause of the outer query:
    WHERE ... [my:type].[prop1] IN (
        SELECT [my:prop2] FROM [my:type2] WHERE [my:prop3] < '1000' ) AND ...
  However, changing the IN clause to a comparison results in only the first value in the first row of the subquery's results being used in the comparison criteria:
    WHERE ... [my:type].[prop1] = (
        SELECT [my:prop2] FROM [my:type2] WHERE [my:prop3] < '1000' ) AND ...
7.1.2.7. Dynamic Operand
In various constraints described above, the dynamic operand appears on the left-hand side of an operator, and signifies that the values can only be determined when the query is evaluated.
The standard JCR-SQL2 grammar defines seven kinds of dynamic operands: property value, length, node name, node local name, full-text search score, lowercase, and uppercase.
The hierarchical database supports all these types, but adds support for four more: reference value, node path, node depth, and simple arithmetic clauses. The hierarchical database also allows the dynamic operand to be surrounded by parentheses, which is sometimes convenient for complex queries.
The DynamicOperand rule in the extended grammar is:
    DynamicOperand ::= PropertyValue | ReferenceValue | Length | NodeName | NodeLocalName |
                       NodePath | NodeDepth | FullTextSearchScore | LowerCase | UpperCase |
                       Arithmetic | '(' DynamicOperand ')'
Each of these types of dynamic operands is described below.
- Property value operand
- The property value operand always evaluates to the value(s) of the specified property on the selector.
    PropertyValue ::= [selectorName'.'] propertyName
                      /* If only one selector exists in this query, explicit specification of
                         the selectorName preceding the propertyName is optional. */
  Note that if the property is multi-valued, the constraint will be satisfied if any of the property values satisfies the constraint. For example, if the acme:tagNames property is a multi-valued property declared on the acme:taggable node type, then the following query finds all acme:taggable nodes that have "foo" for at least one of the values of the acme:tagNames property:
    SELECT * FROM [acme:taggable] AS tagged
    WHERE tagged.[acme:tagNames] = 'foo'
- Reference value operand
- One of the extensions is support for the REFERENCE(...) dynamic operand, which enables placing constraints on one or any of the reference properties.
    ReferenceValue ::= 'REFERENCE(' selectorName '.' propertyName ')' |
                       'REFERENCE(' selectorName ')' |
                       'REFERENCE()'
                       /* If only one selector exists in this query, explicit specification of
                          the selectorName preceding the propertyName is optional. Also, the
                          property name may be excluded if the constraint should apply to any
                          reference property. */
  The REFERENCE operand always evaluates to the identifier of the referenced nodes in one or all of the REFERENCE properties. Thus, all of the REFERENCE operands should be used with a StaticOperand that also evaluates to identifiers. The REFERENCE() operand (with no selector name and no property name) evaluates to the identifiers of the nodes referenced by all of the reference properties on the node in the only selector. The REFERENCE(selectorName) operand works the same way, but must be used if there is more than one selector in the query. Finally, the REFERENCE(selectorName.propertyName) operand evaluates to the identifiers of nodes referenced by the propertyName reference property on the nodes in the named selector. For example, here is a query that finds all nodes that reference a set of nodes for which we already know the identifiers, id1, id2, and id3:
    SELECT * FROM [nt:base]
    WHERE REFERENCE() IN ('id1','id2','id3')
  This operand works really well with subqueries or variables for the right-hand side. For example, here is a query that finds all nodes that reference any of the nodes in the subgraph below the /foo/bar/baz node, where a subquery is used to find all nodes in the subgraph:
    SELECT * FROM [nt:base]
    WHERE REFERENCE() IN (
        SELECT [jcr:uuid] FROM [nt:base] AS refd
        WHERE ISDESCENDANTNODE(refd,'/foo/bar/baz') )
  This kind of query is impossible to do using standard JCR-SQL2 features, and shows some of the power of the extensions to JCR-SQL2.
- Length operand
- The length operand evaluates to the length (or lengths, if multi-valued) of a property. The length is defined to be:
  - for a BINARY value, the number of bytes in the value, or
  - for all other value types, the number of characters of the string resulting from a conversion of the value to a string.
  The rule for the length operand is:
    Length ::= 'LENGTH(' PropertyValue ')'
- Node name operand
- The node name operand always evaluates to the prefixed name of the node given by the supplied selector:
    NodeName ::= 'NAME(' [selectorName] ')'
                 /* If only one selector exists in this query, explicit specification of
                    the selectorName is optional */
  See also the jcr:name "pseudo-column", which enables accessing the JCR name of any node as if the name were a regular property on any node.
- Node local name operand
- The node local name operand always evaluates to the local name of the node given by the supplied selector:
    NodeLocalName ::= 'LOCALNAME(' [selectorName] ')'
                      /* If only one selector exists in this query, explicit specification of
                         the selectorName is optional */
  See also the mode:localName "pseudo-column", which enables accessing the local name of any node as if the local name were a regular property.
- Node depth operand
- The node depth operand is an extension to the standard set of dynamic operands specific to the hierarchical database, and evaluates to the integer depth of the node given by the supplied selector. The depth of a node is defined to be the number of segments in the node's path. For example, the depth of the root node is 0, whereas the depth of the node at /foo/bar/baz is 3.
    NodeDepth ::= 'DEPTH(' [selectorName] ')'
                  /* If only one selector exists in this query, explicit specification of
                     the selectorName is optional */
  See also the mode:depth "pseudo-column", which enables accessing the depth of any node as if the depth were a regular property.
- Node path operand
- The node path operand is an extension to the standard set of dynamic operands specific to the hierarchical database, and evaluates to the path of the node given by the supplied selector.
    NodePath ::= 'PATH(' [selectorName] ')'
                 /* If only one selector exists in this query, explicit specification of
                    the selectorName is optional */
  See also the jcr:path "pseudo-column", which enables accessing the path of any node as if the path were a regular property.
- Full-text search score operand
- The full-text search score operand evaluates to a DOUBLE value equal to the full-text search score of a node. The full-text search score ranks a selector's nodes by their relevance to the fullTextSearchExpression specified in a full-text search constraint. The magnitude of the scores is implementation specific, but most implementations will produce higher scores for more relevant matches and lower scores for less relevant matches.
    FullTextSearchScore ::= 'SCORE(' [selectorName] ')'
                            /* If only one selector exists in this query, explicit specification of
                               the selectorName is optional */
  See also the jcr:score "pseudo-column", which enables accessing the score of any node as if the score were a regular property.
- Lowercase operand
- The lowercase operand evaluates to the lower-case string value (or values, if multi-valued) of the operand. If the operand does not evaluate to a string value, its value is first converted to a string.
    LowerCase ::= 'LOWER(' DynamicOperand ')'
- Uppercase operand
- The uppercase operand evaluates to the upper-case string value (or values, if multi-valued) of operand. If the operand does not evaluate to a string value, its value is first converted to a string.
UpperCase ::= 'UPPER(' DynamicOperand ')'
- Arithmetic operand
- The arithmetic operand is an extension to the standard JCR-SQL2 grammar specific to the hierarchical database. It allows two other dynamic operands that evaluate to numeric values to be numerically combined using addition, subtraction, multiplication, or division.
Arithmetic ::= DynamicOperand ('+'|'-'|'*'|'/') DynamicOperand
For example, the following query restricts the results such that the sum of the scores of nodes originating from separate selectors is greater than 1.0:
SELECT * FROM [acme:type1] AS type1 JOIN [acme:type2] as type2 ON type1.prop1 < type2.prop2 WHERE SCORE(type1) + SCORE(type2) > 1.0
Although an arithmetic operand can be used in the WHERE
clause, it is more likely to be used in ORDER BY clauses. For example, the following query orders the results based upon the difference in the scores of nodes in the two selectors:
SELECT * FROM [acme:type1] AS type1 JOIN [acme:type2] as type2 ON type1.prop1 < type2.prop2 ORDER BY ( SCORE(type1) - SCORE(type2) ) ASC, LENGTH(type2.prop3) DESC
7.1.2.8. Ordering
The
ORDER BY
clause defined by the standard JCR-SQL2 grammar allows the order of the results to be dictated by the values evaluated at execution time based upon one or more "dynamic operands" . The rule for the expression is as follows:
orderings ::= Ordering {',' Ordering}
Ordering ::= DynamicOperand [Order]
Order ::= 'ASC' | 'DESC'
As with SQL, the
ASC
qualifier specifies that the ordering should be in ascending order, and is the default; likewise, the DESC
qualifier specifies that the ordering should be in descending order.
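The effect of several orderings applied together can be sketched in Python. This is only an illustration of the semantics, not how the query engine is implemented; the row dictionaries and the order_rows helper are hypothetical.

```python
def order_rows(rows, orderings):
    """Sort rows by a list of (key_function, direction) pairs, most significant first."""
    ordered = list(rows)
    # Python's sort is stable, so applying the keys from least to most
    # significant gives the same result as one multi-key ORDER BY.
    for key, direction in reversed(orderings):
        ordered.sort(key=key, reverse=(direction == "DESC"))
    return ordered


# Hypothetical result rows; in a real query the values come from dynamic operands.
rows = [
    {"path": "/a", "score": 0.5, "prop3": "xx"},
    {"path": "/b", "score": 0.9, "prop3": "x"},
    {"path": "/c", "score": 0.5, "prop3": "xxxx"},
]

# Similar in spirit to: ORDER BY SCORE() ASC, LENGTH(prop3) DESC
result = order_rows(
    rows,
    [(lambda r: r["score"], "ASC"), (lambda r: len(r["prop3"]), "DESC")],
)
print([r["path"] for r in result])
```

Rows with equal scores are ordered by the second key, which is why the longer prop3 value comes first among the two rows scoring 0.5.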
7.1.2.9. Columns
The standard JCR-SQL2 grammar allows a query to include in the
SELECT
clause which property values should be returned and included in the results:
columns ::= (Column {',' Column}) | '*'
Column ::= ([selectorName'.']propertyName ['AS' columnName]) | (selectorName'.*') /* If only one selector exists in this query, explicit specification of the selectorName is optional */
selectorName ::= Name
propertyName ::= Name
columnName ::= Name
When '*' is used for the list of selected columns, the result set is expected to minimally include, for each selector, a column for each single-valued non-residual property of the selector's node type, including those explicitly declared on the node type and those inherited from the node type's supertypes.
For example, the result set for the following query would contain at least the
[jcr:primaryType]
column, since it is the only single-valued, non-residual property defined on the [nt:base]
node type. The [jcr:mixinTypes]
property is also non-residual, but the results need not include it since it is multi-valued.
SELECT * FROM [nt:base]
If there are multiple selectors, then
SELECT *
will include all of the selectable columns from each selector's node type. However, it is possible to request all of the selectable columns from only some of the selectors, using the selectorName.* form. For example:
SELECT type1.* FROM [acme:type1] AS type1 JOIN [acme:type2] as type2 ON type1.prop1 < type2.prop2
Note, however, that although only single-valued, non-residual properties are included when '*' is used in the SELECT
clause, it is possible to explicitly include residual properties. For example, the following query finds all nodes that have at least one "foo" value for the acme:tagNames
property:
SELECT [acme:tagNames] AS tagName FROM [nt:base] WHERE tagName = 'foo'
7.1.2.10. Limit and Offset
Neither the standard JCR-SQL2 grammar nor the JCR API itself provides support for limiting the rows that are returned in the results. This is a common need, especially for applications that paginate the results, where each page shows a subset of the results.
Because this is such an essential feature that cannot be accomplished any other way, the hierarchical database adds support for specifying the maximum number of rows to return, and optionally specifying the number of initial rows that should be skipped. This extension follows the SQL syntax:
Limit ::= 'LIMIT' count [ 'OFFSET' offset ]
count ::= /* Positive integer value */
offset ::= /* Non-negative integer value */
The
LIMIT
clause is entirely optional, and if absent does not limit the result set rows in any way. However, if the LIMIT count
clause is used, then the result set will contain no more than count
rows. This LIMIT
clause may optionally be followed by the OFFSET number
clause, where number
is the number of initial rows that should be skipped before the rows are included in the results.
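For the pagination use case mentioned above, an application typically derives the LIMIT and OFFSET values from a page number and a page size. The following Python sketch shows one way to build such a query string; the page_query helper and its zero-based page numbering are assumptions made for illustration.

```python
def page_query(base_query, page, page_size):
    """Append LIMIT/OFFSET clauses that select one zero-based page of results."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    query = f"{base_query} LIMIT {page_size}"
    offset = page * page_size
    if offset:
        # OFFSET is optional in the grammar, so it is omitted for the first page.
        query += f" OFFSET {offset}"
    return query


first = page_query("SELECT * FROM [nt:file]", page=0, page_size=25)
third = page_query("SELECT * FROM [nt:file]", page=2, page_size=25)
print(first)
print(third)
```

Pagination is only stable if the query also has an ORDER BY clause, since otherwise the order of rows between two executions is not guaranteed.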
7.1.2.11. Pseudo-Columns
The design of the JCR-SQL2 query language makes fairly heavy use of functions, including
SCORE()
, NAME()
, LOCALNAME()
, and various constraints. The hierarchical database provides several more useful functions, such as PATH()
and DEPTH()
, that follow the same patterns.
However, these functions have several disadvantages. First, they make the JCR-SQL2 language less "SQL-like", since SQL-92 and -99 do not define similar kinds of functions. (There are aggregate functions, like
COUNT
, SUM
, etc., but they operate on a particular column in all tuples and are therefore more dissimilar than similar.) This means that applications that use SQL and SQL-like query languages are less likely to be able to build and issue JCR-SQL2 queries.
A second disadvantage of these functions is that JCR-SQL2 does not allow them to be used within the
SELECT
clause. As a result, the location-related and score information cannot be included as columns of values in the QueryResult
rows. Instead, a client can only access this information by obtaining the Node
object(s) for each row. Relying upon both the result set and additional Java objects makes it difficult to use the JCR query system. It also makes certain kinds of applications impossible.
For example, the hierarchical database's JDBC driver is designed to enable JDBC-aware applications to query repository content using JCR-SQL2 queries. The standard JDBC API cannot expose the
Node
objects, so the only way to return the path-related and score information is through additional columns in the result. While such columns could always "magically" appear in the result set, doing this is not compatible with JDBC applications that dynamically build the SELECT
clauses of queries based upon database metadata. Such applications require the columns to be properly described in database metadata, and the columns need to be used within queries.
The hierarchical database attempts to solve these issues by directly supporting a number of "pseudo-columns" within JCR-SQL2 queries, wherever columns can be used. These "pseudo-columns" include:
- jcr:score is a column of type DOUBLE that represents the full-text search score of the node, which is a measure of the node's relevance to the full-text search expression. The hierarchical database computes scores for all queries, though the score for rows in queries that do not include a full-text search criterion may not be reliable.
- jcr:path is a column of type PATH that represents the normalized path of a node, including same-name-sibling indexes. This is the same as what would be returned by the getPath() method of Node. Examples of paths include "/jcr:system" and "/foo/bar[3]".
- jcr:name is a column of type NAME that represents the node name in its namespace-qualified form using namespace prefixes and excluding same-name-sibling indexes. Examples of node names include "jcr:system", "jcr:content", "ex:UserData", and "bar".
- mode:localName is a column of type STRING that represents the local name of the node, which excludes the namespace prefix and same-name-sibling index. As an example, the local name of the "jcr:system" node is "system", while the local name of the "ex:UserData[3]" node is "UserData".
- mode:depth is a column of type LONG that represents the depth of a node, which corresponds exactly to the number of path segments within the path. For example, the depth of the root node is 0, whereas the depth of the "/jcr:system/jcr:nodeTypes" node is 2.
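The relationship between a node's path and the location-based pseudo-columns can be sketched in Python. This is an illustration of the definitions above rather than the database's implementation. The pseudo_columns helper is hypothetical, and it assumes the standard JCR notation in which a same-name-sibling index is written in square brackets after the name.

```python
import re

# One path segment: optional "prefix:", a local name, optional "[index]".
SEGMENT = re.compile(
    r"^(?:(?P<prefix>[^:\[\]/]+):)?(?P<local>[^:\[\]/]+)(?:\[(?P<index>\d+)\])?$"
)


def pseudo_columns(path):
    """Derive the location-based pseudo-column values from an absolute path."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        # Root node: no segments, so the names are empty and the depth is 0.
        return {"jcr:path": "/", "jcr:name": "", "mode:localName": "", "mode:depth": 0}
    match = SEGMENT.match(segments[-1])
    if match is None:
        raise ValueError(f"unsupported path segment: {segments[-1]!r}")
    prefix, local = match.group("prefix"), match.group("local")
    return {
        "jcr:path": path,
        # The name keeps the namespace prefix but drops the sibling index.
        "jcr:name": f"{prefix}:{local}" if prefix else local,
        # The local name drops both the prefix and the sibling index.
        "mode:localName": local,
        # The depth is the number of path segments.
        "mode:depth": len(segments),
    }


print(pseudo_columns("/jcr:system/jcr:nodeTypes"))
print(pseudo_columns("/foo/ex:UserData[3]"))
```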
All of these pseudo-columns can be used in the
SELECT
clause of any JCR-SQL2 query, and their use defines whether such columns appear in the result set. In fact, all of these pseudo-columns will be included when SELECT *
clauses in JCR-SQL2 queries are expanded by the query engine. This means that every node type (even mixin node types that have no properties and are essentially markers) is represented by a queryable table with at least one column. However, unlike the older JCR-SQL query language, these pseudo-columns are never included in the result unless they are named explicitly or included implicitly with the SELECT *
clause.
Note
Why did the hierarchical database use the
jcr
namespace prefix for some of the pseudo-columns, and mode
for the others? The older JCR-SQL language defined the jcr:score
, jcr:path
, and jcr:name
pseudo-columns, so we use the same names. The other columns were unique to the hierarchical database and are therefore defined with the mode
namespace prefix.
Like any other column, all of these pseudo-columns can also be used in the
WHERE
clause of any JCR-SQL2 query, even if they are not included in the SELECT
clause. They can be used anywhere that a regular column can be used, including within constraints and dynamic operands. The hierarchical database will automatically rewrite queries that use pseudo-columns in the dynamic operands of constraints to use the corresponding function, such as SCORE()
, PATH()
, NAME()
, LOCALNAME()
, and DEPTH()
. Additionally, any property existence constraint using these pseudo-columns will always evaluate to 'true' (and thus the hierarchical database's query optimizer will always remove such constraints from the query plan).
The
jcr:path
pseudo-column may also be used on both sides of an "equijoin" constraint clause. For example, equijoin expressions similar to:
... selector1.[jcr:path] = selector2.[jcr:path] ...
will be automatically rewritten by the hierarchical database's optimizer to the following form:
... ISSAMENODE(selector1,selector2) ...
As with regular columns, the pseudo-columns must be qualified with the selector name if the query contains more than one selector.
7.1.3. Full-text Search Grammar
The grammar for the full-text search expressions used in JCR-SQL2's "full-text search constraint" is as follows:
FulltextSearch ::= Disjunct {Space 'OR' Space Disjunct}
Disjunct ::= Term {Space Term}
Term ::= ['-'] SimpleTerm
SimpleTerm ::= Word | '"' Word {Space Word} '"'
Word ::= NonSpaceChar {NonSpaceChar}
Space ::= SpaceChar {SpaceChar}
NonSpaceChar ::= Char - SpaceChar /* Any Char except SpaceChar */
SpaceChar ::= ' '
Char ::= /* Any character */
This grammar supports expressions similar to what you might provide to an Internet search engine. It lists the terms or phrases that should appear (or not appear) in the applicable property value(s). Simple terms consist of a single word (with only non-space characters), while phrases can be surrounded with double quotes.
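To make the grammar more concrete, the following Python sketch parses an expression into disjuncts and terms and evaluates it against a piece of text. The matching rules are assumptions chosen for illustration (case-insensitive comparison of whole whitespace-separated words, with all terms of a disjunct required). The actual matching and scoring performed by the database's search engine may differ.

```python
import re

# A token is an optionally negated quoted phrase, or any run of non-space characters.
TOKEN = re.compile(r'-?"[^"]*"|\S+')


def parse(expression):
    """Split an expression into disjuncts, each a list of (negated, words) terms."""
    disjuncts = [[]]
    for token in TOKEN.findall(expression):
        if token == "OR":
            disjuncts.append([])
            continue
        negated = token.startswith("-")
        body = token[1:] if negated else token
        words = body.strip('"').lower().split()
        if not words:
            raise ValueError(f"empty term in {expression!r}")
        disjuncts[-1].append((negated, words))
    if any(not terms for terms in disjuncts):
        raise ValueError(f"empty disjunct in {expression!r}")
    return disjuncts


def has_phrase(text_words, phrase):
    """Return True if the phrase words occur consecutively in the text words."""
    size = len(phrase)
    return any(
        text_words[i:i + size] == phrase
        for i in range(len(text_words) - size + 1)
    )


def matches(expression, text):
    """A text matches if, for some disjunct, every plain term is present
    and every negated term is absent."""
    text_words = text.lower().split()
    return any(
        all(has_phrase(text_words, words) != negated for negated, words in terms)
        for terms in parse(expression)
    )


print(parse("xml OR xml maybe"))
print(matches('xml OR "virtual database"', "A Virtual Database definition"))
print(matches("xml -maybe", "xml maybe later"))
```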
7.1.4. Example JCR-SQL2 Queries
7.1.4.1. Simple Queries
One of the simplest JCR-SQL2 queries finds all nodes in the current workspace of the repository:
SELECT * FROM [nt:base]
This query will return a result set containing the
jcr:primaryType
column, since the nt:base
node type defines only one single-valued, non-residual property called jcr:primaryType
.
Note
The hierarchical database does not currently support returning multi-valued properties in result sets. This limitation is permitted by the JCR 2.0 specification. The hierarchical database does, however, support using multi-valued properties in constraints and
ORDER BY
clauses.
Since our query used
SELECT *
, the hierarchical database also includes the five non-standard pseudo-columns mentioned above: jcr:path
, jcr:score
, jcr:name
, mode:localName
, and mode:depth
. These columns are very convenient to have in the results, and they also make certain criteria much easier to express than with the corresponding functions, whether standard or specific to the hierarchical database.
Queries can explicitly specify the columns that are to be returned in the results. The following query is very similar to the previous query and will return the same rows, but the result set will have only a single column and will not include any of the pseudo-columns:
SELECT [jcr:primaryType] FROM [nt:base]
The following query will return the same rows as in the previous two queries, but the
SELECT
clause explicitly includes only two of the pseudo-columns for the path and depth (which are computed from the nodes' locations):
SELECT [jcr:primaryType], [jcr:path], [mode:depth] FROM [nt:base]
In JCR-SQL2, a table representing a particular node type will have a column for each of the node type's property definitions, including those inherited from supertypes. For example, the
nt:file
node type, its nt:hierarchyNode
supertype, and the mix:created
mixin type are defined using the CND notation as follows:
[mix:created] mixin
  - jcr:created (date) protected
  - jcr:createdBy (string) protected
[nt:hierarchyNode] > mix:created abstract
[nt:file] > nt:hierarchyNode
  + jcr:content (nt:base) primary mandatory
Therefore, the table representing the
nt:file
node type will have 3 columns: the jcr:created
and jcr:createdBy
columns inherited from the mix:created
mixin node type (via the nt:hierarchyNode
node type), and the jcr:primaryType
column inherited from the nt:base
node type, which is the implicit supertype of the nt:hierarchyNode
(and all node types).
The hierarchical database adheres to this behavior with the exception that a
SELECT *
will result in the additional pseudo-columns. Thus, this next query:
SELECT * FROM [nt:file]
is equivalent to this query:
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path], [jcr:name], [jcr:score], [mode:localName], [mode:depth] FROM [nt:file]
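The expansion described above can be sketched in Python. The node type table below is a hand-written, simplified model of the definitions shown earlier (only property names and a multi-valued flag), and the expand_select_star helper is hypothetical. It is meant only to show how inherited single-valued properties and the pseudo-columns combine.

```python
# Simplified model: property name -> True if the property is multi-valued.
NODE_TYPES = {
    "nt:base": {
        "supertypes": [],
        "properties": {"jcr:primaryType": False, "jcr:mixinTypes": True},
    },
    "mix:created": {
        "supertypes": [],
        "properties": {"jcr:created": False, "jcr:createdBy": False},
    },
    "nt:hierarchyNode": {"supertypes": ["nt:base", "mix:created"], "properties": {}},
    "nt:file": {"supertypes": ["nt:hierarchyNode"], "properties": {}},
}

PSEUDO_COLUMNS = ["jcr:path", "jcr:name", "jcr:score", "mode:localName", "mode:depth"]


def selectable_columns(type_name):
    """Collect single-valued properties declared on or inherited by a node type."""
    node_type = NODE_TYPES[type_name]
    columns = []
    for supertype in node_type["supertypes"]:
        for column in selectable_columns(supertype):
            if column not in columns:
                columns.append(column)
    for name, multi_valued in node_type["properties"].items():
        if not multi_valued and name not in columns:
            columns.append(name)
    return columns


def expand_select_star(type_name):
    """Return the columns that SELECT * stands for, pseudo-columns included."""
    return selectable_columns(type_name) + PSEUDO_COLUMNS


print(expand_select_star("nt:file"))
```

The multi-valued jcr:mixinTypes property is skipped, which is why it does not appear in the expanded column list.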
7.1.4.2. Using Columns in Constraints
Consider a query that selects some of the available columns from the
nt:file
table and uses a constraint to ensure the resulting file nodes have names that end in '.txt':
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file] WHERE LOCALNAME() LIKE '%.txt'
The hierarchical database also supports placing criteria against the
mode:localName
pseudo-column instead of using the LOCALNAME()
function. Such a query is equivalent to the previous query and will produce the exact same results:
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file] WHERE [mode:localName] LIKE '%.txt'
Note
The hierarchical database's pseudo-columns are often far easier to use than the corresponding function-like constraints.
Although this query looks much more like SQL, the use of the '
[
' and ']
' characters to quote the identifiers is not typical of a SQL dialect. The hierarchical database actually supports using double-quote characters and square brackets interchangeably around identifiers (although they must match around any single identifier). Again, this next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous two queries:
SELECT "jcr:primaryType", "jcr:created", "jcr:createdBy", "jcr:path" FROM "nt:file" WHERE "mode:localName" LIKE '%.txt'
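Because the two quoting styles are interchangeable, a tool that prefers SQL-style identifiers can convert one form to the other mechanically. The Python sketch below does this with a regular expression. The to_double_quoted helper is hypothetical and deliberately naive: it also rewrites brackets that occur inside string literals, such as a same-name-sibling index in a path literal, so a real tool would need a proper tokenizer.

```python
import re

# A bracket-quoted identifier: "[", any characters except brackets, "]".
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def to_double_quoted(query):
    """Rewrite [identifier] as "identifier" throughout a query string."""
    return BRACKETED.sub(r'"\1"', query)


bracketed = (
    "SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] "
    "FROM [nt:file] WHERE [mode:localName] LIKE '%.txt'"
)
print(to_double_quoted(bracketed))
```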
7.1.4.3. Inner Joins
In JCR-SQL2, a node will appear as a row in each table that corresponds to the node types defined by that node's primary type or mixin types, or any supertypes of these node types. In other words, a node will appear in the table corresponding to each node type for which
Node.isNodeType(...)
returns true.
For example, consider a node that has a primary type of
nt:file
but has an explicit mixin of mix:referenceable
. This node will appear as a row in all of these tables:
nt:file
mix:referenceable
nt:hierarchyNode
mix:created
nt:base
However, the columns in each of these tables will differ. The
nt:file
node type has the nt:hierarchyNode
, mix:created
, and nt:base
for supertypes, and therefore the table for nt:file
contains columns for the property definitions on all of these types. But because mix:referenceable
is not a supertype of nt:file
, the table for nt:file
will not contain a jcr:uuid
column. To obtain a single result set that contains columns for all the properties of our node, we need to perform an identity join .
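The rule for which tables contain a given node can be sketched in Python as a walk over the supertype hierarchy. The SUPERTYPES table is a hand-written subset of the built-in node types used in this example, and the tables_for helper is hypothetical.

```python
# Direct supertypes of the node types used in this example.
SUPERTYPES = {
    "nt:base": [],
    "mix:created": [],
    "mix:referenceable": [],
    "nt:hierarchyNode": ["nt:base", "mix:created"],
    "nt:file": ["nt:hierarchyNode"],
}


def tables_for(primary_type, mixin_types=()):
    """Return every node type whose table contains a node with the given types."""
    tables = []
    pending = [primary_type, *mixin_types]
    while pending:
        type_name = pending.pop(0)
        if type_name not in tables:
            tables.append(type_name)
            # Supertypes are visited too, mirroring Node.isNodeType(...).
            pending.extend(SUPERTYPES[type_name])
    return tables


tables = tables_for("nt:file", ["mix:referenceable"])
print(tables)
```

The node appears in five tables, but only the table for mix:referenceable has a jcr:uuid column, which is why a join is needed to see all of the node's properties in one row.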
The next query shows how to return all properties for
nt:file
nodes that are also mix:referenceable
:
SELECT file.*, ref.* FROM [nt:file] AS file JOIN [mix:referenceable] AS ref ON ISSAMENODE(file,ref)
Since wildcards were used in the
SELECT
clause, the hierarchical database expands the SELECT
clause to include the columns for all (explicit and inherited) property definitions of each type plus pseudo-columns for each type, which is equivalent to:
SELECT file.[jcr:primaryType], file.[jcr:created], file.[jcr:createdBy], file.[jcr:path], file.[jcr:name], file.[jcr:score], file.[mode:localName], file.[mode:depth], ref.[jcr:path], ref.[jcr:name], ref.[jcr:score], ref.[mode:localName], ref.[mode:depth], ref.[jcr:uuid] FROM [nt:file] AS file JOIN [mix:referenceable] AS ref ON ISSAMENODE(file,ref)
Note that because we are using an identity join, the
file.[jcr:path]
column will contain the same value as the ref.[jcr:path]
.
Note
Fully expand the
SELECT
clause to specify exactly the columns that you want, excluding the columns that return the same values or return values not needed by your application. This can also make the query a bit more efficient, since less data needs to be found and returned.
By the way, this is also what many well-written applications do when querying SQL databases.
Here is a query that does this by eliminating columns with duplicate values and using aliases that are simpler than the namespace-qualified names:
SELECT file.[jcr:primaryType] AS primaryType, file.[jcr:created] AS created, file.[jcr:createdBy] AS createdBy, ref.[jcr:uuid] AS uuid, file.[jcr:path] AS path, file.[jcr:name] AS name, file.[jcr:score] AS score, file.[mode:localName] AS localName, file.[mode:depth] AS depth FROM [nt:file] AS file JOIN [mix:referenceable] AS ref ON ISSAMENODE(file,ref)
Although this query looks much more like SQL, use of the '
[
' and ']
' characters in JCR-SQL2 to quote the identifiers is not typical of a SQL dialect. Again, the hierarchical database supports using double-quote characters and square brackets interchangeably around identifiers (although they must match around any single identifier). This makes it easier for existing SQL-oriented tools and applications to work with the repository, including applications that use the hierarchical database's JDBC driver to query a JCR repository.
This next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous query. However, it uses double quotes and a pseudo-column identity constraint on
jcr:path
(which is identical in semantics and performance to the ISSAMENODE(...)
constraint):
SELECT file."jcr:primaryType" AS primaryType, file."jcr:created" AS created, file."jcr:createdBy" AS createdBy, ref."jcr:uuid" AS uuid, file."jcr:path" AS path, file."jcr:name" AS name, file."jcr:score" AS score, file."mode:localName" AS localName, file."mode:depth" AS depth FROM "nt:file" AS file JOIN "mix:referenceable" AS ref ON file."jcr:path" = ref."jcr:path"
Note
When using joins and selecting multiple columns, use aliases on the columns to make it easier to reference those columns in constraints and ordering clauses.
7.1.4.4. Other Joins
The preceding examples are two-way inner joins, but the hierarchical database supports joining multiple tables together in a single query. The hierarchical database also supports a variety of joins, including:
- INNER JOIN (or JOIN)
- LEFT OUTER JOIN
- RIGHT OUTER JOIN
- FULL OUTER JOIN
- CROSS JOIN
7.1.4.5. Set Operations
The hierarchical database also supports several other query features beyond JCR-SQL2. One of these is support for set queries that use:
- UNION and UNION ALL
- INTERSECT and INTERSECT ALL
- EXCEPT and EXCEPT ALL
Here is an example of a union:
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file] UNION SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:folder]
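The behavior of the three set operations, with and without the ALL keyword, can be sketched in Python on lists of row tuples. The sketch follows standard SQL semantics, which the hierarchical database is assumed here to share: INTERSECT keeps rows found in both inputs and EXCEPT keeps rows of the first input that are absent from the second. The function names and sample rows are hypothetical.

```python
from collections import Counter


def dedupe(rows):
    """Remove duplicate rows while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def union(left, right, keep_all=False):
    rows = left + right
    return rows if keep_all else dedupe(rows)


def intersect(left, right, keep_all=False):
    # Each row on the right can pair with at most one row on the left.
    remaining = Counter(right)
    common = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1
            common.append(row)
    return common if keep_all else dedupe(common)


def except_(left, right, keep_all=False):
    if not keep_all:
        excluded = set(right)
        return dedupe([row for row in left if row not in excluded])
    # With ALL, each row on the right cancels only one matching row on the left.
    remaining = Counter(right)
    difference = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1
        else:
            difference.append(row)
    return difference


files = [("/a.txt",), ("/b.txt",), ("/b.txt",)]
folders = [("/b.txt",), ("/docs",)]
print(union(files, folders))
print(intersect(files, folders))
print(except_(files, folders))
```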
7.1.4.6. Subqueries
The hierarchical database also supports using (non-correlated) subqueries within the
WHERE
clause and wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries, though, should return a single column (all other columns will be ignored), and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single row (e.g., in a comparison), only the subquery's first row will be used.
Subqueries in the hierarchical database are a powerful and easy way to express more complex criteria that are a function of the content in the repository, without having to resort to multiple queries and complex application logic, such as taking the results of one query and dynamically generating the criteria of another query.
Here's an example of a query that finds all
nt:file
nodes in the repository whose paths are referenced in the value of the vdb:originalFile
property of the vdb:virtualDatabase
nodes. (This query also uses the $maxVersion
variable in the subquery.)
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file] WHERE PATH() IN ( SELECT [vdb:originalFile] FROM [vdb:virtualDatabase] WHERE [vdb:version] <= $maxVersion AND CONTAINS([vdb:description],'xml OR xml maybe') )
Without subqueries, this query would need to be broken into two separate queries: the first would find all of the paths referenced by the
vdb:virtualDatabase
nodes matching the version and description criteria, followed by one (or more) subsequent queries to find the nt:file
nodes with the paths expressed as literal values (or variables).
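To illustrate the extra application logic that the two-query approach requires, the following Python sketch builds the second query from the paths returned by the first. The helper names are hypothetical, and the sketch assumes that string literals are written in single quotes with an embedded single quote doubled, and that a list of literals is accepted after IN.

```python
def quote_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def files_by_path_query(paths):
    """Build the follow-up query that looks up nt:file nodes by literal path."""
    if not paths:
        raise ValueError("at least one path is required")
    literals = ", ".join(quote_literal(path) for path in paths)
    return (
        "SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] "
        f"FROM [nt:file] WHERE PATH() IN ({literals})"
    )


# Paths as they might come back from the first query (made-up values).
second_query = files_by_path_query(["/files/model.xmi", "/files/bob's.vdb"])
print(second_query)
```

Every step here (collecting the paths, escaping them, handling an empty result) is work that the subquery form leaves to the query engine.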
Note
Using a subquery is not only easier to implement and understand, it is actually more efficient.