Chapter 32. The XPath Language

Abstract

When processing XML messages, the XPath language enables you to select part of a message by specifying an XPath expression that acts on the message’s Document Object Model (DOM). You can also define XPath predicates to test the contents of an element or an attribute.

32.1. Java DSL

Basic expressions

You can use xpath("Expression") to evaluate an XPath expression on the current exchange (where the XPath expression is applied to the body of the current In message). The result of the xpath() expression is an XML node (or node set, if more than one node matches).

For example, to extract the contents of the /person/name element from the current In message body and use it to set a header named user, you could define a route like the following:

from("queue:foo")
    .setHeader("user", xpath("/person/name/text()"))
    .to("direct:tie");

Instead of specifying xpath() as an argument to setHeader(), you can use the fluent builder xpath() command — for example:

from("queue:foo")
    .setHeader("user").xpath("/person/name/text()")
    .to("direct:tie");

If you want to convert the result to a specific type, specify the result type as the second argument of xpath(). For example, to specify explicitly that the result type is String:

xpath("/person/name/text()", String.class)
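To make the String conversion concrete, the following sketch evaluates the same expression with the JDK's built-in javax.xml.xpath API rather than with Camel. It only approximates what xpath("/person/name/text()", String.class) produces for a message body; the class name PersonNameExtractor and the sample body are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: approximates xpath(expr, String.class) with plain JAXP, not Camel.
class PersonNameExtractor {

    // Parses an XML string into a namespace-aware DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Evaluates the expression with XPath string() semantics:
    // the string value of the first matching node, or "" if nothing matches.
    static String evaluateAsString(String expression, String xml) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, parse(xml), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String body = "<person><name>James</name><city>London</city></person>";
        System.out.println(evaluateAsString("/person/name/text()", body)); // James
    }
}
```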

Namespaces

Typically, XML elements belong to a schema, which is identified by a namespace URI. When processing documents like this, it is necessary to associate namespace URIs with prefixes, so that you can identify element names unambiguously in your XPath expressions. Apache Camel provides the helper class, org.apache.camel.builder.xml.Namespaces, which enables you to define associations between namespaces and prefixes.

For example, to associate the prefix, cust, with the namespace, http://acme.com/customer/record, and then extract the contents of the element, /cust:person/cust:name, you could define a route like the following:

import org.apache.camel.builder.xml.Namespaces;
...
Namespaces ns = new Namespaces("cust", "http://acme.com/customer/record");

from("queue:foo")
    .setHeader("user", xpath("/cust:person/cust:name/text()", ns))
    .to("direct:tie");

You make the namespace definitions available to the xpath() expression builder by passing the Namespaces object, ns, as an additional argument. If you need to define multiple namespaces, use the Namespaces.add() method, as follows:

import org.apache.camel.builder.xml.Namespaces;
...
Namespaces ns = new Namespaces("cust", "http://acme.com/customer/record");
ns.add("inv", "http://acme.com/invoice");
ns.add("xsi", "http://www.w3.org/2001/XMLSchema-instance");

If you need to specify the result type and define namespaces, you can use the three-argument form of xpath(), as follows:

xpath("/cust:person/cust:name/text()", String.class, ns)
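The Namespaces object plays the same role as a javax.xml.namespace.NamespaceContext in the standard JAXP API. As an illustration only, the following sketch builds a minimal prefix map by hand (PrefixMap and NamespacedLookup are hypothetical names, not Camel classes). It also shows that the prefix in the expression is matched by namespace URI, so a document that uses a default namespace, or a different prefix, for the same URI still matches.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical prefix -> URI map, playing the role of Camel's Namespaces helper.
class PrefixMap implements NamespaceContext {
    private final Map<String, String> uris = new HashMap<>();

    // Chainable, in the spirit of Namespaces.add().
    PrefixMap add(String prefix, String uri) {
        uris.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        return uris.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> entry : uris.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                return entry.getKey();
            }
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }
}

class NamespacedLookup {

    // Evaluates /cust:person/cust:name/text() with "cust" bound to the customer URI.
    static String userName(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, prefixed steps never match
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new PrefixMap().add("cust", "http://acme.com/customer/record"));
            return xpath.evaluate("/cust:person/cust:name/text()", doc);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Lookup failed", e);
        }
    }

    public static void main(String[] args) {
        // The document uses a default namespace; the expression uses the "cust" prefix.
        String body = "<person xmlns='http://acme.com/customer/record'><name>James</name></person>";
        System.out.println(userName(body)); // James
    }
}
```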

Auditing namespaces

One of the most frequent problems when using XPath expressions is a mismatch between the namespaces that appear in the incoming messages and the namespaces used in the XPath expression. To help you troubleshoot this kind of problem, the XPath language supports an option to dump all of the namespaces from all of the incoming messages into the system log.

To enable namespace logging at the INFO log level, enable the logNamespaces option in the Java DSL, as follows:

xpath("/foo:person/@id", String.class).logNamespaces()

Alternatively, you could configure your logging system to enable TRACE level logging on the org.apache.camel.builder.xml.XPathBuilder logger.

When namespace logging is enabled, you will see log messages like the following for each processed message:

2012-01-16 13:23:45,878 [stSaxonWithFlag] INFO  XPathBuilder  -
Namespaces discovered in message: {xmlns:a=[http://apache.org/camel],
DEFAULT=[http://apache.org/default],
xmlns:b=[http://apache.org/camelA, http://apache.org/camelB]}
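If you want to run a similar audit outside a route, for example in a unit test, you can collect the declarations yourself. The following sketch is not Camel's implementation; it is a hypothetical helper (NamespaceAudit) that walks a DOM with the standard JAXP API and gathers every namespace declaration, keyed in the same style as the log output above, so that you can compare the declared URIs with the ones you bind in your Namespaces object.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of Camel): lists the namespaces declared in a message.
class NamespaceAudit {

    static Map<String, Set<String>> discover(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // xmlns attributes then carry the xmlns namespace URI
            Element root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, Set<String>> found = new TreeMap<>();
            collect(root, found);
            return found;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Records xmlns="..." under DEFAULT and xmlns:p="..." under "xmlns:p", then recurses.
    private static void collect(Element element, Map<String, Set<String>> found) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                String key = XMLConstants.XMLNS_ATTRIBUTE.equals(attr.getLocalName())
                        ? "DEFAULT" : "xmlns:" + attr.getLocalName();
                found.computeIfAbsent(key, k -> new TreeSet<>()).add(attr.getValue());
            }
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                collect((Element) child, found);
            }
        }
    }

    public static void main(String[] args) {
        String body = "<a:root xmlns:a='http://apache.org/camel' xmlns='http://apache.org/default'>"
                + "<b:x xmlns:b='http://apache.org/camelA'/><b:y xmlns:b='http://apache.org/camelB'/>"
                + "</a:root>";
        System.out.println("Namespaces discovered in message: " + discover(body));
    }
}
```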

32.2. XML DSL

Basic expressions

To evaluate an XPath expression in the XML DSL, put the XPath expression inside an xpath element. The XPath expression is applied to the body of the current In message and returns an XML node (or node set). Typically, the returned XML node is automatically converted to a string.

For example, to extract the contents of the /person/name element from the current In message body and use it to set a header named user, you could define a route like the following:

<beans ...>

  <camelContext xmlns="http://camel.apache.org/schema/spring">
    <route>
      <from uri="queue:foo"/>
      <setHeader headerName="user">
        <xpath>/person/name/text()</xpath>
      </setHeader>
      <to uri="direct:tie"/>
    </route>
  </camelContext>

</beans>

If you want to convert the result to a specific type, set the resultType attribute to a fully-qualified Java type name (for classes in the java.lang package, you can omit the java.lang. prefix). For example, to specify explicitly that the result type is java.lang.String:

<xpath resultType="String">/person/name/text()</xpath>

Namespaces

When processing documents whose elements belong to one or more XML schemas, it is typically necessary to associate namespace URIs with prefixes, so that you can identify element names unambiguously in your XPath expressions. You can use the standard XML mechanism for associating prefixes with namespace URIs; that is, you can set an attribute of the form xmlns:Prefix="NamespaceURI" on an enclosing element, such as camelContext.

For example, to associate the prefix, cust, with the namespace, http://acme.com/customer/record, and then extract the contents of the element, /cust:person/cust:name, you could define a route like the following:

<beans ...>

  <camelContext xmlns="http://camel.apache.org/schema/spring"
                xmlns:cust="http://acme.com/customer/record" >
    <route>
      <from uri="queue:foo"/>
      <setHeader headerName="user">
        <xpath>/cust:person/cust:name/text()</xpath>
      </setHeader>
      <to uri="direct:tie"/>
    </route>
  </camelContext>

</beans>

Auditing namespaces

One of the most frequent problems when using XPath expressions is a mismatch between the namespaces that appear in the incoming messages and the namespaces used in the XPath expression. To help you troubleshoot this kind of problem, the XPath language supports an option to dump all of the namespaces from all of the incoming messages into the system log.

To enable namespace logging at the INFO log level, enable the logNamespaces option in the XML DSL, as follows:

<xpath logNamespaces="true" resultType="String">/foo:person/@id</xpath>

Alternatively, you could configure your logging system to enable TRACE level logging on the org.apache.camel.builder.xml.XPathBuilder logger.

When namespace logging is enabled, you will see log messages like the following for each processed message:

2012-01-16 13:23:45,878 [stSaxonWithFlag] INFO  XPathBuilder  -
Namespaces discovered in message: {xmlns:a=[http://apache.org/camel],
DEFAULT=[http://apache.org/default],
xmlns:b=[http://apache.org/camelA, http://apache.org/camelB]}

32.3. XPath Injection

Parameter binding annotation

When using Apache Camel bean integration to invoke a method on a Java bean, you can use the @XPath annotation to extract a value from the exchange and bind it to a method parameter.

For example, consider the following route fragment, which invokes the credit method on an AccountService object:

from("queue:payments")
    .beanRef("accountService","credit")
    ...

The credit method uses parameter binding annotations to extract relevant data from the message body and inject it into its parameters, as follows:

public class AccountService {
    ...
    public void credit(
            @XPath("/transaction/transfer/receiver/text()") String name,
            @XPath("/transaction/transfer/amount/text()") String amount
            )
    {
        ...
    }
    ...
}

For more information, see Bean Integration in the Apache Camel Development Guide on the customer portal.
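To check which values the two @XPath annotations on the credit method would bind, you can evaluate the same expressions yourself. The following is a plain JAXP sketch, not Camel's bean-binding code; the sample transaction message and the CreditArguments class are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: evaluates the expressions from the @XPath annotations on credit().
class CreditArguments {

    // Returns {name, amount}, in the order of the credit() parameters.
    static String[] extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String name = xpath.evaluate("/transaction/transfer/receiver/text()", doc);
            String amount = xpath.evaluate("/transaction/transfer/amount/text()", doc);
            return new String[] {name, amount};
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot bind credit() arguments", e);
        }
    }

    public static void main(String[] args) {
        String body = "<transaction><transfer>"
                + "<receiver>Bob</receiver><amount>150.00</amount>"
                + "</transfer></transaction>";
        String[] params = extract(body);
        System.out.println("credit(name=" + params[0] + ", amount=" + params[1] + ")");
    }
}
```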

Namespaces

Table 32.1, “Predefined Namespaces for @XPath” shows the namespaces that are predefined for XPath. You can use these namespace prefixes in the XPath expression that appears in the @XPath annotation.

Table 32.1. Predefined Namespaces for @XPath

Custom namespaces

You can use the @NamespacePrefix annotation to define custom XML namespaces. Use @NamespacePrefix to initialize the namespaces attribute of the @XPath annotation; the prefixes that it defines can then be used in the @XPath annotation’s expression value.

For example, to associate the prefix, ex, with the custom namespace, http://fusesource.com/examples, use the @XPath annotation as follows:

public class AccountService {
  ...
  public void credit(
    @XPath(
      value = "/ex:transaction/ex:transfer/ex:receiver/text()",
      namespaces = @NamespacePrefix(prefix = "ex", uri = "http://fusesource.com/examples")
    ) String name,
    @XPath(
      value = "/ex:transaction/ex:transfer/ex:amount/text()",
      namespaces = @NamespacePrefix(prefix = "ex", uri = "http://fusesource.com/examples")
    ) String amount
  )
  {
    ...
  }
  ...
}
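The prefix declared with @NamespacePrefix only has to agree with the expression, not with the message. The following plain JAXP sketch (the class ExamplesNamespace and its sample messages are hypothetical) binds ex to http://fusesource.com/examples and shows that a message using another prefix for the same URI still matches, while a message with no namespace does not.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: a JAXP counterpart of @NamespacePrefix(prefix = "ex", uri = ...).
class ExamplesNamespace {
    static final String URI = "http://fusesource.com/examples";

    // Binds only the "ex" prefix; every other prefix resolves to no namespace.
    static final NamespaceContext EX = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "ex".equals(prefix) ? URI : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return URI.equals(namespaceURI) ? "ex" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return URI.equals(namespaceURI)
                    ? Collections.singletonList("ex").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    static String receiver(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(EX);
            return xpath.evaluate("/ex:transaction/ex:transfer/ex:receiver/text()", doc);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // The message uses the prefix "t" for the same URI, yet the "ex:" expression matches.
        String body = "<t:transaction xmlns:t='http://fusesource.com/examples'>"
                + "<t:transfer><t:receiver>Bob</t:receiver></t:transfer></t:transaction>";
        System.out.println(receiver(body)); // Bob
    }
}
```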

32.4. XPath Builder

Overview

The org.apache.camel.builder.xml.XPathBuilder class enables you to evaluate XPath expressions independently of an exchange. That is, if you have an XML fragment from any source, you can use XPathBuilder to evaluate an XPath expression on the XML fragment.

Matching expressions

Use the matches() method to check whether one or more XML nodes can be found that match the given XPath expression. The basic syntax for matching an XPath expression using XPathBuilder is as follows:

boolean matches = XPathBuilder
                    .xpath("Expression")
                    .matches(CamelContext, "XMLString");

The expression, Expression, is evaluated against the XML fragment, XMLString, and the result is true if at least one node matches the expression. For example, the following code returns true, because the XPath expression finds a match in the xyz attribute:

boolean matches = XPathBuilder
                    .xpath("/foo/bar/@xyz")
                    .matches(getContext(), "<foo><bar xyz='cheese'/></foo>");
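Conceptually, a match succeeds when the expression selects a non-empty node set. The following sketch reproduces that check with the JDK's javax.xml.xpath API by requesting a BOOLEAN result, which applies the XPath boolean() conversion (true for a non-empty node set). It is an approximation for illustration, not XPathBuilder's implementation; the class name MatchCheck is hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class MatchCheck {

    // True if the expression selects at least one node in the XML fragment.
    static boolean matches(String expression, String xml) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("/foo/bar/@xyz", "<foo><bar xyz='cheese'/></foo>")); // true
        System.out.println(matches("/foo/bar/@xyz", "<foo><bar/></foo>"));              // false
    }
}
```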

Evaluating expressions

Use the evaluate() method to return the contents of the first node that matches the given XPath expression. The basic syntax for evaluating an XPath expression using XPathBuilder is as follows:

String nodeValue = XPathBuilder
                    .xpath("Expression")
                    .evaluate(CamelContext, "XMLString");

You can also specify the result type by passing the required type as the third argument to evaluate(), for example:

String name = XPathBuilder
                   .xpath("foo/bar")
                   .evaluate(context, "<foo><bar>cheese</bar></foo>", String.class);
Integer number = XPathBuilder
                   .xpath("foo/bar")
                   .evaluate(context, "<foo><bar>123</bar></foo>", Integer.class);
Boolean bool = XPathBuilder
                   .xpath("foo/bar")
                   .evaluate(context, "<foo><bar>true</bar></foo>", Boolean.class);
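With plain JAXP, you choose the result type with one of the XPathConstants QNames instead of a Java class. The following sketch (the class name TypedResults is hypothetical) evaluates the relative path foo/bar as STRING, NUMBER, and BOOLEAN. Note that the raw XPath 1.0 boolean() conversion tests whether the node set is non-empty rather than parsing the text true or false; the sketch makes no claim about how Camel's own type conversion handles that case.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class TypedResults {

    // Evaluates the expression against the XML fragment and returns the JAXP result object
    // (String for STRING, Double for NUMBER, Boolean for BOOLEAN).
    static Object evaluate(String expression, String xml, QName resultType) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)), resultType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or XML", e);
        }
    }

    public static void main(String[] args) {
        String name = (String) evaluate("foo/bar", "<foo><bar>cheese</bar></foo>", XPathConstants.STRING);
        Double number = (Double) evaluate("foo/bar", "<foo><bar>123</bar></foo>", XPathConstants.NUMBER);
        // boolean() of a non-empty node set is true, even though the text says "false".
        Boolean bool = (Boolean) evaluate("foo/bar", "<foo><bar>false</bar></foo>", XPathConstants.BOOLEAN);
        System.out.println(name + " " + number + " " + bool); // cheese 123.0 true
    }
}
```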

32.5. Enabling Saxon

Prerequisites

To use the Saxon parser, you must add a dependency on the camel-saxon artifact: if you use Maven, add this dependency to your Maven POM; otherwise, add the camel-saxon-7.3.0.fuse-730079-redhat-00001.jar file to your classpath.

Using the Saxon parser in Java DSL

In Java DSL, the simplest way to enable the Saxon parser is to call the saxon() fluent builder method. For example, you could invoke the Saxon parser as shown in the following example:

// Java
// create a builder to evaluate the xpath using saxon
XPathBuilder builder = XPathBuilder.xpath("tokenize(/foo/bar, '_')[2]").saxon();

// evaluate as a String result
String result = builder.evaluate(context, "<foo><bar>abc_def_ghi</bar></foo>");
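The tokenize() function used above belongs to XPath 2.0, whereas the XPath engine bundled with the JDK implements XPath 1.0 only; this is one reason to switch to Saxon. The following sketch (the class name XPathOneCheck is hypothetical) assumes that XPathFactory.newInstance() returns the JDK's built-in implementation, that is, Saxon is not registered as the default XPathFactory. It shows that the expression is rejected and that, for this particular input, a combination of XPath 1.0 substring functions yields the same second token.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathOneCheck {

    // True if the default JAXP engine refuses to compile or evaluate the expression.
    static boolean isRejected(String expression, String xml) {
        try {
            XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)));
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    // XPath 1.0 stand-in for tokenize(/foo/bar, '_')[2]: the text between the first two '_'.
    static String secondToken(String xml) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                    "substring-before(substring-after(/foo/bar, '_'), '_')",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo><bar>abc_def_ghi</bar></foo>";
        System.out.println(isRejected("tokenize(/foo/bar, '_')[2]", xml)); // true
        System.out.println(secondToken(xml));                              // def
    }
}
```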

Using the Saxon parser in XML DSL

In XML DSL, the simplest way to enable the Saxon parser is to set the saxon attribute to true in the xpath element. For example, you could invoke the Saxon parser as shown in the following example:

<xpath saxon="true" resultType="java.lang.String">current-dateTime()</xpath>

Programming with Saxon

If you want to use the Saxon XML parser in your application code, you can create an instance of the Saxon transformer factory explicitly using the following code:

// Java
import javax.xml.transform.TransformerFactory;
import net.sf.saxon.TransformerFactoryImpl;
...
TransformerFactory saxonFactory = new net.sf.saxon.TransformerFactoryImpl();

On the other hand, if you prefer to use the generic JAXP API to create a transformer factory instance, you must first set the javax.xml.transform.TransformerFactory property in the ESBInstall/etc/system.properties file, as follows:

javax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl

You can then instantiate the Saxon factory using the generic JAXP API, as follows:

// Java
import javax.xml.transform.TransformerFactory;
...
TransformerFactory factory = TransformerFactory.newInstance();

If your application depends on any third-party libraries that use Saxon, it might be necessary to use the second, generic approach.

Note

The Saxon library must be installed in the container as the OSGi bundle, net.sf.saxon/saxon9he (normally installed by default). In versions of Fuse ESB prior to 7.1, it is not possible to load Saxon using the generic JAXP API.

32.6. Expressions

Result type

By default, an XPath expression returns a list of one or more XML nodes, of org.w3c.dom.NodeList type. However, you can use the type converter mechanism to convert the result to a different type. In the Java DSL, you can specify the result type in the second argument of the xpath() command. For example, to return the result of an XPath expression as a String:

xpath("/person/name/text()", String.class)

In the XML DSL, you can specify the result type in the resultType attribute, as follows:

<xpath resultType="java.lang.String">/person/name/text()</xpath>
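The following sketch, which uses the JDK's javax.xml.xpath API rather than Camel, shows the difference between the raw node-list result and a String conversion: a NODESET result is an org.w3c.dom.NodeList that you can iterate, whereas a String result keeps only the string value of the first node. The sample people document and the class name NodeListResult are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeListResult {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Returns the text content of every node selected by the expression.
    static List<String> allTexts(String expression, Document doc) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    // Returns only the string value of the first selected node.
    static String firstText(String expression, Document doc) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<people><person><name>James</name></person>"
                + "<person><name>Rob</name></person></people>");
        System.out.println(allTexts("/people/person/name", doc));  // [James, Rob]
        System.out.println(firstText("/people/person/name", doc)); // James
    }
}
```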

Patterns in location paths

You can use the following patterns in XPath location paths:

/people/person

The basic location path specifies the nested location of a particular element. That is, the preceding location path would match the person element in the following XML fragment:

<people>
  <person>...</person>
</people>

Note that this basic pattern can match multiple nodes — for example, if there is more than one person element inside the people element.

/name/text()

If you just want to access the text inside the element, append /text() to the location path; otherwise, the node includes the element’s start and end tags (and these tags are included when you convert the node to a string).

/person/telephone/@isDayTime

To select the value of an attribute, AttributeName, use the syntax @AttributeName. For example, the preceding location path returns true when applied to the following XML fragment:

<person>
  <telephone isDayTime="true">1234567890</telephone>
</person>

*

A wildcard that matches all elements in the specified scope. For example, /people/person/* matches all the child elements of person.

@*

A wildcard that matches all attributes of the matched elements. For example, /person/name/@* matches all attributes of every matched name element.

//

Match the location path at every nesting level. For example, the //name pattern matches every name element in the following XML fragment:

<invoice>
  <person>
    <name .../>
  </person>
</invoice>
<person>
  <name .../>
</person>
<name .../>

..

Selects the parent of the current context node. Not normally useful in the Apache Camel XPath language, because the current context node is the document root, which has no parent.

node()

Match any kind of node.

text()

Match a text node.

comment()

Match a comment node.

processing-instruction()

Match a processing-instruction node.
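You can try these patterns against a small document with the JDK's javax.xml.xpath API. In the following sketch, the fragment from the // example is wrapped in a hypothetical doc root element (a well-formed document needs a single root), and some attributes and a comment are added so that @* and comment() have something to match; the class name PatternCounts is also hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PatternCounts {

    static final String XML = "<doc>"
            + "<invoice><person><name first='James'/></person></invoice>"
            + "<person><name first='Rob' last='Davies'/><!-- note --></person>"
            + "<name/>"
            + "</doc>";

    // Number of nodes selected by the location path.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//name"));        // 3: name elements at any depth
        System.out.println(count("/doc/person/*")); // 1: element children only, not the comment
        System.out.println(count("//name/@*"));     // 3: all attributes of all name elements
        System.out.println(count("//comment()"));   // 1
    }
}
```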

Predicate filters

You can filter the set of nodes matching a location path by appending a predicate in square brackets, [Predicate]. For example, you can select the Nth node from the list of matches by appending [N] to a location path. The following expression selects the first matching person element:

/people/person[1]

The following expression selects the second-last person element:

/people/person[last()-1]

You can test the value of attributes in order to select elements with particular attribute values. The following expression selects the name elements whose surname attribute is either Strachan or Davies:

/person/name[@surname="Strachan" or @surname="Davies"]

You can combine predicate expressions using the and and or operators and the not() function, and you can compare expressions using the comparators =, !=, >, >=, <, and <= (in the XML DSL, the less-than symbol must be replaced by the &lt; entity, so <= is written as &lt;=). You can also use XPath functions in the predicate filter.
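The following sketch evaluates the predicate examples above with the JDK's javax.xml.xpath API against a hypothetical three-person document (the class name PredicateExamples is also hypothetical). Each expression is converted to a string, so count() yields "2" rather than a node set.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PredicateExamples {

    static final String XML = "<people>"
            + "<person><name surname='Strachan'/></person>"
            + "<person><name surname='Davies'/></person>"
            + "<person><name surname='Smith'/></person>"
            + "</people>";

    // Evaluates the expression against the sample document as a string.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/people/person[1]/name/@surname"));        // Strachan
        System.out.println(eval("/people/person[last()-1]/name/@surname")); // Davies
        System.out.println(eval(
                "count(/people/person/name[@surname='Strachan' or @surname='Davies'])")); // 2
    }
}
```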

Axes

When you consider the structure of an XML document, the root element contains a sequence of children, and some of those child elements contain further children, and so on. Looked at in this way, where nested elements are linked together by the child-of relationship, the whole XML document has the structure of a tree. Now, if you choose a particular node in this element tree (call it the context node), you might want to refer to different parts of the tree relative to the chosen node. For example, you might want to refer to the children of the context node, to the parent of the context node, or to all of the nodes that share the same parent as the context node (sibling nodes).

An XPath axis is used to specify the scope of a node match, restricting the search to a particular part of the node tree, relative to the current context node. The axis is attached as a prefix to the node name that you want to match, using the syntax, AxisType::MatchingNode. For example, you can use the child:: axis to search the children of the current context node, as follows:

/invoice/items/child::item

The context node of child::item is the items element that is selected by the path, /invoice/items. The child:: axis restricts the search to the children of the context node, items, so that child::item matches the children of items that are named item. As a matter of fact, the child:: axis is the default axis, so the preceding example can be written equivalently as:

/invoice/items/item

But there are several other axes (13 in all), some of which you have already seen in abbreviated form: @ is an abbreviation of attribute::, and // is an abbreviation of /descendant-or-self::node()/. The full list of axes is as follows (for details, consult the reference below):

  • ancestor
  • ancestor-or-self
  • attribute
  • child
  • descendant
  • descendant-or-self
  • following
  • following-sibling
  • namespace
  • parent
  • preceding
  • preceding-sibling
  • self
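The following sketch exercises a few of these axes with the JDK's javax.xml.xpath API on a hypothetical invoice document (the class name AxisExamples is also hypothetical). Note that preceding-sibling is a reverse axis, so position [1] selects the nearest preceding sibling.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AxisExamples {

    static final String XML = "<invoice><items>"
            + "<item sku='a'/><item sku='b'/><item sku='c'/>"
            + "</items></invoice>";

    // Evaluates the expression against the sample invoice as a string.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(/invoice/items/child::item)"));                     // 3
        System.out.println(eval("count(/invoice/items/item[1]/following-sibling::item)")); // 2
        System.out.println(eval("/invoice/items/item[3]/preceding-sibling::item[1]/@sku")); // b
        System.out.println(eval("name(/invoice/items/item[1]/parent::*)"));                // items
        System.out.println(eval("count(/invoice/items/item[1]/ancestor::*)"));             // 2
    }
}
```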

Functions

XPath provides a small set of standard functions, which can be useful when evaluating predicates. For example, to select the last matching node from a node set, you can use the last() function, which returns the index of the last node in a node set, as follows:

/people/person[last()]

The preceding example selects the last person element in a sequence (in document order).

For full details of all the functions that XPath provides, consult the reference below.

Reference

For full details of the XPath grammar, see the XML Path Language, Version 1.0 specification.

32.7. Predicates

Basic predicates

You can use xpath in the Java DSL or the XML DSL in a context where a predicate is expected — for example, as the argument to a filter() processor or as the argument to a when() clause.

For example, the following route filters incoming messages, allowing a message to pass only if the /person/city element contains the value London:

from("direct:tie")
    .filter().xpath("/person/city = 'London'").to("file:target/messages/uk");

The following route evaluates the XPath predicate in a when() clause:

from("direct:tie")
    .choice()
        .when(xpath("/person/city = 'London'")).to("file:target/messages/uk")
        .otherwise().to("file:target/messages/others");
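The decision that filter() and when() make can be sketched with plain JAXP by evaluating the predicate as a BOOLEAN. In the following sketch, the CityRouter class and the sample messages are hypothetical; it only mirrors the choice() above and does not show how Camel evaluates predicates internally.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class CityRouter {

    // Evaluates the predicate /person/city = 'London' against the message body.
    static boolean isLondon(String xml) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    "/person/city = 'London'",
                    new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Mirrors choice().when(...).otherwise(...) from the route above.
    static String destination(String xml) {
        return isLondon(xml) ? "file:target/messages/uk" : "file:target/messages/others";
    }

    public static void main(String[] args) {
        System.out.println(destination("<person><city>London</city></person>")); // file:target/messages/uk
        System.out.println(destination("<person><city>Paris</city></person>"));  // file:target/messages/others
    }
}
```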

XPath predicate operators

The XPath language supports the standard XPath predicate operators, as shown in Table 32.2, “Operators for the XPath Language”.

Table 32.2. Operators for the XPath Language

Operator  Description
=         Equals.
!=        Not equal to.
>         Greater than.
>=        Greater than or equals.
<         Less than.
<=        Less than or equals.
or        Combine two predicates with logical inclusive or.
and       Combine two predicates with logical and.
not()     Negate predicate argument.
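The following sketch combines several of these operators and evaluates them with the JDK's javax.xml.xpath API (the order document and the class name OperatorExamples are hypothetical). Because the expressions are ordinary Java strings, <= needs no escaping here; in the XML DSL you would write &lt;= instead.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class OperatorExamples {

    static final String ORDER = "<order priority='high'><amount>150</amount></order>";

    // Evaluates a predicate against the sample order as a boolean.
    static boolean test(String expression) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(ORDER)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(test("/order/amount > 100 and not(/order/@priority = 'low')")); // true
        System.out.println(test("/order/amount <= 100 or /order/@priority = 'low'"));      // false
        System.out.println(test("/order/amount >= 150 and /order/amount < 200"));          // true
    }
}
```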

32.8. Using Variables and Functions

Evaluating variables in a route

When evaluating XPath expressions inside a route, you can use XPath variables to access the contents of the current exchange, as well as O/S environment variables and Java system properties. The syntax to access a variable value is $VarName or $Prefix:VarName, if the variable is accessed through an XML namespace.

For example, you can access the In message’s body as $in:body and the In message’s header value as $in:HeaderName. O/S environment variables can be accessed as $env:EnvVar and Java system properties can be accessed as $system:SysVar.

In the following example, the first route extracts the value of the /person/city element and inserts it into the city header. The second route filters exchanges using the XPath expression, $in:city = 'London', where the $in:city variable is replaced by the value of the city header.

from("file:src/data?noop=true")
    .setHeader("city").xpath("/person/city/text()")
    .to("direct:tie");

from("direct:tie")
    .filter().xpath("$in:city = 'London'").to("file:target/messages/uk");
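Outside Camel, the standard JAXP API offers a comparable mechanism through XPathVariableResolver. The following sketch imitates $in:city by binding the prefix in to the namespace URI listed in Table 32.3 and resolving variables in that namespace from a map of headers. The headers map and the class name HeaderVariables are hypothetical, and this is not Camel's implementation; an absent header resolves to the empty string here.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class HeaderVariables {
    static final String IN_NS = "http://camel.apache.org/xml/in/";

    // Binds only the "in" prefix; every other prefix maps to no namespace.
    static final NamespaceContext IN_PREFIX = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "in".equals(prefix) ? IN_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return IN_NS.equals(namespaceURI) ? "in" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return IN_NS.equals(namespaceURI)
                    ? Collections.singletonList("in").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Evaluates a boolean expression in which $in:Name resolves to the header Name.
    static boolean test(String expression, Map<String, String> headers, String body) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(IN_PREFIX);
            xpath.setXPathVariableResolver(name -> IN_NS.equals(name.getNamespaceURI())
                    ? headers.getOrDefault(name.getLocalPart(), "")
                    : null);
            return (Boolean) xpath.evaluate(expression,
                    new InputSource(new StringReader(body)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = Collections.singletonMap("city", "London");
        System.out.println(test("$in:city = 'London'", headers, "<person/>")); // true
    }
}
```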

Evaluating functions in a route

In addition to the standard XPath functions, the XPath language defines additional functions. These additional functions (which are listed in Table 32.4, “XPath Custom Functions”) can be used to access the underlying exchange, to evaluate a simple expression or to look up a property in the Apache Camel property placeholder component.

For example, the following route uses the in:header() function and the in:body() function to access a header and the body of the underlying exchange:

from("direct:start").choice()
  .when().xpath("in:header('foo') = 'bar'").to("mock:x")
  .when().xpath("in:body() = '<two/>'").to("mock:y")
  .otherwise().to("mock:z");

Notice the similarity between these functions and the corresponding $in:HeaderName and $in:body variables. The functions have a slightly different syntax, however: in:header('HeaderName') instead of $in:HeaderName, and in:body() instead of $in:body.
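In plain JAXP, namespace-qualified functions such as in:header() are supplied through an XPathFunctionResolver. The following sketch implements a hypothetical one-argument in:header() backed by a map of headers; it is not Camel's implementation, and the class name HeaderFunction is an assumption. It also assumes that extension functions are permitted by your JAXP configuration, which is typically the case on a standard JDK without a security manager.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class HeaderFunction {
    static final String IN_NS = "http://camel.apache.org/xml/in/";

    // Binds only the "in" prefix; every other prefix maps to no namespace.
    static final NamespaceContext IN_PREFIX = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "in".equals(prefix) ? IN_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return IN_NS.equals(namespaceURI) ? "in" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return IN_NS.equals(namespaceURI)
                    ? Collections.singletonList("in").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Evaluates a boolean expression in which in:header('Name') returns the header value.
    static boolean test(String expression, Map<String, String> headers, String body) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(IN_PREFIX);
            xpath.setXPathFunctionResolver((name, arity) -> {
                if (IN_NS.equals(name.getNamespaceURI())
                        && "header".equals(name.getLocalPart()) && arity == 1) {
                    // A string-literal argument arrives as a java.lang.String.
                    return args -> headers.getOrDefault(String.valueOf(args.get(0)), "");
                }
                return null; // any other function is unknown
            });
            return (Boolean) xpath.evaluate(expression,
                    new InputSource(new StringReader(body)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = Collections.singletonMap("foo", "bar");
        System.out.println(test("in:header('foo') = 'bar'", headers, "<one/>")); // true
    }
}
```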

Evaluating variables in XPathBuilder

You can also use variables in expressions that are evaluated using the XPathBuilder class. In this case, you cannot use variables such as $in:body or $in:HeaderName, because there is no exchange object to evaluate against. But you can use variables that are defined inline using the variable(Name, Value) fluent builder method.

For example, the following XPathBuilder construction evaluates the $test variable, which is defined to have the value, London:

String var = XPathBuilder.xpath("$test")
               .variable("test", "London")
               .evaluate(getContext(), "<name>foo</name>");

Note that variables defined in this way are automatically entered into the global namespace (for example, the variable, $test, uses no prefix).
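For comparison, the JAXP API handles an unprefixed variable such as $test through an XPathVariableResolver that sees a QName with an empty namespace URI. The following sketch is not XPathBuilder's implementation; the class name InlineVariable and its evaluate() signature are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class InlineVariable {

    // Evaluates the expression with a single variable, varName, bound to value.
    static String evaluate(String expression, String varName, String value, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Unprefixed variables such as $test live in the empty (global) namespace.
            xpath.setXPathVariableResolver(name ->
                    name.getNamespaceURI().isEmpty() && varName.equals(name.getLocalPart())
                            ? value : null);
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("$test", "test", "London", "<name>foo</name>")); // London
    }
}
```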

32.9. Variable Namespaces

Table of namespaces

Table 32.3, “XPath Variable Namespaces” shows the namespace URIs that are associated with the various namespace prefixes.

Table 32.3. XPath Variable Namespaces

Namespace URI                                                Prefix     Description
http://camel.apache.org/schema/spring                        None       Default namespace (associated with variables that have no namespace prefix).
http://camel.apache.org/xml/in/                              in         Used to reference the header or body of the current exchange’s In message.
http://camel.apache.org/xml/out/                             out        Used to reference the header or body of the current exchange’s Out message.
http://camel.apache.org/xml/functions/                       function   Used to reference the custom functions listed in Table 32.4, “XPath Custom Functions”.
http://camel.apache.org/xml/variables/environment-variables  env        Used to reference O/S environment variables.
http://camel.apache.org/xml/variables/system-properties      system     Used to reference Java system properties.
http://camel.apache.org/xml/variables/exchange-property      Undefined  Used to reference exchange properties. You must define your own prefix for this namespace.

32.10. Function Reference

Table of custom functions

Table 32.4, “XPath Custom Functions” shows the custom functions that you can use in Apache Camel XPath expressions. These functions can be used in addition to the standard XPath functions.

Table 32.4. XPath Custom Functions

Function                      Description
in:body()                     Returns the In message body.
in:header(HeaderName)         Returns the In message header with name, HeaderName.
out:body()                    Returns the Out message body.
out:header(HeaderName)        Returns the Out message header with name, HeaderName.
function:properties(PropKey)  Looks up a property with the key, PropKey.
function:simple(SimpleExp)    Evaluates the specified simple expression, SimpleExp.