Camel XML tokenizer handling newlines within XML element tags

Environment

  • JBoss Fuse 6.0

Issue

Issue MR-785 [1] affects the XML tokenizer in Camel versions < 12.0.2: if the opening tag of the tokenized XML element contains a newline, Camel does not parse the element properly. For example, in the affected versions of Camel the following route snippet...

split().tokenizeXML("Child")

... does not properly parse the following XML message:

<Parent>
  <Child A="1"
    B="2">
    <ChildValue/>
  </Child>
</Parent>

[1] https://issues.jboss.org/browse/MR-785

Resolution

The recommended solution is to upgrade to JBoss Fuse 6.1, which includes a version of Apache Camel with the MR-785 [1] fix.

If upgrading Fuse is not possible, you can try to work around the issue by preprocessing the XML message before the Camel splitter consumes it. The snippet below demonstrates how to use the javax.xml.transform.Transformer API to remove newlines from within XML element tags.

import org.apache.camel.Body;

import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.InputStream;
import java.io.StringWriter;

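public class XmlElementNormalizationTransformer {

    // Camel binds the incoming message body to the xml parameter because of @Body.
    public String process(@Body InputStream xml) throws Exception {
        StringWriter transformedXml = new StringWriter();
        Source source = new StreamSource(xml);
        // A Transformer created without a stylesheet performs an identity transformation:
        // the XML is parsed and serialized again, which rewrites each opening tag on one line.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(source, new StreamResult(transformedXml));
        return transformedXml.toString();
    }

}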
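
To check the effect of the preprocessing outside of a Camel route, the same transformation can be run on a plain String. The following is only a minimal sketch: the class NormalizationCheck and its helper methods are hypothetical names introduced for illustration, and the exact serialized output (for example, whether an XML declaration is prepended) depends on the TransformerFactory implementation available on the classpath.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical helper class, not part of the attached project.
public class NormalizationCheck {

    // Runs the identity transformation on a String, so no Camel classes are needed.
    public static String normalize(String xml) throws Exception {
        StringWriter out = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Returns the text of the first <Child ...> opening tag found in the given XML.
    public static String childOpeningTag(String xml) {
        int start = xml.indexOf("<Child ");
        return xml.substring(start, xml.indexOf('>', start) + 1);
    }

    public static void main(String[] args) throws Exception {
        // The multi-line message from the Issue section.
        String xml = "<Parent>\n  <Child A=\"1\"\n    B=\"2\">\n    <ChildValue/>\n  </Child>\n</Parent>";
        // After normalization the opening tag should no longer contain a line break.
        System.out.println(childOpeningTag(normalize(xml)));
    }
}
```

If the printed opening tag still spans several lines in your environment, the TransformerFactory in use serializes attributes differently and this workaround may not apply.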

You can wire this preprocessor into your route using the bean [2] component, as demonstrated in the snippet below.

from("direct:test").
  bean(XmlElementNormalizationTransformer.class).split().tokenizeXML("Child").
  to("mock:test");

The attachment contains a minimal working Maven project demonstrating the preprocessing from the snippets above.

[1] https://issues.jboss.org/browse/MR-785
[2] http://camel.apache.org/bean.html

Root Cause

The old implementation of the XML tokenizer used a single-line regular expression pattern to parse XML element tags. As a result, Camel was unable to properly handle multi-line XML element tags. The fixed version of Camel parses newline characters in element tags correctly.
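
The difference between single-line and multi-line matching can be reproduced with java.util.regex alone. The sketch below is an illustration under stated assumptions: the pattern OPENING_TAG and the class MultiLineTagDemo are hypothetical and are not the expression or code that Camel uses. It only shows that "." does not match a line break unless Pattern.DOTALL is set, which is the kind of limitation described above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is for illustration only, not Camel's own.
public class MultiLineTagDemo {

    // Matches an opening <Child> tag with optional attributes.
    static final String OPENING_TAG = "<Child(\\s.*?)?>";

    // Without DOTALL, "." stops at line breaks, so a tag split across lines is missed.
    public static boolean foundWithoutDotall(String xml) {
        return Pattern.compile(OPENING_TAG).matcher(xml).find();
    }

    // With DOTALL, "." also matches line breaks, so the multi-line tag is found.
    public static boolean foundWithDotall(String xml) {
        return Pattern.compile(OPENING_TAG, Pattern.DOTALL).matcher(xml).find();
    }

    public static void main(String[] args) {
        // The multi-line message from the Issue section.
        String xml = "<Parent>\n  <Child A=\"1\"\n    B=\"2\">\n    <ChildValue/>\n  </Child>\n</Parent>";
        System.out.println("without DOTALL: " + foundWithoutDotall(xml));
        System.out.println("with DOTALL:    " + foundWithDotall(xml));
    }
}
```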

Attachments
