Chapter 3. Basics

3.1. Smooks

Smooks is a fragment-based data transformation and analysis tool. It is a general-purpose processing tool capable of interpreting fragments of a message, and it uses visitor logic to accomplish this. It allows you to implement your transformation logic in XSLT or Java and provides a management framework through which you can centrally manage the transformation logic for your message set.

3.2. Visitor Logic in Smooks

Smooks uses visitor logic. A "visitor" is Java code that performs a specific action on a specific fragment of a message. This enables Smooks to perform actions on message fragments.
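
The idea of visitor logic can be illustrated without Smooks itself. The following is a minimal sketch that uses only the JDK SAX parser: it streams a message and calls a visitor before and after each element with a chosen name. The FragmentVisitor interface and the filter method are hypothetical names invented for this illustration. They are not part of the Smooks API, and Smooks's own visitor interfaces differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Plain-JDK illustration of fragment-targeted visitor logic (not the Smooks API). */
class FragmentVisitorSketch {

    /** Hypothetical visitor contract: act before and after one targeted fragment. */
    interface FragmentVisitor {
        void visitBefore(String elementName, Attributes attributes);
        void visitAfter(String elementName);
    }

    /** Streams the XML and calls the visitor only for elements named targetElement. */
    static void filter(String xml, String targetElement, FragmentVisitor visitor) {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(targetElement)) {
                    visitor.visitBefore(qName, atts);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(targetElement)) {
                    visitor.visitAfter(qName);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not filter message", e);
        }
    }

    /** Records the visitor events raised for the order-item fragments of a small message. */
    static List<String> demo() {
        String xml = "<order id='332'><order-items>"
                + "<order-item id='1'/><order-item id='2'/>"
                + "</order-items></order>";
        List<String> events = new ArrayList<>();
        filter(xml, "order-item", new FragmentVisitor() {
            @Override
            public void visitBefore(String elementName, Attributes attributes) {
                events.add("before " + elementName + " " + attributes.getValue("id"));
            }

            @Override
            public void visitAfter(String elementName) {
                events.add("after " + elementName);
            }
        });
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the visitor is never called for the order or order-items elements. Only the two order-item fragments produce a before and an after event, in document order.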

3.3. Message Fragment Processing

Smooks supports these types of message fragment processing:
  • Templating: Transform message fragments with XSLT or FreeMarker
  • Java Binding: Bind message fragment data into Java objects
  • Splitting: Split message fragments and route the split fragments over multiple transports and destinations
  • Enrichment: "Enrich" message fragments with data from databases
  • Persistence: Persist message fragment data to databases
  • Validation: Perform basic or complex validation on message fragment data

3.4. Basic Processing Model

The following is a list of different transformations you can perform with Smooks:
  • XML to XML
  • XML to Java
  • Java to XML
  • Java to Java
  • EDI to XML
  • EDI to Java
  • Java to EDI
  • CSV to XML

3.5. Supported Models

Simple API for XML (SAX)
The SAX event model is based on the hierarchical SAX events you can generate from an XML source. These include the startElement and endElement events. You can also apply this model to other structured and hierarchical data sources, such as EDI, CSV and Java.
Document Object Model (DOM)
Use this object model to map the message source and its final result.

Note

The most important events have visitBefore and visitAfter in their titles.

3.6. FreeMarker

FreeMarker is a template engine. You can use it to create and use a NodeModel as the domain model for a template operation. Smooks adds the ability to perform fragment-based template transformations to this functionality, as well as the power to apply the model to huge messages.

3.7. Example of Using SAX

Prerequisites

  • Requires an implemented SAXVisitor interface. (Choose an interface that corresponds to the events of the process.)
  • This example uses the ExecutionContext. It is a public interface which extends the BoundAttributeStore interface.

Procedure 3.1. Task

  1. Create a new Smooks configuration. This will be used to apply the visitor logic at the <xxx> element's visitBefore and visitAfter events.
  2. Apply the logic at the visitBefore and visitAfter events in a specific element of the overall event stream. The visitor logic is applied to the events in the <xxx> element.
  3. Use Smooks with FreeMarker to perform an XML-to-XML transformation on a huge message.
  4. Insert the following source format:
    <order id='332'>
        <header>
            <customer number="123">Joe</customer>
        </header>
        <order-items>
            <order-item id='1'>
                <product>1</product>
                <quantity>2</quantity>
                <price>8.80</price>
            </order-item>
            <!-- etc etc -->
        </order-items>
    </order>
    
  5. Insert this target format:
    <salesorder>
        <details>
            <orderid>332</orderid>
            <customer>
                <id>123</id>
                <name>Joe</name>
            </customer>
        </details>
        <itemList>
            <item>
                <id>1</id>
                <productId>1</productId>
                <quantity>2</quantity>
                <price>8.80</price>
            </item>
     
            <!-- etc etc -->
     
        </itemList>
    </salesorder>
    
  6. Use this Smooks configuration:
    <?xml version="1.0"?>
    <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                          xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd">
     
        <!--
        Filter the message using the SAX Filter (i.e. not DOM, so no
        intermediate DOM for the "complete" message - there are "mini" DOMs
        for the NodeModels below)....
        -->
        <params>
            <param name="stream.filter.type">SAX</param>
            <param name="default.serialization.on">false</param>
        </params>
     
        <!--
        Create 2 NodeModels.  One high level model for the "order"
        (header etc) and then one per "order-item".
     
        These models are used in the FreeMarker templating resources
        defined below.  You need to make sure you set the selector such
        that the total memory footprint is as low as possible.  In this
        example, the "order" model will contain everything except the
        <order-item> data (the main bulk of data in the message).  The
        "order-item" model only contains the current <order-item> data
        (i.e. there's max 1 order-item in memory at any one time).
        -->
        <resource-config selector="order,order-item">
            <resource>org.milyn.delivery.DomModelCreator</resource>
        </resource-config>
     
        <!--
        Apply the first part of the template when we reach the start
        of the <order-items> element.  Apply the second part when we
        reach the end.
     
        Note the <?TEMPLATE-SPLIT-PI?> Processing Instruction in the
        template.  This tells Smooks where to split the template,
        resulting in the order-items being inserted at this point.
        -->
        <ftl:freemarker applyOnElement="order-items">
            <ftl:template><!--<salesorder>
        <details>
            <orderid>${order.@id}</orderid>
            <customer>
                <id>${order.header.customer.@number}</id>
                <name>${order.header.customer}</name>
            </customer>
        </details>
        <itemList>
            <?TEMPLATE-SPLIT-PI?>
        </itemList>
    </salesorder>--></ftl:template>
        </ftl:freemarker>
     
        <!--
        Output the <order-items> elements.  This will appear in the
        output message where the <?TEMPLATE-SPLIT-PI?> token appears in the
        order-items template.
        -->
        <ftl:freemarker applyOnElement="order-item">
            <ftl:template><!--        <item>
                <id>${.vars["order-item"].@id}</id>
                <productId>${.vars["order-item"].product}</productId>
                <quantity>${.vars["order-item"].quantity}</quantity>
                <price>${.vars["order-item"].price}</price>
            </item>
            --></ftl:template>
        </ftl:freemarker>
     
    </smooks-resource-list>
    
  7. Use this code to execute:
    Smooks smooks = new Smooks("smooks-config.xml");
    try {
        smooks.filterSource(new StreamSource(new FileInputStream("input-message.xml")), new StreamResult(System.out));
    } finally {
        smooks.close();
    }
    
  8. An XML-to-XML transformation occurs as a result.

3.8. Cartridges

A cartridge is a Java archive (JAR) file that contains reusable content handlers. In most cases, you will not need to write large quantities of Java code for Smooks because some modules of functionality are included as cartridges. You can create new cartridges of your own to extend the smooks-core's basic functionality. Each cartridge provides ready-to-use support for either a transformation process or a specific form of XML analysis.

3.9. Supplied Cartridges

These are the cartridges supplied with Smooks:
  • Calc: "milyn-smooks-calc"
  • CSV: "milyn-smooks-csv"
  • Fixed length reader: "milyn-smooks-fixed-length"
  • EDI: "milyn-smooks-edi"
  • Javabean: "milyn-smooks-javabean"
  • JSON: "milyn-smooks-json"
  • Routing: "milyn-smooks-routing"
  • Templating: "milyn-smooks-templating"
  • CSS: "milyn-smooks-css"
  • Servlet: "milyn-smooks-servlet"
  • Persistence: "milyn-smooks-persistence"
  • Validation: "milyn-smooks-validation"

3.10. Selectors

Smooks resource selectors tell Smooks which message fragments to apply visitor logic to. They also serve as simple look-up values for non-visitor logic. When a resource is a visitor implementation (like <jb:bean> or <ftl:freemarker>), Smooks treats the resource selector as an XPath selector. Resources include the Java Binding Resource and FreeMarker Template Resource.

3.11. Using Selectors

The following points apply when using the selectors:
  • Configurations are both "strongly typed" and domain-specific for legibility.
  • Configurations are XSD-based. This provides you with auto-completion support when using an integrated development environment.
  • The actual handler doesn't need to be defined for the given resource type (such as the BeanPopulator class for Java bindings).

3.12. Declaring Namespaces

Procedure 3.2. Task

  • Configure namespace prefix-to-URI mappings through the core configuration namespace and modify the following XML code to include the namespaces you wish to use:
    <?xml version="1.0"?>
    <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
        xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
    
        <core:namespaces>
            <core:namespace prefix="a" uri="http://a"/>
            <core:namespace prefix="b" uri="http://b"/>
            <core:namespace prefix="c" uri="http://c"/>
            <core:namespace prefix="d" uri="http://d"/>
        </core:namespaces>
    
        <resource-config selector="c:item[@c:code = '8655']/d:units[text() = 1]">
            <resource>com.acme.visitors.MyCustomVisitorImpl</resource>
        </resource-config>
    
    </smooks-resource-list>
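
To see what a namespaced selector of this kind selects, the expression can be tried against a small document with the XPath engine that ships with the JDK. This is only an illustrative sketch: Smooks evaluates selectors itself while streaming, and the leading // (added here so that the expression matches anywhere in the document), the sample message and the class name are assumptions made for the example. Note that the document uses the prefixes x and y. The match depends on the namespace URIs, not on the prefixes.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** JDK-only illustration of prefix-to-URI mappings in an XPath expression. */
class NamespacedSelectorSketch {

    /** Counts the nodes selected by expression, resolving prefixes through prefixToUri. */
    static int countMatches(String xml, String expression, Map<String, String> prefixToUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-qualified matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }

                @Override
                public String getPrefix(String namespaceUri) {
                    return null; // reverse lookup is not needed for evaluation
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceUri) {
                    return Collections.emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate expression", e);
        }
    }

    /** Evaluates the selector from the configuration above against a sample message. */
    static int demo() {
        String xml = "<x:items xmlns:x='http://c' xmlns:y='http://d'>"
                + "<x:item x:code='8655'><y:units>1</y:units></x:item>"
                + "<x:item x:code='8655'><y:units>2</y:units></x:item>"
                + "<x:item x:code='1234'><y:units>1</y:units></x:item>"
                + "</x:items>";
        Map<String, String> namespaces = new HashMap<>();
        namespaces.put("c", "http://c");
        namespaces.put("d", "http://d");
        return countMatches(xml, "//c:item[@c:code = '8655']/d:units[text() = 1]", namespaces);
    }

    public static void main(String[] args) {
        System.out.println("matching fragments: " + demo());
    }
}
```

Only the first item in the sample message satisfies both predicates, so a single units fragment is selected.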
    

3.13. Filtering Process Selection

This is how Smooks selects a filtering process:
  • The DOM processing model is selected automatically if only the DOM visitor interface is applied (DOMElementVisitor and SerializationUnit).
  • If all visitor resources use only the SAX visitor interface (SAXElementVisitor), the SAX processing model is selected automatically.
  • If the visitor resources use both the DOM and SAX interfaces, the DOM processing model is selected by default unless you specify SAX in the Smooks resource configuration file. (This is done using <core:filterSettings type="SAX" />.)
Visitor resources do not include non-element visitor resources such as readers.
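
The three rules above can be written down as a small decision function. This is a sketch of the rules as stated here, not the Smooks implementation. The enum and method names are hypothetical, and the set of visitor resources is summarized in a single value for simplicity.

```java
/** Encodes the filter selection rules described above (illustrative names only). */
class FilterSelectionSketch {

    enum FilterType { DOM, SAX }

    /** Which visitor interfaces the configured visitor resources implement. */
    enum VisitorSupport { DOM_ONLY, SAX_ONLY, DOM_AND_SAX }

    /**
     * @param support        interfaces implemented by the visitor resources
     * @param configuredType type set through core:filterSettings, or null if not set
     */
    static FilterType select(VisitorSupport support, FilterType configuredType) {
        switch (support) {
            case DOM_ONLY:
                return FilterType.DOM;
            case SAX_ONLY:
                return FilterType.SAX;
            default:
                // Both models are possible: DOM wins unless SAX was requested explicitly.
                return configuredType == FilterType.SAX ? FilterType.SAX : FilterType.DOM;
        }
    }

    public static void main(String[] args) {
        System.out.println(select(VisitorSupport.DOM_AND_SAX, null));
        System.out.println(select(VisitorSupport.DOM_AND_SAX, FilterType.SAX));
    }
}
```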

3.14. Example of Setting the Filter Type to SAX in Smooks 1.3

<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" 
    xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">

    <core:filterSettings type="SAX" />

</smooks-resource-list>

3.15. DomModelCreator

The DomModelCreator is a class that you can use in Smooks to create models for message fragments.

3.16. Mixing the DOM and SAX Models

  • Use the DOM (Document Object Model) for node traversal (that is, sending information between nodes) and pre-existing scripting/template engines.
  • Use the DomModelCreator visitor class to mix SAX and DOM models. When used with SAX filtering, this visitor will construct a DOM fragment from the visited element. It allows you to use DOM utilities within a streaming environment.
  • When more than one model is nested, the outer models will never contain data from the inner models (that is, the same fragment will never co-exist inside two models):
    <order id="332">
        <header>
            <customer number="123">Joe</customer>
        </header>
        <order-items>
            <order-item id='1'>
                <product>1</product>
                <quantity>2</quantity>
                <price>8.80</price>
            </order-item>
            <order-item id='2'>
                <product>2</product>
                <quantity>2</quantity>
                <price>8.80</price>
            </order-item>
            <order-item id='3'>
                <product>3</product>
                <quantity>2</quantity>
                <price>8.80</price>
            </order-item>
        </order-items>
    </order>
    

3.17. Configuring the DomModelCreator

  1. Configure the DomModelCreator from within Smooks to create models for the order and order-item message fragments. See the following example:
    <resource-config selector="order,order-item">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>
    
  2. Configure the in-memory model for the order as shown:
    <order id='332'>
         <header>
             <customer number="123">Joe</customer>
         </header>
         <order-items />
    </order>
    

    Note

    Each new model overwrites the previous one so there will never be more than one order-item model in memory at once.
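
The technique of building a small DOM for one fragment at a time inside a SAX stream can be sketched with the JDK alone. The handler below is an illustration of the general approach, not the DomModelCreator source. The class names and the callback are hypothetical, and the sketch handles a single, non-nested fragment name only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** JDK-only sketch: one mini DOM per fragment while the message is streamed. */
class FragmentModelSketch {

    static class FragmentModelHandler extends DefaultHandler {
        private final String fragmentName;
        private final DocumentBuilder builder;
        private final Consumer<Element> onModel;
        private Node current; // node being built, or null when outside the fragment

        FragmentModelHandler(String fragmentName, DocumentBuilder builder, Consumer<Element> onModel) {
            this.fragmentName = fragmentName;
            this.builder = builder;
            this.onModel = onModel;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (current == null && !qName.equals(fragmentName)) {
                return; // outside the targeted fragment: nothing is kept in memory
            }
            // A fresh document is started for every new fragment.
            Document owner = (current == null) ? builder.newDocument() : current.getOwnerDocument();
            Element element = owner.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                element.setAttribute(atts.getQName(i), atts.getValue(i));
            }
            Node parent = (current == null) ? owner : current;
            parent.appendChild(element);
            current = element;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                String text = new String(ch, start, length);
                current.appendChild(current.getOwnerDocument().createTextNode(text));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;
            }
            Node parent = current.getParentNode();
            if (parent instanceof Document) {
                Element model = (Element) current;
                current = null;        // the next fragment starts a new model
                onModel.accept(model); // DOM utilities can be used on the fragment here
            } else {
                current = parent;
            }
        }
    }

    /** Returns the product value of each order-item, read through the per-fragment DOM. */
    static List<String> productIds(String xml) {
        List<String> ids = new ArrayList<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            FragmentModelHandler handler = new FragmentModelHandler("order-item", builder,
                    model -> ids.add(model.getElementsByTagName("product").item(0).getTextContent()));
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not process message", e);
        }
        return ids;
    }
}
```

Because each model is handed to the callback and then released, only one order-item fragment is held as a DOM at any time, regardless of how many items the message contains.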

3.18. Further Information about the DomModelCreator

3.19. The Bean Context

The bean context contains objects for Smooks to access when filtering occurs. One bean context is created per execution context (using the Smooks.filterSource operation). Every bean the cartridge creates is filed according to its beanId.

3.20. Configuring Bean Contexts

  1. To have the contents of the bean context returned at the end of a Smooks.filterSource process, supply an org.milyn.delivery.java.JavaResult object in the call to the Smooks.filterSource method. This example shows you how:
    //Get the data to filter
    StreamSource source = new StreamSource(getClass().getResourceAsStream("data.xml"));
    
    //Create a Smooks instance (cachable)
    Smooks smooks = new Smooks("smooks-config.xml");
    
    //Create the JavaResult, which will contain the filter result after filtering
    JavaResult result = new JavaResult();
    
    //Filter the data from the source, putting the result into the JavaResult
    smooks.filterSource(source, result);
    
    //Getting the Order bean which was created by the Javabean cartridge
    Order order = (Order)result.getBean("order");
    
  2. To access the bean context from your own code, use the BeanContext object. You can retrieve it from the ExecutionContext via the getBeanContext() method.
  3. When adding or retrieving objects from the BeanContext, make sure you first retrieve a BeanId object from the BeanIdStore. (The BeanId object is a special key that ensures higher performance than string keys, although string keys are also supported.)
  4. You must retrieve the BeanIdStore from the ApplicationContext using the getBeanIdStore() method.
  5. To create a BeanId object, call the register("beanId name") method. (If you know that the BeanId is already registered, then you can retrieve it by calling the getBeanId("beanId name") method.)
  6. BeanId objects are ApplicationContext-scoped objects. Register them in your custom visitor implementation's initialization method and then put them in the visitor object as properties. You can then use them in the visitBefore and visitAfter methods. (The BeanId objects and the BeanIdStore are thread-safe.)

3.21. Pre-Installed Beans

The following Beans come pre-installed:
  • PUUID: UniqueId bean. This bean provides unique identifiers for the filtering ExecutionContext.
  • PTIME: Time bean. This bean provides time-based data for the filtering ExecutionContext.
These examples show you how to use these beans in a FreeMarker template:
  • Unique ID of the ExecutionContext (message being filtered): ${PUUID.execContext}
  • Random Unique ID: ${PUUID.random}
  • Message Filtering start time (in milliseconds): ${PTIME.startMillis}
  • Message Filtering start time (in nanoseconds): ${PTIME.startNanos}
  • Message Filtering start time (Date): ${PTIME.startDate}
  • Time now (in milliseconds): ${PTIME.nowMillis}
  • Time now (in nanoseconds): ${PTIME.nowNanos}
  • Time now (Date): ${PTIME.nowDate}

3.22. Multiple Outputs/Results

Smooks produces output in these ways:
  • Through in-result instances. These are returned in the result instances passed to the Smooks.filterSource method.
  • During the filtering process. This is achieved through output generated and sent to external endpoints (such as ESB services, files, JMS destinations and databases) during the filtering process. Message fragment events trigger automatic routing to external endpoints.

Important

Smooks can generate output in the above ways in a single filtering pass of a message stream. It does not need to filter a message stream multiple times to generate multiple outputs.

3.23. Creating "In-Result" Instances

  • Supply Smooks with multiple result instances as seen in the API:
    public void filterSource(Source source, Result... results) throws SmooksException
    

    Note

    Smooks does not support capturing result data from multiple result instances of the same type. For example, you can specify multiple StreamResult instances in the Smooks.filterSource method call, but Smooks will only output to one of these StreamResult instances (the first one).
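
The behavior described in this note can be modeled with the standard JDK result types. The helper below is a hypothetical illustration of "the first result of a given type wins". It is not taken from Smooks, and the class and method names are assumptions.

```java
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamResult;

/** Models the "one result per type" rule with plain JDK Result classes. */
class ResultSelectionSketch {

    /** Returns the first supplied result of the requested type, or null if there is none. */
    static <T extends Result> T firstOfType(Class<T> type, Result... results) {
        for (Result result : results) {
            if (type.isInstance(result)) {
                return type.cast(result);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        StreamResult first = new StreamResult(new StringWriter());
        StreamResult second = new StreamResult(new StringWriter());
        DOMResult dom = new DOMResult();

        // Only "first" would receive stream output; "second" is ignored.
        System.out.println(firstOfType(StreamResult.class, first, dom, second) == first);
        System.out.println(firstOfType(DOMResult.class, first, dom, second) == dom);
    }
}
```

If you need the same serialized output in two places, a practical consequence is that you must copy it yourself after filtering rather than pass two StreamResult instances.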

3.24. Supported Result Types

Smooks can work with standard JDK StreamResult and DOMResult result types, as well as these specialist ones:
  • JavaResult: use this result type to capture the contents of the Smooks Java Bean context.
  • ValidationResult: use this result type to capture validation results.
  • StringResult: a simple result type to use when writing tests. This is a StreamResult extension wrapping a StringWriter.

3.25. Event Stream Results

When Smooks processes a message, it produces a stream of events. If a StreamResult or DOMResult is supplied in the Smooks.filterSource call, Smooks will, by default, serialize the event stream (produced by the Source) to the supplied result as XML. (You can apply visitor logic to the event stream before serialization.)

Note

This is the mechanism used to perform a standard 1-input/1-xml-output character based transformation.

3.26. During the Filtering Process

Smooks generates different types of output during the Smooks.filterSource process. (This occurs during the message event stream, before the end of the message is reached.) An example of this is when it is used to split and route message fragments to different types of endpoints for execution by other processes.
Smooks does not "batch up" the message data and produce all of the outputs after filtering the complete message. Batching would hurt performance, and producing output during filtering lets you use the message event stream to trigger the fragment transformation and routing operations. Large messages are therefore processed as a stream.

3.27. Checking the Smooks Execution Process

  1. To obtain an execution report from Smooks you must configure the ExecutionContext class to produce one. (Smooks will publish events as it processes messages.) The following sample code shows you how to configure Smooks to generate an HTML report:
    Smooks smooks = new Smooks("/smooks/smooks-transform-x.xml");
    ExecutionContext execContext = smooks.createExecutionContext();
    
    execContext.setEventListener(new HtmlReportGenerator("/tmp/smooks-report.html"));
    smooks.filterSource(execContext, new StreamSource(inputStream), new StreamResult(outputStream));
    
  2. Use the HtmlReportGenerator feature to assist you when debugging.

    Note

    You can see a sample report on this web page: http://www.milyn.org/docs/smooks-report/report.html

    Note

    Alternatively, you can create a custom ExecutionEventListener implementation.

3.28. Terminating the Filtering Process

  1. To terminate the Smooks filtering process before the end of the message is reached, add the <core:terminate> configuration to the Smooks settings. (This works for SAX and is not needed for DOM.)
    Here is an example configuration that terminates filtering at the end of the message's customer fragment:
    <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" 
       xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
    
        <!-- Visitors... -->
        <core:terminate onElement="customer" />
    
    </smooks-resource-list>
  2. To terminate at the start of the customer fragment (on the visitBefore event), use this configuration:
    <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" 
       xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
    
       <!-- Visitors... -->
    
       <core:terminate onElement="customer" terminateBefore="true" />
    
    </smooks-resource-list>
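
The difference between terminating after and before a fragment can be demonstrated with a plain SAX handler. This sketch uses the common technique of throwing an exception from the handler to stop the JDK parser early. How Smooks implements termination internally is not described here, so treat the mechanism, the TerminateSignal class and the method names as assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** JDK-only sketch of stopping a SAX stream at a chosen fragment. */
class EarlyTerminationSketch {

    /** Hypothetical signal used to stop parsing; not a Smooks class. */
    static class TerminateSignal extends SAXException {
        TerminateSignal() {
            super("filtering terminated");
        }
    }

    /** Returns the names of the elements whose start event was processed before termination. */
    static List<String> elementsSeen(String xml, String terminateOn, boolean terminateBefore) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                if (terminateBefore && qName.equals(terminateOn)) {
                    throw new TerminateSignal(); // stop before the fragment is processed
                }
                seen.add(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (!terminateBefore && qName.equals(terminateOn)) {
                    throw new TerminateSignal(); // stop once the fragment is complete
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (TerminateSignal expected) {
            // Parsing was stopped on purpose; the rest of the message is never read.
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse message", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<order><header><customer number='123'>Joe</customer></header>"
                + "<order-items><order-item/></order-items></order>";
        System.out.println(elementsSeen(xml, "customer", false)); // [order, header, customer]
        System.out.println(elementsSeen(xml, "customer", true));  // [order, header]
    }
}
```

In both cases the order-items part of the message is never visited, which is the point of terminating early on large messages when only the header data is needed.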

3.29. Global Configuration Settings

Default Properties
Default Properties specify the default values for <resource-config> attributes. These properties are automatically applied to the SmooksResourceConfiguration class when the corresponding <resource-config> does not specify a value for the attribute.
Global parameters
You can specify <param> elements in every <resource-config>. These parameter values will either be available at runtime through the SmooksResourceConfiguration or, if not, they will be injected through the @ConfigParam annotation.
Global configuration parameters are defined in one place. Every runtime component can access them by using the ExecutionContext.

3.30. Global Configuration Parameters

  1. Global parameters are specified in a <params> element as shown:
    <params>
        <param name="xyz.param1">param1-val</param>
    </params>
    
  2. Access the global parameters at runtime via the ExecutionContext, for example by calling its getConfigParameter method with the parameter name (xyz.param1 in the configuration above).
    

3.31. Default Properties

Default properties can be set on the root element of a Smooks configuration, which then applies them to the resource configurations in the smooks-conf.xml file. If all of the resource configurations have the same selector value, you can specify default-selector="order" on the root element. This means you don't have to specify the selector on every resource configuration.

3.32. Default Properties Example Configuration

<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" 
    xmlns:xsl="http://www.milyn.org/xsd/smooks/xsl-1.1.xsd" 
    default-selector="order">

    <resource-config>
        <resource>com.acme.VisitorA</resource>
        ...
    </resource-config>

    <resource-config>
        <resource>com.acme.VisitorB</resource>
        ...
    </resource-config>

</smooks-resource-list>

3.33. Default Property Options

default-selector
This is applied to all of the resource-config elements in the Smooks configuration file if no other selector has been defined.
default-selector-namespace
This is the default selector namespace. It is used if no other namespace is defined.
default-target-profile
This is the default target profile. It is applied to all of the resources in the Smooks configuration file when no other target-profile has been defined.
default-condition-ref
This refers to a global condition by the condition's identifier. This condition is applied to resources that define an empty condition element (in other words, <condition/>) that does not reference a globally-defined condition.

3.34. Filter Settings

  • To set filtering options, use the smooks-core configuration namespace. See the following example:
    <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" 
       xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
       <core:filterSettings type="SAX" defaultSerialization="true" 
          terminateOnException="true" readerPoolSize="3" closeSource="true" 
          closeResult="true" rewriteEntities="true" />
    
          <!-- Other visitor configs etc... -->
    
    </smooks-resource-list>
    

3.35. Filter Options

type
This determines which processing model is used: SAX or DOM. (The default is DOM.)
defaultSerialization
This determines if default serialization should be switched on. The default value is true. Turning it on tells Smooks to locate a StreamResult (or DOMResult) in the result objects provided to the Smooks.filterSource method and to, by default, serialize all events to that result.
You can turn this behaviour off via the global configuration parameter or you can override it on a per-fragment basis by targeting a visitor implementation at that fragment that either takes ownership of the result writer (when using SAX filtering) or modifies the DOM (when using DOM filtering).
terminateOnException
Use this to determine whether an exception should terminate processing. The default setting is true.
closeSource
This closes source instance streams passed to the Smooks.filterSource method (the default is true). The exception here is System.in, which will never be closed.
closeResult
This closes result streams passed to the Smooks.filterSource method (the default is true). The exceptions here are System.out and System.err, which are never closed.
rewriteEntities
Use this to rewrite XML entities when reading and writing (default serialization) XML.
readerPoolSize
This sets the reader pool size. Some reader implementations are very expensive to create. Pooling reader instances (in other words, reusing them) can result in significant performance improvement, especially when processing a multitude of small messages. The default value for this setting is 0 (in other words, not pooled: a new reader instance is created for each message).
Configure this to be in line with your application's threading model.