JBoss Enterprise SOA Platform 5 Smooks User Guide

Arya MirServers

Apr 3, 2012 (5 years and 5 months ago)

JBoss Enterprise
SOA Platform 5
Smooks User Guide
For JBoss Developers.
Edition 5.1.0
Copyright © 2011 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under the GNU Lesser General
Public License (LGPL) version 2.1. A copy of this license can be found at Appendix A, GNU Lesser
General Public License 2.1.
This manual is derived from the Smooks User Guide from the Smooks Project. Further details about
the Smooks Project can be found at the project's website: http://www.smooks.org.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity
Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
All other trademarks are the property of their respective owners.
This document is a guide and reference for the Smooks framework. Smooks is a Java framework for
processing XML and non-XML data such as CSV, EDI, and Java objects.
The JBoss Enterprise SOA Platform 5.1 supports only the Smooks 1.3 framework for performing
message transformations and content-based routing within ESB actions.
Preface
    1. Document Conventions
        1.1. Typographic Conventions
        1.2. Pull-quote Conventions
        1.3. Notes and Warnings
    2. We Need Feedback!
1. Overview
2. Basics
    2.1. Basic Processing Model
    2.2. Simple Example
    2.3. Smooks Resources
        2.3.1. Selectors
        2.3.2. Namespace Declaration
    2.4. Cartridges
    2.5. Filtering Process Selection
        2.5.1. Mixing the DOM and SAX Models
    2.6. Checking the Smooks Execution Process
    2.7. Terminating the Filtering Process
    2.8. Global Configurations
        2.8.1. Global Configuration Parameters
        2.8.2. Default Properties
    2.9. Filter Settings
3. Extending Smooks
    3.1. Configuring Smooks Components
        3.1.1. Configuration Annotations
    3.2. Implementing a Source Reader
        3.2.1. Implementing a Binary Source Reader
    3.3. Implementing a Fragment Visitor
        3.3.1. The SAX Visitor API
        3.3.2. Text Accumulation
        3.3.3. StreamResult Writing/Serialization
        3.3.4. Visitor Configuration
        3.3.5. Visitor Instance Life-cycle
        3.3.6. ExecutionContext and ApplicationContext
4. Java Binding
    4.1. When to use Smooks Java Binding
    4.2. Basics Of Java Binding
        4.2.1. The Bean Context
    4.3. Java Binding Configuration Details
        4.3.1. Pre-processing Binding Values
        4.3.2. Creating beans using a factory
        4.3.3. Extended Life-Cycle Binding
        4.3.4. Binding Key Value Pairs to Maps
        4.3.5. Virtual Object Models (Maps and Lists)
        4.3.6. Merging Multiple Data Entities into a Single Binding
    4.4. Programmatic Configuration
        4.4.1. An Example
    4.5. Direct Value Binding
        4.5.1. Configuration
        4.5.2. Programmatic configuration
    4.6. Generating the Smooks Binding Configuration
    4.7. Notes on JavaResult
5. Templates
    5.1. FreeMarker Templates
    5.2. Undertaking FreeMarker Transformations using NodeModels
        5.2.1. FreeMarker and the Java Bean Cartridge
        5.2.2. Programmatic Configuration
        5.2.3. XSL Templates
6. "Groovy" Scripting
    6.1. Using Mixed-DOM-and-SAX with Groovy
        6.1.1. Mixed-DOM-and-SAX Example
7. Processing Non-XML Data
    7.1. Processing CSV
        7.1.1. String manipulation functions
        7.1.2. Ignoring Fields
        7.1.3. Binding CSV Records to Java
        7.1.4. Programmatic Configuration
    7.2. Processing Fixed Length
        7.2.1. String manipulation functions
        7.2.2. Ignoring Fields
        7.2.3. Binding fixed length Records to Java
        7.2.4. Programmatic Configuration
    7.3. Processing EDI Files
        7.3.1. EDI Mapping Models
        7.3.2. Imports
        7.3.3. Type Support
        7.3.4. Programmatic Configuration
        7.3.5. Edifact Java Compiler
    7.4. Processing JavaScript Object Notation
        7.4.1. Programmatic Configuration
    7.5. Configuring the Default Reader
    7.6. String manipulation functions for readers
8. Java-to-Java Transformations
    8.1. Source and Target Object Models
    8.2. Source Model Event Stream
    8.3. Smooks Configuration
    8.4. Smooks Execution
9. Rules
    9.1. Rule Configuration
        9.1.1. ruleBase Configuration Options
    9.2. RuleProvider Implementations
        9.2.1. RegexProvider
        9.2.2. MVELProvider
10. Validation
    10.1. Validation Configuration
        10.1.1. Configuring Maximum Failures
        10.1.2. onFail
        10.1.3. Composite Rule Name
    10.2. Validation Results
    10.3. Localized Validation Messages
    10.4. Example
11. Processing "Huge" Messages
    11.1. One-to-One Transformation
    11.2. Splitting and Routing
        11.2.1. Basic Splitting and Routing
        11.2.2. Routing to a File
        11.2.3. Routing to Java Message Service
        11.2.4. Routing to a Database with SQL
12. Database Persistence
    12.1. Entity Persistence Frameworks
    12.2. Data Access Object Support
    12.3. Message Enrichment
13. Multiple Outputs/Results
    13.1. In Result Instances
        13.1.1. StreamResults / DOMResults
    13.2. During the Filtering Process
14. Performance Tuning
    14.1. General
    14.2. Smooks cartridges
    14.3. Java bean cartridge
15. Testing
    15.1. Unit Testing
A. GNU Lesser General Public License 2.1
B. Revision History
Preface
1. Document Conventions
This manual uses several conventions to highlight certain words and phrases and draw attention to
specific pieces of information.
In PDF and paper editions, this manual uses typefaces drawn from the Liberation Fonts [1] set. The
Liberation Fonts set is also used in HTML editions if the set is installed on your system. If not,
alternative but equivalent typefaces are displayed. Note: Red Hat Enterprise Linux 5 and later includes
the Liberation Fonts set by default.
1.1. Typographic Conventions
Four typographic conventions are used to call attention to specific words and phrases. These
conventions, and the circumstances they apply to, are as follows.
Mono-spaced Bold
Used to highlight system input, including shell commands, file names and paths. Also used to highlight
keycaps and key combinations. For example:
To see the contents of the file my_next_bestselling_novel in your current
working directory, enter the cat my_next_bestselling_novel command at the
shell prompt and press Enter to execute the command.
The above includes a file name, a shell command and a keycap, all presented in mono-spaced bold
and all distinguishable thanks to context.
Key combinations can be distinguished from keycaps by the plus sign connecting each part of a key
combination. For example:
Press Enter to execute the command.
Press Ctrl+Alt+F2 to switch to the first virtual terminal. Press Ctrl+Alt+F1 to
return to your X-Windows session.
The first paragraph highlights the particular keycap to press. The second highlights two key
combinations (each a set of three keycaps with each set pressed simultaneously).
If source code is discussed, class names, methods, functions, variable names and returned values
mentioned within a paragraph will be presented as above, in mono-spaced bold. For example:
File-related classes include filesystem for file systems, file for files, and dir for
directories. Each class has its own associated set of permissions.
Proportional Bold
This denotes words or phrases encountered on a system, including application names; dialog box text;
labeled buttons; check-box and radio button labels; menu titles and sub-menu titles. For example:
Choose System → Preferences → Mouse from the main menu bar to launch Mouse
Preferences. In the Buttons tab, click the Left-handed mouse check box and click
Close to switch the primary mouse button from the left to the right (making the mouse
suitable for use in the left hand).
[1] https://fedorahosted.org/liberation-fonts/
To insert a special character into a gedit file, choose Applications → Accessories
→ Character Map from the main menu bar. Next, choose Search → Find… from the
Character Map menu bar, type the name of the character in the Search field and click
Next. The character you sought will be highlighted in the Character Table. Double-
click this highlighted character to place it in the Text to copy field and then click the
Copy button. Now switch back to your document and choose Edit → Paste from the
gedit menu bar.
The above text includes application names; system-wide menu names and items; application-specific
menu names; and buttons and text found within a GUI interface, all presented in proportional bold and
all distinguishable by context.
Mono-spaced Bold Italic or Proportional Bold Italic
Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or
variable text. Italics denotes text you do not input literally or displayed text that changes depending on
circumstance. For example:
To connect to a remote machine using ssh, type ssh username@domain.name at
a shell prompt. If the remote machine is example.com and your username on that
machine is john, type ssh john@example.com.
The mount -o remount file-system command remounts the named file
system. For example, to remount the /home file system, the command is mount -o
remount /home.
To see the version of a currently installed package, use the rpm -q package
command. It will return a result as follows: package-version-release.
Note the words in bold italics above — username, domain.name, file-system, package, version and
release. Each word is a placeholder, either for text you enter when issuing a command or for text
displayed by the system.
Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and
important term. For example:
Publican is a DocBook publishing system.
1.2. Pull-quote Conventions
Terminal output and source code listings are set off visually from the surrounding text.
Output sent to a terminal is set in mono-spaced roman and presented thus:
books Desktop documentation drafts mss photos stuff svn
books_tests Desktop1 downloads images notes scripts svgs
Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:
package org.jboss.book.jca.ex1;

import javax.naming.InitialContext;

public class ExClient
{
    public static void main(String args[])
        throws Exception
    {
        InitialContext iniCtx = new InitialContext();
        Object ref = iniCtx.lookup("EchoBean");
        EchoHome home = (EchoHome) ref;
        Echo echo = home.create();

        System.out.println("Created Echo");

        System.out.println("Echo.echo('Hello') = " + echo.echo("Hello"));
    }
}
1.3. Notes and Warnings
Finally, we use three visual styles to draw attention to information that might otherwise be overlooked.
Note
Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should
have no negative consequences, but you might miss out on a trick that makes your life easier.
Important
Important boxes detail things that are easily missed: configuration changes that only apply to
the current session, or services that need restarting before an update will apply. Ignoring a box
labeled 'Important' will not cause data loss but may cause irritation and frustration.
Warning
Warnings should not be ignored. Ignoring warnings will most likely cause data loss.
2. We Need Feedback!
If you find a typographical error in this manual, or if you have thought of a way to make this manual
better, we would love to hear from you! Please submit a report in Bugzilla
(http://bugzilla.redhat.com/bugzilla/) against the product JBoss Enterprise SOA Platform.
When submitting a bug report, be sure to mention the manual's identifier: Smooks_Guide
If you have a suggestion for improving the documentation, try to be as specific as possible when
describing it. If you have found an error, please include the section number and some of the
surrounding text so we can find it easily.
Chapter 1. Overview
Read this book to learn about the Smooks transformation engine and the way in which it integrates
with the JBoss Enterprise Service Bus.
Smooks is a Java framework for processing both XML and non-XML data. (Non-XML data includes
formats such as CSV, EDI and Java objects.)
About the Code Samples
This book references several code examples from the Smooks project. The use cases illustrated
by the examples are supported, but the code examples themselves are not included with, tested
or supported by the JBoss Enterprise SOA Platform.
The complete Smooks Project code examples can be found at http://www.smooks.org/mediawiki/
index.php?title=Smooks_v1.3_Examples.
The JBoss Enterprise SOA Platform includes multiple "quickstart" example programs that
demonstrate Smooks transformations and routing. Refer to the readme files included with each
quickstart for more information.
Read this section to gain an overview of Smooks' key features:
Transformations
This is the ability to perform a wide range of "data transforms" to and from many formats, including
XML, CSV, EDI, and Java.
Figure 1.1. Transformation
Java Binding
This feature is used to populate a Java Object Model from a data source, such as CSV, EDI,
XML or Java objects. The resulting populated object models can then either be used in and of
themselves (as transformation results) or, alternatively, as "templating" resources from which XML
(or other character-based results) can be generated.
This feature also supports Virtual Object Models (maps and lists of typed data). Virtual Object
Models can be used by both the Extract Transform Load (ETL) and by the templating functionality.
Figure 1.2. Java Binding
Huge Message Processing
This feature is used to process very large messages (possibly many gigabytes in size). It can
split, transform and route fragments of these messages to a variety of destinations, such as Java
Message Services, files and databases.
Figure 1.3. Huge Message Processing
Message Enrichment
As its name suggests, this feature is used to "enrich" a message with information supplied from a
database or some other external source.
Figure 1.4. Message Enrichment
Complex Message Validation
This is a rules-based fragment validation feature.
Object-Relational Mapping (ORM)-Based Message Persistence
This feature uses a Java Persistence API (JPA)-compatible entity-persistence framework (such
as iBATIS or Hibernate) in order to access a database. It uses either the database's own query
language or the CRUD (Create, Read, Update and Delete) methodology in order to read from, and
write to, it.
This functionality can also use the CRUD methods of custom Data Access Objects (DAOs) to access a
database.
Combine
This feature is used to perform Extract Transform Load operations. It does so by leveraging
Smooks' transformation, routing and persistence features.
Chapter 2. Basics
Read this section in order to gain a basic understanding of how Smooks works.
The smooks-core is a Structured Data Event Stream Processor: it lets you "hook" custom visitor
logic into the event stream produced from a data source.
The concept of visitor logic is central to the way in which Smooks works. A "visitor" is a simple
piece of Java source code, designed to perform a specific task on the message fragment
at which it is targeted. (For instance, its purpose might be to apply an XSLT style-sheet.)
Visitor logic can be supported through either the SAX or DOM filters if you implement either
or both of the following interfaces: org.milyn.delivery.sax.SAXElementVisitor or
org.milyn.delivery.dom.DOMElementVisitor.
The most common application of this functionality is that of creating transformation solutions. (To do
so, implement visitor logic by using the event stream produced from the source message with the
intention of creating a result in some other form.) However, the capabilities of the smooks-core allow
it to be used in many other ways. Here are some examples:
• Java Binding: this is the ability to populate a Java Object Model from the source message.
• Message Splitting and Routing: this is the ability to perform complex splitting-and-routing operations
on the source message. It also includes the ability to concurrently route data in multiple formats to
multiple destinations.
• Huge Message Processing: this is the ability to declaratively "consume" (that is, transform or split-
and-route) very large messages without having to write large amounts of code.
2.1. Basic Processing Model
To reiterate, the basic purpose of Smooks is to take a data source and, from it, generate an event
stream, to which visitor logic can be applied. This is undertaken in order to produce a "result" in a
format such as Electronic Data Interchange (EDI).
Many different data source and result formats are supported and, hence, a number of different
transformation types are available. Some of the more common examples are:
• XML to XML
• XML to Java
• Java to XML
• Java to Java
• EDI to XML
• EDI to Java
• Java to EDI
• CSV to XML
Smooks supports both the Document Object Model (DOM) and Simple API for XML (SAX) event
models. These are used to map between the source and the result. The SAX event model will be
discussed in the most detail throughout this document.
The SAX event model is based, as its name would imply, on the hierarchical SAX events that you can
generate from an XML source. These may, for example, be the startElement and endElement events.
This event model has the advantage that it can be applied rather easily to other structured and
hierarchical data sources, including EDI, CSV and Java objects.
Usually, the most important events are those entitled visitBefore and visitAfter. The following
illustration conveys their respective hierarchical natures:
Figure 2.1. Hierarchical nature of the visitBefore and visitAfter events
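The nesting described above can be reproduced with the SAX parser that ships with the JDK. The following is a minimal sketch rather than Smooks code: the class name SaxEventTrace and its trace method are illustrative assumptions, and the startElement and endElement callbacks merely stand in for the visitBefore and visitAfter events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: uses the JDK SAX parser, not the Smooks API.
class SaxEventTrace {

    /** Parses the XML and returns one line per SAX event, indented by nesting depth. */
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add(indent(depth) + "startElement " + qName);
                depth++; // children are reported one level deeper
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--; // back to the level at which this element started
                events.add(indent(depth) + "endElement " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><header/><order-items><order-item/></order-items></order>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

Each element's end event is reported at the same depth as its start event, so the events of every child element sit between the two events of its parent.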
2.2. Simple Example
In order to be able to use the SAX event stream produced from the source message, you must
implement one or more of the SAXVisitor interfaces. These are described in more detail in the
javadocs. One's choice of interfaces will depend upon which events are to be consumed in a particular
scenario.
Note
This example uses ExecutionContext. It is a public interface which extends the
BoundAttributeStore interface. More information about this interface can be found in the
javadocs.
This example demonstrates how to aim the logic at the visitBefore and visitAfter events of a
specific element within the overall event stream. In this case, the visitor logic is aimed at the events for
the <xxx> element.
Figure 2.2. Implementing Visitor Logic
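As a rough idea of what visitor logic aimed at a single element looks like, consider the following sketch. It makes no claim about the Smooks API: ElementVisitor, filter and demo are hypothetical names, the selector is matched by simple name equality, and the JDK SAX parser stands in for the Smooks filter. In Smooks itself, the visitor would implement one of the SAXVisitor interfaces instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: a hand-rolled dispatcher, not the Smooks API.
class TargetedVisitorSketch {

    /** Hypothetical stand-in for a visitor: one method per event. */
    interface ElementVisitor {
        void visitBefore(String elementName);
        void visitAfter(String elementName);
    }

    /** Fires the visitor only for elements whose name equals the selector. */
    static void filter(String xml, final String selector, final ElementVisitor visitor)
            throws Exception {
        DefaultHandler dispatcher = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(selector)) {
                    visitor.visitBefore(qName);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(selector)) {
                    visitor.visitAfter(qName);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), dispatcher);
    }

    /** Aims a logging visitor at the xxx element and returns what it recorded. */
    static List<String> demo(String xml) throws Exception {
        final List<String> log = new ArrayList<String>();
        filter(xml, "xxx", new ElementVisitor() {
            public void visitBefore(String elementName) {
                log.add("visitBefore " + elementName);
            }

            public void visitAfter(String elementName) {
                log.add("visitAfter " + elementName);
            }
        });
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String line : demo("<message><a/><xxx><b/></xxx><c/></message>")) {
            System.out.println(line);
        }
    }
}
```

The events of the other elements in the message still flow through the dispatcher, but only those of the selected element reach the visitor.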
The visitor implementation is very simple as it consists of one method implementation per event. In
order to aim this implementation at the <xxx> element's visitBefore and visitAfter events,
create a Smooks configuration of the kind shown below.
This tutorial illustrates how Smooks (in conjunction with FreeMarker) can be used to perform an XML-
to-XML transformation on a huge message. Note that this tutorial can also be used as the basis for a
character-based transformation.
Note
FreeMarker is an extremely powerful templating engine. One rather useful feature is the ability to
create and use a NodeModel as the domain model for a templating operation. To this, Smooks
adds the ability to perform fragment-based templating transformations, as well as the power to
apply the model to huge messages.
Source Format:
<order id='332'>
    <header>
        <customer number="123">Joe</customer>
    </header>
    <order-items>
        <order-item id='1'>
            <product>1</product>
            <quantity>2</quantity>
            <price>8.80</price>
        </order-item>

        <!-- etc etc -->

    </order-items>
</order>
Target Format:
<salesorder>
    <details>
        <orderid>332</orderid>
        <customer>
            <id>123</id>
            <name>Joe</name>
        </customer>
    </details>
    <itemList>
        <item>
            <id>1</id>
            <productId>1</productId>
            <quantity>2</quantity>
            <price>8.80</price>
        </item>

        <!-- etc etc -->

    </itemList>
</salesorder>
Smooks Configuration
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd">

    <!--
        Filter the message using the SAX Filter (i.e. not DOM, so no
        intermediate DOM for the "complete" message - there are "mini" DOMs
        for the NodeModels below)....
    -->
    <params>
        <param name="stream.filter.type">SAX</param>
        <param name="default.serialization.on">false</param>
    </params>

    <!--
        Create 2 NodeModels. One high level model for the "order"
        (header etc) and then one per "order-item".

        These models are used in the FreeMarker templating resources
        defined below. You need to make sure you set the selector such
        that the total memory footprint is as low as possible. In this
        example, the "order" model will contain everything except the
        <order-item> data (the main bulk of data in the message). The
        "order-item" model only contains the current <order-item> data
        (i.e. there's max 1 order-item in memory at any one time).
    -->
    <resource-config selector="order,order-item">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>

    <!--
        Apply the first part of the template when we reach the start
        of the <order-items> element. Apply the second part when we
        reach the end.

        Note the <?TEMPLATE-SPLIT-PI?> Processing Instruction in the
        template. This tells Smooks where to split the template,
        resulting in the order-items being inserted at this point.
    -->
    <ftl:freemarker applyOnElement="order-items">
        <ftl:template><!--<salesorder>
    <details>
        <orderid>${order.@id}</orderid>
        <customer>
            <id>${order.header.customer.@number}</id>
            <name>${order.header.customer}</name>
        </customer>
    </details>
    <itemList>
    <?TEMPLATE-SPLIT-PI?>
    </itemList>
</salesorder>--></ftl:template>
    </ftl:freemarker>

    <!--
        Output the <order-item> elements. This will appear in the
        output message where the <?TEMPLATE-SPLIT-PI?> token appears in the
        order-items template.
    -->
    <ftl:freemarker applyOnElement="order-item">
        <ftl:template><!-- <item>
            <id>${.vars["order-item"].@id}</id>
            <productId>${.vars["order-item"].product}</productId>
            <quantity>${.vars["order-item"].quantity}</quantity>
            <price>${.vars["order-item"].price}</price>
        </item>
        --></ftl:template>
    </ftl:freemarker>

</smooks-resource-list>
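Next, execute the transformation:
Smooks smooks = new Smooks("smooks-config.xml");
try {
    // Stream the input message through Smooks and write the result to standard output.
    smooks.filterSource(new StreamSource(new FileInputStream("input-message.xml")),
                        new StreamResult(System.out));
} finally {
    // Release the resources held by the Smooks instance.
    smooks.close();
}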
The result is an XML-to-XML transformation.
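The split-template idea used in the configuration can be pictured with plain JDK code. The sketch below is an assumption-laden illustration, not how Smooks or FreeMarker is implemented: SplitTemplateSketch and transform are hypothetical names, the wrapper template carries no FreeMarker expressions, and each item is written by hand rather than through a template. What it does show is the streaming behaviour: the first half of the wrapper is written when <order-items> starts, one item is written as each <order-item> ends, and the second half is written when <order-items> ends, so only the current item is held in memory.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: plain JDK SAX with hand-written output, not Smooks or FreeMarker.
class SplitTemplateSketch {
    static final String SPLIT = "<?TEMPLATE-SPLIT-PI?>";

    /** Streams an order message and writes a sales order, holding one item at a time. */
    static String transform(String xml, String wrapper) throws Exception {
        int cut = wrapper.indexOf(SPLIT);
        if (cut < 0) {
            throw new IllegalArgumentException("wrapper template has no split marker");
        }
        final String before = wrapper.substring(0, cut);
        final String after = wrapper.substring(cut + SPLIT.length());
        final StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private final Map<String, String> item = new HashMap<String, String>();
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
                if (qName.equals("order-items")) {
                    out.append(before); // first part of the wrapper template
                } else if (qName.equals("order-item")) {
                    item.clear(); // forget the previous item
                    item.put("id", atts.getValue("id"));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("order-item")) {
                    out.append("<item><id>").append(item.get("id")).append("</id>")
                       .append("<productId>").append(item.get("product")).append("</productId>")
                       .append("<quantity>").append(item.get("quantity")).append("</quantity>")
                       .append("<price>").append(item.get("price")).append("</price></item>");
                } else if (qName.equals("order-items")) {
                    out.append(after); // second part of the wrapper template
                } else {
                    item.put(qName, text.toString().trim()); // text of the element just closed
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order id='332'><order-items><order-item id='1'><product>1</product>"
                + "<quantity>2</quantity><price>8.80</price></order-item></order-items></order>";
        System.out.println(transform(xml, "<salesorder><itemList>" + SPLIT + "</itemList></salesorder>"));
    }
}
```

Because nothing is buffered beyond the current item, the memory needed does not grow with the number of items in the message, which is the property the configuration relies on for huge messages.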
This example has demonstrated how the "lower levels" of the Smooks programming model work. In
most cases, there will be no need to write large quantities of Java code for Smooks because it comes
with modules of pre-built functionality, ideal for many common usages. These modules are called
cartridges. Refer to Section 2.4, “Cartridges” for more information.
2.3. Smooks Resources
Smooks executes by taking a data stream of one format (XML, EDI, Java, JSON, CSV etc)
and generating an event stream. The event stream is then used to fire different types of "Visitor
logic" (Java, Groovy, FreeMarker, XSLT etc). The result of this process can be to produce a new data
stream in a different format (i.e. a traditional "transformation"), bind data from the source message
data stream to java objects to produce a populated Java object graph (i.e. a "Java binding"), produce
many smaller messages (message splitting), or route messages to desired destinations etc.
At a core level, Smooks sees all of the "Visitor logic" etc as "Smooks
Resources" (SmooksResourceConfiguration) that are configured to be applied based on an event
selector (i.e. event from the source data event stream). This is a very generic processing model and
makes a lot of sense from the point of view of Smooks Core and its architecture (maintenance etc).
However, it can be a little too generic from a usability perspective because everything looks very
similar in the configuration. To help with this, Smooks v1.1 introduced an "Extensible Configuration
Model" feature. This allows specific resource types (Javabean binding configs, FreeMarker template
configs etc) to be specified in the configuration using dedicated XSD namespaces of their own.
Example 2.1. Java Binding Resource
<jb:bean beanId="lineOrder" class="example.trgmodel.LineOrder"
createOnElement="example.srcmodel.Order">
<jb:wiring property="lineItems" beanIdRef="lineItems" />
<jb:value property="customerId" data="header/customerNumber" />
<jb:value property="customerName" data="header/customerName" />
</jb:bean>
Example 2.2. FreeMarker Template Resource
<ftl:freemarker applyOnElement="order-item">
<ftl:template><!-- <item>
<id>${.vars["order-item"].@id}</id>
<productId>${.vars["order-item"].product}</productId>
<quantity>${.vars["order-item"].quantity}</quantity>
<price>${.vars["order-item"].price}</price>
</item>-->
</ftl:template>
</ftl:freemarker>
Important things to note from these examples:
• Configuration is both strongly "typed" and domain-specific. This makes it much easier to read.
• The configurations are XSD-based. This provides the user with auto-completion support within his or
her integrated development environment.
• There is no need to define the actual handler for the given resource type (such as the
BeanPopulator class for Java bindings.)
2.3.1. Selectors
Smooks Resource "selectors" are a very important part of how Smooks works. They tell Smooks which
message fragments to apply the configured Visitor logic to, and they also work as a simple lookup
value for non-Visitor logic.
When the resource is a Visitor implementation, e.g. <jb:bean> or <ftl:freemarker>, Smooks will
interpret the selector as an XPath selector.
There are a number of things to be aware of:
1. The XPath expression is applied in the reverse of the normal order. Smooks works backwards
from the targeted fragment element, as opposed to forwards from the message root element.
2. Not all of the XPath specification is supported. Selectors support the following XPath syntax:
• text() and attribute (e.g. @x) value selectors, with both literal and numeric values.
E.g. "a/b[text() = 'abc']", "a/b[text() = 123]", "a/b[@id = 'abc']", and
"a/b[@id = 123]".
• text() is only supported on the last selector step in an expression. E.g. "a/b[text() =
'abc']" is permitted but "a/b[text() = 'abc']/c" is not.
• text() is only supported on SAXVisitor implementations that implement only the
SAXVisitAfter interface. If the SAXVisitor also implements the SAXVisitBefore or
SAXVisitChildren interfaces, an error will result.
• And and Or logical operations. E.g. "a/b[text() = 'abc' and @id = 123]",
"a/b[text() = 'abc' or @id = 123]".
• Namespaces on both the elements and attributes. E.g. "a:order/b:address[@b:city =
'NY']".
• This requires the namespace prefix-to-URI mappings to be defined. If they are not defined, a
configuration error will result. Refer to Section 2.3.2, “Namespace Declaration” for more details.
• Supports the following operators:
• = (equals)
• != (not equals)
• < (less than)
• > (greater than)
• Index selectors. E.g. "a/b[3]"
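The first point in the list above (matching backwards from the targeted fragment) can be pictured with a minimal sketch. This is not the Smooks selector implementation: the class and method names are hypothetical, only plain element-name steps are handled (no predicates, namespaces or index selectors), and the sketch assumes that a selector is compared against the tail of the element path.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: backwards matching of a simple "a/b/c" style selector. */
final class BackwardSelectorSketch {

    /**
     * @param selector    slash-separated element names, e.g. "order-items/order-item"
     * @param elementPath names of the open elements, root first, targeted element last
     */
    static boolean matches(String selector, List<String> elementPath) {
        String[] steps = selector.split("/");
        if (steps.length > elementPath.size()) {
            return false; // more selector steps than elements on the path
        }
        // Start at the targeted element (last step) and walk back towards the root.
        for (int i = 0; i < steps.length; i++) {
            String step = steps[steps.length - 1 - i];
            String element = elementPath.get(elementPath.size() - 1 - i);
            if (!step.equals(element)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> path = Arrays.asList("order", "order-items", "order-item");
        System.out.println(matches("order-items/order-item", path)); // prints "true"
        System.out.println(matches("order/order-item", path));       // prints "false"
    }
}
```

Because the comparison starts at the targeted element, the selector does not need to spell out the path from the message root.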
2.3.2. Namespace Declaration
Namespace prefix-to-URI mappings are configured through the core configuration namespace. These
configurations are then available when resolving namespaced selectors.
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
<core:namespaces>
<core:namespace prefix="a" uri="http://a"/>
<core:namespace prefix="b" uri="http://b"/>
<core:namespace prefix="c" uri="http://c"/>
<core:namespace prefix="d" uri="http://d"/>
</core:namespaces>
<resource-config selector="c:item[@c:code = '8655']/d:units[text() = 1]">
<resource>com.acme.visitors.MyCustomVisitorImpl</resource>
</resource-config>
</smooks-resource-list>
2.4. Cartridges
In order that users can implement solutions quickly, Smooks includes pre-built, "ready-to-use" visitor
logic. This visitor logic is combined into groups known as cartridges.
A cartridge is simply a Java Archive (JAR) file that contains reusable content handlers. New cartridges
can be created to extend the basic functionality of the smooks-core. Each Smooks cartridge
provides "ready-to-use" support for either transformation or a specific form of XML analysis.
Here is a list of all the cartridges supported by Smooks:
• Calc: "milyn-smooks-calc"
• CSV: "milyn-smooks-csv"
• Fixed length reader: "milyn-smooks-fixed-length"
• EDI: "milyn-smooks-edi"
• Javabean: "milyn-smooks-javabean"
• JSON: "milyn-smooks-json"
• Routing: "milyn-smooks-routing"
• Templating: "milyn-smooks-templating"
• CSS: "milyn-smooks-css"
• Servlet: "milyn-smooks-servlet"
• Persistence: "milyn-smooks-persistence"
• Validation: "milyn-smooks-validation"
2.5. Filtering Process Selection
This section explains the way in which Smooks selects a filtering process:
• If all of the visitor resources only implement the DOM visitor interfaces (DOMElementVisitor or
SerializationUnit), then the DOM processing model will be selected automatically.
• If all of the visitor resources only implement the SAX Visitor interface (SAXElementVisitor), then
the SAX processing model will be selected automatically.
• If the visitor resources implement both the DOM and the SAX interfaces, then the DOM processing
model will be selected by default, unless SAX is specified in the Smooks resource configuration.
In Smooks 1.3 this is done using <core:filterSettings type="SAX" />.
Note that visitor resources in this context do not include non-element visitor resources, such as
readers.
Example 2.3. Setting the filter type to SAX in Smooks 1.3
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
<core:filterSettings type="SAX" />
</smooks-resource-list>
More information about global parameters can be found in Section 2.8.1, “Global Configuration
Parameters”.
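The selection rules in this section can be summarized as a small decision function. The sketch below is only an illustration of those rules, not Smooks code: the FilterTypeSelectionSketch and VisitorInfo names are hypothetical, and raising an error when the visitors share no common model is an assumption of the sketch, because this guide does not describe that case.

```java
import java.util.List;

/** Illustrative only: choosing between the DOM and SAX processing models. */
final class FilterTypeSelectionSketch {

    enum FilterType { DOM, SAX }

    /** Hypothetical stand-in for a visitor resource and the models it supports. */
    static final class VisitorInfo {
        final boolean supportsDom;
        final boolean supportsSax;

        VisitorInfo(boolean supportsDom, boolean supportsSax) {
            this.supportsDom = supportsDom;
            this.supportsSax = supportsSax;
        }
    }

    /**
     * @param visitors   the element visitor resources (readers are not included)
     * @param configured the type given in the filter settings, or null if none was given
     */
    static FilterType select(List<VisitorInfo> visitors, FilterType configured) {
        boolean allDom = true;
        boolean allSax = true;
        for (VisitorInfo visitor : visitors) {
            allDom &= visitor.supportsDom;
            allSax &= visitor.supportsSax;
        }
        if (allDom && allSax) {
            // Both models are possible: DOM is the default unless SAX is configured.
            return configured != null ? configured : FilterType.DOM;
        }
        if (allDom) {
            return FilterType.DOM;
        }
        if (allSax) {
            return FilterType.SAX;
        }
        // Assumption of this sketch: no common model is treated as a configuration error.
        throw new IllegalStateException("Visitors do not share a processing model");
    }
}
```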
2.5.1. Mixing the DOM and SAX Models
The Document Object Model has the advantage of being easier to use than SAX at the level of coding,
because it allows you to use node traversal and other features. Employing the Document Object
Model also allows you to take advantage of some pre-existing scripting and templating engines, such
as FreeMarker and Groovy, both of which have "built-in" support for DOM structures.
Unfortunately, it also has the disadvantage of being constrained by memory. This greatly limits its
ability to deal with extremely large messages.
The ability to mix the two models was added in Smooks v1.1 using the DomModelCreator visitor
class. When it is used in conjunction with SAX filtering, this visitor will construct a DOM fragment from
the visited element. Thus, you can use DOM utilities within a streaming environment.
When more than one model is nested, each inside the other, the outer models will never contain data
from the inner models; in other words, the same fragment will never co-exist inside two models. The
following example message demonstrates this principle:
<order id="332">
<header>
<customer number="123">Joe</customer>
</header>
<order-items>
<order-item id='1'>
<product>1</product>
<quantity>2</quantity>
<price>8.80</price>
</order-item>
<order-item id='2'>
<product>2</product>
<quantity>2</quantity>
<price>8.80</price>
</order-item>
<order-item id='3'>
<product>3</product>
<quantity>2</quantity>
<price>8.80</price>
</order-item>
</order-items>
</order>
The user can configure the DomModelCreator from within Smooks to create models for both the
order and order-item message fragments, as per the following code sample:
<resource-config selector="order,order-item">
<resource>org.milyn.delivery.DomModelCreator</resource>
</resource-config>
In this case, the order will never contain model data for the order-item (because the order-
item elements are nested inside the order element.) The in-memory model for the order will be as
follows:
<order id='332'>
<header>
<customer number="123">Joe</customer>
</header>
<order-items />
</order>
There will never be more than one order-item model in memory at any given time. Each new
model overwrites the previous one. The software was designed in this way to ensure that the
amount of memory being used is always kept to a minimum.
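The idea of building a small DOM model per fragment while streaming can be sketched with the plain JDK SAX and DOM APIs. This is not the DomModelCreator implementation. The FragmentModelSketch class and its method names are hypothetical, and the sketch only records the text of each <product> element to show that the fragment can be queried with DOM methods.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: one DOM fragment at a time, built from a SAX event stream. */
final class FragmentModelSketch extends DefaultHandler {
    private final String fragmentName;
    private final Document nodeFactory;   // used only to create nodes
    private Node current;                 // node being built, or null outside a fragment
    final List<String> productIds = new ArrayList<String>();

    FragmentModelSketch(String fragmentName, Document nodeFactory) {
        this.fragmentName = fragmentName;
        this.nodeFactory = nodeFactory;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && !qName.equals(fragmentName)) {
            return; // outside the fragment: nothing is kept in memory
        }
        Element element = nodeFactory.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) {
            current.appendChild(element);
        }
        current = element;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(nodeFactory.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;
        }
        if (current.getParentNode() != null) {
            current = current.getParentNode(); // still inside the fragment
            return;
        }
        // The fragment root has ended: use the model, then drop it.
        NodeList products = ((Element) current).getElementsByTagName("product");
        if (products.getLength() > 0) {
            productIds.add(products.item(0).getTextContent());
        }
        current = null;
    }

    /** Parses the message and returns the product text of every order-item fragment. */
    static List<String> collectProductIds(String xml) {
        try {
            Document nodeFactory =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            FragmentModelSketch handler = new FragmentModelSketch("order-item", nodeFactory);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.productIds;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each completed fragment is released as soon as it has been used, so at most one order-item model is referenced by the handler at any point during the parse.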
To summarize, the Smooks processing model is event-driven. The implication of this is that you
can "hook in" the visitor logic to be applied at different points of the filtering and streaming process.
The visitor logic is then applied using the message's own content. This means that you can take
advantage of the mixed-DOM-and-SAX processing model.
You can study the examples at the below URLs to gain an enhanced understanding of the mixed-
DOM-and-SAX approach:
• Groovy scripting: http://www.smooks.org/mediawiki/index.php?title=V1.3:groovy
• FreeMarker templating: http://www.smooks.org/mediawiki/index.php?title=V1.3:xml-to-xml
2.6. Checking the Smooks Execution Process
As Smooks performs the filtering process, it publishes events that can be captured and
programmatically analyzed both during and after execution. The easiest way to obtain an execution
report from Smooks is to configure the ExecutionContext to generate it. Smooks supports the
generation of an HTML report using the HtmlReportGenerator class.
The following example demonstrates how to configure Smooks to generate an HTML report:
Smooks smooks = new Smooks("/smooks/smooks-transform-x.xml");
ExecutionContext execContext = smooks.createExecutionContext();
execContext.setEventListener(new HtmlReportGenerator("/tmp/smooks-report.html"));
smooks.filterSource(execContext, new StreamSource(inputStream), new
StreamResult(outputStream));
The HtmlReportGenerator is a tool that can be of use if you are undertaking development work.
It is the closest thing that Smooks has, at present, to an IDE-based debugger. (A "proper" debugger
will be included in a future release.) The HtmlReportGenerator tool is very useful when you
are trying to diagnose issues or simply trying to gain an understanding of an aspect of a particular
transformation.
An example of a report created with the HtmlReportGenerator class is displayed on this web page:
http://www.milyn.org/docs/smooks-report/report.html
Alternatively, you can create a custom ExecutionEventListener implementation. Refer to the
javadocs for more information about this subject.
2.7. Terminating the Filtering Process
Sometimes you need to terminate the Smooks filtering process before reaching the end
of a message. This can be done by adding a <core:terminate> element to the Smooks
configuration. This setting only works with the SAX filter; it is not meaningful for the DOM filter.
The following is an example configuration that terminates filtering at the end of the customer fragment
of the message:
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
<!-- Visitors... -->
<core:terminate onElement="customer" />
</smooks-resource-list>
The default behavior is to terminate at the end of the targeted fragment, i.e. on the visitAfter
event. To terminate at the start, on the visitBefore event:
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
<!-- Visitors... -->
<core:terminate onElement="customer" terminateBefore="true" />
</smooks-resource-list>
2.8. Global Configurations
Global configuration settings are options that need to be set only once and are available to all the
resources within a configuration.
Smooks supports two types of global setting: default properties and global parameters.
Global Configuration Parameters
<param> elements can be specified in every <resource-config>. These parameter values will
either be available at run-time through the SmooksResourceConfiguration or, if not, they will
be injected through the @ConfigParam annotation.
Global Configuration Parameters are defined in one place. All run-time components can access
them by using the ExecutionContext.
Default Properties
Default Properties specify the default values for <resource-config> attributes. These
properties are automatically applied to the SmooksResourceConfiguration class when the
corresponding <resource-config> does not specify a value for the attribute.
Note
Refer to the javadocs for more information on the
org.milyn.cdr.SmooksResourceConfiguration class and the
org.milyn.container.ExecutionContext interface.
2.8.1. Global Configuration Parameters
Global parameters differ from the default properties in that they are not specified on the root
element and are not automatically applied to resources.
Global parameters are specified in a <params> element:
<params>
<param name="xyz.param1">param1-val</param>
</params>
Global Configuration Parameters are accessible using the ExecutionContext.
public void visitAfter(
final Element element, final ExecutionContext executionContext)
throws SmooksException
{
String param1 = executionContext.getConfigParameter(
"xyz.param1", "defaultValueABC");
....
}
2.8.2. Default Properties
Default properties are properties that can be set on the root element of a Smooks configuration and
are then applied to all resource configurations in the smooks-conf.xml file.
For example, if all the resource configurations have the same selector value, you could specify
default-selector="order" instead of specifying the selector on every resource configuration:
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:xsl="http://www.milyn.org/xsd/smooks/xsl-1.1.xsd"
default-selector="order">
<resource-config>
<resource>com.acme.VisitorA</resource>
...
</resource-config>
<resource-config>
<resource>com.acme.VisitorB</resource>
...
</resource-config>
</smooks-resource-list>
The following default configuration options are available:
default-selector
Selector that will be applied to all resource-config elements in the Smooks configuration file, where
a selector is not defined.
default-selector-namespace
The default selector namespace, where a namespace is not defined.
default-target-profile
Default target profile that will be applied to all resources in the Smooks configuration file, where a
target-profile is not defined.
default-condition-ref
Refers to a global condition by the condition's identifier. This condition is applied to resources that
define an empty "condition" element (i.e. <condition/>) that does not reference a globally defined
condition.
2.9. Filter Settings
The configuration of filtering is done using the smooks-core configuration namespace
(http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd), introduced in Smooks v1.3.
Example 2.4. Example configuration
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">
<core:filterSettings type="SAX" defaultSerialization="true"
terminateOnException="true" readerPoolSize="3" closeSource="true"
closeResult="true" rewriteEntities="true" />
<!-- Other visitor configs etc. -->
</smooks-resource-list>
type
Determines the type of processing model that will be used. Either SAX or DOM. Please refer
to Section 2.5, “Filtering Process Selection” for more information about the processing models.
Default is DOM.
defaultSerialization
Determines if default serialization should be switched on. The default value is true. Default
serialization being turned on tells Smooks to locate a StreamResult (or DOMResult) in the
Result objects provided to the Smooks.filterSource method and to, by default, serialize all
events to that Result.
This behavior can be turned off using this global configuration parameter and can
be overridden on a per-fragment basis by targeting a Visitor implementation at that
fragment that either takes ownership of the Result writer (when using SAX filtering)
or modifies the DOM (when using DOM filtering). As an example of this, refer to
org.milyn.templating.freemarker.FreeMarkerTemplateProcessor in the JavaDocs.
terminateOnException
Determines whether an exception should terminate processing. Defaults to true.
closeSource
Close Source instance streams passed to the Smooks.filterSource method (default true).
The exception here is System.in, which will never be closed.
closeResult
Close Result streams passed to the Smooks.filterSource method (default true). The
exception here is System.out and System.err, which will never be closed.
rewriteEntities
Rewrite XML entities when reading and writing (default serialization) XML.
readerPoolSize
Reader pool size. Some Reader implementations are very expensive to create (e.g. Xerces).
Pooling Reader instances (i.e. reusing them) can result in a large performance improvement,
especially when processing many small messages. The default value for this setting is 0 (i.e. not
pooled: a new Reader instance is created for each message). Configure it in line with your
application's threading model.
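The effect of readerPoolSize can be pictured with a minimal pooling sketch. It is not how Smooks implements its reader pool. The ReaderPoolSketch class is hypothetical and uses the JDK's default SAX parser to stand in for an expensive Reader. With a pool size of 0, every call creates a new reader, which mirrors the default described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Illustrative only: reuse up to poolSize idle XMLReader instances. */
final class ReaderPoolSketch {
    private final int poolSize;
    private final Deque<XMLReader> idle = new ArrayDeque<XMLReader>();

    ReaderPoolSketch(int poolSize) {
        this.poolSize = poolSize;
    }

    /** Returns an idle reader if one is pooled, otherwise creates a new one. */
    synchronized XMLReader borrow() {
        XMLReader pooled = idle.poll();
        return pooled != null ? pooled : create();
    }

    /** Keeps the reader for reuse unless the pool is already full (or disabled). */
    synchronized void giveBack(XMLReader reader) {
        if (idle.size() < poolSize) {
            idle.push(reader);
        }
    }

    private static XMLReader create() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not create an XMLReader", e);
        }
    }
}
```

A pool that is larger than the number of threads filtering messages concurrently gains nothing, which is why the setting should follow the application's threading model.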
Chapter 3. Extending Smooks
All existing Smooks functionality (Java Binding, EDI processing etc) is built through extension of a
number of well defined APIs. We will look at these APIs in the coming sections.
The main extension points/APIs in Smooks are Reader and Visitor APIs:
Reader APIs
Those for processing Source/Input data (Readers) so as to make it consumable by other Smooks
components as a series of well defined hierarchical events (based on the SAX event model) for all
of the message fragments and sub-fragments.
Visitor APIs
Those for consuming the message fragment SAX Events produced by a Source/Input Reader.
Another very important aspect of writing Smooks extensions is how these components are configured.
Because this is common to all Smooks components, we will look at this first.
3.1. Configuring Smooks Components
All Smooks components are configured in exactly the same way. As far as the Smooks Core
code is concerned, all Smooks components are "resources" and are configured using a
SmooksResourceConfiguration instance, which we talked about in earlier sections.
Smooks provides mechanisms for constructing namespace (XSD) specific XML configurations
for components, but the most basic configuration (and the one that maps directly to the
SmooksResourceConfiguration class) is the basic <resource-config> XML configuration from the base
configuration namespace (http://www.milyn.org/xsd/smooks-1.1.xsd).
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd">
<resource-config selector="">
<resource></resource>
<param name=""></param>
</resource-config>
</smooks-resource-list>
• The selector attribute is the mechanism by which the resource is "selected", e.g. it can be an
XPath for a Visitor implementation.
• The resource element is the actual resource. This can be a Java class name or some other
form of resource, such as a template. The resource is assumed to be a Java class name for the
remainder of this section.
• The param elements are configuration parameters for the resource defined in the resource element.
Smooks takes care of all the details of creating the runtime representation of the resource (e.g.
constructing the class named in the resource element) and injecting all the configuration
parameters. It also works out what the resource type is and, from that, how to interpret things like
the selector. For example, if the resource is a Visitor instance, it knows the selector is an XPath
that selects a Source message fragment.
3.1.1. Configuration Annotations
After your component has been created, you need to configure it with the <param> element details.
This is done using the @ConfigParam and @Config Annotations.
3.1.1.1. @ConfigParam
The @ConfigParam annotation reflectively injects the named parameter from the <param> elements
that have the same name as the annotated property itself. The name can be different but the default
behavior matches against the name of the component property.
This annotation eliminates a lot of noisy code from your component because it:
• Handles decoding of the <param> value before setting it on the annotated component property.
Smooks provides DataDecoders for all of the main types (int, Double, File, Enums etc), but you can
implement and use a custom DataDecoder where the out of the box decoders don't cover specific
decoding requirements e.g. @ConfigParam(decoder = MyQuirkyDataDecoder.class).
Smooks will automatically use your custom decoder (i.e. you won't need to define the decoder
property on this annotation) if it is registered. See the DataDecoder Javadocs for details on
registering a DataDecoder implementation such that Smooks will automatically locate it for decoding
a specific data type.
• Supports a choice constraint for the config property, generating a configuration exception
where the configured value is not one of the defined choice values. For example, you may have a
property which has a constrained value set of ON and OFF. You can use the choice property on this
annotation to constrain the config, raise exceptions etc e.g. @ConfigParam(choice = {"ON",
"OFF"}).
• Can specify default config values e.g. @ConfigParam(defaultVal = "true").
• Can specify whether or not the property config value is required or optional e.g.
@ConfigParam(use = Use.OPTIONAL). By default, all properties are REQUIRED, but setting a
defaultVal implicitly marks the property as being OPTIONAL.
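The behavior described in the list above (matching by property name, choice constraints, defaults, and required values) can be imitated with a small reflective injector. The sketch below is not the Smooks implementation and does not use the Smooks annotation: @Param, ParamInjectionSketch and Switch are hypothetical names, only String and int properties are decoded, and an empty defaultVal is taken to mean "no default".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Map;

/** Illustrative only: reflective injection of named parameters into annotated fields. */
final class ParamInjectionSketch {

    /** Hypothetical stand-in for @ConfigParam; not the Smooks annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Param {
        String defaultVal() default "";  // empty means "no default" in this sketch
        String[] choice() default {};    // empty means "any value allowed"
    }

    /** Example component with one constrained and one defaulted property. */
    static final class Switch {
        @Param(choice = {"ON", "OFF"})
        String state;

        @Param(defaultVal = "3")
        int retries;
    }

    static void inject(Object component, Map<String, String> params) {
        for (Field field : component.getClass().getDeclaredFields()) {
            Param param = field.getAnnotation(Param.class);
            if (param == null) {
                continue; // not a configurable property
            }
            // The parameter is looked up by the name of the annotated field.
            String value = params.get(field.getName());
            if (value == null) {
                if (param.defaultVal().isEmpty()) {
                    throw new IllegalArgumentException("Missing parameter: " + field.getName());
                }
                value = param.defaultVal(); // a default makes the property optional
            }
            if (param.choice().length > 0 && !Arrays.asList(param.choice()).contains(value)) {
                throw new IllegalArgumentException(
                        "Invalid value for " + field.getName() + ": " + value);
            }
            try {
                field.setAccessible(true);
                if (field.getType() == int.class) {
                    field.setInt(component, Integer.parseInt(value)); // minimal decoding
                } else {
                    field.set(component, value);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```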
Example 3.1. Using @ConfigParam
This example shows the annotated component DataSeeder and its corresponding Smooks
configuration.
public class DataSeeder
{
@ConfigParam
private File seedDataFile;
public File getSeedDataFile()
{
return seedDataFile;
}
// etc...
}
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd">
<resource-config selector="dataSeeder">
<resource>com.acme.DataSeeder</resource>
<param name="seedDataFile">./seedData.xml</param>
</resource-config>
</smooks-resource-list>
3.1.1.2. @Config
The @Config annotation reflectively injects the full SmooksResourceConfiguration
instance, associated with the component resource, onto the annotated component property.
Obviously an error will result if this annotation is added to a component property that is not of type
SmooksResourceConfiguration.
Example 3.2. Using @Config
public class MySmooksComponent
{
@Config
private SmooksResourceConfiguration config;
// etc...
}
3.1.1.3. @Initialize and @Uninitialize
The @ConfigParam annotation is great for configuring your component with simple values, but
sometimes your component needs more involved configuration for which we need to write some
"initialization" code. For this, Smooks provides the @Initialize annotation.
On the other side of this, there are times when we need to undo work performed during initialization
when the associated Smooks instance is being discarded (garbage collected) e.g. to release
some resources acquired during initialization etc. For this, Smooks provides the @Uninitialize
annotation.
The basic initialization/un-initialization sequence can be described as follows:
smooks = new Smooks(..);
// Initialize all annotated components
@Initialize
// Use the smooks instance through a series of filterSource invocations...
smooks.filterSource(...);
smooks.filterSource(...);
smooks.filterSource(...);
... etc ...
smooks.close();
// Uninitialize all annotated components
@Uninitialize
Example 3.3. Using @Initialize and @Uninitialize
In this example, assume we have a component that opens multiple connections to a database on
initialization and then needs to release all those database resources when we close the Smooks
instance.
public class MultiDataSourceAccessor
{
@ConfigParam
private File dataSourceConfig;
Map<String, Datasource> datasources = new HashMap<String, Datasource>();
@Initialize
public void createDataSources()
{
// Add DS creation code here....
// Read the dataSourceConfig property to read the DS configs...
}
@Uninitialize
public void releaseDataSources()
{
// Add DS release code here....
}
// etc...
}
The following points need to be noted when using the @Initialize and @Uninitialize
annotations:
• The @Initialize and @Uninitialize methods must be public, zero-arg methods.
• The @ConfigParam properties are all initialized before the first @Initialize method is called.
Therefore, you can use the @ConfigParam component properties as input to the initialization
process.
• The @Uninitialize methods are all called in response to a call to the Smooks.close method.
3.1.1.4. Defining Custom Configuration Namespaces
Smooks supports a mechanism for defining custom configuration namespaces for components. This
allows you to support custom, XSD-based, configurations for your components that can be validated
instead of treating them all as generic Smooks resources using the <resource-config> base
configuration.
The basic process involves two steps.
1. Writing a configuration XSD for your component that extends the base
http://www.milyn.org/xsd/smooks-1.1.xsd configuration namespace. This XSD must be supplied
on the classpath with your component. It must be located in the /META-INF/ folder and have the
same path as the namespace URI. For example, if your extended namespace URI is
http://www.acme.com/schemas/smooks/acme-core-1.0.xsd, then the physical XSD file must be
supplied on the classpath in /META-INF/schemas/smooks/acme-core-1.0.xsd.
2. Writing a Smooks configuration namespace mapping configuration file that maps the custom
namespace configuration into a SmooksResourceConfiguration instance. This file must
be named (by convention) based on the name of the namespace it is mapping and must be
physically located on the classpath in the same folder as the XSD. Extending the above example,
the Smooks mapping file would be /META-INF/schemas/smooks/acme-core-1.0.xsd-smooks.xml.
Note the -smooks.xml postfix.
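The naming convention in the two steps above can be expressed as a pair of helper functions. This is only a sketch of the convention as stated in this section, with hypothetical class and method names. It is not code taken from Smooks.

```java
import java.net.URI;

/** Illustrative only: classpath locations derived from a configuration namespace URI. */
final class NamespaceLocationSketch {

    /** The XSD lives under /META-INF/ at the same path as the namespace URI. */
    static String xsdLocation(String namespaceUri) {
        return "/META-INF" + URI.create(namespaceUri).getPath();
    }

    /** The mapping file sits next to the XSD and carries the -smooks.xml postfix. */
    static String mappingLocation(String namespaceUri) {
        return xsdLocation(namespaceUri) + "-smooks.xml";
    }

    public static void main(String[] args) {
        String ns = "http://www.acme.com/schemas/smooks/acme-core-1.0.xsd";
        // prints "/META-INF/schemas/smooks/acme-core-1.0.xsd"
        System.out.println(xsdLocation(ns));
        // prints "/META-INF/schemas/smooks/acme-core-1.0.xsd-smooks.xml"
        System.out.println(mappingLocation(ns));
    }
}
```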
The easiest way to get familiar with this mechanism is by looking at existing extended namespace
configurations within the Smooks code itself. All Smooks components (including the Java Binding
functionality) use this mechanism for defining their configurations. Smooks Core itself defines a
number of extended configuration namespaces.
3.2. Implementing a Source Reader
Implementing and configuring a new Source Reader for Smooks is straightforward. The Smooks
specific parts of the process are easy and are not really the issue. The level of effort involved is a
function of the complexity of the Source data format for which you are implementing the reader.
Implementing a Reader for your custom data format immediately opens all Smooks capabilities to
that data format e.g. Java Binding, Templating, Persistence, Validation, Splitting & Routing etc. So
a relatively small investment can yield a quite significant return. The only Smooks requirement is
that the Reader implements the standard org.xml.sax.XMLReader interface from the Java JDK.
However, if you want to be able to configure the Reader implementation, it needs to implement the
org.milyn.xml.SmooksXMLReader interface. org.milyn.xml.SmooksXMLReader is an
extension of org.xml.sax.XMLReader. You can easily use an existing org.xml.sax.XMLReader
implementation, or implement a new one.
Refer to http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/XMLReader.html for more details.
Let's now look at a simple example of implementing a Reader for use with Smooks. In this example,
we will implement a Reader that can read a stream of Comma Separated Value (CSV) records,
converting the CSV stream into a stream of SAX events that can be processed by Smooks, allowing
you to do all the things Smooks allows.
We start by implementing the basic Reader class:
public class MyCSVReader implements SmooksXMLReader
{
// Implement all of the XMLReader methods...
}
Two methods from the org.xml.sax.XMLReader interface are of particular interest:
1. setContentHandler(ContentHandler): This method is called by Smooks Core. It sets the
org.xml.sax.ContentHandler instance for the reader. The org.xml.sax.ContentHandler
instance methods are called from inside the parse(InputSource) method.
2. parse(InputSource): This is the method that receives the Source data input stream,
parses it (i.e. in the case of this example, the CSV stream) and generates the SAX event
stream through calls to the org.xml.sax.ContentHandler instance supplied in the
setContentHandler(ContentHandler) method.
Refer to http://download.oracle.com/javase/6/docs/api/org/xml/sax/ContentHandler.html for more
details.
We need to configure our CSV reader with the names of the fields associated with the CSV records.
Configuring a custom reader implementation is the same as for any Smooks component, as described
in Section 3.1, “Configuring Smooks Components”.
So focusing a little more closely on the above methods and our fields configuration:
public class MyCSVReader implements SmooksXMLReader
{
private ContentHandler contentHandler;
@ConfigParam
private String[] fields; // Auto decoded and injected from the "fields" <param> on the reader config.
public void setContentHandler(ContentHandler contentHandler) {
this.contentHandler = contentHandler;
}
public void parse(InputSource csvInputSource) throws IOException, SAXException {
// TODO: Implement parsing of CSV Stream...
}
// Other XMLReader methods...
}
So now we have our basic Reader implementation stub. We can start writing unit tests to test the new
reader implementation.
The first thing we need is some sample CSV input. Let's use a simple list of names in a file named
names.csv:
Tom,Jones
Mike,Jones
Mark,Jones
The second thing we need is a test Smooks configuration that configures Smooks with our
MyCSVReader. As stated before, everything in Smooks is a resource and can be configured with the
basic <resource-config> configuration. While this works fine, it's a little noisy, so Smooks provides
a basic <reader> configuration element specifically for the purpose of configuring a reader. The
configuration for our test, in the mycsvread-config.xml file, looks like the following:
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd">
<reader class="com.acme.MyCSVReader">
<params>
<param name="fields">firstname,lastname</param>
</params>
</reader>
</smooks-resource-list>
And of course we need the JUnit test class:
public class MyCSVReaderTest extends TestCase
{

public void test() throws Exception {
Smooks smooks = new Smooks(getClass().getResourceAsStream("mycsvread-config.xml"));
StringResult serializedCSVEvents = new StringResult();
smooks.filterSource(new StreamSource(getClass().getResourceAsStream("names.csv")),
serializedCSVEvents);
System.out.println(serializedCSVEvents);
// TODO: add assertions etc
}
}
So now we have a basic setup with our custom Reader implementation, as well as a unit test that we
can use to drive our development. Of course, our reader's parse method is not doing anything yet and
our test class is not making any assertions. So let's start implementing the parse method:
public class MyCSVReader implements SmooksXMLReader
{
private ContentHandler contentHandler;
@ConfigParam
private String[] fields; // Auto decoded and injected from the "fields" <param> on the
// reader config.
public void setContentHandler(ContentHandler contentHandler)
{
this.contentHandler = contentHandler;
}
public void parse(InputSource csvInputSource) throws IOException, SAXException
{
BufferedReader csvRecordReader = new
BufferedReader(csvInputSource.getCharacterStream());
String csvRecord;
// Send the start of message events to the handler...
contentHandler.startDocument();
contentHandler.startElement(XMLConstants.NULL_NS_URI, "message-root", "", new
AttributesImpl());
csvRecord = csvRecordReader.readLine();
while(csvRecord != null)
{
String[] fieldValues = csvRecord.split(",");

// perform checks...
// Send the events for this record...
contentHandler.startElement(XMLConstants.NULL_NS_URI, "record", "", new
AttributesImpl());
for(int i = 0; i < fields.length; i++)
{
contentHandler.startElement(XMLConstants.NULL_NS_URI, fields[i], "", new
AttributesImpl());
contentHandler.characters(fieldValues[i].toCharArray(), 0,
fieldValues[i].length());
contentHandler.endElement(XMLConstants.NULL_NS_URI, fields[i], "");

}
contentHandler.endElement(XMLConstants.NULL_NS_URI, "record", "");
csvRecord = csvRecordReader.readLine();
}
// Send the end of message events to the handler...
contentHandler.endElement(XMLConstants.NULL_NS_URI, "message-root", "");
contentHandler.endDocument();
}
// Other XMLReader methods...
}
If you run the unit test class now, you should see the following output on the console (formatted):
<message-root>
<record>
<firstname>Tom</firstname>
<lastname>Jones</lastname>
</record>
<record>
<firstname>Mike</firstname>
<lastname>Jones</lastname>
</record>
<record>
<firstname>Mark</firstname>
<lastname>Jones</lastname>
</record>
</message-root>
After this, it is a case of expanding the tests, hardening the reader implementation code etc.
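As one possible direction for the "perform checks" step, the following is a minimal sketch of the same parse loop with a field-count check added. It uses only JDK SAX classes so that it can be exercised without Smooks. The class name CsvSaxSketch, the RecordingHandler helper, and the choice to reject mismatched records with a SAXException are assumptions made for illustration, not part of the Smooks API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.XMLConstants;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: the MyCSVReader parse loop with basic record checks. */
final class CsvSaxSketch {

    /** Fires SAX events per CSV record; rejects records with the wrong number of values. */
    static void emit(Reader csv, String[] fields, ContentHandler handler)
            throws IOException, SAXException {
        BufferedReader records = new BufferedReader(csv);
        Attributes noAttrs = new AttributesImpl();
        int lineNumber = 0;

        handler.startDocument();
        handler.startElement(XMLConstants.NULL_NS_URI, "message-root", "", noAttrs);
        String record;
        while ((record = records.readLine()) != null) {
            lineNumber++;
            if (record.trim().isEmpty()) {
                continue; // Assumption: blank lines are skipped rather than reported.
            }
            // The -1 limit keeps trailing empty values, so "Tom," yields two values.
            String[] values = record.split(",", -1);
            if (values.length != fields.length) {
                throw new SAXException("Record " + lineNumber + " has " + values.length
                        + " values, expected " + fields.length);
            }
            handler.startElement(XMLConstants.NULL_NS_URI, "record", "", noAttrs);
            for (int i = 0; i < fields.length; i++) {
                handler.startElement(XMLConstants.NULL_NS_URI, fields[i], "", noAttrs);
                handler.characters(values[i].toCharArray(), 0, values[i].length());
                handler.endElement(XMLConstants.NULL_NS_URI, fields[i], "");
            }
            handler.endElement(XMLConstants.NULL_NS_URI, "record", "");
        }
        handler.endElement(XMLConstants.NULL_NS_URI, "message-root", "");
        handler.endDocument();
    }

    /** Test helper: writes the received events out as (unescaped) XML text. */
    static final class RecordingHandler extends DefaultHandler {
        final StringBuilder xml = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            xml.append('<').append(localName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            xml.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            xml.append("</").append(localName).append('>');
        }
    }

    /** Convenience wrapper used for assertions in a unit test. */
    static String toXml(String csv, String... fields) throws IOException, SAXException {
        RecordingHandler handler = new RecordingHandler();
        emit(new StringReader(csv), fields, handler);
        return handler.xml.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("Tom,Jones\nMike,Jones\n", "firstname", "lastname"));
    }
}
```

Keeping the event generation in a method that takes a plain ContentHandler makes it possible to assert on the event stream directly, before involving a Smooks configuration.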
Now you can use your reader to perform all sorts of operations supported by Smooks. As an example,
the following configuration (java-binding-config.xml) could be used to bind the names into a
List of PersonName objects:
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.4.xsd">
<reader class="com.acme.MyCSVReader">
<params>
<param name="fields">firstname,lastname</param>
</params>
</reader>
<jb:bean beanId="peopleNames" class="java.util.ArrayList" createOnElement="message-root">
<jb:wiring beanIdRef="personName" />
</jb:bean>
<jb:bean beanId="personName" class="com.acme.PersonName" createOnElement="message-root/record">
<jb:value property="first" data="record/firstname" />
<jb:value property="last" data="record/lastname" />
</jb:bean>
</smooks-resource-list>
And then a test for this configuration could look as follows:
public class MyCSVReaderTest extends TestCase
{
public void test_java_binding() throws Exception
{
Smooks smooks = new Smooks(getClass().getResourceAsStream("java-binding-config.xml"));
JavaResult javaResult = new JavaResult();
smooks.filterSource(new StreamSource(getClass().getResourceAsStream("names.csv")),
javaResult);
List<PersonName> peopleNames = (List<PersonName>) javaResult.getBean("peopleNames");
// TODO: add assertions etc
}
}
Refer to Chapter 4, Java Binding for more information on Java Binding.
• Reader instances are never used concurrently. Smooks Core creates a new instance for every
message, or pools and reuses instances according to the readerPoolSize FilterSettings
property. Refer to Section 2.9, “Filter Settings”.
• If your Reader requires access to the Smooks ExecutionContext for the current filtering context,
your Reader needs to implement the org.milyn.xml.SmooksXMLReader interface.
• If your Source data is a binary data stream your Reader must implement the
org.milyn.delivery.StreamReader interface.
• You can configure your reader within your source code (e.g. in your unit tests) using a
GenericReaderConfigurator instance, which you then set on the Smooks instance.
• While the basic <reader> configuration is fine, it is possible to define a custom configuration
namespace (XSD) for your custom CSV Reader implementation. This topic is not covered
here. Review the source code to see the extended configuration namespace for the Reader
implementations supplied with Smooks, e.g. the EDIReader, CSVReader, JSONReader etc. From
this, you should be able to work out how to do this for your own custom Reader.
3.2.1. Implementing a Binary Source Reader
It is also possible to implement a Source Reader for a binary data source. In this case your reader
must implement the org.milyn.delivery.StreamReader interface. This is just a marker interface
that tells the Smooks runtime to ensure that an InputStream is supplied.
The binary Reader implementation is essentially the same as a non-binary Reader implementation
(see above), except that the parse method should use the InputStream from the InputSource
(that is, call InputSource.getByteStream() instead of InputSource.getCharacterStream())
and generate the XML events from the decoded binary data.
Example 3.4. Implementing a Binary Source Reader
A simple parse method implementation looks like this:
public static class BinaryFormatXXReader implements SmooksXMLReader, StreamReader
{
@ConfigParam
private String xProtocolVersion;
@ConfigParam
private int someOtherXProtocolConfig;
// etc...
public void parse(InputSource inputSource) throws IOException, SAXException {
// Use the InputStream (binary) on the InputSource...
InputStream binStream = inputSource.getByteStream();
// Create and configure the data decoder...
BinaryFormatXDecoder xDecoder = new BinaryFormatXDecoder();
xDecoder.setProtocolVersion(xProtocolVersion);
xDecoder.setSomeOtherXProtocolConfig(someOtherXProtocolConfig);
xDecoder.setXSource(binStream);
// Generate the XML Events on the contentHandler...
contentHandler.startDocument();
// Use xDecoder to fire startElement, endElement etc events on the contentHandler
// (see previous section)...
contentHandler.endDocument();
}
// etc....
}
Configuring the BinaryFormatXXReader in your Smooks configuration is the same as for any
other reader (as outlined in the previous section):
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd">
<reader class="com.acme.BinaryFormatXXReader">
<params>
<param name="xProtocolVersion">2.5.7</param>
<param name="someOtherXProtocolConfig">1</param>
... etc...
</params>
</reader>
... Other Smooks configurations e.g. <jb:bean> configs for binding the binary data
into Java objects...
</smooks-resource-list>
And then the Smooks execution code (note the InputStream supplied to the StreamSource). In
this case, we're generating two results: XML and Java objects.
StreamResult xmlResult = new StreamResult(xmlOutWriter);
JavaResult javaResult = new JavaResult();
InputStream xBinaryInputStream = getXByteStream();
smooks.filterSource(new StreamSource(xBinaryInputStream), xmlResult, javaResult);
// etc... Use the beans in the javaResult...
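To make the decoding step more concrete, here is a minimal sketch of a parse routine that reads from InputSource.getByteStream() and fires SAX events. The binary layout (an int record count followed by two writeUTF strings per record), the class name BinaryNamesSketch, and the encode and decodeToXml helpers are all hypothetical, chosen only so the sketch runs on the JDK alone. A real reader would implement SmooksXMLReader and StreamReader as shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical binary format: an int record count, then two writeUTF strings per record. */
final class BinaryNamesSketch {
    private static final Attributes NO_ATTRS = new AttributesImpl();

    /** Decodes the byte stream on the InputSource and fires the matching SAX events. */
    static void parse(InputSource source, ContentHandler handler)
            throws IOException, SAXException {
        InputStream binStream = source.getByteStream();
        if (binStream == null) {
            throw new SAXException("A binary reader needs an InputStream on the InputSource");
        }
        DataInputStream in = new DataInputStream(binStream);

        handler.startDocument();
        handler.startElement(XMLConstants.NULL_NS_URI, "message-root", "", NO_ATTRS);
        int recordCount = in.readInt();
        for (int i = 0; i < recordCount; i++) {
            handler.startElement(XMLConstants.NULL_NS_URI, "record", "", NO_ATTRS);
            textElement(handler, "firstname", in.readUTF());
            textElement(handler, "lastname", in.readUTF());
            handler.endElement(XMLConstants.NULL_NS_URI, "record", "");
        }
        handler.endElement(XMLConstants.NULL_NS_URI, "message-root", "");
        handler.endDocument();
    }

    /** Fires start, characters and end events for a simple text-only element. */
    private static void textElement(ContentHandler handler, String name, String text)
            throws SAXException {
        handler.startElement(XMLConstants.NULL_NS_URI, name, "", NO_ATTRS);
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement(XMLConstants.NULL_NS_URI, name, "");
    }

    /** Test helper: produces sample data in the hypothetical format. */
    static byte[] encode(String[][] names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(names.length);
        for (String[] name : names) {
            out.writeUTF(name[0]);
            out.writeUTF(name[1]);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Test helper: runs parse and writes the events out as (unescaped) XML text. */
    static String decodeToXml(byte[] data) throws IOException, SAXException {
        final StringBuilder xml = new StringBuilder();
        parse(new InputSource(new ByteArrayInputStream(data)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes a) {
                xml.append('<').append(localName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                xml.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                xml.append("</").append(localName).append('>');
            }
        });
        return xml.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(decodeToXml(encode(new String[][] {{"Tom", "Jones"}})));
    }
}
```

The null check on the byte stream reflects the reason the StreamReader marker interface exists: without an InputStream on the InputSource, a binary reader has nothing to decode.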
3.3. Implementing a Fragment Visitor
Visitor implementations are the workhorse of Smooks. Most of the out-of-the-box functionality in
Smooks (Java Binding, Templating, Persistence and so on) is built from one or more Visitor
implementations. Visitor implementations often collaborate through the ExecutionContext and
ApplicationContext context objects to accomplish a common goal.
Smooks supports two types of Visitor implementation:
1. SAX-based implementations based on the org.milyn.delivery.sax.SAXVisitor sub-interfaces.
2. DOM-based implementations based on the org.milyn.delivery.dom.DOMVisitor sub-interfaces.
Your implementation can support both SAX and DOM, but we recommend implementing a SAX-only
Visitor. SAX-based implementations are usually easier to create and usually perform faster. For these
reasons, this section concentrates on SAX only.
Important
All Visitor implementations are treated as stateless objects. A single Visitor instance must
be usable concurrently across multiple messages i.e. across multiple concurrent calls to the
Smooks.filterSource method. All state associated with the current Smooks.filterSource
execution must be stored in the ExecutionContext.
Refer to Section 3.3.6, “ExecutionContext and ApplicationContext” for more details.
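The statelessness rule can be illustrated with a small sketch in which a single visitor object is shared by two executions and all counters live in a per-execution context. The Context class below is a hypothetical stand-in for the Smooks ExecutionContext (it is not the Smooks type), and RecordCounter is an illustrative name. The sketch avoids Smooks types so that it runs on the JDK alone.

```java
import java.util.HashMap;
import java.util.Map;

final class StatelessVisitorSketch {

    /** Hypothetical stand-in for a per-execution context; not the Smooks ExecutionContext. */
    static final class Context {
        private final Map<Object, Object> attributes = new HashMap<>();

        void setAttribute(Object key, Object value) {
            attributes.put(key, value);
        }

        Object getAttribute(Object key) {
            return attributes.get(key);
        }
    }

    /** A visitor with no mutable fields, so one instance can serve many executions. */
    static final class RecordCounter {
        private static final String COUNT_KEY = RecordCounter.class.getName() + "#count";

        /** Called once per targeted element; the running total is kept in the context. */
        void visitBefore(String elementName, Context context) {
            Integer count = (Integer) context.getAttribute(COUNT_KEY);
            context.setAttribute(COUNT_KEY, count == null ? 1 : count + 1);
        }

        /** Returns the number of elements seen in the given execution. */
        int count(Context context) {
            Integer count = (Integer) context.getAttribute(COUNT_KEY);
            return count == null ? 0 : count;
        }
    }

    public static void main(String[] args) {
        RecordCounter sharedVisitor = new RecordCounter();
        Context firstMessage = new Context();
        Context secondMessage = new Context();

        sharedVisitor.visitBefore("record", firstMessage);
        sharedVisitor.visitBefore("record", firstMessage);
        sharedVisitor.visitBefore("record", secondMessage);

        System.out.println(sharedVisitor.count(firstMessage));
        System.out.println(sharedVisitor.count(secondMessage));
    }
}
```

Had the counter been an instance field of RecordCounter, the two messages would have shared one total, which is exactly the kind of cross-message interference the rule above is meant to prevent.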
3.3.1. The SAX Visitor API
The SAX Visitor API is made up of a number of interfaces. These interfaces are based on the
org.xml.sax.ContentHandler SAX events that a SAXVisitor implementation can capture and
process. Depending on the use case being solved with the SAXVisitor implementation, you may
need to implement one or all of these interfaces.
org.milyn.delivery.sax.SAXVisitBefore
Captures the startElement SAX event for the targeted fragment element:
public interface SAXVisitBefore extends SAXVisitor
{
void visitBefore(SAXElement element, ExecutionContext executionContext)
throws SmooksException, IOException;
}
org.milyn.delivery.sax.SAXVisitChildren
Captures the character based SAX events for the targeted fragment element, as well as Smooks
generated (pseudo) events corresponding to the startElement events of child fragment
elements:
public interface SAXVisitChildren extends SAXVisitor
{
void onChildText(SAXElement element, SAXText childText, ExecutionContext
executionContext) throws SmooksException, IOException;
void onChildElement(SAXElement element, SAXElement childElement,
ExecutionContext executionContext) throws SmooksException, IOException;
}
org.milyn.delivery.sax.SAXVisitAfter
Captures the endElement SAX event for the targeted fragment element:
public interface SAXVisitAfter extends SAXVisitor
{
void visitAfter(SAXElement element, ExecutionContext executionContext)
throws SmooksException, IOException;
}
As a convenience for those implementations that need to capture all of the SAX
events, the above three interfaces are pulled together into a single interface in the
org.milyn.delivery.sax.SAXElementVisitor interface.
Illustrating these events using a piece of XML:
<message>
<target-fragment> <!-- SAXVisitBefore.visitBefore -->
Text!! <!-- SAXVisitChildren.onChildText -->
<child> <!-- SAXVisitChildren.onChildElement -->
</child>
</target-fragment> <!-- SAXVisitAfter.visitAfter -->
</message>
The above is an illustration of a Source message event stream as XML. It could be EDI, CSV, JSON,
or some other format. Consider it to be a Source message event stream, serialized as XML for easy
reading.
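The ordering of these events can be reproduced with the plain JDK SAX parser. The sketch below maps the underlying ContentHandler callbacks for one targeted element onto the visitor method names used above. It only approximates how Smooks dispatches events, and the class name VisitEventOrderSketch and the trace method are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class VisitEventOrderSketch {

    /** Records, in order, the visitor-style events seen for the first-level target element. */
    static List<String> trace(String xml, final String target) throws Exception {
        final List<String> events = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;        // depth of the element currently open
            private int targetDepth = -1; // depth of the open target element, or -1 if none

            @Override
            public void startElement(String uri, String localName, String qName, Attributes a) {
                depth++;
                if (targetDepth == -1 && qName.equals(target)) {
                    targetDepth = depth;
                    events.add("visitBefore " + qName);
                } else if (targetDepth != -1 && depth == targetDepth + 1) {
                    events.add("onChildElement " + qName);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (targetDepth != -1 && depth == targetDepth) {
                    String text = new String(ch, start, length);
                    int last = events.size() - 1;
                    // A parser may deliver one text run in several chunks; merge them.
                    if (events.get(last).startsWith("onChildText ")) {
                        events.set(last, events.get(last) + text);
                    } else {
                        events.add("onChildText " + text);
                    }
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (depth == targetDepth) {
                    events.add("visitAfter " + qName);
                    targetDepth = -1;
                }
                depth--;
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<message><target-fragment>Text!!<child></child></target-fragment></message>";
        for (String event : trace(xml, "target-fragment")) {
            System.out.println(event);
        }
    }
}
```

For the sample message, written without whitespace between elements, the trace contains four entries in this order: visitBefore, onChildText, onChildElement, visitAfter.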
As can be seen from the above SAX interfaces, the org.milyn.delivery.sax.SAXElement
type is passed in all method calls. This object contains details about the targeted fragment element,
including attributes and their values. It also contains methods for managing text accumulation, as
well as accessing the Writer associated with any StreamResult instance that may have been
passed in the Smooks.filterSource(Source, Result) method call. We'll see more on text
accumulation and StreamResult writing in the coming sections.
3.3.2. Text Accumulation
SAX is a stream based processing model. It doesn't create a Document Object Model (DOM) or
"accumulate" event data in any way. This is why it is a suitable processing model for processing huge
message streams.
The org.milyn.delivery.sax.SAXElement always contains the attribute data associated
with the targeted element, but it does not contain the fragment's child text data, whose SAX events
(SAXVisitChildren.onChildText) occur between the SAXVisitBefore.visitBefore
and SAXVisitAfter.visitAfter events (see the illustration above). The text events are not
accumulated on the SAXElement by default because doing so could significantly degrade
performance on large messages. The downside is that if your SAXVisitor implementation needs
access to the text content of a fragment, you must explicitly tell Smooks to accumulate text for the
targeted fragment. Do this by calling the SAXElement.accumulateText method from inside the
SAXVisitBefore.visitBefore method implementation of your SAXVisitor:
public class MyVisitor implements SAXVisitBefore, SAXVisitAfter
{
public void visitBefore(SAXElement element, ExecutionContext executionContext)
throws SmooksException, IOException
{
element.accumulateText();
}
public void visitAfter(SAXElement element, ExecutionContext executionContext)
throws SmooksException, IOException
{
String fragmentText = element.getTextContent();
// ... etc ...
}
}
Alternatively, annotate your SAXVisitor implementation with the @TextConsumer annotation
instead of implementing the SAXVisitBefore.visitBefore method:
@TextConsumer
public class MyVisitor implements SAXVisitAfter
{
public void visitAfter(SAXElement element, ExecutionContext executionContext)
throws SmooksException, IOException
{
String fragmentText = element.getTextContent();
// ... etc ...
}
}
Note that the complete fragment text is not available until the SAXVisitAfter.visitAfter event.
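The reason the text is only complete at the end of the fragment follows from how SAX delivers character data. The sketch below shows the underlying idea with a plain JDK handler that buffers characters between the start and end events of a targeted element. It is an analogy for what text accumulation has to do, not the Smooks implementation, and the class name TextAccumulationSketch and the textOf method are illustrative assumptions. The handler is created per parse, so keeping the buffer in a field is safe here, unlike in a shared Smooks Visitor.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class TextAccumulationSketch {

    /** Returns the accumulated text of every element named target, in document order. */
    static List<String> textOf(String xml, final String target) throws Exception {
        final List<String> texts = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder buffer; // non-null only while inside a target element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes a) {
                if (buffer == null && qName.equals(target)) {
                    buffer = new StringBuilder(); // start accumulating at the start event
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (buffer != null) {
                    buffer.append(ch, start, length); // text may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (buffer != null && qName.equals(target)) {
                    texts.add(buffer.toString()); // complete only at the end event
                    buffer = null;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record><firstname>Tom</firstname><lastname>Jones</lastname></record>";
        System.out.println(textOf(xml, "firstname"));
    }
}
```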
3.3.3. StreamResult Writing/Serialization
The Smooks.filterSource(Source, Result) method can take one or
more of a number of different Result type implementations, one of which is the
javax.xml.transform.stream.StreamResult class. Smooks streams the Source in and back
out again through the StreamResult instance. Refer to Chapter 13, Multiple Outputs/Results.
By default, Smooks will always serialize the full Source event stream as XML to any StreamResult
instance provided to the Smooks.filterSource(Source, Result) method. If the Source
provided to the Smooks.filterSource(Source, Result) method is an XML stream and a
StreamResult instance is provided as one of the Result instances, the Source XML will be written
out to the StreamResult unmodified, unless the Smooks instance is configured with one or more
SAXVisitor implementations that modify one or more fragments.
The default serialization behavior can be turned on or off by configuring the filter settings. Refer to
Section 2.9, “Filter Settings”.
If you want to modify the serialized form of one of the message fragments you need to implement a
SAXVisitor to perform the transformation and target it at the message fragment using an XPath-like
expression.
Note
Of course, you can also modify the serialized form of a message fragment using one of the
provided templating components. These components are also SAXVisitor implementations.
Refer to Chapter 5, Templates.
The key to implementing a SAXVisitor geared towards transforming the serialized form of a
fragment is telling Smooks that the SAXVisitor implementation in question will be writing to the
StreamResult. You need to tell Smooks this because Smooks supports targeting of multiple
SAXVisitor implementations at a single fragment, but only one SAXVisitor is allowed to write to
the StreamResult, per fragment. If a second SAXVisitor attempts to write to the StreamResult,
a SAXWriterAccessException will result and you will need to modify your Smooks configuration.
In order to be the one that writes to the StreamResult, the SAXVisitor needs
to "acquire ownership" of the Writer to the StreamResult. It does this by simply
making a call to the SAXElement.getWriter(SAXVisitor) method from inside the
SAXVisitBefore.visitBefore method's implementation, passing this as the SAXVisitor