Camel split() can encode incorrectly on multi-byte files in a single-byte locale

Solution Verified - Updated -

Issue

This defect affects Camel split() operations applied to (at least) file and JMS input messages, under particular combinations of JVM locale and message encoding.

Consider a Camel route of the following form, which splits XML files that are encoded in UTF-8, and writes a new file also encoded in UTF-8:

   from("file://in?charset=utf-8")
        .convertBodyTo(String.class)
        .split().method(...)
        .to("log://foo")
        .to("file://out?charset=utf-8");

There are various ways in which the split() operation can be specified, and it is unclear exactly which formulations show the problem -- conceivably they all do.

Because the input and output encodings are both specified explicitly, the JVM default encoding ought to be irrelevant. In Camel versions prior to 2.15, however, the file is written out in the JVM encoding, not the specified output encoding. The split is evidently the cause, because removing it allows the file to be written correctly. The log step (log://foo) may well show output that appears correctly encoded, despite the file being written incorrectly.
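The effect of writing in the wrong encoding can be reproduced without Camel. The following is a minimal sketch using only the JDK; the class and method names are hypothetical and chosen for illustration, and it does not reproduce the Camel defect itself. It shows what a consumer expecting UTF-8 sees when text is actually written in ISO-8859-1, and that characters outside ISO-8859-1 are lost outright.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Camel.
class EncodingMismatchDemo {

    // Encode text with the charset the writer actually used, then decode
    // the bytes with the charset the reader expects.
    static String writeThenRead(String text, Charset written, Charset expected) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, expected);
    }

    // True if every character of the text can be represented in the charset.
    static boolean survives(String text, Charset charset) {
        return writeThenRead(text, charset, charset).equals(text);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final e, written with an escape
        // so the source file encoding does not matter.
        String text = "caf\u00e9";

        // The accented letter takes two bytes in UTF-8 and one in ISO-8859-1.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);      // 5
        System.out.println(text.getBytes(StandardCharsets.ISO_8859_1).length); // 4

        // Written as ISO-8859-1 but read as UTF-8: the text no longer matches.
        String broken = writeThenRead(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(broken.equals(text)); // false

        // The euro sign (U+20AC) has no ISO-8859-1 representation at all.
        System.out.println(survives("\u20ac", StandardCharsets.ISO_8859_1)); // false
    }
}
```

Comparing the byte length of the output file against the expected UTF-8 length is often the quickest way to tell which encoding was really used.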

The JVM encoding might be picked up from the environment, or specified directly using a command line switch such as -Dfile.encoding=iso-8859-1 -- the result is the same. The problem was first brought to Red Hat's attention in an application running under JBoss EAP, but it can be seen in stand-alone and Fuse-based Camel applications as well.
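To confirm which encoding a given JVM has actually picked up, a small check along the following lines may help. This is a sketch using standard JDK calls only; the class name is hypothetical, and the reported values depend on the platform, the Java version, and any -Dfile.encoding switch in effect.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class; not part of Camel.
class DefaultEncodingCheck {

    // Report both the raw system property and the charset the JVM resolved.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this with and without the -Dfile.encoding switch, in the same environment as the affected application, shows whether the JVM is using a single-byte default.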

Environment

  • Fuse 6.1.x
  • Camel 2.12.x
  • Camel 2.13.x
  • Single-byte JVM locale (e.g., ISO-8859-1)
