Are CXF applications at risk of security weaknesses related to over-long UTF-8 encoding?

Solution Verified - Updated -

Issue

An oddity of UTF-8 is that its bit layout allows the same character to be written in more than one way. These different representations are similar to the different ways that an ordinary number can be written: the number '4', for example, can equivalently be written '04' or '004'. The implementation details in UTF-8 are more complex than this, but the principle is the same. This peculiarity does not apply to UTF-32, because it is a fixed-length encoding, nor to UTF-16, because each character has exactly one form there: either a single two-byte unit or, for characters outside the Basic Multilingual Plane, a four-byte surrogate pair.
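To make the '4', '04', '004' analogy concrete, the following is a minimal sketch of a deliberately lax decoder. The class name NaiveUtf8 and the method decodeLax are hypothetical and exist only for illustration; this is not how CXF or the JDK decodes UTF-8. Because the sketch omits the minimal-length check, it maps the one-, two-, three- and four-byte forms of '/' to the same code point.

```java
final class NaiveUtf8 {
    // Decodes a single UTF-8 sequence WITHOUT checking that it is the
    // shortest possible form. This omission is intentional: it shows why
    // over-long sequences decode to ordinary characters in a lax decoder.
    static int decodeLax(byte[] b) {
        int first = b[0] & 0xFF;
        if (first < 0x80) {
            return first; // plain one-byte (ASCII) form
        }
        int len;
        int cp;
        if ((first & 0xE0) == 0xC0) {        // 110xxxxx: two-byte form
            len = 2; cp = first & 0x1F;
        } else if ((first & 0xF0) == 0xE0) { // 1110xxxx: three-byte form
            len = 3; cp = first & 0x0F;
        } else if ((first & 0xF8) == 0xF0) { // 11110xxx: four-byte form
            len = 4; cp = first & 0x07;
        } else {
            throw new IllegalArgumentException("invalid lead byte");
        }
        if (b.length < len) {
            throw new IllegalArgumentException("truncated sequence");
        }
        for (int i = 1; i < len; i++) {
            int c = b[i] & 0xFF;
            if ((c & 0xC0) != 0x80) {        // continuation bytes are 10xxxxxx
                throw new IllegalArgumentException("invalid continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);     // append six payload bits
        }
        return cp;
    }

    public static void main(String[] args) {
        byte[][] formsOfSlash = {
            {0x2F},                                           // shortest form
            {(byte) 0xC0, (byte) 0xAF},                       // over-long, two bytes
            {(byte) 0xE0, (byte) 0x80, (byte) 0xAF},          // over-long, three bytes
            {(byte) 0xF0, (byte) 0x80, (byte) 0x80, (byte) 0xAF} // over-long, four bytes
        };
        for (byte[] form : formsOfSlash) {
            // Each line prints 2f, the code point of '/'.
            System.out.println(Integer.toHexString(decodeLax(form)));
        }
    }
}
```

A conforming decoder would add a check that the decoded value could not have been encoded in fewer bytes, and would reject the last three inputs.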

Using more than the minimum number of bytes to encode a character in UTF-8 is known as "over-long encoding". It is generally considered bad practice, and the current UTF-8 specification (RFC 3629) treats over-long sequences as invalid. Probably the only legitimate reason to use an over-long encoding is to encode the null (zero) character as two bytes rather than as a plain zero byte, because many libraries interpret a zero byte as a string terminator.
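Java itself offers an example of the two-byte null: the "modified UTF-8" format written by DataOutputStream.writeUTF encodes U+0000 as the bytes 0xC0 0x80, whereas standard UTF-8 encoding produces a single zero byte. The sketch below compares the two; the class name ModifiedUtf8Demo and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ModifiedUtf8Demo {
    // Standard UTF-8: the null character becomes one zero byte.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF.
    // writeUTF prefixes the data with a two-byte length, which is
    // stripped here so that only the encoded characters remain.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String nul = "\0"; // a string holding only the null character
        System.out.println(Arrays.toString(standardUtf8(nul))); // [0]
        System.out.println(Arrays.toString(modifiedUtf8(nul))); // [-64, -128], i.e. 0xC0 0x80
    }
}
```

The modified form never contains a zero byte, so the encoded data can pass through code that treats zero as a terminator.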

Potential security weaknesses arise because it is very easy to make programming errors when handling UTF-8 data. Because this encoding is, in practical applications, very similar to encodings that use one byte per character (ASCII, ISO 8859), it is easy to overlook the fact that there are multiple representations of the same character. The archetypal mistake of this kind is to search a URL for unwanted characters (typically '/' and '.') by examining the data byte by byte. These characters can, and should, be represented as a single byte, but that is not the only possible representation in UTF-8. The risk is that the programmer believes the input has been sanitised when, in fact, an intruder can still introduce unwanted characters by representing them in over-long form.
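The mistake described above can be sketched as follows. The class OverlongScanDemo and its methods are hypothetical and are not taken from CXF; the sketch only contrasts a byte-by-byte search with a strict decode using the JDK's standard CharsetDecoder, configured to report malformed input rather than replace it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class OverlongScanDemo {
    // Flawed check: looks only for the single-byte forms of '/' and '.'.
    static boolean byteScanFindsForbidden(byte[] input) {
        for (byte b : input) {
            if (b == '/' || b == '.') {
                return true;
            }
        }
        return false;
    }

    // Strict check: decoding fails on malformed input, which in the JDK's
    // UTF-8 decoder includes over-long sequences.
    static boolean isWellFormedUtf8(byte[] input) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 'a', then '/' in over-long two-byte form (0xC0 0xAF), then 'b'.
        byte[] overlong = {'a', (byte) 0xC0, (byte) 0xAF, 'b'};
        System.out.println(byteScanFindsForbidden(overlong)); // false: the scan misses it
        System.out.println(isWellFormedUtf8(overlong));       // false: strict decoding rejects it
    }
}
```

The safer order is to decode strictly first and only then inspect the resulting characters, so that any check operates on characters rather than on raw bytes.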

Inputs to a CXF application are typically XML, and nearly always encoded as UTF-8. The question therefore arises whether an intruder might use over-long UTF-8 encoding to create a security weakness.
