Satellite-sync fails with error: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xbe in position 2851: ordinal not in range(128)"

Solution Unverified

Environment

  • Red Hat Satellite 5.5
  • Red Hat Satellite 5.6
  • Red Hat Satellite 5.7
  • satellite-sync

Issue

  • satellite-sync fails with the following error on Red Hat Satellite:
SYNC ERROR: unhandled exception occurred:

Exception reported from satellite.example.com
Time: Wed Jan 30 16:46:07 2013
Exception type <type 'exceptions.UnicodeDecodeError'>

Exception Handler Information
Traceback (most recent call last):
  File "/usr/bin/satellite-sync", line 139, in main
    return satsync.Runner().main()
  File "/usr/lib/python2.6/site-packages/spacewalk/satellite_tools/satsync.py", line 225, in main
    ret = method()
  File "/usr/lib/python2.6/site-packages/spacewalk/satellite_tools/satsync.py", line 320, in _step_download_packages
    return self.syncer.download_package_metadata()
  File "/usr/lib/python2.6/site-packages/spacewalk/satellite_tools/satsync.py", line 1097, in download_package_metadata
    stream_loader.process, is_slow=True)
  File "/usr/lib/python2.6/site-packages/spacewalk/satellite_tools/satsync.py", line 1572, in _proces_batch
    prompt, nevermorethan, process_function_args)
  File "/usr/lib/python2.6/site-packages/spacewalk/satellite_tools/satsync.py", line 1552, in _processWithProgressBar
    process_function(chunk, *process_function_args)
  File "/usr/lib/python2.6/site-packages/spacewalk/satellite_tools/satsync.py", line 1996, in process
    self.handler.process(stream)
  File "/usr/lib/python2.6/site-packages/spacewalk/satellite_tools/xmlSource.py", line 135, in process
    Traceback(ostream=sys.stderr, with_locals=1)
  File "/usr/lib/python2.6/site-packages/spacewalk/common/rhnTB.py", line 174, in Traceback
    outstring = exc.getvalue()
  File "/usr/lib64/python2.6/StringIO.py", line 270, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbe in position 2851: ordinal not in range(128)
  • satellite-sync fails with exception type <type 'exceptions.UnicodeDecodeError'>:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbe in position 2851: ordinal not in range(128)

(The byte and position will almost certainly be different on other systems.)

Resolution

Root Cause

This problem occurs when package metadata contains text written in a non-UTF-8 character set, or contains binary data, and that data is then read as UTF-8.

One cause of this corruption is successive translation or reprocessing between character sets. A name with accented characters (e.g. Martin Večeřa) contains the characters č and ř, which UTF-8 encodes as the bytes 'C4 8D' and 'C5 99' (in hex) respectively. If these bytes are read as Windows-1252 characters, they become Ä (8D does not represent any character in Windows-1252) and Å™ respectively. If the result is then translated back to UTF-8, it becomes 'C3 84' and 'C3 85 E2 84 A2' respectively, so the output contains more bytes than the input. Successive re-translations can continue to expand the name until it overflows the predefined field limits for some names in change logs and author details within the Satellite database.

When the expanded byte sequence is truncated, the UTF-8 decoding process hits a byte it is not expecting (e.g. C3 not followed by a byte between 80 and BF). Likewise, if reading 'C4 8D' as Windows-1252 leaves the untranslatable byte 8D in place, the UTF-8 decoding process rejects it, because a byte between 80 and BF is a continuation byte and must follow a lead byte (C2 to F4).
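
The expansion and the truncation failure described above can be reproduced in a few lines. This is a minimal Python 3 sketch, not code taken from Satellite: the helper name mojibake_round and the 4-byte field limit are assumptions made for illustration. It uses ř rather than č because Python's cp1252 codec refuses to decode the undefined byte 8D.

```python
# Python 3 sketch; mojibake_round is a hypothetical helper name.
def mojibake_round(text, wrong_codec="cp1252"):
    """Encode text as UTF-8, then misread those bytes using wrong_codec."""
    return text.encode("utf-8").decode(wrong_codec)

original = "\u0159"              # ř, stored as the UTF-8 bytes C5 99
once = mojibake_round(original)  # misread as Windows-1252: Å™
twice = mojibake_round(once)     # misread a second time

# UTF-8 length in bytes after zero, one and two bad round trips.
sizes = [len(s.encode("utf-8")) for s in (original, once, twice)]
print(sizes)  # [2, 5, 12]

# Cut the once-damaged bytes (C3 85 E2 84 A2) to an assumed 4-byte field.
truncated = once.encode("utf-8")[:4]
try:
    truncated.decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError as exc:
    truncated_ok = False
    print(exc.reason)
```

Each bad round trip more than doubles the stored length of this one character, which is how a short name can eventually exceed a fixed field size and be cut in the middle of a multi-byte sequence.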

Similar problems occur when non-text data, such as gzipped data, is passed through the Unicode decoding process as if it were literal text.
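
As an illustration of the gzip case, the following Python 3 sketch compresses a small payload and then tries to decode the compressed bytes as text. The XML snippet is an invented stand-in, not real Satellite metadata. Every gzip stream starts with the magic bytes 1F 8B, and 8B is neither ASCII nor a valid first byte of a UTF-8 sequence, so decoding fails almost immediately.

```python
import gzip

# Invented payload standing in for a compressed metadata file.
payload = gzip.compress(b"<package name='example'/>")

magic = payload[:2]  # gzip magic number, bytes 1F 8B

try:
    payload.decode("utf-8")
    decoded = True
    bad_offset = None
except UnicodeDecodeError as exc:
    decoded = False
    bad_offset = exc.start  # offset of the first undecodable byte

print(magic.hex(), decoded, bad_offset)
```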

However, the causes of this data appearing in the package metadata, to be processed as Unicode, are not well understood. There are several possibilities:
  • Data corruption caused by the data being processed in different character sets. This usually happens when the database does not use UTF-8 natively.
  • Truncation or alteration of package metadata as it is read from the CDN. This can happen when a proxy sits between the Satellite server and the Red Hat CDN and either drops or stalls long connections (e.g. while virus-scanning the gzipped files transferred by the CDN). It is unusual for a proxy to alter data, but not impossible: a proxy might, for example, try to change the character set of embedded text.
  • Interference with the SSL connection. Because Red Hat uses a private certificate authority to secure communications between the CDN and its clients (see https://access.redhat.com/articles/1373143), some proxies or firewalls interfere with the communication when they are presented with an SSL certificate that they do not recognise. In this case, the error message they return can be mistaken for package metadata and decoded incorrectly.

Diagnostic Steps

  • Check whether a proxy sits between the Satellite server and Red Hat and, if so, whether it alters or injects anything into the traffic
  • Clear the satellite-sync cache and run the sync again
  rm -rf /var/cache/rhn/satsync
  • Collect an sosreport and spacewalk-debug output
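
When checking whether a proxy has altered the data, it can help to find exactly where a captured byte string stops being valid UTF-8. The Python 3 sketch below is one possible approach, not a Red Hat tool: the function name first_invalid_utf8 is hypothetical, and the sample is the example name from this article cut off in the middle of a multi-byte character.

```python
def first_invalid_utf8(data, context=8):
    """Return (offset, hex of nearby bytes) for the first bad byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        start = max(exc.start - context, 0)
        return exc.start, data[start:exc.start + context].hex()
    return None

sample = "Martin Ve\u010de\u0159a".encode("utf-8")
damaged = sample[:10]  # keeps C4 but drops its continuation byte 8D

print(first_invalid_utf8(sample))   # None
print(first_invalid_utf8(damaged))
```

The same function could be applied to bytes read from a saved copy of a proxy response. If those bytes begin with 1F 8B they are gzip-compressed and need to be decompressed before the check is meaningful.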


1 Comment

This happened to us on Satellite 5.6 with an HTTP proxy from Bluecoat. Bypassing the proxy helped to sync the channels; clearing the cache was not necessary.