  • RHEL5 treats lone surrogate in UTF-8 text as legal!

    On RHEL5 the behavior is wrong: iconv prints no error message for a lone surrogate:
    $ echo -e 'P\xed\xa0\x80Q' | iconv -t UTF-8 >/dev/null
    $
    The 3-byte sequence ed-a0-80 encodes U+D800, a surrogate code point, which should be rejected as illegal in UTF-8. But we do not get any message! The locale settings are:
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
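
For anyone checking the claim that ed-a0-80 is U+D800, the arithmetic can be sketched in shell. This is only an illustration: `decode_utf8_3` and `is_surrogate` are hypothetical helper names, not part of iconv or glibc, and the decoder assumes its input really is a 3-byte sequence given as three hex bytes.

```shell
# Decode one 3-byte UTF-8 sequence, passed as three hex bytes, to a code point.
# decode_utf8_3 is a hypothetical helper for illustration only.
decode_utf8_3() {
    # Strip the marker bits: 1110xxxx 10yyyyyy 10zzzzzz -> xxxx yyyyyy zzzzzz
    cp=$(( ((0x$1 & 0x0F) << 12) | ((0x$2 & 0x3F) << 6) | (0x$3 & 0x3F) ))
    printf 'U+%04X\n' "$cp"
}

# Succeeds if the given code point (e.g. U+D800) lies in the surrogate
# range U+D800..U+DFFF, which has no valid UTF-8 encoding.
is_surrogate() {
    cp=$((0x${1#U+}))
    [ "$cp" -ge $((0xD800)) ] && [ "$cp" -le $((0xDFFF)) ]
}

decode_utf8_3 ed a0 80
```

Here 0xed contributes the top four bits (0xD), 0xa0 contributes 0x20 shifted left by six (0x800), and 0x80 contributes zero, which gives U+D800, the first code point of the surrogate range.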

    On RHEL6 we get the expected message for the same command and locale as above:
    iconv: illegal input sequence at position 1
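
Until the difference between the two releases is explained, a possible stopgap is to scan the input for encoded surrogates before handing it to iconv. The sketch below is an assumption-laden illustration, not a fix for iconv itself: `has_surrogate_bytes` is a hypothetical helper name, and the check relies on the fact that in otherwise well-formed UTF-8 the lead byte 0xed followed by a byte in the range 0xa0..0xbf can only be an encoded surrogate.

```shell
# Succeeds if stdin contains the byte 0xed followed by a byte in 0xa0..0xbf,
# i.e. a UTF-8-style encoding of a code point in U+D800..U+DFFF.
# has_surrogate_bytes is a hypothetical helper for illustration only.
has_surrogate_bytes() {
    # Dump the input as space-separated hex bytes on one line, then look
    # for the two-byte prefix "ed a0".."ed bf".
    od -An -v -tx1 | tr -s ' \n' ' ' | grep -E 'ed [ab][0-9a-f]' >/dev/null
}

# Same bytes as the test case above, written with octal escapes.
if printf 'P\355\240\200Q\n' | has_surrogate_bytes; then
    echo "lone surrogate found"
fi
```

The check only looks at raw bytes, so it does not depend on the locale or on how the installed iconv validates its input.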

    Does anyone have a clue about the cause, or a resolution for this case?
