Question marks ("?") in UTF-8 to ASCII conversion

Environment

  • Red Hat Enterprise Linux (RHEL) - all versions
  • iconv

Issue

Converting some UTF-8 characters to ASCII with iconv produces '?' characters in the output, for example:

[test@box ~]$ iconv -f UTF-8 -t ASCII//TRANSLIT <<< 'I❤️ASCII ЯRавсде áčďéěíňóřšťú'
I?ASCII ?R????? acdeeinorstu
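
To see in advance which characters of a given text will end up as '?', each character can be passed through iconv on its own. The following is a minimal sketch, assuming bash, glibc iconv, GNU grep and a UTF-8 locale (so that `grep -o .` splits the text into characters rather than bytes). `list_untranslatable` is a hypothetical helper name, not an existing command.

```bash
#!/bin/bash
# Hypothetical helper (not part of iconv): print each distinct character of
# the argument that iconv -f UTF-8 -t ASCII//TRANSLIT replaces with '?'.
# Assumes glibc iconv, GNU grep and a UTF-8 locale.
list_untranslatable() {
    local ch out
    # grep -o . prints one character per line; LC_ALL=C sort -u removes
    # byte-identical duplicates without locale-specific collation
    printf '%s\n' "$1" | grep -o . | LC_ALL=C sort -u |
    while IFS= read -r ch; do
        # a literal '?' in the input is not a conversion failure
        if [ "$ch" = '?' ]; then
            continue
        fi
        # '|| true' keeps the loop going if iconv reports a failed conversion
        out=$(printf '%s' "$ch" | iconv -f UTF-8 -t ASCII//TRANSLIT 2>/dev/null || true)
        if [ "$out" = '?' ]; then
            printf '%s\n' "$ch"
        fi
    done
}

list_untranslatable 'I❤️ASCII ЯRавсде áčďéěíňóřšťú'
```

The characters it prints are the ones that need a manual rule before the text is passed to iconv.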

Resolution

First transliterate manually the characters that iconv does not recognize, e.g. using sed, and then let iconv transliterate the characters it does recognize. See the following examples:

[test@box ~]$ echo 'I❤️ASCII ЯRавсде áčďéěíňóřšťú' | sed 'y/авсд/avsd/; s/❤️/(heart)/g; s/Я/JA/g; s/е/je/g; ' | iconv -f UTF-8 -t ASCII//TRANSLIT
I(heart)ASCII JARavsdje acdeeinorstu

A more readable version of the above solution, which is easier to maintain, keeps the sed rules in a separate file:

[test@box ~]$ cat conversion.sed
# transliteration 1:1

# transliteration 1:n
# ...
# ...
# ...
[test@box ~]$ echo 'I❤️ASCII ЯRавсде áčďéěíňóřšťú' | sed -f conversion.sed | iconv -f UTF-8 -t ASCII//TRANSLIT
I(heart)ASCII JARavsdje acdeeinorstu
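
A sketch of how conversion.sed could be filled in with the rules from the one-line sed command above, one rule per line, assuming GNU sed. One deliberate change: the 1:1 mappings are written as `s` commands instead of the `y` command used in the one-liner. GNU sed's `y` requires both sides to have the same number of characters, and in a non-UTF-8 locale such as C each Cyrillic letter counts as two bytes, whereas a literal `s` pattern matches byte-for-byte in any locale.

```bash
#!/bin/bash
# Write the rules from the one-liner into conversion.sed, one per line.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > conversion.sed <<'EOF'
# transliteration 1:1 (Cyrillic letter -> Latin letter)
s/а/a/g
s/в/v/g
s/с/s/g
s/д/d/g

# transliteration 1:n (one character -> several characters)
s/❤️/(heart)/g
s/Я/JA/g
s/е/je/g
EOF
```

The `sed -f conversion.sed | iconv -f UTF-8 -t ASCII//TRANSLIT` pipeline shown above can then be used unchanged; new rules are added as new lines in the file.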

Root Cause

The iconv command transliterates only those characters for which a transliteration is defined. According to the iconv(1) man page:

  • If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.

