RHSB-2021-007 Trojan source attacks (CVE-2021-42574, CVE-2021-42694)

Public Date: November 1, 2021, 12:00 am
Updated -
Ongoing Status
Moderate Impact

Red Hat is aware of a new type of attack scenario concerning development environments where the text displayed to the end-user doesn’t match the expectation of what is executed. These issues are assigned CVE-2021-42574 and CVE-2021-42694. Both flaws have a severity impact rating of Moderate.

These issues are not flaws within Red Hat products and have been branded as ‘Trojan Source’ attacks. The flaws exist due to the way Unicode standards are implemented within the context of development environments, which have specialized requirements for rendering text. 

Package updates issued by Red Hat serve as a possible workaround to mitigate this issue. Red Hat is providing an example script to detect these issues in your own development environments. In addition, Red Hat is working with upstream researchers and communities to address these issues in various developer tools.

Text-encoding schemes must support both left-to-right and right-to-left languages. When language scripts with different display orders are mixed, the Bidirectional (BiDi) Algorithm in Unicode determines how the text is ordered for display. However, it can cause confusion when used in source code. What is rendered and displayed to a human reviewer can be ambiguous and easily mistaken for similar-looking non-BiDi code, which executes differently.

For example, an attacker could exploit this to deceive a human reviewer by creating a malicious patch containing well-placed BiDi characters. The special handling and rendering of those characters can then be used in an attempt to hide unexpected and potentially dangerous behavior from the reviewer.

This type of attack could pose a threat to code repositories and build pipelines. No related defects have been found in commonly used development tools (compilers, editors, source code management). Red Hat is working with upstream communities to add appropriate diagnostics in the commonly used development tools to identify problematic BiDi sequences. 

A related issue involves homoglyphs: glyphs that appear identical or similar to each other. (A glyph is an elemental symbol within an agreed set of symbols, intended to represent a readable character for the purpose of writing.) Homoglyphs can also be used to modify source code invisibly and perform targeted attacks.

CVE-2021-42574

A flaw was found in the way Unicode standards are implemented in the context of development environments, which have specialized requirements for rendering text. An attacker could exploit this to deceive a human reviewer by creating a malicious patch containing well-placed BiDi characters. The special handling and rendering of those characters can then be used in an attempt to hide unexpected and potentially dangerous behavior from the reviewer.

Unicode’s Directional Formatting Characters (‘BiDi’) are invisible characters that switch the display ordering of one or more characters. BiDi overrides cause characters to display in a different order from that in which they are written.
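As a rough illustration of how invisible these characters are, the following Python sketch lists the BiDi control characters found in a string. The set of code points is the one used in the grep command in the FAQ of this bulletin; the function name find_bidi_controls and the sample line are hypothetical and are not part of the Red Hat diagnostic script.

```python
import unicodedata

# Code points taken from the grep pattern in this bulletin's FAQ.
BIDI_CONTROLS = (
    "\u061c\u200e\u200f"              # marks
    "\u202a\u202b\u202c\u202d\u202e"  # embeddings, overrides, pop
    "\u2066\u2067\u2068\u2069"        # isolates, pop isolate
)


def find_bidi_controls(text):
    """Return (index, 'U+XXXX', Unicode name) for each BiDi control in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


# Hypothetical line of source code. The override and its terminator
# are typically not drawn by an editor, but they are part of the string.
sample = 'access = "user\u202e admin\u202c"'

for hit in find_bidi_controls(sample):
    print(hit)
```

Printing the Unicode name alongside the position is a design choice for reviewers: a name such as RIGHT-TO-LEFT OVERRIDE is easier to assess than a bare code point.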

Most programming languages do not tolerate arbitrary control characters in source code, because they violate the language syntax and cause a syntax error. Some languages also exclude the use of some special characters in variable names. However, BiDi characters can be injected in comments and some strings in most languages.
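The distinction between identifiers on one side and comments or strings on the other can be checked in a single language. The sketch below uses Python purely as an illustrative example and assumes nothing about other languages; the helper name compiles and the three sample snippets are hypothetical.

```python
def compiles(source):
    """Return True if Python accepts the source text, False on a syntax error."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

in_comment = f"x = 1  # note {RLO}hidden\n"     # control inside a comment
in_string = f's = "abc{RLO}def"\n'              # control inside a string literal
in_identifier = f"a{RLO}b = 1\n"                # control inside a variable name

for label, src in [("comment", in_comment),
                   ("string", in_string),
                   ("identifier", in_identifier)]:
    print(label, compiles(src))
```

In this sketch only the identifier case is rejected, which matches the behavior described above: the compiler gives no warning for the comment and string cases, so detection has to happen in review tooling.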

BiDi characters persist through copy-and-paste operations on most modern systems. There is an ongoing concern of introducing vulnerabilities by copy-pasting code from untrusted sources. Manual review may not be sufficient if the change in logic is subtle enough to go undetected. 

CVE-2021-42694

A flaw was found in the way Unicode standards are implemented in the context of development environments, which have specialized requirements for rendering text. Homoglyphs are different Unicode characters that, to the naked eye, look the same. An attacker could use homoglyphs to deceive a human reviewer by creating a malicious patch containing functions that look similar to standard library functions, such as print, but replace one character with a homoglyph. Such a function can then be defined in an upstream dependency to launch source-code-related attacks.
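A minimal sketch of one possible homoglyph check follows. It flags identifiers that mix letters from more than one script. Python's standard library does not expose the Unicode script property, so this sketch assumes the first word of each character's Unicode name (for example LATIN or CYRILLIC) is a good enough hint; that heuristic, and the names script_hint and mixed_script, are illustrative assumptions rather than an established detection method.

```python
import unicodedata


def script_hint(ch):
    """Approximate a character's script by the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script(identifier):
    """Return True if the letters of identifier come from more than one script."""
    scripts = {script_hint(ch) for ch in identifier if ch.isalpha()}
    return len(scripts) > 1


latin = "print"
lookalike = "\u0440rint"  # CYRILLIC SMALL LETTER ER followed by Latin "rint"

print(latin == lookalike)       # the two names are different strings
print(mixed_script(latin))
print(mixed_script(lookalike))
```

A check like this would produce false positives for code bases that legitimately use non-Latin identifiers, so it is better suited to flagging names for human review than to rejecting them outright.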

This is not a flaw found within Red Hat products.

CVE-2021-42694 has been known for some time and presents a less serious attack vector than CVE-2021-42574. Various upstream projects have been working on the homoglyph issue for the last several years, and that work is still in progress.

At this time, Red Hat is conducting internal scanning of our Products and Services, and thus far we have not detected any malicious Unicode characters in code for this class of vulnerabilities. Red Hat is also putting detection checks in place in the supply chain for our products, which scan for unnatural Unicode sequences in code. Red Hat is working with upstream communities to add appropriate diagnostics in the commonly used development tools to identify problematic BiDi sequences.

Note: This is a flaw with the way Unicode standards are implemented in the context of development environments, which have specialized requirements for rendering text. It is not a flaw in Red Hat products. Package updates issued by Red Hat serve as a possible workaround to mitigate this issue.

A diagnostic script has been developed to determine if your code has BiDi characters present. To verify the authenticity of the script, you can download the detached GPG signature as well. Instructions on how to use GPG signature for verification are available on the Customer Portal.

Note: Red Hat is providing this script as-is to our customers and community. There is no provision for support for its usage or assessment of output. Refer to the README file on usage of the script. It is left to the customer’s discretion to analyze the use of the said BiDi character and make a decision regarding possible malicious use. The technical whitepaper released by the researchers provides in-depth details on BiDi characters.

Alternative Methods to detect BiDi characters in source code:

Some text editors provide hints of the presence of BiDi characters in source code through a combination of syntax highlighting, reversing the direction of text scrolling, and printing the control characters, especially those that reverse direction in a single word. The cat command provides the -A and -v flags to visualize non-printable characters, thus making the presence of BiDi control sequences known.
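Where such editor support or cat flags are not convenient, a similar effect can be approximated in a few lines of Python. The sketch below replaces every character in the Unicode "Cf" (format) category, which includes the BiDi controls, with a visible U+XXXX tag. The function name reveal and the tag format are hypothetical choices, and the output differs from the notation that cat prints.

```python
import unicodedata


def reveal(text):
    """Replace invisible format characters (category Cf) with visible tags."""
    return "".join(
        f"<U+{ord(ch):04X}>" if unicodedata.category(ch) == "Cf" else ch
        for ch in text
    )


# Hypothetical line containing a right-to-left override before the colon.
line = "if admin\u202e:"
print(reveal(line))
```

Because the whole Cf category is tagged, this also exposes other invisible characters such as zero-width joiners, which may or may not be of interest in a given review.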

Red Hat acknowledges Nicholas Boucher and Ross Anderson of University of Cambridge for responsibly reporting this issue.

Q: Does this flaw affect Red Hat products?

A: No, it is not a flaw in Red Hat products. It's a flaw with the way Unicode standards are implemented in the context of development environments, which have specialized requirements for rendering text.

Q: Does this flaw affect the Red Hat supply chain?

A: Red Hat is putting in place ongoing detection checks in our product supply chain to scan for unnatural Unicode sequences in code.

Q: How does Red Hat plan to fix this issue?

A: This is not a flaw in Red Hat products, but rather a design issue which may be misused to introduce malicious code into upstream code. Red Hat is currently working with various upstream projects to design and implement diagnostic and auditing capabilities in different components which would best mitigate the issue.

Q: I have an older version of Red Hat Enterprise Linux (6 or earlier). How do I scan my code for this issue?

A: Use the following bash command line. This is not as powerful as the script enclosed in this bulletin but has basic capabilities:

 $ grep -r $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]' /path/to/source

If you see any output, then you need to check your code.
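For environments where Python is available, the same basic check can be sketched as a recursive scan that also reports line numbers. This is an illustrative approximation of the grep command above, not the diagnostic script referenced in this bulletin; the function name scan_tree is hypothetical, and the sketch assumes source files are UTF-8 encoded.

```python
import os
import re

# Same twelve code points as the grep pattern, written with ranges.
BIDI_RE = re.compile("[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]")


def scan_tree(root):
    """Return a list of (path, line number) pairs for lines with BiDi controls."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going on non-UTF-8 files.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if BIDI_RE.search(line):
                            hits.append((path, lineno))
            except OSError:
                continue  # unreadable file: skip it rather than abort
    return hits


if __name__ == "__main__":
    for path, lineno in scan_tree("/path/to/source"):
        print(f"{path}:{lineno}")
```

As with the grep command, any reported line still needs manual review, because BiDi characters also appear legitimately in right-to-left text.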

https://trojansource.codes/ 

https://www.lightbluetouchpaper.org/2021/11/01/trojan-source-invisible-vulnerabilities/ 

https://www.unicode.org/reports/tr36/#Canonical_Represenation 

https://www.unicode.org/reports/tr36/#Bidirectional_Text_Spoofing 

https://www.unicode.org/reports/tr39/