Commit c1c66e7

Strip Unicode Cf characters in PrintableString
`PrintableString` is the sanitiser LDK uses to render untrusted strings (node aliases, BOLT-12 invoice / offer text, `UntrustedString`, LSPS messages, `lightning-invoice` descriptions) to logs and UI. It only replaced `char::is_control` matches (Unicode general category `Cc`) with U+FFFD, leaving the entire `Cf` (Format) category untouched.

`Cf` is exactly the category covering the bidirectional override / isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack family (CVE-2021-42574). A peer can set its alias / invoice description / offer fields to e.g. `safe\u{202E}cipsxe.exe`, which previously passed through verbatim while a bidi-aware renderer shows a human reader `safeexe.exspic`. That defeats the threat model `PrintableString` exists to defend against.

Replace `Cf` codepoints alongside `Cc` ones. The `Cf` ranges are inlined as a `matches!` table sourced from Unicode 16.0 to keep the change `no_std`-friendly with no new dependencies.

Co-Authored-By: HAL 9000
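The gap described in the message can be reproduced outside LDK. The sketch below is not LDK code: `sanitize_cc_only` is a hypothetical stand-in that mirrors the pre-fix check, relying only on the standard library's `char::is_control`, which matches general category `Cc` and nothing else.

```rust
/// Hypothetical stand-in for the pre-fix behaviour (not an LDK function):
/// replace only `Cc` characters, as `char::is_control` defines them.
fn sanitize_cc_only(s: &str) -> String {
    s.chars()
        .map(|c| if c.is_control() { char::REPLACEMENT_CHARACTER } else { c })
        .collect()
}
```

With this check, a BEL (U+0007) is replaced by U+FFFD, but `safe\u{202E}cipsxe.exe` comes back unchanged because U+202E is `Cf`, not `Cc`.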
1 parent 1a26867 commit c1c66e7

1 file changed

Lines changed: 53 additions & 1 deletion

lightning-types/src/string.rs

@@ -31,14 +31,50 @@ impl<'a> fmt::Display for PrintableString<'a> {
 	fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
 		use core::fmt::Write;
 		for c in self.0.chars() {
-			let c = if c.is_control() { core::char::REPLACEMENT_CHARACTER } else { c };
+			let c = if c.is_control() || is_format_char(c) {
+				core::char::REPLACEMENT_CHARACTER
+			} else {
+				c
+			};
 			f.write_char(c)?;
 		}
 
 		Ok(())
 	}
 }
 
+// Codepoints in Unicode general category `Cf` (Format), per Unicode 16.0. These are not matched
+// by `char::is_control` (which only covers `Cc`), but include the bidirectional override / isolate
+// controls (e.g. U+202E RLO) and zero-width characters behind the "Trojan Source" attack family
+// (CVE-2021-42574), where an attacker-supplied string renders to a human reader as something other
+// than its byte content. Strip them alongside `Cc` characters when sanitising untrusted input.
+fn is_format_char(c: char) -> bool {
+	matches!(
+		c as u32,
+		0x00AD
+			| 0x0600..=0x0605
+			| 0x061C
+			| 0x06DD
+			| 0x070F
+			| 0x0890..=0x0891
+			| 0x08E2
+			| 0x180E
+			| 0x200B..=0x200F
+			| 0x202A..=0x202E
+			| 0x2060..=0x2064
+			| 0x2066..=0x206F
+			| 0xFEFF
+			| 0xFFF9..=0xFFFB
+			| 0x110BD
+			| 0x110CD
+			| 0x13430..=0x13440
+			| 0x1BCA0..=0x1BCA3
+			| 0x1D173..=0x1D17A
+			| 0xE0001
+			| 0xE0020..=0xE007F
+	)
+}
+
 #[cfg(test)]
 mod tests {
 	use super::PrintableString;
@@ -50,4 +86,20 @@ mod tests {
 			"I \u{1F496} LDK!\u{FFFD}\u{26A1}",
 		);
 	}
+
+	#[test]
+	fn sanitizes_unicode_bidi_override_characters() {
+		// U+202E RIGHT-TO-LEFT OVERRIDE and friends are Unicode general category
+		// `Cf` (Format), not `Cc` (Control). They enable "Trojan Source" /
+		// bidi-spoofing attacks where an attacker-supplied string (e.g. a node
+		// alias gossiped from a peer) renders to a human reader as something
+		// other than its byte content. `PrintableString` is the sanitiser used
+		// for exactly these untrusted strings, so it must replace them.
+		let rendered = format!("{}", PrintableString("safe\u{202E}cipsxe.exe"));
+		assert!(
+			!rendered.contains('\u{202E}'),
+			"PrintableString left a U+202E RLO override in its output: {:?}",
+			rendered
+		);
+	}
 }
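
The new test only asserts that U+202E is absent from the output. As a rough illustration of the post-fix behaviour on other inputs, here is a standalone sketch. Both `is_bidi_or_zero_width` and `sanitize` are hypothetical names, not LDK APIs, and the table is deliberately abbreviated to the ranges named in the commit message rather than the full `Cf` list in `is_format_char`.

```rust
/// Hypothetical helper: an abbreviated subset of the `Cf` table, covering only
/// the bidi and zero-width ranges named in the commit message.
fn is_bidi_or_zero_width(c: char) -> bool {
    matches!(
        c as u32,
        0x200B..=0x200D // zero-width space, non-joiner, joiner
            | 0x202A..=0x202E // bidi embeddings and overrides
            | 0x2066..=0x2069 // bidi isolates
            | 0xFEFF // zero-width no-break space / BOM
    )
}

/// Hypothetical stand-in for the post-fix `Display` loop: replace `Cc`
/// characters and the subset above with U+FFFD, pass everything else through.
fn sanitize(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_control() || is_bidi_or_zero_width(c) {
                char::REPLACEMENT_CHARACTER
            } else {
                c
            }
        })
        .collect()
}
```

Under these assumptions, `sanitize("safe\u{202E}cipsxe.exe")` yields `safe\u{FFFD}cipsxe.exe`, zero-width characters are replaced the same way, and ordinary non-ASCII text such as an emoji passes through untouched.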
