Commit 2e55bdd

fix(scripts): encode residual JSON control chars as \uXXXX instead of stripping (#1872)
* fix(scripts): encode residual control chars as \uXXXX instead of stripping

  json_escape() was silently deleting control characters (U+0000-U+001F) that were not individually handled (\n, \t, \r, \b, \f). Per RFC 8259, these must be encoded as \uXXXX sequences to preserve data integrity. Replace the tr -d strip with a char-by-char loop that emits proper \uXXXX escapes for any remaining control characters.

* fix(scripts): address Copilot review on json_escape control char loop

  - Set LC_ALL=C for the entire loop (not just printf) so that ${#s} and ${s:$i:1} operate on bytes deterministically across locales
  - Fix comment: U+0000 (NUL) cannot exist in bash strings, range is U+0001-U+001F; adjust code guard accordingly (code >= 1)
  - Emit directly to stdout instead of accumulating in a variable, avoiding quadratic string concatenation on longer inputs

* perf(scripts): use printf -v to avoid subshell in json_escape loop

  Replace code=$(printf ...) with printf -v code to assign the character code without spawning a subshell on every byte, reducing overhead for longer inputs.

1 parent eecb723 commit 2e55bdd
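
To make the data-loss point in the commit message concrete, here is a minimal sketch comparing the removed `tr -d` strip with a `\uXXXX` escape for a single control byte. The input string and the variable names (`input`, `stripped`, `encoded`) are assumptions chosen for illustration and are not part of scripts/bash/common.sh.

```shell
#!/usr/bin/env bash
# Hypothetical input: "a" and "b" separated by U+0001 (SOH).
input=$'a\001b'

# Old behaviour: tr -d deletes the control byte, so the value silently changes.
stripped=$(printf '%s' "$input" | tr -d '\000-\007\013\016-\037')

# New behaviour for that byte: a six-character \uXXXX escape replaces it.
# (The code point 1 is hard-coded here; the real loop derives it per byte.)
encoded=$(printf 'a\\u%04xb' 1)

printf 'stripped: %s\n' "$stripped"   # stripped: ab
printf 'encoded:  %s\n' "$encoded"    # encoded:  a\u0001b
```

A JSON parser reading the encoded form can recover the original byte, whereas the stripped form gives it no way to tell that a character was ever there.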

1 file changed: +15 -3 lines changed


scripts/bash/common.sh

Lines changed: 15 additions & 3 deletions
@@ -171,9 +171,21 @@ json_escape() {
     s="${s//$'\r'/\\r}"
     s="${s//$'\b'/\\b}"
     s="${s//$'\f'/\\f}"
-    # Strip remaining control characters (U+0000–U+001F) not individually escaped above
-    s=$(printf '%s' "$s" | tr -d '\000-\007\013\016-\037')
-    printf '%s' "$s"
+    # Escape any remaining U+0001-U+001F control characters as \uXXXX.
+    # (U+0000/NUL cannot appear in bash strings and is excluded.)
+    # LC_ALL=C ensures ${#s} counts bytes and ${s:$i:1} yields single bytes,
+    # so multi-byte UTF-8 sequences (first byte >= 0xC0) pass through intact.
+    local LC_ALL=C
+    local i char code
+    for (( i=0; i<${#s}; i++ )); do
+        char="${s:$i:1}"
+        printf -v code '%d' "'$char" 2>/dev/null || code=256
+        if (( code >= 1 && code <= 31 )); then
+            printf '\\u%04x' "$code"
+        else
+            printf '%s' "$char"
+        fi
+    done
 }
 
 check_file() { [[ -f "$1" ]] && echo "$2" || echo "$2"; }
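
For readers who want to try the new loop in isolation, the following is a minimal stand-alone sketch of it. The function name `escape_control_bytes` is hypothetical, and unlike `json_escape` it does not first handle `"`, `\`, `\n`, `\t`, `\r`, `\b`, or `\f`, so a tab comes out as `\u0009` here rather than `\t` (both are valid JSON escapes).

```shell
#!/usr/bin/env bash
# escape_control_bytes: hypothetical stand-alone copy of the loop in the diff.
escape_control_bytes() {
    local s="$1"
    local LC_ALL=C            # make ${#s} and ${s:$i:1} operate on bytes
    local i char code
    for (( i=0; i<${#s}; i++ )); do
        char="${s:$i:1}"
        # A leading ' makes printf yield the numeric value of the character;
        # printf -v stores it in "code" without a subshell. 256 is a fallback
        # that lies outside the control range.
        printf -v code '%d' "'$char" 2>/dev/null || code=256
        if (( code >= 1 && code <= 31 )); then
            printf '\\u%04x' "$code"   # e.g. byte 0x1f becomes \u001f
        else
            printf '%s' "$char"        # everything else passes through
        fi
    done
}

escape_control_bytes $'a\001b\037c'; echo   # a\u0001b\u001fc
escape_control_bytes $'caf\xc3\xa9'; echo   # UTF-8 bytes of "café" unchanged
```

Because each byte is written straight to stdout, the sketch keeps the commit's choice of avoiding string accumulation; callers that need the result in a variable can capture it with command substitution.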
