Commit 03377a4

fix(scripts): encode residual control chars as \uXXXX instead of stripping
json_escape() was silently deleting control characters (U+0000-U+001F) that were not individually handled (\n, \t, \r, \b, \f). Per RFC 8259, these must be encoded as \uXXXX sequences to preserve data integrity. Replace the tr -d strip with a char-by-char loop that emits proper \uXXXX escapes for any remaining control characters.
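RFC 8259 gives only five control characters a two-character short escape (\b, \t, \n, \f, \r). Every other code point from U+0000 to U+001F can only be written as \u followed by four hex digits. The sketch below shows that mapping for a single code. It assumes bash, and `c0_escape` is a hypothetical helper for illustration, not a function in common.sh.

```bash
#!/usr/bin/env bash
# Sketch: JSON escape for one C0 control code (0-31), following RFC 8259.
# c0_escape is a hypothetical helper for illustration, not part of common.sh.
c0_escape() {
    local code=$1
    case $code in
        8)  printf '%s' '\b' ;;              # backspace
        9)  printf '%s' '\t' ;;              # tab
        10) printf '%s' '\n' ;;              # line feed
        12) printf '%s' '\f' ;;              # form feed
        13) printf '%s' '\r' ;;              # carriage return
        *)  printf '\\u%04x' "$code" ;;      # no short form: \u + 4 hex digits
    esac
}

# Print the escape chosen for a few sample codes, one per line.
for code in 0 1 9 11 27 31; do
    printf '%2d -> %s\n' "$code" "$(c0_escape "$code")"
done
```

Codes such as 11 (vertical tab) and 27 (escape) fall into the last branch; these are exactly the characters the old `tr -d` call deleted.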
1 parent 9c0c144 commit 03377a4

1 file changed (15 additions, 3 deletions)

scripts/bash/common.sh

```diff
@@ -171,9 +171,21 @@ json_escape() {
     s="${s//$'\r'/\\r}"
     s="${s//$'\b'/\\b}"
     s="${s//$'\f'/\\f}"
-    # Strip remaining control characters (U+0000–U+001F) not individually escaped above
-    s=$(printf '%s' "$s" | tr -d '\000-\007\013\016-\037')
-    printf '%s' "$s"
+    # Escape any remaining U+0000-U+001F control characters as \uXXXX.
+    # Only single-byte characters can be JSON control chars; multi-byte UTF-8
+    # sequences have first-byte values >= 0xC0 and are never control characters.
+    local i char code
+    local out=""
+    for (( i=0; i<${#s}; i++ )); do
+        char="${s:$i:1}"
+        code=$(LC_ALL=C printf '%d' "'$char" 2>/dev/null || echo 256)
+        if (( code >= 0 && code <= 31 )); then
+            out+=$(printf '\\u%04x' "$code")
+        else
+            out+="$char"
+        fi
+    done
+    printf '%s' "$out"
 }
 
 check_file() { [[ -f "$1" ]] && echo "$2" || echo "$2"; }
```
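The loop added here calls printf inside a command substitution for every character, so bash forks at least one subshell per input character, which can be slow on long strings. One possible refinement, not part of this commit, is bash's `printf -v`, which stores the result in a variable without forking. The sketch assumes bash 3.1 or later (for `printf -v` and `+=`). `encode_controls` is a hypothetical name, and it covers only the step this commit adds, not the rest of json_escape(). It drops the `LC_ALL=C` prefix because a non-ASCII character yields a value of 128 or more in either a UTF-8 or the C locale, so it never lands in the 0-31 range. Also note that bash variables cannot hold NUL bytes, so U+0000 never actually reaches this loop.

```bash
#!/usr/bin/env bash
# Sketch: the residual-control-character pass without command substitution.
# encode_controls is a hypothetical name; it is not part of common.sh.
encode_controls() {
    local s="$1" out="" i char code
    for (( i = 0; i < ${#s}; i++ )); do
        char="${s:i:1}"
        # A leading quote makes printf output the character's numeric value;
        # -v stores it in $code instead of forking a subshell.
        printf -v code '%d' "'$char"
        if (( code <= 31 )); then
            # Replace the control character with its \u00XX escape.
            printf -v char '\\u%04x' "$code"
        fi
        out+="$char"
    done
    printf '%s' "$out"
}

encode_controls $'id\x01\x1f42'; echo   # prints id\u0001\u001f42
encode_controls 'café'; echo            # prints café unchanged
```

Inside json_escape() this step runs after the short escapes are applied, so \t, \n and the rest are already gone when the loop sees them. Called on its own, as here, it would also turn a literal tab into \u0009.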
