
reject unpaired high surrogate in wchar_t string formatting #2101

Open
nabhan06 wants to merge 1 commit into abseil:master from nabhan06:wide-string-surrogate-utf8

nabhan06 commented Jul 4, 2026

The wide-string overload of ConvertStringArg in str_format passes each wchar_t through WideToUtf8 but never inspects the ShiftState once the loop finishes. If the wide string ends on an unpaired UTF-16 high surrogate, only the first two bytes of the 4-byte UTF-8 sequence are written, yet StrFormat("%ls", ...) still reports success and emits truncated, invalid UTF-8.

The single-character path in ConvertWCharTImpl already rejects a lone high surrogate. This change adds the same saw_high_surrogate check after the loop so that the string path fails in the same way.

The regression test in convert_test.cc covers both a trailing high surrogate and a valid non-BMP code point, confirming that surrogate pairs still convert correctly.
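
To make the failure mode concrete, here is a minimal standalone C++17 sketch. It does not use Abseil: ShiftState, WideToUtf8Sketch, WideStringToUtf8Sketch, and kConvError are hypothetical stand-ins modeled on the behavior described above (a high surrogate emits the first two bytes of a 4-byte sequence, and the following low surrogate emits the last two). They are not Abseil's actual implementation or signatures. The post-loop saw_high_surrogate test in WideStringToUtf8Sketch is the analogue of the check this PR adds. As a choice of the sketch only, a high surrogate followed by anything other than a low surrogate is also rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical conversion state. The field name saw_high_surrogate follows
// the PR text; this struct is an illustration, not Abseil's ShiftState.
struct ShiftState {
  bool saw_high_surrogate = false;
  uint8_t bits = 0;  // low 2 payload bits of the pending high surrogate
};

constexpr size_t kConvError = static_cast<size_t>(-1);

// Hypothetical per-unit encoder. Writes at most 4 bytes to buf and returns
// the byte count, or kConvError for input it cannot encode.
size_t WideToUtf8Sketch(wchar_t wc, char* buf, ShiftState& s) {
  const uint32_t v = static_cast<uint32_t>(wc);
  auto* out = reinterpret_cast<unsigned char*>(buf);
  const bool is_high = v >= 0xD800 && v <= 0xDBFF;
  const bool is_low = v >= 0xDC00 && v <= 0xDFFF;

  // Sketch choice: a pending high surrogate must be followed by a low one,
  // and a low surrogate is only valid right after a high one.
  if (s.saw_high_surrogate && !is_low) return kConvError;
  if (is_low && !s.saw_high_surrogate) return kConvError;

  if (is_low) {
    // Finish the 4-byte sequence started by the high surrogate.
    out[0] = static_cast<unsigned char>(0x80 | (s.bits << 4) | ((v >> 6) & 0xF));
    out[1] = static_cast<unsigned char>(0x80 | (v & 0x3F));
    s = ShiftState{};
    return 2;
  }
  if (is_high) {
    // Emit only the lead byte and first continuation byte for now.
    const uint32_t plane = ((v >> 6) & 0xF) + 1;  // supplementary plane 1..16
    out[0] = static_cast<unsigned char>(0xF0 | (plane >> 2));
    out[1] = static_cast<unsigned char>(0x80 | ((plane & 0x3) << 4) | ((v >> 2) & 0xF));
    s.saw_high_surrogate = true;
    s.bits = static_cast<uint8_t>(v & 0x3);
    return 2;
  }
  if (v < 0x80) {
    out[0] = static_cast<unsigned char>(v);
    return 1;
  }
  if (v < 0x800) {
    out[0] = static_cast<unsigned char>(0xC0 | (v >> 6));
    out[1] = static_cast<unsigned char>(0x80 | (v & 0x3F));
    return 2;
  }
  if (v < 0x10000) {
    out[0] = static_cast<unsigned char>(0xE0 | (v >> 12));
    out[1] = static_cast<unsigned char>(0x80 | ((v >> 6) & 0x3F));
    out[2] = static_cast<unsigned char>(0x80 | (v & 0x3F));
    return 3;
  }
  if (v <= 0x10FFFF) {  // only reachable with a 32-bit wchar_t
    out[0] = static_cast<unsigned char>(0xF0 | (v >> 18));
    out[1] = static_cast<unsigned char>(0x80 | ((v >> 12) & 0x3F));
    out[2] = static_cast<unsigned char>(0x80 | ((v >> 6) & 0x3F));
    out[3] = static_cast<unsigned char>(0x80 | (v & 0x3F));
    return 4;
  }
  return kConvError;
}

// Hypothetical whole-string conversion. Returns std::nullopt on invalid input.
std::optional<std::string> WideStringToUtf8Sketch(const std::wstring& ws) {
  std::string result;
  ShiftState state;
  char buf[4];
  for (wchar_t wc : ws) {
    const size_t n = WideToUtf8Sketch(wc, buf, state);
    if (n == kConvError) return std::nullopt;
    result.append(buf, n);
  }
  // Post-loop check: without it, a trailing high surrogate leaves two bytes
  // of an unfinished 4-byte sequence in `result` and the call "succeeds".
  if (state.saw_high_surrogate) return std::nullopt;
  return result;
}
```

With a 16-bit wchar_t (as on Windows), surrogate pairs are how non-BMP characters are stored, so the trailing-surrogate case can arise from ordinary string truncation. With a 32-bit wchar_t, it only arises if the string contains raw surrogate values.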