This repository was archived by the owner on May 27, 2026. It is now read-only.

Fix UTF8FirstLetterNumBytes to handle malformed UTF-8 correctly #1379

Merged
rodaine merged 1 commit into bufbuild:main from kodareef5:fix-utf8len-malformed
May 27, 2026


@kodareef5
Contributor

UTF8FirstLetterNumBytes in validate/validate.h returns the byte count from OneCharLen without validating that the expected continuation bytes actually follow the leader byte. Malformed UTF-8 causes Utf8Len to undercount characters by 2-4x, bypassing string length validation constraints (min_len, max_len, len).

Example: 20 bytes of \xC0 (bare 2-byte leaders, no continuations) produces Utf8Len=10 instead of 20. A field with max_len = 10 incorrectly accepts this input.
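
To make the arithmetic concrete, here is a minimal sketch of leader-only counting. It is not the code in validate/validate.h; `LeaderLen` and `NaiveUtf8Len` are hypothetical names, and the length rule is an assumption based on the usual high-bits convention for UTF-8 leader bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a leader-byte lookup: the sequence length is
// derived from the first byte alone.
inline std::size_t LeaderLen(unsigned char b) {
  if (b < 0xC0) return 1;  // ASCII, or a stray continuation byte
  if (b < 0xE0) return 2;  // 110xxxxx
  if (b < 0xF0) return 3;  // 1110xxxx
  return 4;                // 1111xxxx
}

// Naive count: trusts the leader byte and never inspects the bytes it skips.
inline std::size_t NaiveUtf8Len(const std::string& s) {
  std::size_t chars = 0;
  std::size_t i = 0;
  while (i < s.size()) {
    i += LeaderLen(static_cast<unsigned char>(s[i]));
    ++chars;
  }
  return chars;
}
```

Under this sketch, `NaiveUtf8Len(std::string(20, '\xC0'))` advances two bytes per step and reports 10, which is the undercount described above.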

This is exploitable when C++ protobuf deserialization doesn't enforce UTF-8 validity (the default), allowing malformed strings to reach pgv validation.

Fix:

  • Clamp consumed bytes to remaining buffer length (prevents reading past end)
  • Validate continuation bytes have the 10xxxxxx pattern
  • Return 1 for any invalid byte sequence (count as single character)

Valid UTF-8 counting is unchanged: "hello"=5, "café"=4, "你好"=2, "😀😀"=2.
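
The three points of the fix can be sketched roughly as follows. This illustrates the approach and is not the merged patch: `ExpectedLen`, `CheckedCharLen`, and `CheckedUtf8Len` are hypothetical names, and the sketch only checks the 10xxxxxx continuation pattern (it does not reject overlong encodings or surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: sequence length implied by the leader byte alone.
inline std::size_t ExpectedLen(unsigned char b) {
  if (b < 0xC0) return 1;
  if (b < 0xE0) return 2;
  if (b < 0xF0) return 3;
  return 4;
}

// Bytes consumed by the first character in [p, p + remaining).
inline std::size_t CheckedCharLen(const char* p, std::size_t remaining) {
  if (remaining == 0) return 0;
  std::size_t n = ExpectedLen(static_cast<unsigned char>(p[0]));
  if (n > remaining) n = remaining;  // clamp: never read past the buffer
  for (std::size_t i = 1; i < n; ++i) {
    // Each continuation byte must match 10xxxxxx.
    if ((static_cast<unsigned char>(p[i]) & 0xC0) != 0x80) {
      return 1;  // invalid sequence: count the leader as one character
    }
  }
  return n;
}

// Character count that advances by the validated length of each sequence.
inline std::size_t CheckedUtf8Len(const std::string& s) {
  std::size_t chars = 0;
  std::size_t i = 0;
  while (i < s.size()) {
    i += CheckedCharLen(s.data() + i, s.size() - i);
    ++chars;
  }
  return chars;
}
```

With this sketch, 20 bytes of \xC0 count as 20 characters, while well-formed input such as "hello" or "café" keeps the counts listed above.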

UTF8FirstLetterNumBytes returns the byte count from OneCharLen
without validating that the expected continuation bytes actually
follow the leader byte. Malformed UTF-8 (e.g., bare leader bytes
without continuations) causes Utf8Len to undercount characters
by 2-4x, bypassing string length validation constraints.

For example, 20 bytes of 0xC0 (invalid 2-byte leaders) produces
Utf8Len=10 instead of 20, allowing a max_len=10 constraint to
accept 20 bytes of data.

Fix:
- Clamp consumed bytes to remaining buffer length
- Validate continuation bytes have the 10xxxxxx pattern
- Return 1 for any invalid byte (count as single character)

Valid UTF-8 counting is unchanged.
@CLAassistant

CLAassistant commented Mar 22, 2026

CLA assistant check
All committers have signed the CLA.

@rodaine rodaine closed this May 27, 2026
@rodaine rodaine reopened this May 27, 2026
Member

@rodaine rodaine left a comment

Thanks for the patch!

@rodaine rodaine merged commit 7dfcb22 into bufbuild:main May 27, 2026
6 of 7 checks passed

3 participants