fix: MIME Decoding corrupts non-ASCII characters in Base64-encoded words #2291

Merged
GCHQDeveloper581 merged 2 commits into gchq:master from williballenthin:fix-2280 on Jun 20, 2026
@williballenthin

Contributor

fromBase64() returns a UTF-8 decoded string by default. That string is then passed to codepage.utils.decode(), which treats each char code as a raw byte.
For multi-byte UTF-8 characters, this double decoding produces garbage (e.g. "café" becomes "caf退").

Pass returnType="byteArray" so codepage receives raw bytes and performs the single correct UTF-8 decode.
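
The double decode described above can be reproduced outside CyberChef. The following is a minimal Node.js sketch, not the project's code: `decodeViaString` and `decodeViaBytes` are hypothetical helpers, and `Buffer` and `TextDecoder` stand in for fromBase64() and codepage.utils.decode(). Because `TextDecoder` handles malformed input differently from codepage, the corrupted output here is not necessarily "caf退", but the cause is the same.

```javascript
// "café" as UTF-8 is 63 61 66 C3 A9; as Base64 that is "Y2Fmw6k=".
const encoded = Buffer.from("café", "utf8").toString("base64");

// Buggy path (hypothetical helper): Base64 -> UTF-8 string, then each
// char code is treated as a raw byte and decoded as UTF-8 a second time.
function decodeViaString(b64) {
    const str = Buffer.from(b64, "base64").toString("utf8"); // already "café"
    const bytes = Uint8Array.from(str, ch => ch.charCodeAt(0) & 0xff);
    // "é" is now the single byte 0xE9, which is not valid UTF-8 on its own.
    return new TextDecoder("utf-8").decode(bytes);
}

// Fixed path (hypothetical helper): keep the raw bytes and decode once.
function decodeViaBytes(b64) {
    const bytes = Buffer.from(b64, "base64");
    return new TextDecoder("utf-8").decode(bytes);
}

const broken = decodeViaString(encoded);
const fixed = decodeViaBytes(encoded);
console.log(JSON.stringify(broken), JSON.stringify(fixed));
```

Pure ASCII input survives both paths unchanged, so the difference only shows up for characters encoded as more than one byte.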

Closes #2280

AI disclosure
Claude Code Opus 4.6

@GCHQDeveloper581 left a comment

Contributor

Looks good!

Thanks for your contribution.

@GCHQDeveloper581 merged commit 64fc664 into gchq:master on Jun 20, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

Bug report: MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words

2 participants