Bug report: MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words #2280

@williballenthin

Describe the bug
The MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words. café becomes caf退.

src/core/operations/MIMEDecoding.mjs, inside the =?charset?B?...?= handling branch:

text = (0, _Base.fromBase64)(text);
// ...
return _codepage.default.utils.decode(65001, encodedText);

fromBase64 defaults its returnType parameter to "string"; in that mode it calls byteArrayToUtf8 on the decoded bytes before returning. The result is an already UTF-8-decoded string (e.g. "café", 4 characters). This string is then passed to codepage.utils.decode(65001, ...), which splits it into characters and maps each via charCodeAt(0), treating each code point as a raw byte value. For a multi-byte UTF-8 character, the single code point does not match the original byte sequence, so the second UTF-8 decode produces garbage.

Concretely for =?UTF-8?B?Y2Fmw6k=?=:

  1. Base64 decodes to bytes [99, 97, 102, 195, 169] (UTF-8 for "café")
  2. fromBase64 returns the string "café" (5 bytes → 4 chars)
  3. codepage.decode(65001, "café") splits into char codes [99, 97, 102, 233]
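  4. 233 (0xE9) is treated as the lead byte of a 3-byte UTF-8 sequence → no continuation bytes follow it → the output is U+9000 (退), which is consistent with only the lead byte's low four bits contributing: (0xE9 & 0x0F) << 12 = 0x9000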
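
The four steps above can be replayed in a standalone sketch. It does not import CyberChef: the built-in atob and TextDecoder stand in for fromBase64 and byteArrayToUtf8, and the last step only reproduces the arithmetic, on the assumption that the missing continuation bytes contribute zero bits (which matches the observed U+9000).

```javascript
// Standalone sketch of the double decode; not CyberChef code.
// Assumes a runtime with global atob and TextDecoder (browser or Node 16+).
const encodedWord = "Y2Fmw6k=";

// Step 1: Base64 -> raw bytes (UTF-8 encoding of "café").
const rawBytes = Array.from(atob(encodedWord), (ch) => ch.charCodeAt(0));

// Step 2: first UTF-8 decode, as fromBase64 does when returnType is "string".
const decodedOnce = new TextDecoder("utf-8").decode(Uint8Array.from(rawBytes));

// Step 3: the string path maps each character back to a "byte" via charCodeAt(0).
const charCodes = Array.from(decodedOnce, (ch) => ch.charCodeAt(0));

// Step 4: 0xE9 looks like a 3-byte lead byte (1110xxxx). With no continuation
// bytes (assumed to contribute zero bits), only its low four bits survive.
const garbledCodePoint = (0xE9 & 0x0F) << 12;
const garbled = String.fromCharCode(garbledCodePoint);

console.log(rawBytes);     // 5 bytes
console.log(charCodes);    // 4 values; 195,169 have collapsed into 233
console.log(garbledCodePoint.toString(16), garbled);
```

The key observation is that rawBytes and charCodes differ in both length and content, so any decoder fed charCodes is no longer looking at the original UTF-8 data.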

To Reproduce
https://gchq.github.io/CyberChef/#recipe=MIME_Decoding()&input=U3ViamVjdDogPT9VVEYtOD9CP1kyRm13Nms9Pz0

Expected: Subject: café. Actual: Subject: caf退.

Additional context
Suggested fix — pass "byteArray" as the returnType so fromBase64 returns raw bytes instead of a decoded string:

- text = (0, _Base.fromBase64)(text);
+ text = (0, _Base.fromBase64)(text, undefined, "byteArray");

This makes codepage.decode(65001, ...) receive the raw decoded bytes (a byte array rather than a string), bypassing the charCodeAt string path. The UTF-8 decoding then happens exactly once.
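
As a rough illustration of the decode-once principle behind this fix, the sketch below keeps the Base64 output as bytes and runs a single UTF-8 decode. It is not the patched CyberChef code: atob and TextDecoder are built-in stand-ins for fromBase64(text, undefined, "byteArray") and codepage.decode(65001, ...), and the variable names are illustrative only.

```javascript
// Standalone sketch of "decode exactly once"; not CyberChef code.
// Assumes a runtime with global atob and TextDecoder (browser or Node 16+).
const fixedInput = "Y2Fmw6k=";

// Keep the Base64 result as raw bytes instead of a decoded string.
const fixedBytes = Uint8Array.from(atob(fixedInput), (ch) => ch.charCodeAt(0));

// Single UTF-8 decode over the original byte sequence.
const fixedText = new TextDecoder("utf-8").decode(fixedBytes);

console.log(fixedBytes.length, fixedText);
```

Because the decoder sees the original five bytes, the two-byte sequence 195, 169 is interpreted together as é rather than being collapsed to 233 first.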
