fix(mail): improve Japanese charset decoding reliability on Android #10883

Open
kaberingo wants to merge 1 commit into thunderbird:main from kaberingo:fix/japanese-charset-decoding

@kaberingo

Summary

This PR improves Japanese email rendering on Android by fixing several
charset handling issues in CharsetSupport and JisSupport.

Changes

CharsetSupport

  • Normalize common Shift-JIS aliases (shift-jis, sjis, ms932,
    windows-31j, x-sjis, x-ms-cp932) to shift_jis
  • Add EUC-JP fallback aliases (x-euc-jp, euc_jp) to the charset
    fallback map
  • Always route iso-2022-jp through Iso2022JpToShiftJisInputStream
    instead of relying on Android's ICU4J decoder, which silently
    mishandles QP-decoded byte sequences and causes ESC bytes or $B/(B
    escape remnants to appear as literal text
  • Auto-detect ISO-2022-JP when Content-Type has no charset parameter
    by scanning for ESC$B / ESC$@ escape sequences. Japanese feature
    phones and carrier webmail systems commonly omit the charset
    parameter, which causes garbled output (e.g. $BJIC...) when the body
    is decoded as US-ASCII
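
The alias normalization and escape-sequence scan described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `CharsetSketch`, the single combined alias table, the `euc-jp` target name, and both method signatures are assumptions. Only the alias names and the `ESC $ B` / `ESC $ @` sequences come from the description.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the actual CharsetSupport implementation.
final class CharsetSketch {
    // Assumption: one lookup table for both alias groups. The PR describes
    // Shift-JIS normalization and the EUC-JP fallback map as separate steps.
    private static final Map<String, String> ALIASES = new HashMap<>();

    static {
        for (String alias : new String[] {
                "shift-jis", "sjis", "ms932", "windows-31j", "x-sjis", "x-ms-cp932"}) {
            ALIASES.put(alias, "shift_jis");
        }
        for (String alias : new String[] {"x-euc-jp", "euc_jp"}) {
            ALIASES.put(alias, "euc-jp"); // assumed canonical name
        }
    }

    // Lower-cases and trims the charset name, then maps known aliases.
    // Unknown names are returned lower-cased and otherwise unchanged.
    static String normalizeCharset(String charset) {
        if (charset == null) {
            return null;
        }
        String lower = charset.trim().toLowerCase(Locale.ROOT);
        String canonical = ALIASES.get(lower);
        return canonical != null ? canonical : lower;
    }

    // Returns true if the bytes contain ESC $ B or ESC $ @, the ISO-2022-JP
    // sequences that switch to a two-byte JIS character set.
    static boolean hasIso2022JpEscapeSequence(byte[] data) {
        for (int i = 0; i + 2 < data.length; i++) {
            boolean escDollar = data[i] == 0x1B && data[i + 1] == '$';
            if (escDollar && (data[i + 2] == 'B' || data[i + 2] == '@')) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch a literal `$B` without a preceding ESC byte is not treated as ISO-2022-JP, so plain ASCII text that happens to contain those characters is left alone.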

JisSupport

  • Implement getAddressFromReceivedHeader() (previously a no-op stub)
    to parse the recipient address from the "for" clause of Received
    headers, enabling proper JIS variant detection for DoCoMo,
    SoftBank, and KDDI carrier mail
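
For illustration, extracting the address from the "for" clause of a Received header could look something like the sketch below. The class name, method name, and regular expression are all hypothetical, and the real getAddressFromReceivedHeader() in JisSupport may be structured differently. The sketch handles the angle-bracket form and the bare form and returns null when no "for" clause is present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual JisSupport implementation.
final class ReceivedHeaderSketch {
    // Matches "for <user@host>" or "for user@host". The address ends at
    // whitespace, '>' or ';' (the ';' introduces the date-time part).
    private static final Pattern FOR_CLAUSE = Pattern.compile(
            "\\bfor\\s+<?([^\\s<>;@]+@[^\\s<>;]+)>?",
            Pattern.CASE_INSENSITIVE);

    // Returns the address from the "for" clause, or null if there is none.
    static String addressFromReceivedHeader(String received) {
        if (received == null) {
            return null;
        }
        Matcher matcher = FOR_CLAUSE.matcher(received);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

Because `\s+` also matches line breaks, the pattern tolerates folded headers where the address sits on a continuation line after "for".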

Testing

Unit tests added/updated for all new logic:

  • CharsetSupportTest: Shift-JIS alias normalization, EUC-JP alias
    fallback, hasIso2022JpEscapeSequence() edge cases
  • MessageExtractorTest: QP-encoded ISO-2022-JP with/without charset
    header, multi-line bodies, raw 7-bit bodies
  • JisSupportTest (new): carrier address detection via From and
    Received headers, iPhone mailer detection

- Normalize Shift-JIS aliases (shift-jis, sjis, ms932, windows-31j, x-sjis,
  x-ms-cp932) to shift_jis for consistent handling
- Add EUC-JP fallback aliases (x-euc-jp, euc_jp) to the charset fallback map
- Always use Iso2022JpToShiftJisInputStream for iso-2022-jp decoding to bypass
  Android ICU4J's unreliable QP-decoded byte sequence handling, which could
  cause ESC bytes and "$B"/"(B" escape remnants to appear as literal text
- Auto-detect ISO-2022-JP when Content-Type has no charset parameter by
  scanning for ESC$B / ESC$@ escape sequences, fixing garbled output from
  Japanese feature phones and carrier webmail that omit the charset header
- Implement getAddressFromReceivedHeader() in JisSupport to properly extract
  addresses from the "for" clause of Received headers (both angle-bracket
  and bare address forms), enabling correct JIS variant detection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Missing report label. Set exactly one of: report: include, report: exclude OR report: highlight.
