fix(textkit): skip hyphen insertion for CJK characters in line breaking #3315

Open
matangot wants to merge 1 commit into diegomura:master from matangot:fix/cjk-line-break-hyphenation

@matangot matangot commented Mar 26, 2026

Summary

CJK (Chinese/Japanese) text is incorrectly hyphenated with `-` characters at line breaks. This happens because breakLines unconditionally inserts a hyphen glyph when breaking at a penalty node. That is correct for Latin scripts but wrong for CJK text, where characters wrap naturally without hyphens.

Before

image

After

image

The Problem

CJK text has no spaces between characters, so the layout engine treats an entire text block as a single "word". When a user's hyphenationCallback splits that word into individual characters (the correct approach for CJK), breakLines inserts a `-` hyphen at every line break, because it assumes every penalty-node break is Latin-style hyphenation.
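To make the trigger concrete, here is a minimal sketch of the kind of per-character callback described above. The name `splitCjkWord` is hypothetical and not part of this PR. It reuses the character range from this PR to decide what counts as CJK. In react-pdf such a function would typically be registered with `Font.registerHyphenationCallback`, which is not shown here.

```javascript
// Same range as the CJK_RANGE introduced in this PR.
const CJK_CHAR = /[\u3000-\u9fff\uf900-\ufaff\uff00-\uffef]/;

// Hypothetical userland callback: returns the parts of a "word".
// Each boundary between parts is a candidate break for the layout engine.
function splitCjkWord(word) {
  const parts = [];
  let otherRun = ''; // accumulates consecutive non-CJK characters
  for (const ch of word) {
    if (CJK_CHAR.test(ch)) {
      if (otherRun) {
        parts.push(otherRun);
        otherRun = '';
      }
      parts.push(ch); // every CJK character is its own part
    } else {
      otherRun += ch;
    }
  }
  if (otherRun) parts.push(otherRun);
  return parts;
}

console.log(splitCjkWord('日本語'));  // [ '日', '本', '語' ]
console.log(splitCjkWord('PDF出力')); // [ 'PDF', '出', '力' ]
```

With a callback like this, every boundary between CJK characters becomes a break opportunity, which is where the unwanted hyphen shows up.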

The Fix

In breakLines, check whether the character at the break point is in the CJK Unicode range. If so, skip the hyphen glyph insertion.

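// Character immediately before the break position
const charAtBreak = attributedString.string.charAt(end - 1);
// Append the hyphen glyph only when the break does not follow a CJK character
if (!CJK_RANGE.test(charAtBreak)) {
  line = insertGlyph(line.string.length, HYPHEN, line);
}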

Where CJK_RANGE = /[\u3000-\u9fff\uf900-\ufaff\uff00-\uffef]/

This covers:

  • Japanese: Hiragana, Katakana, Kanji
  • Chinese: CJK Unified Ideographs
  • CJK punctuation and fullwidth forms

Korean (Hangul syllables, U+AC00-U+D7AF) is excluded because Korean separates words with spaces and doesn't hit this code path in practice.
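As a quick sanity check of the range, the sketch below applies the same test to a few sample strings. `needsHyphen` is a hypothetical helper written for illustration only. It mirrors the condition in the snippet above but is not the code in breakLines, and `end` is assumed to be the exclusive end index of the line.

```javascript
const CJK_RANGE = /[\u3000-\u9fff\uf900-\ufaff\uff00-\uffef]/;

// Hypothetical helper: true when a hyphen glyph would still be inserted
// for a line ending just before index `end`.
function needsHyphen(text, end) {
  return !CJK_RANGE.test(text.charAt(end - 1));
}

console.log(needsHyphen('hyphenation', 6));      // true:  'n' is Latin
console.log(needsHyphen('日本語のテキスト', 3)); // false: '語' is a CJK ideograph
console.log(needsHyphen('こんにちは。', 6));     // false: '。' is CJK punctuation
console.log(needsHyphen('한국어', 2));           // true:  Hangul syllables are outside the range
```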

Notes

A complete CJK line-breaking solution would also implement kinsoku (禁則処理) rules in the layout engine, for example preventing certain punctuation from starting a new line. That is currently left to userland via hyphenationCallback, but could be a follow-up enhancement.
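For reference, a userland approximation of one kinsoku rule could look like the sketch below. Everything here is an assumption for illustration: `splitWithKinsoku` is a hypothetical callback, the `NO_LINE_START` set is a small, incomplete sample of characters, and the input is assumed to be CJK-only text.

```javascript
// Assumption: a partial sample of characters that should not begin a line.
const NO_LINE_START = new Set(['、', '。', '，', '．', '）', '」', '』', '！', '？']);

// Hypothetical callback: one part per character, except that a character
// from NO_LINE_START is glued onto the previous part, so no break
// opportunity exists directly before it.
function splitWithKinsoku(word) {
  const parts = [];
  for (const ch of word) {
    if (NO_LINE_START.has(ch) && parts.length > 0) {
      parts[parts.length - 1] += ch;
    } else {
      parts.push(ch);
    }
  }
  return parts;
}

console.log(splitWithKinsoku('はい。次')); // [ 'は', 'い。', '次' ]
```

This only covers the line-start rule. Characters that must not end a line (such as opening brackets) would need the mirror-image treatment.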

CJK (Chinese/Japanese) text gets incorrectly hyphenated with `-` dashes
at line breaks. breakLines unconditionally inserts a hyphen glyph when
breaking at a penalty node, which is correct for Latin scripts but wrong
for CJK text where characters wrap naturally without hyphens.

Check whether the character at the break point is in the CJK Unicode
range and skip hyphen insertion if so.

changeset-bot Bot commented Mar 26, 2026

⚠️ No Changeset found

Latest commit: 4dc56f0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
