[codex] Improve Unicode width test coverage by tonyfettes · Pull Request #13 · moonbit-community/unicodewidth.mbt

tonyfettes · 2026-06-01T09:17:15Z

Summary

Add focused regression cases for generated Unicode width state-machine branches.
Cover control-line-break handling, Hebrew ligature-transparent code points, emoji tag edge cases, and Kirat Rai vowel-sign width behavior.
Leave the non-RGI keycap-ZWJ policy case out of this coverage-only PR.

Impact

This is test-only. It does not change the public API or runtime behavior.

Validation

moon fmt
moon info
moon check
moon test
moon coverage analyze -- -f summary -> Total: 358/384

Split out

The transparent zero-width ligature bug fix and its U+0605 regression tests were moved to #14.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84adab0be5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T09:20:05Z

+  assert_str_width("\u{644}\u{605}\u{623}", 1, 1)
+  assert_str_width("\u{6B8}\u{605}\u{627}", 1, 1)


Correct U+0605 lam-alef expectations

For sequences that contain U+0605 between Lam and Alef, this test locks in the ligature width of 1 even though U+0605 is explicitly generated in non_transparent_zero_widths (tables.mbt) from zero_widths - DerivedJoiningType=T (scripts/unicode.py::load_non_transparent_zero_widths). In that context U+0605 should interrupt the Lam-Alef ligature while contributing zero width itself, so these cases should total 2; leaving them at 1 will make a correct fix to the transparent-character handling fail the regression suite.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84adab0be5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T10:47:30Z

  assert_str_width("🧑‍🤝‍🧑", 2, 2)
  assert_str_width("🇮🇱🕊️🇵🇸", 6, 6)
  assert_str_width("🇵🇸\u{200D}🕊️\u{200D}🇮🇱", 2, 2)
+  assert_str_width("#\u{FE0F}\u{20E3}\u{200D}Ⓜ️", 3, 3)


Expect keycap ZWJ sequences to collapse

This assertion locks in the current width of #️⃣‍Ⓜ️ as 3, but the generated state machine already has a dedicated VS16_KEYCAP_ZWJ_EMOJI_PRESENTATION branch for \uFE0F \u20E3 \u200D Emoji_Presentation that should collapse the ASCII keycap base into the ZWJ emoji sequence; it is currently bypassed because # takes the c <= '\u{A0}' early return before that branch. With this expectation in place, making the existing keycap-ZWJ branch reachable for #, *, and digits would fail the regression suite instead of producing the intended width 2.

Useful? React with 👍 / 👎.

tonyfettes marked this pull request as ready for review June 1, 2026 09:17

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

tonyfettes changed the title ~~[codex] Improve Unicode width test coverage~~ [codex] Fix transparent zero-width ligature handling Jun 1, 2026

tonyfettes force-pushed the improve-coverage branch from 6a70db0 to 84adab0 Compare June 1, 2026 10:44

tonyfettes changed the title ~~[codex] Fix transparent zero-width ligature handling~~ [codex] Improve Unicode width test coverage Jun 1, 2026

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

tonyfettes force-pushed the improve-coverage branch 2 times, most recently from 35833e7 to 916ac29 Compare June 1, 2026 11:29

test: cover Unicode width edge cases

3e71662

tonyfettes force-pushed the improve-coverage branch from 916ac29 to 3e71662 Compare June 1, 2026 11:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] Improve Unicode width test coverage#13

[codex] Improve Unicode width test coverage#13
tonyfettes wants to merge 1 commit into
masterfrom
improve-coverage

tonyfettes commented Jun 1, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		assert_str_width("\u{644}\u{605}\u{623}", 1, 1)
		assert_str_width("\u{6B8}\u{605}\u{627}", 1, 1)

Conversation

tonyfettes commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Impact

Validation

Split out

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tonyfettes commented Jun 1, 2026 •

edited

Loading