Skip to content

[codex] Improve Unicode width test coverage#13

Open
tonyfettes wants to merge 1 commit into
masterfrom
improve-coverage
Open

[codex] Improve Unicode width test coverage#13
tonyfettes wants to merge 1 commit into
masterfrom
improve-coverage

Conversation

@tonyfettes
Copy link
Copy Markdown
Contributor

@tonyfettes tonyfettes commented Jun 1, 2026

Summary

  • Add focused regression cases for generated Unicode width state-machine branches.
  • Cover control-line-break handling, Hebrew ligature-transparent code points, emoji tag edge cases, and Kirat Rai vowel-sign width behavior.
  • Leave the non-RGI keycap-ZWJ policy case out of this coverage-only PR.

Impact

This is test-only. It does not change the public API or runtime behavior.

Validation

  • moon fmt
  • moon info
  • moon check
  • moon test
  • moon coverage analyze -- -f summary -> Total: 358/384

Split out

The transparent zero-width ligature bug fix and its U+0605 regression tests were moved to #14.

@tonyfettes tonyfettes marked this pull request as ready for review June 1, 2026 09:17
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84adab0be5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unicodewidth_test.mbt Outdated
Comment on lines +325 to +326
assert_str_width("\u{644}\u{605}\u{623}", 1, 1)
assert_str_width("\u{6B8}\u{605}\u{627}", 1, 1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Correct U+0605 lam-alef expectations

For sequences that contain U+0605 between Lam and Alef, this test locks in the ligature width of 1 even though U+0605 is explicitly generated in non_transparent_zero_widths (tables.mbt) from zero_widths - DerivedJoiningType=T (scripts/unicode.py::load_non_transparent_zero_widths). In that context U+0605 should interrupt the Lam-Alef ligature while contributing zero width itself, so these cases should total 2; leaving them at 1 will make a correct fix to the transparent-character handling fail the regression suite.

Useful? React with 👍 / 👎.

@tonyfettes tonyfettes changed the title [codex] Improve Unicode width test coverage [codex] Fix transparent zero-width ligature handling Jun 1, 2026
@tonyfettes tonyfettes changed the title [codex] Fix transparent zero-width ligature handling [codex] Improve Unicode width test coverage Jun 1, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84adab0be5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unicodewidth_test.mbt Outdated
assert_str_width("🧑‍🤝‍🧑", 2, 2)
assert_str_width("🇮🇱🕊️🇵🇸", 6, 6)
assert_str_width("🇵🇸\u{200D}🕊️\u{200D}🇮🇱", 2, 2)
assert_str_width("#\u{FE0F}\u{20E3}\u{200D}Ⓜ️", 3, 3)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expect keycap ZWJ sequences to collapse

This assertion locks in the current width of #️⃣‍Ⓜ️ as 3, but the generated state machine already has a dedicated VS16_KEYCAP_ZWJ_EMOJI_PRESENTATION branch for \uFE0F \u20E3 \u200D Emoji_Presentation that should collapse the ASCII keycap base into the ZWJ emoji sequence; it is currently bypassed because # takes the c <= '\u{A0}' early return before that branch. With this expectation in place, making the existing keycap-ZWJ branch reachable for #, *, and digits would fail the regression suite instead of producing the intended width 2.

Useful? React with 👍 / 👎.

@tonyfettes tonyfettes force-pushed the improve-coverage branch 2 times, most recently from 35833e7 to 916ac29 Compare June 1, 2026 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant