Skip to content

feat: render w:noBreakHyphen tags (SD-2746)#3168

Open
luccas-harbour wants to merge 9 commits intomainfrom
luccas/sd-2746-feature-render-wnobreakhyphen-tags-as-visible-non-breaking
Open

feat: render w:noBreakHyphen tags (SD-2746)#3168
luccas-harbour wants to merge 9 commits intomainfrom
luccas/sd-2746-feature-render-wnobreakhyphen-tags-as-visible-non-breaking

Conversation

@luccas-harbour
Copy link
Copy Markdown
Contributor

Summary

Adds full support for OOXML <w:noBreakHyphen/> tags so non-breaking hyphens are preserved end-to-end (import β†’ schema β†’ adapter β†’ render β†’ export) and behave like the visible U+2011 character users expect β€” including in search, get-text, diffing, and the document-api text.rewrite operation.

The atom is selectable, deletable from either side via Backspace/Delete, and round-trips back to <w:noBreakHyphen/> on export.

Changes

Schema & extension

  • New noBreakHyphen PM node (atom, inline, non-selectable) β€” super-editor/src/editors/v1/extensions/no-break-hyphen/.
  • Schema.js surfaces PM's function-valued leafText NodeSpec field (without invoking it) so flattening APIs see the rendered glyph.

Import / export

  • v2 + v3 importers translate <w:noBreakHyphen/> to the new node.
  • v3 adds noBreakHyphen translator with co-located tests (translator + r-translator integration).
  • Exporter writes the atom back as <w:noBreakHyphen/>.

Layout / pm-adapter

  • New no-break-hyphen.ts inline converter emits a TextRun carrying U+2011 β€” DomPainter renders it through the existing text path with no painter changes.
  • paragraph.ts wires the converter into the run pipeline.

Editing behavior

  • backspaceAtomBefore command + keymap entry: handles all three caret positions around the atom (inside its run, between runs, at start of next run) on an opt-in allowlist (currently just noBreakHyphen) so existing bookmarkEnd behavior is preserved.
  • deleteAtomAfter command + keymap entry: mirrors the above for forward-delete when the caret sits inside the atom's wrapper run.
  • Without these, the keymap fell through to backspaceAcrossRuns, which intentionally skips non-text inline nodes and would delete the previous character instead of the hyphen.

Document-API / text helpers

  • textBetweenWithTabs honors leafText (with a fallback for leaves that don't define one) β€” search/get-text/diff now see U+2011 instead of the generic placeholder.
  • plan-engine/executor.ts charOffsetToDocPos now counts atomic inline leaves as leafText(node).length slots (and tabs as one slot), so text.rewrite lands edits at the correct PM position. Previously, rewriting "a‑b" β†’ "a‑c" replaced 'a' instead of 'b'.

The atom lives inside its own run wrapper after import, so "caret right after
the hyphen" resolves to one of three PM positions: inside the atom's run
(nodeBefore = atom), at paragraph level between runs (nodeBefore = the
wrapper run), or at the start of the next run (nodeBefore = null). Without
a handler for these, the keymap chain fell through to backspaceAcrossRuns,
whose findPreviousTextDeleteRange skips non-text inline nodes by design
(so bookmarkEnd markers survive backspace) β€” it then walked past the atom
and deleted the previous character instead, so the hyphen visually appeared
not to delete.

Adds backspaceAtomBefore, slotted before backspaceAcrossRuns in the keymap
chain. It handles all three caret positions and gates on an opt-in
allowlist (currently just noBreakHyphen) to preserve bookmarkEnd's existing
behavior.

Covered by tests/behavior/tests/basic-commands/sd-2746-no-break-hyphen-
backspace.spec.ts (three caret-position scenarios).
Inline leaf atoms like noBreakHyphen previously surfaced the generic
leafFallback placeholder when flattened, so search, get-text, and diff
consumers couldn't see the rendered character. Surface PM's `leafText`
NodeSpec field through the Schema builder, declare it on the
noBreakHyphen extension (β†’ U+2011), and honor it in textBetweenWithTabs
with a fallback for leaves that don't define one.
charOffsetToDocPos previously only counted text nodes, so character
offsets computed against `textBetweenWithTabs` (which emits '\t' for
tabs and the leafText glyph for atoms like noBreakHyphen) drifted past
those nodes and resolved to the wrong PM position. text.rewrite then
landed edits at the wrong boundary β€” e.g. rewriting "a‑b" β†’ "a‑c"
replaced 'a' instead of 'b'. Mirror the same accounting here: tabs
contribute one slot, inline leaves contribute `leafText(node).length`,
and offsets that fall strictly inside an atom resolve to the position
immediately after it. Add an integration spec covering round-tripping,
the prefix-match regression, and the symmetric suffix case.
Backspace already handles every caret position around the atom, but
forward delete was broken when the caret sat inside the atom's wrapper
run with the atom as nodeAfter β€” the whole keymap chain bailed and
nothing happened. Add a `deleteAtomAfter` command that mirrors
`backspaceAtomBefore`'s case 1 (remove the wrapper run when the atom is
its only child) and slot it into the Delete chain after
`deleteSkipEmptyRun`. Rename the spec file to cover both directions and
add tests for all three "before atom" caret positions.
@luccas-harbour luccas-harbour self-assigned this May 5, 2026
@luccas-harbour luccas-harbour requested a review from a team as a code owner May 5, 2026 16:24
@linear
Copy link
Copy Markdown

linear Bot commented May 5, 2026

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

The ecma-spec MCP tools require a permission grant I can't prompt you for. I'll proceed with my ECMA-376 knowledge β€” w:noBreakHyphen is a well-defined, simple element (Β§17.3.3.19) so this is low-risk.


Status: PASS

The w:noBreakHyphen implementation is spec-compliant. Here's what I checked:

Element identity β€” w:noBreakHyphen is a real OOXML element (Β§17.3.3.19, type CT_Empty). The translator correctly names it 'w:noBreakHyphen' and declares attributes: [], which matches spec exactly β€” CT_Empty carries no attributes and no children. Nothing invented here.

Export structure β€” The decoder wraps the element in <w:r><w:rPr/><w:noBreakHyphen/></w:r>. That's correct: w:noBreakHyphen is run-level content, w:r is its required parent, and w:rPr must precede run content in the content model. The unshift(rPrNode) call preserves that ordering.

Void element handling β€” w:noBreakHyphen is self-closing. The internal representation { name: 'w:noBreakHyphen', elements: [] } is fine as long as the XML serializer emits <w:noBreakHyphen/> rather than <w:noBreakHyphen></w:noBreakHyphen> β€” both are valid XML, but the former is conventional for empty elements.

Unicode value β€” U+2011 (‑) is the correct code point for NON-BREAKING HYPHEN. The spec defines w:noBreakHyphen as representing exactly this character.

One non-spec note worth flagging β€” Several test assertions in no-break-hyphen-translator.test.js compare against { name: 'w:noBreakHyphen' } (no elements key), but the decoder always returns { name: 'w:noBreakHyphen', elements: [] }. Vitest's toEqual treats these as unequal, so those assertions would fail at runtime. For example, line 96 and line 91 in the test file. This is a test-correctness issue, not a spec violation β€” but it means the test suite isn't actually verifying the round-trip contract it claims to cover.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a11255733

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with πŸ‘.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

βœ… All modified and coverable lines are covered by tests.

πŸ“’ Thoughts on this report? Let us know!

@caio-pizzol caio-pizzol self-assigned this May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants