Skip to content

fix(super-converter): bibliography/index/TOA field-code import fidelity (SD-3066)#3565

Merged
harbournick merged 6 commits into
tadeu/sd-3005-feature-bibliographyfrom
tadeu/sd-3066-field-code-import
May 31, 2026
Merged

fix(super-converter): bibliography/index/TOA field-code import fidelity (SD-3066)#3565
harbournick merged 6 commits into
tadeu/sd-3005-feature-bibliographyfrom
tadeu/sd-3066-field-code-import

Conversation

@tupizz
Copy link
Copy Markdown
Contributor

@tupizz tupizz commented May 29, 2026

Note

📚 Stacked PRs — review bottom-up

Order PR Scope
1️⃣ #3538 Single-paragraph BIBLIOGRAPHY/INDEX/TOA crash (SD-3005)
2️⃣ #3565 super-editor — field-code import fidelity + DRY 👈 this PR
3️⃣ #3566 pm-adapter — render fields inside documentPartObject

Review order: #3538#3565#3566. Each PR's base auto-retargets to main as the one below it merges.


Summary

Real Word documents (a marked-citations TOA doc and a native Insert→Bibliography doc) surfaced several import defects in the block field-code family. This fixes them and unifies the shared shape behind one set of helpers (DRY/KISS).

Stacked on #3538. Base will retarget to main once #3538 merges. Rendering of these fields is in the follow-up PR (pm-adapter).

Linear: SD-3066 (parent of SD-3005)

Fixes

  • Multi-run instruction corruption — fragments were joined with an injected separator space, mangling instructions Word splits across runs (XE " Building Standard "). Join verbatim.
  • Table of Authorities dropped on import — no sd:tableOfAuthorities v2 importer existed, so the node was silently discarded. Registered tableOfAuthoritiesImporter alongside index/bibliography.
  • Crash on nested-SDT bibliography — a content control wrapping a block field imported as inline structuredContent; inside a block-only documentPartObject this threw "Invalid content for node type documentPartObject" and the editor failed to mount. Classify an SDT whose content is a block field as structuredContentBlock.
  • Bibliography token round-trip — bibliography neither captured nor replayed instructionTokens (unlike index/toa), so split BIBLIOGRAPHY instructions collapsed on export. Wired through preprocessor → extension attr → encode/decode.

Refactors (DRY/KISS)

  • buildBlockFieldNode — shared by the bibliography/index/toa preprocessors.
  • wrapParagraphsAsComplexField — shared by their translator decoders (mirrors the existing inline-field helpers).
  • BLOCK_FIELD_XML_NAMES — one source of truth for the paragraph importer + SDT classifier.

Test plan

  • RED→GREEN unit tests for each fix + the new shared helpers
  • pnpm --filter super-editor exec vitest run src/editors/v1/core/super-converter — 2990 pass
  • Browser-verified: the TOA doc and the bibliography doc now import without crashing

@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 29, 2026

SD-3066

@github-actions
Copy link
Copy Markdown
Contributor

The ecma-spec MCP tools require permission grants that are being declined in this session, so I couldn't run the live spec lookups. I reviewed the OOXML surface against ECMA-376 from knowledge instead — and importantly, this PR's spec-relevant footprint is small, since most of the churn is in SuperDoc's internal sd:* representation (not OOXML).

Status: PASS

Here's what I checked and why it's clean:

Real OOXML elements/attributes touched — all valid and used correctly:

  • w:fldChar with w:fldCharType set to begin / separate / end (in build-block-field-paragraphs.js and the translators). These are exactly the three ST_FldCharType values, and the begin → instruction → separate → result → end ordering produced by wrapParagraphsAsComplexField is the correct complex-field structure (spec).
  • w:fldSimple with its w:instr attribute (tableOfAuthoritiesImporter.test.js / new fldSimple INDEX test). w:instr is the required attribute on CT_SimpleField and it's present (spec).
  • w:instrText carrying field code text, including xml:space="preserve" on split instruction runs — correct, since the SD-3066 fix relies on preserving literal whitespace that Word stores inside each run (spec).
  • w:r, w:p, w:pPr, w:t, w:tab — all used in their normal roles; begin/separate are spliced after w:pPr, which respects paragraph content ordering.

Field codes / switches referenced in tests are all real Word fields and valid switches: BIBLIOGRAPHY \l, INDEX \c, TOA \h \c \p, XE, HYPERLINK (spec).

The sd:* elements (sd:bibliography, sd:index, sd:tableOfAuthorities, sd:tableOfContents, sd:indexEntry) and the instruction / instructionTokens attributes are SuperDoc's intermediate representation, not OOXML — so there's nothing for the spec to constrain there. They're consumed by the preprocessors/translators and never emitted to the .docx.

No non-existent OOXML elements/attributes, no missing required attributes (notably w:fldSimple/@w:instr and w:fldChar/@w:fldCharType are always set), and no incorrect defaults.

One caveat for transparency: I'd recommend re-running this with the ecma-spec tools enabled if you want the schema-graph confirmation on record — my pass is based on ECMA-376 knowledge, not a live lookup this session. But the changes are a refactor + round-trip-fidelity fix, and the OOXML they generate is structurally sound.

@tupizz tupizz marked this pull request as ready for review May 29, 2026 14:53
@tupizz tupizz requested a review from a team as a code owner May 29, 2026 14:53
@tupizz tupizz self-assigned this May 29, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 24 files

Tip: cubic could auto-approve low-risk PRs like this, if it thinks it's safe to merge. Learn more

Re-trigger cubic

tupizz and others added 6 commits May 31, 2026 13:03
…ty (SD-3066)

Real Word documents surfaced several import defects in the block field-code
family (BIBLIOGRAPHY, INDEX, XE, TOA). This addresses them and unifies the
shared shape behind one set of helpers.

Fixes:
- Multi-run instruction aggregation joined fragments with an injected separator
  space, corrupting instructions Word splits across runs (e.g.
  `XE " Building Standard "`). Join verbatim; literal spacing is preserved.
- Table of Authorities was dropped on import: the v2 importer had no
  `sd:tableOfAuthorities` handler, so the node was silently discarded. Register
  tableOfAuthoritiesImporter alongside index/bibliography.
- A content control wrapping a block field imported as an inline
  `structuredContent` node; inside a block-only documentPartObject this threw
  "Invalid content for node type documentPartObject" and the editor failed to
  mount. Classify an SDT whose content is a block field as
  structuredContentBlock.
- Bibliography neither captured nor replayed instructionTokens (unlike
  index/toa), so split BIBLIOGRAPHY instructions did not round-trip. Add the
  attribute and wire it through preprocessor, encode and decode.

Refactors (DRY/KISS):
- Extract buildBlockFieldNode (shared by the bibliography/index/toa
  preprocessors) and wrapParagraphsAsComplexField (shared by their translator
  decoders), mirroring the existing inline-field helpers.
- Centralize BLOCK_FIELD_XML_NAMES so the paragraph importer and SDT classifier
  agree on which sd:* nodes are block content.

Adds RED→GREEN unit tests for each fix and the new shared helpers. Full
super-converter suite passes (2990).

Linear: SD-3066 (parent of SD-3005)
…ect (SD-3066) (#3566)

Word wraps a generated bibliography (and other block fields) in a docPartObject
SDT, sometimes via a nested content control. The handler only converted
paragraph and tableOfContents children to flow blocks, so the field's entry
paragraphs were silently dropped — the heading rendered but the entries did
not.

- documentPartObject now renders bibliography/index/tableOfAuthorities children
  via the shared paragraph-container handler, and structuredContentBlock
  children via their handler.
- structuredContentBlock recurses block-field children (transparent wrapper),
  rendering their paragraphs.

Section-counting invariant: findParagraphsWithSectPr recurses bibliography
(so the handler advances currentParagraphIndex per entry) but not
structuredContentBlock (so the scb path renders without advancing). Both paths
were validated against the invariant; the prior code dropped entries AND
under-counted, which could drift section breaks.

Also extracts the shared handleParagraphContainerNode used by the bibliography,
index and tableOfAuthorities handlers (previously three byte-identical copies).

Regression tests cover both nesting shapes and the counter behavior. pm-adapter
suite passes (302); layout corpus comparison shows no regressions attributable
to this change.

Linear: SD-3066
…SD-3322)

Under modules.contentControls.chrome:'none' the painter erased the SDT look
entirely, so a consumer who wanted a custom field/clause appearance had to target
the painted wrapper with !important and reach into internal state classes
(.ProseMirror-selectednode, .sdt-group-hover) to keep it stable across hover and
selection. That's the wrong "best practice" to teach.

Make the chrome-none reset read a --sd-content-controls-custom-* variable layer
with default-preserving fallbacks (0-width transparent border, no background /
radius / padding). chrome:'none' stays visually empty by default - existing
consumers see no change - but a consumer can now paint inline and block controls
by setting variables on a data-sdt-* selector. The painter applies them across
rest, hover, and selected, so the box stays stable (no jitter) and no !important
or state-class selectors are needed. `border` is a full shorthand; block adds a
`-border-left` accent rail; background vars cascade (hover from rest, selected
from hover).

- variables.css: document the custom-* surface; note the built-in chrome still
  uses the existing --sd-content-controls-* variables.
- docs: add a "Style the controls in place" section to the custom-UI content
  controls guide.
- test: assert the surface is wired and default-preserving; existing chrome-none
  selector + source-order tests are unchanged and still pass (painter-dom 1178/1178).
…' (SD-3322)

The custom hover background was overridden for LOCKED controls under chrome:'none'.
The base lock-hover rules (a built-in tint on inline, transparent on block) have
equal specificity to the plain custom hover rules but come later in source order,
so they won; the chrome-none lock-hover reset only reset z-index, not background.

Re-assert the custom hover background in that reset block - it carries the extra
.superdoc-cc-chrome-none class, so it outranks the base lock-hover rules. A locked
control now follows --sd-content-controls-custom-*-hover-bg. With no custom var
set the default is empty, so the built-in lock-hover tint no longer leaks under
chrome:'none' for locked controls (consistently empty). Only the contract-templates
demo has locked chrome-none controls, and it wants the custom hover, not the tint.

Add a regression test asserting the custom hover vars are re-asserted after the
base lock-hover rules (source order = it wins). painter-dom 1179/1179 green.
…s (SD-3322)

The content-controls theming table themes the built-in chrome. Add a one-line
note that under chrome:'none' you style controls with the
--sd-content-controls-custom-* variables instead, linking the custom UI guide.
…D-3322)

Rewrite the contract-templates demo's SDT styling onto SuperDoc's public
--sd-content-controls-custom-* variables (from #3590), proving the new API in the
real legal-template use case. The demo now styles its inline fields and block
clauses with zero !important and zero internal state selectors
(.ProseMirror-selectednode, .sdt-group-hover); the painter applies the variables
across rest, hover, selected, and locked-hover. This is the copy-pasteable
pattern for styling custom SDTs under chrome:'none'.

- style.css: replace the per-state !important rules with one variable-setting
  rule per tag (inline + block); update the host-owned-styling comment.
- test: add state coverage - the custom hover background drives a painted field
  (and wins over the built-in lock-hover tint), the border stays constant across
  states (no jitter), and no built-in label/chrome leaks. Demo suite 13/13.
- docs (Document API > Content controls): correct the contentLocked wording (it
  rejects Document API content writes too, not just the editor); document the
  locked-template pattern (unlock -> write -> relock, incl. a locked parent for
  nested fields); add the single-use governed clause-library pattern alongside
  versioned reusable sections (kept - it's a valid pattern).
- docs (Custom UI > Content controls): add a "Build a custom field system"
  walkthrough; describe the demo as a full custom contract-template UI.
- README: note the demo styles through the public custom variables.

Stacked on #3590 (the painter variable layer); retarget to main once it merges.
@harbournick harbournick force-pushed the tadeu/sd-3066-field-code-import branch from 4bc0eb3 to 5d22d00 Compare May 31, 2026 20:05
@harbournick harbournick merged commit 92bab61 into tadeu/sd-3005-feature-bibliography May 31, 2026
7 checks passed
@harbournick harbournick deleted the tadeu/sd-3066-field-code-import branch May 31, 2026 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants