feat(compare-rendering): add Word↔SuperDoc layout comparison #2891

Draft
caio-pizzol wants to merge 3 commits into main from caio/compare-rendering
@caio-pizzol
Contributor

@caio-pizzol caio-pizzol commented Apr 22, 2026

A dev tool that compares how Word and SuperDoc render the same .docx. Runs via pnpm compare-rendering. Instead of eyeballing screenshots, it prints a list of differences — text, pagination, structure — and points at the SuperDoc code to look at.

Scope is small on purpose: paragraph-only docs. Tables, images, comments, and tracked changes get skipped with a clear reason, so we never fake a "looks fine."

  • Talks to Word through the word-api REST endpoint; talks to SuperDoc through the existing pnpm layout:export-one.
  • Caches Word output so re-runs don't hit the VM again.
  • 13 unit tests on the diff logic. Full 75-doc corpus: 20 match, 40 differ, 13 skipped.

Roadmap is in the README. Next is baselines (so an agent can tell what its change affected) and a screenshot judge for things schema diff can't see.

Review: src/word.ts polling, src/differ.ts category routing, src/extract-layout.ps1 short-circuit. Skip scripts/batch.ts.

Diffs resolved paragraph state between Word (via word-mcp run_powershell
on a Windows VM) and SuperDoc (via layout:export-one) for paragraph-only
docx files. Emits typed Finding[] with category/severity/specRef/codeArea
hints so an agent consumer can route fixes to the right SuperDoc module.
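
A minimal sketch of what such a typed finding could look like, and how an agent consumer might bucket findings by `codeArea`. The field names `category`, `severity`, `specRef` and `codeArea` come from the description above; the severity levels, the `message` field, the `groupByCodeArea` helper and the example paths are assumptions made for illustration, not the PR's actual definitions.

```ts
// Hypothetical shape. Only the four category names and the field names
// category/severity/specRef/codeArea are taken from the PR text.
type Category = "text" | "pagination" | "structure" | "unsupported";
type Severity = "info" | "warning" | "error"; // assumed levels

interface Finding {
  category: Category;
  severity: Severity;
  message: string; // assumed human-readable summary
  specRef?: string; // pointer to the relevant spec section, if any
  codeArea?: string; // hint at the SuperDoc module to inspect
}

// Group findings by their codeArea hint so each bucket can be routed
// to one module. Findings without a hint land in "(unrouted)".
function groupByCodeArea(findings: Finding[]): Map<string, Finding[]> {
  const groups = new Map<string, Finding[]>();
  for (const finding of findings) {
    const key = finding.codeArea ?? "(unrouted)";
    const bucket = groups.get(key) ?? [];
    bucket.push(finding);
    groups.set(key, bucket);
  }
  return groups;
}
```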

Unsupported features (tables, inline/floating shapes, tracked changes,
comments) short-circuit with a skipped finding rather than producing a
misleading diff. Word extraction is cached by sha256(docx) +
sha256(extract-layout.ps1) so PS edits bust the cache automatically.
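
The cache-key idea can be sketched as follows, assuming Node's built-in `node:crypto` module. The function names and the way the two digests are joined are illustrative assumptions; the PR only states that the key combines sha256 of the docx with sha256 of `extract-layout.ps1`.

```ts
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of arbitrary bytes or text.
function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hypothetical key layout: docx digest and script digest joined by "-".
// Changing either input produces a different key, so editing the
// PowerShell script invalidates every cached extraction.
function cacheKey(docx: Buffer | string, script: Buffer | string): string {
  return `${sha256Hex(docx)}-${sha256Hex(script)}`;
}
```

Hashing the script alongside the document means no manual cache clearing is needed after changing the extraction logic.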

Scope: paragraph-only flow. Categories emitted: text, pagination,
structure, unsupported. Style/indent/color/numbering deferred to M2.

Move from the MCP JSON-RPC envelope to word-api's REST `/v1/executions`
+ `/v1/jobs/:id` polling. Smaller, clearer error taxonomy, and aligned
with the direction the API is taking (async-first).

- word.ts shrinks ~30 lines — drop SSE parser, content-type dispatch,
  regex JSON fallback. Plain JSON envelope all the way.
- Poll every 500 ms, with an outer deadline of `timeoutSeconds * 1000`
  ms plus a 30 s grace period, so a stuck job can't pin a batch forever.
- Cache key and short-circuit behavior unchanged.
- WORD_API_URL / WORD_API_TOKEN replace WORD_MCP_URL / WORD_MCP_TOKEN.
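
The polling behavior described in this list could look roughly like the sketch below. The job state names, the `JobStatus` shape and the injected `fetchStatus` callback are assumptions; the actual word-api response format is not shown in this PR. Only the 500 ms interval and the `timeoutSeconds * 1000 + 30s` deadline come from the text.

```ts
// Assumed job states; word-api's real values may differ.
type JobState = "queued" | "running" | "succeeded" | "failed";

interface JobStatus {
  state: JobState;
  output?: string;
}

const POLL_INTERVAL_MS = 500;
const GRACE_MS = 30_000;

// Outer deadline: the job's own timeout plus a 30 s grace period.
function outerDeadlineMs(startMs: number, timeoutSeconds: number): number {
  return startMs + timeoutSeconds * 1000 + GRACE_MS;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// fetchStatus stands in for a GET against /v1/jobs/:id.
async function pollJob(
  fetchStatus: () => Promise<JobStatus>,
  timeoutSeconds: number,
): Promise<JobStatus> {
  const deadline = outerDeadlineMs(Date.now(), timeoutSeconds);
  while (true) {
    const status = await fetchStatus();
    if (status.state === "succeeded" || status.state === "failed") {
      return status;
    }
    // Give up rather than sleep past the deadline.
    if (Date.now() + POLL_INTERVAL_MS > deadline) {
      throw new Error(`job still ${status.state} at outer deadline`);
    }
    await sleep(POLL_INTERVAL_MS);
  }
}
```

Passing the status fetcher in as a callback keeps the loop testable without a live VM.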

Also ship scripts/batch.ts — the ad-hoc corpus sweep we used to
pressure-test the refactor, kept as a stepping stone to M2's proper
`--input-dir`. README milestones revised after M1 corpus-batch
insights: M2 is now baseline + delta reporting (agent-usable signal),
M3 the LLM screenshot judge (catches false negatives schema diff
cannot see, e.g. border-style rendering on sd-1741), with resolved
style fields and tables pushed to M4/M5.
@caio-pizzol caio-pizzol marked this pull request as draft April 22, 2026 00:24
@caio-pizzol caio-pizzol changed the title from "feat(compare-rendering): add Word↔SuperDoc paragraph-diff CLI (M1)" to "feat(compare-rendering): add Word↔SuperDoc layout comparison" Apr 22, 2026
@harbournick
Collaborator

@caio-pizzol sounds interesting. let's just make sure we're not both doing the same work? I'm doing a lot around this in labs.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 370bb8823e

Comment thread devtools/compare-rendering/src/word.ts Outdated
Comment thread devtools/compare-rendering/src/superdoc.ts Outdated
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

…ia fileURLToPath

Two small fixes from PR review:

- parseExtractionOutput searched for JSON_END from offset 0, so a
  document whose paragraph text contained the literal string
  "JSON_END" could truncate the payload before JSON.parse. Search
  from past JSON_BEGIN instead.
- REPO_ROOT in superdoc.ts was using URL.pathname, which yields
  /C:/… on Windows and keeps URL-encoded special chars. Use
  fileURLToPath like we already do in cache.ts / word.ts.
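
The first fix can be illustrated with a small sketch. The marker names `JSON_BEGIN` and `JSON_END` come from the commit message; the function name `extractPayload` and the error messages are assumptions, and the real `parseExtractionOutput` may be structured differently.

```ts
const BEGIN_MARKER = "JSON_BEGIN";
const END_MARKER = "JSON_END";

// Return the text between the markers. The end marker is searched for
// starting just past the begin marker, so an earlier stray "JSON_END"
// in the output cannot cut the payload short.
function extractPayload(output: string): string {
  const begin = output.indexOf(BEGIN_MARKER);
  if (begin === -1) {
    throw new Error("JSON_BEGIN marker not found");
  }
  const start = begin + BEGIN_MARKER.length;
  const end = output.indexOf(END_MARKER, start);
  if (end === -1) {
    throw new Error("JSON_END marker not found after JSON_BEGIN");
  }
  return output.slice(start, end).trim();
}
```

This sketch only guards against an end marker that appears ahead of `JSON_BEGIN`. A literal `JSON_END` inside the payload itself would still match first, so a stricter variant might search backwards with `lastIndexOf` instead.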