feat(compare-rendering): add Word↔SuperDoc layout comparison #2891
Draft
caio-pizzol wants to merge 3 commits into main
Diffs resolved paragraph state between Word (via word-mcp run_powershell on a Windows VM) and SuperDoc (via layout:export-one) for paragraph-only docx files. Emits typed Finding[] with category/severity/specRef/codeArea hints so an agent consumer can route fixes to the right SuperDoc module.

Unsupported features (tables, inline/floating shapes, tracked changes, comments) short-circuit with a skipped finding rather than producing a misleading diff.

Word extraction is cached by sha256(docx) + sha256(extract-layout.ps1) so PS edits bust the cache automatically.

Scope: paragraph-only flow. Categories emitted: text, pagination, structure, unsupported. Style/indent/color/numbering deferred to M2.
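The cache key described above could be sketched roughly as follows. This is an illustration only, not the PR's actual `cache.ts`: the helper names `sha256Hex` and `cacheKey`, and the choice to join the two digests with a hyphen, are assumptions.

```typescript
import { createHash } from "node:crypto";

/** Hex-encoded SHA-256 digest of raw bytes or a string. */
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

/**
 * Hypothetical cache key: digest of the .docx bytes plus digest of the
 * extraction script source. Editing either input produces a different key,
 * so a stale Word extraction is never reused after the script changes.
 */
function cacheKey(docxBytes: Uint8Array, extractScriptSource: string): string {
  return `${sha256Hex(docxBytes)}-${sha256Hex(extractScriptSource)}`;
}
```

Hashing the script alongside the document is what makes "PS edits bust the cache automatically": no manual version bump is needed when `extract-layout.ps1` changes.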
Move from the MCP JSON-RPC envelope to word-api's REST `/v1/executions` + `/v1/jobs/:id` polling. Smaller, clearer error taxonomy, and aligned with the direction the API is taking (async-first).

- word.ts shrinks ~30 lines: drop SSE parser, content-type dispatch, regex JSON fallback. Plain JSON envelope all the way.
- Poll interval 500ms with `timeoutSeconds * 1000 + 30s` outer deadline so a stuck job can't pin a batch forever.
- Cache key and short-circuit behavior unchanged.
- WORD_API_URL / WORD_API_TOKEN replace WORD_MCP_URL / WORD_MCP_TOKEN.

Also ship scripts/batch.ts, the ad-hoc corpus sweep we used to pressure-test the refactor, kept as a stepping stone to M2's proper `--input-dir`.

README milestones revised after M1 corpus-batch insights: M2 is now baseline + delta reporting (agent-usable signal), M3 the LLM screenshot judge (catches false negatives schema diff cannot see, e.g. border-style rendering on sd-1741), with resolved style fields and tables pushed to M4/M5.
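A minimal sketch of the polling loop with an outer deadline, under stated assumptions: the `Job` shape, the status names, and the `pollJob` signature are hypothetical and not taken from word-api. The job fetcher is injected so the sketch makes no claims about the real HTTP client, and the clock and sleep are injectable so the loop can be exercised without waiting.

```typescript
// Hypothetical job shape; the real word-api response may differ.
type JobStatus = "pending" | "running" | "succeeded" | "failed";
interface Job {
  id: string;
  status: JobStatus;
  output?: string;
  error?: string;
}

interface PollOptions {
  timeoutSeconds: number; // the job's own timeout
  intervalMs?: number; // poll interval, 500 ms by default
  graceMs?: number; // extra slack on top of the job timeout, 30 s by default
  now?: () => number; // injectable clock
  sleep?: (ms: number) => Promise<void>; // injectable delay
}

async function pollJob(
  getJob: (id: string) => Promise<Job>,
  id: string,
  opts: PollOptions,
): Promise<Job> {
  const intervalMs = opts.intervalMs ?? 500;
  const graceMs = opts.graceMs ?? 30_000;
  const now = opts.now ?? Date.now;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  // Outer deadline: timeoutSeconds * 1000 + 30s, so a stuck job cannot block forever.
  const deadline = now() + opts.timeoutSeconds * 1000 + graceMs;

  while (true) {
    const job = await getJob(id);
    if (job.status === "succeeded" || job.status === "failed") return job;
    // Give up if the next poll would land past the deadline.
    if (now() + intervalMs > deadline) {
      throw new Error(`job ${id} did not finish before the outer deadline`);
    }
    await sleep(intervalMs);
  }
}
```

Keeping the deadline slightly longer than the job's own timeout lets the server report its own timeout error first; the client-side deadline only fires when the job never reaches a terminal state.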
Collaborator
@caio-pizzol sounds interesting. let's just make sure we're not both doing the same work? I'm doing a lot around this in labs.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 370bb8823e
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…ia fileURLToPath

Two small fixes from PR review:

- parseExtractionOutput searched for JSON_END from offset 0, so a document whose paragraph text contained the literal string "JSON_END" could truncate the payload before JSON.parse. Search from past JSON_BEGIN instead.
- REPO_ROOT in superdoc.ts was using URL.pathname, which yields /C:/… on Windows and keeps URL-encoded special chars. Use fileURLToPath like we already do in cache.ts / word.ts.
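The first fix can be illustrated with a small sketch. The marker layout here (the JSON payload sitting directly between `JSON_BEGIN` and `JSON_END` in the script's stdout) is an assumption, and this function is a stand-in for the real `parseExtractionOutput`, not a copy of it.

```typescript
const BEGIN_MARKER = "JSON_BEGIN";
const END_MARKER = "JSON_END";

/**
 * Sketch: extract and parse the JSON payload between the two markers.
 * The end marker is searched for starting just past JSON_BEGIN, so an
 * earlier occurrence of "JSON_END" in the output is ignored.
 */
function parseExtractionOutput(stdout: string): unknown {
  const begin = stdout.indexOf(BEGIN_MARKER);
  if (begin === -1) throw new Error("JSON_BEGIN marker not found");

  const payloadStart = begin + BEGIN_MARKER.length;
  // Second argument to indexOf is the offset to start searching from.
  const end = stdout.indexOf(END_MARKER, payloadStart);
  if (end === -1) throw new Error("JSON_END marker not found after JSON_BEGIN");

  return JSON.parse(stdout.slice(payloadStart, end));
}
```

Anchoring the search at the payload start means a stray `JSON_END` earlier in the output no longer produces an empty or truncated slice. A marker that appears inside the payload itself would need a different strategy, such as searching from the end of the output.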
A dev tool that compares how Word and SuperDoc render the same `.docx`. Runs via `pnpm compare-rendering`. Instead of eyeballing screenshots, it prints a list of differences (text, pagination, structure) and points at the SuperDoc code to look at.

Scope is small on purpose: paragraph-only docs. Tables, images, comments, and tracked changes get skipped with a clear reason, so we never fake a "looks fine."
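As a rough sketch of the skip behavior described above: the field names `category`, `severity`, `specRef`, and `codeArea` come from the commit message, but everything else here (the `"skipped"` severity value, the `DocFeatures` counts, and the function names) is hypothetical.

```typescript
type Category = "text" | "pagination" | "structure" | "unsupported";
type Severity = "error" | "warning" | "skipped"; // assumed values

interface Finding {
  category: Category;
  severity: Severity;
  message: string;
  specRef?: string; // hint: relevant spec section
  codeArea?: string; // hint: SuperDoc module to look at
}

// Hypothetical per-document feature counts reported by the Word extraction.
interface DocFeatures {
  tables: number;
  shapes: number;
  trackedChanges: number;
  comments: number;
}

/** Returns a single skipped finding if any unsupported feature is present. */
function unsupportedFindings(features: DocFeatures): Finding[] {
  const present = (Object.keys(features) as (keyof DocFeatures)[]).filter(
    (name) => features[name] > 0,
  );
  if (present.length === 0) return [];
  return [
    {
      category: "unsupported",
      severity: "skipped",
      message: `skipped: document contains ${present.join(", ")}`,
    },
  ];
}

/** Short-circuit: only run the real diff when nothing unsupported was found. */
function compare(features: DocFeatures, runDiff: () => Finding[]): Finding[] {
  const skipped = unsupportedFindings(features);
  return skipped.length > 0 ? skipped : runDiff();
}
```

Returning an explicit finding for the skip, rather than an empty list, keeps "not compared" distinguishable from "compared and identical".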
Talks to Word through the `word-api` REST endpoint; talks to SuperDoc through the existing `pnpm layout:export-one`.

Roadmap is in the README. Next is baselines (so an agent can tell what its change affected) and a screenshot judge for things schema diff can't see.
Review: `src/word.ts` polling, `src/differ.ts` category routing, `src/extract-layout.ps1` short-circuit. Skip `scripts/batch.ts`.