Commit 370bb88

refactor(compare-rendering): use word-api REST with async polling
Move from the MCP JSON-RPC envelope to word-api's REST `/v1/executions` + `/v1/jobs/:id` polling. Smaller, clearer error taxonomy, and aligned with the direction the API is taking (async-first).

- word.ts shrinks ~30 lines: drop SSE parser, content-type dispatch, regex JSON fallback. Plain JSON envelope all the way.
- Poll interval 500ms with `timeoutSeconds * 1000 + 30s` outer deadline so a stuck job can't pin a batch forever.
- Cache key and short-circuit behavior unchanged.
- WORD_API_URL / WORD_API_TOKEN replace WORD_MCP_URL / WORD_MCP_TOKEN.

Also ship scripts/batch.ts, the ad-hoc corpus sweep we used to pressure-test the refactor, kept as a stepping stone to M2's proper `--input-dir`.

README milestones revised after M1 corpus-batch insights: M2 is now baseline + delta reporting (agent-usable signal), M3 the LLM screenshot judge (catches false negatives schema diff cannot see, e.g. border-style rendering on sd-1741), with resolved style fields and tables pushed to M4/M5.
1 parent ca48ab9 commit 370bb88

4 files changed

Lines changed: 207 additions & 68 deletions

devtools/compare-rendering/README.md

Lines changed: 26 additions & 11 deletions
@@ -13,8 +13,8 @@ This is a dev tool, not a pass/fail test. It surfaces concrete divergences so yo
 ## Quick start

 ```bash
-export WORD_MCP_URL="https://word-mcp.superdoc.workers.dev/mcp"
-export WORD_MCP_TOKEN="<your-bearer-token>"
+export WORD_API_URL="https://word-mcp.superdoc.workers.dev"
+export WORD_API_TOKEN="<your-bearer-token>"

 pnpm compare-rendering -- \
   --input evals/fixtures/docs/memorandum.docx \
@@ -50,7 +50,7 @@ Example output (truncated):

 ```
 docx
-├── word adapter (POST run_powershell to word-mcp worker) ─► word.json (cached)
+├── word adapter (POST /v1/executions to word-api) ─► word.json (cached)
 └── superdoc adapter (spawn pnpm layout:export-one) ─► sd.layout.json

 normalize both sides
@@ -69,8 +69,8 @@ docx

 | Variable | Purpose |
 |------------------|------------------------------------------------------|
-| `WORD_MCP_URL` | HTTP endpoint of the word-mcp MCP worker |
-| `WORD_MCP_TOKEN` | Bearer token (same one you use in your `.mcp.json`) |
+| `WORD_API_URL` | Base URL of the word-api worker |
+| `WORD_API_TOKEN` | Bearer token |

 ## Exit codes

@@ -87,10 +87,25 @@ Makes it CI-usable later without rework.
 - Auto-fix generation.
 - Publishing as a package.

-## Milestones
+## Milestones (revised after M1 corpus-batch insights)

-- **M1** (this): CLI works end-to-end on paragraph-only docs. 3 categories. JSON + markdown output. Caching.
-- **M2**: Pull resolved style fields out of SuperDoc's block schema. Taxonomy extends to `style`, `indent`, `font`, `color`, `alignment`, `spacing`, `numbering`.
-- **M3**: Batch mode (`--input-dir`), nightly run against the paragraph-only subset of the corpus, per-category dashboard.
-- **M4**: MCP wrapper `compare_rendering(docx_path)`. Agent dogfood with ECMA-spec MCP in context.
-- **M5**: Table support. Non-trivial — needs parallel table walks on both sides.
+- **M1** ✅ — CLI works end-to-end on paragraph-only docs. 4 categories (`text`, `pagination`, `structure`, `unsupported`). JSON + markdown output. Word-extraction cache. Ad-hoc `scripts/batch.ts` runner for whole-corpus sweeps.
+- **M2** — Baseline + delta reporting. Snapshot findings against a pinned SuperDoc sha, emit only `resolved` / `new` since baseline. This is what makes the tool **agent-usable**: signal becomes "my change fixed N, broke M" instead of "here are 367 absolute findings." Pin a `main`-branch baseline at `test-corpus/.baseline.json`.
+- **M3** — LLM screenshot judge for docs where schema diff is silent or near-silent. Catches rendering divergences that don't surface in layout data at all (e.g. `w:val="wave"` border styles rendered as plain lines, font substitution, painter-level overflow).
+- **M4** — Populate `NormalizedParagraph.resolved` on SuperDoc side. Taxonomy extends to `style`, `indent`, `font`, `color`, `alignment`, `spacing`, `numbering`. Safe to add once M2 absorbs the "new field adds findings everywhere" noise.
+- **M5** — Table support. Non-trivial; needs parallel table walks on both sides.
+
+## Insights from M1 corpus batch (75 docs, April 2026)
+
+- **Pagination findings compound.** Many "N pagination findings" collapse to one underlying bug expressed N times. `memorandum.docx` (3 findings) and `sd-1741-paragraph-between-borders` (36 findings) share the same root cause — SuperDoc fits slightly more content per page than Word; drift accumulates across pages. One fix likely eliminates most findings at once.
+- **Schema diff has real false negatives.** `sd-1741` reports 0 text/style findings, but visually SuperDoc renders every border-between style (`wave`, `doubleWave`, `dashDotStroked`, `triple`, …) as a plain line while Word renders each correctly. Schema-level comparison will never catch this class without the M3 screenshot judge.
+- **~27 % of the corpus is in M1 scope.** 13 / 75 docs are short-circuited for tables/shapes/comments/revisions; the rest yield meaningful findings. Real-world DOCX coverage unlocks at M5 (tables).
+
+## Corpus sweep
+
+Ad-hoc batch runner at `scripts/batch.ts` — iterates every `.docx` under a directory, writes per-doc JSON reports plus a `_summary.json`, and prints a one-line status per doc. Graduates to a proper `--input-dir` flag in M2 alongside baseline support.
+
+```bash
+WORD_API_URL=... WORD_API_TOKEN=... \
+  bun devtools/compare-rendering/scripts/batch.ts test-corpus/rendering
+```
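
The revised M2 milestone in the README hunk above describes baseline + delta reporting: snapshot the findings once, then report only what is `resolved` or `new`. A minimal sketch of that set difference follows, assuming each finding can be reduced to a stable identity string. The `Finding` shape, its `key` field, and `diffAgainstBaseline` are hypothetical; the commit does not define how findings are matched across runs.

```typescript
// Hypothetical finding shape: `category` appears in the batch runner's reports,
// but `key` (a stable per-finding identity) is an assumption made for this sketch.
type Finding = { category: string; key: string };

type Delta = { resolved: Finding[]; new: Finding[] };

const identity = (f: Finding): string => `${f.category}:${f.key}`;

// Compare the current run against a pinned baseline and keep only the changes.
function diffAgainstBaseline(baseline: Finding[], current: Finding[]): Delta {
  const baselineIds = new Set(baseline.map(identity));
  const currentIds = new Set(current.map(identity));
  return {
    // In the baseline but gone now: the change fixed these.
    resolved: baseline.filter((f) => !currentIds.has(identity(f))),
    // Present now but absent from the baseline: the change introduced these.
    new: current.filter((f) => !baselineIds.has(identity(f))),
  };
}

// Made-up sample data, only to exercise the function.
const baseline: Finding[] = [
  { category: 'pagination', key: 'p12' },
  { category: 'text', key: 'p3' },
];
const current: Finding[] = [
  { category: 'text', key: 'p3' },
  { category: 'structure', key: 'p7' },
];

const delta = diffAgainstBaseline(baseline, current);
console.log(`fixed ${delta.resolved.length}, broke ${delta.new.length}`);
```

Keying on a string identity keeps the comparison to two set lookups per finding; whatever M2 eventually uses as the identity would replace `identity` here.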

devtools/compare-rendering/scripts/batch.ts

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+#!/usr/bin/env bun
+// Ad-hoc batch runner — iterates every .docx under a directory, runs compare-rendering,
+// and prints a summary. Ad-hoc scaffolding that graduates to M2's --input-dir mode.
+
+import { readdir, readFile, writeFile, mkdir } from 'node:fs/promises';
+import { resolve, join, basename } from 'node:path';
+import { spawn } from 'node:child_process';
+
+const INPUT_DIR = resolve(process.argv[2] ?? 'test-corpus/rendering');
+const OUTPUT_DIR = resolve(process.argv[3] ?? 'test-corpus/.reports');
+const CLI = resolve(new URL('../src/cli.ts', import.meta.url).pathname);
+
+await mkdir(OUTPUT_DIR, { recursive: true });
+
+const files = (await readdir(INPUT_DIR)).filter((f) => f.endsWith('.docx')).sort();
+
+console.log(`[batch] ${files.length} docs under ${INPUT_DIR}\n`);
+
+type Summary = {
+  file: string;
+  status: 'match' | 'diffs' | 'skipped' | 'error';
+  supported: boolean;
+  unsupportedReason?: string;
+  findings: number;
+  byCategory: Record<string, number>;
+  durationMs: number;
+  note?: string;
+};
+
+const summaries: Summary[] = [];
+const start = Date.now();
+
+for (let i = 0; i < files.length; i += 1) {
+  const f = files[i]!;
+  const input = join(INPUT_DIR, f);
+  const output = join(OUTPUT_DIR, f.replace(/\.docx$/, '.json'));
+  const t0 = Date.now();
+
+  const status = await runOne(input, output);
+  const ms = Date.now() - t0;
+
+  try {
+    const raw = JSON.parse(await readFile(output, 'utf8'));
+    const byCategory: Record<string, number> = {};
+    for (const finding of raw.findings ?? []) {
+      byCategory[finding.category] = (byCategory[finding.category] ?? 0) + 1;
+    }
+    const supported = raw.wordSupported === true;
+    summaries.push({
+      file: f,
+      status: !supported ? 'skipped' : (raw.findings ?? []).length === 0 ? 'match' : 'diffs',
+      supported,
+      unsupportedReason: raw.unsupportedReason,
+      findings: (raw.findings ?? []).length,
+      byCategory,
+      durationMs: ms,
+    });
+  } catch {
+    summaries.push({
+      file: f,
+      status: 'error',
+      supported: false,
+      findings: 0,
+      byCategory: {},
+      durationMs: ms,
+      note: `CLI exit ${status}`,
+    });
+  }
+
+  const last = summaries[summaries.length - 1]!;
+  const marker = last.status === 'match' ? '✓' : last.status === 'diffs' ? '⚠' : last.status === 'skipped' ? '–' : '✗';
+  const tail =
+    last.status === 'skipped'
+      ? `skipped: ${last.unsupportedReason}`
+      : last.status === 'diffs'
+        ? `${last.findings} finding(s) ${JSON.stringify(last.byCategory)}`
+        : last.status === 'match'
+          ? 'match'
+          : (last.note ?? 'error');
+  console.log(`[${i + 1}/${files.length}] ${marker} ${f} (${ms}ms) — ${tail}`);
+}
+
+const totalMs = Date.now() - start;
+
+console.log(`\n[batch] done in ${(totalMs / 1000).toFixed(1)}s`);
+
+const counts = { match: 0, diffs: 0, skipped: 0, error: 0 };
+for (const s of summaries) counts[s.status] += 1;
+console.log(`\nOverall:`);
+console.log(` match: ${counts.match}`);
+console.log(` diffs: ${counts.diffs}`);
+console.log(` skipped: ${counts.skipped}`);
+console.log(` error: ${counts.error}`);
+
+const reasons: Record<string, number> = {};
+for (const s of summaries) {
+  if (s.status === 'skipped' && s.unsupportedReason) {
+    const key = s.unsupportedReason.replace(/\s*\(\d+\)$/, '');
+    reasons[key] = (reasons[key] ?? 0) + 1;
+  }
+}
+if (Object.keys(reasons).length) {
+  console.log(`\nSkip reasons:`);
+  for (const [k, v] of Object.entries(reasons).sort((a, b) => b[1] - a[1])) {
+    console.log(` ${v.toString().padStart(3)} × ${k}`);
+  }
+}
+
+const categories: Record<string, number> = {};
+for (const s of summaries) {
+  for (const [cat, count] of Object.entries(s.byCategory)) {
+    categories[cat] = (categories[cat] ?? 0) + count;
+  }
+}
+if (Object.keys(categories).length) {
+  console.log(`\nFindings by category (across docs with diffs):`);
+  for (const [k, v] of Object.entries(categories).sort((a, b) => b[1] - a[1])) {
+    console.log(` ${v.toString().padStart(3)} × ${k}`);
+  }
+}
+
+const summaryPath = join(OUTPUT_DIR, '_summary.json');
+await writeFile(summaryPath, JSON.stringify({ totalMs, counts, reasons, categories, summaries }, null, 2));
+console.log(`\n[batch] summary written to ${summaryPath}`);
+
+function runOne(input: string, output: string): Promise<number> {
+  return new Promise((resolve) => {
+    const child = spawn('bun', [CLI, '--input', input, '--output', output, '--format', 'json'], {
+      stdio: ['ignore', 'pipe', 'pipe'],
+      env: process.env,
+    });
+    // Drain stdio to avoid backpressure; we don't print the CLI's own logs.
+    child.stdout?.on('data', () => {});
+    child.stderr?.on('data', () => {});
+    child.on('close', (code) => resolve(code ?? 0));
+  });
+}
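
Once a sweep has written `_summary.json`, the per-doc `summaries` array can be ranked to see where findings concentrate. The sketch below assumes the file matches the `Summary` type declared in the batch script; `worstDocs` is a hypothetical helper that is not part of the commit, and the sample entries are illustrative, reusing the two finding counts quoted in the README insights.

```typescript
// Subset of the Summary type from scripts/batch.ts that this helper needs.
type Summary = {
  file: string;
  status: 'match' | 'diffs' | 'skipped' | 'error';
  findings: number;
};

// Hypothetical helper: docs with diffs, most findings first, ties broken by file name.
function worstDocs(summaries: Summary[], limit = 5): Summary[] {
  return summaries
    .filter((s) => s.status === 'diffs') // filter() copies, so sort() leaves the input untouched
    .sort((a, b) => b.findings - a.findings || a.file.localeCompare(b.file))
    .slice(0, limit);
}

// Illustrative sample; the skipped entry's file name is made up.
const sample: Summary[] = [
  { file: 'memorandum.docx', status: 'diffs', findings: 3 },
  { file: 'sd-1741-paragraph-between-borders.docx', status: 'diffs', findings: 36 },
  { file: 'with-tables.docx', status: 'skipped', findings: 0 },
];

const ranked = worstDocs(sample);
for (const s of ranked) {
  console.log(`${String(s.findings).padStart(3)} × ${s.file}`);
}
```

With real data, the input would come from something like `JSON.parse(await readFile('test-corpus/.reports/_summary.json', 'utf8')).summaries`, using the script's default output directory.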

devtools/compare-rendering/src/cli.ts

Lines changed: 2 additions & 2 deletions
@@ -31,8 +31,8 @@ Options:
 -h, --help Show this help.

 Env:
-WORD_MCP_URL HTTP endpoint of the word-mcp worker.
-WORD_MCP_TOKEN Bearer token for the worker.
+WORD_API_URL Base URL of the word-api worker (e.g. https://word-mcp.superdoc.workers.dev).
+WORD_API_TOKEN Bearer token for the worker.

 Exit codes:
 0 — ran; findings are at most visible/cosmetic.

devtools/compare-rendering/src/word.ts

Lines changed: 42 additions & 55 deletions
@@ -5,71 +5,58 @@ import { hashFile, readCache, sha256, writeCache } from './cache.ts';

 const SCRIPT_PATH = fileURLToPath(new URL('./extract-layout.ps1', import.meta.url));

-type McpResponse = {
-  result?: { content?: Array<{ type: string; text?: string }> };
-  error?: { message: string };
+type JobEnvelope = {
+  id: string;
+  status: 'queued' | 'running' | 'succeeded' | 'failed';
+  result?: { output?: string } | null;
+  error?: { code: string; message: string } | null;
 };

-async function callWordMcp(command: string, timeoutSeconds = 240): Promise<string> {
-  const url = process.env.WORD_MCP_URL;
-  const token = process.env.WORD_MCP_TOKEN;
-  if (!url) throw new Error('WORD_MCP_URL not set');
-  if (!token) throw new Error('WORD_MCP_TOKEN not set');
-
-  const body = {
-    jsonrpc: '2.0',
-    id: 1,
-    method: 'tools/call',
-    params: {
-      name: 'run_powershell',
-      arguments: { command, timeout_seconds: timeoutSeconds },
-    },
-  };
-
-  const res = await fetch(url, {
+const POLL_INTERVAL_MS = 500;
+const POLL_BUFFER_MS = 30_000;
+
+async function runPowerShell(script: string, timeoutSeconds: number): Promise<string> {
+  const base = process.env.WORD_API_URL;
+  const token = process.env.WORD_API_TOKEN;
+  if (!base) throw new Error('WORD_API_URL not set');
+  if (!token) throw new Error('WORD_API_TOKEN not set');
+
+  const root = base.replace(/\/$/, '');
+  const authHeaders = { Authorization: `Bearer ${token}` } as const;
+
+  const res = await fetch(`${root}/v1/executions`, {
     method: 'POST',
-    headers: {
-      'Content-Type': 'application/json',
-      Accept: 'application/json, text/event-stream',
-      Authorization: `Bearer ${token}`,
-    },
-    body: JSON.stringify(body),
+    headers: { ...authHeaders, 'Content-Type': 'application/json' },
+    body: JSON.stringify({ script, timeout_seconds: timeoutSeconds }),
   });

   if (!res.ok) {
-    const errText = await res.text().catch((e) => `<body read failed: ${(e as Error).message}>`);
-    throw new Error(`word-mcp HTTP ${res.status}: ${errText.slice(0, 5000)}`);
+    const body = await res.text().catch((e) => `<body read failed: ${(e as Error).message}>`);
+    throw new Error(`word-api HTTP ${res.status}: ${body.slice(0, 5000)}`);
   }

-  const contentType = res.headers.get('content-type') ?? '';
-  const parsed = contentType.startsWith('text/event-stream')
-    ? parseSseResponse(await res.text())
-    : ((await res.json()) as McpResponse);
-
-  if (parsed.error) throw new Error(`word-mcp error: ${parsed.error.message}`);
-  const content = parsed.result?.content ?? [];
-  return content
-    .filter((c) => c.type === 'text' && typeof c.text === 'string')
-    .map((c) => c.text!)
-    .join('\n');
-}
+  let job = (await res.json()) as JobEnvelope;
+  const deadline = Date.now() + timeoutSeconds * 1000 + POLL_BUFFER_MS;

-function parseSseResponse(stream: string): McpResponse {
-  // SSE events are separated by blank lines; we want the last `data:` payload that parses as our JSON-RPC response.
-  const events = stream.split(/\r?\n\r?\n/);
-  for (let i = events.length - 1; i >= 0; i -= 1) {
-    const data = events[i]!.split(/\r?\n/)
-      .filter((line) => line.startsWith('data:'))
-      .map((line) => line.slice(5).replace(/^ /, ''))
-      .join('\n');
-    if (!data || data === '[DONE]') continue;
-    try {
-      return JSON.parse(data) as McpResponse;
-    } catch {
-      // try the previous event
+  while (job.status === 'queued' || job.status === 'running') {
+    if (Date.now() > deadline) {
+      throw new Error(`word-api job ${job.id} poll deadline exceeded (${job.status})`);
     }
+    await new Promise((r) => setTimeout(r, POLL_INTERVAL_MS));
+    const pollRes = await fetch(`${root}/v1/jobs/${job.id}`, { headers: authHeaders });
+    if (!pollRes.ok) {
+      const body = await pollRes.text().catch(() => '');
+      throw new Error(`word-api poll HTTP ${pollRes.status}: ${body.slice(0, 500)}`);
+    }
+    job = (await pollRes.json()) as JobEnvelope;
+  }
+
+  if (job.status !== 'succeeded') {
+    const code = job.error?.code ?? 'unknown';
+    const message = job.error?.message ?? 'no error message';
+    throw new Error(`word-api job ${job.id} ${job.status} (${code}): ${message}`);
   }
-  throw new Error(`word-mcp: no parseable SSE payload in:\n${stream.slice(0, 500)}`);
+  return job.result?.output ?? '';
 }

 function parseExtractionOutput(output: string): WordExtraction {
@@ -99,7 +86,7 @@ export async function extractWord(
   const b64 = docxBytes.toString('base64');
   const command = `$b64 = '${b64}'\n${psBody}`;

-  const output = await callWordMcp(command);
+  const output = await runPowerShell(command, 600);
   const extraction = parseExtractionOutput(output);

   if (useCache) await writeCache(docxSha, psSha, extraction);
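
To see how the outer deadline in `runPowerShell` bounds a stuck job, the loop can be replayed against a fake clock. This is a simulation sketch, not the commit's code: `simulatePolling` and its scripted status list are hypothetical, and the only values taken from the diff are the 500 ms interval, the 30 s buffer, and the check-deadline, sleep, poll ordering.

```typescript
type JobStatus = 'queued' | 'running' | 'succeeded' | 'failed';

type PollOutcome = {
  kind: 'settled' | 'deadline';
  status: JobStatus; // last status observed
  polls: number; // number of simulated GET /v1/jobs/:id calls
  elapsedMs: number; // fake-clock time when the loop stopped
};

const POLL_INTERVAL_MS = 500;
const POLL_BUFFER_MS = 30_000;

// `script[0]` stands in for the POST /v1/executions response and `script[i]` for the
// i-th poll; the final entry repeats forever, which models a stuck job.
function simulatePolling(script: JobStatus[], timeoutSeconds: number): PollOutcome {
  if (script.length === 0) throw new Error('script needs at least one status');
  const deadline = timeoutSeconds * 1000 + POLL_BUFFER_MS; // fake clock starts at 0
  let now = 0;
  let polls = 0;
  let status: JobStatus = script[0]!;

  while (status === 'queued' || status === 'running') {
    if (now > deadline) return { kind: 'deadline', status, polls, elapsedMs: now };
    now += POLL_INTERVAL_MS; // replaces the real setTimeout sleep
    polls += 1;
    status = script[Math.min(polls, script.length - 1)]!;
  }
  return { kind: 'settled', status, polls, elapsedMs: now };
}

// A job that finishes on the second poll, and one that never leaves 'running'.
const happy = simulatePolling(['queued', 'running', 'succeeded'], 600);
const stuck = simulatePolling(['running'], 1);
console.log(happy);
console.log(stuck);
```

With the 600 s timeout that `extractWord` passes, the same arithmetic puts the ceiling at roughly 630 s per document, plus at most one poll interval and request round trip.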
