refactor(compare-rendering): use word-api REST with async polling

caio-pizzol · caio-pizzol · commit 370bb8823e27 · 2026-04-21T21:23:00.000-03:00
Move from the MCP JSON-RPC envelope to word-api's REST `/v1/executions`
+ `/v1/jobs/:id` polling. Smaller, clearer error taxonomy, and aligned
with the direction the API is taking (async-first).

- word.ts shrinks by 13 lines: drop SSE parser, content-type dispatch,
  regex JSON fallback. Plain JSON envelope all the way.
- Poll every 500 ms with an outer deadline of `timeoutSeconds * 1000 +
  30_000` ms, so a stuck job can't pin a batch forever. extractWord now
  requests a 600 s timeout (the old call used the 240 s default).
- Cache key and short-circuit behavior unchanged.
- WORD_API_URL / WORD_API_TOKEN replace WORD_MCP_URL / WORD_MCP_TOKEN.
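
For reviewers, the polling bullet above can be sketched in isolation. This
is a minimal illustration, not the code in word.ts: `pollDeadline`,
`pollUntilSettled`, and the injected `getStatus` callback are hypothetical
names chosen so the loop runs without a network call.

```typescript
type JobStatus = 'queued' | 'running' | 'succeeded' | 'failed';

// Outer deadline in epoch ms: the job's own timeout plus a fixed buffer.
// The 30_000 ms default mirrors POLL_BUFFER_MS in the diff.
function pollDeadline(nowMs: number, timeoutSeconds: number, bufferMs = 30_000): number {
  return nowMs + timeoutSeconds * 1000 + bufferMs;
}

// Hypothetical helper: `getStatus` stands in for GET /v1/jobs/:id.
async function pollUntilSettled(
  getStatus: () => Promise<JobStatus>,
  timeoutSeconds: number,
  intervalMs = 500,
): Promise<JobStatus> {
  const deadline = pollDeadline(Date.now(), timeoutSeconds);
  let status = await getStatus();
  while (status === 'queued' || status === 'running') {
    // Give up once the outer deadline has passed, even if the job never settles.
    if (Date.now() > deadline) {
      throw new Error(`poll deadline exceeded (${status})`);
    }
    await new Promise((r) => setTimeout(r, intervalMs));
    status = await getStatus();
  }
  return status;
}
```

With the 600 s timeout that extractWord passes, the loop gives up 630 s
after it starts polling.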

Also ship scripts/batch.ts — the ad-hoc corpus sweep we used to
pressure-test the refactor, kept as a stepping stone to M2's proper
`--input-dir`. README milestones revised after M1 corpus-batch
insights: M2 is now baseline + delta reporting (agent-usable signal),
M3 the LLM screenshot judge (catches false negatives schema diff
cannot see, e.g. border-style rendering on sd-1741), with resolved
style fields and tables pushed to M4/M5.
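
To make the M2 direction concrete, here is a rough sketch of baseline +
delta reporting. Nothing below exists in the tool yet: the `Finding` shape
(beyond `category`, which batch.ts already reads), the `id` field, and
`diffAgainstBaseline` are assumptions for illustration only.

```typescript
// Hypothetical finding shape; only `category` is known from the reports.
type Finding = { category: string; id: string };

type Delta = { resolved: Finding[]; new: Finding[] };

// Compare current findings with a pinned baseline and keep only the changes.
function diffAgainstBaseline(baseline: Finding[], current: Finding[]): Delta {
  // Assumed identity key; a real implementation would need a stable one.
  const key = (f: Finding): string => `${f.category}:${f.id}`;
  const baseKeys = new Set(baseline.map(key));
  const currentKeys = new Set(current.map(key));
  return {
    // In the baseline but gone now: the change fixed these.
    resolved: baseline.filter((f) => !currentKeys.has(key(f))),
    // Present now but absent from the baseline: the change broke these.
    new: current.filter((f) => !baseKeys.has(key(f))),
  };
}
```

Findings present on both sides are dropped, which is what turns an absolute
list into "my change fixed N, broke M".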
diff --git a/devtools/compare-rendering/README.md b/devtools/compare-rendering/README.md
@@ -13,8 +13,8 @@ This is a dev tool, not a pass/fail test. It surfaces concrete divergences so yo
 ## Quick start
 
 ```bash
-export WORD_MCP_URL="https://word-mcp.superdoc.workers.dev/mcp"
-export WORD_MCP_TOKEN="<your-bearer-token>"
+export WORD_API_URL="https://word-mcp.superdoc.workers.dev"
+export WORD_API_TOKEN="<your-bearer-token>"
 
 pnpm compare-rendering -- \
   --input evals/fixtures/docs/memorandum.docx \
@@ -50,7 +50,7 @@ Example output (truncated):
 
 ```
 docx
- ├── word adapter (POST run_powershell to word-mcp worker) ─► word.json (cached)
+ ├── word adapter (POST /v1/executions to word-api)         ─► word.json (cached)
  └── superdoc adapter (spawn pnpm layout:export-one)        ─► sd.layout.json
                                 │
                         normalize both sides
@@ -69,8 +69,8 @@ docx
 
 | Variable         | Purpose                                              |
 |------------------|------------------------------------------------------|
-| `WORD_MCP_URL`   | HTTP endpoint of the word-mcp MCP worker             |
-| `WORD_MCP_TOKEN` | Bearer token (same one you use in your `.mcp.json`)  |
+| `WORD_API_URL`   | Base URL of the word-api worker                      |
+| `WORD_API_TOKEN` | Bearer token                                         |
 
 ## Exit codes
 
@@ -87,10 +87,25 @@ Makes it CI-usable later without rework.
 - Auto-fix generation.
 - Publishing as a package.
 
-## Milestones
+## Milestones (revised after M1 corpus-batch insights)
 
-- **M1** (this): CLI works end-to-end on paragraph-only docs. 3 categories. JSON + markdown output. Caching.
-- **M2**: Pull resolved style fields out of SuperDoc's block schema. Taxonomy extends to `style`, `indent`, `font`, `color`, `alignment`, `spacing`, `numbering`.
-- **M3**: Batch mode (`--input-dir`), nightly run against the paragraph-only subset of the corpus, per-category dashboard.
-- **M4**: MCP wrapper `compare_rendering(docx_path)`. Agent dogfood with ECMA-spec MCP in context.
-- **M5**: Table support. Non-trivial — needs parallel table walks on both sides.
+- **M1** ✅ — CLI works end-to-end on paragraph-only docs. 4 categories (`text`, `pagination`, `structure`, `unsupported`). JSON + markdown output. Word-extraction cache. Ad-hoc `scripts/batch.ts` runner for whole-corpus sweeps.
+- **M2** — Baseline + delta reporting. Snapshot findings against a pinned SuperDoc sha, emit only `resolved` / `new` since baseline. This is what makes the tool **agent-usable**: signal becomes "my change fixed N, broke M" instead of "here are 367 absolute findings." Pin a `main`-branch baseline at `test-corpus/.baseline.json`.
+- **M3** — LLM screenshot judge for docs where schema diff is silent or near-silent. Catches rendering divergences that don't surface in layout data at all (e.g. `w:val="wave"` border styles rendered as plain lines, font substitution, painter-level overflow).
+- **M4** — Populate `NormalizedParagraph.resolved` on SuperDoc side. Taxonomy extends to `style`, `indent`, `font`, `color`, `alignment`, `spacing`, `numbering`. Safe to add once M2 absorbs the "new field adds findings everywhere" noise.
+- **M5** — Table support. Non-trivial; needs parallel table walks on both sides.
+
+## Insights from M1 corpus batch (75 docs, April 2026)
+
+- **Pagination findings compound.** Many "N pagination findings" collapse to one underlying bug expressed N times. `memorandum.docx` (3 findings) and `sd-1741-paragraph-between-borders` (36 findings) share the same root cause — SuperDoc fits slightly more content per page than Word; drift accumulates across pages. One fix likely eliminates most findings at once.
+- **Schema diff has real false negatives.** `sd-1741` reports 0 text/style findings, but visually SuperDoc renders every border-between style (`wave`, `doubleWave`, `dashDotStroked`, `triple`, …) as a plain line while Word renders each correctly. Schema-level comparison will never catch this class without the M3 screenshot judge.
+- **~17 % of the corpus is in M1 scope.** 13 / 75 docs yield meaningful findings; the rest are short-circuited for tables/shapes/comments/revisions. Real-world DOCX coverage unlocks at M5 (tables).
+
+## Corpus sweep
+
+Ad-hoc batch runner at `scripts/batch.ts` — iterates every `.docx` under a directory, writes per-doc JSON reports plus a `_summary.json`, and prints a one-line status per doc. Graduates to a proper `--input-dir` flag in M2 alongside baseline support.
+
+```bash
+WORD_API_URL=... WORD_API_TOKEN=... \
+  bun devtools/compare-rendering/scripts/batch.ts test-corpus/rendering
+```
diff --git a/devtools/compare-rendering/scripts/batch.ts b/devtools/compare-rendering/scripts/batch.ts
@@ -0,0 +1,137 @@
+#!/usr/bin/env bun
+// Ad-hoc batch runner — iterates every .docx under a directory, runs compare-rendering,
+// and prints a summary. Ad-hoc scaffolding: a stepping stone to M2's --input-dir mode.
+
+import { readdir, readFile, writeFile, mkdir } from 'node:fs/promises';
+import { resolve, join } from 'node:path';
+import { spawn } from 'node:child_process';
+
+const INPUT_DIR = resolve(process.argv[2] ?? 'test-corpus/rendering');
+const OUTPUT_DIR = resolve(process.argv[3] ?? 'test-corpus/.reports');
+const CLI = resolve(new URL('../src/cli.ts', import.meta.url).pathname);
+
+await mkdir(OUTPUT_DIR, { recursive: true });
+
+const files = (await readdir(INPUT_DIR)).filter((f) => f.endsWith('.docx')).sort();
+
+console.log(`[batch] ${files.length} docs under ${INPUT_DIR}\n`);
+
+type Summary = {
+  file: string;
+  status: 'match' | 'diffs' | 'skipped' | 'error';
+  supported: boolean;
+  unsupportedReason?: string;
+  findings: number;
+  byCategory: Record<string, number>;
+  durationMs: number;
+  note?: string;
+};
+
+const summaries: Summary[] = [];
+const start = Date.now();
+
+for (let i = 0; i < files.length; i += 1) {
+  const f = files[i]!;
+  const input = join(INPUT_DIR, f);
+  const output = join(OUTPUT_DIR, f.replace(/\.docx$/, '.json'));
+  const t0 = Date.now();
+
+  const status = await runOne(input, output);
+  const ms = Date.now() - t0;
+
+  try {
+    const raw = JSON.parse(await readFile(output, 'utf8'));
+    const byCategory: Record<string, number> = {};
+    for (const finding of raw.findings ?? []) {
+      byCategory[finding.category] = (byCategory[finding.category] ?? 0) + 1;
+    }
+    const supported = raw.wordSupported === true;
+    summaries.push({
+      file: f,
+      status: !supported ? 'skipped' : (raw.findings ?? []).length === 0 ? 'match' : 'diffs',
+      supported,
+      unsupportedReason: raw.unsupportedReason,
+      findings: (raw.findings ?? []).length,
+      byCategory,
+      durationMs: ms,
+    });
+  } catch {
+    summaries.push({
+      file: f,
+      status: 'error',
+      supported: false,
+      findings: 0,
+      byCategory: {},
+      durationMs: ms,
+      note: `CLI exit ${status}`,
+    });
+  }
+
+  const last = summaries[summaries.length - 1]!;
+  const marker = last.status === 'match' ? '✓' : last.status === 'diffs' ? '⚠' : last.status === 'skipped' ? '–' : '✗';
+  const tail =
+    last.status === 'skipped'
+      ? `skipped: ${last.unsupportedReason}`
+      : last.status === 'diffs'
+        ? `${last.findings} finding(s) ${JSON.stringify(last.byCategory)}`
+        : last.status === 'match'
+          ? 'match'
+          : (last.note ?? 'error');
+  console.log(`[${i + 1}/${files.length}] ${marker} ${f} (${ms}ms) — ${tail}`);
+}
+
+const totalMs = Date.now() - start;
+
+console.log(`\n[batch] done in ${(totalMs / 1000).toFixed(1)}s`);
+
+const counts = { match: 0, diffs: 0, skipped: 0, error: 0 };
+for (const s of summaries) counts[s.status] += 1;
+console.log(`\nOverall:`);
+console.log(`  match:   ${counts.match}`);
+console.log(`  diffs:   ${counts.diffs}`);
+console.log(`  skipped: ${counts.skipped}`);
+console.log(`  error:   ${counts.error}`);
+
+const reasons: Record<string, number> = {};
+for (const s of summaries) {
+  if (s.status === 'skipped' && s.unsupportedReason) {
+    const key = s.unsupportedReason.replace(/\s*\(\d+\)$/, '');
+    reasons[key] = (reasons[key] ?? 0) + 1;
+  }
+}
+if (Object.keys(reasons).length) {
+  console.log(`\nSkip reasons:`);
+  for (const [k, v] of Object.entries(reasons).sort((a, b) => b[1] - a[1])) {
+    console.log(`  ${v.toString().padStart(3)} × ${k}`);
+  }
+}
+
+const categories: Record<string, number> = {};
+for (const s of summaries) {
+  for (const [cat, count] of Object.entries(s.byCategory)) {
+    categories[cat] = (categories[cat] ?? 0) + count;
+  }
+}
+if (Object.keys(categories).length) {
+  console.log(`\nFindings by category (across docs with diffs):`);
+  for (const [k, v] of Object.entries(categories).sort((a, b) => b[1] - a[1])) {
+    console.log(`  ${v.toString().padStart(3)} × ${k}`);
+  }
+}
+
+const summaryPath = join(OUTPUT_DIR, '_summary.json');
+await writeFile(summaryPath, JSON.stringify({ totalMs, counts, reasons, categories, summaries }, null, 2));
+console.log(`\n[batch] summary written to ${summaryPath}`);
+
+function runOne(input: string, output: string): Promise<number> {
+  return new Promise((resolve) => {
+    const child = spawn('bun', [CLI, '--input', input, '--output', output, '--format', 'json'], {
+      stdio: ['ignore', 'pipe', 'pipe'],
+      env: process.env,
+    });
+    // Drain stdio to avoid backpressure; we don't print the CLI's own logs.
+    child.stdout?.on('data', () => {});
+    child.stderr?.on('data', () => {});
+    child.on('close', (code) => resolve(code ?? 0));
+  });
+}
diff --git a/devtools/compare-rendering/src/cli.ts b/devtools/compare-rendering/src/cli.ts
@@ -31,8 +31,8 @@ Options:
   -h, --help                 Show this help.
 
 Env:
-  WORD_MCP_URL               HTTP endpoint of the word-mcp worker.
-  WORD_MCP_TOKEN             Bearer token for the worker.
+  WORD_API_URL               Base URL of the word-api worker (e.g. https://word-mcp.superdoc.workers.dev).
+  WORD_API_TOKEN             Bearer token for the worker.
 
 Exit codes:
   0  — ran; findings are at most visible/cosmetic.
diff --git a/devtools/compare-rendering/src/word.ts b/devtools/compare-rendering/src/word.ts
@@ -5,71 +5,58 @@ import { hashFile, readCache, sha256, writeCache } from './cache.ts';
 
 const SCRIPT_PATH = fileURLToPath(new URL('./extract-layout.ps1', import.meta.url));
 
-type McpResponse = {
-  result?: { content?: Array<{ type: string; text?: string }> };
-  error?: { message: string };
+type JobEnvelope = {
+  id: string;
+  status: 'queued' | 'running' | 'succeeded' | 'failed';
+  result?: { output?: string } | null;
+  error?: { code: string; message: string } | null;
 };
 
-async function callWordMcp(command: string, timeoutSeconds = 240): Promise<string> {
-  const url = process.env.WORD_MCP_URL;
-  const token = process.env.WORD_MCP_TOKEN;
-  if (!url) throw new Error('WORD_MCP_URL not set');
-  if (!token) throw new Error('WORD_MCP_TOKEN not set');
-
-  const body = {
-    jsonrpc: '2.0',
-    id: 1,
-    method: 'tools/call',
-    params: {
-      name: 'run_powershell',
-      arguments: { command, timeout_seconds: timeoutSeconds },
-    },
-  };
-
-  const res = await fetch(url, {
+const POLL_INTERVAL_MS = 500;
+const POLL_BUFFER_MS = 30_000;
+
+async function runPowerShell(script: string, timeoutSeconds: number): Promise<string> {
+  const base = process.env.WORD_API_URL;
+  const token = process.env.WORD_API_TOKEN;
+  if (!base) throw new Error('WORD_API_URL not set');
+  if (!token) throw new Error('WORD_API_TOKEN not set');
+
+  const root = base.replace(/\/$/, '');
+  const authHeaders = { Authorization: `Bearer ${token}` } as const;
+
+  const res = await fetch(`${root}/v1/executions`, {
     method: 'POST',
-    headers: {
-      'Content-Type': 'application/json',
-      Accept: 'application/json, text/event-stream',
-      Authorization: `Bearer ${token}`,
-    },
-    body: JSON.stringify(body),
+    headers: { ...authHeaders, 'Content-Type': 'application/json' },
+    body: JSON.stringify({ script, timeout_seconds: timeoutSeconds }),
   });
 
   if (!res.ok) {
-    const errText = await res.text().catch((e) => `<body read failed: ${(e as Error).message}>`);
-    throw new Error(`word-mcp HTTP ${res.status}: ${errText.slice(0, 5000)}`);
+    const body = await res.text().catch((e) => `<body read failed: ${(e as Error).message}>`);
+    throw new Error(`word-api HTTP ${res.status}: ${body.slice(0, 5000)}`);
   }
 
-  const contentType = res.headers.get('content-type') ?? '';
-  const parsed = contentType.startsWith('text/event-stream')
-    ? parseSseResponse(await res.text())
-    : ((await res.json()) as McpResponse);
-
-  if (parsed.error) throw new Error(`word-mcp error: ${parsed.error.message}`);
-  const content = parsed.result?.content ?? [];
-  return content
-    .filter((c) => c.type === 'text' && typeof c.text === 'string')
-    .map((c) => c.text!)
-    .join('\n');
-}
+  let job = (await res.json()) as JobEnvelope;
+  const deadline = Date.now() + timeoutSeconds * 1000 + POLL_BUFFER_MS;
 
-function parseSseResponse(stream: string): McpResponse {
-  // SSE events are separated by blank lines; we want the last `data:` payload that parses as our JSON-RPC response.
-  const events = stream.split(/\r?\n\r?\n/);
-  for (let i = events.length - 1; i >= 0; i -= 1) {
-    const data = events[i]!.split(/\r?\n/)
-      .filter((line) => line.startsWith('data:'))
-      .map((line) => line.slice(5).replace(/^ /, ''))
-      .join('\n');
-    if (!data || data === '[DONE]') continue;
-    try {
-      return JSON.parse(data) as McpResponse;
-    } catch {
-      // try the previous event
+  while (job.status === 'queued' || job.status === 'running') {
+    if (Date.now() > deadline) {
+      throw new Error(`word-api job ${job.id} poll deadline exceeded (${job.status})`);
     }
+    await new Promise((r) => setTimeout(r, POLL_INTERVAL_MS));
+    const pollRes = await fetch(`${root}/v1/jobs/${job.id}`, { headers: authHeaders });
+    if (!pollRes.ok) {
+      const body = await pollRes.text().catch(() => '');
+      throw new Error(`word-api poll HTTP ${pollRes.status}: ${body.slice(0, 500)}`);
+    }
+    job = (await pollRes.json()) as JobEnvelope;
+  }
+
+  if (job.status !== 'succeeded') {
+    const code = job.error?.code ?? 'unknown';
+    const message = job.error?.message ?? 'no error message';
+    throw new Error(`word-api job ${job.id} ${job.status} (${code}): ${message}`);
   }
-  throw new Error(`word-mcp: no parseable SSE payload in:\n${stream.slice(0, 500)}`);
+  return job.result?.output ?? '';
 }
 
 function parseExtractionOutput(output: string): WordExtraction {
@@ -99,7 +86,7 @@ export async function extractWord(
   const b64 = docxBytes.toString('base64');
   const command = `$b64 = '${b64}'\n${psBody}`;
 
-  const output = await callWordMcp(command);
+  const output = await runPowerShell(command, 600);
   const extraction = parseExtractionOutput(output);
 
   if (useCache) await writeCache(docxSha, psSha, extraction);