fix(decompose): address 4 Major review findings on PR OpenCoworkAI#241

HomenShum · HomenShum · commit 01c1e0d7b27e · 2026-05-01T14:11:19.000-07:00

1. Decompose loop success now triggers on first clean pass.

   The prompt previously required BOTH verifiers to return {verified, needs_review}, but the deterministic verifier only emits {ok, needs_iteration}. Forced an unnecessary extra iteration even when deterministic parity already passed.

   Fix: prompt now uses each verifier's actual vocabulary:
     success := deterministic.status === 'ok' && visual.status ∈ {verified, needs_review, unavailable}
   Updated both EN and ZH prompts in decomposePrompt.ts.

2. Visual verifier now actually has a source image at runtime.

   `verify_ui_kit_visual_parity({slug})` defaults to `source.png`, but `createRuntimeTextEditorFs` only seeded `index.html` + frames + skills from FRAME_TEMPLATES + DESIGN_SKILLS. Image attachments lived in `promptContext.attachments` but were never persisted to the agent's virtual FS. The visual judge silently degraded to `unavailable` on every normal run.

   Fix: `createRuntimeTextEditorFs` now accepts `sourceAttachments` and seeds `source.png` from the first image attachment's `imageDataUrl`. The runtime call site at runGenerate threads `input.attachments` through.

3. Judge/render failures now fall back to structured `unavailable`.

   `renderUiKit()` (Playwright) and `judgeVisualParity()` (vision LLM) were awaited without try/catch. Empty/non-JSON judge replies threw, text-only models threw, headless render crashes threw; all of them bubbled up and broke the agent loop instead of returning the documented `status: 'unavailable'` path.

   Fix: wrap both awaits in try/catch returning `unavailableReport()` with the underlying error message. Logged at info level for trace visibility.

4. Changeset no longer claims `Closes OpenCoworkAI#225`.

   PR template says use `Closes` only for fully resolved issues. This diff stops at emitting a `ui_kits/<slug>/` handoff bundle and explicitly tells the agent NOT to continue into the prototype flow. Phase 2 (cross-page flows, state machines, prototype orchestration) is separate work.

   Fix: changeset now says `Refs OpenCoworkAI#225 (Phase 1 of …)` and notes Phase 2 is tracked separately.

Verification:
- npx tsc --noEmit -p packages/core
- npx tsc --noEmit -p apps/desktop
- both clean (0 errors)

diff --git a/.changeset/decompose-to-ui-kit.md b/.changeset/decompose-to-ui-kit.md
@@ -6,4 +6,4 @@
 
 Add **Decompose to UI Kit** — one-click in the chat sidebar emits a `ui_kits/<slug>/` folder shaped for coding-agent handoff (`index.html` + `components/*.tsx` + `tokens.css` + `manifest.json` + `README.md`). Built-in deterministic + vision verifiers self-check parity using a 12-question boolean rubric (`parityScore = passCount / totalChecks`, no LLM-fabricated floats) and re-iterate on gaps. Per-decompose cost surfaces inline as a toast.
 
-Closes Phase 1 of #225.
+Refs #225 (Phase 1 of the requested image → componentization → prototype workflow). Phase 2 (cross-page flows, state machines, prototype orchestration) is tracked separately.
diff --git a/apps/desktop/src/main/index.ts b/apps/desktop/src/main/index.ts
@@ -280,6 +280,15 @@ interface CreateRuntimeTextEditorFsOptions {
   previousHtml: string | null;
   sendEvent: (event: AgentStreamEvent) => void;
   logger: Pick<CoreLogger, 'error'>;
+  /**
+   * Image attachments from `preparePromptContext`. The first image (if any) is
+   * persisted into the agent's virtual FS as `source.png` so that
+   * `verify_ui_kit_visual_parity({slug})` can read it via its default
+   * `sourceImagePath`. Without this, the visual judge silently degrades to
+   * `status: 'unavailable'` even when the host has wired up the judge
+   * callback (review finding #2 on PR #241).
+   */
+  sourceAttachments?: ReadonlyArray<{ imageDataUrl?: string }>;
 }
 
 export function createRuntimeTextEditorFs({
@@ -289,6 +298,7 @@ export function createRuntimeTextEditorFs({
   previousHtml,
   sendEvent,
   logger,
+  sourceAttachments,
 }: CreateRuntimeTextEditorFsOptions) {
   const baseCtx = { designId: designId ?? '', generationId } as const;
   const fsMap = new Map<string, string>();
@@ -301,6 +311,13 @@ export function createRuntimeTextEditorFs({
   for (const [name, content] of DESIGN_SKILLS) {
     fsMap.set(`skills/${name}`, content);
   }
+  // Seed source.png from the first image attachment so the visual verifier
+  // can read it via its default `sourceImagePath: 'source.png'`. Stored as a
+  // data URL to match `verify_ui_kit_visual_parity`'s expected format.
+  const firstSourceImage = sourceAttachments?.find((a) => Boolean(a.imageDataUrl));
+  if (firstSourceImage?.imageDataUrl) {
+    fsMap.set('source.png', firstSourceImage.imageDataUrl);
+  }
 
   function emitFsUpdated(filePath: string, content: string): void {
     if (designId === null) return;
@@ -510,6 +527,9 @@ function registerIpcHandlers(db: Database | null): void {
       logger: logIpc,
       previousHtml,
       sendEvent,
+      // Pipe image attachments through so `source.png` is seeded for
+      // verify_ui_kit_visual_parity (PR #241 review fix #2).
+      sourceAttachments: input.attachments,
     });
     const cfg = getCachedConfig();
     const imageConfig = cfg ? resolveImageGenerationConfig(cfg) : null;
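
The seeding step in the hunks above can be exercised in isolation. The following is a minimal sketch, not the project's code: `SourceAttachment` and `seedSourceImage` are hypothetical names introduced for illustration, and only the `find` + `fsMap.set('source.png', …)` logic is taken from the diff.

```typescript
// Hypothetical stand-in for the attachment shape used in the diff.
interface SourceAttachment {
  imageDataUrl?: string;
}

// Hypothetical helper: seeds `source.png` from the first attachment that
// carries a non-empty data URL, and leaves the map untouched otherwise.
function seedSourceImage(
  fsMap: Map<string, string>,
  sourceAttachments?: ReadonlyArray<SourceAttachment>,
): Map<string, string> {
  const first = sourceAttachments?.find((a) => Boolean(a.imageDataUrl));
  if (first?.imageDataUrl) {
    fsMap.set('source.png', first.imageDataUrl);
  }
  return fsMap;
}

// Attachments without an image are skipped; the first image wins.
const seeded = seedSourceImage(new Map<string, string>(), [
  {},
  { imageDataUrl: 'data:image/png;base64,AAAA' },
  { imageDataUrl: 'data:image/jpeg;base64,BBBB' },
]);

// No attachments at all: nothing is seeded, so the verifier would still
// take its `unavailable` path.
const unseeded = seedSourceImage(new Map<string, string>(), undefined);
```

Note that the key stays `source.png` whatever the image's actual format. In the verifier diff the media type is derived from the file content via `parseMediaType(sourceFile.content)`, so the file name presumably does not need to match.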
diff --git a/apps/desktop/src/renderer/src/hooks/decomposePrompt.ts b/apps/desktop/src/renderer/src/hooks/decomposePrompt.ts
@@ -43,9 +43,9 @@ export const DECOMPOSE_PROMPT_ZH = `把刚才那个设计拆成一个 ui_kits/<s
 5. 调 verify_ui_kit_visual_parity({slug}) 拿视觉判定 (vision LLM judge, 12 个 boolean check)
    - 如果返回 status="unavailable", host 没接 judge callback, 跳过这一步用 step 4 的结果做决定
    - 如果返回了, 看 checks[].passed + reason, 失败的 check 就是要修的点
-6. 综合两份 report:
-   - 两个都 status ∈ {verified, needs_review} (12/12 或 11/12 个 check 过): 直接调 done
-   - 任一为 needs_iteration / failed: 把两边的 gaps 合并去重 + 失败 check 的 reason 一起作为反馈, 重新调一次 decompose_to_ui_kit
+6. 综合两份 report (注意: 两个 verifier 的 status 词汇不同):
+   - 成功条件: deterministic.status === 'ok' 且 visual.status ∈ {verified, needs_review, unavailable} → 直接调 done
+   - 任一失败: deterministic.status === 'needs_iteration' 或 visual.status ∈ {needs_iteration, failed} → 把两边的 gaps 合并去重 + 失败 check 的 reason 一起作为反馈, 重新调一次 decompose_to_ui_kit
 7. 最多迭代两轮. 第二轮验证完不管 score 多少都调 done.
 8. done 的 summary 必须诚实写出:
    - 结构化 verifier 的 passCount/totalChecks + status
@@ -68,9 +68,9 @@ export const DECOMPOSE_PROMPT_EN = `Decompose the design you just produced into
 5. Call verify_ui_kit_visual_parity({slug}) — vision-LLM judge with the 12 standard boolean checks (layout / color / typography / content / components dimensions). Each check is yes/no with a reason. parityScore = passCount/12 (derived deterministically).
    - If it returns status="unavailable", the host hasn't injected the judge callback. Proceed with step 4's deterministic report alone.
    - If it returns successfully, read each checks[].passed + reason. Failed checks are the things to fix.
-6. Reconcile both reports:
-   - Both status ∈ {verified, needs_review} (12/12 or 11/12 checks passed): call done
-   - Either status === 'needs_iteration' or 'failed': merge + dedup gaps from both reports + the failed checks' reasons, re-call decompose_to_ui_kit addressing them
+6. Reconcile both reports (NOTE: the two verifiers use DIFFERENT status vocabularies):
+   - Success: deterministic.status === 'ok' AND visual.status ∈ {verified, needs_review, unavailable} → call done
+   - Iterate: deterministic.status === 'needs_iteration' OR visual.status ∈ {needs_iteration, failed} → merge + dedup gaps from both reports + the failed checks' reasons, re-call decompose_to_ui_kit addressing them
 7. Iterate at most TWICE. After the second verify, call done regardless of score.
 8. The done summary MUST honestly report:
    - deterministic verifier passCount/totalChecks + status
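
The reconcile rule in step 6 is stated in prose for the agent, but it can also be written as a small predicate. This is a sketch under assumptions: the type names, `reconcile`, and `MAX_VERIFY_ROUNDS` are hypothetical, the status values are the ones listed in the prompt, and the round cap reflects one reading of step 7 ("after the second verify, call done regardless of score").

```typescript
// Status vocabularies as listed in the prompt text; the type names are hypothetical.
type DeterministicStatus = 'ok' | 'needs_iteration';
type VisualStatus =
  | 'verified'
  | 'needs_review'
  | 'unavailable'
  | 'needs_iteration'
  | 'failed';
type Decision = 'done' | 'iterate';

// Assumption: step 7 is read as "the second verify round always ends the loop".
const MAX_VERIFY_ROUNDS = 2;

// Visual statuses that do not block success. `unavailable` counts as
// non-blocking so a missing judge never forces an extra iteration.
const VISUAL_PASS: ReadonlySet<VisualStatus> = new Set<VisualStatus>([
  'verified',
  'needs_review',
  'unavailable',
]);

// verifyRound is 1-based: 1 for the first verify pass, 2 for the second.
function reconcile(
  deterministic: DeterministicStatus,
  visual: VisualStatus,
  verifyRound: number,
): Decision {
  const success = deterministic === 'ok' && VISUAL_PASS.has(visual);
  if (success || verifyRound >= MAX_VERIFY_ROUNDS) {
    return 'done';
  }
  return 'iterate';
}
```

Under the earlier wording, a deterministic `ok` could never match {verified, needs_review}, so the first round always fell through to another iteration. With the predicate above, `reconcile('ok', 'verified', 1)` returns `'done'`.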
diff --git a/packages/core/src/tools/verify-ui-kit-visual-parity.ts b/packages/core/src/tools/verify-ui-kit-visual-parity.ts
@@ -308,11 +308,27 @@ export function makeVerifyUiKitVisualParityTool(
         mediaType: parseMediaType(sourceFile.content),
       };
 
-      logger.info('[verify_ui_kit_visual_parity] step=render', { slug: params.slug });
-      const candidateImg = await renderUiKit(decomposed.content, signal);
-
-      logger.info('[verify_ui_kit_visual_parity] step=judge', { slug: params.slug });
-      const judgeResult = await judgeVisualParity(sourceImg, candidateImg, signal);
+      // Render + judge are external best-effort calls (Playwright headless +
+      // vision-LLM). If either throws (text-only model, malformed JSON,
+      // headless render crash, abort), we degrade to `unavailable` instead
+      // of bubbling the error and breaking the agent loop. This matches the
+      // tool's documented contract — review fix #3 on PR #241.
+      let candidateImg: VisualParityImageRef;
+      let judgeResult: Awaited<ReturnType<typeof judgeVisualParity>>;
+      try {
+        logger.info('[verify_ui_kit_visual_parity] step=render', { slug: params.slug });
+        candidateImg = await renderUiKit(decomposed.content, signal);
+        logger.info('[verify_ui_kit_visual_parity] step=judge', { slug: params.slug });
+        judgeResult = await judgeVisualParity(sourceImg, candidateImg, signal);
+      } catch (error) {
+        const message = error instanceof Error ? error.message : String(error);
+        logger.info('[verify_ui_kit_visual_parity] step=unavailable', {
+          slug: params.slug,
+          reason: message,
+        });
+        const report = unavailableReport(`render or judge failed: ${message}`);
+        return { content: [{ type: 'text', text: report.summary }], details: report };
+      }
 
       const checks = normalizeChecks(judgeResult.checks ?? []);
       const passCount = checks.filter((c) => c.passed).length;
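
The try/catch in this hunk follows a general "degrade to a structured result" pattern, sketched below outside the tool. Everything here is illustrative: `UnavailableReport`, `toUnavailable`, and `withUnavailableFallback` are hypothetical names, and the report shape is an assumption that only mirrors the `status` and `summary` fields visible in the diff.

```typescript
// Hypothetical report shape; the real `unavailableReport()` may carry more fields.
interface UnavailableReport {
  status: 'unavailable';
  summary: string;
}

// Normalizes any thrown value (Error or not) into a structured report,
// using the same message extraction as the diff.
function toUnavailable(error: unknown): UnavailableReport {
  const message = error instanceof Error ? error.message : String(error);
  return {
    status: 'unavailable',
    summary: `visual parity unavailable: render or judge failed: ${message}`,
  };
}

// Runs a best-effort async step and converts any rejection into a report
// instead of letting it propagate to the caller's loop.
async function withUnavailableFallback<T>(
  step: () => Promise<T>,
): Promise<T | UnavailableReport> {
  try {
    return await step();
  } catch (error) {
    return toUnavailable(error);
  }
}

// A step that resolves passes through unchanged.
const okRun = withUnavailableFallback(async () => ({ checks: 12 }));

// A step that rejects (for example, an empty judge reply) becomes a report.
const failedRun = withUnavailableFallback<{ checks: number }>(async () => {
  throw new Error('empty judge reply');
});
```

One consequence of catching everything, which the hunk's own comment acknowledges, is that an abort is reported as `unavailable` too. A caller that needs to tell cancellation apart from a real failure would have to inspect the abort signal separately.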
