fix(decompose): address 4 Major review findings on PR #241

HomenShum · Sun-sunshine06 · commit 55e5cfc4b629 · 2026-06-06T18:53:55.000+08:00
1. Decompose loop success now triggers on the first clean pass.

   The prompt previously required BOTH verifiers to return a status in
   {verified, needs_review}, but the deterministic verifier only emits
   {ok, needs_iteration}. The success branch could therefore never match,
   which forced an unnecessary extra iteration even when deterministic
   parity had already passed.

   Fix: the prompt now uses each verifier's actual vocabulary:

     success := deterministic.status === 'ok'
                && visual.status ∈ {verified, needs_review, unavailable}

   Updated both the EN and ZH prompts in decomposePrompt.ts.

2. The visual verifier now actually has a source image at runtime.

   `verify_ui_kit_visual_parity({slug})` defaults to `source.png`, but
   `createRuntimeTextEditorFs` only seeded `index.html`, frames, and skills
   from FRAME_TEMPLATES and DESIGN_SKILLS. Image attachments lived in
   `promptContext.attachments` but were never persisted to the agent's
   virtual FS, so the visual judge silently degraded to `unavailable` on
   every normal run.

   Fix: `createRuntimeTextEditorFs` now accepts `sourceAttachments` and
   seeds `source.png` from the first image attachment's `imageDataUrl`
   (unless `source.png` already holds a `data:` URL). The runtime call site
   in runGenerate threads `input.attachments` through.

3. Render and judge failures now fall back to a structured `unavailable`.

   `renderUiKit()` (Playwright) and `judgeVisualParity()` (vision LLM) were
   awaited without try/catch. Empty or non-JSON judge replies, text-only
   models, and headless render crashes all threw; the errors bubbled up and
   broke the agent loop instead of taking the documented
   `status: 'unavailable'` path.

   Fix: wrap both awaits in a try/catch that returns `unavailableReport()`
   with the underlying error message. The failure is logged at info level
   for trace visibility.

4. The changeset no longer claims `Closes #225`.

   The PR template says to use `Closes` only for fully resolved issues. This
   diff stops at emitting a `ui_kits/<slug>/` handoff bundle and explicitly
   tells the agent NOT to continue into the prototype flow; Phase 2
   (cross-page flows, state machines, prototype orchestration) is separate
   work.

   Fix: the changeset now says `Refs #225 (Phase 1 of …)` and notes that
   Phase 2 is tracked separately.

Verification:
- npx tsc --noEmit -p packages/core
- npx tsc --noEmit -p apps/desktop
Both clean (0 errors).
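The step-6 reconcile rule from fix #1 can be expressed as a pure function. This is a minimal sketch, not code from the PR: the status unions are assumed from the vocabularies named in the commit message, and `reconcile` / `nextAction` are hypothetical names. The two-pass cap follows step 7 of the prompt ("After the second verify, call done regardless of score").

```typescript
// Assumed status vocabularies, taken from the commit message; the real type
// names in packages/core may differ.
type DeterministicStatus = 'ok' | 'needs_iteration';
type VisualStatus = 'verified' | 'needs_review' | 'needs_iteration' | 'failed' | 'unavailable';
type Decision = 'done' | 'iterate';

// Visual outcomes that do not block success. `unavailable` counts as a pass
// because it means "no judge result", not "the judge found gaps".
const VISUAL_NON_BLOCKING: ReadonlySet<VisualStatus> = new Set<VisualStatus>([
  'verified',
  'needs_review',
  'unavailable',
]);

// Step 6: finish only when deterministic parity is 'ok' and the visual
// verifier did not report gaps; otherwise merge gaps and re-decompose.
function reconcile(deterministic: DeterministicStatus, visual: VisualStatus): Decision {
  return deterministic === 'ok' && VISUAL_NON_BLOCKING.has(visual) ? 'done' : 'iterate';
}

// Step 7: after the second verify pass, call done regardless of score.
// `verifyPasses` is the number of completed verify rounds (1-based).
function nextAction(
  deterministic: DeterministicStatus,
  visual: VisualStatus,
  verifyPasses: number,
  maxPasses = 2,
): Decision {
  if (verifyPasses >= maxPasses) return 'done';
  return reconcile(deterministic, visual);
}
```

Under the old rule, a deterministic `'ok'` never appeared in {verified, needs_review}, so no input could reach `'done'` on the first pass; with per-verifier vocabularies, `reconcile('ok', 'verified')` finishes immediately.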
diff --git a/.changeset/decompose-to-ui-kit.md b/.changeset/decompose-to-ui-kit.md
@@ -6,4 +6,4 @@
 
 Add **Decompose to UI Kit** — one-click in the chat sidebar emits a `ui_kits/<slug>/` folder shaped for coding-agent handoff (`index.html` + `components/*.tsx` + `tokens.css` + `manifest.json` + `README.md`). Built-in deterministic + vision verifiers self-check parity using a 12-question boolean rubric (`parityScore = passCount / totalChecks`, no LLM-fabricated floats) and re-iterate on gaps. Per-decompose cost surfaces inline as a toast.
 
-Closes Phase 1 of #225.
+Refs #225 (Phase 1 of the requested image → componentization → prototype workflow). Phase 2 (cross-page flows, state machines, prototype orchestration) is tracked separately.
diff --git a/apps/desktop/src/main/ipc/runtime-fs.ts b/apps/desktop/src/main/ipc/runtime-fs.ts
@@ -99,6 +99,13 @@ function attachmentViewContent(attachment: AttachmentContext): string | null {
   ].join('\n');
 }
 
+function firstImageAttachmentDataUrl(attachments: ReadonlyArray<AttachmentContext>): string | null {
+  return (
+    attachments.find((attachment) => attachment.mediaType?.startsWith('image/') === true)
+      ?.imageDataUrl ?? null
+  );
+}
+
 export function createRuntimeTextEditorFs({
   db,
   generationId,
@@ -127,6 +134,11 @@ export function createRuntimeTextEditorFs({
     if (content === null) continue;
     fsMap.set(normalizeDesignFilePath(attachment.path), content);
   }
+  const sourceImageDataUrl = firstImageAttachmentDataUrl(attachments);
+  const currentSourcePng = fsMap.get('source.png');
+  if (sourceImageDataUrl !== null && currentSourcePng?.startsWith('data:') !== true) {
+    fsMap.set('source.png', sourceImageDataUrl);
+  }
   if (
     previousSource &&
     previousSource.trim().length > 0 &&
diff --git a/apps/desktop/src/renderer/src/hooks/decomposePrompt.ts b/apps/desktop/src/renderer/src/hooks/decomposePrompt.ts
@@ -43,9 +43,9 @@ export const DECOMPOSE_PROMPT_ZH = `把刚才那个设计拆成一个 ui_kits/<s
 5. 调 verify_ui_kit_visual_parity({slug}) 拿视觉判定 (vision LLM judge, 12 个 boolean check)
    - 如果返回 status="unavailable", host 没接 judge callback, 跳过这一步用 step 4 的结果做决定
    - 如果返回了, 看 checks[].passed + reason, 失败的 check 就是要修的点
-6. 综合两份 report:
-   - 两个都 status ∈ {verified, needs_review} (12/12 或 11/12 个 check 过): 直接调 done
-   - 任一为 needs_iteration / failed: 把两边的 gaps 合并去重 + 失败 check 的 reason 一起作为反馈, 重新调一次 decompose_to_ui_kit
+6. 综合两份 report (注意: 两个 verifier 的 status 词汇不同):
+   - 成功条件: deterministic.status === 'ok' 且 visual.status ∈ {verified, needs_review, unavailable} → 直接调 done
+   - 任一失败: deterministic.status === 'needs_iteration' 或 visual.status ∈ {needs_iteration, failed} → 把两边的 gaps 合并去重 + 失败 check 的 reason 一起作为反馈, 重新调一次 decompose_to_ui_kit
 7. 最多迭代两轮. 第二轮验证完不管 score 多少都调 done.
 8. done 的 summary 必须诚实写出:
    - 结构化 verifier 的 passCount/totalChecks + status
@@ -68,9 +68,9 @@ export const DECOMPOSE_PROMPT_EN = `Decompose the design you just produced into
 5. Call verify_ui_kit_visual_parity({slug}) — vision-LLM judge with the 12 standard boolean checks (layout / color / typography / content / components dimensions). Each check is yes/no with a reason. parityScore = passCount/12 (derived deterministically).
    - If it returns status="unavailable", the host hasn't injected the judge callback. Proceed with step 4's deterministic report alone.
    - If it returns successfully, read each checks[].passed + reason. Failed checks are the things to fix.
-6. Reconcile both reports:
-   - Both status ∈ {verified, needs_review} (12/12 or 11/12 checks passed): call done
-   - Either status === 'needs_iteration' or 'failed': merge + dedup gaps from both reports + the failed checks' reasons, re-call decompose_to_ui_kit addressing them
+6. Reconcile both reports (NOTE: the two verifiers use DIFFERENT status vocabularies):
+   - Success: deterministic.status === 'ok' AND visual.status ∈ {verified, needs_review, unavailable} → call done
+   - Iterate: deterministic.status === 'needs_iteration' OR visual.status ∈ {needs_iteration, failed} → merge + dedup gaps from both reports + the failed checks' reasons, re-call decompose_to_ui_kit addressing them
 7. Iterate at most TWICE. After the second verify, call done regardless of score.
 8. The done summary MUST honestly report:
    - deterministic verifier passCount/totalChecks + status
diff --git a/packages/core/src/tools/verify-ui-kit-visual-parity.ts b/packages/core/src/tools/verify-ui-kit-visual-parity.ts
@@ -308,11 +308,27 @@ export function makeVerifyUiKitVisualParityTool(
         mediaType: parseMediaType(sourceFile.content),
       };
 
-      logger.info('[verify_ui_kit_visual_parity] step=render', { slug: params.slug });
-      const candidateImg = await renderUiKit(decomposed.content, signal);
-
-      logger.info('[verify_ui_kit_visual_parity] step=judge', { slug: params.slug });
-      const judgeResult = await judgeVisualParity(sourceImg, candidateImg, signal);
+      // Render + judge are external best-effort calls (Playwright headless +
+      // vision-LLM). If either throws (text-only model, malformed JSON,
+      // headless render crash, abort), we degrade to `unavailable` instead
+      // of bubbling the error and breaking the agent loop. This matches the
+      // tool's documented contract — review fix #3 on PR #241.
+      let candidateImg: VisualParityImageRef;
+      let judgeResult: Awaited<ReturnType<typeof judgeVisualParity>>;
+      try {
+        logger.info('[verify_ui_kit_visual_parity] step=render', { slug: params.slug });
+        candidateImg = await renderUiKit(decomposed.content, signal);
+        logger.info('[verify_ui_kit_visual_parity] step=judge', { slug: params.slug });
+        judgeResult = await judgeVisualParity(sourceImg, candidateImg, signal);
+      } catch (error) {
+        const message = error instanceof Error ? error.message : String(error);
+        logger.info('[verify_ui_kit_visual_parity] step=unavailable', {
+          slug: params.slug,
+          reason: message,
+        });
+        const report = unavailableReport(`render or judge failed: ${message}`);
+        return { content: [{ type: 'text', text: report.summary }], details: report };
+      }
 
       const checks = normalizeChecks(judgeResult.checks ?? []);
       const passCount = checks.filter((c) => c.passed).length;
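Fix #3 amounts to a best-effort wrapper around external calls. The sketch below factors that pattern into a reusable helper under stated assumptions: `bestEffort`, `unavailable`, and the `UnavailableReport` shape are hypothetical stand-ins, not the real `unavailableReport()` from `verify-ui-kit-visual-parity.ts`. Note that `await` must sit inside the `try`; returning the promise without awaiting it would let a rejection skip the `catch`.

```typescript
// Hypothetical report shape; the real unavailableReport() may carry more fields.
interface UnavailableReport {
  status: 'unavailable';
  summary: string;
}

type BestEffortResult<T> = { ok: true; value: T } | { ok: false; report: UnavailableReport };

function unavailable(reason: string): UnavailableReport {
  return { status: 'unavailable', summary: `visual parity unavailable: ${reason}` };
}

// Runs one external step (render, judge, ...) and converts anything it throws
// or rejects with into a structured `unavailable` report.
async function bestEffort<T>(label: string, step: () => Promise<T>): Promise<BestEffortResult<T>> {
  try {
    // `await` inside `try` is what routes a rejected promise into `catch`.
    return { ok: true, value: await step() };
  } catch (error) {
    // Same normalization as the diff: non-Error throwables are stringified.
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, report: unavailable(`${label} failed: ${message}`) };
  }
}
```

One design difference from the diff: the patch wraps render and judge in a single `try`, so the logged reason reads `render or judge failed: …`. Calling a helper like this once per step would record which step failed, at the cost of a little more branching at the call site.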
