frankbria · May 31, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -36,7 +36,7 @@ If you are an agent working in this repo: **do not improvise architecture**. Fol
 
 ### Current Focus: Phase 4A
 
-**Phase 5.4 is complete** — PRD stress-test web UI: trigger + streaming (#561). Backend: `GET /api/v2/prd/stress-test` SSE endpoint streams `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py:stress_test_prd_stream()`, resolving the LLM provider via the standard chain and applying the standard rate limit. Frontend: `useStressTestStream` hook manages the SSE connection and event accumulation; `StressTestModal` renders the streaming progress and is opened via a "Stress Test" button on the `/prd` page (enabled only when a PRD exists). Results rendering (#562) is out of scope and still pending.
+**Phase 5.4 is complete** — PRD stress-test web UI: trigger + streaming (#561). Backend: `GET /api/v2/prd/stress-test` SSE endpoint streams `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py:stress_test_prd_stream()`, resolving the LLM provider via the standard chain and applying the standard rate limit. Frontend: `useStressTestStream` hook manages the SSE connection and event accumulation; `StressTestModal` renders the streaming progress and is opened via a "Stress Test" button on the `/prd` page (enabled only when a PRD exists). Results rendering + refinement (#562) is **complete**: the `complete` SSE event now carries structured, severity-tagged `ambiguities` (`Ambiguity.severity` is `"blocking"`/`"warning"`); `StressTestModal` shows a results view of `AmbiguityCard`s (question text, severity badge, answer textarea) with an "X of Y answered" progress indicator and a **[Refine PRD]** button (disabled until every blocking ambiguity is answered). Refine posts to `POST /api/v2/prd/stress-test/refine`, which folds the answers into a new PRD version via `resolve_ambiguities_into_prd` (offloaded with `asyncio.to_thread`) and `prd.create_new_version`, then `mutatePrd` reflects it in the editor.
 
 **Phase 5.3 is complete** — Async notifications cover both surfaces:
 - **Browser + in-app center (#559)**: `useNotifications` hook with workspace-scoped `localStorage` persistence and browser Notification dispatch (only when tab hidden + permission granted); `NotificationProvider` in root layout; `NotificationCenter` (bell icon + dropdown) mounts in sidebar footer. `BatchExecutionMonitor` dispatches `batch.completed` on terminal status transitions (distinguishing COMPLETED/FAILED/CANCELLED in both the in-app message and the success icon) and `blocker.created` on per-task BLOCKED transitions. `/execution` requests browser permission once on mount when permission is `'default'`. `/proof` dispatches `gate.run.failed` per failed gate when a proof run completes with `passed === false`. Known limitation: notifications only fire while `BatchExecutionMonitor` is mounted (cross-page background poller is out of scope; tracked for future work).
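
The Phase 5.4 entry above describes two UI rules: an "X of Y answered" indicator and a **[Refine PRD]** button that stays disabled until every blocking ambiguity is answered. A minimal sketch of that gating logic follows, written in Python for illustration only (the real modal is frontend code that this diff does not show). The trimmed `Ambiguity` class and the helpers `is_answered`, `progress_label`, and `can_refine` are hypothetical names, and the extra "at least one answer" check is an assumption derived from the `min_length=1` constraint on `answers` in the refine request model.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class Ambiguity:
    """Trimmed, hypothetical stand-in for the real Ambiguity dataclass."""

    label: str
    severity: Literal["blocking", "warning"] = "blocking"
    resolved_answer: Optional[str] = None


def is_answered(amb: Ambiguity) -> bool:
    # Whitespace-only text does not count as an answer.
    return bool(amb.resolved_answer and amb.resolved_answer.strip())


def progress_label(ambiguities: list[Ambiguity]) -> str:
    # Counts every answered item, blocking or warning.
    answered = sum(1 for a in ambiguities if is_answered(a))
    return f"{answered} of {len(ambiguities)} answered"


def can_refine(ambiguities: list[Ambiguity]) -> bool:
    # Warnings are skippable; only blocking items gate the button.
    blocking_done = all(
        is_answered(a) for a in ambiguities if a.severity == "blocking"
    )
    # Assumption: the request model requires at least one answer, so an
    # empty submission is also treated as "not ready" here.
    return blocking_done and any(is_answered(a) for a in ambiguities)
```

Under these assumptions, a list with one unanswered blocking item and one unanswered warning keeps the button disabled; answering only the blocking item is enough to enable it.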

diff --git a/codeframe/core/prd_stress_test.py b/codeframe/core/prd_stress_test.py
@@ -14,7 +14,7 @@
 import uuid
 from dataclasses import dataclass
 from enum import Enum
-from typing import AsyncGenerator, Optional
+from typing import AsyncGenerator, Literal, Optional
 
 from codeframe.adapters.llm.base import Purpose
 
@@ -52,6 +52,9 @@ class Ambiguity:
     source_node_title: str
     questions: list[str]
     recommendation: str
+    # "blocking" ambiguities must be answered before a PRD can be refined;
+    # "warning" ambiguities are advisory and skippable (issue #562).
+    severity: Literal["blocking", "warning"] = "blocking"
     resolved_answer: Optional[str] = None
 
 
@@ -94,9 +97,14 @@ class StressTestResult:
   "ambiguity_label": "SHORT LABEL",                       // only if ambiguous
   "questions": ["question 1", "question 2"],              // only if ambiguous
   "recommendation": "what to add to the PRD",             // only if ambiguous
+  "severity": "blocking" | "warning",                     // only if ambiguous
   "complexity_hint": "Low" | "Low-Medium" | "Medium" | "High"  // always
 }
 
+For "severity": use "blocking" when the missing information prevents \
+implementation and must be answered; use "warning" for advisory gaps that \
+have a reasonable default and can be skipped.
+
 Return ONLY valid JSON. No markdown wrapping."""
 
 AMBIGUITY_RESOLUTION_SYSTEM = (
@@ -183,12 +191,15 @@ def classify_and_decompose(
 
     ambiguity = None
     if cls == Classification.AMBIGUOUS:
+        raw_severity = str(data.get("severity", "blocking")).lower()
+        severity = raw_severity if raw_severity in ("blocking", "warning") else "blocking"
         ambiguity = Ambiguity(
             id=str(uuid.uuid4()),
             label=data.get("ambiguity_label", "UNSPECIFIED"),
             source_node_title=title,
             questions=data.get("questions", []),
             recommendation=data.get("recommendation", ""),
+            severity=severity,
         )
 
     return cls, children, ambiguity, complexity
@@ -321,6 +332,23 @@ def render_ambiguity_report(ambiguities: list[Ambiguity]) -> str:
     return "\n".join(lines)
 
 
+def ambiguity_to_dict(amb: Ambiguity) -> dict[str, object]:
+    """Serialize an :class:`Ambiguity` for SSE / JSON transport (issue #562).
+
+    Carries the structured fields the web results view needs to render an
+    answerable card: question text, severity, and the recommendation.
+    """
+    return {
+        "id": amb.id,
+        "label": amb.label,
+        "source_node_title": amb.source_node_title,
+        "questions": list(amb.questions),
+        "recommendation": amb.recommendation,
+        "severity": amb.severity,
+        "resolved_answer": amb.resolved_answer,
+    }
+
+
 def resolve_ambiguities_into_prd(
     prd_content: str,
     ambiguities: list[Ambiguity],
@@ -422,6 +450,7 @@ async def stress_test_prd_stream(
     - ``{"type": "goal_analyzed", "goal": str, "classification": str,
          "ambiguities_so_far": int}`` (once per top-level goal)
     - ``{"type": "complete", "ambiguity_count": int,
+         "ambiguities": [ambiguity_to_dict(...)],
          "tech_spec_markdown": str, "ambiguity_report": str}``
     - ``{"type": "error", "message": str}`` if decomposition raises
 
@@ -461,6 +490,7 @@ async def stress_test_prd_stream(
         yield {
             "type": "complete",
             "ambiguity_count": len(ambiguities),
+            "ambiguities": [ambiguity_to_dict(a) for a in ambiguities],
             "tech_spec_markdown": tech_spec,
             "ambiguity_report": amb_report,
         }
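
The hunks above add a `severity` field, normalize the model's raw value, and attach serialized ambiguities to the `complete` event. The sketch below condenses that path into a standalone script so the fallback behaviour can be checked in isolation. It assumes a trimmed `Ambiguity` dataclass with only the fields visible in this diff; `normalize_severity`, `ambiguity_from_llm`, and `complete_event` are hypothetical helper names, not functions from the repository.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Literal, Optional

Severity = Literal["blocking", "warning"]


@dataclass
class Ambiguity:
    """Trimmed stand-in carrying only the fields visible in the diff."""

    id: str
    label: str
    source_node_title: str
    questions: list[str]
    recommendation: str
    severity: Severity = "blocking"
    resolved_answer: Optional[str] = None


def normalize_severity(raw: object) -> Severity:
    # Anything other than the two known values falls back to "blocking",
    # so an unexpected model output is never treated as skippable.
    value = str(raw).lower()
    return value if value in ("blocking", "warning") else "blocking"


def ambiguity_from_llm(title: str, data: dict) -> Ambiguity:
    # Hypothetical helper condensing the AMBIGUOUS branch shown above.
    return Ambiguity(
        id=str(uuid.uuid4()),
        label=data.get("ambiguity_label", "UNSPECIFIED"),
        source_node_title=title,
        questions=data.get("questions", []),
        recommendation=data.get("recommendation", ""),
        severity=normalize_severity(data.get("severity", "blocking")),
    )


def complete_event(ambiguities: list[Ambiguity]) -> dict:
    # For this trimmed dataclass, asdict() yields the same seven keys
    # that ambiguity_to_dict builds by hand in the diff.
    return {
        "type": "complete",
        "ambiguity_count": len(ambiguities),
        "ambiguities": [asdict(a) for a in ambiguities],
        "tech_spec_markdown": "",
        "ambiguity_report": "",
    }


if __name__ == "__main__":
    amb = ambiguity_from_llm("Auth", {"severity": "Warning", "questions": ["Which provider?"]})
    print(json.dumps(complete_event([amb]), indent=2))
```

Defaulting unknown values to `"blocking"` is the conservative choice: a malformed severity forces an answer rather than letting the gap slip through as advisory.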

diff --git a/codeframe/ui/routers/prd_v2.py b/codeframe/ui/routers/prd_v2.py
@@ -14,14 +14,15 @@
     GET  /api/v2/prd/{id}/diff            - Diff two versions
 """
 
+import asyncio
 import json
 import logging
 import os
 from typing import AsyncGenerator, Optional
 
 from fastapi import APIRouter, Depends, HTTPException, Query, Request
 from fastapi.responses import StreamingResponse
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, field_validator
 
 from codeframe.core.workspace import Workspace
 from codeframe.lib.rate_limiter import rate_limit_standard
@@ -96,6 +97,39 @@ class PrdDiffResponse(BaseModel):
     diff: str
 
 
+class AmbiguityAnswer(BaseModel):
+    """A single answered ambiguity from the stress-test results view (#562)."""
+
+    label: str = Field(..., min_length=1, description="Short ambiguity label")
+    questions: list[str] = Field(
+        default_factory=list, description="The unanswered questions"
+    )
+    answer: str = Field(..., min_length=1, description="The user's answer")
+
+    @field_validator("answer")
+    @classmethod
+    def _answer_not_blank(cls, v: str) -> str:
+        # min_length alone admits whitespace-only answers from API callers;
+        # reject them so a blank string is never treated as resolved input.
+        if not v.strip():
+            raise ValueError("answer must not be blank")
+        return v
+
+
+class StressTestRefineRequest(BaseModel):
+    """Request to refine a PRD from resolved stress-test ambiguities (#562).
+
+    Stateless: the client sends back the answered ambiguities' content (the
+    server does not persist stress-test runs), which are folded into the PRD
+    and saved as a new version.
+    """
+
+    prd_id: str = Field(..., description="ID of the PRD to refine")
+    answers: list[AmbiguityAnswer] = Field(
+        ..., min_length=1, description="Resolved ambiguities to fold into the PRD"
+    )
+
+
 # ============================================================================
 # Helper Functions
 # ============================================================================
@@ -194,33 +228,17 @@ def _sse(event: dict) -> str:
     return f"data: {json.dumps(event)}\n\n"
 
 
-async def _stress_test_event_stream(
-    workspace: Workspace,
-    max_depth: int,
-    request: Optional[Request] = None,
-) -> AsyncGenerator[str, None]:
-    """Yield SSE frames for a PRD stress-test.
+def _resolve_llm_provider(workspace: Workspace):
+    """Resolve the LLM provider for PRD stress-test web operations.
 
-    Recoverable problems (missing PRD, missing ``ANTHROPIC_API_KEY``) are
-    surfaced as in-stream ``error`` events rather than HTTP errors, so a
-    browser ``EventSource`` can display them via its message handler.
+    Follows the documented chain: env var → workspace config
+    (``.codeframe/config.yaml``) → default ``anthropic``. (No CLI flag here —
+    this is the web surface.) Mirrors ``runtime.py`` and the stress-test stream.
 
-    Stops early if the client disconnects, so an abandoned stream does not keep
-    issuing LLM calls — mirroring ``event_stream_generator`` in streaming_v2.
+    Raises:
+        ValueError: with a user-facing message when the Anthropic API key is
+            missing or the provider cannot be constructed.
     """
-    from codeframe.core.prd_stress_test import stress_test_prd_stream
-
-    record = prd.get_latest(workspace)
-    if not record:
-        yield _sse({
-            "type": "error",
-            "message": "No PRD found. Add or generate a PRD first.",
-        })
-        return
-
-    # Resolve the LLM provider following the documented chain:
-    # env var → workspace config (.codeframe/config.yaml) → default "anthropic".
-    # (No CLI flag here — this is the web surface.) Mirrors runtime.py.
     from codeframe.adapters.llm import get_provider
     from codeframe.core.config import load_environment_config
 
@@ -235,11 +253,7 @@ async def _stress_test_event_stream(
     # Only the Anthropic provider needs an API key up front; local providers
     # (ollama/vllm/compatible) do not.
     if provider_type == "anthropic" and not os.getenv("ANTHROPIC_API_KEY"):
-        yield _sse({
-            "type": "error",
-            "message": "ANTHROPIC_API_KEY environment variable required.",
-        })
-        return
+        raise ValueError("ANTHROPIC_API_KEY environment variable required.")
 
     provider_kwargs: dict = {}
     model_override = os.getenv("CODEFRAME_LLM_MODEL") or (
@@ -253,8 +267,38 @@ async def _stress_test_event_stream(
     if base_url_override:
         provider_kwargs["base_url"] = base_url_override
 
+    return get_provider(provider_type, **provider_kwargs)
+
+
+async def _stress_test_event_stream(
+    workspace: Workspace,
+    max_depth: int,
+    request: Optional[Request] = None,
+) -> AsyncGenerator[str, None]:
+    """Yield SSE frames for a PRD stress-test.
+
+    Recoverable problems (missing PRD, missing ``ANTHROPIC_API_KEY``) are
+    surfaced as in-stream ``error`` events rather than HTTP errors, so a
+    browser ``EventSource`` can display them via its message handler.
+
+    Stops early if the client disconnects, so an abandoned stream does not keep
+    issuing LLM calls — mirroring ``event_stream_generator`` in streaming_v2.
+    """
+    from codeframe.core.prd_stress_test import stress_test_prd_stream
+
+    record = prd.get_latest(workspace)
+    if not record:
+        yield _sse({
+            "type": "error",
+            "message": "No PRD found. Add or generate a PRD first.",
+        })
+        return
+
+    # Resolve the LLM provider following the documented chain (shared with the
+    # refine endpoint). Recoverable problems become in-stream error events so a
+    # browser EventSource can display them.
     try:
-        provider = get_provider(provider_type, **provider_kwargs)
+        provider = _resolve_llm_provider(workspace)
     except ValueError as exc:
         yield _sse({"type": "error", "message": str(exc)})
         return
@@ -305,6 +349,114 @@ async def stress_test_prd_stream_endpoint(
     )
 
 
+# NOTE: registered before the "/{prd_id}" catch-all so FastAPI does not match
+# "stress-test/refine" as a PRD id.
+@router.post("/stress-test/refine", response_model=PrdResponse)
+@rate_limit_standard()
+async def refine_prd_from_stress_test(
+    request: Request,
+    body: StressTestRefineRequest,
+    workspace: Workspace = Depends(get_v2_workspace),
+) -> PrdResponse:
+    """Refine a PRD by folding in answered stress-test ambiguities (#562).
+
+    Reconstructs :class:`Ambiguity` objects from the submitted answers, calls
+    the headless ``resolve_ambiguities_into_prd`` to rewrite the PRD via the
+    LLM, then persists the result as a new PRD version. Returns the new version.
+    """
+    from codeframe.core.prd_stress_test import (
+        Ambiguity,
+        resolve_ambiguities_into_prd,
+    )
+
+    record = prd.get_by_id(workspace, body.prd_id)
+    if not record:
+        raise HTTPException(
+            status_code=404,
+            detail=api_error(
+                "PRD not found", ErrorCodes.NOT_FOUND, f"No PRD with id {body.prd_id}"
+            ),
+        )
+
+    try:
+        provider = _resolve_llm_provider(workspace)
+    except ValueError as exc:
+        # The request is well-formed; the server lacks LLM configuration
+        # (missing API key or unknown provider) → 503, not 400.
+        raise HTTPException(
+            status_code=503,
+            detail=api_error(
+                "LLM provider unavailable",
+                ErrorCodes.SERVICE_UNAVAILABLE,
+                str(exc),
+            ),
+        )
+
+    # resolve_ambiguities_into_prd only reads label, questions, and
+    # resolved_answer, so source_node_title/recommendation are intentionally
+    # left empty here (the client does not need to round-trip them).
+    ambiguities = [
+        Ambiguity(
+            id=str(i),
+            label=ans.label,
+            source_node_title="",
+            questions=list(ans.questions),
+            recommendation="",
+            resolved_answer=ans.answer,
+        )
+        for i, ans in enumerate(body.answers)
+    ]
+
+    try:
+        # resolve_ambiguities_into_prd makes a synchronous, blocking LLM call;
+        # offload it to a thread so it does not stall the event loop (mirrors
+        # stress_test_prd_stream's asyncio.to_thread usage).
+        refined_content = await asyncio.to_thread(
+            resolve_ambiguities_into_prd, record.content, ambiguities, provider
+        )
+        # resolve_ambiguities_into_prd returns the original content unchanged
+        # when the LLM rewrite looks truncated. Surface that as an error rather
+        # than recording a no-op duplicate version under a "success" toast.
+        if refined_content == record.content:
+            raise HTTPException(
+                status_code=502,
+                detail=api_error(
+                    "PRD refinement produced no changes",
+                    ErrorCodes.EXECUTION_FAILED,
+                    "The model returned no usable changes (its output may have "
+                    "been truncated). Please try again.",
+                ),
+            )
+        new_record = prd.create_new_version(
+            workspace,
+            parent_prd_id=body.prd_id,
+            new_content=refined_content,
+            change_summary="Refined via stress-test ambiguity resolution",
+        )
+        if not new_record:
+            # get_by_id already confirmed the PRD exists, so a None here is a
+            # persistence fault, not a missing resource → 500, not 404.
+            raise HTTPException(
+                status_code=500,
+                detail=api_error(
+                    "Failed to persist new PRD version",
+                    ErrorCodes.INTERNAL_ERROR,
+                    f"create_new_version returned no record for {body.prd_id}",
+                ),
+            )
+        return _prd_to_response(new_record)
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Failed to refine PRD: {e}", exc_info=True)
+        raise HTTPException(
+            status_code=500,
+            detail=api_error(
+                "Failed to refine PRD", ErrorCodes.EXECUTION_FAILED, str(e)
+            ),
+        )
+
+
 @router.get("/{prd_id}", response_model=PrdResponse)
 @rate_limit_standard()
 async def get_prd(
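
The refine endpoint above does two things worth isolating: it offloads the blocking rewrite with `asyncio.to_thread`, and it treats an unchanged result as a failure instead of saving a duplicate version. The sketch below reproduces that pattern with the standard library only. `slow_rewrite` is a hypothetical stand-in for `resolve_ambiguities_into_prd` (no LLM is called), and `RefinementUnchanged` is a hypothetical exception standing in for the 502 response.

```python
import asyncio
import time


class RefinementUnchanged(Exception):
    """Hypothetical stand-in for the endpoint's 502 'no changes' response."""


def slow_rewrite(content: str, answers: list[str]) -> str:
    # Stand-in for the synchronous, blocking LLM call.
    time.sleep(0.05)
    if not answers:
        # Mimics the fallback of returning the original content unchanged.
        return content
    bullets = "\n".join(f"- {a}" for a in answers)
    return f"{content}\n\n## Clarifications\n{bullets}"


async def refine(content: str, answers: list[str]) -> str:
    # Run the blocking function in a worker thread so the event loop
    # stays free to serve other requests while it waits.
    refined = await asyncio.to_thread(slow_rewrite, content, answers)
    if refined == content:
        # Surface a no-op as an error rather than persisting a duplicate.
        raise RefinementUnchanged("model returned no usable changes")
    return refined


if __name__ == "__main__":
    print(asyncio.run(refine("# PRD", ["Use OAuth for login"])))
```

`asyncio.to_thread` requires Python 3.9 or later. The equality check only detects a byte-identical result; a rewrite that changes nothing but whitespace would still pass it.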

diff --git a/docs/PRODUCT_ROADMAP.md b/docs/PRODUCT_ROADMAP.md
@@ -147,13 +147,14 @@ Without a settings page, a new user who cannot find the env vars cannot use the
 
 ### 4. PRD Stress-Test Web UI
 
-**Current state**: Phase 5.4 trigger + streaming shipped (#561). The `/prd` page now has a "Stress Test" button (enabled only when a PRD exists) that opens `StressTestModal`. The modal connects via `useStressTestStream` to `GET /api/v2/prd/stress-test` (SSE), which streams `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py`. Results rendering — displaying the decomposition tree, surfacing ambiguities as answerable questions, feeding answers back to refine the PRD — is tracked in #562 and is not yet built.
+**Current state**: Phase 5.4 is **fully shipped** (trigger + streaming #561; results view + refinement #562). The `/prd` page's "Stress Test" button opens `StressTestModal`, which connects via `useStressTestStream` to `GET /api/v2/prd/stress-test` (SSE) streaming `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py`. The `complete` event carries structured, severity-tagged `ambiguities`, which the modal renders as a results view.
 
-**What remains (#562)**:
+**Shipped in #562**:
 
-- A **results view** showing the decomposition tree with ambiguities surfaced as questions, styled similarly to the existing Discovery transcript
-- Each ambiguity has an inline answer field — the user's answers are fed back to refine the PRD
-- On completion: the refined PRD is saved and the user can proceed to task generation
+- A **results view** of `AmbiguityCard`s — each shows the question text, a severity badge (`blocking`/`warning`), and an inline answer textarea, with an "X of Y answered" progress indicator
+- A **[Refine PRD]** button, disabled until every blocking ambiguity is answered, that posts answers to `POST /api/v2/prd/stress-test/refine`
+- The refine endpoint folds answers into the PRD via `resolve_ambiguities_into_prd` and persists a new version (`prd.create_new_version`); the editor updates via `mutatePrd`, ready for task generation
+- Out of scope (per acceptance criteria): full collapsible decomposition-tree visualization (the streaming log already surfaces the goal breakdown)
 
 **Why it matters for the vision**: "Gaps discovered at planning time, not execution time." The stress-test is the mechanism that makes requirements specific enough for agents to execute correctly. Without it in the web UI, the web-first user skips the most valuable part of the THINK phase.
 
@@ -203,7 +204,7 @@ These are items that were considered and excluded because they do not serve the
 | 5.1 | Settings page (skeleton + agent config + PROOF9/workspace tabs) | ✅ Complete | #554–556 |
 | 5.2 | Cost analytics | ✅ Complete | #557–558 |
 | 5.3 | Async notifications | ✅ Complete (browser + in-app center #559, webhook #560) | #559–560 |
-| 5.4 | PRD stress-test web UI | ✅ Complete (trigger + streaming #561; results rendering #562 pending) | #561–562 |
+| 5.4 | PRD stress-test web UI | ✅ Complete (trigger + streaming #561; results view + refinement #562) | #561–562 |
 | 5.5 | GitHub Issues import | ❌ Not started | #563–565 |
 
 **Current focus**: Phase 4A — PR status tracking + PROOF9 merge gate.
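
The roadmap entry for item 4 describes a client that reads `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from the SSE stream and renders the `ambiguities` carried by `complete`. As a rough illustration of that consumption path, the sketch below frames events the same way `_sse` does (`data: <json>` followed by a blank line) and then parses them back. `parse_sse` and `blocking_labels` are hypothetical helpers, the sample payloads are invented, and the parser handles only single-line `data:` frames, which is all this framing produces.

```python
import json
from typing import Iterator


def sse(event: dict) -> str:
    # Same framing as the _sse helper in the router.
    return f"data: {json.dumps(event)}\n\n"


def parse_sse(stream: str) -> Iterator[dict]:
    # Frames are separated by a blank line; each carries one data line.
    for frame in stream.split("\n\n"):
        for line in frame.splitlines():
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])


def blocking_labels(events: list[dict]) -> list[str]:
    # Pull the labels of blocking ambiguities out of the complete event.
    for event in events:
        if event.get("type") == "complete":
            return [
                a["label"]
                for a in event.get("ambiguities", [])
                if a.get("severity") == "blocking"
            ]
    return []


if __name__ == "__main__":
    # Invented sample payloads; the goals_extracted fields are not shown
    # in the diff, so only its type is included here.
    sample = [
        {"type": "goals_extracted"},
        {"type": "complete", "ambiguity_count": 2, "ambiguities": [
            {"label": "AUTH METHOD", "severity": "blocking"},
            {"label": "THEME", "severity": "warning"},
        ]},
    ]
    stream = "".join(sse(e) for e in sample)
    print(blocking_labels(list(parse_sse(stream))))
```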