CLAUDE.md (1 addition, 1 deletion)
@@ -36,7 +36,7 @@ If you are an agent working in this repo: **do not improvise architecture**. Fol

### Current Focus: Phase 4A

**Phase 5.4 is complete** — PRD stress-test web UI: trigger + streaming (#561). Backend: `GET /api/v2/prd/stress-test` SSE endpoint streams `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py:stress_test_prd_stream()`, resolving the LLM provider via the standard chain and applying the standard rate limit. Frontend: `useStressTestStream` hook manages the SSE connection and event accumulation; `StressTestModal` renders the streaming progress and is opened via a "Stress Test" button on the `/prd` page (enabled only when a PRD exists). Results rendering (#562) is out of scope and still pending.
**Phase 5.4 is complete** — PRD stress-test web UI: trigger + streaming (#561). Backend: `GET /api/v2/prd/stress-test` SSE endpoint streams `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py:stress_test_prd_stream()`, resolving the LLM provider via the standard chain and applying the standard rate limit. Frontend: `useStressTestStream` hook manages the SSE connection and event accumulation; `StressTestModal` renders the streaming progress and is opened via a "Stress Test" button on the `/prd` page (enabled only when a PRD exists). Results rendering + refinement (#562) is **complete**: the `complete` SSE event now carries structured, severity-tagged `ambiguities` (`Ambiguity.severity` is `"blocking"`/`"warning"`); `StressTestModal` shows a results view of `AmbiguityCard`s (question text, severity badge, answer textarea) with an "X of Y answered" progress indicator and a **[Refine PRD]** button (disabled until every blocking ambiguity is answered). Refine posts to `POST /api/v2/prd/stress-test/refine`, which folds the answers into a new PRD version via `resolve_ambiguities_into_prd` (offloaded with `asyncio.to_thread`) and `prd.create_new_version`, then `mutatePrd` reflects it in the editor.

**Phase 5.3 is complete** — Async notifications cover both surfaces:
- **Browser + in-app center (#559)**: `useNotifications` hook with workspace-scoped `localStorage` persistence and browser Notification dispatch (only when tab hidden + permission granted); `NotificationProvider` in root layout; `NotificationCenter` (bell icon + dropdown) mounts in sidebar footer. `BatchExecutionMonitor` dispatches `batch.completed` on terminal status transitions (distinguishing COMPLETED/FAILED/CANCELLED in both the in-app message and the success icon) and `blocker.created` on per-task BLOCKED transitions. `/execution` requests browser permission once on mount when permission is `'default'`. `/proof` dispatches `gate.run.failed` per failed gate when a proof run completes with `passed === false`. Known limitation: notifications only fire while `BatchExecutionMonitor` is mounted (cross-page background poller is out of scope; tracked for future work).
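The dispatch rules in the Phase 5.3 notes are: fire `batch.completed` once, when a batch *moves into* COMPLETED/FAILED/CANCELLED, and show a browser notification only when the tab is hidden and permission is granted. Both can be sketched as pure functions. The real logic lives in the TypeScript `BatchExecutionMonitor` and `useNotifications`. The names `detect_terminal_transition`, `should_dispatch_browser` and `storage_key` below are hypothetical, and the storage-key format is an assumption.

```python
from typing import Optional

# Terminal batch statuses named in the notes above.
TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED", "CANCELLED"})


def detect_terminal_transition(previous: Optional[str], current: str) -> Optional[str]:
    """Return the terminal status if this poll crossed into one, else None.

    Firing only on the transition (not on every poll that observes a
    terminal status) keeps one batch from notifying repeatedly.
    """
    if current in TERMINAL_STATUSES and previous not in TERMINAL_STATUSES:
        return current
    return None


def should_dispatch_browser(tab_hidden: bool, permission: str) -> bool:
    """Browser notifications only when the tab is hidden and permission is granted."""
    return tab_hidden and permission == "granted"


def storage_key(workspace_id: str) -> str:
    """Hypothetical workspace-scoped localStorage key (format is an assumption)."""
    return f"codeframe:notifications:{workspace_id}"


# Successive poll results for one batch.
polls = [None, "RUNNING", "RUNNING", "COMPLETED", "COMPLETED"]
fired = [detect_terminal_transition(prev, cur) for prev, cur in zip(polls, polls[1:])]
print(fired)  # [None, None, 'COMPLETED', None]
```

Comparing each poll with the previous one is what makes the event fire exactly once. It also explains the documented limitation: if the monitor is not mounted, no "previous" status is observed and the transition is never seen.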
codeframe/core/prd_stress_test.py (31 additions, 1 deletion)
@@ -14,7 +14,7 @@
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import AsyncGenerator, Optional
from typing import AsyncGenerator, Literal, Optional

from codeframe.adapters.llm.base import Purpose

@@ -52,6 +52,9 @@ class Ambiguity:
source_node_title: str
questions: list[str]
recommendation: str
# "blocking" ambiguities must be answered before a PRD can be refined;
# "warning" ambiguities are advisory and skippable (issue #562).
severity: Literal["blocking", "warning"] = "blocking"
resolved_answer: Optional[str] = None


@@ -94,9 +97,14 @@ class StressTestResult:
"ambiguity_label": "SHORT LABEL", // only if ambiguous
"questions": ["question 1", "question 2"], // only if ambiguous
"recommendation": "what to add to the PRD", // only if ambiguous
"severity": "blocking" | "warning", // only if ambiguous
"complexity_hint": "Low" | "Low-Medium" | "Medium" | "High" // always
}

For "severity": use "blocking" when the missing information prevents \
implementation and must be answered; use "warning" for advisory gaps that \
have a reasonable default and can be skipped.

Return ONLY valid JSON. No markdown wrapping."""

AMBIGUITY_RESOLUTION_SYSTEM = (
@@ -183,12 +191,15 @@ def classify_and_decompose(

ambiguity = None
if cls == Classification.AMBIGUOUS:
raw_severity = str(data.get("severity", "blocking")).lower()
severity = raw_severity if raw_severity in ("blocking", "warning") else "blocking"
ambiguity = Ambiguity(
id=str(uuid.uuid4()),
label=data.get("ambiguity_label", "UNSPECIFIED"),
source_node_title=title,
questions=data.get("questions", []),
recommendation=data.get("recommendation", ""),
severity=severity,
)

return cls, children, ambiguity, complexity
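The severity handling in `classify_and_decompose` treats anything other than `"blocking"`/`"warning"`, including a missing key, as `"blocking"`. A malformed model reply therefore fails safe toward requiring an answer. The following standalone sketch applies that rule to a reply shaped like the prompt's JSON schema. `parse_severity` is a hypothetical helper name, not part of the module, and the field values are illustrative.

```python
import json
from typing import Literal

Severity = Literal["blocking", "warning"]


def parse_severity(data: dict) -> Severity:
    """Normalize the model's "severity" field, failing safe to "blocking".

    Same behavior as the diff's inline rule: str() + lower(), then any value
    other than "blocking"/"warning" (or a missing key) becomes "blocking".
    """
    raw = str(data.get("severity", "blocking")).lower()
    return "warning" if raw == "warning" else "blocking"


# A reply shaped like the prompt's JSON schema (values are illustrative).
reply = json.loads(
    '{"ambiguity_label": "AUTH MODEL",'
    ' "questions": ["Which identity provider?"],'
    ' "recommendation": "Name the identity provider",'
    ' "severity": "Warning", "complexity_hint": "Medium"}'
)
print(parse_severity(reply))                 # warning
print(parse_severity({}))                    # blocking (key missing)
print(parse_severity({"severity": "high"}))  # blocking (unknown value)
print(parse_severity({"severity": None}))    # blocking (null -> "none")
```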
@@ -321,6 +332,23 @@ def render_ambiguity_report(ambiguities: list[Ambiguity]) -> str:
return "\n".join(lines)


def ambiguity_to_dict(amb: Ambiguity) -> dict[str, object]:
"""Serialize an :class:`Ambiguity` for SSE / JSON transport (issue #562).

Carries the structured fields the web results view needs to render an
answerable card: question text, severity, and the recommendation.
"""
return {
"id": amb.id,
"label": amb.label,
"source_node_title": amb.source_node_title,
"questions": list(amb.questions),
"recommendation": amb.recommendation,
"severity": amb.severity,
"resolved_answer": amb.resolved_answer,
}
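`ambiguity_to_dict` copies `questions` into a fresh list so the SSE payload does not alias the dataclass's state, and every field it emits is JSON-native. The sketch below re-declares a trimmed copy of `Ambiguity` using the fields visible in this diff (the real class may differ) and shows that the serialized form survives a JSON round-trip back into the dataclass.

```python
import json
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class Ambiguity:
    # Trimmed copy of the dataclass from this diff, for illustration only.
    id: str
    label: str
    source_node_title: str
    questions: list[str]
    recommendation: str
    severity: Literal["blocking", "warning"] = "blocking"
    resolved_answer: Optional[str] = None


def ambiguity_to_dict(amb: Ambiguity) -> dict[str, object]:
    return {
        "id": amb.id,
        "label": amb.label,
        "source_node_title": amb.source_node_title,
        "questions": list(amb.questions),  # copy, so callers cannot mutate amb
        "recommendation": amb.recommendation,
        "severity": amb.severity,
        "resolved_answer": amb.resolved_answer,
    }


amb = Ambiguity(
    id="a1",
    label="RETENTION",
    source_node_title="Store audit logs",
    questions=["How long are logs kept?"],
    recommendation="State a retention period",
    severity="warning",
)
payload = json.dumps(ambiguity_to_dict(amb))
restored = Ambiguity(**json.loads(payload))
print(restored == amb)  # True
```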


def resolve_ambiguities_into_prd(
prd_content: str,
ambiguities: list[Ambiguity],
@@ -422,6 +450,7 @@ async def stress_test_prd_stream(
- ``{"type": "goal_analyzed", "goal": str, "classification": str,
"ambiguities_so_far": int}`` (once per top-level goal)
- ``{"type": "complete", "ambiguity_count": int,
"ambiguities": [ambiguity_to_dict(...)],
"tech_spec_markdown": str, "ambiguity_report": str}``
- ``{"type": "error", "message": str}`` if decomposition raises

@@ -461,6 +490,7 @@ async def stress_test_prd_stream(
yield {
"type": "complete",
"ambiguity_count": len(ambiguities),
"ambiguities": [ambiguity_to_dict(a) for a in ambiguities],
"tech_spec_markdown": tech_spec,
"ambiguity_report": amb_report,
}
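The router's `_sse` helper frames each of these events as a single `data: <json>` line followed by a blank line. The sketch below is a minimal, hypothetical client-side parser for exactly that framing: one `data:` line per event and no `event:`/`id:` fields, which is all `_sse` emits. A general-purpose SSE client must also handle multi-line `data`, comments, and reconnection. The `goals_extracted` payload is not shown in this diff, so it is reduced to its `type`, and the `classification` value is illustrative.

```python
import json
from typing import Iterable, Iterator


def sse(event: dict) -> str:
    # Same framing as the router's _sse helper.
    return f"data: {json.dumps(event)}\n\n"


def parse_sse(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield decoded events from text chunks that may split frames anywhere."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A blank line ("\n\n") terminates each frame.
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            for line in frame.splitlines():
                if line.startswith("data: "):
                    yield json.loads(line[len("data: "):])


def collect(chunks: Iterable[str]) -> list[dict]:
    """Accumulate events until the stream reports "complete" or "error"."""
    events = []
    for event in parse_sse(chunks):
        events.append(event)
        if event["type"] in ("complete", "error"):
            break
    return events


stream = "".join([
    sse({"type": "goals_extracted"}),  # real payload fields not shown in this diff
    sse({"type": "goal_analyzed", "goal": "Login",
         "classification": "atomic",  # illustrative value
         "ambiguities_so_far": 0}),
    sse({"type": "complete", "ambiguity_count": 0, "ambiguities": [],
         "tech_spec_markdown": "", "ambiguity_report": ""}),
])
# Re-chunk at an awkward size to exercise buffering across frame boundaries.
chunks = [stream[i:i + 7] for i in range(0, len(stream), 7)]
print([e["type"] for e in collect(chunks)])
# ['goals_extracted', 'goal_analyzed', 'complete']
```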
codeframe/ui/routers/prd_v2.py (183 additions, 31 deletions)
@@ -14,14 +14,15 @@
GET /api/v2/prd/{id}/diff - Diff two versions
"""

import asyncio
import json
import logging
import os
from typing import AsyncGenerator, Optional

from fastapi import APIRouter, Depends, HTTPException, Query, Request
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, field_validator

from codeframe.core.workspace import Workspace
from codeframe.lib.rate_limiter import rate_limit_standard
@@ -96,6 +97,39 @@ class PrdDiffResponse(BaseModel):
diff: str


class AmbiguityAnswer(BaseModel):
"""A single answered ambiguity from the stress-test results view (#562)."""

label: str = Field(..., min_length=1, description="Short ambiguity label")
questions: list[str] = Field(
default_factory=list, description="The unanswered questions"
)
answer: str = Field(..., min_length=1, description="The user's answer")

@field_validator("answer")
@classmethod
def _answer_not_blank(cls, v: str) -> str:
# min_length alone admits whitespace-only answers from API callers;
# reject them so a blank string is never treated as resolved input.
if not v.strip():
raise ValueError("answer must not be blank")
return v


class StressTestRefineRequest(BaseModel):
"""Request to refine a PRD from resolved stress-test ambiguities (#562).

Stateless: the client sends back the answered ambiguities' content (the
server does not persist stress-test runs), which are folded into the PRD
and saved as a new version.
"""

prd_id: str = Field(..., description="ID of the PRD to refine")
answers: list[AmbiguityAnswer] = Field(
..., min_length=1, description="Resolved ambiguities to fold into the PRD"
)
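`StressTestRefineRequest` imposes four constraints: at least one answer, a non-empty `label` on each answer, a non-blank `answer` (enforced by `_answer_not_blank`), and `questions` defaulting to an empty list. These can be illustrated without FastAPI. The sketch below is a stdlib-only approximation of those pydantic rules, plus a sample request body. `validate_refine_body` and the `prd-123` id are hypothetical. In the router, pydantic performs this validation, and by default FastAPI answers a failed body validation with 422.

```python
from typing import Any


def validate_refine_body(body: dict[str, Any]) -> list[str]:
    """Return validation errors (an empty list means the body is acceptable)."""
    errors: list[str] = []
    if not isinstance(body.get("prd_id"), str):
        errors.append("prd_id: required string")
    answers = body.get("answers")
    if not isinstance(answers, list) or len(answers) < 1:
        errors.append("answers: at least one item required")
        return errors
    for i, ans in enumerate(answers):
        # Mirrors min_length=1 on label: "" fails, " " would pass.
        if not ans.get("label"):
            errors.append(f"answers[{i}].label: must be non-empty")
        # min_length=1 alone accepts "   "; the validator also strips.
        if not str(ans.get("answer", "")).strip():
            errors.append(f"answers[{i}].answer: must not be blank")
    return errors


example = {
    "prd_id": "prd-123",
    "answers": [
        {
            "label": "RETENTION",
            "questions": ["How long are audit logs kept?"],
            "answer": "90 days, then archived to cold storage.",
        }
    ],
}
print(validate_refine_body(example))  # []
print(validate_refine_body({"prd_id": "prd-123", "answers": [{"label": "X", "answer": "  "}]}))
# ['answers[0].answer: must not be blank']
```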


# ============================================================================
# Helper Functions
# ============================================================================
@@ -194,33 +228,17 @@ def _sse(event: dict) -> str:
return f"data: {json.dumps(event)}\n\n"


async def _stress_test_event_stream(
workspace: Workspace,
max_depth: int,
request: Optional[Request] = None,
) -> AsyncGenerator[str, None]:
"""Yield SSE frames for a PRD stress-test.
def _resolve_llm_provider(workspace: Workspace):
"""Resolve the LLM provider for PRD stress-test web operations.

Recoverable problems (missing PRD, missing ``ANTHROPIC_API_KEY``) are
surfaced as in-stream ``error`` events rather than HTTP errors, so a
browser ``EventSource`` can display them via its message handler.
Follows the documented chain: env var → workspace config
(``.codeframe/config.yaml``) → default ``anthropic``. (No CLI flag here —
this is the web surface.) Mirrors ``runtime.py`` and the stress-test stream.

Stops early if the client disconnects, so an abandoned stream does not keep
issuing LLM calls — mirroring ``event_stream_generator`` in streaming_v2.
Raises:
ValueError: with a user-facing message when the Anthropic API key is
missing or the provider cannot be constructed.
"""
from codeframe.core.prd_stress_test import stress_test_prd_stream

record = prd.get_latest(workspace)
if not record:
yield _sse({
"type": "error",
"message": "No PRD found. Add or generate a PRD first.",
})
return

# Resolve the LLM provider following the documented chain:
# env var → workspace config (.codeframe/config.yaml) → default "anthropic".
# (No CLI flag here — this is the web surface.) Mirrors runtime.py.
from codeframe.adapters.llm import get_provider
from codeframe.core.config import load_environment_config

@@ -235,11 +253,7 @@ async def _stress_test_event_stream(
# Only the Anthropic provider needs an API key up front; local providers
# (ollama/vllm/compatible) do not.
if provider_type == "anthropic" and not os.getenv("ANTHROPIC_API_KEY"):
yield _sse({
"type": "error",
"message": "ANTHROPIC_API_KEY environment variable required.",
})
return
raise ValueError("ANTHROPIC_API_KEY environment variable required.")

provider_kwargs: dict = {}
model_override = os.getenv("CODEFRAME_LLM_MODEL") or (
@@ -253,8 +267,38 @@ async def _stress_test_event_stream(
if base_url_override:
provider_kwargs["base_url"] = base_url_override

return get_provider(provider_type, **provider_kwargs)
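`_resolve_llm_provider` documents a fixed resolution order: environment variable, then workspace config, then the `anthropic` default. It checks for an API key only when Anthropic is selected. That order reduces to a small pure function. The environment variable that selects the provider is elided from this diff, so the `env_provider`/`config_provider` parameters below stand in for whatever the real code reads. `resolve_provider_type` is a hypothetical name, and treating an empty string as unset is an assumption.

```python
from typing import Optional

DEFAULT_PROVIDER = "anthropic"


def resolve_provider_type(
    env_provider: Optional[str],
    config_provider: Optional[str],
    anthropic_api_key: Optional[str],
) -> str:
    """Pick the provider type: env var -> workspace config -> default.

    Raises ValueError (as the router helper does) when Anthropic is selected
    but no API key is set; local providers need no key up front.
    """
    provider_type = env_provider or config_provider or DEFAULT_PROVIDER
    if provider_type == "anthropic" and not anthropic_api_key:
        raise ValueError("ANTHROPIC_API_KEY environment variable required.")
    return provider_type


print(resolve_provider_type(None, "ollama", None))    # ollama
print(resolve_provider_type("vllm", "ollama", None))  # vllm
print(resolve_provider_type(None, None, "sk-test"))   # anthropic
```

Raising `ValueError` instead of yielding or returning an HTTP error lets each caller choose its own surface: the SSE stream turns it into an in-stream `error` event, and the refine endpoint maps it to 503.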


async def _stress_test_event_stream(
workspace: Workspace,
max_depth: int,
request: Optional[Request] = None,
) -> AsyncGenerator[str, None]:
"""Yield SSE frames for a PRD stress-test.

Recoverable problems (missing PRD, missing ``ANTHROPIC_API_KEY``) are
surfaced as in-stream ``error`` events rather than HTTP errors, so a
browser ``EventSource`` can display them via its message handler.

Stops early if the client disconnects, so an abandoned stream does not keep
issuing LLM calls — mirroring ``event_stream_generator`` in streaming_v2.
"""
from codeframe.core.prd_stress_test import stress_test_prd_stream

record = prd.get_latest(workspace)
if not record:
yield _sse({
"type": "error",
"message": "No PRD found. Add or generate a PRD first.",
})
return

# Resolve the LLM provider following the documented chain (shared with the
# refine endpoint). Recoverable problems become in-stream error events so a
# browser EventSource can display them.
try:
provider = get_provider(provider_type, **provider_kwargs)
provider = _resolve_llm_provider(workspace)
except ValueError as exc:
yield _sse({"type": "error", "message": str(exc)})
return
@@ -305,6 +349,114 @@ async def stress_test_prd_stream_endpoint(
)


# NOTE: registered before the "/{prd_id}" catch-all so FastAPI does not match
# "stress-test/refine" as a PRD id.
@router.post("/stress-test/refine", response_model=PrdResponse)
@rate_limit_standard()
async def refine_prd_from_stress_test(
request: Request,
body: StressTestRefineRequest,
workspace: Workspace = Depends(get_v2_workspace),
) -> PrdResponse:
"""Refine a PRD by folding in answered stress-test ambiguities (#562).

Reconstructs :class:`Ambiguity` objects from the submitted answers, calls
the headless ``resolve_ambiguities_into_prd`` to rewrite the PRD via the
LLM, then persists the result as a new PRD version. Returns the new version.
"""
from codeframe.core.prd_stress_test import (
Ambiguity,
resolve_ambiguities_into_prd,
)

record = prd.get_by_id(workspace, body.prd_id)
if not record:
raise HTTPException(
status_code=404,
detail=api_error(
"PRD not found", ErrorCodes.NOT_FOUND, f"No PRD with id {body.prd_id}"
),
)

try:
provider = _resolve_llm_provider(workspace)
except ValueError as exc:
# The request is well-formed; the server lacks LLM configuration
# (missing API key or unknown provider) → 503, not 400.
raise HTTPException(
status_code=503,
detail=api_error(
"LLM provider unavailable",
ErrorCodes.SERVICE_UNAVAILABLE,
str(exc),
),
)

# resolve_ambiguities_into_prd only reads label, questions, and
# resolved_answer, so source_node_title/recommendation are intentionally
# left empty here (the client does not need to round-trip them).
ambiguities = [
Ambiguity(
id=str(i),
label=ans.label,
source_node_title="",
questions=list(ans.questions),
recommendation="",
resolved_answer=ans.answer,
)
for i, ans in enumerate(body.answers)
]

try:
# resolve_ambiguities_into_prd makes a synchronous, blocking LLM call;
# offload it to a thread so it does not stall the event loop (mirrors
# stress_test_prd_stream's asyncio.to_thread usage).
refined_content = await asyncio.to_thread(
resolve_ambiguities_into_prd, record.content, ambiguities, provider
)
# resolve_ambiguities_into_prd returns the original content unchanged
# when the LLM rewrite looks truncated. Surface that as an error rather
# than recording a no-op duplicate version under a "success" toast.
if refined_content == record.content:
raise HTTPException(
status_code=502,
detail=api_error(
"PRD refinement produced no changes",
ErrorCodes.EXECUTION_FAILED,
"The model returned no usable changes (its output may have "
"been truncated). Please try again.",
),
)
new_record = prd.create_new_version(
workspace,
parent_prd_id=body.prd_id,
new_content=refined_content,
change_summary="Refined via stress-test ambiguity resolution",
)
if not new_record:
# get_by_id already confirmed the PRD exists, so a None here is a
# persistence fault, not a missing resource → 500, not 404.
raise HTTPException(
status_code=500,
detail=api_error(
"Failed to persist new PRD version",
ErrorCodes.INTERNAL_ERROR,
f"create_new_version returned no record for {body.prd_id}",
),
)
return _prd_to_response(new_record)
except HTTPException:
raise
except Exception as e:
logger.error(f"Failed to refine PRD: {e}", exc_info=True)
raise HTTPException(
status_code=500,
detail=api_error(
"Failed to refine PRD", ErrorCodes.EXECUTION_FAILED, str(e)
),
)
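The refine handler has two runtime guards that can be exercised in isolation. First, the blocking rewrite runs under `asyncio.to_thread` so the event loop keeps serving other work. Second, an unchanged result is treated as a failure (502) rather than saved as a duplicate version. In the sketch below, `fake_resolve` is a hypothetical stand-in for `resolve_ambiguities_into_prd`, with `time.sleep` simulating the LLM call, and `RefineError` stands in for FastAPI's `HTTPException`.

```python
import asyncio
import contextlib
import time


class RefineError(Exception):
    """Stand-in for HTTPException: carries an HTTP status code."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code


def fake_resolve(content: str, answers: list[str]) -> str:
    """Hypothetical blocking rewrite; returns content unchanged if nothing to add."""
    time.sleep(0.1)  # simulate a synchronous LLM round-trip
    if not answers:
        return content
    return content + "\n\n## Clarifications\n" + "\n".join(f"- {a}" for a in answers)


async def refine(content: str, answers: list[str]) -> str:
    # Offload the blocking call so the event loop keeps running other tasks.
    refined = await asyncio.to_thread(fake_resolve, content, answers)
    if refined == content:
        # A no-op rewrite is reported as an upstream failure, not saved.
        raise RefineError(502, "PRD refinement produced no changes")
    return refined


async def demo() -> tuple[str, int, int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    refined = await refine("# PRD", ["90-day retention"])
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb

    try:
        await refine("# PRD", [])
        status = 200
    except RefineError as exc:
        status = exc.status_code
    return refined, ticks, status


refined, ticks, status = asyncio.run(demo())
print(ticks > 0, status)  # True 502
```

If `fake_resolve` were called directly inside `refine` instead of through `asyncio.to_thread`, the heartbeat task would never get a turn during the rewrite and `ticks` would stay at 0.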


@router.get("/{prd_id}", response_model=PrdResponse)
@rate_limit_standard()
async def get_prd(
docs/PRODUCT_ROADMAP.md (7 additions, 6 deletions)
@@ -147,13 +147,14 @@ Without a settings page, a new user who cannot find the env vars cannot use the

### 4. PRD Stress-Test Web UI

**Current state**: Phase 5.4 trigger + streaming shipped (#561). The `/prd` page now has a "Stress Test" button (enabled only when a PRD exists) that opens `StressTestModal`. The modal connects via `useStressTestStream` to `GET /api/v2/prd/stress-test` (SSE), which streams `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py`. Results rendering — displaying the decomposition tree, surfacing ambiguities as answerable questions, feeding answers back to refine the PRD — is tracked in #562 and is not yet built.
**Current state**: Phase 5.4 is **fully shipped** (trigger + streaming #561; results view + refinement #562). The `/prd` page's "Stress Test" button opens `StressTestModal`, which connects via `useStressTestStream` to `GET /api/v2/prd/stress-test` (SSE) streaming `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py`. The `complete` event carries structured, severity-tagged `ambiguities`, which the modal renders as a results view.

**What remains (#562)**:
**Shipped in #562**:

- A **results view** showing the decomposition tree with ambiguities surfaced as questions, styled similarly to the existing Discovery transcript
- Each ambiguity has an inline answer field — the user's answers are fed back to refine the PRD
- On completion: the refined PRD is saved and the user can proceed to task generation
- A **results view** of `AmbiguityCard`s — each shows the question text, a severity badge (`blocking`/`warning`), and an inline answer textarea, with an "X of Y answered" progress indicator
- A **[Refine PRD]** button, disabled until every blocking ambiguity is answered, that posts answers to `POST /api/v2/prd/stress-test/refine`
- The refine endpoint folds answers into the PRD via `resolve_ambiguities_into_prd` and persists a new version (`prd.create_new_version`); the editor updates via `mutatePrd`, ready for task generation
- Out of scope (per acceptance criteria): full collapsible decomposition-tree visualization (the streaming log already surfaces the goal breakdown)
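Two pieces of view logic fall out of the results-view behavior above. The **[Refine PRD]** button is enabled once every `blocking` ambiguity is answered, and `warning` ones may be skipped. The "X of Y answered" indicator counts answered cards against the total. The real logic lives in the TypeScript `StressTestModal`. This Python sketch uses hypothetical names, treats a whitespace-only answer as unanswered (matching the server's `_answer_not_blank` validator), and assumes Y counts all ambiguities. Because the refine endpoint rejects blank answers, the payload builder omits skipped warnings.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Item:
    # Hypothetical client-side view of one AmbiguityCard.
    label: str
    questions: list[str]
    severity: Literal["blocking", "warning"]
    answer: str = ""


def is_answered(item: Item) -> bool:
    # Whitespace-only counts as unanswered, matching the server-side validator.
    return bool(item.answer.strip())


def progress(items: list[Item]) -> str:
    answered = sum(is_answered(i) for i in items)
    return f"{answered} of {len(items)} answered"


def can_refine(items: list[Item]) -> bool:
    # Warnings are skippable; every blocking ambiguity needs an answer.
    return all(is_answered(i) for i in items if i.severity == "blocking")


def refine_payload(prd_id: str, items: list[Item]) -> dict:
    # Only answered items are sent, since blank answers would be rejected.
    return {
        "prd_id": prd_id,
        "answers": [
            {"label": i.label, "questions": i.questions, "answer": i.answer}
            for i in items
            if is_answered(i)
        ],
    }


items = [
    Item("AUTH MODEL", ["Which identity provider?"], "blocking"),
    Item("RETENTION", ["How long are logs kept?"], "warning"),
]
print(progress(items), can_refine(items))  # 0 of 2 answered False
items[0].answer = "Okta via OIDC"
print(progress(items), can_refine(items))  # 1 of 2 answered True
print(len(refine_payload("prd-123", items)["answers"]))  # 1
```

With no blocking items, `can_refine` is true even when nothing is answered, while the endpoint requires at least one answer. A real client has to handle that case as well.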

**Why it matters for the vision**: "Gaps discovered at planning time, not execution time." The stress-test is the mechanism that makes requirements specific enough for agents to execute correctly. Without it in the web UI, the web-first user skips the most valuable part of the THINK phase.

@@ -203,7 +204,7 @@ These are items that were considered and excluded because they do not serve the
| 5.1 | Settings page (skeleton + agent config + PROOF9/workspace tabs) | ✅ Complete | #554–556 |
| 5.2 | Cost analytics | ✅ Complete | #557–558 |
| 5.3 | Async notifications | ✅ Complete (browser + in-app center #559, webhook #560) | #559–560 |
| 5.4 | PRD stress-test web UI | ✅ Complete (trigger + streaming #561; results rendering #562 pending) | #561–562 |
| 5.4 | PRD stress-test web UI | ✅ Complete (trigger + streaming #561; results view + refinement #562) | #561–562 |
| 5.5 | GitHub Issues import | ❌ Not started | #563–565 |

**Current focus**: Phase 4A — PR status tracking + PROOF9 merge gate.