Commit 79fa686

feat(prd): stress-test web UI — trigger + streaming progress (#561)
## Summary
- Core: `stress_test_prd_stream()` async generator yields goals_extracted / goal_analyzed / complete / error events (provider.complete offloaded via asyncio.to_thread; headless).
- Backend: `GET /api/v2/prd/stress-test` SSE endpoint (rate-limited; resolves LLM provider via the standard chain; missing PRD/key + client disconnect handled).
- Frontend: `useStressTestStream` hook, `StressTestModal`, and a "Stress Test" button on `/prd` (enabled only when a PRD exists).

## Validation
- Tests: 909 frontend + backend stress-test/core suites passing (local + CI)
- Lint: ruff + mypy clean
- Cross-family review: codex (primary), 2 P2s fixed; CodeRabbit + claude, all findings fixed
- Demo: all acceptance criteria verified with live outcome evidence (screenshots in /tmp/demo-561-*.png)

Closes #561
1 parent 38007da commit 79fa686

21 files changed

Lines changed: 1338 additions & 7 deletions

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,8 @@ If you are an agent working in this repo: **do not improvise architecture**. Fol

 ### Current Focus: Phase 4A

+**Phase 5.4 is complete**: PRD stress-test web UI, trigger + streaming (#561). Backend: `GET /api/v2/prd/stress-test` SSE endpoint streams `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py:stress_test_prd_stream()`, resolving the LLM provider via the standard chain and applying the standard rate limit. Frontend: `useStressTestStream` hook manages the SSE connection and event accumulation; `StressTestModal` renders the streaming progress and is opened via a "Stress Test" button on the `/prd` page (enabled only when a PRD exists). Results rendering (#562) is out of scope and still pending.
+
 **Phase 5.3 is complete**: Async notifications cover both surfaces:
 - **Browser + in-app center (#559)**: `useNotifications` hook with workspace-scoped `localStorage` persistence and browser Notification dispatch (only when tab hidden + permission granted); `NotificationProvider` in root layout; `NotificationCenter` (bell icon + dropdown) mounts in sidebar footer. `BatchExecutionMonitor` dispatches `batch.completed` on terminal status transitions (distinguishing COMPLETED/FAILED/CANCELLED in both the in-app message and the success icon) and `blocker.created` on per-task BLOCKED transitions. `/execution` requests browser permission once on mount when permission is `'default'`. `/proof` dispatches `gate.run.failed` per failed gate when a proof run completes with `passed === false`. Known limitation: notifications only fire while `BatchExecutionMonitor` is mounted (cross-page background poller is out of scope; tracked for future work).
 - **Outbound webhook (#560)**: Settings → Notifications tab takes a single URL + enabled toggle, persisted to `.codeframe/notifications_config.json` via `atomic_write_json`. `GET/PUT /api/v2/settings/notifications` and `POST /api/v2/settings/notifications/test` (test fires a sample payload and surfaces status code). `WebhookNotificationService.send_event` is the generic backend; dispatched fire-and-forget (5s timeout) from `core/conductor.py` on `BATCH_COMPLETED` only (not PARTIAL/FAILED/CANCELLED), `core/blockers.py:create()` after `BLOCKER_CREATED`, and `ui/routers/pr_v2.py:merge_pull_request` after successful merge. Failures are logged but never break the triggering operation.
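
The Phase 5.4 note refers to "the standard chain" for choosing an LLM provider. As the router code in this commit spells out, that order is: environment variable, then workspace config, then the default `anthropic`. The sketch below illustrates only that precedence. `resolve_provider` is a hypothetical helper, not a function in the codebase, and it takes the environment as a plain mapping so the behaviour is easy to check.

```python
from typing import Mapping, Optional


def resolve_provider(
    env: Mapping[str, str],
    config_provider: Optional[str],
    default: str = "anthropic",
) -> str:
    """Return the first non-empty value: env var, workspace config, default."""
    # `or` treats an empty string like a missing value, so an empty
    # CODEFRAME_LLM_PROVIDER falls through to the next source.
    return env.get("CODEFRAME_LLM_PROVIDER") or config_provider or default


if __name__ == "__main__":
    # In real use the mapping would be os.environ; dicts keep the demo fixed.
    print(resolve_provider({}, None))
    print(resolve_provider({}, "ollama"))
    print(resolve_provider({"CODEFRAME_LLM_PROVIDER": "vllm"}, "ollama"))
```

Chaining with `or` keeps the precedence readable in a single expression, at the cost of not distinguishing "unset" from "set to an empty string".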

codeframe/core/prd_stress_test.py

Lines changed: 61 additions & 1 deletion
@@ -8,12 +8,13 @@
 This module is headless - no FastAPI or HTTP dependencies.
 """

+import asyncio
 import json
 import logging
 import uuid
 from dataclasses import dataclass
 from enum import Enum
-from typing import Optional
+from typing import AsyncGenerator, Optional

 from codeframe.adapters.llm.base import Purpose

@@ -407,3 +408,62 @@ def stress_test_prd(
         tech_spec_markdown=tech_spec,
         ambiguity_report=amb_report,
     )
+
+
+async def stress_test_prd_stream(
+    prd_content: str, provider, max_depth: int = 3
+) -> AsyncGenerator[dict, None]:
+    """Async streaming variant of :func:`stress_test_prd`.
+
+    Yields progress event dicts suitable for SSE delivery as each top-level
+    goal is decomposed, so a UI can render incremental output:
+
+    - ``{"type": "goals_extracted", "goals": [...]}``
+    - ``{"type": "goal_analyzed", "goal": str, "classification": str,
+      "ambiguities_so_far": int}`` (once per top-level goal)
+    - ``{"type": "complete", "ambiguity_count": int,
+      "tech_spec_markdown": str, "ambiguity_report": str}``
+    - ``{"type": "error", "message": str}`` if decomposition raises
+
+    The underlying ``provider.complete()`` calls are synchronous and blocking,
+    so each is offloaded via :func:`asyncio.to_thread` to keep the event loop
+    responsive. This function stays headless (no FastAPI/HTTP imports).
+    """
+    try:
+        goals = await asyncio.to_thread(extract_goals, prd_content, provider)
+        yield {"type": "goals_extracted", "goals": goals}
+
+        ambiguities: list[Ambiguity] = []
+        tree: list[DecompositionNode] = []
+
+        for goal in goals:
+            node = await asyncio.to_thread(
+                recursive_decompose,
+                goal,  # title
+                goal,  # description
+                [],  # lineage
+                prd_content,
+                0,  # depth
+                max_depth,
+                ambiguities,
+                provider,
+            )
+            tree.append(node)
+            yield {
+                "type": "goal_analyzed",
+                "goal": node.title,
+                "classification": node.classification.value,
+                "ambiguities_so_far": len(ambiguities),
+            }
+
+        tech_spec = render_tech_spec(tree, ambiguities)
+        amb_report = render_ambiguity_report(ambiguities)
+        yield {
+            "type": "complete",
+            "ambiguity_count": len(ambiguities),
+            "tech_spec_markdown": tech_spec,
+            "ambiguity_report": amb_report,
+        }
+    except Exception as exc:  # noqa: BLE001 - surface any failure to the client
+        logger.warning("Stress test stream failed: %s", exc, exc_info=True)
+        yield {"type": "error", "message": str(exc)}
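
The streaming pattern in this hunk can be tried in isolation. The sketch below is a stand-in rather than the project's code: `slow_extract`, `slow_analyze`, `stream_progress`, and `collect` are hypothetical names, and `time.sleep` simulates the blocking `provider.complete()` call. It shows an async generator that offloads each blocking step with `asyncio.to_thread`, yields one progress dict per step, and converts any exception into a final `error` event.

```python
import asyncio
import time
from typing import AsyncGenerator


def slow_extract(text: str) -> list[str]:
    """Hypothetical blocking step (stands in for a synchronous LLM call)."""
    time.sleep(0.01)
    return [line.strip() for line in text.splitlines() if line.strip()]


def slow_analyze(goal: str) -> str:
    """Hypothetical blocking step that fails on one specific input."""
    time.sleep(0.01)
    if goal == "boom":
        raise RuntimeError("analysis failed")
    return "atomic"


async def stream_progress(text: str) -> AsyncGenerator[dict, None]:
    """Yield progress events, running each blocking step in a worker thread."""
    try:
        goals = await asyncio.to_thread(slow_extract, text)
        yield {"type": "goals_extracted", "goals": goals}
        for goal in goals:
            classification = await asyncio.to_thread(slow_analyze, goal)
            yield {
                "type": "goal_analyzed",
                "goal": goal,
                "classification": classification,
            }
        yield {"type": "complete", "goal_count": len(goals)}
    except Exception as exc:  # report any failure as a terminal event
        yield {"type": "error", "message": str(exc)}


async def collect(text: str) -> list[dict]:
    """Drain the generator into a list, in the style of this commit's tests."""
    return [event async for event in stream_progress(text)]


if __name__ == "__main__":
    for event in asyncio.run(collect("login\nexport")):
        print(event)
```

Events that were already yielded are not retracted when a later step fails, so a consumer sees partial progress followed by a single `error` event. This is the ordering the provider-failure test in this commit relies on.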

codeframe/ui/routers/prd_v2.py

Lines changed: 120 additions & 1 deletion
@@ -14,10 +14,13 @@
 GET /api/v2/prd/{id}/diff - Diff two versions
 """

+import json
 import logging
-from typing import Optional
+import os
+from typing import AsyncGenerator, Optional

 from fastapi import APIRouter, Depends, HTTPException, Query, Request
+from fastapi.responses import StreamingResponse
 from pydantic import BaseModel, Field

 from codeframe.core.workspace import Workspace
@@ -186,6 +189,122 @@ async def get_latest_prd(
     return _prd_to_response(record)


+def _sse(event: dict) -> str:
+    """Format a stress-test event dict as an SSE ``data:`` frame."""
+    return f"data: {json.dumps(event)}\n\n"
+
+
+async def _stress_test_event_stream(
+    workspace: Workspace,
+    max_depth: int,
+    request: Optional[Request] = None,
+) -> AsyncGenerator[str, None]:
+    """Yield SSE frames for a PRD stress-test.
+
+    Recoverable problems (missing PRD, missing ``ANTHROPIC_API_KEY``) are
+    surfaced as in-stream ``error`` events rather than HTTP errors, so a
+    browser ``EventSource`` can display them via its message handler.
+
+    Stops early if the client disconnects, so an abandoned stream does not keep
+    issuing LLM calls, mirroring ``event_stream_generator`` in streaming_v2.
+    """
+    from codeframe.core.prd_stress_test import stress_test_prd_stream
+
+    record = prd.get_latest(workspace)
+    if not record:
+        yield _sse({
+            "type": "error",
+            "message": "No PRD found. Add or generate a PRD first.",
+        })
+        return
+
+    # Resolve the LLM provider following the documented chain:
+    # env var → workspace config (.codeframe/config.yaml) → default "anthropic".
+    # (No CLI flag here: this is the web surface.) Mirrors runtime.py.
+    from codeframe.adapters.llm import get_provider
+    from codeframe.core.config import load_environment_config
+
+    env_cfg = load_environment_config(workspace.repo_path)
+    llm_cfg = env_cfg.llm if (env_cfg and env_cfg.llm) else None
+    provider_type = (
+        os.getenv("CODEFRAME_LLM_PROVIDER")
+        or (llm_cfg.provider if llm_cfg else None)
+        or "anthropic"
+    )
+
+    # Only the Anthropic provider needs an API key up front; local providers
+    # (ollama/vllm/compatible) do not.
+    if provider_type == "anthropic" and not os.getenv("ANTHROPIC_API_KEY"):
+        yield _sse({
+            "type": "error",
+            "message": "ANTHROPIC_API_KEY environment variable required.",
+        })
+        return
+
+    provider_kwargs: dict = {}
+    model_override = os.getenv("CODEFRAME_LLM_MODEL") or (
+        llm_cfg.model if llm_cfg else None
+    )
+    base_url_override = (llm_cfg.base_url if llm_cfg else None) or os.getenv(
+        "OPENAI_BASE_URL"
+    )
+    if model_override:
+        provider_kwargs["model"] = model_override
+    if base_url_override:
+        provider_kwargs["base_url"] = base_url_override
+
+    try:
+        provider = get_provider(provider_type, **provider_kwargs)
+    except ValueError as exc:
+        yield _sse({"type": "error", "message": str(exc)})
+        return
+
+    async for event in stress_test_prd_stream(
+        record.content, provider, max_depth=max_depth,
+    ):
+        # If the browser has gone away, stop iterating the core generator so its
+        # next (blocking, billable) LLM call is never made.
+        if request is not None and await request.is_disconnected():
+            logger.info("Client disconnected from stress-test stream; aborting")
+            break
+        yield _sse(event)
+
+
+@router.get("/stress-test")
+@rate_limit_standard()
+async def stress_test_prd_stream_endpoint(
+    request: Request,
+    max_depth: int = Query(3, ge=1, le=10, description="Maximum recursion depth"),
+    workspace: Workspace = Depends(get_v2_workspace),
+) -> StreamingResponse:
+    """Stream a PRD stress-test (recursive decomposition) via SSE.
+
+    Runs the headless ``stress_test_prd_stream`` core generator over the
+    latest PRD and emits its progress events as Server-Sent Events. This is
+    the web equivalent of ``cf prd stress-test``.
+
+    Declared as GET (not POST) so it is reachable from a browser
+    ``EventSource``, matching ``GET /api/v2/tasks/{task_id}/stream``. No custom
+    auth headers are required (cookie-based auth via ``withCredentials``).
+
+    Event payloads (JSON in the SSE ``data:`` field, ``type`` field):
+    - ``goals_extracted``: high-level goals parsed from the PRD
+    - ``goal_analyzed``: one per top-level goal (classification + running
+      ambiguity count)
+    - ``complete``: ambiguity count + rendered tech spec / ambiguity report
+    - ``error``: no PRD, missing API key, or decomposition failure
+    """
+    return StreamingResponse(
+        _stress_test_event_stream(workspace, max_depth, request),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+        },
+    )
+
+
 @router.get("/{prd_id}", response_model=PrdResponse)
 @rate_limit_standard()
 async def get_prd(
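
Each event in this endpoint reaches the browser as one SSE `data:` frame terminated by a blank line. The sketch below reproduces that framing and pairs it with a deliberately small parser to show the round trip. `sse_frame` has the same shape as the `_sse` helper in this diff. `parse_sse` is a hypothetical helper that handles only single-line `data:` frames, which is enough here because `json.dumps` escapes newlines inside strings. It is not a full implementation of the SSE parsing rules.

```python
import json
from typing import Iterator


def sse_frame(event: dict) -> str:
    """Format one event as an SSE data frame (same shape as `_sse`)."""
    return f"data: {json.dumps(event)}\n\n"


def parse_sse(stream: str) -> Iterator[dict]:
    """Hypothetical minimal parser: one JSON object per `data:` frame."""
    for frame in stream.split("\n\n"):
        if frame.startswith("data: "):
            yield json.loads(frame[len("data: "):])


if __name__ == "__main__":
    events = [
        {"type": "goals_extracted", "goals": ["User Authentication"]},
        {"type": "error", "message": "line one\nline two"},
    ]
    wire = "".join(sse_frame(e) for e in events)
    # Newlines inside messages are escaped by json.dumps, so the only blank
    # lines on the wire are the frame terminators.
    print(list(parse_sse(wire)) == events)
```

Serialising every event as a single JSON line keeps the frame boundary unambiguous, so a client can split on blank lines and dispatch on the `type` field.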

docs/PHASE_2_CLI_API_MAPPING.md

Lines changed: 2 additions & 1 deletion
@@ -38,8 +38,9 @@ Both end up with PRD records managed by `core.prd`.
 | `cf prd export` | `core.prd` | `export_to_file()` | (CLI-only) | - | N/A |
 | `cf prd versions` | `core.prd` | `get_versions()` | `/api/v2/prd/{id}/versions` | GET | ✅ Present |
 | `cf prd diff` | `core.prd` | `diff_versions()` | `/api/v2/prd/{id}/diff` | GET | ✅ Present |
+| `cf prd stress-test` | `core.prd_stress_test` | `stress_test_prd_stream()` | `/api/v2/prd/stress-test` | GET (SSE) | ✅ Present |

-**Note:** Both Discovery workflow and PRD CRUD are now complete ✅.
+**Note:** Both Discovery workflow and PRD CRUD are now complete ✅. The stress-test SSE endpoint (#561) is present; web UI results rendering (#562) is pending.

 ### Task Commands

4546

docs/PRODUCT_ROADMAP.md

Lines changed: 3 additions & 4 deletions
@@ -147,11 +147,10 @@ Without a settings page, a new user who cannot find the env vars cannot use the

 ### 4. PRD Stress-Test Web UI

-**Current state**: The CLI has `cf prd stress-test` for recursive decomposition: it takes the PRD and surfaces ambiguities the agent cannot resolve without human input. This is described in the vision as a core part of the THINK phase. The web UI has no equivalent; users who work exclusively in the browser never see this step.
+**Current state**: Phase 5.4 trigger + streaming shipped (#561). The `/prd` page now has a "Stress Test" button (enabled only when a PRD exists) that opens `StressTestModal`. The modal connects via `useStressTestStream` to `GET /api/v2/prd/stress-test` (SSE), which streams `goals_extracted`, `goal_analyzed`, `complete`, and `error` events from `core/prd_stress_test.py`. Results rendering (displaying the decomposition tree, surfacing ambiguities as answerable questions, feeding answers back to refine the PRD) is tracked in #562 and is not yet built.

-**What to build**:
+**What remains (#562)**:

-- A **[Stress Test]** button on the PRD page that triggers the stress-test process
 - A **results view** showing the decomposition tree with ambiguities surfaced as questions, styled similarly to the existing Discovery transcript
 - Each ambiguity has an inline answer field; the user's answers are fed back to refine the PRD
 - On completion: the refined PRD is saved and the user can proceed to task generation
@@ -204,7 +203,7 @@ These are items that were considered and excluded because they do not serve the
 | 5.1 | Settings page (skeleton + agent config + PROOF9/workspace tabs) | ✅ Complete | #554–556 |
 | 5.2 | Cost analytics | ✅ Complete | #557–558 |
 | 5.3 | Async notifications | ✅ Complete (browser + in-app center #559, webhook #560) | #559–560 |
-| 5.4 | PRD stress-test web UI | ❌ Not started | #561–562 |
+| 5.4 | PRD stress-test web UI | ✅ Complete (trigger + streaming #561; results rendering #562 pending) | #561–562 |
 | 5.5 | GitHub Issues import | ❌ Not started | #563–565 |

 **Current focus**: Phase 4A, PR status tracking + PROOF9 merge gate.

tests/core/test_prd_stress_test.py

Lines changed: 78 additions & 0 deletions
@@ -367,6 +367,84 @@ def test_max_depth_respected(self, sample_prd, mock_provider):
         assert child.children == []  # No grandchildren at depth 1


+# --- Streaming Generator Tests ---
+
+
+class TestStressTestPrdStream:
+    async def test_emits_event_sequence(self, sample_prd, mock_provider):
+        from codeframe.core.prd_stress_test import stress_test_prd_stream
+
+        events = [
+            ev async for ev in stress_test_prd_stream(
+                sample_prd, mock_provider, max_depth=3,
+            )
+        ]
+
+        types = [e["type"] for e in events]
+        # First event announces extracted goals, last announces completion.
+        assert types[0] == "goals_extracted"
+        assert types[-1] == "complete"
+        # One goal_analyzed per top-level goal (3 in the sample PRD).
+        assert types.count("goal_analyzed") == 3
+
+    async def test_goals_extracted_payload(self, sample_prd, mock_provider):
+        from codeframe.core.prd_stress_test import stress_test_prd_stream
+
+        events = [
+            ev async for ev in stress_test_prd_stream(sample_prd, mock_provider)
+        ]
+        goals_event = events[0]
+        assert goals_event["goals"] == [
+            "User Authentication",
+            "Invoice Management",
+            "PDF Export",
+        ]
+
+    async def test_goal_analyzed_carries_classification_and_running_count(
+        self, sample_prd, mock_provider
+    ):
+        from codeframe.core.prd_stress_test import stress_test_prd_stream
+
+        events = [
+            ev async for ev in stress_test_prd_stream(sample_prd, mock_provider)
+        ]
+        analyzed = [e for e in events if e["type"] == "goal_analyzed"]
+
+        auth = next(e for e in analyzed if e["goal"] == "User Authentication")
+        assert auth["classification"] == "ambiguous"
+        assert auth["ambiguities_so_far"] == 1
+
+        invoice = next(e for e in analyzed if e["goal"] == "Invoice Management")
+        assert invoice["classification"] == "composite"
+
+        pdf = next(e for e in analyzed if e["goal"] == "PDF Export")
+        assert pdf["classification"] == "atomic"
+
+    async def test_complete_payload(self, sample_prd, mock_provider):
+        from codeframe.core.prd_stress_test import stress_test_prd_stream
+
+        events = [
+            ev async for ev in stress_test_prd_stream(sample_prd, mock_provider)
+        ]
+        complete = events[-1]
+        assert complete["type"] == "complete"
+        assert complete["ambiguity_count"] == 1
+        assert "# Technical Specification" in complete["tech_spec_markdown"]
+        assert "AUTH SCOPE" in complete["ambiguity_report"]
+
+    async def test_provider_failure_yields_error_event(self, sample_prd):
+        from codeframe.core.prd_stress_test import stress_test_prd_stream
+
+        failing = MagicMock()
+        failing.complete.side_effect = RuntimeError("LLM unavailable")
+
+        events = [
+            ev async for ev in stress_test_prd_stream(sample_prd, failing)
+        ]
+        assert events[-1]["type"] == "error"
+        assert "LLM unavailable" in events[-1]["message"]
+
+
 # --- CLI Tests ---


