Commit 8fee290

feat(nexus): retry strategies, verify CLI, config threshold, work_proposal, stress bridge (WI-NEXUS-016..020) (#74)
Squash-merge of WI-NEXUS-016..020
1 parent 6f22464 commit 8fee290

13 files changed

Lines changed: 1866 additions & 7 deletions

.specsmith/ledger-chain.txt

Lines changed: 5 additions & 0 deletions
@@ -18,3 +18,8 @@ d0e80ec48ee0a854345237c2fcb8f2ad112ff4f66dc7a6732926017501d85fb4
 7125182a6d402b2e8022fee66cc10950e5734d21e4c40cd1410e9aaca303f5a2
 16eb05be2f953074e4e4c47efb0fd37e9000777db8ddedceee2eca165cf4d925
 01f963eb2078b1815be7cccff4d92dedf3b242c302ceba840f0eb489fe7a4f4a
+0a696105c598f0cd195ef138a2b78ae1e8bc780f58cdea0d0e40d36751281226
+b6a7f5cccf5d0e7064503f161fe685d1108b5541bf2625679ed1a9529147e07b
+b0caf9452cdd3cd154ab6af5d2b8c950a3b8714a5dd9bf7cd54177810e238eac
+32c2742d1f5b332322b25038a5cff4b4e3c25437e3dd16afa8ed24387f6935bd
+1b5b01b80278aabd1ad5ba1599825a7dacd7b20e65ea27dccc98d0a55fdaa84d

.specsmith/requirements.json

Lines changed: 35 additions & 0 deletions
@@ -663,5 +663,40 @@
 "description": "A live or honestly-skipped invocation of `scripts/nexus_smoke.py` must be captured under `.specsmith/runs/WI-NEXUS-011/logs.txt` so the project ledger preserves at least one reproducible record of the broker -> preflight -> orchestrator -> vLLM end-to-end path (or a documented reason the live container could not be reached in the current environment).",
 "source": "ARCHITECTURE.md",
 "status": "defined"
+},
+{
+"id": "REQ-096",
+"title": "Bounded-Retry Harness Must Map Failures to Retry Strategies",
+"description": "When `execute_with_governance` exhausts its retry budget (REQ-014), it must classify the last executor report against the canonical retry strategy mapping (REQ-028): `narrow_scope`, `expand_scope`, `fix_tests`, `rollback`, or `stop`. The classification must be exposed on `RunResult.strategy` and surfaced in the clarifying question (REQ-063) so the user gets one concrete next-action label rather than only a free-form sentence.",
+"source": "ARCHITECTURE.md",
+"status": "defined"
+},
+{
+"id": "REQ-097",
+"title": "specsmith verify CLI Subcommand",
+"description": "The Specsmith CLI must expose a `specsmith verify` subcommand that consumes the verification input contract (REQ-027): file diffs, test results, execution logs, and changed files (paths or `--stdin` JSON). The subcommand must emit a JSON object with at least `equilibrium`, `confidence`, `summary`, `files_changed`, `test_results`, and `retry_strategy`. Exit code 0 on equilibrium with confidence ≥ the configured threshold, 2 when retry is recommended, and 3 when stop-and-align is required.",
+"source": "ARCHITECTURE.md",
+"status": "defined"
+},
+{
+"id": "REQ-098",
+"title": "Confidence Threshold Must Be Read From .specsmith/config.yml",
+"description": "Both `specsmith preflight` and the broker's `run_preflight` helper must consult `.specsmith/config.yml` for the `epistemic.confidence_threshold` value (REQ-058) and use it as the floor for the JSON `confidence_target` field whenever it is greater than the heuristic default. When the config file is absent or unparseable, the existing heuristic defaults must continue to apply.",
+"source": ".specsmith/config.yml, ARCHITECTURE.md",
+"status": "defined"
+},
+{
+"id": "REQ-099",
+"title": "Accepted Preflight Must Record a Distinct work_proposal Event",
+"description": "When `specsmith preflight` produces an `accepted` decision and assigns a brand-new `work_item_id`, the CLI must append a `work_proposal` ledger event in addition to the existing `preflight` event (REQ-044). The `work_proposal` entry must reference REQ-044 and REQ-085, include the `work_item_id` and matched `requirement_ids`, and must NOT be emitted when the underlying `work_item_id` already appears in `LEDGER.md` (no duplicate proposals).",
+"source": "ARCHITECTURE.md",
+"status": "defined"
+},
+{
+"id": "REQ-100",
+"title": "Broker Scope Inference May Surface Stress-Test Critical Failures",
+"description": "When the user passes `--stress` to `specsmith preflight` and the matched requirements set is non-empty, the CLI must invoke the existing AEE `StressTester` against those belief artifacts and surface any critical failures in the JSON payload as a `stress_warnings` list. The narration (verbose mode) must include a one-sentence plain-English warning when at least one critical failure is found. The flag must default off so unrelated tests continue to pass.",
+"source": "ARCHITECTURE.md",
+"status": "defined"
 }
 ]
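
REQ-097 and REQ-098 together describe how a confidence threshold feeds an exit code. The following is a minimal Python sketch of that logic, not the project's implementation. The function names `confidence_floor` and `verify_exit_code` are hypothetical, the configured value is assumed to have already been read from `.specsmith/config.yml` (or to be `None` when the file is absent or unparseable), and mapping the `stop` strategy to the stop-and-align exit code is an assumption.

```python
from typing import Optional

# Strategy labels listed in REQ-096.
RETRY_STRATEGIES = ("narrow_scope", "expand_scope", "fix_tests", "rollback", "stop")


def confidence_floor(heuristic_default: float, configured: Optional[float]) -> float:
    """REQ-098 sketch: the configured threshold only raises the target.

    `configured` is None when the config file is absent or unparseable,
    in which case the heuristic default applies unchanged.
    """
    if configured is None:
        return heuristic_default
    return max(heuristic_default, configured)


def verify_exit_code(
    equilibrium: bool,
    confidence: float,
    threshold: float,
    retry_strategy: Optional[str],
) -> int:
    """REQ-097 sketch: 0 = done, 2 = retry recommended, 3 = stop-and-align."""
    if retry_strategy is not None and retry_strategy not in RETRY_STRATEGIES:
        raise ValueError(f"unknown retry strategy: {retry_strategy!r}")
    if equilibrium and confidence >= threshold:
        return 0
    # Assumption: only the "stop" strategy signals stop-and-align.
    if retry_strategy == "stop":
        return 3
    return 2
```

With a heuristic default of 0.7 and a configured threshold of 0.9, `confidence_floor` yields 0.9; a configured value of 0.5 leaves the target at 0.7, because the config only acts as a floor.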
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# feat(nexus): TaskResult, preflight exit codes, ledger event, /why post-run, smoke evidence

Tightens the **Nexus**-**Specsmith** contract that landed in PR #72. Five
follow-up work items, all governed by Specsmith and verified by pytest.
**Suite: 247 passing, 1 skipped (live l1-nexus integration test).**

## Work items in this PR

- **WI-NEXUS-011 (REQ-095)**: Captured live `l1-nexus` smoke evidence at
  `.specsmith/runs/WI-NEXUS-011/logs.txt`. The smoke script ran offline and
  returned a structured `ok=false` transport error; the log includes a
  reproducible note describing how to re-run it against a live container.
- **WI-NEXUS-012 (REQ-091)**: `orchestrator.run_task` now returns a
  `TaskResult` dataclass (`equilibrium`, `confidence`, `summary`,
  `files_changed`, `test_results`). The Nexus REPL's bounded-retry harness
  consumes it directly instead of synthesizing equilibrium from
  `bool(summary)`. Adds a tolerant parser for the existing Nexus output
  contract (Plan/Commands/Files changed/Diff/Test results/Next action).
- **WI-NEXUS-013 (REQ-094)**: Nexus REPL emits a `[/why]` post-run
  governance block when `verbose_governance` is on, listing the assigned
  `work_item_id`, matched `requirement_ids`/`test_case_ids`, post-run
  `confidence`, and harness `equilibrium`.
- **WI-NEXUS-014 (REQ-092)**: `specsmith preflight` exits `0` for
  `accepted`, `2` for `needs_clarification`, and `3` for
  `blocked`/`rejected`. The JSON payload continues to print on stdout for
  every exit code so CI pipelines can branch on intent without re-parsing.
- **WI-NEXUS-015 (REQ-093)**: Every accepted `specsmith preflight` invocation
  appends a `preflight` ledger event tagged with `REQ-085` plus the matched
  `requirement_ids`, recording the utterance, assigned `work_item_id`, and
  `confidence_target`. Non-accepted decisions never touch the ledger.
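
The `TaskResult` shape and the preflight exit codes described in the work items can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in this PR: the field names come from the list above, but the field types, the defaults, and the helper name `preflight_exit_code` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TaskResult:
    """Structured executor result; field types here are assumptions."""

    equilibrium: bool
    confidence: float
    summary: str
    files_changed: List[str] = field(default_factory=list)
    test_results: Dict[str, Any] = field(default_factory=dict)


# Decision -> exit code, as listed for WI-NEXUS-014.
PREFLIGHT_EXIT_CODES = {
    "accepted": 0,
    "needs_clarification": 2,
    "blocked": 3,
    "rejected": 3,
}


def preflight_exit_code(decision: str) -> int:
    """Hypothetical helper: map a preflight decision to its exit code."""
    try:
        return PREFLIGHT_EXIT_CODES[decision]
    except KeyError:
        raise ValueError(f"unknown preflight decision: {decision!r}") from None


# A non-empty summary does not imply equilibrium, which is why the harness
# reads the explicit flag instead of bool(summary).
partial = TaskResult(equilibrium=False, confidence=0.4, summary="Plan only")
assert bool(partial.summary) is True
assert partial.equilibrium is False
```

Because the JSON payload is printed for every exit code, a CI step could branch on the process exit status alone and only parse stdout when it needs the details.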

## Verification

- `py scripts/sync_governance_state.py` → 95 requirements / 95 test cases.
- `py -m pytest -q` → **247 passed, 1 skipped** (≈17s; the skip is the
  `NEXUS_LIVE=1`-gated integration test).
- Smoke evidence: `.specsmith/runs/WI-NEXUS-011/logs.txt`.
- Cumulative diff + final pytest log: `.specsmith/runs/WI-NEXUS-015/`.
- Five new ledger entries chained for WI-NEXUS-011..015.

## Notes for reviewers

- The post-run `[/why]` block is gated entirely behind the existing `/why`
  toggle; default REPL behavior remains plain English with no governance
  identifiers leaking to the user.
- The orchestrator's heuristic confidence (0.85 on full contract, 0.4
  partial) is documented as a placeholder for a real verifier signal; the
  retry harness already honors whatever value the executor returns.
- The preflight ledger writer is best-effort: ledger errors never block
  the CLI from emitting its JSON or returning its exit code.
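
The best-effort behavior in the last note can be sketched in Python as below. This is a hedged illustration, not the project's writer: the function names, the one-JSON-object-per-line ledger format, and the payload keys `decision` and `utterance` are assumptions, and the sketch only swallows I/O errors.

```python
import json
from pathlib import Path
from typing import Any, Dict


def append_ledger_event_best_effort(ledger_path: Path, event: Dict[str, Any]) -> bool:
    """Try to append one event; report failure instead of raising."""
    try:
        with ledger_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(event, sort_keys=True) + "\n")
        return True
    except OSError:
        # A missing directory or read-only file must not break the CLI.
        return False


def finish_preflight(payload: Dict[str, Any], exit_code: int, ledger_path: Path) -> int:
    """Emit the JSON first, then attempt the ledger write, then return the code."""
    print(json.dumps(payload))
    if payload.get("decision") == "accepted":  # non-accepted never touches the ledger
        append_ledger_event_best_effort(
            ledger_path,
            {
                "type": "preflight",
                "utterance": payload.get("utterance"),
                "work_item_id": payload.get("work_item_id"),
                "requirement_ids": ["REQ-085", *payload.get("requirement_ids", [])],
                "confidence_target": payload.get("confidence_target"),
            },
        )
    return exit_code
```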

---

🤖 Generated with [Warp](https://app.warp.dev), agent conversation:
[link](https://app.warp.dev/conversation/6f8aa790-049b-4ddf-9c52-4840728faee5)

Plan artifact: [Warp Agent Implementation Plan](https://app.warp.dev/drive/notebook/rfCwIZUgJPCakjJ2S552DX)

Co-Authored-By: Oz <oz-agent@warp.dev>
