feat: operator-only test-rung route to verify a rung works #65
Merged
Companion to protoAgent#1749 (coder.solve()'s force_rung). Verifying fusion
actually works required contriving a task hard enough to fail greedy,
best-of-k, AND tree-search first -- impractical for a quick check.
Add POST /api/plugins/project_board/features/{id}/test-rung: runs exactly
ONE named rung against a feature's real acceptance tests, in a throwaway
worktree that is ALWAYS reaped -- never promoted, no PR opened, no board
state touched. coder_seam.test_rung() is deliberately separate from
dispatch() -- that function's contract (promote the winner, raise
SolveExhausted on exhaustion) is shaped for the board's real per-feature
build; mixing test semantics into it would risk the real dispatch path.
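The always-reap contract described above can be illustrated with a minimal sketch. This is not the body of coder_seam.test_rung(), which is not shown here; run_in_throwaway_dir and its check callback are hypothetical names, and a temporary directory stands in for the git worktree. The only point is that a try/finally cleans up on pass, fail, and exception alike.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def run_in_throwaway_dir(check: Callable[[Path], bool]) -> Tuple[bool, Path]:
    """Run `check` in a scratch directory that is removed on every exit path.

    Hypothetical helper: the scratch directory is a stand-in for a worktree.
    """
    workdir = Path(tempfile.mkdtemp(prefix="test-rung-"))
    try:
        # The verdict is returned to the caller; nothing is promoted or persisted.
        return bool(check(workdir)), workdir
    finally:
        # Executes whether `check` returned True, returned False, or raised.
        shutil.rmtree(workdir, ignore_errors=True)
```

In this sketch an exception from the check still propagates after cleanup, so a caller can decide how to report it.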
Deliberately NO @tool wrapper -- the board's own lead agent has no way to
call this itself, the same boundary this router already draws around
/features/{id}/cancel and DELETE /features/{id} (both operator-only,
neither exposed as an agent tool).
Extracted loop.py's _resolve_delegate into coder_seam.resolve_delegate (a
module-level function) so api.py's new route and loop.py's real dispatch
path share one lookup instead of two copies.
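As a rough illustration of the refactor above, the sketch below shows two callers sharing one module-level lookup. The signature of resolve_delegate, the registry mapping, and both caller functions are assumptions made for the example; the real signature in coder_seam.py is not shown in this PR text.

```python
from typing import Callable, Mapping, Optional, Tuple

Delegate = Callable[[str], str]


def resolve_delegate(registry: Mapping[str, Delegate], name: str) -> Optional[Delegate]:
    """Single shared lookup (hypothetical signature)."""
    return registry.get(name)


def dispatch(registry: Mapping[str, Delegate], name: str, task: str) -> str:
    # Stand-in for the real build path: a missing delegate is a hard error.
    delegate = resolve_delegate(registry, name)
    if delegate is None:
        raise LookupError(f"no delegate named {name!r}")
    return delegate(task)


def handle_test_rung(registry: Mapping[str, Delegate], name: str, task: str) -> Tuple[int, str]:
    # Stand-in for the diagnostic route: a missing delegate is a client error.
    delegate = resolve_delegate(registry, name)
    if delegate is None:
        return 400, f"no delegate named {name!r}"
    return 200, delegate(task)
```

Each caller keeps its own failure policy while the lookup itself lives in one place.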
269 passed (was 260; +9 in test_coder_seam.py for test_rung's
always-reap/pass/fail/exception/fusion-forwarding behavior, +9 in
test_api.py for the route's validation gates + happy path + 400-not-500
on a solve failure).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
👀 Quinn is reviewing. Verdict (PASS / WARN / FAIL) and findings to follow.
QA Audit — PR #65 | feat: operator-only test-rung route to verify a rung works
VERDICT: WARN (preliminary — CI still queued; no blocking findings from diff review)
CI Status
- test: queued ⏳ — terminal verdict deferred until CI completes
Diff Review
- New POST /features/{fid}/test-rung route with comprehensive validation gates (rung whitelist, feature existence, acceptance criteria, coder plugin, test command, delegate resolution): clean and consistent with the existing cancel/delete boundary
- resolve_delegate() extracted from loop.py into coder_seam.py as a module-level function, which eliminates duplication between the dispatch and test-rung paths. Clean
- test_rung() in coder_seam.py has an explicit always-reap contract. The full implementation body is truncated in the diff summary, so the always-reap logic (worktree cleanup, no promotion) could not be verified line by line; see Gap below
Observations
- LOW: clawpatch structural review unavailable (HTTP 502 on checkout cache). Diff-based review only
- MEDIUM/Gap: the full test_rung() body and the new test files (test_coder_seam.py +9, test_api.py +9) were truncated beyond the 200-line diff limit. The always-reap logic, _solve/_budget_cls plumbing, and test assertions were not reviewed line by line. Author claims 269 passing tests; CI will confirm
- No HIGH or CRITICAL findings from the visible diff. The except Exception → HTTP 400 pattern is deliberate and appropriate for a diagnostic endpoint that wraps an unpredictable coder.solve() call
- The @tool-wrapper exclusion is correctly applied, consistent with /cancel and DELETE /features/{id}
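For readers unfamiliar with the except Exception → HTTP 400 pattern noted above, here is a framework-agnostic sketch. run_diagnostic and the (status, body) return shape are hypothetical; the actual route in api.py was not visible in the reviewed diff.

```python
from typing import Callable, Tuple


def run_diagnostic(solve: Callable[[], dict]) -> Tuple[int, dict]:
    """Wrap an unpredictable call so its failures surface as 400, not 500.

    Hypothetical helper: `solve` stands in for the wrapped coder.solve() call.
    """
    try:
        result = solve()
    except Exception as exc:
        # Any failure inside the wrapped call is reported back to the operator
        # as a diagnostic result instead of an unhandled server error.
        return 400, {"error": f"{type(exc).__name__}: {exc}"}
    return 200, result
```

The broad catch is usually discouraged, but for an operator-only diagnostic endpoint the failure message is the useful output.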
— Quinn, QA Engineer
Submitted COMMENT review on #65.
Problem
Companion to protoAgent#1749 (coder.solve()'s new force_rung). Verifying that fusion (rung 4) actually works required contriving a task hard enough to fail greedy, best-of-k, and tree-search before fusion is even reached, which is impractical for a quick sanity check.
Fix
POST /api/plugins/project_board/features/{id}/test-rung runs exactly ONE named rung (greedy/best-of-k/tree-search/fusion) against a feature's real acceptance tests, in a throwaway worktree that's always reaped: never promoted, no PR opened, no board state touched.
coder_seam.test_rung() is deliberately separate from dispatch(). That function's contract (promote the winner, raise SolveExhausted on exhaustion) is shaped for the board's real per-feature build; mixing test semantics into it would risk the real dispatch path.
Where this does NOT go: no @tool wrapper
Deliberately kept off the agent-facing tool surface. This repo already draws exactly this boundary: board_create_feature/board_mark_ready/board_list/board_retro/board_create_epic are the only 5 @tool-wrapped functions the board's own lead agent can call, while /features/{id}/cancel and DELETE /features/{id} are HTTP-only, operator-reachable, with no tool. test-rung follows the same rule: the board's own lead agent has no way to call it.
Refactor
Extracted loop.py's _resolve_delegate into coder_seam.resolve_delegate (module-level) so the new route and the real dispatch path share one lookup instead of two copies.
Tests
269 passed (was 260; +9 in test_coder_seam.py for test_rung's always-reap/pass/fail/exception/fusion-forwarding behavior, +9 in test_api.py for the route's validation gates: unknown rung, unknown feature, no acceptance criteria, no coder plugin, no test command, missing delegate, missing fusion delegate, plus the happy path and a 400-not-500 on a solve() failure).
Gate: ruff check . && ruff format --check . && pytest -q (all green).
Version bumped 0.28.0 → 0.29.0 (new capability).
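The validation gates listed under Tests can be pictured as an ordered chain of checks that stops at the first failure. The sketch below is an assumption-heavy illustration: the Context dataclass, first_failed_gate, the gate order, and the reason strings are all hypothetical, and only the four rung names and the seven gate conditions come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Rung names as listed in the PR description.
RUNGS = ("greedy", "best-of-k", "tree-search", "fusion")


@dataclass
class Context:
    """Hypothetical stand-in for the state the route inspects."""
    features: dict = field(default_factory=dict)  # feature id -> {"acceptance": [...]}
    coder_plugin: bool = True
    test_command: Optional[str] = "pytest -q"
    delegate: Optional[str] = "coder"
    fusion_delegate: Optional[str] = None


def first_failed_gate(ctx: Context, feature_id: str, rung: str) -> Optional[str]:
    """Return the first gate that rejects the request, or None if all pass."""
    if rung not in RUNGS:
        return "unknown rung"
    feature = ctx.features.get(feature_id)
    if feature is None:
        return "unknown feature"
    if not feature.get("acceptance"):
        return "no acceptance criteria"
    if not ctx.coder_plugin:
        return "no coder plugin"
    if not ctx.test_command:
        return "no test command"
    if ctx.delegate is None:
        return "missing delegate"
    # Assumed here: the fusion delegate is only required when fusion is requested.
    if rung == "fusion" and ctx.fusion_delegate is None:
        return "missing fusion delegate"
    return None
```

Checking the cheap, request-shaped gates first means a bad rung name is rejected before any feature or plugin state is consulted.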
🤖 Generated with Claude Code