fix: coder.solve() PR bodies now carry the coder's own summary (v0.29.2)#68
Merged
Conversation
… diagnostic string (v0.29.2) _WorktreeSolveAdapter.generate() dispatched the coder into its worktree but discarded the reply, keeping only the worktree path. Every coder dispatch is promised "your FINAL message becomes the PR description" (loop._build_prompt), but dispatch() had nothing real to report for a solve()-ladder win, so it fell back to its own diagnostic string (e.g. "[coder.solve rung=greedy gens=1] solved 1-shot") — which became the PR body verbatim. Same failure class as #62, on a different path. Worse than cosmetic: _verify_goal's NO_TEST_NEEDED escape hatch reads this same text, so a legitimate no-test-needed change dispatched through the solve() ladder would always look like a missing test, forcing an unnecessary re-dispatch/escalation cycle. Fix: generate() now captures the coder's reply per candidate; dispatch() uses the WINNING candidate's own reply as the result, falling back to the diagnostic string only when there isn't one (a fusion win, which returns file content, not a summary). Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
|
👀 Quinn is reviewing — verdict (PASS / WARN / FAIL) + findings to follow. |
There was a problem hiding this comment.
QA Audit — PR #68 | fix: coder.solve() PR bodies now carry the coder's own summary (v0.29.2)
VERDICT: WARN (CI still queued — re-review once terminal)
CI Status
- test: queued
Diff Review
coder_seam.py: Adds_repliesdict to capture each candidate coder's final reply ingenerate();dispatch()now uses the winning candidate's own reply asresult_text, falling back to the rung/gens diagnostic string for fusion wins. Clean, minimal change — 2 sites touched.- Version bumps:
0.29.1→0.29.2inprotoagent.plugin.yamlandpyproject.toml. - Tests: Existing test updated to assert the winner's reply is used verbatim; new
test_dispatch_falls_back_to_a_diagnostic_string_when_the_winner_has_no_replycovers the fusion-win fallback. Both well-structured.
Observations
- LOW: clawpatch structural review unavailable (500 — repo not in cache). Manual review of the diff found no cross-file issues. The change is scoped entirely within
_WorktreeSolveAdapter— no callers modified, no API contract changes. - The
(reply or "").strip()guard ingenerate()correctly prevents storing empty/whitespace-only replies, ensuring the fallback fires when the coder produces nothing useful.
— Quinn, QA Engineer
|
Submitted COMMENT review on #68. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Caught live: portfolio-plugin#38's PR body was literally
"[coder.solve rung=greedy gens=1] solved 1-shot"— a diagnostic string, not a description.Root cause:
_WorktreeSolveAdapter.generate()dispatches the coder into its worktree but discards the reply, keeping only the worktree path. Every coder dispatch is promised "your FINAL message becomes the PR description, verbatim" (loop._build_prompt) — butdispatch()had nothing real to report for acoder.solve()-ladder win, so it fell back to its own[coder.solve rung=... gens=...] {note}string. Same failure class as #62 ("coder's final reply becomes the PR body verbatim"), just on a different dispatch path that #62 never touched.Worse than cosmetic:
_verify_goal'sNO_TEST_NEEDEDescape hatch reads this same text — so a legitimate no-test-needed change (pure refactor, docs-as-code) dispatched through the solve() ladder would always look like a missing test, forcing an unnecessary re-dispatch/escalation cycle.Fix:
generate()now captures each candidate's reply (adapter._replies[wt]).dispatch()uses the winning candidate's own reply as the result, falling back to the diagnostic string only when there isn't one (a fusion win returns file content, not a natural-language summary — nothing to fall back FROM).Test plan
ruff check ./ruff format --check .— cleanpytest tests/ -q— 281 passed (2 new: winner's reply is used verbatim + the fusion-win fallback still produces the diagnostic string)🤖 Generated with Claude Code