fix: gate fusion on file size, refuse a shrunk rewrite, restrict writes to declared files (v0.29.1)#67
Merged
Merged
Conversation
…es to declared files (v0.29.1) Fusion can't tool-call and returns whole-file "complete" replacements, not diffs — a large file (or one that just crosses the delegate's own max_tokens ceiling on output) risks a silently truncated rewrite with no signal that anything went wrong. fusion_viable_for_files() gates a feature's declared files_to_modify by real on-disk size BEFORE fusion is ever dispatched (loop.py and api.py's test-rung route both check it and skip straight to "no fusion this dispatch" on a miss, same as an unconfigured delegate). As defensive backstops in generate_fusion itself: refuse a returned file that's suspiciously smaller than the original (more likely truncated than an honest edit), and refuse writing any path outside the feature's declared set (fusion has no tool access, so an undeclared path is a hallucination, not a real change). Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
|
👀 Quinn is reviewing — verdict (PASS / WARN / FAIL) + findings to follow. |
There was a problem hiding this comment.
QA Audit — PR #67 | fix: gate fusion on file size, refuse a shrunk rewrite, restrict writes to declared files (v0.29.1)
VERDICT: WARN
CI Status
- test: queued (⏳ non-terminal — formal PASS/FAIL deferred until CI settles)
Diff Review
- Adds
fusion_viable_for_files()— gates fusion dispatch on actual on-disk file sizes usingos.path.getsize(O(1) stat, never reads into memory). Default caps: 8K per-file, 16K combined. - Shrink guard in
generate_fusionrefuses writes <50% of original size when original ≥500 chars — catches truncated completions the adapter can't surface asfinish_reason. _fusion_prompttruncation is now visible to the model (was silently truncating at 20K before). Over-limit files are explicitly marked "do NOT return as complete."files_to_modifywrite restriction prevents fusion from writing to undeclared paths.- Wired into
api.py(_test_rung) — 400s with reason instead of attempting doomed rewrite.loop.py_drivewiring claimed but diff truncated at 200/647 lines; full wiring not visible in this pass.
Observations
- LOW: clawpatch structural review unavailable (HTTP 502 from checkout cache) — review proceeds on diff analysis alone.
- LOW: diff truncated at 200/647 lines —
loop.py_drivewiring andgenerate_fusionwrite-guard details beyond the first 200 lines not visible in this pass. - INFO:
os.path.getsizereturns bytes, not chars; for multibyte encodings the gate could report slightly different byte-vs-char counts. Conservative defaults (8K per-file) and configurability make this acceptable for typical source files. - INFO: 22 new tests claimed (280 total, up from 258). CI still queued — test results unconfirmed at review time.
— Quinn, QA Engineer
|
Submitted COMMENT review on #67. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fusion (ADR 0064 P3, shipped in #64/#65) can't tool-call — it returns whole-file
"complete" replacements, not diffs. That design has a real, unmitigated data-loss
risk: a large file (or one that just crosses the delegate's own
max_tokensceiling on output) can come back truncated with no signal that anything went
wrong —
_fusion_promptused to silently truncate its own file reads at afixed cap, and the write path (
generate_fusion) would write back whatever themodel returned, for any path it named.
Caught during live dogfooding: "That full file thing and infusion seems a little
hairy. Like what if it's a 10K line file?" — and it's actually worse than that
framing: even a moderately-sized file can trip the delegate's own output
token ceiling, well before 10K lines.
Three layers, honest-degrade rather than silent-corrupt:
fusion_viable_for_files(repo, files_to_modify, ...)— checks realon-disk file sizes (per-file
coder_solve_fusion_max_file_chars, default8000; combined
coder_solve_fusion_max_total_chars, default 16000) BEFOREfusion is ever dispatched. Wired into both call sites:
loop.py's_drive(an oversized feature just skips the fusion rung for that dispatch — the
ladder still runs greedy/best-of-k/tree-search) and
api.py'stest-rungroute (400s with the reason instead of attempting a doomed rewrite).
generate_fusion, in case a callerever skips the gate above) — refuses to write a "complete" rewrite back if
it's suspiciously smaller than the file it claims to replace; far more
likely a truncated completion than an intentional big deletion.
files_to_modifywrite restriction —generate_fusionnow refuses towrite any path outside the feature's declared file set when one is
declared. Fusion has no tool access, so an undeclared path is a
hallucination (or a parser mis-split), not a real change.
_fusion_prompt's own truncation fallback (only reachable if a caller skipsthe size gate) is now visible to the model — it explicitly says "do NOT return
this as a complete replacement; skip this file instead" instead of silently
feeding a cut-off read.
Test plan
ruff check .— cleanruff format --check .— cleanpytest tests/ -q— 280 passed (up from 258; 22 new tests covering thesize gate, the shrink guard, the write restriction, the visible
truncation marker, and the
loop.py/api.pywiring)test_manifest_and_pyproject_versions_agreepassing🤖 Generated with Claude Code