Bump skill turn caps (review 30→200, gen 500→1000)#789
Conversation
PR #788 (Update stacklok/toolhive to v0.24.0) failed the run when skill_review hit the 30-turn cap mid-edit. Details from run 24786157448: - skill_gen: 67 turns / $4.72 (under 500 cap) -- succeeded, committed `42267da Document local vMCP CLI mode` - skill_review: 31 turns / $1.86 -- exited with subtype `error_max_turns`, cascading workflow to `failure`, which skipped autofix and PR body augmentation My initial cap of 30 for review was sized against silent/no- changes releases (4-6 turns baseline). For a real multi-file content review the editorial pass walks each edited file and makes tightening edits; ~30-100 turns is legitimate working range. Changes: - skill_review: 30 -> 200 (2x-6x headroom over working range) - skill_gen: 500 -> 1000 (defensive doubling; 500 was never hit in production, but 1000 keeps us well clear of the 397-turn v3-test anomaly) Hitting either cap still produces a loud failure; raise deliberately if a release genuinely needs more. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
Adjusts the upstream-release-docs GitHub Actions workflow to reduce failures from hitting Claude Code session turn caps during larger, multi-file release doc updates (notably the review pass that previously capped at 30 turns).
Changes:
- Increase
skill_gen(generation)--max-turnsfrom 500 to 1000. - Increase
skill_review(editorial review)--max-turnsfrom 30 to 200. - Update inline workflow comments documenting the rationale for the new caps.
| # --max-turns 1000: observed gen baselines are 20 turns | ||
| # (silent) to 152 (full content rebuild). 500 was the | ||
| # initial cap; bumped to 1000 for extra headroom on | ||
| # multi-feature releases and to stay well above the | ||
| # suspected-looping 397-turn v3-test run (still clips | ||
| # genuine runaways). Hitting the cap produces a loud | ||
| # failure -- raise deliberately if a release needs more. |
There was a problem hiding this comment.
The updated rationale mixes two different “observed baseline” ranges (20–152) while also referencing a 397-turn run. This is internally inconsistent and makes it hard to interpret the headroom goal. Consider rephrasing to separate “typical/observed” from “anomalous” runs (or update the baseline range to include the 397 outlier if it’s being treated as legitimate).
| # --max-turns 1000: observed gen baselines are 20 turns | |
| # (silent) to 152 (full content rebuild). 500 was the | |
| # initial cap; bumped to 1000 for extra headroom on | |
| # multi-feature releases and to stay well above the | |
| # suspected-looping 397-turn v3-test run (still clips | |
| # genuine runaways). Hitting the cap produces a loud | |
| # failure -- raise deliberately if a release needs more. | |
| # --max-turns 1000: typical observed gen runs span 20 | |
| # turns (silent) to 152 (full content rebuild). 500 was | |
| # the initial cap; bumped to 1000 for extra headroom on | |
| # multi-feature releases. Separately, a suspected-looping | |
| # v3-test run reached 397 turns, so 1000 stays well above | |
| # that anomalous case while still clipping genuine | |
| # runaways. Hitting the cap produces a loud failure -- | |
| # raise deliberately if a release needs more. |
| # --max-turns 200: initial cap of 30 was sized against | ||
| # silent-release baselines (4-6 turns) and was too tight | ||
| # for real content reviews. v0.24.0 (PR #788) hit it at | ||
| # turn 31 mid-review and failed the run; the editorial | ||
| # pass genuinely needs ~30-100 turns to walk a multi- | ||
| # file content PR. 200 gives 2x-6x headroom over that | ||
| # working range while still clipping a runaway. |
There was a problem hiding this comment.
The new --max-turns 200 rationale says the editorial pass can legitimately take ~30–100 turns, but an earlier nearby comment still states review is “4–5 turns in every run so far” and refers to “10 / 30” buffers. This creates conflicting documentation in the same step; please update the earlier comment (or remove/replace it) so the timeout/turn-cap guidance is consistent.
Context
PR #788 (Update stacklok/toolhive to v0.24.0) failed the augmentation run when `skill_review` hit the 30-turn cap mid-edit.
From run 24786157448:
Review's failure cascaded the workflow to `failure`, skipping autofix and PR body augmentation — so PR #788 is stuck with an incomplete rendered body and no run-cost table, even though the skill's content work actually landed.
Why the cap was too tight
My original review cap of 30 was sized against silent / no-changes release runs (baseline 4-6 turns). For a real multi-file content release like v0.24.0, the editorial pass genuinely walks each edited file and makes tightening edits — ~30-100 turns is legitimate working range.
Changes
Caps still clip genuine runaways loudly; we raise them deliberately if a release ever genuinely needs more.
Follow-up for PR #788
Once this merges, PR #788 needs a retry via `gh workflow run upstream-release-docs.yml -f pr_number=788` to complete the augmentation step. The existing skill content commit (`42267da`) stays; retry re-runs everything but should now finish review + autofix + body augmentation cleanly.