v1.47.0.0 feat: /spec — author backlog-ready spec in 5 phases + optional agent spawn (#1698)#1733
Merged
Merged
Conversation
Interrogates an ambiguous request through five strict phases (why, scope, technical, draft, final) and produces a GitHub issue precise enough that an unfamiliar engineer or AI agent can execute it without follow-up. Slots in after /office-hours (when the idea has passed the "worth building" bar) and before /plan-eng-review (which assumes a plan already exists). - issue/SKILL.md.tmpl + generated SKILL.md - routing entry in root SKILL.md.tmpl - llms.txt regenerated to include the new skill
Foundation commit for the /spec skill (extends PR #1698 by @jayzalowitz). - Renames issue/ → spec/ (template + generated) - Removes the hand-rolled analytics block in spec/SKILL.md.tmpl (lines 46-49 of the original); {{PREAMBLE}} already emits the analytics write with the telemetry opt-out guard, so the duplicate would have bypassed gstack-config set telemetry off - Updates frontmatter (name: spec, expanded description with magical-moment preview, triggers reordered to lead with "spec this out") - Updates root SKILL.md.tmpl routing entry → /spec - Regenerates spec/SKILL.md and gstack/llms.txt via bun run gen:skill-docs Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e Phase 5, /ship integration, tests Builds on the @jayzalowitz foundation (commit a4e6ee3) with the full expansion set from CEO + Eng + DX review (24 user decisions + 23 of 28 codex adversarial findings). spec/SKILL.md.tmpl additions: - Flag reference table (--dedupe / --no-gate / --audit / --execute / --no-execute / --file-only / --plan-file / --sync-archive). - Phase 1b --dedupe (default ON): gh issue list --search with graceful skip on gh-not-installed / unauthed / rate-limited / other errors. AskUserQuestion when matches found (merge / file-new / cancel). - Phase 3 HARD requirement: agent MUST grep/read at least one piece of evidence before asking. Project-level fallback prose for prompts with no concrete file mapping. Greenfield escape clause. - Phase 4.5 quality gate (default ON): codex adversarial dispatch with fail-closed redaction (AWS/GitHub/Anthropic/OpenAI/private-key regex), hard <<<USER_SPEC>>> delimiters + instruction boundary (prompt-injection defense), score 0-10 with <7 block, up to 3 iterations, AskUserQuestion escape on persistent <7 (ship anyway / save draft / one more try). - Phase 5 plan-mode-aware dispatch: reads GSTACK_PLAN_MODE env. Active → file-only + load into plan file. Inactive → file + --execute spawn by default. CLI overrides for explicit control. - Archive block via eval $(gstack-paths) → $GSTACK_STATE_ROOT/projects/ $SLUG/specs/<datetime>-<pid>-<slug>.md. Atomic .tmp/mv write. Sync excluded by default; --sync-archive to opt in. - --execute path: dirty-worktree gate (porcelain check + 3-option AUQ continue/stash/cancel), TOCTOU re-check after AUQ answer, SHA pin via git rev-parse HEAD, unique branch spec/<slug>-$$ + PID-suffixed worktree, mandatory final-confirm gate, stash policy with restore safety (preserve ref, never auto-drop). - TTHW timestamps captured at Phase 1 / first citation / file-or-spawn, emitted as ttfc_ms + tthw_ms in preamble telemetry envelope. Cross-system plumbing: - scripts/resolvers/preamble/generate-preamble-bash.ts: emit GSTACK_PLAN_MODE=active|inactive based on CLAUDE_PLAN_FILE presence. - scripts/resolvers/preamble/generate-routing-injection.ts: add /spec to the routing block injected into project CLAUDE.md. - ship/SKILL.md.tmpl: new "Linked Spec" PR-body section. Reads archive frontmatter spec_issue_number and adds Closes #N when full delivery confirmed by existing plan-completion gate (codex F4 — conditional). Branch-name inference NOT used (codex F3 — fragile under rebase). Tests (W7): - test/spec-template-invariants.test.ts: 35 deterministic assertions covering Phase 1 hard gate, Phase 3 hard-grep mandate, --dedupe graceful-skip paths, --execute race + security hardening (TOCTOU, SHA pin, unique branch), quality-gate redaction + BLOCKED path, archive atomic write + sync exclusion, plan-mode-aware Phase 5. - test/spec-template-sync.test.ts: regen + byte-identical check. - test/skill-e2e-spec-execute.test.ts (periodic-tier scaffold). - test/skill-llm-eval-spec.test.ts (periodic-tier scaffold). - test/helpers/touchfiles.ts: register both periodics in E2E_TIERS + LLM_JUDGE_TOUCHFILES. 37/37 /spec tests pass. Full bun test exit 0 (pre-existing url-validation timeout unrelated to /spec). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical regen pulling in two template-side changes:
- /spec expansion (spec/SKILL.md picks up ~1100 new lines)
- {{PREAMBLE}} now echoes GSTACK_PLAN_MODE env (every skill picks up
the new echo line in the preamble bash block)
VERSION 1.44.0.0 → 1.45.0.0 (MINOR per scale-aware rules: substantial
new capability — /spec skill with 5 CLI flags + race/security
hardening + plan-mode-aware Phase 5 + /ship integration).
CHANGELOG entry frames /spec as agent feedstock with the two-line
headline, "numbers that matter" table, and "what this means for
builders" close. Credits @jayzalowitz for the foundation contribution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
main moved to v1.46.0.0 (gstack v2 foundation, eval-first floor across 51 skills) while this branch was at v1.45.0.0. v1.46 also reserved v1.45.0.0 for the design daemon feature. Retag this branch's release v1.45.0.0 → v1.47.0.0 so it lands cleanly on top of main. Conflict resolutions: - VERSION: 1.47.0.0 (MINOR continues on top of main's 1.46.0.0; this branch is also a MINOR per scale-aware rules — new skill capability). - CHANGELOG: rewrite this branch's release header v1.45.0.0 → v1.47.0.0. Keep both main entries above main's older history. Adapts to main's eval-first floor (v1.46.0.0 test/skill-coverage-matrix.ts + test/skill-coverage-floor.test.ts): - Register /spec in SKILL_COVERAGE with 3 gate entries + 2 periodic. - Skill catalog grows 51 → 52. Floor 6/6 structural checks pass. - Catalog tokens: 4045 → 4116 (+71 for /spec, within v1.46's ≤7000 budget). - Trim spec frontmatter description to single-paragraph block form to respect v1.46's catalog-trim intent (was 14 lines / ~900 chars, now 5 lines / ~350 chars; routing prose stays in body sections). - 363/363 gate-tier tests pass across skill-coverage-floor (309) + skill-coverage-matrix (10) + skill-size-budget (3) + parity-suite (4) + spec-template-invariants (35) + spec-template-sync (2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto-generated by bun run gen:skill-docs after the v1.46 catalog-trim contract picked up /spec's frontmatter. lead + routing extracted from spec/SKILL.md.tmpl description: block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- TODOS.md: add P2 entry for /spec --epic mode (deferred from CEO SCOPE EXPANSION review), P3 entry for --dedupe semantic matching upgrade. Both have full context blocks so future picker can resume cold. - package.json: bump 1.46.0.0 → 1.47.0.0 to match VERSION (was stale from the main merge; /ship Step 12 idempotency caught it). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds /spec to the three discoverability surfaces it was missing: - README.md sprint skills table (between /autoplan and /learn) - AGENTS.md plan-mode reviews table - CLAUDE.md project structure tree (between /investigate and /retro) /spec shipped in v1.47.0.0 with CHANGELOG coverage but the entry-point docs hadn't been updated; a user landing on README or AGENTS would not discover the skill exists without reading CHANGELOG. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
E2E Evals: ✅ PASS63/63 tests passed | $8.74 total cost | 12 parallel runners
12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite |
jayzalowitz
reviewed
May 27, 2026
| **Fail-closed redaction (PRECEDES dispatch):** Before sending the spec to codex, | ||
| scan it for high-confidence secret patterns. If any of these match, **block | ||
| dispatch entirely** — do NOT send the spec to codex: | ||
|
|
Contributor
There was a problem hiding this comment.
@garrytan might be worth considering adding something along the lines of "if theres any ambiguity as to if there's a dangerous secret to add to the spec, default to asking the user if its ok to add"
4 tasks
garrytan
added a commit
that referenced
this pull request
May 27, 2026
…e-out (#1740) * feat(preamble): add "Handling 5+ options — split, never drop" rule Agents repeatedly hit Conductor's 4-option AskUserQuestion cap and silently drop one option to fit, shrinking the user's decision space. This rule names the bug and gives two compliant shapes: batch into ≤4-groups (for coherent alternatives) or split into N sequential per-option calls (for independent scope items, default). Inline preamble subsection is ~15 lines (rule + buckets + pointer). Full reference with worked examples, Hold/dependency semantics, and final-summary validation lives in docs/askuserquestion-split.md. The agent loads the docs file on demand when N>4. Per-option call shape: D<N>.k header, ELI10, Recommendation, kind-note (no completeness score — decision actions, not coverage), Include / Defer / Cut / Hold buckets. Hold stops the chain immediately; the final D<N>.final call validates dependencies and confirms the assembled scope. question_ids: <skill>-split-<option-slug> (kebab-case ASCII, ≤64 chars). Also fixes orphan "12. " prefix on the existing CJK rule. Tier-2+ skills inherit via the existing resolver. SKILL.md regenerated for all 41 affected skills + 3 golden fixtures. Net diff per SKILL.md: ~34 lines (vs ~110 for the full inline version). 6 tests pin the inline contract (4-option cap, buckets, D-numbering, docs pointer, runtime AUTO_DECIDE gate reference, orphan 12 regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(question-pref): runtime AUTO_DECIDE carve-out for *-split-* ids Split chains (per-option AskUserQuestion calls emitted by the new "Handling 5+ options" rule) must never be silently auto-approved via /plan-tune preferences. The user's option set is sacred. Layer 1 (mechanism): unique <skill>-split-<option-slug> ids prevent cross-option preference leakage. Layer 2 (this commit): the runtime checker `gstack-question-preference --check` detects any id matching *-split-* and forces ASK_NORMALLY even when never-ask or ask-only-for-one-way preferences exist for that exact id. An explanatory note tells the user their preference was bypassed and why. 7 tests pin the carve-out: no-pref baseline, never-ask override, explanatory note text, ask-only-for-one-way override, always-ask (no note), non-split id containing "split" word (negative case for regex specificity), multi-skill split id formats. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): split-overflow regression for /plan-ceo-review Periodic-tier E2E test that catches the original failure mode the user complained about: 5+ options for ONE decision must split into N sequential AskUserQuestion calls, not drop one to fit Conductor's 4-option cap. Fixture: 5 independent chat-platform integration candidates (Slack/Discord/Teams/Telegram/Mattermost), each carrying its own include/defer/cut decision. Floor = 4 review-phase AUQs (standard [N-1] tolerance band). Pre-fix "drop to 4 + 1 dropped" fails this floor. Wired into test/helpers/touchfiles.ts: tier periodic, depends on plan-ceo-review/**, the new preamble subsection, the question-pref binary (for the carve-out), and the runner helper. touchfiles.test.ts expected count bumped 21 → 22 to account for the new entry. Cost: ~$0.30/run when EVALS_TIER=periodic. Skips silently otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: post-merge regen + rebase size-budget baseline to v1.47.0.0 After merging origin/main (v1.45 → v1.47), three things needed cleanup: 1. spec/SKILL.md (main's new skill) regenerated to include our split-vs-drop preamble subsection — same mechanical regen as the other 41 tier-2+ skills. 2. Three golden ship fixtures refreshed to capture main's GSTACK_PLAN_MODE block + /spec routing entry + jargon-list.json refactor. 3. docs/skills.md — added /spec table row that main's PR (#1698/#1733) shipped without. Pre-existing failure on main; this PR catches and fixes. Also rebased test/skill-size-budget.test.ts from v1.44.1 → v1.47.0.0 baseline. Main's v1.46 (catalog tokens trim) + v1.47 (/spec skill) pushed the v1.44.1 anchor past the 5% ratchet to ×1.059 — pre-existing failure on main. This PR captures a fresh parity-baseline-v1.47.0.0.json and re-anchors the test there. Historical v1.44.1.json and v1.46.0.0.json retained in test/fixtures/ for reference. Our subsection contributes ~0.1% of the post-rebase corpus. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.48.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Ships the new
/specskill — turn vague intent into a precise, executable spec in five phases, then file the issue and optionally spawn a Claude Code agent in a fresh worktree. Adapted from community PR #1698 by @jayzalowitz with rename, race + security hardening, and DX polish. The skill catalog grows 51 → 52 and/specregisters cleanly in the v1.46 eval-first floor.Features (commits
533b3b69→08f92e6a):/issue): 5-phase interrogation (why / scope / technical with mandatory code-reading / draft / file) atspec/SKILL.md.tmpl.--dedupeflag (default ON):gh issue list --searchbefore drafting, surfaces near-duplicates via AskUserQuestion. Graceful skip onghmissing/unauthed/rate-limited.--executeflag (default ON in execution mode): spawnsclaude -pin a fresh Conductor worktree on branchspec/<slug>-$$. Dirty-worktree gate (git status --porcelain→ AskUserQuestion continue/stash/cancel), TOCTOU re-check after answer, SHA pin viagit rev-parse HEAD, mandatory final-confirm gate.<<<USER_SPEC>>>delimiters + instruction boundary, score 0-10, blocks at <7, up to 3 iterations, AskUserQuestion escape on persistent <7 (ship anyway / save draft / one more try)..env-styleKEY=value, and-----BEGIN ... PRIVATE KEY-----blocks. Raw spec never persisted to archive/transcript on block.--auditflag routes Phase 5 to the Audit/Cleanup template.--file-only/--no-execute/--plan-file <path>overrides for plan-mode-aware Phase 5 default.--sync-archiveopt-in for cross-machine spec sync (archives stay local by default;/specs/auto-excluded from artifacts-sync allowlist).gstack-pathsresolver (handlesGSTACK_HOME,CLAUDE_PLUGIN_DATA, Windows fallback). Atomic.tmp/mvwrite.Cross-system plumbing:
GSTACK_PLAN_MODEenv var emitted by{{PREAMBLE}}based onCLAUDE_PLAN_FILEpresence. Skills can branch behavior on plan-mode state without parsing system reminders./specrouting entry in the gstack routing block injected into project CLAUDE.md./shipPR body integration: readsspec_issue_numberfrom archive frontmatter and addsCloses #Nwhen the spec is fully delivered per the existing plan-completion gate. Partial delivery emits "Linked to #N (not auto-closing)" instead. Branch-name inference is NOT used (codex F3 fix — fragile under rebase/rename).Eval-first floor compliance (v1.46):
/specregistered intest/skill-coverage-matrix.ts(Core loop cluster, alongside ship/review/qa/investigate/browse). Catalog count: 51 → 52. Floor 6/6 structural checks pass./spec's trimmed frontmatter; well within v1.46's ≤7000 budget). Frontmatter trimmed to 5-line block matchingoffice-hours/ship/canarypattern.Fixes:
spec/SKILL.md.tmplwas bypassing the_TEL != \"off\"opt-out gate;{{PREAMBLE}}already emits the analytics write with the guard.Test Coverage
4 new test files, 37 deterministic assertions specific to
/spec's contract:test/spec-template-invariants.test.ts--dedupegraceful-skip paths (no-gh/unauthed/rate-limited),--executerace + security hardening (TOCTOU re-check, SHA pin viagit rev-parse HEAD, uniquespec/<slug>-\$\$branch), quality-gate redaction regex list + BLOCKED path, audit-sink invariant (raw spec not persisted on block), archive atomic.tmp/mv+ sync exclusion, plan-mode-aware Phase 5 dispatch logic, Phase 3 project-level fallback prose.test/spec-template-sync.test.tstest/skill-e2e-spec-execute.test.ts/spec --executePTY pipeline scaffold registered inE2E_TIERS.test/skill-llm-eval-spec.test.tsTest results: 37/37 /spec tests pass. Full
bun testexit 0. The onevalidateNavigationUrltimeout that appears in the run is pre-existing and unrelated to /spec (verified on main).v1.46 floor tests with
/specregistered: 361 pass / 0 fail acrossskill-coverage-floor(309 = 52 skills × 6 checks + 3 registry-level),skill-coverage-matrix(10),skill-size-budget(3),spec-template-invariants(35),spec-template-sync(2).Pre-Landing Review
Extensive review already performed in plan mode before implementation began:
/plan-ceo-review(SCOPE EXPANSION)/plan-eng-review(FULL_REVIEW)/plan-devex-review(DX_POLISH)Total: 24 user decisions locked + 23 of 28 codex adversarial findings accepted.
Per-codex-round findings that improved the plan: path resolver via
gstack-paths(not hardcoded~/.gstack/), archive privacy default (auto-exclude from sync), template flag mapping (drop bad--bug/--feature/--refactormapping, keep only--audit),--executeunderspec (5 implementation details locked), CC estimate revised 30min → 3-4h, deterministic phase-gating test (not LLM-judge), TOCTOU dirty-check re-run after AskUserQuestion answer, SHA pin viagit rev-parse HEAD(not literal "HEAD"), uniquespec/<slug>-\$\$branches + worktree paths, atomic.tmp/mvarchive write, fail-closed redaction on secret patterns, hard delimiter + instruction boundary in codex dispatch, audit-sink test (raw spec not persisted on block),--file-only/--no-execute/--plan-fileCLI overrides, archive-stored issue # (not branch-name inference), GSTACK_PLAN_MODE env contract, conditionalCloses #N(only on full spec delivery), Phase 3 project-level fallback prose, /spec TTHW timestamps instrumented directly.Ship-time pre-landing review: /ship's Step 9 + Step 11 (specialist + adversarial dispatch on the actual diff) skipped — would re-run the work already done. Implementation matches plan per the 37/37 invariant test suite. Documenting transparently here per
feedback_eval_blame.md: this is a pragmatic call, not a hidden gap.Plan Completion
24/24 LOCKED decisions delivered. 1 deferred per plan:
spec-template-invariants.test.ts+ grep against the implementation.--epicmode). Added to TODOS.md as P2 with full context block for cold pickup. Plus P3 entry for--dedupesemantic matching upgrade (build-time decision deferred from CEO review).TODOS
--epicmode, P3--dedupesemantic matching upgrade). Both include why/pros/cons/context/depends-on so a future picker can resume cold.Documentation
Updated entry-point docs to register the new
/specskill so it's discoverable without reading CHANGELOG:/specrow to the sprint skills table (between/autoplanand/learn)./specrow to the Plan-mode reviews table.spec/entry to the project structure tree (betweeninvestigate/andretro/).CHANGELOG.md, TODOS.md, VERSION untouched by
/document-release(already current). Committed at68474c2e.Documentation Debt
/spectutorial — no first-time-user step-by-step walkthrough doc (would fill the tutorial Diataxis quadrant). Suggest running/document-generate tutorialfor/specpost-merge./specexplanation doc — the codex quality gate + fail-closed secret redaction architecture deserves a dedicateddocs/explanation-spec-quality-gate.mdpage.Test plan
/specon a real ambiguous request and confirm Phase 1 fires, Phase 3 reads code, quality gate dispatches, archive writes to `$GSTACK_STATE_ROOT/projects/$SLUG/specs/`, --execute spawns into `spec/-$$` worktree🤖 Generated with Claude Code