* fix: prevent /autoplan from compressing review sections to one-liners
Adds explicit auto-decide contract, per-phase execution checklists,
pre-gate verification, and test review emphasis.
* chore: bump version and changelog (v0.10.2.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
_TEL=$(~/.codex/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
@@ -127,6 +130,18 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
 - BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
 - BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
 
+## Repo Ownership Mode — See Something, Say Something
+
+`REPO_MODE` from the preamble tells you who owns issues in this repo:
+
+- **`solo`** — One person does 80%+ of the work. They own everything. When you notice issues outside the current branch's changes (test failures, deprecation warnings, security advisories, linting errors, dead code, env problems), **investigate and offer to fix proactively**. The solo dev is the only person who will fix it. Default to action.
+- **`collaborative`** — Multiple active contributors. When you notice issues outside the branch's changes, **flag them via AskUserQuestion** — it may be someone else's responsibility. Default to asking, not fixing.
+- **`unknown`** — Treat as collaborative (safer default — ask before fixing).
+
+**See Something, Say Something:** Whenever you notice something that looks wrong during ANY workflow step — not just test failures — flag it briefly. One sentence: what you noticed and its impact. In solo mode, follow up with "Want me to fix it?" In collaborative mode, just flag it and move on.
+
+Never let a noticed issue silently pass. The whole point is proactive communication.
+
 ## Search Before Building
 
 Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
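Review note: the Repo Ownership Mode rules added in the hunk above can be read as a small dispatch on `REPO_MODE`. The following is a minimal sketch under that reading, not code from the repository. `RepoMode`, `parse_repo_mode`, and `respond_to_issue` are hypothetical names, and treating any unrecognized value as `unknown` is an assumption.

```python
from enum import Enum


class RepoMode(Enum):
    """The three REPO_MODE values named in the skill text."""
    SOLO = "solo"
    COLLABORATIVE = "collaborative"
    UNKNOWN = "unknown"


def parse_repo_mode(raw: str) -> RepoMode:
    """Map a raw REPO_MODE string to a mode (assumption: unrecognized values become UNKNOWN)."""
    try:
        return RepoMode(raw.strip().lower())
    except ValueError:
        return RepoMode.UNKNOWN


def respond_to_issue(mode: RepoMode, noticed: str, impact: str) -> str:
    """Build the brief flag (what was noticed and its impact), plus the follow-up the mode calls for."""
    flag = f"Noticed: {noticed}. Impact: {impact}."
    if mode is RepoMode.SOLO:
        # Solo: the one developer owns everything, so offer to fix.
        return f"{flag} Want me to fix it?"
    # Collaborative: flag and move on. Unknown is treated as collaborative.
    return flag
```

Under this sketch a noticed issue always produces a message, which matches the rule that nothing passes silently; only the follow-up question depends on the mode.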
@@ -319,6 +334,34 @@ Examples: run codex (always yes), run evals (always yes), reduce scope on a comp
 
 ---
 
+## What "Auto-Decide" Means
+
+Auto-decide replaces the USER'S judgment with the 6 principles. It does NOT replace
+the ANALYSIS. Every section in the loaded skill files must still be executed at the
+same depth as the interactive version. The only thing that changes is who answers the
+AskUserQuestion: you do, using the 6 principles, instead of the user.
+
+**You MUST still:**
+- READ the actual code, diffs, and files each section references
+- PRODUCE every output the section requires (diagrams, tables, registries, artifacts)
+- IDENTIFY every issue the section is designed to catch
+- DECIDE each issue using the 6 principles (instead of asking the user)
+- LOG each decision in the audit trail
+- WRITE all required artifacts to disk
+
+**You MUST NOT:**
+- Compress a review section into a one-liner table row
+- Write "no issues found" without showing what you examined
+- Skip a section because "it doesn't apply" without stating what you checked and why
+- Produce a summary instead of the required output (e.g., "architecture looks good"
+  instead of the ASCII dependency graph the section requires)
+
+"No issues found" is a valid output for a section — but only after doing the analysis.
+State what you examined and why nothing was flagged (1-2 sentences minimum).
+"Skipped" is never valid for a non-skip-listed section.
+
+---
+
 ## Phase 0: Intake + Restore Point
 
 ### Step 1: Capture restore point
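Review note: the "What Auto-Decide Means" contract in the hunk above amounts to a few checks on each section's result. The following is a minimal sketch of those checks, assuming a section result is recorded as a small record. `SectionResult`, its fields, the status strings, and `contract_violations` are hypothetical and do not appear in the skill files.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class SectionResult:
    """Hypothetical record of what one review section produced."""
    name: str
    status: str                      # "findings", "no_issues", or "skipped"
    examined: str = ""               # what was read and why nothing was flagged
    outputs: List[str] = field(default_factory=list)


def contract_violations(
    result: SectionResult,
    required_outputs: List[str],
    skip_list: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return the ways a section result breaks the auto-decide contract."""
    problems: List[str] = []
    if result.status == "skipped":
        # "Skipped" is only acceptable for sections on the skip list.
        if result.name not in skip_list:
            problems.append("skipped a non-skip-listed section")
        return problems
    if result.status == "no_issues" and not result.examined.strip():
        # "No issues found" needs a statement of what was examined.
        problems.append('"no issues found" without stating what was examined')
    missing = [o for o in required_outputs if o not in result.outputs]
    if missing:
        # A summary does not substitute for a required diagram, table, or artifact.
        problems.append("missing required outputs: " + ", ".join(missing))
    return problems
```

An empty list means the section result satisfies the checks modeled here; the sketch does not try to judge whether the analysis itself was deep enough.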
@@ -400,6 +443,31 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
- Test diagram mapping codepaths to coverage (Section 3)
+- Test plan artifact written to disk (Section 3)
+- Failure modes registry with critical gap flags
+- Completion Summary (the full summary from the Eng skill)
+- TODOS.md updates (collected from all phases)
+
 ---
 
 ## Decision Audit Trail
@@ -449,6 +553,44 @@ not accumulated in conversation context.
 
 ---
 
+## Pre-Gate Verification
+
+Before presenting the Final Approval Gate, verify that required outputs were actually
+produced. Check the plan file and conversation for each item.
+
+**Phase 1 (CEO) outputs:**
+- [ ] Premise challenge with specific premises named (not just "premises accepted")
+- [ ] All applicable review sections have findings OR explicit "examined X, nothing flagged"
+- [ ] Error & Rescue Registry table produced (or noted N/A with reason)
+- [ ] Failure Modes Registry table produced (or noted N/A with reason)
+- [ ] "NOT in scope" section written
+- [ ] "What already exists" section written
+- [ ] Dream state delta written
+- [ ] Completion Summary produced
+
+**Phase 2 (Design) outputs — only if UI scope detected:**
+- [ ] All 7 dimensions evaluated with scores
+- [ ] Issues identified and auto-decided
+
+**Phase 3 (Eng) outputs:**
+- [ ] Scope challenge with actual code analysis (not just "scope is fine")
+- [ ] Architecture ASCII diagram produced
+- [ ] Test diagram mapping codepaths to test coverage
+- [ ] Test plan artifact written to disk at ~/.gstack/projects/$SLUG/
+- [ ] "NOT in scope" section written
+- [ ] "What already exists" section written
+- [ ] Failure modes registry with critical gap assessment
+- [ ] Completion Summary produced
+
+**Audit trail:**
+- [ ] Decision Audit Trail has at least one row per auto-decision (not empty)
+
+If ANY checkbox above is missing, go back and produce the missing output. Max 2
+attempts — if still missing after retrying twice, proceed to the gate with a warning
+noting which items are incomplete. Do not loop indefinitely.
+
+---
+
 ## Phase 4: Final Approval Gate
 
 **STOP here and present the final state to the user.**
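Review note: the Pre-Gate Verification rule in the hunk above (check every item, retry missing ones at most twice, then proceed with a warning) is a bounded retry loop. The following is a minimal sketch under stated assumptions. `pre_gate_verify`, the `checklist` mapping of item name to a presence check, and the `produce` callback are all hypothetical; the skill text describes the behavior in prose only.

```python
from typing import Callable, Dict, List, Tuple

MAX_ATTEMPTS = 2  # "Max 2 attempts" from the skill text


def pre_gate_verify(
    checklist: Dict[str, Callable[[], bool]],
    produce: Callable[[str], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[bool, List[str]]:
    """Return (complete, incomplete_items) after at most max_attempts retries."""
    # First pass: find every checklist item whose output is not present.
    missing = [item for item, present in checklist.items() if not present()]
    for _ in range(max_attempts):
        if not missing:
            break
        for item in missing:
            produce(item)  # go back and produce the missing output
        # Re-check only the items that were missing before this attempt.
        missing = [item for item in missing if not checklist[item]()]
    # The caller proceeds to the gate either way; a non-empty list becomes the warning.
    return (not missing, missing)
```

The loop is bounded by `max_attempts`, so it cannot run indefinitely even when `produce` never succeeds for an item.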
@@ -531,5 +673,6 @@ Suggest next step: `/ship` when ready to create the PR.
 - **Never abort.** The user chose /autoplan. Respect that choice. Surface all taste decisions, never redirect to interactive review.
 - **Premises are the one gate.** The only non-auto-decided AskUserQuestion is the premise confirmation in Phase 1.
 - **Log every decision.** No silent auto-decisions. Every choice gets a row in the audit trail.
-- **Full depth.** Do not compress or skip sections from the loaded skill files (except the skip list in Phase 0).
+- **Full depth means full depth.** Do not compress or skip sections from the loaded skill files (except the skip list in Phase 0). "Full depth" means: read the code the section asks you to read, produce the outputs the section requires, identify every issue, and decide each one. A one-sentence summary of a section is not "full depth" — it is a skip. If you catch yourself writing fewer than 3 sentences for any review section, you are likely compressing.
+- **Artifacts are deliverables.** Test plan artifact, failure modes registry, error/rescue table, ASCII diagrams — these must exist on disk or in the plan file when the review completes. If they don't exist, the review is incomplete.
 - **Sequential order.** CEO → Design → Eng. Each phase builds on the last.
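Review note: the "fewer than 3 sentences" heuristic in the new "Full depth means full depth" rule could be approximated mechanically. The following is a rough sketch, not part of the skill. `count_sentences` and `likely_compressed` are hypothetical names, and the sentence splitting is deliberately naive: abbreviations such as "e.g." followed by a space are counted as sentence ends.

```python
import re

MIN_SENTENCES = 3  # threshold named in the "Full depth" rule


def count_sentences(text: str) -> int:
    """Rough count: split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def likely_compressed(section_text: str, min_sentences: int = MIN_SENTENCES) -> bool:
    """Flag a review section whose write-up is shorter than the threshold."""
    return count_sentences(section_text) < min_sentences
```

A check like this can only flag suspiciously short sections; passing it does not show that the section's required outputs were produced.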
CHANGELOG.md (9 additions & 0 deletions)
@@ -1,5 +1,14 @@
 # Changelog
 
+## [0.10.2.0] - 2026-03-22 — Autoplan Depth Fix
+
+### Fixed
+
+- **`/autoplan` now produces full-depth reviews instead of compressing everything to one-liners.** When autoplan said "auto-decide," it meant "decide FOR the user using principles" — but the agent interpreted it as "skip the analysis entirely." Now autoplan explicitly defines the contract: auto-decide replaces your judgment, not the analysis. Every review section still gets read, diagrammed, and evaluated. You get the same depth as running each review manually.
+- **Execution checklists for CEO and Eng phases.** Each phase now enumerates exactly what must be produced — premise challenges, architecture diagrams, test coverage maps, failure registries, artifacts on disk. No more "follow that file at full depth" without saying what "full depth" means.
+- **Pre-gate verification catches skipped outputs.** Before presenting the final approval gate, autoplan now checks a concrete checklist of required outputs. Missing items get produced before the gate opens (max 2 retries, then warns).
+- **Test review can never be skipped.** The Eng review's test diagram section — the highest-value output — is explicitly marked NEVER SKIP OR COMPRESS with instructions to read actual diffs, map every codepath to coverage, and write the test plan artifact.
+
 ## [0.10.1.0] - 2026-03-22 — Test Coverage Catalog