Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,59 @@
# Changelog

## [1.48.0.0] - 2026-05-26

## **Agents stop dropping AskUserQuestion options when there are 5+.** A new canonical preamble rule + runtime gate makes Conductor's 4-option cap a split-or-batch decision, not a silent trim.

The failure mode looked like this in real transcripts:

> "I'm hitting Conductor's limit of 4 options in the AUQ, so I need to cut one. E4 is the biggest lift and probably beyond scope for v0.42 anyway. Trimming: E4. Moving to TODOs without asking. Re-firing with 4."

The agent unilaterally cut an option from the user's decision set. This release names that as a bug and gives the agent two compliant shapes for any 5+ option scenario: batch into ≤4-groups when the options are coherent alternatives (version bumps, layout variants), or split into N sequential per-option calls when the options are independent scope items (which platforms to ship, which TODOs to land). Default-to-split when unsure. The inline preamble subsection is intentionally compressed; full reference with worked examples, Hold/dependency semantics, and final-summary validation lives in `docs/askuserquestion-split.md` — loaded on demand when N>4.

A two-layer defense protects the option set from being silently auto-approved: per-option `question_id`s of shape `<skill>-split-<option-slug>` are unique per option (preferences can't leak across the chain), and the runtime checker `bin/gstack-question-preference --check` refuses `never-ask` on any `*-split-*` id with an explanatory note. The user's option set is sacred — the whole point of splitting is restoring user sovereignty over the decision space.

### The numbers that matter

Source: this branch's 4 commits + the post-merge regen against `main` at v1.47.0.0. Net diff: 57 files, ~2800 lines (most of it mechanical SKILL.md regen across all 41 tier-2+ skills, ~34 lines added per skill from the inline subsection injection).

| Capability | Before this PR | After this PR |
|---|---|---|
| 5+ options for ONE decision | drop one to fit cap, hope user doesn't notice | split into N per-option calls OR batch into ≤4-groups, name the rule in the preamble |
| Per-option call shape | n/a (couldn't reliably produce one) | `D<N>.k` header, Include / Defer / Cut / Hold buckets, kind-note (no completeness score), recommendation per option |
| Hold semantics | undefined (chain might queue, might stop, agent-dependent) | stop chain immediately, resume on user "continue" |
| Final summary | n/a | `D<N>.final` validates dependencies, reprompts conflicts, confirms assembled scope |
| `D<N>.0` meta-question (N>6) | n/a | tool-call meta-question first: proceed / narrow / batch |
| AUTO_DECIDE on split per-option calls | possible if user tuned the pattern via /plan-tune | runtime checker forces `ASK_NORMALLY` on any `*-split-*` id, with explanatory note |
| Regression coverage | n/a | 6 inline-contract tests + 7 runtime-gate tests + 1 periodic-tier E2E behavior test (5-option scope fixture) |
| Inline preamble bytes per SKILL.md | n/a | ~1.6 KB (vs ~4 KB if the full rule were inlined; deeper reference loaded on demand) |

### What this means for builders

Next time you give an agent a scope question with 5+ candidates, you'll see one of three shapes: a single batched AskUserQuestion with ≤4 buckets (if the options are coherent alternatives), N sequential per-option calls each with Include/Defer/Cut/Hold (if they're independent), or a `D<N>.0` meta-question first asking whether to proceed/narrow/batch (if N>6). You'll never see a silent trim. If you've ever wired up a /plan-tune `never-ask` preference, it now refuses to apply to split chains — the runtime check forces ASK_NORMALLY with a note explaining why.

### Itemized changes

#### Added
- New canonical preamble subsection in `scripts/resolvers/preamble/generate-ask-user-format.ts`: "Handling 5+ options — split, never drop". Inline-compressed (~1.6 KB per tier-2+ skill); full reference at `docs/askuserquestion-split.md`.
- `docs/askuserquestion-split.md`: ~200-line deep reference covering both compliant shapes (batched / split), per-option call mechanics with `D<N>.k` numbering, Hold-means-stop semantics, final-summary dependency validation with conflict reprompt, `D<N>.0` meta-question for N>6, `<skill>-split-<option-slug>` question_id format, and the two-layer AUTO_DECIDE defense.
- Runtime AUTO_DECIDE gate in `bin/gstack-question-preference --check`: detects any `question_id` matching `*-split-*` and forces `ASK_NORMALLY` regardless of stored `never-ask` or `ask-only-for-one-way` preferences. Emits an explanatory note so the user knows their preference was bypassed and why.
- `test/skill-e2e-plan-ceo-split-overflow.test.ts`: periodic-tier regression test using a 5-option chat-platform integration fixture. Floor 4 review-phase AUQs (N-1 tolerance). Catches the original drop-to-fit-4 failure mode.

#### Changed
- Test baseline anchor rebased v1.44.1 → v1.47.0.0 in `test/skill-size-budget.test.ts`. Main's v1.46 (catalog tokens) + v1.47 (/spec) growth pushed the v1.44.1 anchor past the 5% ratchet; rebasing absorbs that growth at HEAD. Historical `parity-baseline-v1.44.1.json` and `parity-baseline-v1.46.0.0.json` retained in `test/fixtures/` for reference.
- 3 self-check items added to the AskUserQuestion preamble checklist (split-not-drop, dependency-check-before-chain, Hold-stops-immediately).
- 41 tier-2+ skills regenerated to inherit the new subsection (~34 lines each via the preamble resolver).
- 3 golden ship fixtures refreshed for the new preamble.

#### Fixed
- Orphan `12.` numbered-list prefix on the existing CJK rule in `generate-ask-user-format.ts` — refactoring artifact, no items 1-11 above it. Removed.
- `docs/skills.md` missing `/spec` table row (pre-existing miss from PR #1698/#1733 that landed `/spec` to main without updating the doc inventory). Added.

#### For contributors
- 6 resolver tests pin the inline-subsection contract (4-option cap text, Include/Defer/Cut/Hold buckets, D-numbering shape, AUTO_DECIDE runtime gate reference, docs pointer, orphan-12 regression).
- 7 runtime-gate tests in `test/gstack-question-preference.test.ts` cover the carve-out: no-pref baseline, never-ask override, explanatory note text, ask-only-for-one-way override, always-ask (no note), non-split id containing "split" word (negative regex specificity), multi-skill split id formats.
- `parity-baseline-v1.47.0.0.json` captured via `bun run scripts/capture-baseline.ts --tag v1.47.0.0`.

## [1.47.0.0] - 2026-05-26

## **`/spec` ships: turn vague intent into a precise, executable spec in five phases.** Pipe the spec into a spawned Claude Code agent, dedupe against existing issues, archive locally for the team corpus, and let `/ship` close the source issue on merge.
Expand Down
2 changes: 1 addition & 1 deletion VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
1.47.0.0
1.48.0.0
34 changes: 33 additions & 1 deletion autoplan/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -342,7 +342,36 @@ Effort both-scales: when an option involves effort, label both human-team and CC

Net line closes the tradeoff. Per-skill instructions may add stricter rules.

12. **Non-ASCII characters — write directly, never \u-escape.** When any
### Handling 5+ options — split, never drop

AskUserQuestion caps every call at **4 options**. With 5+ real options, NEVER
drop, merge, or silently defer one to fit. Pick a compliant shape:

- **Batch into ≤4-groups** — for coherent alternatives (e.g. version bumps,
layout variants). One call, 5th surfaced only if first 4 don't fit.
- **Split per-option** — for independent scope items (e.g. "ship E1..E6?").
Fire N sequential calls, one per option. Default to this when unsure.

Per-option call shape: `D<N>.k` header (e.g. D3.1..D3.5), ELI10 per option,
Recommendation, kind-note (no completeness score — Include/Defer/Cut/Hold are
decision actions), and 4 buckets:
**A) Include**, **B) Defer**, **C) Cut**, **D) Hold** (stop chain, discuss).

After the chain, fire `D<N>.final` to validate the assembled set (reprompt
dependency conflicts) and confirm shipping it. Use `D<N>.revise-<k>` to
revise one option without re-running the chain.

For N>6, fire a `D<N>.0` meta-AskUserQuestion first (proceed / narrow / batch).

question_ids for split chains: `<skill>-split-<option-slug>` (kebab-case ASCII,
≤64 chars, `-2`/`-3` suffix on collision). The runtime checker
(`bin/gstack-question-preference`) refuses `never-ask` on any `*-split-*` id,
so split chains are never AUTO_DECIDE-eligible — the user's option set is sacred.

**Full rule + worked examples + Hold/dependency semantics:** see
`docs/askuserquestion-split.md` in the gstack repo. Read on demand when N>4.

**Non-ASCII characters — write directly, never \u-escape.** When any
string field (question, option label, option description) contains
Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit
the literal UTF-8 characters in the JSON string. **Never escape them
Expand Down Expand Up @@ -375,6 +404,9 @@ Before calling AskUserQuestion, verify:
- [ ] Net line closes the decision
- [ ] You are calling the tool, not writing prose
- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped
- [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any
- [ ] If you split, you checked dependencies between options before firing the chain
- [ ] If a per-option Hold fires, you stopped the chain immediately (didn't queue)


## Artifacts Sync (skill start)
Expand Down
15 changes: 15 additions & 0 deletions bin/gstack-question-preference
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,21 @@ do_check() {
return;
}

// Split-chain carve-out: per-option calls in N-option splits emit
// question_ids of the form <skill>-split-<option-slug>. These are
// NEVER AUTO_DECIDE-eligible regardless of stored preferences — the
// whole point of splitting is restoring user sovereignty over the
// option set. See scripts/resolvers/preamble/generate-ask-user-format.ts
// \"Handling 5+ options — split, never drop\" for the surrounding
// mechanism that generates these ids.
if (/-split-/.test(qid)) {
console.log('ASK_NORMALLY');
if (pref === 'never-ask' || pref === 'ask-only-for-one-way') {
console.log('NOTE: split-chain per-option calls always ASK_NORMALLY; your ' + pref + ' preference does not apply to options inside a sequential split.');
}
return;
}

switch (pref) {
case 'never-ask':
console.log('AUTO_DECIDE');
Expand Down
34 changes: 33 additions & 1 deletion canary/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -334,7 +334,36 @@ Effort both-scales: when an option involves effort, label both human-team and CC

Net line closes the tradeoff. Per-skill instructions may add stricter rules.

12. **Non-ASCII characters — write directly, never \u-escape.** When any
### Handling 5+ options — split, never drop

AskUserQuestion caps every call at **4 options**. With 5+ real options, NEVER
drop, merge, or silently defer one to fit. Pick a compliant shape:

- **Batch into ≤4-groups** — for coherent alternatives (e.g. version bumps,
layout variants). One call, 5th surfaced only if first 4 don't fit.
- **Split per-option** — for independent scope items (e.g. "ship E1..E6?").
Fire N sequential calls, one per option. Default to this when unsure.

Per-option call shape: `D<N>.k` header (e.g. D3.1..D3.5), ELI10 per option,
Recommendation, kind-note (no completeness score — Include/Defer/Cut/Hold are
decision actions), and 4 buckets:
**A) Include**, **B) Defer**, **C) Cut**, **D) Hold** (stop chain, discuss).

After the chain, fire `D<N>.final` to validate the assembled set (reprompt
dependency conflicts) and confirm shipping it. Use `D<N>.revise-<k>` to
revise one option without re-running the chain.

For N>6, fire a `D<N>.0` meta-AskUserQuestion first (proceed / narrow / batch).

question_ids for split chains: `<skill>-split-<option-slug>` (kebab-case ASCII,
≤64 chars, `-2`/`-3` suffix on collision). The runtime checker
(`bin/gstack-question-preference`) refuses `never-ask` on any `*-split-*` id,
so split chains are never AUTO_DECIDE-eligible — the user's option set is sacred.

**Full rule + worked examples + Hold/dependency semantics:** see
`docs/askuserquestion-split.md` in the gstack repo. Read on demand when N>4.

**Non-ASCII characters — write directly, never \u-escape.** When any
string field (question, option label, option description) contains
Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit
the literal UTF-8 characters in the JSON string. **Never escape them
Expand Down Expand Up @@ -367,6 +396,9 @@ Before calling AskUserQuestion, verify:
- [ ] Net line closes the decision
- [ ] You are calling the tool, not writing prose
- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped
- [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any
- [ ] If you split, you checked dependencies between options before firing the chain
- [ ] If a per-option Hold fires, you stopped the chain immediately (didn't queue)


## Artifacts Sync (skill start)
Expand Down
34 changes: 33 additions & 1 deletion codex/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -337,7 +337,36 @@ Effort both-scales: when an option involves effort, label both human-team and CC

Net line closes the tradeoff. Per-skill instructions may add stricter rules.

12. **Non-ASCII characters — write directly, never \u-escape.** When any
### Handling 5+ options — split, never drop

AskUserQuestion caps every call at **4 options**. With 5+ real options, NEVER
drop, merge, or silently defer one to fit. Pick a compliant shape:

- **Batch into ≤4-groups** — for coherent alternatives (e.g. version bumps,
layout variants). One call, 5th surfaced only if first 4 don't fit.
- **Split per-option** — for independent scope items (e.g. "ship E1..E6?").
Fire N sequential calls, one per option. Default to this when unsure.

Per-option call shape: `D<N>.k` header (e.g. D3.1..D3.5), ELI10 per option,
Recommendation, kind-note (no completeness score — Include/Defer/Cut/Hold are
decision actions), and 4 buckets:
**A) Include**, **B) Defer**, **C) Cut**, **D) Hold** (stop chain, discuss).

After the chain, fire `D<N>.final` to validate the assembled set (reprompt
dependency conflicts) and confirm shipping it. Use `D<N>.revise-<k>` to
revise one option without re-running the chain.

For N>6, fire a `D<N>.0` meta-AskUserQuestion first (proceed / narrow / batch).

question_ids for split chains: `<skill>-split-<option-slug>` (kebab-case ASCII,
≤64 chars, `-2`/`-3` suffix on collision). The runtime checker
(`bin/gstack-question-preference`) refuses `never-ask` on any `*-split-*` id,
so split chains are never AUTO_DECIDE-eligible — the user's option set is sacred.

**Full rule + worked examples + Hold/dependency semantics:** see
`docs/askuserquestion-split.md` in the gstack repo. Read on demand when N>4.

**Non-ASCII characters — write directly, never \u-escape.** When any
string field (question, option label, option description) contains
Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit
the literal UTF-8 characters in the JSON string. **Never escape them
Expand Down Expand Up @@ -370,6 +399,9 @@ Before calling AskUserQuestion, verify:
- [ ] Net line closes the decision
- [ ] You are calling the tool, not writing prose
- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped
- [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any
- [ ] If you split, you checked dependencies between options before firing the chain
- [ ] If a per-option Hold fires, you stopped the chain immediately (didn't queue)


## Artifacts Sync (skill start)
Expand Down
34 changes: 33 additions & 1 deletion context-restore/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -338,7 +338,36 @@ Effort both-scales: when an option involves effort, label both human-team and CC

Net line closes the tradeoff. Per-skill instructions may add stricter rules.

12. **Non-ASCII characters — write directly, never \u-escape.** When any
### Handling 5+ options — split, never drop

AskUserQuestion caps every call at **4 options**. With 5+ real options, NEVER
drop, merge, or silently defer one to fit. Pick a compliant shape:

- **Batch into ≤4-groups** — for coherent alternatives (e.g. version bumps,
layout variants). One call, 5th surfaced only if first 4 don't fit.
- **Split per-option** — for independent scope items (e.g. "ship E1..E6?").
Fire N sequential calls, one per option. Default to this when unsure.

Per-option call shape: `D<N>.k` header (e.g. D3.1..D3.5), ELI10 per option,
Recommendation, kind-note (no completeness score — Include/Defer/Cut/Hold are
decision actions), and 4 buckets:
**A) Include**, **B) Defer**, **C) Cut**, **D) Hold** (stop chain, discuss).

After the chain, fire `D<N>.final` to validate the assembled set (reprompt
dependency conflicts) and confirm shipping it. Use `D<N>.revise-<k>` to
revise one option without re-running the chain.

For N>6, fire a `D<N>.0` meta-AskUserQuestion first (proceed / narrow / batch).

question_ids for split chains: `<skill>-split-<option-slug>` (kebab-case ASCII,
≤64 chars, `-2`/`-3` suffix on collision). The runtime checker
(`bin/gstack-question-preference`) refuses `never-ask` on any `*-split-*` id,
so split chains are never AUTO_DECIDE-eligible — the user's option set is sacred.

**Full rule + worked examples + Hold/dependency semantics:** see
`docs/askuserquestion-split.md` in the gstack repo. Read on demand when N>4.

**Non-ASCII characters — write directly, never \u-escape.** When any
string field (question, option label, option description) contains
Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit
the literal UTF-8 characters in the JSON string. **Never escape them
Expand Down Expand Up @@ -371,6 +400,9 @@ Before calling AskUserQuestion, verify:
- [ ] Net line closes the decision
- [ ] You are calling the tool, not writing prose
- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped
- [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any
- [ ] If you split, you checked dependencies between options before firing the chain
- [ ] If a per-option Hold fires, you stopped the chain immediately (didn't queue)


## Artifacts Sync (skill start)
Expand Down
Loading
Loading