You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
v1.57.2.0 feat: AskUserQuestion prose fallback when the tool fails at runtime (garrytan#1908)
* feat(auq): add gstack-session-kind + echo SESSION_KIND in preamble
Classifies the session as spawned | headless | interactive from env markers
(OPENCLAW_SESSION / GSTACK_HEADLESS / CONDUCTOR_* / CLAUDE_CODE_ENTRYPOINT / CI),
defaulting to interactive. Echoed once at skill start alongside BRANCH/REPO_MODE
so the AskUserQuestion-failure fallback can branch without a shell-out at failure
time. Degrade-safe: empty/error => interactive.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(auq): prose fallback when AskUserQuestion fails (interactive sessions)
On a genuine AUQ failure (tool absent, or present-but-erroring like Conductor's
flaky MCP returning '[Tool result missing due to internal error]'): retry once,
then branch on SESSION_KIND — spawned auto-chooses, headless BLOCKs, interactive
renders a prose decision brief the user answers by typing a letter.
The prose fallback MUST surface the triad: a clear ELI10 of the issue, a
per-choice Completeness score, and a recommendation+why (one paragraph per
choice). Carves out the [plan-tune auto-decide] denial as NOT a failure, and
qualifies the former 'tool_use, not prose' assertions so the rule isn't
self-contradicting. Tests pin the triad, the SESSION_KIND branch, the OV2
collision guard, the always-loaded guarantee, and a cross-file invariant on the
auto-decide prefix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(auq): default GSTACK_HEADLESS=1 in eval/E2E runners
Headless harness runs classify as headless (BLOCK on AUQ failure rather than
emit a prose question no one reads). SDK runner uses ambient mutation, not the
Options.env object, to avoid breaking the SDK auth pipeline. Interactive-path
suites opt out by overriding the env per-run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(auq): defensive PostToolUse error-fallback hook (OV3:B)
When an AskUserQuestion call returns an error/missing result, this hook injects
additionalContext reminding the model to run the prose fallback for the current
SESSION_KIND. It does not render prose itself — it guarantees the reminder fires
at the moment of failure instead of relying on the model recalling SESSION_KIND.
Inert on success and inert if the platform never invokes PostToolUse on tool
errors (unverified — could not force the Conductor MCP error in a harness; see
the spike doc). The prompt-level fallback covers the case regardless. Decision
logic is unit-tested deterministically; registered in setup beside the existing
AUQ hooks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore(auq): regenerate SKILL.md for all hosts + refresh ship goldens
Regenerated from the resolver changes (gen:skill-docs --host all). Refreshes the
byte-exact ship golden fixtures (claude/codex/factory). Spec prose tightened so
the cross-cutting preamble addition stays under the 5% per-skill parity ceiling
(investigate 4.8%) — guard unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(test): kebab testNames for section-loading E2Es to match TOUCHFILES keys
The two section-loading E2E tests used display-form testNames ('/ship
section-loading', '/plan-ceo-review section-loading') while every other E2E
testName and their E2E_TOUCHFILES keys are kebab. The completeness gate does an
exact `name in E2E_TOUCHFILES` check, so it failed (pre-existing on main); diff-
based selection also couldn't match them. Align to ship-section-loading /
plan-ceo-section-loading.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(test): make external-host freshness checks deterministic
The parameterized host smoke + --host all freshness tests assumed an external
`gen:skill-docs --host all` had run first (it never does in `bun test`), so which
host reported STALE varied by sibling-test timing — flaky. Regenerate the
gitignored external host dirs in a beforeAll so the --dry-run check is
deterministic. It still catches non-deterministic generation (the real bug class
for regenerated outputs); the tracked-claude freshness test runs earlier and is
unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(parity): headroom for AUQ cross-cutting addition on carved document-release
Merging main brought the carve of document-release (smaller skeleton); the AUQ
prose-fallback adds ~2KB to every skill's always-loaded preamble, landing
document-release at ~5.9% over the pre-carve v1.53.0.0 baseline. Add a per-carve
maxSizeRatio override (CARVE_GUARDS single source of truth) and bump only this
skill to 1.08. All other skills keep the strict 1.05 ceiling.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(auq): harden error-fallback hook + harness per adversarial review
Codex pre-landing review found three real issues:
- The PostToolUse fallback hook shared source 'plan-tune-cathedral' with the
question-log hook (same event+matcher); gstack-settings-hook replaces the entry,
so it would have clobbered plan-tune capture. Give it its own 'auq-error-fallback'
source (separate entry, both run); ALREADY_INSTALLED now requires both sources.
- isErrorResponse triggered on any string containing 'internal error'/'is_error',
so a real answer or a {"is_error": false} payload could fire the fallback after a
successful question. Narrow it to the missing-result sentinel + boolean is_error.
- The SDK runner mutated process.env.GSTACK_HEADLESS process-wide (leaked headless
into later tests). Removed; GSTACK_HEADLESS=1 now lives in the eval package.json
scripts, scoped to the invocation and inherited by the SDK child.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.57.2.0)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
@@ -124,7 +127,7 @@ In plan mode, allowed because they inform the plan: `$B`, `$D`, `codex exec`/`co
124
127
125
128
## Skill Invocation During Plan Mode
126
129
127
-
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If no variant is callable, the skill is BLOCKED — stop and report `BLOCKED — AskUserQuestion unavailable` per the AskUserQuestion Format rule. At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.
130
+
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If AskUserQuestion is unavailable or a call fails, follow the AskUserQuestion Format failure fallback: `headless` → BLOCKED; `interactive` → the prose fallback (also satisfies end-of-turn). At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.
128
131
129
132
If `PROACTIVE` is `"false"`, do not auto-invoke or proactively suggest skills. If a skill seems useful, ask: "I think /skillname might help here — want me to run it?"
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
@@ -133,7 +136,7 @@ In plan mode, allowed because they inform the plan: `$B`, `$D`, `codex exec`/`co
133
136
134
137
## Skill Invocation During Plan Mode
135
138
136
-
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If no variant is callable, the skill is BLOCKED — stop and report `BLOCKED — AskUserQuestion unavailable` per the AskUserQuestion Format rule. At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.
139
+
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If AskUserQuestion is unavailable or a call fails, follow the AskUserQuestion Format failure fallback: `headless` → BLOCKED; `interactive` → the prose fallback (also satisfies end-of-turn). At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.
137
140
138
141
If `PROACTIVE` is `"false"`, do not auto-invoke or proactively suggest skills. If a skill seems useful, ask: "I think /skillname might help here — want me to run it?"
139
142
@@ -305,11 +308,31 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
305
308
306
309
**Rule:** if any `mcp__*__AskUserQuestion` variant is in your tool list, prefer it. Hosts may disable native AUQ via `--disallowedTools AskUserQuestion` (Conductor does, by default) and route through their MCP variant; calling native there silently fails. Same questions/options shape; same decision-brief format applies.
307
310
308
-
**If no AskUserQuestion variant appears in your tool list, this skill is BLOCKED.** Stop, report `BLOCKED — AskUserQuestion unavailable`, and wait for the user. Do not write decisions to the plan file as a substitute, do not emit them as prose and stop, and do not silently auto-decide (only `/plan-tune` AUTO_DECIDE opt-ins authorize auto-picking).
311
+
If AskUserQuestion is unavailable (no variant in your tool list) OR a call to it fails, do NOT silently auto-decide or write the decision to the plan file as a substitute. Follow the **failure fallback** below.
312
+
313
+
### When AskUserQuestion is unavailable or a call fails
314
+
315
+
Tell three outcomes apart:
316
+
317
+
1.**Auto-decide denial (NOT a failure).** The result contains `[plan-tune auto-decide] <id> → <option>` — the preference hook working as designed. Proceed with that option. Do NOT retry, do NOT fall back to prose.
318
+
2.**Genuine failure** — no variant in your tool list, OR the variant is present but the call returns an error / missing result (MCP transport error, empty result, host bug — e.g. Conductor's MCP AskUserQuestion is flaky and returns `[Tool result missing due to internal error]`).
319
+
- If it was present and **errored** (not absent), retry the SAME call **once** — but only if no answer could have surfaced (a missing-result error can arrive after the user already saw the question; retrying would double-prompt, so if it may have reached them, treat as pending, don't retry).
320
+
- Then branch on `SESSION_KIND` (echoed by the preamble; empty/absent ⇒ `interactive`):
321
+
-`spawned` → defer to the **Spawned session** block: auto-choose the recommended option. Never prose, never BLOCKED.
322
+
-`headless` → `BLOCKED — AskUserQuestion unavailable`; stop and wait (no human can answer).
323
+
-`interactive` → **prose fallback** (below).
324
+
325
+
**Prose fallback — render the decision brief as a markdown message, not a tool call.** Same information as the tool format below, different structure (paragraphs, not ✅/❌ bullets). It MUST surface this triad:
326
+
327
+
1.**A clear ELI10 of the issue itself** — plain English on what's being decided and why it matters (the question, not per-choice), naming the stakes. Lead with it.
328
+
2.**Completeness scores per choice** — explicit `Completeness: X/10` on EACH choice (10 complete, 7 happy-path, 3 shortcut); use the kind-note when options differ in kind not coverage, but never silently drop the score.
329
+
3.**The recommendation and why** — a `Recommendation: <choice> because <reason>` line plus the `(recommended)` marker on that choice.
330
+
331
+
Layout: a `D<N>` title + a one-line note that AskUserQuestion failed and to reply with a letter; the issue ELI10; the Recommendation line; then ONE paragraph per choice carrying its `(recommended)` marker, its `Completeness: X/10`, and 2-4 sentences of reasoning — never a bare bullet list; a closing `Net:` line. Split chains / 5+ options: one prose block per per-option call, in sequence. Then STOP and wait — the user's typed answer is the decision. In plan mode this satisfies end-of-turn like a tool call.
309
332
310
333
### Format
311
334
312
-
Every AskUserQuestion is a decision brief and must be sent as tool_use, not prose.
335
+
Every AskUserQuestion is a decision brief and must be sent as tool_use, not prose — unless the documented failure fallback above applies (interactive session + the call is unavailable/erroring), in which case the prose fallback is the correct output.
313
336
314
337
```
315
338
D<N> — <one-line question title>
@@ -389,7 +412,7 @@ Before calling AskUserQuestion, verify:
389
412
-[ ] (recommended) label on one option (even for neutral-posture)
-[ ] You are calling the tool, not writing prose — unless the documented failure fallback applies (then: prose with the mandatory triad — issue ELI10, per-choice Completeness, Recommendation + `(recommended)` — and a "reply with a letter" instruction, then STOP)
393
416
-[ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped
394
417
-[ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any
395
418
-[ ] If you split, you checked dependencies between options before firing the chain
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
@@ -127,7 +130,7 @@ In plan mode, allowed because they inform the plan: `$B`, `$D`, `codex exec`/`co
127
130
128
131
## Skill Invocation During Plan Mode
129
132
130
-
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If no variant is callable, the skill is BLOCKED — stop and report `BLOCKED — AskUserQuestion unavailable` per the AskUserQuestion Format rule. At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.
133
+
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If AskUserQuestion is unavailable or a call fails, follow the AskUserQuestion Format failure fallback: `headless` → BLOCKED; `interactive` → the prose fallback (also satisfies end-of-turn). At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.
131
134
132
135
If `PROACTIVE` is `"false"`, do not auto-invoke or proactively suggest skills. If a skill seems useful, ask: "I think /skillname might help here — want me to run it?"
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
@@ -127,7 +130,7 @@ In plan mode, allowed because they inform the plan: `$B`, `$D`, `codex exec`/`co
127
130
128
131
## Skill Invocation During Plan Mode
129
132
130
-
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If no variant is callable, the skill is BLOCKED — stop and report `BLOCKED — AskUserQuestion unavailable` per the AskUserQuestion Format rule. At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.
133
+
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If AskUserQuestion is unavailable or a call fails, follow the AskUserQuestion Format failure fallback: `headless` → BLOCKED; `interactive` → the prose fallback (also satisfies end-of-turn). At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.
131
134
132
135
If `PROACTIVE` is `"false"`, do not auto-invoke or proactively suggest skills. If a skill seems useful, ask: "I think /skillname might help here — want me to run it?"
0 commit comments