Skip to content

feat(cli): add session-scoped /goal command with judge-driven turn continuation#4123

Open
qqqys wants to merge 11 commits into
QwenLM:mainfrom
qqqys:feat/goal-command-4074-rebuild
Open

feat(cli): add session-scoped /goal command with judge-driven turn continuation#4123
qqqys wants to merge 11 commits into
QwenLM:mainfrom
qqqys:feat/goal-command-4074-rebuild

Conversation

@qqqys
Copy link
Copy Markdown
Collaborator

@qqqys qqqys commented May 13, 2026

Summary

  • What changed: added a built-in /goal <condition> slash command that pins a free-form session-scoped objective. While a goal is active, an LLM judge runs at every Stop boundary and either lets the turn end (condition met) or feeds the judge's reason back as the next user prompt so the model keeps working. Auto-clears on success; /goal clear cancels early. Same primitive as Anthropic's Claude Code 2.1.140 /goal.
  • Why it changed: implements Add /goal slash command (session-scoped Stop hook + LLM-judge auto-clear) #4074. Collapses the "user re-prompts 'keep going' 3-6 times per long task" UX into a single /goal … action. Built entirely on qwen-code's existing Stop-hook + function-hook plumbing — no new subsystem.
  • Reviewer focus:
    1. packages/core/src/goals/ — the new module. goalJudge.ts (transcript-grounded judge with structured-output + disabled thinking), goalHook.ts (function hook on Stop), activeGoalStore.ts (per-session state + terminal-observer channel for UI bridging).
    2. packages/core/src/hooks/hookSystem.ts + sessionHooksManager.tshasHooksForEvent(eventName, sessionId?) now also counts session-scoped function hooks. Pre-existing latent bug: SDK / programmatic Stop function hooks were silently gated out by the fast-path check in client.ts, so they never fired.
    3. packages/core/src/core/client.ts (StopHookLoop emission) — yields on every Stop-hook continuation, not just iter > 1. First not-met turn is now visible.
    4. packages/cli/src/ui/hooks/useGeminiStream.ts (Finished handler) — commits the pending history item + clears thoughtBuffer / geminiMessageBuffer at every turn boundary so a Stop-hook continuation doesn't bleed its first chunk into the previous turn's pending UI buffer (the persisted transcript was always clean — this was a stream-buffer issue).
    5. packages/cli/src/ui/commands/goalCommand.ts + restoreGoal.ts — the command surface, plus resume-time rehydration of the active hook and the last-completed-goal summary cache from transcript.
newgoal

Validation

  • Commands run:
    npx tsc --noEmit -p packages/core              #
    npx tsc --noEmit -p packages/cli               #
    cd packages/core && npx vitest run             # all green (1181 tests, incl. new goals/ suite)
    cd packages/cli  && npx vitest run \
      src/ui/commands/goalCommand.test.ts \
      src/ui/utils/restoreGoal.test.ts \
      src/ui/components/GoalPill.test.tsx \
      src/services/BuiltinCommandLoader.test.ts \
      src/ui/utils/historyUtils.test.ts \
      src/i18n/mustTranslateKeys.test.ts            # 67 tests ✓
    npx eslint <touched-files>                      # clean
  • Prompts used (end-to-end smoke):
    > /goal 输出test,每个对话单独输出一个字母,直到完成
    ◎ Goal set
      Goal: 输出test,每个对话单独输出一个字母,直到完成
    
    ✦ t
    ○ Goal check · turn 1 · not yet met
    
    ✦ e
    ○ Goal check · turn 2 · not yet met
    
    ✦ s
    ○ Goal check · turn 3 · not yet met
    
    ✦ t
    ✓ Goal achieved · 3 turns · 22s
      Goal: 输出test,每个对话单独输出一个字母,直到完成
    
    > /goal
    ✓ Goal achieved · 3 turns · 22s
      Goal: 输出test,每个对话单独输出一个字母,直到完成
    
  • Expected result: judge gates each Stop boundary; the model emits letters t / e / s / t across four assistant turns; the goal auto-clears; subsequent empty /goal shows the achievement card.
  • Observed result: matches.
  • Quickest reviewer verification path:
    1. cd packages/core && npx vitest run src/goals/
    2. In an interactive session: /goal 输出A、B、C,每个对话单独输出一个字母 → three turns each producing one letter + auto-clear.

Scope / Risk

  • Main risk / tradeoff: HookSystem.hasHooksForEvent now considers session-scoped function hooks. Code that registered a Stop function hook expecting the prior "silently no-ops" behavior would now see its hook actually fire. Realistically that prior behavior was a latent bug; any code that registered a function hook wanted it to run. The two affected tests in config.test.ts are updated to pin the new (eventName, sessionId) signature.
  • Finished-event buffer reset: the new commit-pending + clear-buffers logic in useGeminiStream runs on every turn boundary. Verified against the existing useGeminiStream.test.tsx suite (91 tests) — no regressions.
  • StopHookLoop on first iteration: removing the > 1 guard means user-configured Stop hooks now also surface their blocking reason on the first fire. Previously the first not-met was silent, which masked configured hooks too. The UI rendering for non-goal Stop hooks is unchanged (● Ran N stop hooks ⎿ Stop hook error: …); only the visibility threshold dropped.
  • Persisting goals across /resume: in scope. restoreGoalFromHistory scans the loaded transcript for the most recent set (to re-arm the hook) and the most recent achieved/aborted (to repopulate the summary cache). /clear, /branch, /new continue to wipe goal state via session-id rotation.
  • Subagents: out of scope. The hook is registered against the main session id on the Stop event only; subagents have their own session id and fire SubagentStop, so they cannot trigger or interact with an active goal.
  • Breaking changes / migration: none. New command, new exports, additive optional sessionId arg on HookSystem.hasHooksForEvent (existing single-arg call sites keep working).

Testing Matrix

🍏 🪟 🐧
npm run ⚠️ ⚠️
npx ⚠️ ⚠️ ⚠️
Docker ⚠️ ⚠️ ⚠️
Podman ⚠️ N/A N/A
Seatbelt ⚠️ N/A N/A

macOS arm64 (M-series) interactive run + the unit test suite were exercised locally. The only platform-specific surface added is the Footer pill glyph () and the history card glyphs (, , !) — plain Unicode, no platform-specific behavior expected.

Linked Issues / Bugs

Closes #4074

* last goal_status was a terminal state (achieved / cleared / aborted) or
* none exists.
*/
export function findGoalToRestore(history: HistoryItem[]): string | null {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] findGoalToRestore() only restores a goal when the latest goal_status item is kind === 'set'. During a normal active goal loop, each not-met Stop hook iteration appends a checking status, so resuming a session after any check treats the goal as terminal and unregisters it even though there is no achieved, cleared, or aborted record. This silently drops an in-progress /goal across /resume. Please treat checking as a non-terminal active state, or scan backward past checking entries until reaching a real terminal state or the matching set.

— gpt-5.5 via Qwen Code /review

Comment thread packages/core/src/goals/goalHook.ts Outdated
}

if (current.iterations >= MAX_GOAL_ITERATIONS) {
debugLogger.debug(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The max-iteration guard runs before judgeGoal(), so when the next Stop event starts with iterations >= MAX_GOAL_ITERATIONS, the hook aborts without judging the latest assistant output. That means the final generated turn can satisfy the goal but still be reported as Goal max iterations reached and cleared as aborted. Please run the judge first and only abort after a still-not-met verdict at the limit, or otherwise allow one final evaluation before aborting.

— gpt-5.5 via Qwen Code /review

// Stop hook callback. The core side clears the observer on terminal /
// unregister so we don't accumulate stale closures across goals.
const { addItem } = context.ui;
setGoalTerminalObserver(sessionId, (event: GoalTerminalEvent) => {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Terminal goal events emitted later from the Stop hook are only added to in-memory UI history via addItem(). They do not go through the slash-command processor's recorded-items path and are not written to ChatRecordingService, unlike other async UI events such as auto /recap. As a result, /resume can miss the achieved/aborted card and may see only the earlier set/checking entries, causing a completed goal to be restored as active. Please also persist these terminal goal_status items through recordSlashCommand({ phase: 'result', ... }) when the observer fires.

— gpt-5.5 via Qwen Code /review

* silently skip restoration rather than re-register a goal the user can no
* longer cancel.
*/
export function restoreGoalFromHistory(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Restoring a goal only re-registers the core Stop hook; it does not reinstall the terminal observer that fresh /goal setup installs. A restored goal can later achieve or abort, but no observer will add/persist the terminal goal_status card, so future resume/summary behavior loses the actual terminal state. Please share the observer setup with the fresh /goal path or pass an observer/addItem callback into restoration so restored goals have the same terminal lifecycle.

— gpt-5.5 via Qwen Code /review

const signal = context?.signal ?? new AbortController().signal;
const verdict = await judgeGoal(config, {
condition,
lastAssistantText,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The automatic achieved and max-iteration aborted paths clear the active goal store and observer, but they never remove the registered Stop function hook from HookSystem. After completion, later Stop events still execute the stale callback (it just returns continue: true because the active goal is gone), and repeated goals can accumulate dead session hooks. Please remove the function hook before clearing the active goal, or use a cleanup helper that removes by hookId and then clears the store/observer.

— gpt-5.5 via Qwen Code /review

},
argumentHint: '[<condition> | clear]',
kind: CommandKind.BUILT_IN,
supportedModes: ['interactive', 'non_interactive'] as const,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] /goal is advertised as supported in non_interactive, but the implementation depends on interactive UI history, terminal observers, and a continuing turn loop. In headless usage such as qwen -p "/goal ...", the command can appear supported while arming a session hook and producing UI-only lifecycle state that the caller cannot inspect or clear. Please restrict this command to interactive mode, or add a tested non-interactive lifecycle with deterministic recording and cleanup semantics.

— gpt-5.5 via Qwen Code /review

Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two cross-file Critical issues (not mappable to a single diff line):

  1. Function hook timeout silently kills goal loop. The FunctionHookRunner.executeWithTimeout (in functionHookRunner.ts, not in this diff) wraps the goal callback in Promise.race with a 30s timeout. When the judge API call exceeds 30s, the timeout reject wins — the callback never returns {decision:'block', reason:…}, so stopOutput is undefined and the Stop hook loop never continues. The goal stays active, and the same timeout repeats on every subsequent Stop event in an invisible infinite loop. No error surfaces to the user — just ◎ /goal active forever.

  2. Judge failures have no observable production traces. Every failure path in judgeGoal (API errors, parse failures, empty responses) uses only debugLogger.debug. There is no reportError, no telemetry span, no user-visible warning. If getFastModel() is misconfigured and the judge API fails 50 times in a row, the goal silently aborts with the fallback reason — and the operator has no way to know why.

These are in addition to the inline findings below.

function capContent(content: Content): Content {
if (!content.parts) return content;
return {
...content,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] capContent only truncates text parts. functionResponse and functionCall parts pass through at full size, which means large tool outputs (e.g., run_shell_command returning 50KB logs, read_file loading source files) are sent verbatim to the judge model on every Stop boundary.

This inflates token consumption per judge call by up to an order of magnitude on turns with tool use. Since the judge already disables extended thinking (thinkingBudget: 0), the dominant cost driver is input tokens — and those are uncontrolled for non-text parts.

Suggested change
...content,
function capContent(content: Content): Content {
if (!content.parts) return content;
return {
...content,
parts: content.parts.map((p) => {
if (typeof p.text === 'string') {
return p.text.length > TRANSCRIPT_PART_CHAR_CAP
? { ...p, text: p.text.slice(0, TRANSCRIPT_PART_CHAR_CAP) + '…[truncated]' }
: p;
}
// Truncate function responses to avoid blowing up judge context
if (p.functionResponse) {
const resp = p.functionResponse;
if (typeof resp.response === 'object' && resp.response !== null) {
const text = JSON.stringify(resp.response);
if (text.length > TRANSCRIPT_PART_CHAR_CAP) {
return {
...p,
functionResponse: {
...resp,
response: { truncated: true, preview: text.slice(0, TRANSCRIPT_PART_CHAR_CAP) },
},
};
}
}
}
return p;
}),
};
}

— deepseek-v4-pro via Qwen Code /review

Comment thread packages/core/src/goals/goalHook.ts Outdated
const hookId = system.addFunctionHook(
sessionId,
HookEventName.Stop,
'',
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The goal hook is registered with an empty string matcher '', which relies on matchesPattern('', '') returning true (via new RegExp('^$').test('')). If the Stop event firing path ever passes a non-empty toolName/target (e.g., a future refactor that distinguishes Stop-by-tool from generic Stop), the hook would silently stop matching — matchesPattern('', 'Bash') returns false.

The hook would remain registered, hasHooksForEvent would still return true (because the hook IS registered), but the event would never call the callback. No error, no log — the goal just silently stops evaluating.

Suggested change
'',
const hookId = system.addFunctionHook(
sessionId,
HookEventName.Stop,
'*', // Match any Stop event regardless of toolName
callback,
'Goal evaluator failed',

Alternatively, handle empty-string matcher explicitly in SessionHooksManager.getMatchingHooks as a wildcard.

— deepseek-v4-pro via Qwen Code /review

Comment thread packages/core/src/goals/goalJudge.ts Outdated
const parsed = parseJudgeReply(text);
if (!parsed) {
debugLogger.debug(
`Goal judge reply not parseable as JSON: ${text.slice(0, 200)}`,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] debugLogger.debug logs the raw judge model response's first 200 characters when JSON parsing fails:

debugLogger.debug(
  `Goal judge reply not parseable as JSON: ${text.slice(0, 200)}`,
);

The judge's system prompt instructs it to "quote specific text from the transcript" as evidence. If the model outputs a text preamble before the JSON (despite structured-output mode), those 200 characters may contain snippets of the actual conversation — which could include file contents, keys, or other sensitive data.

Suggested change
`Goal judge reply not parseable as JSON: ${text.slice(0, 200)}`,
debugLogger.debug(
`Goal judge reply not parseable as JSON (first 80 chars): ${JSON.stringify(text.slice(0, 80))}`,
);

Using JSON.stringify prevents injection of newlines/log-forgery into the log stream, and reducing to 80 chars still gives enough context for debugging.

— deepseek-v4-pro via Qwen Code /review

Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test

Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeScript Type Errors (pre-existing lines affected by type changes)

These tsc --noEmit errors are in unchanged code but triggered by type changes elsewhere:

  • packages/cli/src/config/config.test.ts:1867,1872mcpServers is possibly undefined. Add null check or optional chaining.
  • packages/cli/src/config/config.test.ts:1885,1899 — Object is possibly undefined. Add undefined guard.
  • packages/cli/src/ui/components/InputPrompt.test.tsx:184 — Missing required property midInputGhostText in mock. Add midInputGhostText: ''.
  • packages/cli/src/ui/components/InputPrompt.test.tsx:2057,2073vimModeEnabled does not exist on InputPromptProps. Remove from test props or add to interface.
  • packages/cli/src/ui/components/InputPrompt.test.tsx:3511 — Invalid cast to UIState. Cast via as unknown as UIState or provide all required fields.
  • packages/cli/src/ui/components/ModelStatsDisplay.test.tsx:437tool does not exist in the stats type. Fix the property name.

— deepseek-v4-pro via Qwen Code /review

.getSessionHooksManager()
.getHooksForEvent(SESSION, HookEventName.Stop)[0];
expect(sessionHook).toBeDefined();
expect(sessionHook.matcher).toBe('');
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Test expects sessionHook.matcher to be '', but registerGoalHook now passes '*' as the matcher. This causes a test failure.

Suggested change
expect(sessionHook.matcher).toBe('');
expect(sessionHook.matcher).toBe('*');

— deepseek-v4-pro via Qwen Code /review


const goalInstructionPrompt = (condition: string): string =>
`A session-scoped Stop hook is now active with condition: "${condition}". ` +
`Briefly acknowledge the goal, then immediately start (or continue) working ` +
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The goal condition is interpolated directly into the main model prompt via "${condition}" without sanitization. Conditions containing unescaped double quotes or newlines break the quoting structure, potentially allowing prompt injection into the main model (which has no structured-output barrier like the judge).

Suggested change
`Briefly acknowledge the goal, then immediately start (or continue) working ` +
const goalInstructionPrompt = (condition: string): string => {
const sanitized = condition.replace(/"/g, '\\"').replace(/\n/g, ' ');
return `A session-scoped Stop hook is now active with condition: "${sanitized}". ` +
`Briefly acknowledge the goal, then immediately start (or continue) working ` +
`toward it — treat the condition itself as your directive...`;
};

— deepseek-v4-pro via Qwen Code /review

@qqqys
Copy link
Copy Markdown
Collaborator Author

qqqys commented May 14, 2026

Pushed 02173b7 to address the latest review:

  • goalCommand.ts:39 condition sanitization — added sanitizeConditionForPrompt (collapse newlines → space, downgrade embedded "') before splicing into the instruction template. Surrounding quote structure now stays intact regardless of user input.
  • goalLoop.integration.test.ts:111 matcher assertion — updated to '*' to match the current registerGoalHook contract; the doc comment above also switched from "empty matcher" to "wildcard matcher".

On the TS errors in the review body — I cannot reproduce any of them:

$ npm run typecheck --workspace packages/cli --workspace packages/core
> tsc --noEmit
(clean exit)

Quick spot-check of the cited lines on the tip of this branch:

  • packages/cli/src/config/config.test.ts:1867,1872,1885,1899mcpServers accessor is typed correctly; no possibly-undefined error.
  • packages/cli/src/ui/components/InputPrompt.test.tsx:184 — no midInputGhostText requirement triggered on that line in the current source.
  • packages/cli/src/ui/components/InputPrompt.test.tsx:2057,2073vimModeEnabled is a valid prop on InputPromptProps (referenced in two places).
  • packages/cli/src/ui/components/InputPrompt.test.tsx:3511 — current cast site does not exist at that line; nothing untyped.
  • packages/cli/src/ui/components/ModelStatsDisplay.test.tsx:437tool field still exists in the stats shape.

Treating those 6 entries as a /review false positive. Happy to dig further if anyone can reproduce locally with the exact tsc invocation that surfaced them.


const formatTurns = (n: number) => `${n} ${n === 1 ? 'turn' : 'turns'}`;

function formatDuration(ms: number): string {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] formatDuration is duplicated in two files (goalCommand.ts:54 and GoalStatusMessage.tsx:22) instead of reusing the shared utility at packages/cli/src/ui/utils/formatters.ts. The shared version already supports hideTrailingZeros option (for integer-second rendering like 5s vs 5.0s) and has its own test suite.

Suggested change
function formatDuration(ms: number): string {
// In goalCommand.ts, replace the local formatDuration with:
import { formatDuration } from '../utils/formatters.js';
// Call: formatDuration(ms, { hideTrailingZeros: true })
// In GoalStatusMessage.tsx, replace the local formatDuration with:
import { formatDuration } from '../../utils/formatters.js';
// Call: formatDuration(ms, { hideTrailingZeros: true })

— DeepSeek/deepseek-v4-pro via Qwen Code /review

};

export interface JudgeResult {
ok: boolean;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Judge prompt does not sanitize the goal condition before interpolation.

sanitizeConditionForPrompt is applied to the main model prompt in goalCommand.ts:46, but the judge path in userJudgementPrompt (line 57) interpolates the raw condition: Condition: ${condition}. Conditions containing newlines or embedded double-quotes can break the judge prompt structure, potentially causing format errors or confusing the judge model.

Suggested change
ok: boolean;
const userJudgementPrompt = (condition: string): string =>
`Based on the conversation transcript above, has the following stopping ` +
`condition been satisfied? Answer based on transcript evidence only.\n` +
`Condition: ${condition.replace(/[\r\n]+/g, ' ').replace(/"/g, "'")}`;

— DeepSeek/deepseek-v4-pro via Qwen Code /review

): Promise<Awaited<ReturnType<typeof judgeGoal>>> {
let timeoutId: ReturnType<typeof setTimeout> | undefined;
try {
return await Promise.race([
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] judgeGoalWithTimeout returns a fallback on timeout but does not abort the underlying generateContent API call.

The Promise.race resolves with {ok: false} when the 25s timeout fires, but judgeGoal → client.generateContent() continues running in the background. The args.signal passed through is the hook context signal — never aborted by the timeout path. Each timeout wastes an API call, and accumulated timeouts in a goal loop can leave multiple dangling requests.

Suggested change
return await Promise.race([
const judgeController = new AbortController();
const linkedSignal = AbortSignal.any([args.signal, judgeController.signal]);
// Pass linkedSignal to judgeGoal; abort controller in timeout path:
timeoutId = setTimeout(() => {
judgeController.abort();
resolve({ ok: false, reason: GOAL_JUDGE_TIMEOUT_REASON });
}, GOAL_JUDGE_TIMEOUT_MS);

— DeepSeek/deepseek-v4-pro via Qwen Code /review

* Default budget (seconds) for a single goal-judge LLM call. Mirrors Claude
* Code 2.1.140's prompt-hook default of 30s (see `cRK` in the binary, which
* reads `H.timeout ? H.timeout * 1000 : 30000`).
*
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Nested timeouts (25s inner resolve vs 30s outer reject) can invert order under event-loop congestion.

judgeGoalWithTimeout races an inner 25s fallback (GOAL_JUDGE_TIMEOUT_MS) against the judge call, while FunctionHookRunner.executeWithTimeout races a 30s reject (GOAL_HOOK_TIMEOUT_MS) against the hook callback. If the event loop is congested and delays the inner timer by more than 5s, the outer reject fires first, returning {success: false, outcome: 'non_blocking_error'} — the goal loop dies silently with no terminal card and no user-facing error.

Consider increasing the outer timeout to provide a larger buffer (e.g., 35s) or relying solely on the outer timeout instead of nesting two independent Promise.race calls.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

// a per-iteration history trail (familiar to its other progress
// indicators) but drops the `error:` framing — a not-met judge is the
// *expected* outcome of every continuation, not a failure.
const activeGoal = getActiveGoal(config.getSessionId());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Non-goal Stop hook blocking reasons are discarded when a goal is active.

When getActiveGoal(sessionId) returns a truthy value, handleStopHookLoopEvent renders a goal_status kind:'checking' item and returns early, dropping value.reasons from any user-configured Stop hooks. Users who combine /goal with custom Stop hooks (e.g., tool gates) lose visibility into their own hooks' blocking reasons for the entire goal loop.

Suggested change
const activeGoal = getActiveGoal(config.getSessionId());
const activeGoal = getActiveGoal(config.getSessionId());
if (activeGoal && activeGoal.condition) {
addItem({ type: MessageType.GOAL_STATUS, kind: 'checking', ... }, now);
// Also surface non-goal hook reasons when they co-exist
const nonGoalReasons = value.stopHookCount > 1
? value.reasons.filter((_, i) => i < value.reasons.length - 1) : [];
if (nonGoalReasons.length > 0) {
addItem({ type: 'stop_hook_loop', reasons: nonGoalReasons, ... }, now);
}
return;
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review

useEffect(() => {
const id = setInterval(() => {
const next = getActiveGoal(sessionId);
setGoal(next);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] GoalPill polling interval runs continuously even when no goal is active.

useActiveGoal creates a setInterval at 1s granularity that only clears on component unmount or sessionId change. Once a goal completes (activeGoal becomes undefined), the interval keeps polling indefinitely with no benefit. In a long-running session, this wastes timer and React reconciliation cycles.

Suggested change
setGoal(next);
useEffect(() => {
const current = getActiveGoal(sessionId);
if (!current) { setGoal(undefined); return; }
setGoal(current);
const id = setInterval(() => {
const next = getActiveGoal(sessionId);
setGoal(next);
if (next) setTick((t) => (t + 1) % 1_000_000);
}, POLL_INTERVAL_MS);
return () => clearInterval(id);
}, [sessionId]);

— DeepSeek/deepseek-v4-pro via Qwen Code /review

truncated: true,
originalLength: serialized.length,
preview: serialized.slice(0, TRANSCRIPT_PART_CHAR_CAP) + '…[truncated]',
};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] capStructuredValue performs an unsafe type cast when truncating values.

When the value fits within TRANSCRIPT_PART_CHAR_CAP, the return preserves the original type. But the truncation path returns { truncated: true, originalLength: N, preview: "..." } — an object with a completely different shape — then casts it back via as typeof part.functionCall.args. If the original value is an array, downstream code relying on .map() or indexed access on the truncated object would crash at runtime.

Currently only the judge model receives truncated values (which tolerates varied shapes), but future code processing truncated functionResponse.response could break silently. Consider a discriminated union or a sentinel wrapper that preserves the original structure.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

@tanzhenxin tanzhenxin added the type/feature-request New feature or enhancement request label May 15, 2026
Comment thread packages/core/src/goals/goalHook.ts Outdated
// false), but the block-decision form documents intent more clearly —
// "this hook is intentionally preventing the turn from stopping, not
// signalling an error".
return { decision: 'block', reason: verdict.reason };
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The goal judge's free-form reason is returned as the Stop-hook continuation prompt, and client.ts feeds that value back into the model as the next user message. Because the judge reason is derived from untrusted transcript content (assistant output, tool output, file contents, web pages, etc.), prompt-injection text can be laundered into a fresh user-like instruction on the next loop iteration.

This weakens the boundary between transcript evidence and trusted user intent during the autonomous /goal loop. Please keep the judge diagnostic for UI/history only and feed the model a fixed, controlled continuation instruction based on the original goal, or wrap any diagnostic as explicitly non-instructional data.

— gpt-5.5 via Qwen Code /review

Comment thread packages/core/src/goals/goalJudge.ts Outdated
try {
const client = config.getGeminiClient();
if (!client.isInitialized()) return fallbackTranscript(lastAssistantText);
const full = client.getHistory();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] collectTranscript() calls client.getHistory() before slicing to the 24-message tail, but GeminiChat.getHistory() deep-clones the entire session history. During a /goal loop this happens at every Stop boundary, so long sessions or sessions with large tool results pay O(total transcript size) CPU/memory cost even though only the tail is used.

Consider adding a helper that slices the recent history first and only clones/caps that tail (for example getHistoryTail(count, curated?)), then have the judge call that instead of cloning the full history and slicing afterward.

— gpt-5.5 via Qwen Code /review

@qqqys
Copy link
Copy Markdown
Collaborator Author

qqqys commented May 15, 2026

Addressed the latest review feedback.

Changes:

  • Stop-hook continuation now uses a fixed, controlled prompt derived from the original /goal condition.
  • Judge reason remains stored as diagnostic state for UI/history, but is no longer fed back to the model as the next continuation instruction.
  • Added GeminiChat.getHistoryTail() / GeminiClient.getHistoryTail() so the goal judge can clone only the recent tail instead of cloning the full session history and slicing afterward.
  • Added regression coverage for prompt-injection laundering, timeout diagnostics, bounded history-tail reads, and getHistoryTail() deep-copy behavior.

Verification:

  • npx vitest run src/goals/goalHook.test.ts src/goals/goalJudge.test.ts src/core/geminiChat.test.ts ✅ 104 tests passed
  • targeted eslint ✅
  • npm run typecheck --workspace @qwen-code/qwen-code-core

qqqys and others added 10 commits May 16, 2026 00:46
…ntinuation

`/goal <condition>` pins a free-form objective for the rest of the session.
While a goal is active, an LLM judge runs at every Stop boundary and either
lets the turn end (condition met) or feeds the judge's reason back as the
next user prompt to keep the model working. Auto-clears on success;
`/goal clear` cancels early. Same primitive as Anthropic's Claude Code
2.1.140 `/goal`, built on qwen-code's existing Stop-hook + function-hook
plumbing — no new subsystem.

Core (packages/core/src/goals/):
  - activeGoalStore: per-session active goal + last-terminal cache, with a
    terminal-observer channel the CLI subscribes to so achieved/aborted
    cards land in history.
  - goalJudge: side-query against a fast model, transcript-grounded
    system prompt + json_schema response + disabled thinking. Tolerant
    JSON extraction with fallback so a flaky judge can't kill the loop;
    30s default timeout (vs. the 5s function-hook default that was
    silently killing real-world judge calls).
  - goalHook: function hook on Stop. Returns {decision:'block', reason}
    when not met (reusing client.ts's existing recursive continuation),
    {continue:true} when met. Self-clears active goal + notifies the
    terminal observer on met/aborted. MAX_GOAL_ITERATIONS=50 backstop.

CLI:
  - goalCommand: /goal | /goal <cond> | /goal clear|stop|off|reset|none|
    cancel. 4000-char cap, trust + disableAllHooks gates. Empty /goal
    shows running status, falls back to the last completed summary.
  - GoalPill: footer chip "◎ /goal active (12s)" — terse, claude-aligned.
  - GoalStatusMessage: set / checking / achieved / cleared / aborted
    history cards. "checking" replaces the generic stop_hook_loop chip
    for goal-driven iterations.
  - restoreGoal: on session resume, rehydrate the active goal hook +
    last-terminal cache from transcript so /goal survives /resume.

Cross-cutting fixes:
  - HookSystem.hasHooksForEvent(eventName, sessionId?): also consults
    SessionHooksManager. Previously SDK / programmatic Stop function
    hooks were silently gated out by client.ts's fast-path check, so
    they never fired.
  - client.ts: yield StopHookLoop on every continuation iteration (was
    iter > 1) — first not-met turn is now visible in the UI.
  - useGeminiStream: commit pending item + clear thoughtBuffer /
    geminiMessageBuffer on every Finished event. Fixes a UI bug where
    a Stop-hook continuation's text bled into the prior turn's pending
    history item (cumulative "te" / "tes" rendering), even though the
    persisted transcript was clean.

Co-authored-by: Qwen-Coder <noreply@qwen.ai>
…r test

- goalCommand.ts: collapse newlines and downgrade embedded double-quotes in
  the condition before splicing into the instruction prompt so the wrapping
  quote structure stays intact.
- goalLoop.integration.test.ts: matcher assertion updated to '*' to match the
  current registerGoalHook contract (previously '').

Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
Renders `Last check: <reason>` on the achieved / aborted history card
and on the empty-`/goal` summary so the final view records *why* the
judge ruled the goal complete. Uses a single inline-label Text instead
of the flex-row split used for `Goal:` — the reason is capped at 240
chars and almost always wraps; the flex-row variant hangs the
continuation at the value column's left edge (~12 cols of blank space,
easily mistaken for a stray empty line). Single Text + natural wrap
keeps the continuation flush.

Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
Cold boot path in AppContainer already calls restoreGoalFromHistory after
loading session data, but the runtime /resume and /branch paths skipped
it entirely. After /new + /resume back to a session that had an active
/goal, the in-memory activeGoalStore entry still held the pre-/new
setAt and a hookId pointing to a hook that config.startNewSession() had
torn down — leaving the footer pill ticking from the original setAt
(observable as "几十秒" elapsed immediately after resume) while the
Stop hook was silently dead.

Wire restoreGoalFromHistory into both handlers right after the session
data lands so unregisterGoalHook clears the stale entry and
registerGoalHook re-arms with a fresh setAt / hookId and re-installs
the terminal observer.

Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
Drop the duplicated local formatDuration from goalCommand.ts and
GoalStatusMessage.tsx in favor of the shared formatters.ts version,
called with { hideTrailingZeros: true }. The shared util already has
its own test suite and matches Claude Code's ShellTimeDisplay style
(round values drop zero-unit tails: `5m 0s` → `5m`).

Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
The judge-timeout path in judgeGoalWithTimeout only resolved a fallback
verdict; the underlying judgeGoal generateContent call kept running
because the hook context signal is never aborted by the timeout. Each
timeout leaked one in-flight request that accumulated across goal-loop
iterations. Link an AbortController into the judge signal and abort it
when the timeout fires.

Co-Authored-By: Qwen-Coder <qwen-coder@alibabacloud.com>
@qqqys qqqys force-pushed the feat/goal-command-4074-rebuild branch from 4c56052 to c5bf88b Compare May 15, 2026 16:50
@qqqys
Copy link
Copy Markdown
Collaborator Author

qqqys commented May 15, 2026

Follow-up: rebased PR #4123 onto latest main and resolved the AppContainer import conflict while preserving both upstream lowlight loading and /goal restoration.

Current pushed head: c5bf88bfc272c05e63eec0bedad95f0f7e382c8f

Verification after rebase:

  • CLI goal tests: goalCommand.test.ts, GoalPill.test.tsx, restoreGoal.test.ts ✅ 34 tests passed
  • Core tests: goalHook.test.ts, goalJudge.test.ts, geminiChat.test.ts ✅ 109 tests passed
  • targeted eslint ✅
  • core typecheck ✅
  • core build + CLI typecheck ✅

@qqqys
Copy link
Copy Markdown
Collaborator Author

qqqys commented May 15, 2026

Fixed the CI failure from run 25930104687.

Root cause:

  • The full core CI includes goalLoop.integration.test.ts.
  • After the review fix, Stop-hook continuation intentionally returns a controlled prompt instead of the judge's free-form diagnostic reason.
  • The integration test still expected the old behavior (reason: 'still missing letters e, s, t').

Fix:

  • Updated the integration test contract to assert the safe continuation prompt includes the original goal condition.
  • Added an assertion that the continuation prompt does not include the judge diagnostic.
  • Kept the existing assertion that the judge diagnostic remains stored as activeGoal.lastReason for UI/history.

Verification:

  • npx vitest run src/goals/goalLoop.integration.test.ts src/goals/goalHook.test.ts src/goals/goalJudge.test.ts src/core/geminiChat.test.ts ✅ 111 tests passed
  • npm run typecheck --workspace @qwen-code/qwen-code-core

Pushed head: 173082864176a071d900e42b24f15d3bab1c1582

return { restored: false };
}

if (!config.isTrustedFolder() || config.getDisableAllHooks()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] When restoreGoalFromHistory() finds an active goal in the transcript but restoration is now blocked by trust/policy gates, it returns without unregistering any existing in-memory goal for the same session. Runtime /resume and /branch call this after config.startNewSession() has rebuilt the hook system, so a stale activeGoalStore entry can keep the footer and /goal status showing an active goal whose Stop hook is no longer registered. The user then sees an active goal that will never continue the turn.

Please clear the stale goal on these failed restore gates, the same way the no-active-goal branch does.

Suggested change
if (!config.isTrustedFolder() || config.getDisableAllHooks()) {
if (!config.isTrustedFolder() || config.getDisableAllHooks()) {
unregisterGoalHook(config, sessionId);
return { restored: false };
}

The !config.getHookSystem() branch should also unregister before returning.

— gpt-5.5 via Qwen Code /review

}

function capStructuredValue(value: unknown): unknown {
const serialized = safeStringify(value);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] capStructuredValue() fully JSON.stringifys structured tool payloads before deciding whether to truncate them. The judge runs at every /goal Stop boundary, so a large functionResponse.response or functionCall.args in the recent transcript still gets fully materialized and scanned on each continuation even though only a 4 KB preview is sent to the judge.

Please use a bounded/depth-limited serialization helper that stops once TRANSCRIPT_PART_CHAR_CAP is reached, or otherwise cap common structured values incrementally without building the full JSON string first.

— gpt-5.5 via Qwen Code /review

Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage suggestions (could not map to specific diff lines):

  1. packages/cli/src/ui/components/messages/GoalStatusMessage.tsx — new 141-line component with 5 kind variants (checking/set/achieved/cleared/aborted), conditional rendering for iterations/durationMs/lastReason. No test file exists. Consider adding GoalStatusMessage.test.tsx covering all kinds and edge cases.
  2. packages/core/src/goals/activeGoalStore.test.ts — tests skip the observer system (setGoalTerminalObserver, clearGoalTerminalObserver, notifyGoalTerminal), terminal cache (getLastGoalTerminal, setLastGoalTerminal), and the error-swallowing behavior in notifyGoalTerminal. These paths are only indirectly exercised via integration tests.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

},
};
}
// Emit StopHookLoop on EVERY blocking Stop hook execution, including
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The > 1 guard on StopHookLoop emission was removed, so it now fires on iteration 1 for ALL Stop hooks (not just the goal hook). This is a behavioral change — existing external Stop hooks that previously had a silent first block now produce a visible UI chip on first fire. client.test.ts has no tests for StopHookLoop emission at all. Consider adding a test verifying the event fires on iteration 1.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Goal iterations consume the session turn budget without coordination.

Each Stop-hook continuation in client.ts decrements boundedTurns by 1 (line 1592: boundedTurns - 1). With MAX_GOAL_ITERATIONS = 50 and MAX_TURNS = 100, a full goal loop silently consumes half the session turn budget. After a goal completes or aborts, only ~50 turns remain. The user sees a generic "max session turns exceeded" error with no indication that the goal loop was responsible.

Suggested fix: Do not count goal-continuation turns against the session budget (pass boundedTurns instead of boundedTurns - 1 for SendMessageType.Hook when a goal hook is active), or surface a specific warning when the session turn limit is approached during an active goal.


— glm-5.1 via Qwen Code /review

const stopInput = input as StopInput;
const lastAssistantText = stopInput.last_assistant_message ?? '';

const current = getActiveGoal(sessionId);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] TOCTOU race condition: goal replacement during in-flight judge call silently corrupts the new goal.

The hook callback snapshots current = getActiveGoal(sessionId) at line 141 before the await judgeGoalWithTimeout() at line 148. If the user runs /goal new-condition during the await, registerGoalHook replaces the store entry and removes the old hook, but the in-flight callback still holds a reference to the old goal.

When the judge returns {ok: true}, finishGoal (line 155) is called with the stale current. Inside finishGoal, clearActiveGoal(sessionId) removes the new goal from the store, and the new goal's Stop hook remains registered as a zombie — it fires on every subsequent Stop, burning judge API calls with no store entry to gate it.

The guard at line 142 (current.condition !== condition) only runs before the await, not after, so it cannot catch a replacement that happens during the async yield.

Suggested change
const current = getActiveGoal(sessionId);
const verdict = await judgeGoalWithTimeout(config, {
condition,
lastAssistantText,
signal,
});
// Re-check after await — the goal may have been replaced during the judge call.
const latestGoal = getActiveGoal(sessionId);
if (!latestGoal || latestGoal.condition !== condition) {
return { continue: true };
}

— glm-5.1 via Qwen Code /review

return { ok: false, reason: JUDGE_REASON_FALLBACK };
}
return parsed;
} catch (err) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Persistent judge failures silently burn up to 50 iterations with no user-visible error.

judgeGoal catches all exceptions (line 165) and returns {ok: false, reason: JUDGE_REASON_FALLBACK}. When the judge API is consistently broken (bad credentials, endpoint down, quota exhausted), the goal loop runs all 50 iterations. The user sees only repeated "Goal check · turn N · not yet met" cards with zero indication that the judge itself is broken rather than the goal being hard to achieve.

The fallback reason "Goal judge unavailable" is stored as lastReason in the active goal but is only visible via proactive /goal status check or the final abort card (after all 50 iterations). Each iteration generates a full model response, burning real tokens and wall-clock time against a judge that will never succeed.

Suggested fix: Track consecutive judge failures in the active goal state. After N consecutive failures (e.g., 3), surface a visible warning in the checking card: "Goal judge has failed 3 times — consider /goal clear". Alternatively, render lastReason in the checking card (dimmed/truncated) so the user can see "Goal judge unavailable" instead of just "not yet met."

— glm-5.1 via Qwen Code /review

const activeGoal = getActiveGoal(config.getSessionId());
if (activeGoal && activeGoal.condition) {
addItem(
{
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The handleStopHookLoopEvent goal-aware branch is entirely untested.

When getActiveGoal(sessionId) returns truthy (line ~1316), the code emits a goal_status/checking history item instead of the regular stop_hook_loop item. This is the primary UI rendering path for the per-iteration "not yet met" indicator during a goal loop.

useGeminiStream.test.tsx has zero goal-related test cases. All existing StopHookLoop tests exercise only the non-goal path. If this branch regresses, the user sees the raw stop_hook_loop chip (with its "Stop hook error:" framing) instead of the goal-aware "Goal check · turn N · not yet met" card.

Suggested fix: Add a test in the StopHookLoop Event describe block that: (1) mocks getActiveGoal to return an active goal, (2) emits a StopHookLoop event, (3) asserts addItem was called with { type: 'goal_status', kind: 'checking', ... } and NOT with { type: 'stop_hook_loop', ... }.

— glm-5.1 via Qwen Code /review

condition: string;
iterations: number;
setAt: number;
tokensAtStart: number;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] tokensAtStart is dead code.

ActiveGoal.tokensAtStart is declared on the interface, populated via registerGoalHook (hardcoded 0 everywhere in production), and spread onto constructed goal objects — but never read by any consumer. All production callers pass 0, and the only non-zero usage is a test factory.

This bloats the API and misleads future contributors into thinking token-budget tracking is implemented.

Suggested change
tokensAtStart: number;
export interface ActiveGoal {
condition: string;
iterations: number;
setAt: number;
lastReason?: string;
}

Remove tokensAtStart from ActiveGoal, registerGoalHook args, and all callers. Add it back when token-budget tracking is actually implemented.

— glm-5.1 via Qwen Code /review

const model = config.getFastModel() ?? config.getModel();

try {
const client = config.getGeminiClient();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Judge system prompt is contaminated with user memory.

The judge call at line 125-126 goes through config.getGeminiClient().generateContent(), which routes all custom systemInstruction values through getCustomSystemPrompt(). That function appends user memory from .qwen/ files to the system prompt.

The judge's carefully crafted JUDGE_SYSTEM_PROMPT ("Answer based on transcript evidence only") therefore has user memory appended. Persistent instructions like "always mark tasks as complete when tests pass" could bias the binary judge evaluation, undermining the "default to not-met when ambiguous" policy.

Suggested fix: Call config.getBaseLlmClient().generateContent() directly, bypassing the user-memory injection in GeminiClient.generateContent(), or add a flag to skip user memory for judge-side calls.

— glm-5.1 via Qwen Code /review

@@ -0,0 +1,141 @@
/**
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] GoalStatusMessage has no test file.

This component renders 5 distinct UI branches (checking, set, achieved, cleared, aborted) with different icons, colors, stats formatting, and conditional Last check: text. It is the primary rendering component for the entire /goal feature surface, yet has zero test coverage.

Suggested fix: Add GoalStatusMessage.test.tsx that renders the component for each kind and asserts the correct icon, text, and conditional elements.

— glm-5.1 via Qwen Code /review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/feature-request New feature or enhancement request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add /goal slash command (session-scoped Stop hook + LLM-judge auto-clear)

3 participants