feat(cli): add session-scoped /goal command with judge-driven turn continuation#4123
feat(cli): add session-scoped /goal command with judge-driven turn continuation#4123qqqys wants to merge 11 commits into
Conversation
| * last goal_status was a terminal state (achieved / cleared / aborted) or | ||
| * none exists. | ||
| */ | ||
| export function findGoalToRestore(history: HistoryItem[]): string | null { |
There was a problem hiding this comment.
[Critical] findGoalToRestore() only restores a goal when the latest goal_status item is kind === 'set'. During a normal active goal loop, each not-met Stop hook iteration appends a checking status, so resuming a session after any check treats the goal as terminal and unregisters it even though there is no achieved, cleared, or aborted record. This silently drops an in-progress /goal across /resume. Please treat checking as a non-terminal active state, or scan backward past checking entries until reaching a real terminal state or the matching set.
— gpt-5.5 via Qwen Code /review
| } | ||
|
|
||
| if (current.iterations >= MAX_GOAL_ITERATIONS) { | ||
| debugLogger.debug( |
There was a problem hiding this comment.
[Critical] The max-iteration guard runs before judgeGoal(), so when the next Stop event starts with iterations >= MAX_GOAL_ITERATIONS, the hook aborts without judging the latest assistant output. That means the final generated turn can satisfy the goal but still be reported as Goal max iterations reached and cleared as aborted. Please run the judge first and only abort after a still-not-met verdict at the limit, or otherwise allow one final evaluation before aborting.
— gpt-5.5 via Qwen Code /review
| // Stop hook callback. The core side clears the observer on terminal / | ||
| // unregister so we don't accumulate stale closures across goals. | ||
| const { addItem } = context.ui; | ||
| setGoalTerminalObserver(sessionId, (event: GoalTerminalEvent) => { |
There was a problem hiding this comment.
[Critical] Terminal goal events emitted later from the Stop hook are only added to in-memory UI history via addItem(). They do not go through the slash-command processor's recorded-items path and are not written to ChatRecordingService, unlike other async UI events such as auto /recap. As a result, /resume can miss the achieved/aborted card and may see only the earlier set/checking entries, causing a completed goal to be restored as active. Please also persist these terminal goal_status items through recordSlashCommand({ phase: 'result', ... }) when the observer fires.
— gpt-5.5 via Qwen Code /review
| * silently skip restoration rather than re-register a goal the user can no | ||
| * longer cancel. | ||
| */ | ||
| export function restoreGoalFromHistory( |
There was a problem hiding this comment.
[Critical] Restoring a goal only re-registers the core Stop hook; it does not reinstall the terminal observer that fresh /goal setup installs. A restored goal can later achieve or abort, but no observer will add/persist the terminal goal_status card, so future resume/summary behavior loses the actual terminal state. Please share the observer setup with the fresh /goal path or pass an observer/addItem callback into restoration so restored goals have the same terminal lifecycle.
— gpt-5.5 via Qwen Code /review
| const signal = context?.signal ?? new AbortController().signal; | ||
| const verdict = await judgeGoal(config, { | ||
| condition, | ||
| lastAssistantText, |
There was a problem hiding this comment.
[Critical] The automatic achieved and max-iteration aborted paths clear the active goal store and observer, but they never remove the registered Stop function hook from HookSystem. After completion, later Stop events still execute the stale callback (it just returns continue: true because the active goal is gone), and repeated goals can accumulate dead session hooks. Please remove the function hook before clearing the active goal, or use a cleanup helper that removes by hookId and then clears the store/observer.
— gpt-5.5 via Qwen Code /review
| }, | ||
| argumentHint: '[<condition> | clear]', | ||
| kind: CommandKind.BUILT_IN, | ||
| supportedModes: ['interactive', 'non_interactive'] as const, |
There was a problem hiding this comment.
[Suggestion] /goal is advertised as supported in non_interactive, but the implementation depends on interactive UI history, terminal observers, and a continuing turn loop. In headless usage such as qwen -p "/goal ...", the command can appear supported while arming a session hook and producing UI-only lifecycle state that the caller cannot inspect or clear. Please restrict this command to interactive mode, or add a tested non-interactive lifecycle with deterministic recording and cleanup semantics.
— gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Two cross-file Critical issues (not mappable to a single diff line):
-
Function hook timeout silently kills goal loop. The
FunctionHookRunner.executeWithTimeout(infunctionHookRunner.ts, not in this diff) wraps the goal callback inPromise.racewith a 30s timeout. When the judge API call exceeds 30s, the timeout reject wins — the callback never returns{decision:'block', reason:…}, sostopOutputisundefinedand the Stop hook loop never continues. The goal stays active, and the same timeout repeats on every subsequent Stop event in an invisible infinite loop. No error surfaces to the user — just◎ /goal activeforever. -
Judge failures have no observable production traces. Every failure path in
judgeGoal(API errors, parse failures, empty responses) uses onlydebugLogger.debug. There is noreportError, no telemetry span, no user-visible warning. IfgetFastModel()is misconfigured and the judge API fails 50 times in a row, the goal silently aborts with the fallback reason — and the operator has no way to know why.
These are in addition to the inline findings below.
| function capContent(content: Content): Content { | ||
| if (!content.parts) return content; | ||
| return { | ||
| ...content, |
There was a problem hiding this comment.
[Critical] capContent only truncates text parts. functionResponse and functionCall parts pass through at full size, which means large tool outputs (e.g., run_shell_command returning 50KB logs, read_file loading source files) are sent verbatim to the judge model on every Stop boundary.
This inflates token consumption per judge call by up to an order of magnitude on turns with tool use. Since the judge already disables extended thinking (thinkingBudget: 0), the dominant cost driver is input tokens — and those are uncontrolled for non-text parts.
| ...content, | |
| function capContent(content: Content): Content { | |
| if (!content.parts) return content; | |
| return { | |
| ...content, | |
| parts: content.parts.map((p) => { | |
| if (typeof p.text === 'string') { | |
| return p.text.length > TRANSCRIPT_PART_CHAR_CAP | |
| ? { ...p, text: p.text.slice(0, TRANSCRIPT_PART_CHAR_CAP) + '…[truncated]' } | |
| : p; | |
| } | |
| // Truncate function responses to avoid blowing up judge context | |
| if (p.functionResponse) { | |
| const resp = p.functionResponse; | |
| if (typeof resp.response === 'object' && resp.response !== null) { | |
| const text = JSON.stringify(resp.response); | |
| if (text.length > TRANSCRIPT_PART_CHAR_CAP) { | |
| return { | |
| ...p, | |
| functionResponse: { | |
| ...resp, | |
| response: { truncated: true, preview: text.slice(0, TRANSCRIPT_PART_CHAR_CAP) }, | |
| }, | |
| }; | |
| } | |
| } | |
| } | |
| return p; | |
| }), | |
| }; | |
| } |
— deepseek-v4-pro via Qwen Code /review
| const hookId = system.addFunctionHook( | ||
| sessionId, | ||
| HookEventName.Stop, | ||
| '', |
There was a problem hiding this comment.
[Suggestion] The goal hook is registered with an empty string matcher '', which relies on matchesPattern('', '') returning true (via new RegExp('^$').test('')). If the Stop event firing path ever passes a non-empty toolName/target (e.g., a future refactor that distinguishes Stop-by-tool from generic Stop), the hook would silently stop matching — matchesPattern('', 'Bash') returns false.
The hook would remain registered, hasHooksForEvent would still return true (because the hook IS registered), but the event would never call the callback. No error, no log — the goal just silently stops evaluating.
| '', | |
| const hookId = system.addFunctionHook( | |
| sessionId, | |
| HookEventName.Stop, | |
| '*', // Match any Stop event regardless of toolName | |
| callback, | |
| 'Goal evaluator failed', |
Alternatively, handle empty-string matcher explicitly in SessionHooksManager.getMatchingHooks as a wildcard.
— deepseek-v4-pro via Qwen Code /review
| const parsed = parseJudgeReply(text); | ||
| if (!parsed) { | ||
| debugLogger.debug( | ||
| `Goal judge reply not parseable as JSON: ${text.slice(0, 200)}`, |
There was a problem hiding this comment.
[Suggestion] debugLogger.debug logs the raw judge model response's first 200 characters when JSON parsing fails:
debugLogger.debug(
`Goal judge reply not parseable as JSON: ${text.slice(0, 200)}`,
);The judge's system prompt instructs it to "quote specific text from the transcript" as evidence. If the model outputs a text preamble before the JSON (despite structured-output mode), those 200 characters may contain snippets of the actual conversation — which could include file contents, keys, or other sensitive data.
| `Goal judge reply not parseable as JSON: ${text.slice(0, 200)}`, | |
| debugLogger.debug( | |
| `Goal judge reply not parseable as JSON (first 80 chars): ${JSON.stringify(text.slice(0, 80))}`, | |
| ); |
Using JSON.stringify prevents injection of newlines/log-forgery into the log stream, and reducing to 80 chars still gives enough context for debugging.
— deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
TypeScript Type Errors (pre-existing lines affected by type changes)
These tsc --noEmit errors are in unchanged code but triggered by type changes elsewhere:
packages/cli/src/config/config.test.ts:1867,1872—mcpServersis possiblyundefined. Add null check or optional chaining.packages/cli/src/config/config.test.ts:1885,1899— Object is possiblyundefined. Add undefined guard.packages/cli/src/ui/components/InputPrompt.test.tsx:184— Missing required propertymidInputGhostTextin mock. AddmidInputGhostText: ''.packages/cli/src/ui/components/InputPrompt.test.tsx:2057,2073—vimModeEnableddoes not exist onInputPromptProps. Remove from test props or add to interface.packages/cli/src/ui/components/InputPrompt.test.tsx:3511— Invalid cast toUIState. Cast viaas unknown as UIStateor provide all required fields.packages/cli/src/ui/components/ModelStatsDisplay.test.tsx:437—tooldoes not exist in the stats type. Fix the property name.
— deepseek-v4-pro via Qwen Code /review
| .getSessionHooksManager() | ||
| .getHooksForEvent(SESSION, HookEventName.Stop)[0]; | ||
| expect(sessionHook).toBeDefined(); | ||
| expect(sessionHook.matcher).toBe(''); |
There was a problem hiding this comment.
[Critical] Test expects sessionHook.matcher to be '', but registerGoalHook now passes '*' as the matcher. This causes a test failure.
| expect(sessionHook.matcher).toBe(''); | |
| expect(sessionHook.matcher).toBe('*'); |
— deepseek-v4-pro via Qwen Code /review
|
|
||
| const goalInstructionPrompt = (condition: string): string => | ||
| `A session-scoped Stop hook is now active with condition: "${condition}". ` + | ||
| `Briefly acknowledge the goal, then immediately start (or continue) working ` + |
There was a problem hiding this comment.
[Critical] The goal condition is interpolated directly into the main model prompt via "${condition}" without sanitization. Conditions containing unescaped double quotes or newlines break the quoting structure, potentially allowing prompt injection into the main model (which has no structured-output barrier like the judge).
| `Briefly acknowledge the goal, then immediately start (or continue) working ` + | |
| const goalInstructionPrompt = (condition: string): string => { | |
| const sanitized = condition.replace(/"/g, '\\"').replace(/\n/g, ' '); | |
| return `A session-scoped Stop hook is now active with condition: "${sanitized}". ` + | |
| `Briefly acknowledge the goal, then immediately start (or continue) working ` + | |
| `toward it — treat the condition itself as your directive...`; | |
| }; |
— deepseek-v4-pro via Qwen Code /review
|
Pushed
On the TS errors in the review body — I cannot reproduce any of them: Quick spot-check of the cited lines on the tip of this branch:
Treating those 6 entries as a |
|
|
||
| const formatTurns = (n: number) => `${n} ${n === 1 ? 'turn' : 'turns'}`; | ||
|
|
||
| function formatDuration(ms: number): string { |
There was a problem hiding this comment.
[Suggestion] formatDuration is duplicated in two files (goalCommand.ts:54 and GoalStatusMessage.tsx:22) instead of reusing the shared utility at packages/cli/src/ui/utils/formatters.ts. The shared version already supports hideTrailingZeros option (for integer-second rendering like 5s vs 5.0s) and has its own test suite.
| function formatDuration(ms: number): string { | |
| // In goalCommand.ts, replace the local formatDuration with: | |
| import { formatDuration } from '../utils/formatters.js'; | |
| // Call: formatDuration(ms, { hideTrailingZeros: true }) | |
| // In GoalStatusMessage.tsx, replace the local formatDuration with: | |
| import { formatDuration } from '../../utils/formatters.js'; | |
| // Call: formatDuration(ms, { hideTrailingZeros: true }) |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| }; | ||
|
|
||
| export interface JudgeResult { | ||
| ok: boolean; |
There was a problem hiding this comment.
[Suggestion] Judge prompt does not sanitize the goal condition before interpolation.
sanitizeConditionForPrompt is applied to the main model prompt in goalCommand.ts:46, but the judge path in userJudgementPrompt (line 57) interpolates the raw condition: Condition: ${condition}. Conditions containing newlines or embedded double-quotes can break the judge prompt structure, potentially causing format errors or confusing the judge model.
| ok: boolean; | |
| const userJudgementPrompt = (condition: string): string => | |
| `Based on the conversation transcript above, has the following stopping ` + | |
| `condition been satisfied? Answer based on transcript evidence only.\n` + | |
| `Condition: ${condition.replace(/[\r\n]+/g, ' ').replace(/"/g, "'")}`; |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| ): Promise<Awaited<ReturnType<typeof judgeGoal>>> { | ||
| let timeoutId: ReturnType<typeof setTimeout> | undefined; | ||
| try { | ||
| return await Promise.race([ |
There was a problem hiding this comment.
[Suggestion] judgeGoalWithTimeout returns a fallback on timeout but does not abort the underlying generateContent API call.
The Promise.race resolves with {ok: false} when the 25s timeout fires, but judgeGoal → client.generateContent() continues running in the background. The args.signal passed through is the hook context signal — never aborted by the timeout path. Each timeout wastes an API call, and accumulated timeouts in a goal loop can leave multiple dangling requests.
| return await Promise.race([ | |
| const judgeController = new AbortController(); | |
| const linkedSignal = AbortSignal.any([args.signal, judgeController.signal]); | |
| // Pass linkedSignal to judgeGoal; abort controller in timeout path: | |
| timeoutId = setTimeout(() => { | |
| judgeController.abort(); | |
| resolve({ ok: false, reason: GOAL_JUDGE_TIMEOUT_REASON }); | |
| }, GOAL_JUDGE_TIMEOUT_MS); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| * Default budget (seconds) for a single goal-judge LLM call. Mirrors Claude | ||
| * Code 2.1.140's prompt-hook default of 30s (see `cRK` in the binary, which | ||
| * reads `H.timeout ? H.timeout * 1000 : 30000`). | ||
| * |
There was a problem hiding this comment.
[Suggestion] Nested timeouts (25s inner resolve vs 30s outer reject) can invert order under event-loop congestion.
judgeGoalWithTimeout races an inner 25s fallback (GOAL_JUDGE_TIMEOUT_MS) against the judge call, while FunctionHookRunner.executeWithTimeout races a 30s reject (GOAL_HOOK_TIMEOUT_MS) against the hook callback. If the event loop is congested and delays the inner timer by more than 5s, the outer reject fires first, returning {success: false, outcome: 'non_blocking_error'} — the goal loop dies silently with no terminal card and no user-facing error.
Consider increasing the outer timeout to provide a larger buffer (e.g., 35s) or relying solely on the outer timeout instead of nesting two independent Promise.race calls.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| // a per-iteration history trail (familiar to its other progress | ||
| // indicators) but drops the `error:` framing — a not-met judge is the | ||
| // *expected* outcome of every continuation, not a failure. | ||
| const activeGoal = getActiveGoal(config.getSessionId()); |
There was a problem hiding this comment.
[Suggestion] Non-goal Stop hook blocking reasons are discarded when a goal is active.
When getActiveGoal(sessionId) returns a truthy value, handleStopHookLoopEvent renders a goal_status kind:'checking' item and returns early, dropping value.reasons from any user-configured Stop hooks. Users who combine /goal with custom Stop hooks (e.g., tool gates) lose visibility into their own hooks' blocking reasons for the entire goal loop.
| const activeGoal = getActiveGoal(config.getSessionId()); | |
| const activeGoal = getActiveGoal(config.getSessionId()); | |
| if (activeGoal && activeGoal.condition) { | |
| addItem({ type: MessageType.GOAL_STATUS, kind: 'checking', ... }, now); | |
| // Also surface non-goal hook reasons when they co-exist | |
| const nonGoalReasons = value.stopHookCount > 1 | |
| ? value.reasons.filter((_, i) => i < value.reasons.length - 1) : []; | |
| if (nonGoalReasons.length > 0) { | |
| addItem({ type: 'stop_hook_loop', reasons: nonGoalReasons, ... }, now); | |
| } | |
| return; | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| useEffect(() => { | ||
| const id = setInterval(() => { | ||
| const next = getActiveGoal(sessionId); | ||
| setGoal(next); |
There was a problem hiding this comment.
[Suggestion] GoalPill polling interval runs continuously even when no goal is active.
useActiveGoal creates a setInterval at 1s granularity that only clears on component unmount or sessionId change. Once a goal completes (activeGoal becomes undefined), the interval keeps polling indefinitely with no benefit. In a long-running session, this wastes timer and React reconciliation cycles.
| setGoal(next); | |
| useEffect(() => { | |
| const current = getActiveGoal(sessionId); | |
| if (!current) { setGoal(undefined); return; } | |
| setGoal(current); | |
| const id = setInterval(() => { | |
| const next = getActiveGoal(sessionId); | |
| setGoal(next); | |
| if (next) setTick((t) => (t + 1) % 1_000_000); | |
| }, POLL_INTERVAL_MS); | |
| return () => clearInterval(id); | |
| }, [sessionId]); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| truncated: true, | ||
| originalLength: serialized.length, | ||
| preview: serialized.slice(0, TRANSCRIPT_PART_CHAR_CAP) + '…[truncated]', | ||
| }; |
There was a problem hiding this comment.
[Suggestion] capStructuredValue performs an unsafe type cast when truncating values.
When the value fits within TRANSCRIPT_PART_CHAR_CAP, the return preserves the original type. But the truncation path returns { truncated: true, originalLength: N, preview: "..." } — an object with a completely different shape — then casts it back via as typeof part.functionCall.args. If the original value is an array, downstream code relying on .map() or indexed access on the truncated object would crash at runtime.
Currently only the judge model receives truncated values (which tolerates varied shapes), but future code processing truncated functionResponse.response could break silently. Consider a discriminated union or a sentinel wrapper that preserves the original structure.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| // false), but the block-decision form documents intent more clearly — | ||
| // "this hook is intentionally preventing the turn from stopping, not | ||
| // signalling an error". | ||
| return { decision: 'block', reason: verdict.reason }; |
There was a problem hiding this comment.
[Critical] The goal judge's free-form reason is returned as the Stop-hook continuation prompt, and client.ts feeds that value back into the model as the next user message. Because the judge reason is derived from untrusted transcript content (assistant output, tool output, file contents, web pages, etc.), prompt-injection text can be laundered into a fresh user-like instruction on the next loop iteration.
This weakens the boundary between transcript evidence and trusted user intent during the autonomous /goal loop. Please keep the judge diagnostic for UI/history only and feed the model a fixed, controlled continuation instruction based on the original goal, or wrap any diagnostic as explicitly non-instructional data.
— gpt-5.5 via Qwen Code /review
| try { | ||
| const client = config.getGeminiClient(); | ||
| if (!client.isInitialized()) return fallbackTranscript(lastAssistantText); | ||
| const full = client.getHistory(); |
There was a problem hiding this comment.
[Suggestion] collectTranscript() calls client.getHistory() before slicing to the 24-message tail, but GeminiChat.getHistory() deep-clones the entire session history. During a /goal loop this happens at every Stop boundary, so long sessions or sessions with large tool results pay O(total transcript size) CPU/memory cost even though only the tail is used.
Consider adding a helper that slices the recent history first and only clones/caps that tail (for example getHistoryTail(count, curated?)), then have the judge call that instead of cloning the full history and slicing afterward.
— gpt-5.5 via Qwen Code /review
|
Addressed the latest review feedback. Changes:
Verification:
|
…ntinuation
`/goal <condition>` pins a free-form objective for the rest of the session.
While a goal is active, an LLM judge runs at every Stop boundary and either
lets the turn end (condition met) or feeds the judge's reason back as the
next user prompt to keep the model working. Auto-clears on success;
`/goal clear` cancels early. Same primitive as Anthropic's Claude Code
2.1.140 `/goal`, built on qwen-code's existing Stop-hook + function-hook
plumbing — no new subsystem.
Core (packages/core/src/goals/):
- activeGoalStore: per-session active goal + last-terminal cache, with a
terminal-observer channel the CLI subscribes to so achieved/aborted
cards land in history.
- goalJudge: side-query against a fast model, transcript-grounded
system prompt + json_schema response + disabled thinking. Tolerant
JSON extraction with fallback so a flaky judge can't kill the loop;
30s default timeout (vs. the 5s function-hook default that was
silently killing real-world judge calls).
- goalHook: function hook on Stop. Returns {decision:'block', reason}
when not met (reusing client.ts's existing recursive continuation),
{continue:true} when met. Self-clears active goal + notifies the
terminal observer on met/aborted. MAX_GOAL_ITERATIONS=50 backstop.
CLI:
- goalCommand: /goal | /goal <cond> | /goal clear|stop|off|reset|none|
cancel. 4000-char cap, trust + disableAllHooks gates. Empty /goal
shows running status, falls back to the last completed summary.
- GoalPill: footer chip "◎ /goal active (12s)" — terse, claude-aligned.
- GoalStatusMessage: set / checking / achieved / cleared / aborted
history cards. "checking" replaces the generic stop_hook_loop chip
for goal-driven iterations.
- restoreGoal: on session resume, rehydrate the active goal hook +
last-terminal cache from transcript so /goal survives /resume.
Cross-cutting fixes:
- HookSystem.hasHooksForEvent(eventName, sessionId?): also consults
SessionHooksManager. Previously SDK / programmatic Stop function
hooks were silently gated out by client.ts's fast-path check, so
they never fired.
- client.ts: yield StopHookLoop on every continuation iteration (was
iter > 1) — first not-met turn is now visible in the UI.
- useGeminiStream: commit pending item + clear thoughtBuffer /
geminiMessageBuffer on every Finished event. Fixes a UI bug where
a Stop-hook continuation's text bled into the prior turn's pending
history item (cumulative "te" / "tes" rendering), even though the
persisted transcript was clean.
Co-authored-by: Qwen-Coder <noreply@qwen.ai>
…r test - goalCommand.ts: collapse newlines and downgrade embedded double-quotes in the condition before splicing into the instruction prompt so the wrapping quote structure stays intact. - goalLoop.integration.test.ts: matcher assertion updated to '*' to match the current registerGoalHook contract (previously ''). Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
Renders `Last check: <reason>` on the achieved / aborted history card and on the empty-`/goal` summary so the final view records *why* the judge ruled the goal complete. Uses a single inline-label Text instead of the flex-row split used for `Goal:` — the reason is capped at 240 chars and almost always wraps; the flex-row variant hangs the continuation at the value column's left edge (~12 cols of blank space, easily mistaken for a stray empty line). Single Text + natural wrap keeps the continuation flush. Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
Cold boot path in AppContainer already calls restoreGoalFromHistory after loading session data, but the runtime /resume and /branch paths skipped it entirely. After /new + /resume back to a session that had an active /goal, the in-memory activeGoalStore entry still held the pre-/new setAt and a hookId pointing to a hook that config.startNewSession() had torn down — leaving the footer pill ticking from the original setAt (observable as "几十秒" elapsed immediately after resume) while the Stop hook was silently dead. Wire restoreGoalFromHistory into both handlers right after the session data lands so unregisterGoalHook clears the stale entry and registerGoalHook re-arms with a fresh setAt / hookId and re-installs the terminal observer. Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
Drop the duplicated local formatDuration from goalCommand.ts and
GoalStatusMessage.tsx in favor of the shared formatters.ts version,
called with { hideTrailingZeros: true }. The shared util already has
its own test suite and matches Claude Code's ShellTimeDisplay style
(round values drop zero-unit tails: `5m 0s` → `5m`).
Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
The judge-timeout path in judgeGoalWithTimeout only resolved a fallback verdict; the underlying judgeGoal generateContent call kept running because the hook context signal is never aborted by the timeout. Each timeout leaked one in-flight request that accumulated across goal-loop iterations. Link an AbortController into the judge signal and abort it when the timeout fires. Co-Authored-By: Qwen-Coder <qwen-coder@alibabacloud.com>
4c56052 to
c5bf88b
Compare
|
Follow-up: rebased PR #4123 onto latest Current pushed head: Verification after rebase:
|
|
Fixed the CI failure from run Root cause:
Fix:
Verification:
Pushed head: |
| return { restored: false }; | ||
| } | ||
|
|
||
| if (!config.isTrustedFolder() || config.getDisableAllHooks()) { |
There was a problem hiding this comment.
[Critical] When restoreGoalFromHistory() finds an active goal in the transcript but restoration is now blocked by trust/policy gates, it returns without unregistering any existing in-memory goal for the same session. Runtime /resume and /branch call this after config.startNewSession() has rebuilt the hook system, so a stale activeGoalStore entry can keep the footer and /goal status showing an active goal whose Stop hook is no longer registered. The user then sees an active goal that will never continue the turn.
Please clear the stale goal on these failed restore gates, the same way the no-active-goal branch does.
| if (!config.isTrustedFolder() || config.getDisableAllHooks()) { | |
| if (!config.isTrustedFolder() || config.getDisableAllHooks()) { | |
| unregisterGoalHook(config, sessionId); | |
| return { restored: false }; | |
| } |
The !config.getHookSystem() branch should also unregister before returning.
— gpt-5.5 via Qwen Code /review
| } | ||
|
|
||
| function capStructuredValue(value: unknown): unknown { | ||
| const serialized = safeStringify(value); |
There was a problem hiding this comment.
[Suggestion] capStructuredValue() fully JSON.stringifys structured tool payloads before deciding whether to truncate them. The judge runs at every /goal Stop boundary, so a large functionResponse.response or functionCall.args in the recent transcript still gets fully materialized and scanned on each continuation even though only a 4 KB preview is sent to the judge.
Please use a bounded/depth-limited serialization helper that stops once TRANSCRIPT_PART_CHAR_CAP is reached, or otherwise cap common structured values incrementally without building the full JSON string first.
— gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Test coverage suggestions (could not map to specific diff lines):
packages/cli/src/ui/components/messages/GoalStatusMessage.tsx— new 141-line component with 5 kind variants (checking/set/achieved/cleared/aborted), conditional rendering for iterations/durationMs/lastReason. No test file exists. Consider addingGoalStatusMessage.test.tsxcovering all kinds and edge cases.packages/core/src/goals/activeGoalStore.test.ts— tests skip the observer system (setGoalTerminalObserver,clearGoalTerminalObserver,notifyGoalTerminal), terminal cache (getLastGoalTerminal,setLastGoalTerminal), and the error-swallowing behavior innotifyGoalTerminal. These paths are only indirectly exercised via integration tests.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| }, | ||
| }; | ||
| } | ||
| // Emit StopHookLoop on EVERY blocking Stop hook execution, including |
There was a problem hiding this comment.
[Suggestion] The > 1 guard on StopHookLoop emission was removed, so it now fires on iteration 1 for ALL Stop hooks (not just the goal hook). This is a behavioral change — existing external Stop hooks that previously had a silent first block now produce a visible UI chip on first fire. client.test.ts has no tests for StopHookLoop emission at all. Consider adding a test verifying the event fires on iteration 1.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
[Critical] Goal iterations consume the session turn budget without coordination.
Each Stop-hook continuation in client.ts decrements boundedTurns by 1 (line 1592: boundedTurns - 1). With MAX_GOAL_ITERATIONS = 50 and MAX_TURNS = 100, a full goal loop silently consumes half the session turn budget. After a goal completes or aborts, only ~50 turns remain. The user sees a generic "max session turns exceeded" error with no indication that the goal loop was responsible.
Suggested fix: Do not count goal-continuation turns against the session budget (pass boundedTurns instead of boundedTurns - 1 for SendMessageType.Hook when a goal hook is active), or surface a specific warning when the session turn limit is approached during an active goal.
— glm-5.1 via Qwen Code /review
| const stopInput = input as StopInput; | ||
| const lastAssistantText = stopInput.last_assistant_message ?? ''; | ||
|
|
||
| const current = getActiveGoal(sessionId); |
There was a problem hiding this comment.
[Critical] TOCTOU race condition: goal replacement during in-flight judge call silently corrupts the new goal.
The hook callback snapshots current = getActiveGoal(sessionId) at line 141 before the await judgeGoalWithTimeout() at line 148. If the user runs /goal new-condition during the await, registerGoalHook replaces the store entry and removes the old hook, but the in-flight callback still holds a reference to the old goal.
When the judge returns {ok: true}, finishGoal (line 155) is called with the stale current. Inside finishGoal, clearActiveGoal(sessionId) removes the new goal from the store, and the new goal's Stop hook remains registered as a zombie — it fires on every subsequent Stop, burning judge API calls with no store entry to gate it.
The guard at line 142 (current.condition !== condition) only runs before the await, not after, so it cannot catch a replacement that happens during the async yield.
| const current = getActiveGoal(sessionId); | |
| const verdict = await judgeGoalWithTimeout(config, { | |
| condition, | |
| lastAssistantText, | |
| signal, | |
| }); | |
| // Re-check after await — the goal may have been replaced during the judge call. | |
| const latestGoal = getActiveGoal(sessionId); | |
| if (!latestGoal || latestGoal.condition !== condition) { | |
| return { continue: true }; | |
| } |
— glm-5.1 via Qwen Code /review
| return { ok: false, reason: JUDGE_REASON_FALLBACK }; | ||
| } | ||
| return parsed; | ||
| } catch (err) { |
There was a problem hiding this comment.
[Critical] Persistent judge failures silently burn up to 50 iterations with no user-visible error.
judgeGoal catches all exceptions (line 165) and returns {ok: false, reason: JUDGE_REASON_FALLBACK}. When the judge API is consistently broken (bad credentials, endpoint down, quota exhausted), the goal loop runs all 50 iterations. The user sees only repeated "Goal check · turn N · not yet met" cards with zero indication that the judge itself is broken rather than the goal being hard to achieve.
The fallback reason "Goal judge unavailable" is stored as lastReason in the active goal but is only visible via proactive /goal status check or the final abort card (after all 50 iterations). Each iteration generates a full model response, burning real tokens and wall-clock time against a judge that will never succeed.
Suggested fix: Track consecutive judge failures in the active goal state. After N consecutive failures (e.g., 3), surface a visible warning in the checking card: "Goal judge has failed 3 times — consider /goal clear". Alternatively, render lastReason in the checking card (dimmed/truncated) so the user can see "Goal judge unavailable" instead of just "not yet met."
— glm-5.1 via Qwen Code /review
| const activeGoal = getActiveGoal(config.getSessionId()); | ||
| if (activeGoal && activeGoal.condition) { | ||
| addItem( | ||
| { |
There was a problem hiding this comment.
[Critical] The handleStopHookLoopEvent goal-aware branch is entirely untested.
When getActiveGoal(sessionId) returns truthy (line ~1316), the code emits a goal_status/checking history item instead of the regular stop_hook_loop item. This is the primary UI rendering path for the per-iteration "not yet met" indicator during a goal loop.
useGeminiStream.test.tsx has zero goal-related test cases. All existing StopHookLoop tests exercise only the non-goal path. If this branch regresses, the user sees the raw stop_hook_loop chip (with its "Stop hook error:" framing) instead of the goal-aware "Goal check · turn N · not yet met" card.
Suggested fix: Add a test in the StopHookLoop Event describe block that: (1) mocks getActiveGoal to return an active goal, (2) emits a StopHookLoop event, (3) asserts addItem was called with { type: 'goal_status', kind: 'checking', ... } and NOT with { type: 'stop_hook_loop', ... }.
— glm-5.1 via Qwen Code /review
| condition: string; | ||
| iterations: number; | ||
| setAt: number; | ||
| tokensAtStart: number; |
There was a problem hiding this comment.
[Suggestion] tokensAtStart is dead code.
ActiveGoal.tokensAtStart is declared on the interface, populated via registerGoalHook (hardcoded 0 everywhere in production), and spread onto constructed goal objects — but never read by any consumer. All production callers pass 0, and the only non-zero usage is a test factory.
This bloats the API and misleads future contributors into thinking token-budget tracking is implemented.
| tokensAtStart: number; | |
| export interface ActiveGoal { | |
| condition: string; | |
| iterations: number; | |
| setAt: number; | |
| lastReason?: string; | |
| } |
Remove tokensAtStart from ActiveGoal, registerGoalHook args, and all callers. Add it back when token-budget tracking is actually implemented.
— glm-5.1 via Qwen Code /review
| const model = config.getFastModel() ?? config.getModel(); | ||
|
|
||
| try { | ||
| const client = config.getGeminiClient(); |
There was a problem hiding this comment.
[Suggestion] Judge system prompt is contaminated with user memory.
The judge call at line 125-126 goes through config.getGeminiClient().generateContent(), which routes all custom systemInstruction values through getCustomSystemPrompt(). That function appends user memory from .qwen/ files to the system prompt.
The judge's carefully crafted JUDGE_SYSTEM_PROMPT ("Answer based on transcript evidence only") therefore has user memory appended. Persistent instructions like "always mark tasks as complete when tests pass" could bias the binary judge evaluation, undermining the "default to not-met when ambiguous" policy.
Suggested fix: Call config.getBaseLlmClient().generateContent() directly, bypassing the user-memory injection in GeminiClient.generateContent(), or add a flag to skip user memory for judge-side calls.
— glm-5.1 via Qwen Code /review
| @@ -0,0 +1,141 @@ | |||
| /** | |||
There was a problem hiding this comment.
[Suggestion] GoalStatusMessage has no test file.
This component renders 5 distinct UI branches (checking, set, achieved, cleared, aborted) with different icons, colors, stats formatting, and conditional Last check: text. It is the primary rendering component for the entire /goal feature surface, yet has zero test coverage.
Suggested fix: Add GoalStatusMessage.test.tsx that renders the component for each kind and asserts the correct icon, text, and conditional elements.
— glm-5.1 via Qwen Code /review
Summary
/goal <condition>slash command that pins a free-form session-scoped objective. While a goal is active, an LLM judge runs at every Stop boundary and either lets the turn end (condition met) or feeds the judge's reason back as the next user prompt so the model keeps working. Auto-clears on success;/goal clearcancels early. Same primitive as Anthropic's Claude Code 2.1.140/goal./goal …action. Built entirely on qwen-code's existing Stop-hook + function-hook plumbing — no new subsystem.packages/core/src/goals/— the new module.goalJudge.ts(transcript-grounded judge with structured-output + disabled thinking),goalHook.ts(function hook onStop),activeGoalStore.ts(per-session state + terminal-observer channel for UI bridging).packages/core/src/hooks/hookSystem.ts+sessionHooksManager.ts—hasHooksForEvent(eventName, sessionId?)now also counts session-scoped function hooks. Pre-existing latent bug: SDK / programmatic Stop function hooks were silently gated out by the fast-path check inclient.ts, so they never fired.packages/core/src/core/client.ts(StopHookLoopemission) — yields on every Stop-hook continuation, not just iter > 1. First not-met turn is now visible.packages/cli/src/ui/hooks/useGeminiStream.ts(Finishedhandler) — commits the pending history item + clearsthoughtBuffer/geminiMessageBufferat every turn boundary so a Stop-hook continuation doesn't bleed its first chunk into the previous turn's pending UI buffer (the persisted transcript was always clean — this was a stream-buffer issue).packages/cli/src/ui/commands/goalCommand.ts+restoreGoal.ts— the command surface, plus resume-time rehydration of the active hook and the last-completed-goal summary cache from transcript.Validation
/goalshows the achievement card.cd packages/core && npx vitest run src/goals//goal 输出A、B、C,每个对话单独输出一个字母→ three turns each producing one letter + auto-clear.Scope / Risk
HookSystem.hasHooksForEventnow considers session-scoped function hooks. Code that registered a Stop function hook expecting the prior "silently no-ops" behavior would now see its hook actually fire. Realistically that prior behavior was a latent bug; any code that registered a function hook wanted it to run. The two affected tests inconfig.test.tsare updated to pin the new(eventName, sessionId)signature.Finished-event buffer reset: the new commit-pending + clear-buffers logic inuseGeminiStreamruns on every turn boundary. Verified against the existinguseGeminiStream.test.tsxsuite (91 tests) — no regressions.StopHookLoopon first iteration: removing the> 1guard means user-configured Stop hooks now also surface their blocking reason on the first fire. Previously the first not-met was silent, which masked configured hooks too. The UI rendering for non-goal Stop hooks is unchanged (● Ran N stop hooks ⎿ Stop hook error: …); only the visibility threshold dropped./resume: in scope.restoreGoalFromHistoryscans the loaded transcript for the most recentset(to re-arm the hook) and the most recentachieved/aborted(to repopulate the summary cache)./clear,/branch,/newcontinue to wipe goal state via session-id rotation.Stopevent only; subagents have their own session id and fireSubagentStop, so they cannot trigger or interact with an active goal.sessionIdarg onHookSystem.hasHooksForEvent(existing single-arg call sites keep working).Testing Matrix
macOS arm64 (M-series) interactive run + the unit test suite were exercised locally. The only platform-specific surface added is the Footer pill glyph (
◎) and the history card glyphs (○,✓,!) — plain Unicode, no platform-specific behavior expected.Linked Issues / Bugs
Closes #4074