Conversation
…fications (#140) Fixes eval-loss bug. Previously, `is_background: true` shell commands detached via `&` and the OS sent their stdout/stderr to nowhere — protoCLI's data callback only fires while the wrapper is attached. Long-running processes like eval suites would print results that nothing ever saw. Mirrors cc-2.18's task framework, scoped to local shells: - packages/core/src/backgroundShells/: new module * registry.ts — BackgroundShellRegistry: in-memory map of taskId → {status, command, outputPath, pid, exitCode, ...}, with drainPendingNotifications() for the next-turn injection. * diskOutput.ts — path helpers + readBackgroundTaskOutput / Exit / Pid. Files live at <projectTempDir>/<sessionId>/tasks/<taskId>.{output,exit,pid}. * watcher.ts — polls the .exit sentinel (with PID-liveness fallback), marks the task complete/failed in the registry once the bg process exits. * notifications.ts — builds <task_notification> blocks (task_id, output_file, status, exit_code, summary). - shell.ts: when is_background=true on non-Windows, generate a taskId, redirect stdout/stderr at the shell level into the per-task output file, capture the bg PID and exit code via sentinel files, register the task, and start the watcher. Tool result returns the file path + task ID so the model can `Read` it later. - core/client.ts: drain completed-but-unnotified tasks at the start of each user query and prepend <task_notification> blocks to the request, matching how plan/subagent/arena reminders are added. - bg-stop.ts: new bg_stop tool. SIGTERM the process group, escalate to SIGKILL after 3s, mark the task killed in the registry. Registered as a core tool alongside the existing task-* family. - bgCommand.ts: new /bg slash command listing running and recent background tasks with id, status, age, command, output path, pid. Tests: shell.test.ts updated for new wrapper format; 55/55 pass. Full core suite (5,337 tests) and cli suite (3,767 tests) pass. Deferred (worth follow-up PRs): - Auto-background of long foreground commands and Ctrl+B keybinding. Both need foreground shell execution rearchitected so output tees to disk from second 1 and the await can be interrupted mid-flight without killing the process. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a background-shell subsystem: types, on-disk task outputs, an in-memory per-session registry, a file watcher and notifications, config/client integration, a bg-stop tool, shell tool changes to launch/register background tasks, a CLI Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant User
participant Client
participant ShellTool as Shell Tool
participant Disk as Disk (task dir/files)
participant Registry
participant Watcher
participant Notification
User->>Client: Request to run background command
Client->>ShellTool: Execute shell tool (is_background)
ShellTool->>Disk: create task dir & redirect stdout/stderr to output file
ShellTool->>Disk: write exit sentinel and pid sentinel
ShellTool->>Registry: register(task metadata, pid?)
ShellTool->>Watcher: startBackgroundShellWatcher(config, taskId)
ShellTool-->>Client: Return "Background command started" with task id & output path
Client-->>User: Acknowledge background task started
par Watcher polling loop
Watcher->>Registry: read task state (running?)
Watcher->>Disk: read exit sentinel
alt exit sentinel present
Watcher->>Registry: markExit(taskId, completed/failed, exitCode)
Watcher->>Notification: buildBackgroundTaskNotification(task)
else no sentinel
Watcher->>Disk: check pid liveness (best-effort)
alt pid dead and no sentinel after wait
Watcher->>Registry: markExit(taskId, failed, null)
Watcher->>Notification: buildBackgroundTaskNotification(task)
end
end
end
Client->>Registry: drainPendingNotifications()
Registry-->>Client: drained notifications (atomically flagged)
Client->>User: prepend notifications to next request
sequenceDiagram
autonumber
participant User
participant Client
participant BgStopTool as Bg Stop Tool
participant Registry
participant OS
User->>Client: Request bg_stop(task_id)
Client->>BgStopTool: invoke with task_id
BgStopTool->>Registry: get(task_id)
alt task missing or not running
BgStopTool-->>Client: Return "task missing/not running"
else task running
alt pid present
BgStopTool->>OS: send SIGTERM to process group (fallback to leader)
BgStopTool->>OS: check liveness
BgStopTool-->>BgStopTool: schedule SIGKILL in 3s if still alive
else no pid available
BgStopTool-->>Registry: mark task killed (no signal)
end
BgStopTool->>Registry: mark task killed
BgStopTool-->>Client: Return success message
end
Client-->>User: Confirm termination status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/backgroundShells/notifications.ts`:
- Around line 44-53: The XML builder currently only escapes the summary but
interpolates task.id, task.outputPath, task.status (and any exitLine content)
raw into tags like <task_id>, <output_file>, <status>, which can break XML;
update the code that constructs the notification string to run task.id,
task.outputPath, task.status (and exitLine if it can contain dynamic text)
through the existing escapeXml helper before interpolation so all dynamic fields
are XML-escaped (preserve the same tag names: <task_id>, <output_file>,
<status>, <summary>, <task_notification>).
In `@packages/core/src/config/config.ts`:
- Around line 1225-1235: The cached BackgroundShellRegistry instance stored on
the Config object can leak tasks across sessions; update startNewSession (the
method that begins a new session) to reset/clear the
Config.backgroundShellRegistry (or set it to undefined/null) so
getBackgroundShellRegistry() will construct a fresh BackgroundShellRegistry for
the new session; ensure you reference the backgroundShellRegistry property and
the getBackgroundShellRegistry() and startNewSession() methods when making this
change and also clear any session-specific disk capture paths associated with
tasks if present.
In `@packages/core/src/tools/bg-stop.ts`:
- Around line 99-104: The current logic in bg-stop.ts marks the task killed when
task.pid is falsy; instead, do not call registry.markExit or change state if no
PID is captured—return a message indicating the PID isn't available yet and that
the stop will be retried once PID is present; when a PID is present use the
local pid variable (not task.pid) for signaling. Specifically, update the branch
that checks task.pid to avoid transitioning to 'killed' (remove or skip
registry.markExit(task.id, 'killed', ...)), return a clear
llmContent/returnDisplay explaining PID not captured yet, and ensure later code
uses the pid variable for sending signals and for any markExit calls.
In `@packages/core/src/tools/shell.ts`:
- Line 316: The redirect target tempFilePath is injected unquoted into the shell
command ("pgrep -g 0 > ${tempFilePath} 2>&1"), which breaks when TMPDIR contains
spaces or metacharacters; update the command construction in
packages/core/src/tools/shell.ts to shell-quote or escape tempFilePath when
building that command (e.g., wrap the redirect target in single quotes or use a
shell-escaping helper) so the spawned shell sees a single safe filename token
for the redirection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3a89fb8d-5fc8-457b-995f-2f0e27f67c69
📒 Files selected for processing (15)
packages/cli/src/services/BuiltinCommandLoader.tspackages/cli/src/ui/commands/bgCommand.tspackages/core/src/backgroundShells/diskOutput.tspackages/core/src/backgroundShells/index.tspackages/core/src/backgroundShells/notifications.tspackages/core/src/backgroundShells/registry.tspackages/core/src/backgroundShells/types.tspackages/core/src/backgroundShells/watcher.tspackages/core/src/config/config.tspackages/core/src/core/client.tspackages/core/src/index.tspackages/core/src/tools/bg-stop.tspackages/core/src/tools/shell.test.tspackages/core/src/tools/shell.tspackages/core/src/tools/tool-names.ts
| return [ | ||
| '<task_notification>', | ||
| `<task_id>${task.id}</task_id>`, | ||
| `<output_file>${task.outputPath}</output_file>`, | ||
| `<status>${task.status}</status>${exitLine}`, | ||
| `<summary>${escapeXml(summary)}</summary>`, | ||
| '</task_notification>', | ||
| '', | ||
| `Read ${task.outputPath} to see the full output.`, | ||
| ].join('\n'); |
There was a problem hiding this comment.
Escape all dynamic XML fields, not just the summary.
Lines 46–48 interpolate task.id, task.outputPath, and task.status directly into XML. If any value contains XML metacharacters, <task_notification> can become malformed.
💡 Suggested fix
export function buildBackgroundTaskNotification(
task: BackgroundShellTask,
): string {
+ const safeTaskId = escapeXml(task.id);
+ const safeOutputPath = escapeXml(task.outputPath);
+ const safeStatus = escapeXml(task.status);
const exitLine =
task.exitCode !== undefined && task.exitCode !== null
? `\n<exit_code>${task.exitCode}</exit_code>`
: '';
@@
return [
'<task_notification>',
- `<task_id>${task.id}</task_id>`,
- `<output_file>${task.outputPath}</output_file>`,
- `<status>${task.status}</status>${exitLine}`,
+ `<task_id>${safeTaskId}</task_id>`,
+ `<output_file>${safeOutputPath}</output_file>`,
+ `<status>${safeStatus}</status>${exitLine}`,
`<summary>${escapeXml(summary)}</summary>`,
'</task_notification>',
'',
`Read ${task.outputPath} to see the full output.`,
].join('\n');
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/backgroundShells/notifications.ts` around lines 44 - 53,
The XML builder currently only escapes the summary but interpolates task.id,
task.outputPath, task.status (and any exitLine content) raw into tags like
<task_id>, <output_file>, <status>, which can break XML; update the code that
constructs the notification string to run task.id, task.outputPath, task.status
(and exitLine if it can contain dynamic text) through the existing escapeXml
helper before interpolation so all dynamic fields are XML-escaped (preserve the
same tag names: <task_id>, <output_file>, <status>, <summary>,
<task_notification>).
| /** | ||
| * Returns the registry of long-running background shell tasks. Lazy | ||
| * — only constructed once a task is registered. Disk capture lives at | ||
| * <projectTempDir>/<sessionId>/tasks/<taskId>.output. | ||
| */ | ||
| getBackgroundShellRegistry(): BackgroundShellRegistry { | ||
| if (!this.backgroundShellRegistry) { | ||
| this.backgroundShellRegistry = new BackgroundShellRegistry(); | ||
| } | ||
| return this.backgroundShellRegistry; | ||
| } |
There was a problem hiding this comment.
Background task state can leak across sessions.
Line 1230 returns a cached registry instance, but startNewSession (Line 1253+) does not clear it. Reusing Config can carry old tasks/notifications into a new session.
💡 Suggested fix
startNewSession(
sessionId?: string,
sessionData?: ResumedSessionData,
): string {
this.sessionId = sessionId ?? randomUUID();
this.sessionData = sessionData;
+ // Reset session-scoped background task state to prevent cross-session leakage.
+ this.backgroundShellRegistry = undefined;
setDebugLogSession(this);
this.debugLogger = createDebugLogger();
this.chatRecordingService = this.chatRecordingEnabled
? new ChatRecordingService(this)
: undefined;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/config/config.ts` around lines 1225 - 1235, The cached
BackgroundShellRegistry instance stored on the Config object can leak tasks
across sessions; update startNewSession (the method that begins a new session)
to reset/clear the Config.backgroundShellRegistry (or set it to undefined/null)
so getBackgroundShellRegistry() will construct a fresh BackgroundShellRegistry
for the new session; ensure you reference the backgroundShellRegistry property
and the getBackgroundShellRegistry() and startNewSession() methods when making
this change and also clear any session-specific disk capture paths associated
with tasks if present.
| if (!task.pid) { | ||
| // No PID captured — can't signal. Mark killed so the next-turn | ||
| // notification fires anyway. | ||
| registry.markExit(task.id, 'killed', null); | ||
| const msg = `Background task "${task.id}" had no captured PID; marked killed without signal.`; | ||
| return { llmContent: msg, returnDisplay: msg }; |
There was a problem hiding this comment.
Don't transition a task to killed before you have a PID to signal.
Line 102 treats “PID not captured yet” as “task successfully killed.” Because shell.ts registers from a best-effort .pid read, an immediate bg_stop can report success, stop the watcher on its next poll, and leave the underlying process running with no further tracking.
Proposed fix
+import { readBackgroundTaskPid } from '../backgroundShells/diskOutput.js';
...
- if (!task.pid) {
- // No PID captured — can't signal. Mark killed so the next-turn
- // notification fires anyway.
- registry.markExit(task.id, 'killed', null);
- const msg = `Background task "${task.id}" had no captured PID; marked killed without signal.`;
- return { llmContent: msg, returnDisplay: msg };
- }
+ let pid =
+ task.pid ?? (await readBackgroundTaskPid(this.config, task.id)) ?? undefined;
+ if (pid == null) {
+ const msg = `Background task "${task.id}" is still starting; PID is not available yet. Try again shortly.`;
+ return {
+ llmContent: msg,
+ returnDisplay: msg,
+ error: { message: msg },
+ };
+ }
+ registry.update(task.id, { pid });Then use pid below instead of task.pid.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/tools/bg-stop.ts` around lines 99 - 104, The current logic
in bg-stop.ts marks the task killed when task.pid is falsy; instead, do not call
registry.markExit or change state if no PID is captured—return a message
indicating the PID isn't available yet and that the stop will be retried once
PID is present; when a PID is present use the local pid variable (not task.pid)
for signaling. Specifically, update the branch that checks task.pid to avoid
transitioning to 'killed' (remove or skip registry.markExit(task.id, 'killed',
...)), return a clear llmContent/returnDisplay explaining PID not captured yet,
and ensure later code uses the pid variable for sending signals and for any
markExit calls.
| `__bgpid=$!`, | ||
| `echo $__bgpid > "${backgroundPidPath}"`, | ||
| // pgrep for legacy "child PIDs" reporting | ||
| `pgrep -g 0 > ${tempFilePath} 2>&1`, |
There was a problem hiding this comment.
Shell-quote the temp-file redirect path.
Line 316 injects tempFilePath directly into shell source. A valid TMPDIR containing spaces or shell metacharacters will break the wrapper before pgrep writes anything, so child PID capture becomes environment-dependent.
#!/bin/bash
# Verifies that an unquoted redirect target with spaces fails while a quoted one succeeds.
set -euo pipefail
tmpdir="$(mktemp -d)"
target="$tmpdir/dir with space/out.txt"
mkdir -p "$(dirname "$target")"
if bash -lc "echo ok > $target" 2>/dev/null; then
echo "unexpected success"
exit 1
fi
bash -lc "echo ok > '$target'"
test -f "$target"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/tools/shell.ts` at line 316, The redirect target
tempFilePath is injected unquoted into the shell command ("pgrep -g 0 >
${tempFilePath} 2>&1"), which breaks when TMPDIR contains spaces or
metacharacters; update the command construction in
packages/core/src/tools/shell.ts to shell-quote or escape tempFilePath when
building that command (e.g., wrap the redirect target in single quotes or use a
shell-escaping helper) so the spawned shell sees a single safe filename token
for the redirection.
…143) Updates the surface that tells the model and the user how the new background-shell pipeline actually behaves: - Shell tool description (surfaced in the system prompt): explicitly states that backgrounded commands return a Task ID + Output file path, that the OS keeps writing to that file after the wrapper exits, that <task_notification> blocks land on the next turn, and that bg_stop is the way to kill a runaway. Adds long evals / batch jobs to the recommended is_background:true cases. - is_background parameter description: rewritten to mention task ID, output file, the no-poll contract, and bg_stop. - Core system prompt's "Background Processes" bullet: rewritten in the same shape so the agent knows the contract from turn 1. - docs/reference/tools/shell.md: separate Background-tasks subsection documenting the file path, task notification format, /bg, and the Windows fallback caveat. - docs/reference/tools/bg-stop.md (new): full reference page for the bg_stop tool, registered in _meta.ts. - docs/architecture/divergence-from-upstream.md: adds the background-shell capture entry under "Net-new tools" so future maintainers can spot it as fork-unique. - README.md: adds a "Long-running Background Shells" section that motivates the feature and points at /bg + bg_stop. - Snapshots updated for prompts.test.ts and shell.test.ts; all 5,337 core tests pass. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/reference/tools/bg-stop.md`:
- Around line 23-25: Add a language tag (use "text") to each fenced code block
examples that are currently untagged: the block containing `Background task
"7f9c…" stopped (SIGTERM: <reason if provided>).`, the block containing `No
background task with ID "<task_id>".`, and the multi-line block beginning with
`Background command started.` (including Task ID/Output file/PID lines); update
each opening ``` to ```text so markdownlint MD040 is resolved.
In `@docs/reference/tools/shell.md`:
- Line 37: The docs entry incorrectly lists separate Stdout and Stderr for
foreground runs; update the documentation to match the actual keys emitted by
packages/core/src/tools/shell.ts (it emits Output and Error for foreground
commands). Edit the line in docs/reference/tools/shell.md to replace `Stdout`,
`Stderr` with `Output`, `Error` and ensure the rest of the sentence references
the same set (`Command`, `Directory`, `Output`, `Error`, `Exit Code`, `Signal`,
`Process Group PGID`) so docs match the shell tool's behavior.
In `@packages/core/src/core/prompts.ts`:
- Line 372: The documentation/example text in packages/core/src/core/prompts.ts
still shows shell backgrounding like "node server.js &" which conflicts with the
new rule that background tasks must use is_background: true for ToolNames.SHELL;
update every embedded example in this file (search for occurrences of "&" in
example commands and examples referencing backgrounding) to instead demonstrate
passing is_background: true to ToolNames.SHELL and to use the corresponding flow
with ToolNames.READ_FILE, ToolNames.BG_STOP, and the <task_notification> token;
keep the same explanatory text but replace the ampersand-based examples with
examples that call the shell tool with is_background: true and show reading the
output file and stopping via BG_STOP using the task_id.
In `@packages/core/src/tools/shell.ts`:
- Around line 719-723: The description for the is_background option in the Shell
tool currently promises cross-platform behavior (Task ID, output-file
persistence, <task_notification>, bg_stop) but the Windows implementation only
returns a PID and doesn't register tasks; update the is_background description
in packages/core/src/tools/shell.ts to be platform-specific by either (A)
clarifying that the full Task ID/notification/bg_stop/output-file behavior
applies only on non-Windows platforms and Windows returns only a PID without
notifications, or (B) change wording to state differences explicitly for Windows
vs non-Windows; locate the is_background property in the Shell tool schema and
modify its description text to reflect platform-specific behavior so callers
won’t assume Windows support for task notifications or bg_stop.
- Around line 429-443: The registry currently stores the child's PID (realPid)
but bg_stop needs the process-group leader (the wrapper) PID; update the
registry.register call in the background-start path to set pid to the wrapper's
PID (result.pid) instead of realPid — e.g., use result.pid (or result.pid ??
realPid as a safe fallback) for the pid field when calling
getBackgroundShellRegistry().register, leaving other fields (id, command,
description, cwd, outputPath) unchanged and keeping
startBackgroundShellWatcher(...) as is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0b04e36a-28d7-4a53-a0fd-4173e6386c59
⛔ Files ignored due to path filters (2)
packages/core/src/core/__snapshots__/prompts.test.ts.snapis excluded by!**/*.snappackages/core/src/tools/__snapshots__/shell.test.ts.snapis excluded by!**/*.snap
📒 Files selected for processing (7)
README.mddocs/architecture/divergence-from-upstream.mddocs/reference/tools/_meta.tsdocs/reference/tools/bg-stop.mddocs/reference/tools/shell.mdpackages/core/src/core/prompts.tspackages/core/src/tools/shell.ts
✅ Files skipped from review due to trivial changes (3)
- docs/reference/tools/_meta.ts
- README.md
- docs/architecture/divergence-from-upstream.md
| ``` | ||
| Background task "7f9c…" stopped (SIGTERM: <reason if provided>). | ||
| ``` |
There was a problem hiding this comment.
Add languages to the fenced examples.
markdownlint-cli2 is already flagging these three blocks with MD040, so this page will stay lint-noisy until each fence has a language (text is enough here).
Suggested fix
-```
+```text
Background task "7f9c…" stopped (SIGTERM: <reason if provided>).- +text
No background task with ID "<task_id>".
-```
+```text
Background command started.
Task ID: 7f9c…
Output file: /tmp/proto/<project-hash>/<session>/tasks/7f9c…output
PID: 54322
</details>
Also applies to: 29-31, 37-42
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 23-23: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @docs/reference/tools/bg-stop.md around lines 23 - 25, Add a language tag
(use "text") to each fenced code block examples that are currently untagged: the
block containing Background task "7f9c…" stopped (SIGTERM: <reason if provided>)., the block containing No background task with ID "<task_id>".,
and the multi-line block beginning with Background command started. (including
Task ID/Output file/PID lines); update each opening totext so
markdownlint MD040 is resolved.
</details>
<!-- fingerprinting:phantom:medusa:grasshopper:a20f7c3f-f901-41ac-9101-2b9115f8b8cd -->
<!-- d98c2f50 -->
<!-- This is an auto-generated comment by CodeRabbit -->
| ## Output | ||
|
|
||
| Returns `Command`, `Directory`, `Stdout`, `Stderr`, `Error`, `Exit Code`, `Signal`, and `Background PIDs`. | ||
| For **foreground** commands, the tool returns `Command`, `Directory`, `Stdout`, `Stderr`, `Error`, `Exit Code`, `Signal`, and `Process Group PGID`. |
There was a problem hiding this comment.
Foreground output keys are documented incorrectly.
The tool does not return separate Stdout and Stderr fields here; packages/core/src/tools/shell.ts still emits Output and Error for foreground runs. Documenting the wrong keys will send users looking for fields that never appear.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/reference/tools/shell.md` at line 37, The docs entry incorrectly lists
separate Stdout and Stderr for foreground runs; update the documentation to
match the actual keys emitted by packages/core/src/tools/shell.ts (it emits
Output and Error for foreground commands). Edit the line in
docs/reference/tools/shell.md to replace `Stdout`, `Stderr` with `Output`,
`Error` and ensure the rest of the sentence references the same set (`Command`,
`Directory`, `Output`, `Error`, `Exit Code`, `Signal`, `Process Group PGID`) so
docs match the shell tool's behavior.
| - **Parallelism:** Execute multiple independent tool calls in parallel when feasible (i.e. searching the codebase). | ||
| - **Command Execution:** Use the '${ToolNames.SHELL}' tool for running shell commands, remembering the safety rule to explain modifying commands first. | ||
| - **Background Processes:** Use background processes (via \`&\`) for commands that are unlikely to stop on their own, e.g. \`node server.js &\`. If unsure, ask the user. | ||
| - **Background Processes:** For commands that are unlikely to stop on their own (servers, watchers) or that you'll want to inspect later (long evals, batch jobs), pass \`is_background: true\` to '${ToolNames.SHELL}'. The tool returns a \`Task ID\` and an \`Output file\` path; stdout/stderr is redirected to that file at the shell level so output is never lost. A \`<task_notification>\` arrives on the next turn when the task exits — do not poll. Read the output file with '${ToolNames.READ_FILE}' to inspect progress or final results, and stop a runaway task with '${ToolNames.BG_STOP}' (passing the \`task_id\`). |
There was a problem hiding this comment.
Update the embedded examples to match this new rule.
This sentence says backgrounding should be expressed via is_background: true, but the examples later in this file still show commands like node server.js &. Those examples will keep priming the model to emit & unless they are updated too.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/core/prompts.ts` at line 372, The documentation/example
text in packages/core/src/core/prompts.ts still shows shell backgrounding like
"node server.js &" which conflicts with the new rule that background tasks must
use is_background: true for ToolNames.SHELL; update every embedded example in
this file (search for occurrences of "&" in example commands and examples
referencing backgrounding) to instead demonstrate passing is_background: true to
ToolNames.SHELL and to use the corresponding flow with ToolNames.READ_FILE,
ToolNames.BG_STOP, and the <task_notification> token; keep the same explanatory
text but replace the ampersand-based examples with examples that call the shell
tool with is_background: true and show reading the output file and stopping via
BG_STOP using the task_id.
| if (backgroundTaskId && backgroundOutputPath) { | ||
| const realPid = await readBackgroundTaskPid( | ||
| this.config, | ||
| backgroundTaskId, | ||
| ); | ||
| const registry = this.config.getBackgroundShellRegistry(); | ||
| registry.register({ | ||
| id: backgroundTaskId, | ||
| command: this.params.command, | ||
| description: this.params.description ?? this.params.command, | ||
| cwd, | ||
| outputPath: backgroundOutputPath, | ||
| pid: realPid ?? undefined, | ||
| }); | ||
| startBackgroundShellWatcher(this.config, backgroundTaskId); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
bash -lc '
wrapper_pid=$$
wrapper_pgid=$(ps -o pgid= -p "$$" | tr -d " ")
(sleep 30) &
bgpid=$!
bg_pgid=$(ps -o pgid= -p "$bgpid" | tr -d " ")
echo "wrapper_pid=$wrapper_pid wrapper_pgid=$wrapper_pgid"
echo "bgpid=$bgpid bg_pgid=$bg_pgid"
kill "$bgpid"
wait "$bgpid" 2>/dev/null || true
'Repository: protoLabsAI/protoCLI
Length of output: 298
🏁 Script executed:
# First, find the relevant files
find packages -name "bg-stop.ts" -o -name "shell.ts" | head -20Repository: protoLabsAI/protoCLI
Length of output: 132
🏁 Script executed:
# Read bg-stop.ts to verify kill(-pid, ...) usage
cat -n packages/core/src/tools/bg-stop.ts | head -100Repository: protoLabsAI/protoCLI
Length of output: 3483
🏁 Script executed:
# Read the specific section of shell.ts mentioned in review (lines 295-319 for wrapper, 429-443 for registry)
sed -n '295,443p' packages/core/src/tools/shell.ts | cat -nRepository: protoLabsAI/protoCLI
Length of output: 6861
🏁 Script executed:
# Find ShellExecutionService definition
rg -t ts "class ShellExecutionService|export.*ShellExecutionService" --max-count=10Repository: protoLabsAI/protoCLI
Length of output: 153
🏁 Script executed:
# Look for the execute method signature and return type
rg -t ts "execute\(" packages/core/src/tools/shell.ts -A 3 -B 1 | head -40Repository: protoLabsAI/protoCLI
Length of output: 389
🏁 Script executed:
# Check what the registry interface expects and how it uses the pid
rg -t ts "interface.*BackgroundShellTask|getBackgroundShellRegistry" -A 10 | head -60Repository: protoLabsAI/protoCLI
Length of output: 4073
🏁 Script executed:
# Look at the complete context around line 33 to see variable names
sed -n '1,100p' packages/core/src/tools/shell.ts | cat -nRepository: protoLabsAI/protoCLI
Length of output: 3860
🏁 Script executed:
# Get the complete BackgroundShellTask interface
cat -n packages/core/src/backgroundShells/types.ts | head -50Repository: protoLabsAI/protoCLI
Length of output: 1635
🏁 Script executed:
# Look at ShellExecutionService.execute return type
rg -t ts "execute\s*\(" packages/core/src/services/shellExecutionService.ts -A 30 | head -60Repository: protoLabsAI/protoCLI
Length of output: 841
🏁 Script executed:
# Find where result.pid is set or defined
rg -t ts "result\.pid|return.*pid" packages/core/src/services/shellExecutionService.ts -B 2 -A 2Repository: protoLabsAI/protoCLI
Length of output: 839
🏁 Script executed:
# Look at what ShellExecutionService.execute returns in detail
cat -n packages/core/src/services/shellExecutionService.ts | grep -A 200 "interface.*ShellExecutionHandle\|type.*ShellExecutionHandle"Repository: protoLabsAI/protoCLI
Length of output: 6529
🏁 Script executed:
# Search for ShellExecutionResult type definition
rg -t ts "interface ShellExecutionResult|type ShellExecutionResult" packages/ -A 10Repository: protoLabsAI/protoCLI
Length of output: 2916
🏁 Script executed:
# Look at the full context around where result is created in shell.ts to see all available variables
sed -n '320,450p' packages/core/src/tools/shell.ts | cat -nRepository: protoLabsAI/protoCLI
Length of output: 5696
🏁 Script executed:
# Verify there's no pid field in ShellExecutionResult
rg -t ts "interface ShellExecutionResult" -A 15 packages/core/src/services/shellExecutionService.tsRepository: protoLabsAI/protoCLI
Length of output: 713
🏁 Script executed:
# Check what pid and result variables are available in the context around line 116
sed -n '1,150p' packages/core/src/services/shellExecutionService.ts | grep -A 10 "interface ShellExecutionResult"Repository: protoLabsAI/protoCLI
Length of output: 495
🏁 Script executed:
# Double-check the ShellExecutionHandle interface
rg -t ts "interface ShellExecutionHandle" -A 5 packages/core/src/services/shellExecutionService.tsRepository: protoLabsAI/protoCLI
Length of output: 289
🏁 Script executed:
# Check if result.pid is available at the registration point by looking at the full variable scope
sed -n '320,450p' packages/core/src/tools/shell.ts | grep -B 30 "registry.register"Repository: protoLabsAI/protoCLI
Length of output: 1132
🏁 Script executed:
# Also verify what the BackgroundShellTask comment says about the pid field
cat -n packages/core/src/backgroundShells/types.ts | grep -B 2 -A 2 "PID of the spawned"Repository: protoLabsAI/protoCLI
Length of output: 364
🏁 Script executed:
# Verify that pid and result.pid are the same by looking at how they're assigned in ShellExecutionService
rg -t ts "return.*pid.*result" packages/core/src/services/shellExecutionService.ts -B 5 -A 2Repository: protoLabsAI/protoCLI
Length of output: 426
🏁 Script executed:
# Also check if there's any documentation or usage pattern showing pid vs result.pid
rg -t ts "result\.pid|\.pid.*result" packages/core/src/tools/shell.ts -B 3 -A 3Repository: protoLabsAI/protoCLI
Length of output: 529
🏁 Script executed:
# Check the complete wrapper script to understand process group behavior
sed -n '295,325p' packages/core/src/tools/shell.ts | cat -nRepository: protoLabsAI/protoCLI
Length of output: 1873
🏁 Script executed:
# Verify that result.pid is the wrapper's PID by checking how it's used in comparisons
rg -t ts "result\.pid" packages/core/src/tools/shell.ts -B 2 -A 2Repository: protoLabsAI/protoCLI
Length of output: 421
Register the process-group ID, not the child PID, for bg_stop.
bg_stop signals -task.pid via process.kill(), which requires the process group leader's PID. The code currently stores the background child's PID (realPid, from $!), but the wrapper shell's PID (result.pid) is the actual process group leader. When the child runs in the wrapper's process group via { ( command ) & }, signaling the child's PID alone misses the group. Use result.pid (the wrapper PGID) in the registry instead.
Suggested fix
registry.register({
id: backgroundTaskId,
command: this.params.command,
description: this.params.description ?? this.params.command,
cwd,
outputPath: backgroundOutputPath,
- pid: realPid ?? undefined,
+ pid: result.pid ?? realPid ?? undefined,
});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/tools/shell.ts` around lines 429 - 443, The registry
currently stores the child's PID (realPid) but bg_stop needs the process-group
leader (the wrapper) PID; update the registry.register call in the
background-start path to set pid to the wrapper's PID (result.pid) instead of
realPid — e.g., use result.pid (or result.pid ?? realPid as a safe fallback) for
the pid field when calling getBackgroundShellRegistry().register, leaving other
fields (id, command, description, cwd, outputPath) unchanged and keeping
startBackgroundShellWatcher(...) as is.
| is_background: { | ||
| type: 'boolean', | ||
| description: | ||
| 'Optional: Whether to run the command in background. If not specified, defaults to false (foreground execution). Explicitly set to true for long-running processes like development servers, watchers, or daemons that should continue running without blocking further commands.', | ||
| 'Optional: Whether to run the command in background. Defaults to false. Set to true for long-running processes (servers, watchers, evals, batch jobs). Background tasks return a Task ID and an Output file path; stdout/stderr is redirected to that file by the OS so output survives even after this tool returns. You will receive a <task_notification> on the next turn when the task exits — do not poll. Stop a runaway task with the bg_stop tool.', | ||
| }, |
There was a problem hiding this comment.
Make the background-task description platform-specific.
This text now promises Task ID, output-file persistence, automatic <task_notification>, and bg_stop for every background command, but the Windows path still returns only a PID and never registers the task. That contract mismatch will cause bad follow-up tool calls on Windows unless the description is gated to non-Windows or the Windows implementation is brought up to parity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/tools/shell.ts` around lines 719 - 723, The description for
the is_background option in the Shell tool currently promises cross-platform
behavior (Task ID, output-file persistence, <task_notification>, bg_stop) but
the Windows implementation only returns a PID and doesn't register tasks; update
the is_background description in packages/core/src/tools/shell.ts to be
platform-specific by either (A) clarifying that the full Task
ID/notification/bg_stop/output-file behavior applies only on non-Windows
platforms and Windows returns only a PID without notifications, or (B) change
wording to state differences explicitly for Windows vs non-Windows; locate the
is_background property in the Shell tool schema and modify its description text
to reflect platform-specific behavior so callers won’t assume Windows support
for task notifications or bg_stop.
… span (#145) Symptom (also reported on the Langfuse trace, session 442ed5c7, turn ba924d250d7c with 739s latency and zero LLM activity in the middle): user presses Esc to cancel, then any subsequent input is silently dropped — UI stays in the loading indicator forever. Root cause: - cancelOngoingRequest aborts the AbortController and flips isResponding=false, but does not clear toolCalls. If a tool ignores its AbortSignal, it stays in 'executing' / 'scheduled' / etc., or finishes with responseSubmittedToGemini=false. Either way, streamingState computes 'Responding' (useGeminiStream.ts: 424-437) and submitQuery's guard (1305-1314) silently drops the next user submission. The in-code comment at line 244-245 even flags this class of bug for a different cause. - cancelOngoingRequest never calls endTurnSpan, so the OTel turn span leaks. The recap + prompt-suggestion LLM calls that fire on streamingState=Idle then attach to the dead span — Langfuse reports the turn as 12 minutes long when the actual model work was 1.7 seconds. Inheritance: upstream qwen-code has the identical cancelOngoingRequest shape and the identical silent-return guard. gemini-cli upstream's PR #21960 (closing #21096) addressed a different cancel-related bug (retry-loop loading indicator showing stale "still on it" text), not the stuck-toolCalls case. Issue #18525 there ("Agent Stuck between Responses") is essentially the same symptom and is still open. So this is inherited, not introduced — but the leaked turn span is fork-only, since startTurnSpan/endTurnSpan are part of our agent harness. Fix: - useReactToolScheduler exposes a new forceCancelStaleToolCalls() that flips responseSubmittedToGemini=true on terminal calls and synthesizes a 'cancelled' state for any non-terminal call (with a clear "User cancelled. Tool was force-cleared after the abort signal did not stop it within the grace window" message in the responseParts so downstream consumers don't choke). - cancelOngoingRequest in useGeminiStream: * marks every current toolCall as submitted immediately (handles the common case where the tool finished but the flag wasn't flipped), * schedules a 3s setTimeout that calls forceCancelStaleToolCalls and surfaces a WARNING if anything had to be force-cleared so the user knows the underlying process may still be running, * calls endTurnSpan('ok') so the recap/suggestion LLM calls don't keep nesting under a dead turn span in Langfuse. - submitQuery no longer silently drops submissions when streamingState is non-Idle. It logs a clear WARNING explaining what state we're in (Responding / WaitingForConfirmation / Backgrounded) and what the user should do (approve the tool, wait, or press Esc). Tests: 3,767 cli tests + 5,337 core tests pass. Existing cancellation tests in useGeminiStream.test.tsx (49 tests in that file) continue to pass with the new flow. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/cli/src/ui/hooks/useGeminiStream.ts (1)
1347-1387:⚠️ Potential issue | 🟠 MajorThe new warning is still skipped while a request/backgrounded turn is actually active.
queryGuardRef.current.tryStart()returnsnullbefore thisstreamingStatebranch runs, so normal user submissions during an in-flight response or backgrounded turn still hit the silent return at Lines 1349-1351. The warning only appears after the guard has already been released, which misses the main UX case this change is trying to address.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cli/src/ui/hooks/useGeminiStream.ts` around lines 1347 - 1387, The guard is acquired too early so tryStart() can return null and cause a silent return before you can surface the warning; in useGeminiStream adjust the logic so you check streamingState (Responding / WaitingForConfirmation / Backgrounded) and call addItem(...) to show the warning before attempting queryGuardRef.current.tryStart(), or if you must keep tryStart first then detect generation === null and streamingState indicates an active turn and call addItem(...) (using the same stateLabel logic) instead of returning silently; reference the generation variable, queryGuardRef.current.tryStart()/end(), the streamingState enum checks, and the addItem call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/cli/src/ui/hooks/useGeminiStream.ts`:
- Around line 550-561: The timeout in useGeminiStream.ts currently calls
forceCancelStaleToolCallsRef.current() after 3s which acts on the global
toolCallsRef state at fire time and can mistakenly cancel tools from a new turn;
change the pattern so that at cancellation time you capture the specific IDs (or
prompt/turn ID) of the tool calls to be cleared (e.g., snapshot an array of
cancelled IDs) and pass that snapshot into the delayed callback, then update
forceCancelStaleToolCallsRef.current or provide a new function (e.g.,
forceCancelSpecificToolCalls) that accepts the captured IDs and only cancels
those if still pending when the timer runs, and keep the addItem warning but
reference the number of captured IDs actually cleared rather than scanning
global state at timeout.
In `@packages/cli/src/ui/hooks/useReactToolScheduler.ts`:
- Around line 203-259: forceCancelStaleToolCalls only mutates UI state
(setToolCallsForDisplay) but doesn't tell the CoreToolScheduler to abort/cancel
the underlying tracked call, so the scheduler can later emit a real completion;
update forceCancelStaleToolCalls to also notify the scheduler for each
non-terminal TrackedToolCall (use the callId/invocation from the
TrackedToolCall/request) by invoking the scheduler's cancel/abort API (or the
existing method that marks a tracked call as cancelled) before writing the
synthesized cancelled TrackedCancelledToolCall and setting
responseSubmittedToGemini; ensure lastForceCancelledCountRef and
setToolCallsForDisplay behavior remain the same so the UI and scheduler stay in
sync and duplicate completion submissions are prevented.
- Around line 209-218: The code is incrementing the counter variable changed for
calls that are already terminal but only need responseSubmittedToGemini toggled,
which overreports “force-cleared” tool calls; update the logic in
useReactToolScheduler so you do not increment changed in the branch that handles
already-terminal tc values (the block checking tc.status === 'success' ||
'error' || 'cancelled' where you set responseSubmittedToGemini), and instead
only increment changed in the actual force-cancel code path where a call’s
status is actively set to 'cancelled' (the real force-cancel transition), and
make the same fix for the analogous code at the other site (lines referenced
around the second block) so changed truly reflects only real force-cancels.
---
Outside diff comments:
In `@packages/cli/src/ui/hooks/useGeminiStream.ts`:
- Around line 1347-1387: The guard is acquired too early so tryStart() can
return null and cause a silent return before you can surface the warning; in
useGeminiStream adjust the logic so you check streamingState (Responding /
WaitingForConfirmation / Backgrounded) and call addItem(...) to show the warning
before attempting queryGuardRef.current.tryStart(), or if you must keep tryStart
first then detect generation === null and streamingState indicates an active
turn and call addItem(...) (using the same stateLabel logic) instead of
returning silently; reference the generation variable,
queryGuardRef.current.tryStart()/end(), the streamingState enum checks, and the
addItem call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: aa347afe-5149-44d3-9747-08153f28c3c8
📒 Files selected for processing (2)
packages/cli/src/ui/hooks/useGeminiStream.tspackages/cli/src/ui/hooks/useReactToolScheduler.ts
| setTimeout(() => { | ||
| const cleared = forceCancelStaleToolCallsRef.current(); | ||
| if (cleared > 0) { | ||
| addItem( | ||
| { | ||
| type: MessageType.WARNING, | ||
| text: `Force-cleared ${cleared} stuck tool call(s) after cancel grace window. The underlying process(es) may still be running in the background.`, | ||
| }, | ||
| Date.now(), | ||
| ); | ||
| } | ||
| }, 3000); |
There was a problem hiding this comment.
Scope the 3s force-clear to the cancelled turn.
This timeout operates on whatever toolCallsRef.current contains when it fires. If the user cancels, immediately starts a new turn, and that new turn launches tools within the next 3 seconds, this callback can force-cancel the new turn's tool calls instead of the stale ones. Capture the cancelled call IDs or prompt ID here and only clear those same calls if they are still pending when the timer runs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/cli/src/ui/hooks/useGeminiStream.ts` around lines 550 - 561, The
timeout in useGeminiStream.ts currently calls
forceCancelStaleToolCallsRef.current() after 3s which acts on the global
toolCallsRef state at fire time and can mistakenly cancel tools from a new turn;
change the pattern so that at cancellation time you capture the specific IDs (or
prompt/turn ID) of the tool calls to be cleared (e.g., snapshot an array of
cancelled IDs) and pass that snapshot into the delayed callback, then update
forceCancelStaleToolCallsRef.current or provide a new function (e.g.,
forceCancelSpecificToolCalls) that accepts the captured IDs and only cancels
those if still pending when the timer runs, and keep the addItem warning but
reference the number of captured IDs actually cleared rather than scanning
global state at timeout.
| // Already terminal: just guarantee the submitted flag so | ||
| // streamingState clears even if handleCompletedTools never ran. | ||
| if ( | ||
| tc.status === 'success' || | ||
| tc.status === 'error' || | ||
| tc.status === 'cancelled' | ||
| ) { | ||
| if (tc.responseSubmittedToGemini) return tc; | ||
| changed++; | ||
| return { ...tc, responseSubmittedToGemini: true }; |
There was a problem hiding this comment.
Only count real force-cancels in the returned value.
changed is incremented for already-terminal calls that merely need responseSubmittedToGemini=true. The caller treats the return value as “stuck tool calls force-cleared”, so this can overstate what was actually force-cancelled and produce a misleading warning.
Also applies to: 255-258
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/cli/src/ui/hooks/useReactToolScheduler.ts` around lines 209 - 218,
The code is incrementing the counter variable changed for calls that are already
terminal but only need responseSubmittedToGemini toggled, which overreports
“force-cleared” tool calls; update the logic in useReactToolScheduler so you do
not increment changed in the branch that handles already-terminal tc values (the
block checking tc.status === 'success' || 'error' || 'cancelled' where you set
responseSubmittedToGemini), and instead only increment changed in the actual
force-cancel code path where a call’s status is actively set to 'cancelled' (the
real force-cancel transition), and make the same fix for the analogous code at
the other site (lines referenced around the second block) so changed truly
reflects only real force-cancels.
…147) CodeQL flagged two new alerts on the background-shell wrapper code landed in #140: - js/polynomial-redos at shell.ts:292 ('&+$' on a trimmed string) - js/polynomial-redos at shell.ts:305 ('\\s*&\\s*$' on a trimmed string) Both are low practical risk (the inputs are bounded model-emitted shell commands) but the alert is blocking the dev → main promotion in PR #141. Swap each for a plain string-op equivalent — same intent, no quantifier-on-quantifier shape for the analyzer to flag. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Fixes the eval-loss bug: background shell tasks now write stdout/stderr to per-task disk files; the model is told the path on tool return and gets a
<task_notification>on the next turn when the task completes. Adds/bgandbg_stop.Merge strategy
Use merge commit per branch-strategy.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Behavior
Tests
Documentation