Skip to content

fix: preserve shell execution status across cuabot and computer-server#1404

Open
zouyonghe wants to merge 1 commit into
trycua:mainfrom
zouyonghe:fix/cuabot-shell-contract
Open

fix: preserve shell execution status across cuabot and computer-server#1404
zouyonghe wants to merge 1 commit into
trycua:mainfrom
zouyonghe:fix/cuabot-shell-contract

Conversation

@zouyonghe
Copy link
Copy Markdown

@zouyonghe zouyonghe commented Apr 27, 2026

Summary

This is a shell execution correctness fix, not just a cuabot UX change. Today a command can fail or time out while outer tooling still treats the invocation as success or loses the real exit status.

  • Preserve structured shell execution status across cuabotd, the client, and the CLI with exit_code, success, timed_out, signal, and pid.
  • Propagate non-zero exits and timeouts from cuabot --bash back to the local CLI exit status.
  • Align computer-server run_command() semantics so success reflects return_code == 0, and set a fuller GUI shell environment for container commands.

Fixes #1403.

Verification

  • pnpm exec node --import tsx --test src/bashResult.test.ts
  • pnpm run build
  • uv run pytest libs/python/computer-server/tests/test_server.py -q
  • uv run ruff check libs/python/computer-server/computer_server/handlers/base.py libs/python/computer-server/tests/test_server.py
  • cuabot --bash "exit 42" exits with 42
  • missing command returns exit_code: 127, success: false
  • timeout returns exit_code: 124, success: false, timed_out: true
  • shell env includes DISPLAY, USER, LOGNAME, HOME, XDG_RUNTIME_DIR, XAUTHORITY

Summary by CodeRabbit

  • Bug Fixes

    • Improved bash command execution with proper exit code propagation to the CLI
    • Enhanced timeout detection and signal handling for container-based commands
    • Better error reporting for failed command execution scenarios
  • Tests

    • Added comprehensive test coverage for bash execution results and utilities
    • Added test coverage for command failure handling

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Apr 27, 2026

@zouyonghe is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

Walkthrough

The changes introduce a new structured result type (BashResult) for shell command execution that captures exit codes, success status, and timeout metadata, replacing the previous flat result object containing only stdout and stderr. This involves creating new utility functions to build container scripts with proper GUI environment setup, convert execution errors to structured results, and map exit codes appropriately. The implementation propagates through the Node.js client interface, CLI command handler, daemon request processing, and Python handler.

Changes

Cohort / File(s) Summary
Core bash result abstraction
libs/cuabot/src/bashResult.ts, libs/cuabot/src/bashResult.test.ts
Introduces BashResult interface and utilities: buildContainerScript() sets up GUI environment variables (DISPLAY, USER, LOGNAME, HOME, XDG_RUNTIME_DIR, XAUTHORITY); bashResultFromExecError() converts execution errors to structured results with timeout detection; exitCodeForBashResult() maps results to CLI exit codes.
Type propagation
libs/cuabot/src/client.ts
Updates CuaBotClient.bash() return type from inline structure to BashResult interface.
CLI exit code handling
libs/cuabot/src/cuabot.tsx
Refactors --bash command to use exitCodeForBashResult() to compute proper exit status instead of always exiting with 0.
Daemon implementation
libs/cuabot/src/cuabotd.ts
Refactors execInContainer() to return BashResult including success/exit_code fields; replaces inline container script with buildContainerScript(); updates background execution to capture PID and exit code; wraps synchronous execution with timeout command; uses bashResultFromExecError() for error handling.
Python handler
libs/python/computer-server/computer_server/handlers/base.py, libs/python/computer-server/tests/test_server.py
Changes run_command() to compute success from return code instead of always True; adds test case for command failure behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through exit codes with glee,
No more lost signals or timeouts unseen,
Success now speaks truth, from shell to CLI,
With DISPLAY set bright and results pristine,
Status preserved, the way it should be! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.34% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed All coding requirements from issue #1403 are met: BashResult interface adds exit_code/success/timed_out/signal/pid fields [bashResult.ts], CLI exit propagation implemented [cuabot.tsx], container script sets full GUI environment with DISPLAY/USER/LOGNAME/HOME/XDG_RUNTIME_DIR/XAUTHORITY [bashResult.ts], and failure detection logic added [cuabotd.ts].
Out of Scope Changes check ✅ Passed All code changes directly address issue #1403 requirements: new BashResult abstraction and helpers, container environment setup, error-to-result mapping, CLI exit status propagation, and Python test validation. No unrelated refactoring detected.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix: preserve shell execution status across cuabot and computer-server' accurately summarizes the main change: restoring and propagating shell execution status (exit codes, success flags, timeouts) across the cuabot client, daemon, and computer-server components.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
libs/cuabot/src/cuabotd.ts (1)

322-340: ⚠️ Potential issue | 🟡 Minor

Background path silently masks fast successful exits as live PIDs.

In bgCmd, after echo $pid; sleep 1, the if ! kill -0 $pid branch runs wait $pid; code=$?; cat log >&2; exit $code. On a fast clean exit (code 0), the wrapper exits 0, execAsync resolves, and you return success: true with a PID that is no longer alive (and the captured log goes only to stderr of the wrapper, not into result.stdout).

This is inconsistent with the fast-failure case, which correctly propagates a non-zero exit via bashResultFromExecError. Consider distinguishing the "already exited" outcome explicitly so the result carries the real exit_code and avoids advertising a stale PID. For example, have the wrapper print the actual outcome on a second stdout line and parse it on the Node side.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/cuabot/src/cuabotd.ts` around lines 322 - 340, The background-launch
wrapper (bgCmd) can report a PID even when the launched process already exited,
causing success:true with a stale PID; change the wrapper to always print a
second marker line after the PID that indicates the real outcome (e.g.
"OUTCOME:EXIT:<code>" or "OUTCOME:RUNNING"), and update the Node parsing logic
(the code around execAsync, parseInt(stdout...), and the returned object) to
read that second stdout line, set exit_code to the actual exit code when the job
already exited, set success=false when exit_code !== 0, and only advertise pid
as live when the outcome equals RUNNING; also ensure any captured log contents
from the wrapper are included in result.stdout or result.stderr so fast-exit
logs aren’t lost (use getContainerName(), scriptPath, scriptId references to
find the wrapper and parser).
🧹 Nitpick comments (2)
libs/python/computer-server/tests/test_server.py (2)

43-112: Consider reducing the abstract-method boilerplate.

The test correctly validates the new success/return_code semantics for non-zero exits. However, defining ~20 stubs to satisfy BaseAutomationHandler's abstract methods is repetitive and will need updating whenever the abstract surface changes. Two lighter alternatives:

  1. Generate stubs dynamically (one-liner over __abstractmethods__).
  2. Or, since run_command is a concrete method on BaseAutomationHandler that doesn't depend on any of the abstract methods, test it via an existing concrete subclass (e.g., the Linux/macOS automation handler used in test_run_command_timeout.py's _MinimalHandler) for consistency with the sibling test file.
♻️ Example: dynamic stubbing
-        class Handler(BaseAutomationHandler):
-            async def mouse_down(self, x=None, y=None, button="left"):
-                pass
-
-            async def mouse_up(self, x=None, y=None, button="left"):
-                pass
-            # ... (many more stubs)
-            async def get_cursor_position(self):
-                pass
+        # Build a concrete subclass by stubbing all abstract methods.
+        stubs = {
+            name: (lambda self, *a, **kw: None)
+            for name in BaseAutomationHandler.__abstractmethods__
+        }
+        Handler = type("Handler", (BaseAutomationHandler,), stubs)

Note: stubs can be sync since the test only exercises run_command.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/computer-server/tests/test_server.py` around lines 43 - 112, The
test TestRunCommand currently defines ~20 no-op async stubs to satisfy
BaseAutomationHandler; instead remove the verbose class and either (a)
dynamically create a minimal subclass that clears
BaseAutomationHandler.__abstractmethods__ and/or assigns no-op methods for those
names before instantiating, or (b) reuse the existing concrete _MinimalHandler
from test_run_command_timeout.py (which already implements the minimal surface)
and call its run_command; in both cases ensure you instantiate that lightweight
handler and call run_command("command -v definitely-not-installed") to assert
success is False and return_code != 0.

109-112: Minor: prefer a deterministically failing command.

command -v definitely-not-installed relies on the POSIX command builtin being present in the default shell and on the name not resolving — both essentially safe assumptions, but a literal false or exit 1 is shorter, faster, and cannot be perturbed by PATH contents.

♻️ Proposed tweak
-        result = await Handler().run_command("command -v definitely-not-installed")
+        result = await Handler().run_command("exit 1")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/computer-server/tests/test_server.py` around lines 109 - 112,
Update the failing command in the test to a deterministic failure: replace the
use of "command -v definitely-not-installed" passed to Handler().run_command
with a literal failing command like "false" (or "exit 1") so the test cannot be
influenced by PATH or shell builtins; keep the existing assertions on
result["success"] and result["return_code"] unchanged and only modify the
command string in the test invoking Handler().run_command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/cuabot/src/bashResult.ts`:
- Around line 28-42: Refine timed-out detection in bashResultFromExecError:
compute timedOut by first checking exitCode === 124 or err?.killed === true, and
only if those are false run the heuristic /timed out|timeout expired/i against
String(err?.message || ''); keep stderr fallback to 'Command timed out' only
when timedOut is true and preserve exit_code, success, timed_out and signal
behavior. This change is in the bashResultFromExecError function (variables:
exitCode, signal, timedOut, stdout/stderr, exit_code, timed_out).

In `@libs/cuabot/src/cuabot.tsx`:
- Around line 310-313: The current handling of client.bash() output uses
console.log/console.error which can add extra newlines and coerce values; change
the post-bash output to write raw bytes/strings to the std streams instead—use
process.stdout.write(result.stdout) and process.stderr.write(result.stderr) (or
ensure Buffer values are written directly) before calling
process.exit(exitCodeForBashResult(result)); keep the same conditionals around
result.stdout/result.stderr so you only write when present and preserve exact
bytes/strings without added newlines.

In `@libs/cuabot/src/cuabotd.ts`:
- Around line 343-347: The current mapping of timeout to timeoutSeconds causes
timeout=0 to become an immediate inner timeout; update the logic in the code
that builds the docker exec command (around the variables timeout,
timeoutSeconds, scriptPath and the call to execAsync/getContainerName()) so that
non-positive timeout values short-circuit and do not wrap the command with the
shell "timeout" utility (i.e., run `docker exec <container> <scriptPath>`
directly when timeout <= 0), while positive timeouts continue to use `timeout
${timeoutSeconds}s` (compute timeoutSeconds from timeout/1000) and pass the same
execAsync options; ensure you preserve the existing execAsync timeout option ({
timeout: timeout + 5000 }) behavior for both paths.

---

Outside diff comments:
In `@libs/cuabot/src/cuabotd.ts`:
- Around line 322-340: The background-launch wrapper (bgCmd) can report a PID
even when the launched process already exited, causing success:true with a stale
PID; change the wrapper to always print a second marker line after the PID that
indicates the real outcome (e.g. "OUTCOME:EXIT:<code>" or "OUTCOME:RUNNING"),
and update the Node parsing logic (the code around execAsync,
parseInt(stdout...), and the returned object) to read that second stdout line,
set exit_code to the actual exit code when the job already exited, set
success=false when exit_code !== 0, and only advertise pid as live when the
outcome equals RUNNING; also ensure any captured log contents from the wrapper
are included in result.stdout or result.stderr so fast-exit logs aren’t lost
(use getContainerName(), scriptPath, scriptId references to find the wrapper and
parser).

---

Nitpick comments:
In `@libs/python/computer-server/tests/test_server.py`:
- Around line 43-112: The test TestRunCommand currently defines ~20 no-op async
stubs to satisfy BaseAutomationHandler; instead remove the verbose class and
either (a) dynamically create a minimal subclass that clears
BaseAutomationHandler.__abstractmethods__ and/or assigns no-op methods for those
names before instantiating, or (b) reuse the existing concrete _MinimalHandler
from test_run_command_timeout.py (which already implements the minimal surface)
and call its run_command; in both cases ensure you instantiate that lightweight
handler and call run_command("command -v definitely-not-installed") to assert
success is False and return_code != 0.
- Around line 109-112: Update the failing command in the test to a deterministic
failure: replace the use of "command -v definitely-not-installed" passed to
Handler().run_command with a literal failing command like "false" (or "exit 1")
so the test cannot be influenced by PATH or shell builtins; keep the existing
assertions on result["success"] and result["return_code"] unchanged and only
modify the command string in the test invoking Handler().run_command.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fe68b7ea-43a0-4ea9-a266-d8f045cf298c

📥 Commits

Reviewing files that changed from the base of the PR and between 46dbcb4 and 563edc5.

📒 Files selected for processing (7)
  • libs/cuabot/src/bashResult.test.ts
  • libs/cuabot/src/bashResult.ts
  • libs/cuabot/src/client.ts
  • libs/cuabot/src/cuabot.tsx
  • libs/cuabot/src/cuabotd.ts
  • libs/python/computer-server/computer_server/handlers/base.py
  • libs/python/computer-server/tests/test_server.py

Comment on lines +28 to +42
export function bashResultFromExecError(err: any): BashResult {
const exitCode = typeof err?.code === 'number' ? err.code : null;
const signal = typeof err?.signal === 'string' ? err.signal : undefined;
const timedOut =
exitCode === 124 || err?.killed === true || /timed out|timeout expired/i.test(String(err?.message || ''));

return {
stdout: err?.stdout || '',
stderr: err?.stderr || (timedOut ? 'Command timed out' : ''),
exit_code: timedOut && exitCode === null ? null : exitCode,
success: false,
timed_out: timedOut || undefined,
signal,
};
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Heuristic timeout detection on err.message is fragile but acceptable.

util.promisify(exec) errors typically embed the command and (a slice of) stderr in message, so /timed out|timeout expired/i against err.message can in principle false-positive on user output containing those phrases. The added test case (timeout 10s nope) covers the obvious command-name aliasing, and the code === 124 || killed === true checks carry most of the load, so this is fine — just be aware of the heuristic if you later see weird timed_out: true for non-timeout commands.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/cuabot/src/bashResult.ts` around lines 28 - 42, Refine timed-out
detection in bashResultFromExecError: compute timedOut by first checking
exitCode === 124 or err?.killed === true, and only if those are false run the
heuristic /timed out|timeout expired/i against String(err?.message || ''); keep
stderr fallback to 'Command timed out' only when timedOut is true and preserve
exit_code, success, timed_out and signal behavior. This change is in the
bashResultFromExecError function (variables: exitCode, signal, timedOut,
stdout/stderr, exit_code, timed_out).

Comment on lines +310 to +313
const result = await client.bash(cmd);
if (result.stdout) console.log(result.stdout);
if (result.stderr) console.error(result.stderr);
process.exit(exitCodeForBashResult(result));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use raw stream writes to preserve exact command output.

console.log/console.error coerce values and append a trailing newline. Since result.stdout/result.stderr typically already end with \n, this double-newlines and breaks fidelity for piped consumers (the very contract this PR is hardening alongside exit codes).

♻️ Proposed fix
         const client = await getClient();
         const result = await client.bash(cmd);
-        if (result.stdout) console.log(result.stdout);
-        if (result.stderr) console.error(result.stderr);
+        if (result.stdout) process.stdout.write(result.stdout);
+        if (result.stderr) process.stderr.write(result.stderr);
         process.exit(exitCodeForBashResult(result));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const result = await client.bash(cmd);
if (result.stdout) console.log(result.stdout);
if (result.stderr) console.error(result.stderr);
process.exit(exitCodeForBashResult(result));
const result = await client.bash(cmd);
if (result.stdout) process.stdout.write(result.stdout);
if (result.stderr) process.stderr.write(result.stderr);
process.exit(exitCodeForBashResult(result));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/cuabot/src/cuabot.tsx` around lines 310 - 313, The current handling of
client.bash() output uses console.log/console.error which can add extra newlines
and coerce values; change the post-bash output to write raw bytes/strings to the
std streams instead—use process.stdout.write(result.stdout) and
process.stderr.write(result.stderr) (or ensure Buffer values are written
directly) before calling process.exit(exitCodeForBashResult(result)); keep the
same conditionals around result.stdout/result.stderr so you only write when
present and preserve exact bytes/strings without added newlines.

Comment on lines +343 to 347
const timeoutSeconds = Math.max(timeout / 1000, 0.001);
const { stdout, stderr } = await execAsync(
`docker exec ${getContainerName()} ${scriptPath}`,
{ timeout }
`docker exec ${getContainerName()} timeout ${timeoutSeconds}s ${scriptPath}`,
{ timeout: timeout + 5000 }
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

timeout=0 becomes an immediate timeout.

Math.max(timeout / 1000, 0.001) maps timeout: 0 to 0.001s, which fires the inner timeout command almost immediately and surfaces as a timed-out failure. If a caller passes 0 expecting "no timeout" (a common convention), this is a footgun. Either document 0 as "minimum interval" or short-circuit non-positive values to skip the timeout wrapper entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/cuabot/src/cuabotd.ts` around lines 343 - 347, The current mapping of
timeout to timeoutSeconds causes timeout=0 to become an immediate inner timeout;
update the logic in the code that builds the docker exec command (around the
variables timeout, timeoutSeconds, scriptPath and the call to
execAsync/getContainerName()) so that non-positive timeout values short-circuit
and do not wrap the command with the shell "timeout" utility (i.e., run `docker
exec <container> <scriptPath>` directly when timeout <= 0), while positive
timeouts continue to use `timeout ${timeoutSeconds}s` (compute timeoutSeconds
from timeout/1000) and pass the same execAsync options; ensure you preserve the
existing execAsync timeout option ({ timeout: timeout + 5000 }) behavior for
both paths.

@zouyonghe
Copy link
Copy Markdown
Author

zouyonghe commented May 7, 2026

This is worth treating as a cross-stack shell execution correctness fix, not a cuabot-only cleanup. Right now a command can fail, time out, or exit non-zero while outer tooling still sees success or loses the real execution state.

That affects the reliability of any wrapper, agent flow, or automation path built on top of these interfaces. The patch is small, but it closes a contract gap across cuabotd, the client and CLI layer, and computer-server, especially around exit-code propagation and success semantics.

Would appreciate quick follow-up here. This sits on a critical execution path, and the current behavior makes shell results harder to trust than they should be.

@zouyonghe zouyonghe changed the title fix: preserve cuabot shell execution status fix: preserve shell execution status across cuabot and computer-server May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cuabot shell execution loses exit status and timeout semantics

1 participant