fix: preserve shell execution status across cuabot and computer-server by zouyonghe · Pull Request #1404 · trycua/cua

zouyonghe · 2026-04-27T07:24:05Z

Summary

This is a shell execution correctness fix, not just a cuabot UX change. Today a command can fail or time out while outer tooling still treats the invocation as success or loses the real exit status.

Preserve structured shell execution status across cuabotd, the client, and the CLI with exit_code, success, timed_out, signal, and pid.
Propagate non-zero exits and timeouts from cuabot --bash back to the local CLI exit status.
Align computer-server run_command() semantics so success reflects return_code == 0, and set a fuller GUI shell environment for container commands.

Fixes #1403.

Verification

pnpm exec node --import tsx --test src/bashResult.test.ts
pnpm run build
uv run pytest libs/python/computer-server/tests/test_server.py -q
uv run ruff check libs/python/computer-server/computer_server/handlers/base.py libs/python/computer-server/tests/test_server.py
cuabot --bash "exit 42" exits with 42
missing command returns exit_code: 127, success: false
timeout returns exit_code: 124, success: false, timed_out: true
shell env includes DISPLAY, USER, LOGNAME, HOME, XDG_RUNTIME_DIR, XAUTHORITY

Summary by CodeRabbit

Bug Fixes
- Improved bash command execution with proper exit code propagation to the CLI
- Enhanced timeout detection and signal handling for container-based commands
- Better error reporting for failed command execution scenarios
Tests
- Added comprehensive test coverage for bash execution results and utilities
- Added test coverage for command failure handling

vercel · 2026-04-27T07:24:11Z

@zouyonghe is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-04-27T07:24:23Z

📝 Walkthrough

Walkthrough

The changes introduce a new structured result type (BashResult) for shell command execution that captures exit codes, success status, and timeout metadata, replacing the previous flat result object containing only stdout and stderr. This involves creating new utility functions to build container scripts with proper GUI environment setup, convert execution errors to structured results, and map exit codes appropriately. The implementation propagates through the Node.js client interface, CLI command handler, daemon request processing, and Python handler.

Changes

Cohort / File(s)	Summary
Core bash result abstraction `libs/cuabot/src/bashResult.ts`, `libs/cuabot/src/bashResult.test.ts`	Introduces `BashResult` interface and utilities: `buildContainerScript()` sets up GUI environment variables (`DISPLAY`, `USER`, `LOGNAME`, `HOME`, `XDG_RUNTIME_DIR`, `XAUTHORITY`); `bashResultFromExecError()` converts execution errors to structured results with timeout detection; `exitCodeForBashResult()` maps results to CLI exit codes.
Type propagation `libs/cuabot/src/client.ts`	Updates `CuaBotClient.bash()` return type from inline structure to `BashResult` interface.
CLI exit code handling `libs/cuabot/src/cuabot.tsx`	Refactors `--bash` command to use `exitCodeForBashResult()` to compute proper exit status instead of always exiting with 0.
Daemon implementation `libs/cuabot/src/cuabotd.ts`	Refactors `execInContainer()` to return `BashResult` including success/exit_code fields; replaces inline container script with `buildContainerScript()`; updates background execution to capture PID and exit code; wraps synchronous execution with `timeout` command; uses `bashResultFromExecError()` for error handling.
Python handler `libs/python/computer-server/computer_server/handlers/base.py`, `libs/python/computer-server/tests/test_server.py`	Changes `run_command()` to compute `success` from return code instead of always True; adds test case for command failure behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through exit codes with glee,
No more lost signals or timeouts unseen,
Success now speaks truth, from shell to CLI,
With DISPLAY set bright and results pristine,
Status preserved, the way it should be! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.34% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	All coding requirements from issue `#1403` are met: BashResult interface adds exit_code/success/timed_out/signal/pid fields [bashResult.ts], CLI exit propagation implemented [cuabot.tsx], container script sets full GUI environment with DISPLAY/USER/LOGNAME/HOME/XDG_RUNTIME_DIR/XAUTHORITY [bashResult.ts], and failure detection logic added [cuabotd.ts].
Out of Scope Changes check	✅ Passed	All code changes directly address issue `#1403` requirements: new BashResult abstraction and helpers, container environment setup, error-to-result mapping, CLI exit status propagation, and Python test validation. No unrelated refactoring detected.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix: preserve shell execution status across cuabot and computer-server' accurately summarizes the main change: restoring and propagating shell execution status (exit codes, success flags, timeouts) across the cuabot client, daemon, and computer-server components.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

libs/cuabot/src/cuabotd.ts (1)
322-340: ⚠️ Potential issue | 🟡 Minor

Background path silently masks fast successful exits as live PIDs.

In bgCmd, after echo $pid; sleep 1, the if ! kill -0 $pid branch runs wait $pid; code=$?; cat log >&2; exit $code. On a fast clean exit (code 0), the wrapper exits 0, execAsync resolves, and you return success: true with a PID that is no longer alive (and the captured log goes only to stderr of the wrapper, not into result.stdout).

This is inconsistent with the fast-failure case, which correctly propagates a non-zero exit via bashResultFromExecError. Consider distinguishing the "already exited" outcome explicitly so the result carries the real exit_code and avoids advertising a stale PID. For example, have the wrapper print the actual outcome on a second stdout line and parse it on the Node side.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/cuabot/src/cuabotd.ts` around lines 322 - 340, The background-launch
wrapper (bgCmd) can report a PID even when the launched process already exited,
causing success:true with a stale PID; change the wrapper to always print a
second marker line after the PID that indicates the real outcome (e.g.
"OUTCOME:EXIT:<code>" or "OUTCOME:RUNNING"), and update the Node parsing logic
(the code around execAsync, parseInt(stdout...), and the returned object) to
read that second stdout line, set exit_code to the actual exit code when the job
already exited, set success=false when exit_code !== 0, and only advertise pid
as live when the outcome equals RUNNING; also ensure any captured log contents
from the wrapper are included in result.stdout or result.stderr so fast-exit
logs aren’t lost (use getContainerName(), scriptPath, scriptId references to
find the wrapper and parser).

🧹 Nitpick comments (2)

libs/python/computer-server/tests/test_server.py (2)
43-112: Consider reducing the abstract-method boilerplate.

The test correctly validates the new success/return_code semantics for non-zero exits. However, defining ~20 stubs to satisfy BaseAutomationHandler's abstract methods is repetitive and will need updating whenever the abstract surface changes. Two lighter alternatives:

Generate stubs dynamically (one-liner over __abstractmethods__).

Or, since run_command is a concrete method on BaseAutomationHandler that doesn't depend on any of the abstract methods, test it via an existing concrete subclass (e.g., the Linux/macOS automation handler used in test_run_command_timeout.py's _MinimalHandler) for consistency with the sibling test file.
♻️ Example: dynamic stubbing
-        class Handler(BaseAutomationHandler):
-            async def mouse_down(self, x=None, y=None, button="left"):
-                pass
-
-            async def mouse_up(self, x=None, y=None, button="left"):
-                pass
-            # ... (many more stubs)
-            async def get_cursor_position(self):
-                pass
+        # Build a concrete subclass by stubbing all abstract methods.
+        stubs = {
+            name: (lambda self, *a, **kw: None)
+            for name in BaseAutomationHandler.__abstractmethods__
+        }
+        Handler = type("Handler", (BaseAutomationHandler,), stubs)
Note: stubs can be sync since the test only exercises run_command.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/computer-server/tests/test_server.py` around lines 43 - 112, The
test TestRunCommand currently defines ~20 no-op async stubs to satisfy
BaseAutomationHandler; instead remove the verbose class and either (a)
dynamically create a minimal subclass that clears
BaseAutomationHandler.__abstractmethods__ and/or assigns no-op methods for those
names before instantiating, or (b) reuse the existing concrete _MinimalHandler
from test_run_command_timeout.py (which already implements the minimal surface)
and call its run_command; in both cases ensure you instantiate that lightweight
handler and call run_command("command -v definitely-not-installed") to assert
success is False and return_code != 0.
109-112: Minor: prefer a deterministically failing command.

command -v definitely-not-installed relies on the POSIX command builtin being present in the default shell and on the name not resolving — both essentially safe assumptions, but a literal false or exit 1 is shorter, faster, and cannot be perturbed by PATH contents.
♻️ Proposed tweak
-        result = await Handler().run_command("command -v definitely-not-installed")
+        result = await Handler().run_command("exit 1")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/computer-server/tests/test_server.py` around lines 109 - 112,
Update the failing command in the test to a deterministic failure: replace the
use of "command -v definitely-not-installed" passed to Handler().run_command
with a literal failing command like "false" (or "exit 1") so the test cannot be
influenced by PATH or shell builtins; keep the existing assertions on
result["success"] and result["return_code"] unchanged and only modify the
command string in the test invoking Handler().run_command.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/cuabot/src/bashResult.ts`:
- Around line 28-42: Refine timed-out detection in bashResultFromExecError:
compute timedOut by first checking exitCode === 124 or err?.killed === true, and
only if those are false run the heuristic /timed out|timeout expired/i against
String(err?.message || ''); keep stderr fallback to 'Command timed out' only
when timedOut is true and preserve exit_code, success, timed_out and signal
behavior. This change is in the bashResultFromExecError function (variables:
exitCode, signal, timedOut, stdout/stderr, exit_code, timed_out).

In `@libs/cuabot/src/cuabot.tsx`:
- Around line 310-313: The current handling of client.bash() output uses
console.log/console.error which can add extra newlines and coerce values; change
the post-bash output to write raw bytes/strings to the std streams instead—use
process.stdout.write(result.stdout) and process.stderr.write(result.stderr) (or
ensure Buffer values are written directly) before calling
process.exit(exitCodeForBashResult(result)); keep the same conditionals around
result.stdout/result.stderr so you only write when present and preserve exact
bytes/strings without added newlines.

In `@libs/cuabot/src/cuabotd.ts`:
- Around line 343-347: The current mapping of timeout to timeoutSeconds causes
timeout=0 to become an immediate inner timeout; update the logic in the code
that builds the docker exec command (around the variables timeout,
timeoutSeconds, scriptPath and the call to execAsync/getContainerName()) so that
non-positive timeout values short-circuit and do not wrap the command with the
shell "timeout" utility (i.e., run `docker exec <container> <scriptPath>`
directly when timeout <= 0), while positive timeouts continue to use `timeout
${timeoutSeconds}s` (compute timeoutSeconds from timeout/1000) and pass the same
execAsync options; ensure you preserve the existing execAsync timeout option ({
timeout: timeout + 5000 }) behavior for both paths.

---

Outside diff comments:
In `@libs/cuabot/src/cuabotd.ts`:
- Around line 322-340: The background-launch wrapper (bgCmd) can report a PID
even when the launched process already exited, causing success:true with a stale
PID; change the wrapper to always print a second marker line after the PID that
indicates the real outcome (e.g. "OUTCOME:EXIT:<code>" or "OUTCOME:RUNNING"),
and update the Node parsing logic (the code around execAsync,
parseInt(stdout...), and the returned object) to read that second stdout line,
set exit_code to the actual exit code when the job already exited, set
success=false when exit_code !== 0, and only advertise pid as live when the
outcome equals RUNNING; also ensure any captured log contents from the wrapper
are included in result.stdout or result.stderr so fast-exit logs aren’t lost
(use getContainerName(), scriptPath, scriptId references to find the wrapper and
parser).

---

Nitpick comments:
In `@libs/python/computer-server/tests/test_server.py`:
- Around line 43-112: The test TestRunCommand currently defines ~20 no-op async
stubs to satisfy BaseAutomationHandler; instead remove the verbose class and
either (a) dynamically create a minimal subclass that clears
BaseAutomationHandler.__abstractmethods__ and/or assigns no-op methods for those
names before instantiating, or (b) reuse the existing concrete _MinimalHandler
from test_run_command_timeout.py (which already implements the minimal surface)
and call its run_command; in both cases ensure you instantiate that lightweight
handler and call run_command("command -v definitely-not-installed") to assert
success is False and return_code != 0.
- Around line 109-112: Update the failing command in the test to a deterministic
failure: replace the use of "command -v definitely-not-installed" passed to
Handler().run_command with a literal failing command like "false" (or "exit 1")
so the test cannot be influenced by PATH or shell builtins; keep the existing
assertions on result["success"] and result["return_code"] unchanged and only
modify the command string in the test invoking Handler().run_command.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fe68b7ea-43a0-4ea9-a266-d8f045cf298c

📥 Commits

Reviewing files that changed from the base of the PR and between 46dbcb4 and 563edc5.

📒 Files selected for processing (7)

libs/cuabot/src/bashResult.test.ts
libs/cuabot/src/bashResult.ts
libs/cuabot/src/client.ts
libs/cuabot/src/cuabot.tsx
libs/cuabot/src/cuabotd.ts
libs/python/computer-server/computer_server/handlers/base.py
libs/python/computer-server/tests/test_server.py

coderabbitai · 2026-04-27T07:30:05Z

+export function bashResultFromExecError(err: any): BashResult {
+  const exitCode = typeof err?.code === 'number' ? err.code : null;
+  const signal = typeof err?.signal === 'string' ? err.signal : undefined;
+  const timedOut =
+    exitCode === 124 || err?.killed === true || /timed out|timeout expired/i.test(String(err?.message || ''));
+
+  return {
+    stdout: err?.stdout || '',
+    stderr: err?.stderr || (timedOut ? 'Command timed out' : ''),
+    exit_code: timedOut && exitCode === null ? null : exitCode,
+    success: false,
+    timed_out: timedOut || undefined,
+    signal,
+  };
+}


⚠️ Potential issue | 🟡 Minor

Heuristic timeout detection on err.message is fragile but acceptable.

util.promisify(exec) errors typically embed the command and (a slice of) stderr in message, so /timed out|timeout expired/i against err.message can in principle false-positive on user output containing those phrases. The added test case (timeout 10s nope) covers the obvious command-name aliasing, and the code === 124 || killed === true checks carry most of the load, so this is fine — just be aware of the heuristic if you later see weird timed_out: true for non-timeout commands.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@libs/cuabot/src/bashResult.ts` around lines 28 - 42, Refine timed-out detection in bashResultFromExecError: compute timedOut by first checking exitCode === 124 or err?.killed === true, and only if those are false run the heuristic /timed out|timeout expired/i against String(err?.message || ''); keep stderr fallback to 'Command timed out' only when timedOut is true and preserve exit_code, success, timed_out and signal behavior. This change is in the bashResultFromExecError function (variables: exitCode, signal, timedOut, stdout/stderr, exit_code, timed_out).

coderabbitai · 2026-04-27T07:30:05Z

+        const result = await client.bash(cmd);
+        if (result.stdout) console.log(result.stdout);
+        if (result.stderr) console.error(result.stderr);
+        process.exit(exitCodeForBashResult(result));


⚠️ Potential issue | 🟡 Minor

Use raw stream writes to preserve exact command output.

console.log/console.error coerce values and append a trailing newline. Since result.stdout/result.stderr typically already end with \n, this double-newlines and breaks fidelity for piped consumers (the very contract this PR is hardening alongside exit codes).

♻️ Proposed fix

const client = await getClient(); const result = await client.bash(cmd); - if (result.stdout) console.log(result.stdout); - if (result.stderr) console.error(result.stderr); + if (result.stdout) process.stdout.write(result.stdout); + if (result.stderr) process.stderr.write(result.stderr); process.exit(exitCodeForBashResult(result));

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const result = await client.bash(cmd);

if (result.stdout) console.log(result.stdout);

if (result.stderr) console.error(result.stderr);

process.exit(exitCodeForBashResult(result));

const result = await client.bash(cmd);

if (result.stdout) process.stdout.write(result.stdout);

if (result.stderr) process.stderr.write(result.stderr);

process.exit(exitCodeForBashResult(result));

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@libs/cuabot/src/cuabot.tsx` around lines 310 - 313, The current handling of client.bash() output uses console.log/console.error which can add extra newlines and coerce values; change the post-bash output to write raw bytes/strings to the std streams instead—use process.stdout.write(result.stdout) and process.stderr.write(result.stderr) (or ensure Buffer values are written directly) before calling process.exit(exitCodeForBashResult(result)); keep the same conditionals around result.stdout/result.stderr so you only write when present and preserve exact bytes/strings without added newlines.

coderabbitai · 2026-04-27T07:30:05Z

+      const timeoutSeconds = Math.max(timeout / 1000, 0.001);
      const { stdout, stderr } = await execAsync(
-        `docker exec ${getContainerName()} ${scriptPath}`,
-        { timeout }
+        `docker exec ${getContainerName()} timeout ${timeoutSeconds}s ${scriptPath}`,
+        { timeout: timeout + 5000 }
      );


⚠️ Potential issue | 🟡 Minor

timeout=0 becomes an immediate timeout.

Math.max(timeout / 1000, 0.001) maps timeout: 0 to 0.001s, which fires the inner timeout command almost immediately and surfaces as a timed-out failure. If a caller passes 0 expecting "no timeout" (a common convention), this is a footgun. Either document 0 as "minimum interval" or short-circuit non-positive values to skip the timeout wrapper entirely.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@libs/cuabot/src/cuabotd.ts` around lines 343 - 347, The current mapping of timeout to timeoutSeconds causes timeout=0 to become an immediate inner timeout; update the logic in the code that builds the docker exec command (around the variables timeout, timeoutSeconds, scriptPath and the call to execAsync/getContainerName()) so that non-positive timeout values short-circuit and do not wrap the command with the shell "timeout" utility (i.e., run `docker exec <container> <scriptPath>` directly when timeout <= 0), while positive timeouts continue to use `timeout ${timeoutSeconds}s` (compute timeoutSeconds from timeout/1000) and pass the same execAsync options; ensure you preserve the existing execAsync timeout option ({ timeout: timeout + 5000 }) behavior for both paths.

zouyonghe · 2026-05-07T09:32:35Z

This is worth treating as a cross-stack shell execution correctness fix, not a cuabot-only cleanup. Right now a command can fail, time out, or exit non-zero while outer tooling still sees success or loses the real execution state.

That affects the reliability of any wrapper, agent flow, or automation path built on top of these interfaces. The patch is small, but it closes a contract gap across cuabotd, the client and CLI layer, and computer-server, especially around exit-code propagation and success semantics.

Would appreciate quick follow-up here. This sits on a critical execution path, and the current behavior makes shell results harder to trust than they should be.

fix: preserve cuabot shell execution status

563edc5

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

zouyonghe changed the title ~~fix: preserve cuabot shell execution status~~ fix: preserve shell execution status across cuabot and computer-server May 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: preserve shell execution status across cuabot and computer-server#1404

fix: preserve shell execution status across cuabot and computer-server#1404
zouyonghe wants to merge 1 commit into
trycua:mainfrom
zouyonghe:fix/cuabot-shell-contract

zouyonghe commented Apr 27, 2026 •

edited

Loading

Uh oh!

vercel Bot commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

zouyonghe commented May 7, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

zouyonghe commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Summary by CodeRabbit

Uh oh!

vercel Bot commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

zouyonghe commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

zouyonghe commented Apr 27, 2026 •

edited

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

zouyonghe commented May 7, 2026 •

edited

Loading