fix: preserve shell execution status across cuabot and computer-server#1404
fix: preserve shell execution status across cuabot and computer-server#1404zouyonghe wants to merge 1 commit into
Conversation
|
@zouyonghe is attempting to deploy a commit to the Cua Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughThe changes introduce a new structured result type ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
libs/cuabot/src/cuabotd.ts (1)
322-340:⚠️ Potential issue | 🟡 MinorBackground path silently masks fast successful exits as live PIDs.
In
bgCmd, afterecho $pid; sleep 1, theif ! kill -0 $pidbranch runswait $pid; code=$?; cat log >&2; exit $code. On a fast clean exit (code 0), the wrapper exits 0,execAsyncresolves, and you returnsuccess: truewith a PID that is no longer alive (and the captured log goes only tostderrof the wrapper, not intoresult.stdout).This is inconsistent with the fast-failure case, which correctly propagates a non-zero exit via
bashResultFromExecError. Consider distinguishing the "already exited" outcome explicitly so the result carries the realexit_codeand avoids advertising a stale PID. For example, have the wrapper print the actual outcome on a second stdout line and parse it on the Node side.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/cuabot/src/cuabotd.ts` around lines 322 - 340, The background-launch wrapper (bgCmd) can report a PID even when the launched process already exited, causing success:true with a stale PID; change the wrapper to always print a second marker line after the PID that indicates the real outcome (e.g. "OUTCOME:EXIT:<code>" or "OUTCOME:RUNNING"), and update the Node parsing logic (the code around execAsync, parseInt(stdout...), and the returned object) to read that second stdout line, set exit_code to the actual exit code when the job already exited, set success=false when exit_code !== 0, and only advertise pid as live when the outcome equals RUNNING; also ensure any captured log contents from the wrapper are included in result.stdout or result.stderr so fast-exit logs aren’t lost (use getContainerName(), scriptPath, scriptId references to find the wrapper and parser).
🧹 Nitpick comments (2)
libs/python/computer-server/tests/test_server.py (2)
43-112: Consider reducing the abstract-method boilerplate.The test correctly validates the new
success/return_codesemantics for non-zero exits. However, defining ~20 stubs to satisfyBaseAutomationHandler's abstract methods is repetitive and will need updating whenever the abstract surface changes. Two lighter alternatives:
- Generate stubs dynamically (one-liner over
__abstractmethods__).- Or, since
run_commandis a concrete method onBaseAutomationHandlerthat doesn't depend on any of the abstract methods, test it via an existing concrete subclass (e.g., the Linux/macOS automation handler used intest_run_command_timeout.py's_MinimalHandler) for consistency with the sibling test file.♻️ Example: dynamic stubbing
- class Handler(BaseAutomationHandler): - async def mouse_down(self, x=None, y=None, button="left"): - pass - - async def mouse_up(self, x=None, y=None, button="left"): - pass - # ... (many more stubs) - async def get_cursor_position(self): - pass + # Build a concrete subclass by stubbing all abstract methods. + stubs = { + name: (lambda self, *a, **kw: None) + for name in BaseAutomationHandler.__abstractmethods__ + } + Handler = type("Handler", (BaseAutomationHandler,), stubs)Note: stubs can be sync since the test only exercises
run_command.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/computer-server/tests/test_server.py` around lines 43 - 112, The test TestRunCommand currently defines ~20 no-op async stubs to satisfy BaseAutomationHandler; instead remove the verbose class and either (a) dynamically create a minimal subclass that clears BaseAutomationHandler.__abstractmethods__ and/or assigns no-op methods for those names before instantiating, or (b) reuse the existing concrete _MinimalHandler from test_run_command_timeout.py (which already implements the minimal surface) and call its run_command; in both cases ensure you instantiate that lightweight handler and call run_command("command -v definitely-not-installed") to assert success is False and return_code != 0.
109-112: Minor: prefer a deterministically failing command.
command -v definitely-not-installedrelies on the POSIXcommandbuiltin being present in the default shell and on the name not resolving — both essentially safe assumptions, but a literalfalseorexit 1is shorter, faster, and cannot be perturbed byPATHcontents.♻️ Proposed tweak
- result = await Handler().run_command("command -v definitely-not-installed") + result = await Handler().run_command("exit 1")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/computer-server/tests/test_server.py` around lines 109 - 112, Update the failing command in the test to a deterministic failure: replace the use of "command -v definitely-not-installed" passed to Handler().run_command with a literal failing command like "false" (or "exit 1") so the test cannot be influenced by PATH or shell builtins; keep the existing assertions on result["success"] and result["return_code"] unchanged and only modify the command string in the test invoking Handler().run_command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/cuabot/src/bashResult.ts`:
- Around line 28-42: Refine timed-out detection in bashResultFromExecError:
compute timedOut by first checking exitCode === 124 or err?.killed === true, and
only if those are false run the heuristic /timed out|timeout expired/i against
String(err?.message || ''); keep stderr fallback to 'Command timed out' only
when timedOut is true and preserve exit_code, success, timed_out and signal
behavior. This change is in the bashResultFromExecError function (variables:
exitCode, signal, timedOut, stdout/stderr, exit_code, timed_out).
In `@libs/cuabot/src/cuabot.tsx`:
- Around line 310-313: The current handling of client.bash() output uses
console.log/console.error which can add extra newlines and coerce values; change
the post-bash output to write raw bytes/strings to the std streams instead—use
process.stdout.write(result.stdout) and process.stderr.write(result.stderr) (or
ensure Buffer values are written directly) before calling
process.exit(exitCodeForBashResult(result)); keep the same conditionals around
result.stdout/result.stderr so you only write when present and preserve exact
bytes/strings without added newlines.
In `@libs/cuabot/src/cuabotd.ts`:
- Around line 343-347: The current mapping of timeout to timeoutSeconds causes
timeout=0 to become an immediate inner timeout; update the logic in the code
that builds the docker exec command (around the variables timeout,
timeoutSeconds, scriptPath and the call to execAsync/getContainerName()) so that
non-positive timeout values short-circuit and do not wrap the command with the
shell "timeout" utility (i.e., run `docker exec <container> <scriptPath>`
directly when timeout <= 0), while positive timeouts continue to use `timeout
${timeoutSeconds}s` (compute timeoutSeconds from timeout/1000) and pass the same
execAsync options; ensure you preserve the existing execAsync timeout option ({
timeout: timeout + 5000 }) behavior for both paths.
---
Outside diff comments:
In `@libs/cuabot/src/cuabotd.ts`:
- Around line 322-340: The background-launch wrapper (bgCmd) can report a PID
even when the launched process already exited, causing success:true with a stale
PID; change the wrapper to always print a second marker line after the PID that
indicates the real outcome (e.g. "OUTCOME:EXIT:<code>" or "OUTCOME:RUNNING"),
and update the Node parsing logic (the code around execAsync,
parseInt(stdout...), and the returned object) to read that second stdout line,
set exit_code to the actual exit code when the job already exited, set
success=false when exit_code !== 0, and only advertise pid as live when the
outcome equals RUNNING; also ensure any captured log contents from the wrapper
are included in result.stdout or result.stderr so fast-exit logs aren’t lost
(use getContainerName(), scriptPath, scriptId references to find the wrapper and
parser).
---
Nitpick comments:
In `@libs/python/computer-server/tests/test_server.py`:
- Around line 43-112: The test TestRunCommand currently defines ~20 no-op async
stubs to satisfy BaseAutomationHandler; instead remove the verbose class and
either (a) dynamically create a minimal subclass that clears
BaseAutomationHandler.__abstractmethods__ and/or assigns no-op methods for those
names before instantiating, or (b) reuse the existing concrete _MinimalHandler
from test_run_command_timeout.py (which already implements the minimal surface)
and call its run_command; in both cases ensure you instantiate that lightweight
handler and call run_command("command -v definitely-not-installed") to assert
success is False and return_code != 0.
- Around line 109-112: Update the failing command in the test to a deterministic
failure: replace the use of "command -v definitely-not-installed" passed to
Handler().run_command with a literal failing command like "false" (or "exit 1")
so the test cannot be influenced by PATH or shell builtins; keep the existing
assertions on result["success"] and result["return_code"] unchanged and only
modify the command string in the test invoking Handler().run_command.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fe68b7ea-43a0-4ea9-a266-d8f045cf298c
📒 Files selected for processing (7)
libs/cuabot/src/bashResult.test.tslibs/cuabot/src/bashResult.tslibs/cuabot/src/client.tslibs/cuabot/src/cuabot.tsxlibs/cuabot/src/cuabotd.tslibs/python/computer-server/computer_server/handlers/base.pylibs/python/computer-server/tests/test_server.py
| export function bashResultFromExecError(err: any): BashResult { | ||
| const exitCode = typeof err?.code === 'number' ? err.code : null; | ||
| const signal = typeof err?.signal === 'string' ? err.signal : undefined; | ||
| const timedOut = | ||
| exitCode === 124 || err?.killed === true || /timed out|timeout expired/i.test(String(err?.message || '')); | ||
|
|
||
| return { | ||
| stdout: err?.stdout || '', | ||
| stderr: err?.stderr || (timedOut ? 'Command timed out' : ''), | ||
| exit_code: timedOut && exitCode === null ? null : exitCode, | ||
| success: false, | ||
| timed_out: timedOut || undefined, | ||
| signal, | ||
| }; | ||
| } |
There was a problem hiding this comment.
Heuristic timeout detection on err.message is fragile but acceptable.
util.promisify(exec) errors typically embed the command and (a slice of) stderr in message, so /timed out|timeout expired/i against err.message can in principle false-positive on user output containing those phrases. The added test case (timeout 10s nope) covers the obvious command-name aliasing, and the code === 124 || killed === true checks carry most of the load, so this is fine — just be aware of the heuristic if you later see weird timed_out: true for non-timeout commands.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/cuabot/src/bashResult.ts` around lines 28 - 42, Refine timed-out
detection in bashResultFromExecError: compute timedOut by first checking
exitCode === 124 or err?.killed === true, and only if those are false run the
heuristic /timed out|timeout expired/i against String(err?.message || ''); keep
stderr fallback to 'Command timed out' only when timedOut is true and preserve
exit_code, success, timed_out and signal behavior. This change is in the
bashResultFromExecError function (variables: exitCode, signal, timedOut,
stdout/stderr, exit_code, timed_out).
| const result = await client.bash(cmd); | ||
| if (result.stdout) console.log(result.stdout); | ||
| if (result.stderr) console.error(result.stderr); | ||
| process.exit(exitCodeForBashResult(result)); |
There was a problem hiding this comment.
Use raw stream writes to preserve exact command output.
console.log/console.error coerce values and append a trailing newline. Since result.stdout/result.stderr typically already end with \n, this double-newlines and breaks fidelity for piped consumers (the very contract this PR is hardening alongside exit codes).
♻️ Proposed fix
const client = await getClient();
const result = await client.bash(cmd);
- if (result.stdout) console.log(result.stdout);
- if (result.stderr) console.error(result.stderr);
+ if (result.stdout) process.stdout.write(result.stdout);
+ if (result.stderr) process.stderr.write(result.stderr);
process.exit(exitCodeForBashResult(result));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const result = await client.bash(cmd); | |
| if (result.stdout) console.log(result.stdout); | |
| if (result.stderr) console.error(result.stderr); | |
| process.exit(exitCodeForBashResult(result)); | |
| const result = await client.bash(cmd); | |
| if (result.stdout) process.stdout.write(result.stdout); | |
| if (result.stderr) process.stderr.write(result.stderr); | |
| process.exit(exitCodeForBashResult(result)); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/cuabot/src/cuabot.tsx` around lines 310 - 313, The current handling of
client.bash() output uses console.log/console.error which can add extra newlines
and coerce values; change the post-bash output to write raw bytes/strings to the
std streams instead—use process.stdout.write(result.stdout) and
process.stderr.write(result.stderr) (or ensure Buffer values are written
directly) before calling process.exit(exitCodeForBashResult(result)); keep the
same conditionals around result.stdout/result.stderr so you only write when
present and preserve exact bytes/strings without added newlines.
| const timeoutSeconds = Math.max(timeout / 1000, 0.001); | ||
| const { stdout, stderr } = await execAsync( | ||
| `docker exec ${getContainerName()} ${scriptPath}`, | ||
| { timeout } | ||
| `docker exec ${getContainerName()} timeout ${timeoutSeconds}s ${scriptPath}`, | ||
| { timeout: timeout + 5000 } | ||
| ); |
There was a problem hiding this comment.
timeout=0 becomes an immediate timeout.
Math.max(timeout / 1000, 0.001) maps timeout: 0 to 0.001s, which fires the inner timeout command almost immediately and surfaces as a timed-out failure. If a caller passes 0 expecting "no timeout" (a common convention), this is a footgun. Either document 0 as "minimum interval" or short-circuit non-positive values to skip the timeout wrapper entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/cuabot/src/cuabotd.ts` around lines 343 - 347, The current mapping of
timeout to timeoutSeconds causes timeout=0 to become an immediate inner timeout;
update the logic in the code that builds the docker exec command (around the
variables timeout, timeoutSeconds, scriptPath and the call to
execAsync/getContainerName()) so that non-positive timeout values short-circuit
and do not wrap the command with the shell "timeout" utility (i.e., run `docker
exec <container> <scriptPath>` directly when timeout <= 0), while positive
timeouts continue to use `timeout ${timeoutSeconds}s` (compute timeoutSeconds
from timeout/1000) and pass the same execAsync options; ensure you preserve the
existing execAsync timeout option ({ timeout: timeout + 5000 }) behavior for
both paths.
|
This is worth treating as a cross-stack shell execution correctness fix, not a cuabot-only cleanup. Right now a command can fail, time out, or exit non-zero while outer tooling still sees success or loses the real execution state. That affects the reliability of any wrapper, agent flow, or automation path built on top of these interfaces. The patch is small, but it closes a contract gap across Would appreciate quick follow-up here. This sits on a critical execution path, and the current behavior makes shell results harder to trust than they should be. |
Summary
This is a shell execution correctness fix, not just a cuabot UX change. Today a command can fail or time out while outer tooling still treats the invocation as success or loses the real exit status.
cuabotd, the client, and the CLI withexit_code,success,timed_out,signal, andpid.cuabot --bashback to the local CLI exit status.computer-serverrun_command()semantics sosuccessreflectsreturn_code == 0, and set a fuller GUI shell environment for container commands.Fixes #1403.
Verification
pnpm exec node --import tsx --test src/bashResult.test.tspnpm run builduv run pytest libs/python/computer-server/tests/test_server.py -quv run ruff check libs/python/computer-server/computer_server/handlers/base.py libs/python/computer-server/tests/test_server.pycuabot --bash "exit 42"exits with42exit_code: 127,success: falseexit_code: 124,success: false,timed_out: trueDISPLAY,USER,LOGNAME,HOME,XDG_RUNTIME_DIR,XAUTHORITYSummary by CodeRabbit
Bug Fixes
Tests