Add file-based command relay for remote Docker testing#1174
Conversation
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughA file-based command relay: a Docker-hosted server monitors a shared Changes
Sequence DiagramsequenceDiagram
participant Host as Host<br/>(client.sh)
participant FS as Shared<br/>Filesystem
participant Docker as Docker<br/>(server.sh)
rect rgba(100,150,255,0.5)
Note over Host,Docker: Handshake
Docker->>FS: Write server.ready
Host->>FS: Read server.ready
Host->>FS: Write client.ready
Docker->>FS: Read client.ready
Docker->>FS: Write handshake.done
end
rect rgba(100,200,100,0.5)
Note over Host,Docker: Command Execution Cycle
Host->>FS: Write cmd/<id>.sh
Docker->>FS: Detect cmd/<id>.sh
Docker->>Docker: Execute cmd/<id>.sh in workdir
Docker->>FS: Write result/<id>.log (stdout+stderr)
Docker->>FS: Write result/<id>.exit (exit code)
Docker->>FS: Delete cmd/<id>.sh
Host->>FS: Read result/<id>.log and result/<id>.exit
Host->>FS: Delete result files
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
tools/debugger/README.md (1)
20-25: Document the trust boundary for.relay/explicitly.Please add a note that any actor with write access to
.relay/cmd/can execute commands in-container, and recommend restricting mount permissions to trusted users/processes only.Suggested wording
## Assumptions - The ModelOpt repo is accessible from both host and container (e.g., bind-mounted) - **HuggingFace models** are mounted at `/hf-local` - The server auto-detects the repo root from the location of `server.sh` +- The shared `.relay/` path is writable only by trusted users/processes; write access to + `.relay/cmd/` implies command execution capability inside the container.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/debugger/README.md` around lines 20 - 25, Add an explicit trust-boundary note to tools/debugger/README.md stating that any actor with write access to .relay/cmd/ can execute commands inside the container (since server.sh auto-detects the repo root and will run dispatched commands), and advise restricting mount permissions so only trusted users/processes can write to .relay/cmd/ (e.g., use read-only mounts where possible, limit host user ownership/ACLs, and avoid bind-mounting directories from untrusted sources).tools/debugger/client.sh (1)
39-45: Add validation for option values and timeout argument type.Lines 41–42 can fail under
set -uif the option argument is missing, and line 102 assumesTIMEOUTis numeric without validation. Use a validation helper to ensure options receive valid arguments and thatTIMEOUTis an integer before arithmetic comparison.Suggested hardening
+require_value() { + local opt="$1" + local val="${2-}" + if [[ -z "$val" || "$val" == --* ]]; then + echo "ERROR: ${opt} requires a value." + exit 1 + fi +} + while [[ $# -gt 0 ]]; do case "$1" in - --relay-dir) RELAY_DIR="$2"; shift 2 ;; - --timeout) TIMEOUT="$2"; shift 2 ;; + --relay-dir) + require_value "$1" "${2-}" + RELAY_DIR="$2" + shift 2 + ;; + --timeout) + require_value "$1" "${2-}" + [[ "$2" =~ ^[0-9]+$ ]] || { echo "ERROR: --timeout must be an integer."; exit 1; } + TIMEOUT="$2" + shift 2 + ;; *) break ;; esac done🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/debugger/client.sh` around lines 39 - 45, The option parsing loop can dereference missing arguments under set -u and TIMEOUT is used as a number later without validation; add a small helper (e.g., require_arg) used by the parsing case to ensure flags like --relay-dir and --timeout have a following value before shifting, and after parsing validate that TIMEOUT is an integer (use a regex/grep check) before any arithmetic comparison (the code that reads/uses TIMEOUT later). Update the case branches that set RELAY_DIR and TIMEOUT to call the helper and fail with a clear message if the argument is absent or TIMEOUT is not numeric.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/debugger/client.sh`:
- Line 95: Replace the direct write to "$CMD_DIR/$cmd_id.sh" with an atomic
publish: create a temp file in the same directory (e.g.,
"$CMD_DIR/$cmd_id.sh.tmp.$$" or via mktemp in CMD_DIR), write the command
contents into that temp file, flush/sync and set executable permissions, then
rename (mv) the temp file to "$CMD_DIR/$cmd_id.sh" so the server never sees a
partially-written file; update the code around the echo "$*" >
"$CMD_DIR/$cmd_id.sh" usage to perform these steps using CMD_DIR and cmd_id.
- Around line 139-142: The flush case calls rm -rf on RELAY_DIR without
validation; before removing, canonicalize RELAY_DIR (e.g., via realpath) and
validate it is non-empty, not "/" and not a top-level system path (reject
prefixes like /, /etc, /bin, /usr, /boot, /sbin, /proc, /sys, /dev), ensure the
resolved path exists and is a subdirectory of an expected safe base (or require
explicit user confirmation if not), and only then perform rm -rf; update the
flush handling in client.sh (the flush case and any code that parses
--relay-dir) to perform these guards.
- Line 135: The command substitution setting pending uses ls and a fallback that
fails under set -euo pipefail; replace it so missing/invalid CMD_DIR yields 0
reliably by either first testing [ -d "$CMD_DIR" ] and setting pending=0 when
absent, or by using find on "$CMD_DIR" (with -maxdepth 1 and -name '*.sh' and
-type f) piped to wc -l while redirecting find errors to /dev/null; update the
assignment that sets pending to use one of these robust approaches (referencing
the pending variable and CMD_DIR).
In `@tools/debugger/server.sh`:
- Around line 103-121: The loop currently iterates over cmd files and deletes
them after running, which can replay on restart; change to atomically claim each
command before running by renaming/moving the file (using mv) to a processing
name (e.g., append .claimed or move into a PROCESS_DIR) and operate on that
claimed path instead of the original $CMD_DIR/$cmd_file; if the mv fails (no-op
or already claimed) skip that file, then run the claimed file from $WORKDIR, tee
output to $RESULT_DIR/$cmd_id.log, write the exit code to
$RESULT_DIR/$cmd_id.exit, and finally remove or archive the claimed file to mark
completion (all references: CMD_DIR, cmd_file, cmd_id, WORKDIR, RESULT_DIR).
- Line 76: The rm -rf "$RELAY_DIR" call is unsafe; add a guard in server.sh that
validates RELAY_DIR (use realpath/resolved path) and refuse to delete if it's
empty, "/", "/", ".", "/", or equals root or other critical system paths, and
ensure it is within an expected base directory (for example the project
workspace or /tmp) or contains an expected substring; if the check fails, print
an error and exit non‑zero. Locate the removal statement referencing RELAY_DIR
and replace it with the preflight validation (resolve path, check not
root/empty/"/" and check prefix or allowed list) before performing the rm -rf.
---
Nitpick comments:
In `@tools/debugger/client.sh`:
- Around line 39-45: The option parsing loop can dereference missing arguments
under set -u and TIMEOUT is used as a number later without validation; add a
small helper (e.g., require_arg) used by the parsing case to ensure flags like
--relay-dir and --timeout have a following value before shifting, and after
parsing validate that TIMEOUT is an integer (use a regex/grep check) before any
arithmetic comparison (the code that reads/uses TIMEOUT later). Update the case
branches that set RELAY_DIR and TIMEOUT to call the helper and fail with a clear
message if the argument is absent or TIMEOUT is not numeric.
In `@tools/debugger/README.md`:
- Around line 20-25: Add an explicit trust-boundary note to
tools/debugger/README.md stating that any actor with write access to .relay/cmd/
can execute commands inside the container (since server.sh auto-detects the repo
root and will run dispatched commands), and advise restricting mount permissions
so only trusted users/processes can write to .relay/cmd/ (e.g., use read-only
mounts where possible, limit host user ownership/ACLs, and avoid bind-mounting
directories from untrusted sources).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 18c84092-667e-4170-a1fa-fdc7a6efef5f
📒 Files selected for processing (5)
tools/debugger/.gitignoretools/debugger/CLAUDE.mdtools/debugger/README.mdtools/debugger/client.shtools/debugger/server.sh
tools/debugger/client.sh
Outdated
| flush) | ||
| if [[ -d "$RELAY_DIR" ]]; then | ||
| rm -rf "$RELAY_DIR" | ||
| echo "Relay directory cleared: $RELAY_DIR" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
rg -n -C2 'flush\)|rm -rf "\$RELAY_DIR"' tools/debugger/client.shRepository: NVIDIA/Model-Optimizer
Length of output: 255
🏁 Script executed:
cat tools/debugger/client.shRepository: NVIDIA/Model-Optimizer
Length of output: 4957
Add validation guards before recursive delete in flush command.
RELAY_DIR is user-controllable via the --relay-dir option and passed directly to rm -rf without validation. A user could run ./client.sh --relay-dir / flush or similar to delete critical system directories. Add safeguards to reject unsafe paths:
Recommended fix
flush)
+ if [[ -z "$RELAY_DIR" || "$RELAY_DIR" == "/" ]]; then
+ echo "ERROR: Refusing to clear unsafe relay dir: '$RELAY_DIR'"
+ exit 1
+ fi
if [[ -d "$RELAY_DIR" ]]; then
- rm -rf "$RELAY_DIR"
+ rm -rf -- "$RELAY_DIR"
echo "Relay directory cleared: $RELAY_DIR"
else
echo "Relay directory does not exist: $RELAY_DIR"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| flush) | |
| if [[ -d "$RELAY_DIR" ]]; then | |
| rm -rf "$RELAY_DIR" | |
| echo "Relay directory cleared: $RELAY_DIR" | |
| flush) | |
| if [[ -z "$RELAY_DIR" || "$RELAY_DIR" == "/" ]]; then | |
| echo "ERROR: Refusing to clear unsafe relay dir: '$RELAY_DIR'" | |
| exit 1 | |
| fi | |
| if [[ -d "$RELAY_DIR" ]]; then | |
| rm -rf -- "$RELAY_DIR" | |
| echo "Relay directory cleared: $RELAY_DIR" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/client.sh` around lines 139 - 142, The flush case calls rm -rf
on RELAY_DIR without validation; before removing, canonicalize RELAY_DIR (e.g.,
via realpath) and validate it is non-empty, not "/" and not a top-level system
path (reject prefixes like /, /etc, /bin, /usr, /boot, /sbin, /proc, /sys,
/dev), ensure the resolved path exists and is a subdirectory of an expected safe
base (or require explicit user confirmation if not), and only then perform rm
-rf; update the flush handling in client.sh (the flush case and any code that
parses --relay-dir) to perform these guards.
| export PYTHONPATH="$WORKDIR" | ||
|
|
||
| # Initialize relay directories | ||
| rm -rf "$RELAY_DIR" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
rg -n -C2 'rm -rf "\$RELAY_DIR"|--relay-dir' tools/debugger/server.shRepository: NVIDIA/Model-Optimizer
Length of output: 815
🏁 Script executed:
cat -n tools/debugger/server.shRepository: NVIDIA/Model-Optimizer
Length of output: 4816
Guard startup cleanup against unsafe relay paths.
Line 76 executes rm -rf on a configurable path without validating it first. An attacker or misconfigured invocation could pass --relay-dir / or another critical path, causing data loss. Add explicit safety checks before deletion.
Suggested guard
-# Initialize relay directories
-rm -rf "$RELAY_DIR"
+# Initialize relay directories
+if [[ -z "$RELAY_DIR" || "$RELAY_DIR" == "/" ]]; then
+ echo "[server] ERROR: refusing unsafe relay dir: '$RELAY_DIR'"
+ exit 1
+fi
+rm -rf -- "$RELAY_DIR"
mkdir -p "$CMD_DIR" "$RESULT_DIR"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/server.sh` at line 76, The rm -rf "$RELAY_DIR" call is unsafe;
add a guard in server.sh that validates RELAY_DIR (use realpath/resolved path)
and refuse to delete if it's empty, "/", "/", ".", "/", or equals root or other
critical system paths, and ensure it is within an expected base directory (for
example the project workspace or /tmp) or contains an expected substring; if the
check fails, print an error and exit non‑zero. Locate the removal statement
referencing RELAY_DIR and replace it with the preflight validation (resolve
path, check not root/empty/"/" and check prefix or allowed list) before
performing the rm -rf.
There was a problem hiding this comment.
Same comment from Claude
- rm -rf "$RELAY_DIR" on server startup is risky (server.sh:91)
If RELAY_DIR is accidentally set to an empty string or /, this would be destructive. Add a guard:
[[ -n "$RELAY_DIR" && "$RELAY_DIR" != "/" ]] || { echo "Invalid relay dir"; exit 1; }
There was a problem hiding this comment.
@chadvoegele, agreed — glad Claude flagged this too. The guard can be made slightly more robust by combining both suggestions, also handling edge cases like . and paths that don't start with / when the user likely intends an absolute path:
# Initialize relay directories
if [[ -z "$RELAY_DIR" || "$RELAY_DIR" == "/" || "$RELAY_DIR" == "." ]]; then
echo "[server] ERROR: refusing unsafe relay dir: '${RELAY_DIR}'"
exit 1
fi
rm -rf -- "$RELAY_DIR"
mkdir -p "$CMD_DIR" "$RESULT_DIR"The -- after rm -rf is also good practice to prevent argument injection if RELAY_DIR somehow starts with a -. The client.sh flush subcommand similarly removes cmd/ and result/ directories — worth verifying it has equivalent guards if --relay-dir is user-supplied there as well.
| while true; do | ||
| for cmd_file in "$CMD_DIR"/*.sh; do | ||
| [[ -e "$cmd_file" ]] || continue | ||
|
|
||
| cmd_id="$(basename "$cmd_file" .sh)" | ||
| echo "[server] Executing command $cmd_id..." | ||
|
|
||
| # Execute the command, tee stdout+stderr to console and result file | ||
| set +e | ||
| (cd "$WORKDIR" && bash "$cmd_file" 2>&1) | tee "$RESULT_DIR/$cmd_id.log" | ||
| exit_code=${PIPESTATUS[0]} | ||
| set -e | ||
|
|
||
| # Write exit code | ||
| echo "$exit_code" > "$RESULT_DIR/$cmd_id.exit" | ||
|
|
||
| # Remove the command file to mark it as processed | ||
| rm -f "$cmd_file" | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
rg -n -C3 'for cmd_file in "\$CMD_DIR"/\*\.sh|bash "\$cmd_file"|rm -f "\$cmd_file"' tools/debugger/server.shRepository: NVIDIA/Model-Optimizer
Length of output: 776
Claim command files before execution to prevent replay on server restart.
The current flow executes commands, then deletes them afterward. If the server crashes between execution and deletion, the same command file will be re-executed on restart.
Use atomic claiming with mv before execution to prevent this race condition:
Atomic claim-before-run pattern
while true; do
for cmd_file in "$CMD_DIR"/*.sh; do
[[ -e "$cmd_file" ]] || continue
cmd_id="$(basename "$cmd_file" .sh)"
+ claimed_file="$CMD_DIR/$cmd_id.running.$$"
+ if ! mv "$cmd_file" "$claimed_file" 2>/dev/null; then
+ continue
+ fi
echo "[server] Executing command $cmd_id..."
# Execute the command, tee stdout+stderr to console and result file
set +e
- (cd "$WORKDIR" && bash "$cmd_file" 2>&1) | tee "$RESULT_DIR/$cmd_id.log"
+ (cd "$WORKDIR" && bash "$claimed_file" 2>&1) | tee "$RESULT_DIR/$cmd_id.log"
exit_code=${PIPESTATUS[0]}
set -e
@@
- rm -f "$cmd_file"
+ rm -f "$claimed_file"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while true; do | |
| for cmd_file in "$CMD_DIR"/*.sh; do | |
| [[ -e "$cmd_file" ]] || continue | |
| cmd_id="$(basename "$cmd_file" .sh)" | |
| echo "[server] Executing command $cmd_id..." | |
| # Execute the command, tee stdout+stderr to console and result file | |
| set +e | |
| (cd "$WORKDIR" && bash "$cmd_file" 2>&1) | tee "$RESULT_DIR/$cmd_id.log" | |
| exit_code=${PIPESTATUS[0]} | |
| set -e | |
| # Write exit code | |
| echo "$exit_code" > "$RESULT_DIR/$cmd_id.exit" | |
| # Remove the command file to mark it as processed | |
| rm -f "$cmd_file" | |
| while true; do | |
| for cmd_file in "$CMD_DIR"/*.sh; do | |
| [[ -e "$cmd_file" ]] || continue | |
| cmd_id="$(basename "$cmd_file" .sh)" | |
| claimed_file="$CMD_DIR/$cmd_id.running.$$" | |
| if ! mv "$cmd_file" "$claimed_file" 2>/dev/null; then | |
| continue | |
| fi | |
| echo "[server] Executing command $cmd_id..." | |
| # Execute the command, tee stdout+stderr to console and result file | |
| set +e | |
| (cd "$WORKDIR" && bash "$claimed_file" 2>&1) | tee "$RESULT_DIR/$cmd_id.log" | |
| exit_code=${PIPESTATUS[0]} | |
| set -e | |
| # Write exit code | |
| echo "$exit_code" > "$RESULT_DIR/$cmd_id.exit" | |
| # Remove the command file to mark it as processed | |
| rm -f "$claimed_file" | |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/server.sh` around lines 103 - 121, The loop currently iterates
over cmd files and deletes them after running, which can replay on restart;
change to atomically claim each command before running by renaming/moving the
file (using mv) to a processing name (e.g., append .claimed or move into a
PROCESS_DIR) and operate on that claimed path instead of the original
$CMD_DIR/$cmd_file; if the mv fails (no-op or already claimed) skip that file,
then run the claimed file from $WORKDIR, tee output to $RESULT_DIR/$cmd_id.log,
write the exit code to $RESULT_DIR/$cmd_id.exit, and finally remove or archive
the claimed file to mark completion (all references: CMD_DIR, cmd_file, cmd_id,
WORKDIR, RESULT_DIR).
|
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
tools/debugger/client.sh (2)
142-143:⚠️ Potential issue | 🟡 MinorUse
findfor pending count robustness (same issue as prior review).Line 143 still relies on
lsfor counting*.shfiles. This is brittle with shell/glob edge cases;findis safer and aligns with SC2012.#!/usr/bin/env bash set -euo pipefail tmp="$(mktemp -d)" touch "$tmp/a.sh" "$tmp/b.sh" echo "Current pattern:" bash -lc "set -euo pipefail; CMD_DIR='$tmp'; pending=\$(ls \"\$CMD_DIR\"/*.sh 2>/dev/null | wc -l); echo pending=\$pending" echo "Recommended pattern:" bash -lc "set -euo pipefail; CMD_DIR='$tmp'; pending=\$(find \"\$CMD_DIR\" -maxdepth 1 -type f -name '*.sh' | wc -l); echo pending=\$pending" rm -rf "$tmp"Suggested change
if [[ -d "$CMD_DIR" ]]; then - pending=$(ls "$CMD_DIR"/*.sh 2>/dev/null | wc -l) + pending=$(find "$CMD_DIR" -maxdepth 1 -type f -name '*.sh' | wc -l) else pending=0 fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/debugger/client.sh` around lines 142 - 143, Replace the fragile ls-based count of shell scripts with a find-based count: in the block that checks [[ -d "$CMD_DIR" ]], update the computation of the pending variable (the pending= assignment) to use find on $CMD_DIR with -maxdepth 1 -type f -name '*.sh' and pipe to wc -l (suppressing errors) so it robustly counts files and satisfies SC2012; keep the existing conditional around CMD_DIR and preserve error redirection to /dev/null.
150-153:⚠️ Potential issue | 🔴 CriticalGuard
rm -rfagainst unsafe relay paths (still unresolved).Line 152 deletes a user-provided path without safety checks.
--relay-dir /(or similarly dangerous paths) can cause destructive deletion.Suggested safety checks
flush) if [[ -d "$RELAY_DIR" ]]; then - rm -rf "$RELAY_DIR" + resolved="$(realpath -m "$RELAY_DIR")" + case "$resolved" in + "/"|"/bin"|"/boot"|"/dev"|"/etc"|"/lib"|"/lib64"|"/proc"|"/root"|"/run"|"/sbin"|"/sys"|"/usr"|"/var") + echo "ERROR: Refusing to clear unsafe relay dir: $resolved" + exit 1 + ;; + esac + rm -rf -- "$resolved" echo "Relay directory cleared: $RELAY_DIR" else echo "Relay directory does not exist: $RELAY_DIR"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/debugger/client.sh` around lines 150 - 153, The flush case currently runs rm -rf "$RELAY_DIR" unsafely; modify the flush) branch to validate RELAY_DIR before deletion: ensure RELAY_DIR is non-empty, resolve it with realpath (or readlink -f) into a variable (e.g., resolved_relay_dir), reject if it equals "/" or "." or "/" + "" or contains ".." or is shorter than a minimal safe prefix, and (preferably) enforce it is under an allowed base directory (or match a whitelist/regex); only after passing these checks prompt the user for confirmation and then run rm -rf "$resolved_relay_dir". Reference the flush) case and the RELAY_DIR usage to locate where to add the validation and confirmation logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/debugger/client.sh`:
- Around line 84-96: The run command handler currently allows empty invocations
and writes an empty script; add an early guard in the run case (before
generating cmd_id and writing files) that checks for no arguments (e.g., test
"$#" -eq 0 or test -z "$*") and prints a usage/error like "ERROR: No command
provided. Usage: client.sh run <command>" then exit 1; keep the rest of the
logic that creates cmd_id, writes to "$CMD_DIR/$cmd_id.sh.tmp" and moves it
intact.
---
Duplicate comments:
In `@tools/debugger/client.sh`:
- Around line 142-143: Replace the fragile ls-based count of shell scripts with
a find-based count: in the block that checks [[ -d "$CMD_DIR" ]], update the
computation of the pending variable (the pending= assignment) to use find on
$CMD_DIR with -maxdepth 1 -type f -name '*.sh' and pipe to wc -l (suppressing
errors) so it robustly counts files and satisfies SC2012; keep the existing
conditional around CMD_DIR and preserve error redirection to /dev/null.
- Around line 150-153: The flush case currently runs rm -rf "$RELAY_DIR"
unsafely; modify the flush) branch to validate RELAY_DIR before deletion: ensure
RELAY_DIR is non-empty, resolve it with realpath (or readlink -f) into a
variable (e.g., resolved_relay_dir), reject if it equals "/" or "." or "/" + ""
or contains ".." or is shorter than a minimal safe prefix, and (preferably)
enforce it is under an allowed base directory (or match a whitelist/regex); only
after passing these checks prompt the user for confirmation and then run rm -rf
"$resolved_relay_dir". Reference the flush) case and the RELAY_DIR usage to
locate where to add the validation and confirmation logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fee62419-0770-464f-b29c-6aacef9c8efa
📒 Files selected for processing (2)
tools/debugger/client.shtools/debugger/server.sh
✅ Files skipped from review due to trivial changes (1)
- tools/debugger/server.sh
| while [[ $# -gt 0 ]]; do | ||
| case "$1" in | ||
| --relay-dir) RELAY_DIR="$2"; shift 2 ;; | ||
| --timeout) TIMEOUT="$2"; shift 2 ;; | ||
| *) break ;; |
There was a problem hiding this comment.
Validate option values before use.
Line 41 and Line 42 assume $2 exists, and Line 109 assumes TIMEOUT is numeric. client.sh --timeout foo or missing option values can fail with shell errors instead of a clear user-facing message.
Suggested hardening
while [[ $# -gt 0 ]]; do
case "$1" in
- --relay-dir) RELAY_DIR="$2"; shift 2 ;;
- --timeout) TIMEOUT="$2"; shift 2 ;;
+ --relay-dir)
+ [[ $# -ge 2 ]] || { echo "ERROR: --relay-dir requires a value."; exit 1; }
+ RELAY_DIR="$2"; shift 2 ;;
+ --timeout)
+ [[ $# -ge 2 ]] || { echo "ERROR: --timeout requires a value."; exit 1; }
+ TIMEOUT="$2"; shift 2 ;;
*) break ;;
esac
done
+
+[[ "$TIMEOUT" =~ ^[0-9]+$ && "$TIMEOUT" -gt 0 ]] || {
+ echo "ERROR: --timeout must be a positive integer."
+ exit 1
+}Also applies to: 109-109
| run) | ||
| # Verify handshake was done | ||
| if [[ ! -f "$RELAY_DIR/handshake.done" ]]; then | ||
| echo "ERROR: Not connected. Run 'client.sh handshake' first." | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Generate a unique command ID (timestamp + PID to avoid collisions) | ||
| cmd_id="$(date +%s%N)_$$" | ||
|
|
||
| # Write the command file atomically (tmp + mv) | ||
| echo "$*" > "$CMD_DIR/$cmd_id.sh.tmp" | ||
| mv "$CMD_DIR/$cmd_id.sh.tmp" "$CMD_DIR/$cmd_id.sh" |
There was a problem hiding this comment.
Reject empty run commands early.
Line 84 onward does not validate command presence. run without args currently writes an empty script. Return a usage error when no command is provided.
Suggested guard
run)
+ if [[ $# -eq 0 ]]; then
+ echo "ERROR: Missing command. Usage: $0 run <command...>"
+ exit 1
+ fi
# Verify handshake was done
if [[ ! -f "$RELAY_DIR/handshake.done" ]]; then🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/client.sh` around lines 84 - 96, The run command handler
currently allows empty invocations and writes an empty script; add an early
guard in the run case (before generating cmd_id and writing files) that checks
for no arguments (e.g., test "$#" -eq 0 or test -z "$*") and prints a
usage/error like "ERROR: No command provided. Usage: client.sh run <command>"
then exit 1; keep the rest of the logic that creates cmd_id, writes to
"$CMD_DIR/$cmd_id.sh.tmp" and moves it intact.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1174 +/- ##
==========================================
- Coverage 74.27% 74.07% -0.20%
==========================================
Files 349 349
Lines 39846 40277 +431
==========================================
+ Hits 29594 29835 +241
- Misses 10252 10442 +190
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ChenhanYu
left a comment
There was a problem hiding this comment.
PR Review: Add file-based command relay for remote Docker testing
Summary
Clean, well-documented developer tool. File-based relay protocol is simple and effective — no networking, atomic file writes for result signaling, good handshake/status/flush subcommands. Documentation (README + CLAUDE.md) is thorough. Tested with a real PTQ workload. A few items below.
Comments
1. sync with file argument is not portable (server.sh:126)
On Linux, sync doesn't accept file arguments (that's macOS/BSD). This silently syncs all filesystems, which is slow and not the intent. Since you already use atomic tmp+mv for .exit and the client only reads .log after .exit exists, the ordering is correct without this line. Safe to remove.
2. Client timeout doesn't cancel server-side execution (client.sh:113-116)
Removing the .sh file only prevents future execution. If the server already picked up the command, it continues running with no way to cancel. Consider writing a $cmd_id.cancel file that the server checks, or documenting this as a known limitation.
3. pip install -e .[dev] on every server start (server.sh:98-101)
Runs on every server start even when the environment is already set up. For a large project this can take 30+ seconds. Consider a --skip-install flag or checking if modelopt is already installed.
4. kill -- -$$ may affect unrelated processes (server.sh:79)
Sends SIGTERM to the entire process group. If the server was launched from an interactive shell (common in Docker), $$ is the shell's process group, which could include other processes. Safer to track child PIDs explicitly or use pkill -P $$.
5. Copyright year is 2024
New files added in 2026 have Copyright (c) 2024. Should be 2026.
6. Minor: ls output parsed for counting (client.sh:143)
ls "$CMD_DIR"/*.sh 2>/dev/null | wc -l — fragile if filenames contain newlines. Safer: files=("$CMD_DIR"/*.sh); pending=${#files[@]}.
Overall: clean dev tool, no blockers. The sync portability issue and forced pip install are the most actionable.
This is an AI-assisted review — human sign-off required before merging.
| export PYTHONPATH="$WORKDIR" | ||
|
|
||
| # Initialize relay directories | ||
| rm -rf "$RELAY_DIR" |
There was a problem hiding this comment.
Same comment from Claude
- rm -rf "$RELAY_DIR" on server startup is risky (server.sh:91)
If RELAY_DIR is accidentally set to an empty string or /, this would be destructive. Add a guard:
[[ -n "$RELAY_DIR" && "$RELAY_DIR" != "/" ]] || { echo "Invalid relay dir"; exit 1; }
| | Path | Description | | ||
| |------|-------------| | ||
| | Repo root (auto-detected) | ModelOpt source, used as workdir | | ||
| | `/hf-local` | HuggingFace model cache | |
There was a problem hiding this comment.
Just the file-based relay doesn't have anything to do with hf-local , right?
Further I think if you rooted the whole thing under '${model-opt}/tools/debugger', then you could remove the model opt references too and this could be purely a portable file-based relay.
There was a problem hiding this comment.
Just the file-based relay doesn't have anything to do with hf-local , right? Yes
There was a problem hiding this comment.
Sure. This is to provide the info so that claude knows where to modify code regards to modelopt during the debugging process.
ChenhanYu
left a comment
There was a problem hiding this comment.
All review feedback addressed cleanly. sync removed, copyright updated, ls → find, kill -- -$$ → pkill -P $$, pip install now skips when already editable-installed. Nice additions: re-handshake support and smarter flush that preserves server.ready. LGTM.
Summary
tools/debugger/) that enables Claude Code (or any host-side automation) to execute commands inside a remote Docker container using only a shared filesystem — no networking setup required.pip install -e .[dev]), setsPYTHONPATH, and listens for commands.handshake,run,status, andflushsubcommands.README.md(full protocol docs) andCLAUDE.md(quick reference for Claude Code).Tested
nvfp4_experts_onlyquantization via the relay:Test plan
server.shinside a Docker container with the repo mountedclient.sh handshakefrom the hostclient.sh run "echo hello"and verify outputclient.sh flushand verify.relay/is cleared🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Chores