Add job cancellation support to the debugger command relay#1262
Add job cancellation support to the debugger command relay#1262
Conversation
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
📝 WalkthroughWalkthroughAdds a debug skill doc and extends the file-based debugger with client-side command logging and a cancel subcommand, plus server-side command execution rework, cancellation handling, atomic markers, and stricter modelopt validation and install failure behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Host Client
participant FS as Relay Dir (shared FS)
participant Server as Container Server
participant Cmd as Command Process
Client->>FS: write .relay/command (cmd_id, cmd_content)
Client->>FS: optionally write handshake/status files
FS->>Server: server polls for command file
Server->>FS: atomically read+remove command file -> write .relay/running (cmd_id:pid)
Server->>Cmd: start bash -c "cmd_content" (background), log -> logfile
Server->>FS: tail logfile -> stream output to Client (via files)
Client->>FS: (optional) write .relay/cancel with target cmd_id
FS->>Server: server detects .relay/cancel, reads target
alt target matches running cmd_id
Server->>Cmd: send SIGTERM -> wait -> send SIGKILL if needed
Server->>logfile: append "[cancelled]"
Server->>FS: write cmd_id.exit (130), remove .relay/running and .relay/cancel
else target mismatch
Server->>FS: clear .relay/cancel and continue
end
Server->>FS: on normal completion write cmd_id.exit (exit_code), remove .relay/running
Client->>FS: read cmd_id.exit and logs
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/debugger/client.sh`:
- Line 94: The current echo "[client] Running: $*" logs the entire command
(which may contain secrets); change it to log only the command identifier
(cmd_id) by default and gate full command text behind an explicit debug flag
(e.g., DEBUG or DEBUG_CLIENT). Update the client.sh location where echo
"[client] Running: $*" appears to: log "[client] Running cmd_id=<cmd_id>" by
default, and add a guarded conditional that, when DEBUG (or DEBUG_CLIENT) is set
(true/1), emits the full command string (the previous "$*") to the logs; ensure
any added flag checks are robust for both "1"/"true" values and reference the
existing cmd_id variable name so reviewers can find the change easily.
In `@tools/debugger/server.sh`:
- Around line 144-145: The current code reads the full command into cmd_content
and prints it, which may leak secrets; change the logging so it does not echo
the full payload—stop using cmd_content in the server log and instead log only
the command identifier (cmd_id) or a redacted/hashed form; update the block that
uses cmd_file, cmd_content and the echo "[server] Executing command $cmd_id:
$cmd_content" to either omit cmd_content entirely or replace it with a safe
redaction/hashing of the content before logging while still preserving cmd_id
for traceability.
- Around line 96-99: The path check using actual.startswith(expected) is
boundary-unsafe and should be replaced with a directory-boundary check using a
canonical commonpath approach (e.g., call Python's os.path.commonpath on real
paths) to ensure actual is inside expected; locate the check referencing
actual.startswith(expected) and change it to compute canonical paths (realpath)
and compare os.path.commonpath([expected, actual]) == expected. Also stop
echoing full command contents from cmd_file into logs: find the variables
cmd_file and cmd_content and replace the logging line that prints "[server]
Executing command $cmd_id: $cmd_content" with a safer log that only includes the
cmd_id and a non-sensitive summary (e.g., truncated length or a redacted
placeholder), or validate/sanitize cmd_content before logging so secrets/API
keys are never emitted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9b5c9d28-466f-46b9-aa50-b44b975a4a2d
📒 Files selected for processing (3)
.claude/skills/debug/SKILL.mdtools/debugger/client.shtools/debugger/server.sh
| # Generate a unique command ID (timestamp + PID to avoid collisions) | ||
| cmd_id="$(date +%s%N)_$$" | ||
|
|
||
| echo "[client] Running: $*" |
There was a problem hiding this comment.
Avoid logging raw command text by default.
Line 94 prints the full command, which can expose inline credentials/tokens in shell/CI logs. Log only cmd_id by default and gate full command logging behind an explicit debug flag.
🔧 Proposed change
- echo "[client] Running: $*"
+ if [[ "${DEBUGGER_LOG_COMMANDS:-0}" == "1" ]]; then
+ echo "[client] Running command id=$cmd_id: $*"
+ else
+ echo "[client] Running command id=$cmd_id"
+ fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/client.sh` at line 94, The current echo "[client] Running: $*"
logs the entire command (which may contain secrets); change it to log only the
command identifier (cmd_id) by default and gate full command text behind an
explicit debug flag (e.g., DEBUG or DEBUG_CLIENT). Update the client.sh location
where echo "[client] Running: $*" appears to: log "[client] Running
cmd_id=<cmd_id>" by default, and add a guarded conditional that, when DEBUG (or
DEBUG_CLIENT) is set (true/1), emits the full command string (the previous "$*")
to the logs; ensure any added flag checks are robust for both "1"/"true" values
and reference the existing cmd_id variable name so reviewers can find the change
easily.
| if not actual.startswith(expected): | ||
| print(f'modelopt loaded from {actual}, expected under {expected}', file=sys.stderr) | ||
| sys.exit(1) | ||
| " 2>&1 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the prefix false-positive behavior vs boundary-safe check.
python - <<'PY'
import os
actual = "/workspace/repo2/modelopt"
expected = "/workspace/repo"
print("startswith:", actual.startswith(expected)) # incorrect acceptance
print("commonpath:", os.path.commonpath([actual, expected]) == expected) # correct rejection
PYRepository: NVIDIA/Model-Optimizer
Length of output: 100
🏁 Script executed:
# Examine the actual file to verify the code at lines 90-100 and 144-145
head -150 tools/debugger/server.sh | tail -70Repository: NVIDIA/Model-Optimizer
Length of output: 2591
Use boundary-safe path validation and avoid logging sensitive command content.
Line 96 uses actual.startswith(expected), which can incorrectly accept sibling paths (e.g., /workspace/repo2 vs /workspace/repo). Use os.path.commonpath for a true directory-boundary check.
Additionally, line 144 logs the full command content without filtering:
cmd_content=$(cat "$cmd_file")
echo "[server] Executing command $cmd_id: $cmd_content"This exposes any secrets, API keys, or credentials present in command files to server logs, violating the guideline that sensitive information must not be hardcoded or logged.
🔧 Proposed changes
For path validation (lines 96-99):
check_modelopt_local() {
- python -c "
+ WORKDIR_ENV=\"$WORKDIR\" python -c "
import modelopt, os, sys
actual = os.path.realpath(modelopt.__path__[0])
-expected = os.path.realpath('$WORKDIR')
-if not actual.startswith(expected):
+expected = os.path.realpath(os.environ['WORKDIR_ENV'])
+if os.path.commonpath([actual, expected]) != expected:
print(f'modelopt loaded from {actual}, expected under {expected}', file=sys.stderr)
sys.exit(1)
" 2>&1
}For command logging (lines 144-145), avoid logging the full command content:
-echo "[server] Executing command $cmd_id: $cmd_content"
+echo "[server] Executing command $cmd_id"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/server.sh` around lines 96 - 99, The path check using
actual.startswith(expected) is boundary-unsafe and should be replaced with a
directory-boundary check using a canonical commonpath approach (e.g., call
Python's os.path.commonpath on real paths) to ensure actual is inside expected;
locate the check referencing actual.startswith(expected) and change it to
compute canonical paths (realpath) and compare os.path.commonpath([expected,
actual]) == expected. Also stop echoing full command contents from cmd_file into
logs: find the variables cmd_file and cmd_content and replace the logging line
that prints "[server] Executing command $cmd_id: $cmd_content" with a safer log
that only includes the cmd_id and a non-sensitive summary (e.g., truncated
length or a redacted placeholder), or validate/sanitize cmd_content before
logging so secrets/API keys are never emitted.
tools/debugger/server.sh
Outdated
| cmd_content=$(cat "$cmd_file") | ||
| echo "[server] Executing command $cmd_id: $cmd_content" |
There was a problem hiding this comment.
Do not log full command payloads in server output.
Lines 144-145 read and print full command content, which can leak credentials/tokens into persisted logs. Prefer logging only command ID (or a redacted form) by default.
🔧 Proposed change
- cmd_content=$(cat "$cmd_file")
- echo "[server] Executing command $cmd_id: $cmd_content"
+ echo "[server] Executing command $cmd_id"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cmd_content=$(cat "$cmd_file") | |
| echo "[server] Executing command $cmd_id: $cmd_content" | |
| echo "[server] Executing command $cmd_id" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/server.sh` around lines 144 - 145, The current code reads the
full command into cmd_content and prints it, which may leak secrets; change the
logging so it does not echo the full payload—stop using cmd_content in the
server log and instead log only the command identifier (cmd_id) or a
redacted/hashed form; update the block that uses cmd_file, cmd_content and the
echo "[server] Executing command $cmd_id: $cmd_content" to either omit
cmd_content entirely or replace it with a safe redaction/hashing of the content
before logging while still preserving cmd_id for traceability.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1262 +/- ##
=======================================
Coverage 76.90% 76.90%
=======================================
Files 350 350
Lines 40524 40524
=======================================
Hits 31166 31166
Misses 9358 9358
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
tools/debugger/server.sh (1)
156-163:⚠️ Potential issue | 🟠 MajorRaw command text is still being written to server logs.
Line 162 echoes the full payload, so any inline token/password passed through
client.sh runends up in persisted logs. Please log onlycmd_idby default and keep raw payload logging behind an explicit debug flag.🔧 Proposed change
- echo "[server] Executing command $cmd_id: $cmd_content" + echo "[server] Executing command $cmd_id"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/debugger/server.sh` around lines 156 - 163, The server currently logs the full raw command payload (cmd_content) which can leak secrets; change the logging in the command execution path to only log the command identifier (cmd_id) by default and remove or redact cmd_content from the echo. Add conditional logging behind an explicit debug flag (e.g., CHECK_DEBUG or DEBUG_VERBOSE) so that raw payloads are only printed when that flag is set (use a test like if [ -n "$DEBUG_VERBOSE" ] to echo cmd_content), and ensure the existing echo line referencing cmd_content is replaced accordingly in the block that reads and removes the command file (cmd_file -> cmd_id/cmd_content).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/debugger/client.sh`:
- Around line 119-120: Replace the non-atomic writes to the cancel marker in
tools/debugger/client.sh (the block that echoes "Sending cancel signal..." and
writes "$cmd_id" > "$RELAY_DIR/cancel") with an atomic write: write the payload
to a per-call temp file in $RELAY_DIR (e.g., "$RELAY_DIR/.cancel.$$" or
similar), fsync if available, then atomically mv the temp file to
"$RELAY_DIR/cancel"; apply the same change to the other occurrence around lines
199-200 to mirror how .relay/running and .exit are handled by the server.
- Around line 121-124: The loop that waits for "$RELAY_DIR/running" to disappear
can falsely time out if the cancelled command finishes and the server
immediately starts the next command; update both wait loops (used in the
timeout-cancel path and the cancel subcommand) to break not only when the
running file is missing but also when its recorded cmd_id differs from the
cancelled target_id. Implement a helper like
wait_for_running_id_to_clear(target_id, timeout) that reads
"$RELAY_DIR/running", extracts the current_id (prefix before ":"), returns
success if the file is gone or current_id != target_id, otherwise sleeps
POLL_INTERVAL and retries until timeout; replace the existing seq/sleep loops
with calls to this helper in both places.
In `@tools/debugger/README.md`:
- Around line 89-99: The Protocol docs describe an outdated pre-buffered flow;
update the README protocol to match the current behavior in
tools/debugger/server.sh and tools/debugger/client.sh by documenting that the
server reads the command file contents, immediately removes the
`.relay/cmd/<timestamp>.sh` file, then executes the command via `bash -c
"$cmd_content"` (rather than running `bash <file>` and deleting at end), and
that the client writes the target `cmd_id` into `.relay/cancel` for
cancellation; also adjust the described on-disk transitions for `.relay/cmd`,
`.relay/running`, `.relay/result/*`, and `.relay/cancel` to reflect these exact
steps and exit code 130 behavior.
In `@tools/debugger/server.sh`:
- Around line 99-116: The check_modelopt_local() helper is currently fooled by
the parent's PYTHONPATH; modify the subprocess invocation used inside
check_modelopt_local to clear PYTHONPATH and run Python in isolated mode so
import resolution ignores the working tree. Concretely, when launching the
inline python check (the python -c "... import modelopt ..." block referenced in
check_modelopt_local), set the environment variable PYTHONPATH="" for that
subprocess and invoke python with the -I flag so the check validates the
actually installed package state rather than the source tree at $WORKDIR.
---
Duplicate comments:
In `@tools/debugger/server.sh`:
- Around line 156-163: The server currently logs the full raw command payload
(cmd_content) which can leak secrets; change the logging in the command
execution path to only log the command identifier (cmd_id) by default and remove
or redact cmd_content from the echo. Add conditional logging behind an explicit
debug flag (e.g., CHECK_DEBUG or DEBUG_VERBOSE) so that raw payloads are only
printed when that flag is set (use a test like if [ -n "$DEBUG_VERBOSE" ] to
echo cmd_content), and ensure the existing echo line referencing cmd_content is
replaced accordingly in the block that reads and removes the command file
(cmd_file -> cmd_id/cmd_content).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 15f5922b-82e3-4c8e-87c9-eefdcbf41ddb
📒 Files selected for processing (5)
.claude/skills/debug/SKILL.mdtools/debugger/CLAUDE.mdtools/debugger/README.mdtools/debugger/client.shtools/debugger/server.sh
✅ Files skipped from review due to trivial changes (2)
- tools/debugger/CLAUDE.md
- .claude/skills/debug/SKILL.md
| echo "Sending cancel signal..." | ||
| echo "$cmd_id" > "$RELAY_DIR/cancel" |
There was a problem hiding this comment.
Write .relay/cancel atomically.
tools/debugger/server.sh removes empty or mismatched cancel markers, but both of these sites create .relay/cancel with plain >. A poll that lands between truncate and write can read an empty file and drop a real cancel request. Use a per-call temp file plus mv, like the server already does for .relay/running and .exit.
🔧 Proposed change
- echo "$cmd_id" > "$RELAY_DIR/cancel"
+ tmp_cancel="$RELAY_DIR/cancel.$$"
+ echo "$cmd_id" > "$tmp_cancel"
+ mv "$tmp_cancel" "$RELAY_DIR/cancel"
...
- echo "$running_id" > "$RELAY_DIR/cancel"
+ tmp_cancel="$RELAY_DIR/cancel.$$"
+ echo "$running_id" > "$tmp_cancel"
+ mv "$tmp_cancel" "$RELAY_DIR/cancel"Also applies to: 199-200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/client.sh` around lines 119 - 120, Replace the non-atomic
writes to the cancel marker in tools/debugger/client.sh (the block that echoes
"Sending cancel signal..." and writes "$cmd_id" > "$RELAY_DIR/cancel") with an
atomic write: write the payload to a per-call temp file in $RELAY_DIR (e.g.,
"$RELAY_DIR/.cancel.$$" or similar), fsync if available, then atomically mv the
temp file to "$RELAY_DIR/cancel"; apply the same change to the other occurrence
around lines 199-200 to mirror how .relay/running and .exit are handled by the
server.
| for _ in $(seq 1 10); do | ||
| [[ -f "$RELAY_DIR/running" ]] || break | ||
| sleep 1 | ||
| done |
There was a problem hiding this comment.
Do not wait for the relay to go idle after cancelling one command.
Both loops only wait for .relay/running to disappear. If the cancelled command finishes and the server immediately starts the next queued command, the marker can be recreated before the next poll and these waits will falsely time out even though the target cmd_id is already gone. Break once .relay/running is missing or its cmd_id no longer matches the cancelled one.
🔧 Suggested shape
wait_for_running_id_to_clear() {
local target_id="$1" timeout="$2" elapsed=0 current_info current_id
while [[ $elapsed -lt $timeout ]]; do
if [[ ! -f "$RELAY_DIR/running" ]]; then
return 0
fi
current_info=$(cat "$RELAY_DIR/running" 2>/dev/null || true)
current_id="${current_info%%:*}"
[[ "$current_id" != "$target_id" ]] && return 0
sleep "$POLL_INTERVAL"
elapsed=$((elapsed + POLL_INTERVAL))
done
return 1
}Apply the same check in both the timeout-cancel path and the cancel subcommand.
Also applies to: 204-211
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/client.sh` around lines 121 - 124, The loop that waits for
"$RELAY_DIR/running" to disappear can falsely time out if the cancelled command
finishes and the server immediately starts the next command; update both wait
loops (used in the timeout-cancel path and the cancel subcommand) to break not
only when the running file is missing but also when its recorded cmd_id differs
from the cancelled target_id. Implement a helper like
wait_for_running_id_to_clear(target_id, timeout) that reads
"$RELAY_DIR/running", extracts the current_id (prefix before ":"), returns
success if the file is gone or current_id != target_id, otherwise sleeps
POLL_INTERVAL and retries until timeout; replace the existing seq/sleep loops
with calls to this helper in both places.
| 1. Client writes a command to `.relay/cmd/<timestamp>.sh` | ||
| 2. Server detects the file, runs `bash <file>` in the workdir, captures output | ||
| 2. Server detects the file, runs `bash <file>` in the workdir in background, writes `.relay/running` | ||
| 3. Server writes `.relay/result/<timestamp>.log` and `.relay/result/<timestamp>.exit` | ||
| 4. Server removes the `.sh` file; client reads results and cleans up | ||
| 4. Server removes the `.sh` file and `.relay/running`; client reads results and cleans up | ||
|
|
||
| ### Cancellation | ||
|
|
||
| 1. Client writes `.relay/cancel` | ||
| 2. Server detects the cancel signal, kills the running command process tree | ||
| 3. Server writes exit code 130 and removes `.relay/running` and `.relay/cancel` | ||
| 4. Client-side timeout also triggers cancellation automatically |
There was a problem hiding this comment.
Protocol section still describes the pre-buffered execution flow.
tools/debugger/server.sh no longer runs bash <file> and deletes the .sh file at the end; it reads the file, removes it immediately, and executes bash -c "$cmd_content". tools/debugger/client.sh also writes the target cmd_id into .relay/cancel. This section should be updated so the documented on-disk state transitions match what users will actually see.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/README.md` around lines 89 - 99, The Protocol docs describe an
outdated pre-buffered flow; update the README protocol to match the current
behavior in tools/debugger/server.sh and tools/debugger/client.sh by documenting
that the server reads the command file contents, immediately removes the
`.relay/cmd/<timestamp>.sh` file, then executes the command via `bash -c
"$cmd_content"` (rather than running `bash <file>` and deleting at end), and
that the client writes the target `cmd_id` into `.relay/cancel` for
cancellation; also adjust the described on-disk transitions for `.relay/cmd`,
`.relay/running`, `.relay/result/*`, and `.relay/cancel` to reflect these exact
steps and exit code 130 behavior.
| # Ensure modelopt is editable-installed from WORKDIR | ||
| check_modelopt_local() { | ||
| python -c " | ||
| import modelopt, os, sys | ||
| actual = os.path.realpath(modelopt.__path__[0]) | ||
| expected = os.path.realpath('$WORKDIR') | ||
| if not actual.startswith(expected): | ||
| print(f'modelopt loaded from {actual}, expected under {expected}', file=sys.stderr) | ||
| sys.exit(1) | ||
| " 2>&1 | ||
| } | ||
|
|
||
| if check_modelopt_local >/dev/null 2>&1; then | ||
| echo "[server] modelopt already editable-installed from $WORKDIR, skipping pip install." | ||
| else | ||
| echo "[server] Installing modelopt (pip install -e .[dev]) ..." | ||
| (cd "$WORKDIR" && pip install -e ".[dev]") || { | ||
| echo "[server] WARNING: pip install failed (exit=$?), continuing anyway." | ||
| } | ||
| (cd "$WORKDIR" && pip install -e ".[dev]") | ||
| if ! check_modelopt_local; then |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Relevant server.sh lines:"
rg -n -C2 'export PYTHONPATH|check_modelopt_local|pip install -e' tools/debugger/server.sh
echo
repo_root="$(pwd)"
echo "Import resolution with PYTHONPATH set to the repo root:"
PYTHONPATH="$repo_root" python - <<'PY'
import importlib.util, os
spec = importlib.util.find_spec("modelopt")
print("PYTHONPATH =", os.environ["PYTHONPATH"])
print("spec is None:", spec is None)
if spec and spec.submodule_search_locations:
print("search locations =", list(spec.submodule_search_locations))
PYRepository: NVIDIA/Model-Optimizer
Length of output: 1054
check_modelopt_local() no longer validates editable install due to PYTHONPATH pollution.
Line 84 exports PYTHONPATH="$WORKDIR" before check_modelopt_local() runs. Python's module resolution honors PYTHONPATH first, so import modelopt succeeds directly from the source tree without requiring editable install metadata. This allows the "already editable-installed" branch to execute even when pip install -e ".[dev]" has never run, causing the server to skip installing dev dependencies it expects.
The subprocess in check_modelopt_local() must isolate itself from the parent's PYTHONPATH. Use PYTHONPATH="" and the -I flag to ensure the check validates actual installed state:
Suggested fix
check_modelopt_local() {
- python -c "
+ WORKDIR_ENV="$WORKDIR" PYTHONPATH="" python -I -c "
import modelopt, os, sys
actual = os.path.realpath(modelopt.__path__[0])
-expected = os.path.realpath('$WORKDIR')
+expected = os.path.realpath(os.environ['WORKDIR_ENV'])
if not actual.startswith(expected):
print(f'modelopt loaded from {actual}, expected under {expected}', file=sys.stderr)
sys.exit(1)
" 2>&1
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Ensure modelopt is editable-installed from WORKDIR | |
| check_modelopt_local() { | |
| python -c " | |
| import modelopt, os, sys | |
| actual = os.path.realpath(modelopt.__path__[0]) | |
| expected = os.path.realpath('$WORKDIR') | |
| if not actual.startswith(expected): | |
| print(f'modelopt loaded from {actual}, expected under {expected}', file=sys.stderr) | |
| sys.exit(1) | |
| " 2>&1 | |
| } | |
| if check_modelopt_local >/dev/null 2>&1; then | |
| echo "[server] modelopt already editable-installed from $WORKDIR, skipping pip install." | |
| else | |
| echo "[server] Installing modelopt (pip install -e .[dev]) ..." | |
| (cd "$WORKDIR" && pip install -e ".[dev]") || { | |
| echo "[server] WARNING: pip install failed (exit=$?), continuing anyway." | |
| } | |
| (cd "$WORKDIR" && pip install -e ".[dev]") | |
| if ! check_modelopt_local; then | |
| # Ensure modelopt is editable-installed from WORKDIR | |
| check_modelopt_local() { | |
| WORKDIR_ENV="$WORKDIR" PYTHONPATH="" python -I -c " | |
| import modelopt, os, sys | |
| actual = os.path.realpath(modelopt.__path__[0]) | |
| expected = os.path.realpath(os.environ['WORKDIR_ENV']) | |
| if not actual.startswith(expected): | |
| print(f'modelopt loaded from {actual}, expected under {expected}', file=sys.stderr) | |
| sys.exit(1) | |
| " 2>&1 | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/debugger/server.sh` around lines 99 - 116, The check_modelopt_local()
helper is currently fooled by the parent's PYTHONPATH; modify the subprocess
invocation used inside check_modelopt_local to clear PYTHONPATH and run Python
in isolated mode so import resolution ignores the working tree. Concretely, when
launching the inline python check (the python -c "... import modelopt ..." block
referenced in check_modelopt_local), set the environment variable PYTHONPATH=""
for that subprocess and invoke python with the -I flag so the check validates
the actually installed package state rather than the source tree at $WORKDIR.
Summary
cancelsubcommand to the client that terminates the currently running command on the serverKey changes
server.sh:
$RELAY_DIR/running(atomic tmp+mv write)$RELAY_DIR/cancelfile with cmd_id verification.exitfile written beforerunningmarker removed (ordering guarantee)set -e-safe:waituses|| exit_code=$?pattern; cleanup trap fully guardedclient.sh:
cancelsubcommand: writes target cmd_id to cancel file, waits for server acknowledgment (30s timeout)runtimeout now sends targeted cancel signal (verifies cmd_id match to avoid killing wrong command)runtimeout cleans up orphaned result filesstatusshows currently running commandflushrejects if a command is currently running (prevents state corruption)Protocol additions
Test plan
client.sh run "sleep 30"), cancel it (client.sh cancel), verify exit code 130--timeout 5 run "sleep 30"), verify auto-cancelstatusduring execution, verify it shows the running commandflushduring execution, verify it is rejected🤖 Generated with Claude Code
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/ASummary by CodeRabbit
Documentation
New Features
cancelcommand to request termination of the currently running remote command.Improvements