Merged
2 changes: 1 addition & 1 deletion .agents/skills/common/environment-setup.md
@@ -24,7 +24,7 @@ If previous runs left patches in `modelopt/` (from 4C unlisted model work), chec
2. **User doesn't specify** → check for cluster config:

```bash
-cat ~/.config/modelopt/clusters.yaml 2>/dev/null || cat .claude/clusters.yaml 2>/dev/null
+cat ~/.config/modelopt/clusters.yaml 2>/dev/null || cat .agents/clusters.yaml 2>/dev/null || cat .claude/clusters.yaml 2>/dev/null
```

If a cluster config exists with content → **use the remote cluster** (do not fall back to local even if local GPUs are available — the cluster config indicates the user's preferred execution environment). Otherwise → **local execution**.
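
The lookup order and the remote-versus-local decision described above can be sketched as a small shell helper. This is an illustrative sketch only, not part of the change: the function name `find_cluster_config` is hypothetical, and it assumes "exists with content" means a non-empty file (`-s`). That differs slightly from the `cat` chain, which stops at the first file that exists even if it is empty.

```shell
# Hypothetical helper (not in the repository): print the first cluster
# config that exists and is non-empty, checking in the documented order.
find_cluster_config() {
    for candidate in \
        "$HOME/.config/modelopt/clusters.yaml" \
        ".agents/clusters.yaml" \
        ".claude/clusters.yaml"
    do
        # -s is true only for files that exist and have a size greater than zero
        if [ -s "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the remote cluster whenever a config is found; otherwise run locally.
if config_path=$(find_cluster_config); then
    echo "remote: $config_path"
else
    echo "local"
fi
```

The user-level path is checked first, so a per-user config takes precedence over either project-level file, and `.agents/clusters.yaml` takes precedence over the older `.claude/clusters.yaml` location.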
2 changes: 1 addition & 1 deletion .agents/skills/deployment/SKILL.md
@@ -185,7 +185,7 @@ All checks must pass before reporting success to the user.

### 6. Remote deployment (SSH/SLURM)

-If a cluster config exists (`~/.config/modelopt/clusters.yaml` or `.claude/clusters.yaml`), or the user mentions running on a remote machine:
+If a cluster config exists (`~/.config/modelopt/clusters.yaml`, `.agents/clusters.yaml`, or `.claude/clusters.yaml`), or the user mentions running on a remote machine:

0. **Check container registry auth** — before submitting any SLURM job with a container image, verify credentials exist on the cluster per `skills/common/slurm-setup.md` section 6. If credentials are missing for the image's registry, ask the user to fix auth or switch to an image on an authenticated registry (e.g., NGC). **Do not submit until auth is confirmed.**

2 changes: 1 addition & 1 deletion .agents/skills/deployment/tests/evals.json
@@ -26,7 +26,7 @@
"query": "deploy my quantized model on the SLURM cluster",
"files": [],
"expected_behavior": [
-"Checks for cluster config at ~/.config/modelopt/clusters.yaml or .claude/clusters.yaml",
+"Checks for cluster config at ~/.config/modelopt/clusters.yaml, .agents/clusters.yaml, or .claude/clusters.yaml",
"Sources .agents/skills/common/remote_exec.sh",
"Calls remote_load_cluster, remote_check_ssh, remote_detect_env",
"Checks if checkpoint is already on remote (e.g., from prior PTQ run) before syncing; only syncs if local",