
Commit adcc532

Edwardf0t1 and claude authored and committed
docs(skills): list .agents/clusters.yaml as canonical in deployment + env-setup hints (#1643)
### What does this PR do?

Type of change: documentation

Fixes minor doc drift left by the `.agents/` migration (#1362). That change made `.agents/clusters.yaml` the canonical project-level cluster config (with `.claude/clusters.yaml` kept for back-compat), and updated `remote-execution.md` + `remote_exec.sh` accordingly, but three cluster-config lookup hints still listed only `.claude/clusters.yaml`:

- `.agents/skills/deployment/SKILL.md`: remote-deployment check
- `.agents/skills/deployment/tests/evals.json`: `expected_behavior` assertion
- `.agents/skills/common/environment-setup.md`: cluster-config `cat` snippet

Each now lists `.agents/clusters.yaml` (canonical) ahead of `.claude/clusters.yaml` (back-compat), matching the documented lookup order in `remote-execution.md` and the actual implementation in `remote_exec.sh`.

Verified the rest are already correct: `remote_exec.sh`, `remote-execution.md`, and `.agents/README.md` list `.agents` as canonical with `.claude` labeled back-compat.

### Usage

N/A (documentation only).

### Testing

`json.load` validates the edited `evals.json`; changes are text-only lookup hints (no code paths affected). The implementation (`remote_exec.sh`) already checked `.agents/clusters.yaml` first, so behavior is unchanged.

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A (doc-only)
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A (minor doc fix)
- Did you get Claude approval on this PR?: ⬜ (run `/claude review` before marking ready)

### Additional Information

Follow-up cleanup spun out of the day0-release work (#1596); kept separate to keep that PR scoped.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Documentation**
  * Updated cluster configuration detection to support local project-level configuration files alongside system-wide configurations.
* **Tests**
  * Updated deployment tests to reflect expanded cluster configuration discovery mechanisms.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

1 parent 68b7b64 commit adcc532

3 files changed

Lines changed: 3 additions & 3 deletions


.agents/skills/common/environment-setup.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ If previous runs left patches in `modelopt/` (from 4C unlisted model work), chec
 2. **User doesn't specify** → check for cluster config:
 
 ```bash
-cat ~/.config/modelopt/clusters.yaml 2>/dev/null || cat .claude/clusters.yaml 2>/dev/null
+cat ~/.config/modelopt/clusters.yaml 2>/dev/null || cat .agents/clusters.yaml 2>/dev/null || cat .claude/clusters.yaml 2>/dev/null
 ```
 
 If a cluster config exists with content → **use the remote cluster** (do not fall back to local even if local GPUs are available — the cluster config indicates the user's preferred execution environment). Otherwise → **local execution**.
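
The lookup order in this hunk (user-level config, then `.agents/clusters.yaml`, then `.claude/clusters.yaml`) can also be written as an explicit function. The following is only a sketch: `find_cluster_config` and its two optional arguments are hypothetical names introduced here, not part of `remote_exec.sh`. One assumption worth flagging: `cat` exits 0 on an existing but empty file, so the `||` chain in the hint stops there without printing anything. The sketch instead uses `[ -s ]` to skip empty files and continue to the next candidate, which may or may not match what the real implementation does.

```bash
# Sketch only: find_cluster_config is a hypothetical helper, not taken from remote_exec.sh.
# Prints the first candidate that exists and is non-empty, in the order listed in the hint.
# $1: base directory for the user-level config (defaults to $HOME)
# $2: project directory holding .agents/ and .claude/ (defaults to the current directory)
find_cluster_config() {
  fcc_home="${1:-$HOME}"
  fcc_project="${2:-.}"
  for fcc_candidate in \
    "$fcc_home/.config/modelopt/clusters.yaml" \
    "$fcc_project/.agents/clusters.yaml" \
    "$fcc_project/.claude/clusters.yaml"; do
    # -s is true only when the file exists and its size is greater than zero
    if [ -s "$fcc_candidate" ]; then
      printf '%s\n' "$fcc_candidate"
      return 0
    fi
  done
  return 1
}

# Usage: choose remote execution only when some config has content.
if config_path="$(find_cluster_config)"; then
  echo "remote: using $config_path"
else
  echo "local: no cluster config found"
fi
```

Returning the path rather than the file contents keeps the decision (remote or local) separate from parsing the YAML, which is left to whatever loads the cluster definition.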

.agents/skills/deployment/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ All checks must pass before reporting success to the user.
 
 ### 6. Remote deployment (SSH/SLURM)
 
-If a cluster config exists (`~/.config/modelopt/clusters.yaml` or `.claude/clusters.yaml`), or the user mentions running on a remote machine:
+If a cluster config exists (`~/.config/modelopt/clusters.yaml`, `.agents/clusters.yaml`, or `.claude/clusters.yaml`), or the user mentions running on a remote machine:
 
 0. **Check container registry auth** — before submitting any SLURM job with a container image, verify credentials exist on the cluster per `skills/common/slurm-setup.md` section 6. If credentials are missing for the image's registry, ask the user to fix auth or switch to an image on an authenticated registry (e.g., NGC). **Do not submit until auth is confirmed.**
 

.agents/skills/deployment/tests/evals.json

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 "query": "deploy my quantized model on the SLURM cluster",
 "files": [],
 "expected_behavior": [
-"Checks for cluster config at ~/.config/modelopt/clusters.yaml or .claude/clusters.yaml",
+"Checks for cluster config at ~/.config/modelopt/clusters.yaml, .agents/clusters.yaml, or .claude/clusters.yaml",
 "Sources .agents/skills/common/remote_exec.sh",
 "Calls remote_load_cluster, remote_check_ssh, remote_detect_env",
 "Checks if checkpoint is already on remote (e.g., from prior PTQ run) before syncing; only syncs if local",
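
The commit message says the edited `evals.json` was validated with `json.load`. A comparable check from the shell might look like the sketch below. It assumes `python3` is on `PATH` and uses the standard `json.tool` module for parsing. `check_evals_json` is a hypothetical helper name, and the `grep` step only confirms that the canonical path string appears somewhere in the file; it does not inspect the JSON structure.

```bash
# Sketch only: check_evals_json is a hypothetical helper, not part of the repository.
# Returns 0 when the file parses as JSON and mentions .agents/clusters.yaml,
# 1 when the JSON is invalid, 2 when the canonical path is missing.
check_evals_json() {
  cej_file="$1"
  # python3 -m json.tool exits non-zero when the input is not valid JSON
  if ! python3 -m json.tool "$cej_file" >/dev/null 2>&1; then
    echo "invalid JSON: $cej_file" >&2
    return 1
  fi
  # -F treats the pattern as a fixed string, so the dots are matched literally
  if ! grep -qF '.agents/clusters.yaml' "$cej_file"; then
    echo "no .agents/clusters.yaml hint in: $cej_file" >&2
    return 2
  fi
  return 0
}
```

Run from the repository root, this would be called as `check_evals_json .agents/skills/deployment/tests/evals.json`.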
