Scope `chmod` to only the directories the job needs — avoid world-writable paths on shared clusters.

---

## 6. Container Registry Authentication
**Before submitting any SLURM job that pulls a container image**, check that the cluster has credentials for the image's registry. Missing auth causes jobs to fail after waiting in the queue — a costly mistake.
### Step 1: Detect the container runtime
Different clusters use different container runtimes. Detect which is available:
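
The exact probe varies by site; a minimal detection sketch (the `detect_runtime` helper name and the runtime ordering are illustrative, and the sketch assumes the runtime is on `PATH`):

```shell
#!/usr/bin/env sh
# Probe for a container runtime, in rough order of SLURM-cluster likelihood.
detect_runtime() {
  for rt in enroot singularity apptainer docker podman; do
    if command -v "$rt" >/dev/null 2>&1; then
      printf '%s\n' "$rt"
      return 0
    fi
  done
  printf 'none\n'
  return 1
}

detect_runtime
```

On pyxis-enabled clusters, `srun --help` typically also lists a `--container-image` option, confirming enroot integration with SLURM.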

3. **Suggest an alternative image** on an authenticated registry. NVIDIA clusters typically have NGC auth pre-configured, so prefer NGC-hosted images:

| DockerHub image | NGC alternative |
| --- | --- |
| `vllm/vllm-openai:latest` | `nvcr.io/nvidia/vllm:<YY.MM>-py3` (check [NGC catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm) for latest tag) |
| `nvcr.io/nvidia/tensorrt-llm/release:<tag>` | Already NGC |

> **Note:** NGC image tags follow `YY.MM-py3` format (e.g., `26.03-py3`). Not all DockerHub images have NGC equivalents. If no NGC alternative exists and DockerHub auth is missing, the user must add DockerHub credentials or pre-cache the image as a `.sqsh` file.

4. After the user fixes auth or switches images, verify the image is **actually pullable** before submitting (credentials alone don't guarantee the image exists):

```bash
# enroot — test pull (aborts after manifest fetch)
enroot import --output /dev/null docker://<registry>#<image> 2>&1 | head -10
# Success: shows "Fetching image manifest" + layer info
# Failure: shows "401 Unauthorized" or "404 Not Found"

# docker
docker manifest inspect <image> 2>&1 | head -5

# singularity
singularity pull --dry-run docker://<image> 2>&1 | head -5
```

> **Important**: Credentials existing for a registry does NOT mean a specific image is accessible. The image may not exist, or the credentials may lack permissions for that repository. Always verify the specific image before submitting.
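
The verification output can be triaged mechanically. A hedged helper keyed to the error strings shown above (the `classify_pull_log` name is illustrative, and the strings are assumed stable across registry frontends):

```shell
#!/usr/bin/env sh
# Classify a captured test-pull log into an actionable category.
classify_pull_log() {
  log="$1"
  case "$log" in
    *"401 Unauthorized"*)                      echo "auth-missing"  ;;
    *"unauthorized: authentication required"*) echo "auth-missing"  ;;
    *"404 Not Found"*)                         echo "image-missing" ;;
    *)                                         echo "ok"            ;;
  esac
}
```

Typical usage: `classify_pull_log "$(enroot import --output /dev/null docker://<registry>#<image> 2>&1 | head -10)"`.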

### Common failure modes

| Symptom | Runtime | Cause | Fix |
| --- | --- | --- | --- |
| `curl: (22) ... error: 401` | enroot | No credentials for registry | Add to `~/.config/enroot/.credentials` |
| `pyxis: failed to import docker image` | enroot | Auth failed or rate limit | Check credentials; DockerHub free: 100 pulls/6h per IP |
| `unauthorized: authentication required` | docker | No `docker login` | Run `docker login [registry]` |
| Image pulls on some nodes but not others | any | Cached on one node only | Pre-cache image or ensure auth on all nodes |
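
For the enroot fix above, the credentials file uses netrc format. A sketch for NGC — the `$oauthtoken` literal is NGC's login convention, the key placeholder is yours to fill, and the exact `machine` entries should be verified against your site's enroot documentation:

```
# ~/.config/enroot/.credentials (netrc format)
machine nvcr.io login $oauthtoken password <NGC_API_KEY>
machine authn.nvidia.com login $oauthtoken password <NGC_API_KEY>
```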

---
description: Run commands inside a remote Docker container via the file-based command relay (tools/debugger). Use when the user says "run in Docker", "run on GPU", "debug remotely", "run test in container", "check nvidia-smi", "run pytest in Docker", or needs to execute any command inside a Docker container that shares the repo filesystem. Requires the user to have started server.sh inside the container first.
---

# Remote Docker Debugger

Execute commands inside a Docker container from the host using the file-based command relay.

**Read `tools/debugger/CLAUDE.md` for full usage details** — it has the protocol and examples.

## Quick Reference

```bash
# Check connection
bash tools/debugger/client.sh status

# Connect to server (user must start server.sh in Docker first)
bash tools/debugger/client.sh handshake

# Run a command
bash tools/debugger/client.sh run "<command>"

# Long-running command (default timeout is 600s)
bash tools/debugger/client.sh --timeout 1800 run "<command>"
```

---
All checks must pass before reporting success to the user.
If a cluster config exists (`~/.config/modelopt/clusters.yaml` or `.claude/clusters.yaml`), or the user mentions running on a remote machine:

0. **Check container registry auth** — before submitting any SLURM job with a container image, verify credentials exist on the cluster per `skills/common/slurm-setup.md` section 6. If credentials are missing for the image's registry, ask the user to fix auth or switch to an image on an authenticated registry (e.g., NGC). **Do not submit until auth is confirmed.**

1. **Source remote utilities:**

For NEL-managed deployment (evaluation with self-deployment), use the evaluation …

| Symptom | Cause | Fix |
| --- | --- | --- |
| `Connection refused` on health check | Server still starting | Wait 30-60s for large models; check logs for errors |
| `modelopt_fp4 not supported` | Framework doesn't support FP4 for this model | Check support matrix in `references/support-matrix.md` |
## Unsupported Models
If the model is not in the validated support matrix (`references/support-matrix.md`), deployment may fail due to weight key mismatches, missing architecture mappings, or quantized/unquantized layer confusion. Read `references/unsupported-models.md` for the iterative debug loop: **run → read error → diagnose → patch framework source → re-run**. For kernel-level issues, escalate to the framework team rather than attempting fixes.
## Success Criteria
1. Server process is running and healthy (`/health` returns 200)

---

When deploying a model not in the validated support matrix (`support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.
## Step 1 — Run and collect the error
Submit the deployment job. When it fails, read the full log — focus on the **first** error traceback (not "See root cause above" wrappers). Identify the file and line number in the framework source.
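
Pulling the first traceback out of a long SLURM log can be scripted. A sketch (the `first_traceback` helper name is illustrative, and Python-style tracebacks are assumed):

```shell
#!/usr/bin/env sh
# Print only the FIRST traceback block: from the first "Traceback" line
# up to the next blank line, then stop (skips later wrapper errors).
first_traceback() {
  awk '/Traceback \(most recent call last\)/ { p = 1 }
       p { print }
       p && /^$/ { exit }' "$1"
}
```

Usage: `first_traceback slurm-<jobid>.out`.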
## Step 2 — Diagnose the root cause
Fetch the framework source at the failing line (use `gh api` for the tagged version, or `find` inside the container). Common error categories:

| Category | Symptoms | Examples |
|----------|----------|----------|
| **Weight key mismatch** | `KeyError`, `Unexpected key`, `Missing key` during weight loading | Checkpoint uses `model.language_model.layers.*` but framework expects `model.layers.*`. See [vllm#39406](https://github.com/vllm-project/vllm/pull/39406) |
| **Quantized/unquantized layer confusion** | Wrong layer type loaded, dtype errors, shape mismatches | Framework tries to load unquantized layers with FP4 kernel due to overly broad `quantization_config.ignore` patterns or missing ignore entries. See [sglang#18937](https://github.com/sgl-project/sglang/pull/18937) |
| **Missing architecture support** | `NoneType is not iterable`, `KeyError` on model type, unknown architecture | Framework's model handler doesn't recognize the text backbone type (e.g., `ministral3` not handled in vLLM's `mistral3.py` init). Fix: extend the model type mapping |
| **Transformers version mismatch** | `ImportError`, `KeyError` on config fields | Framework ships with an older transformers that doesn't know the model type. Fix: upgrade transformers after installing the framework |
| **Kernel-level issues** | CUDA errors, `triton` import failures, unsupported ops | Framework lacks kernel support for this model + quantization combo |
## Step 3 — Apply a targeted fix
Focus on **small, targeted patches** to the framework source. Do not modify `config.json` or the checkpoint — fix the framework's handling instead.
### Weight key mismatches and architecture mapping gaps
Patch the framework source in the run script using `sed` or a Python one-liner. Keep patches minimal — change only what's needed to unblock the current error.

```bash
# Example: extend model type mapping in vLLM mistral3.py
FRAMEWORK_FILE=$(find /usr/local/lib -path "*/vllm/model_executor/models/mistral3.py" 2>/dev/null | head -1)
sed -i 's/old_pattern/new_pattern/' "${FRAMEWORK_FILE}"
```
> **Tip**: when locating framework source files inside containers, use `find` instead of Python import — some frameworks print log messages to stdout during import that can corrupt captured paths.
### Speeding up debug iterations (vLLM)
When iterating on fixes, use these flags to shorten the feedback loop:

- **`--load-format dummy`** — skip loading actual model weights. Useful for testing whether the model initializes, config is parsed correctly, and weight keys match without waiting for the full checkpoint load.
- **`VLLM_USE_PRECOMPILED=1 pip install --editable .`** — when patching vLLM source directly (instead of `sed`), this rebuilds only Python code without recompiling C++/CUDA extensions.
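
One way to wire the dummy-load toggle into a debug loop is a small argument builder in the run script — a sketch (the `serve_args` helper and the `vllm serve` argument shape are assumptions; adapt to your launcher):

```shell
#!/usr/bin/env sh
# Build serve arguments; DUMMY=1 adds --load-format dummy so init and
# weight-key errors surface without a full checkpoint load.
serve_args() {
  args="serve $1"
  if [ "${DUMMY:-0}" = "1" ]; then
    args="$args --load-format dummy"
  fi
  printf '%s\n' "$args"
}

serve_args /ckpts/model-fp4
DUMMY=1 serve_args /ckpts/model-fp4
```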
### Quantized/unquantized layer confusion
Check `hf_quant_config.json` ignore patterns against the framework's weight loading logic. The framework may try to load layers listed in `ignore` with quantized kernels, or vice versa. Fix by adjusting the framework's layer filtering logic.
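
Ignore patterns are glob-like, so coverage can be sanity-checked with shell pattern matching. A sketch with illustrative layer names (the `covers` helper is an assumption, glob semantics are assumed — some frameworks use substring or regex matching, so check the loader):

```shell
#!/usr/bin/env sh
# Does this ignore pattern cover this layer name?
covers() {
  case "$2" in
    $1) echo "yes" ;;
    *)  echo "no"  ;;
  esac
}

covers '*mlp*' 'model.layers.0.mlp.gate_proj'
covers '*mlp*' 'model.layers.0.experts.3.w1'   # expert path has no "mlp"
```

The second call shows how an expert layer can slip past an `*mlp*`-style pattern because its name never contains `mlp`.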
### Kernel-level issues
These require framework kernel team involvement. Do NOT attempt to patch kernels. Instead:
51
+
52
+
1. Document the exact error (model, format, framework version, GPU type)
53
+
2. Inform the user: *"This model + quantization combination requires kernel support that isn't available in {framework} v{version}. I'd suggest reaching out to the {framework} kernel team or trying a different framework."*
54
+
3. Suggest trying an alternative framework (vLLM → SGLang → TRT-LLM)
## Step 4 — Re-run and iterate
After applying a fix, resubmit the job. Each iteration may reveal a new error (e.g., fixing the init error exposes a weight loading error). Continue the loop: **run → read error → diagnose → patch → re-run**.
59
+
60
+
Typical iteration count: 1-3 for straightforward fixes, 3-5 for models requiring multiple patches.
61
+
62
+
## Step 5 — Know when to stop
63
+
64
+
**Stop patching and escalate** when:
65
+
66
+

- The error is in compiled CUDA kernels or triton ops (not Python-level)
- The fix requires changes to core framework abstractions (not just model handlers)
- You've done 5+ iterations without the server starting
69
+
70
+
In these cases, inform the user and suggest: trying a different framework, checking for a newer framework version, or filing an issue with the framework team.

---

Prompt the user with "I'll ask you 5 questions to build the base config we'll ad…

4. Safety & Security (like Garak and Safety Harness)
5. Multilingual (like MMATH, Global MMLU, MMLU-Prox)

Only accept options from the categories listed above (Execution, Deployment, Auto-export, Model type, Benchmarks). YOU HAVE TO GATHER THE ANSWERS for the 5 questions before you can build the base config.

> **Note:** These categories come from NEL's `build-config` CLI. **Always run `nel skills build-config --help` first** to get the current options — they may differ from this list (e.g., `chat_reasoning` instead of separate `chat`/`reasoning`, `general_knowledge` instead of `standard`). When the CLI's current options differ from this list, prefer the CLI's options.
When you have all the answers, run the script to build the base config:

If the user needs multi-node evaluation (model >120B, or more throughput), read …
- The docs may show incorrect parameter names for logging. Use `max_logged_requests` and `max_logged_responses` (NOT `max_saved_*` or `max_*`).

Validate the exported checkpoint's quantization pattern matches the recipe. Quantization config patterns can silently miss layers if the model uses non-standard naming (e.g., Gemma4 `experts.*` missed by `*mlp*` patterns) — this only surfaces later as deployment failures. Read `references/checkpoint-validation.md` for the validation script, expected patterns per recipe, and common pattern gaps.
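
A quick structural check along those lines: list weight tensors that lack a companion scale tensor. A sketch over tensor key names (the `unscaled_layers` helper and the `.weight` / `.weight_scale` suffix convention are assumptions; feed it keys from the checkpoint's safetensors index):

```shell
#!/usr/bin/env sh
# Read tensor key names on stdin; print layers whose .weight has no
# matching .weight_scale, i.e. layers the quantization pattern skipped.
unscaled_layers() {
  awk '
    /\.weight$/       { k = $0; sub(/\.weight$/, "", k);       w[k] = 1 }
    /\.weight_scale$/ { k = $0; sub(/\.weight_scale$/, "", k); s[k] = 1 }
    END { for (k in w) if (!(k in s)) print k }
  '
}
```

For example, keys could come from `jq -r '.weight_map | keys[]' model.safetensors.index.json | unscaled_layers` (index layout assumed).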
## Key API Rules
- `mtq.register()` classes **must** define `_setup()` and call it from `__init__`

Report the path and size to the user.
## Common Pitfalls

- **Model-specific dependencies**: Models with `trust_remote_code` may import packages not in the container (e.g., `mamba-ssm` for hybrid Mamba models). See Step 2.5. Use `EXTRA_PIP_DEPS` env var with the launcher, or install manually before running `hf_ptq.py`
- **Transformers version**: New models may need a newer version of transformers than what's installed. Check `config.json` for `transformers_version`. In containers, beware of `PIP_CONSTRAINT` blocking upgrades — see `references/slurm-setup-ptq.md` for workarounds
- **Gated datasets**: Some calibration datasets require HF authentication. Ensure `HF_TOKEN` is set in the job environment, or use `--dataset cnn_dailymail` as a non-gated alternative
- **NFS root_squash + Docker**: See `skills/common/slurm-setup.md` section 5

| Reference | When to read |
| --- | --- |
| `references/launcher-guide.md` | Step 4B only (launcher path) |
| `tools/launcher/CLAUDE.md` | Step 4B only, if you need more launcher detail |
| `references/unsupported-models.md` | Step 4C only (unlisted model) |