Commit 9f8188d
[1/N] Polish deployment skills - Add a debug loop for unsupported models (#1236)
### What does this PR do?
Type of change: Skills update
Add a debug loop guide for deploying unsupported models to the
deployment skill. When deploying models not in the validated support
matrix (e.g., newly quantized VLMs or models with new architectures like
Devstral/ministral3), the inference framework (vLLM, SGLang, TRT-LLM)
often fails during model init or weight loading.
This PR adds:
- `references/unsupported-models.md` — a 5-step iterative debug
workflow: **run → read error → diagnose → patch framework source →
re-run**
- A short pointer in `SKILL.md` under "Unsupported Models" (keeps
SKILL.md concise, matching the PTQ skill's pattern)
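The run → read error → diagnose → patch → re-run workflow can be sketched as a small driver loop. This is a minimal illustration, not code from the guide: `debug_loop`, `Attempt`, and the `run`/`patch` callables are hypothetical names, and in practice `run` would wrap whatever command launches the inference server.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    iteration: int
    error: Optional[str]  # None means the launch succeeded


def debug_loop(
    run: Callable[[], Optional[str]],
    patch: Callable[[str], bool],
    max_iterations: int = 5,
) -> List[Attempt]:
    """Run, read the error, try a patch, and re-run until success or give up.

    `run` returns None on success or the error text on failure (hypothetical
    contract). `patch` receives the error text and returns True if a fix was
    applied, False if the problem should be escalated instead.
    """
    history: List[Attempt] = []
    for i in range(1, max_iterations + 1):
        error = run()                      # step 1: run
        history.append(Attempt(i, error))  # step 2: read (record) the error
        if error is None:
            break                          # server started, nothing left to do
        if not patch(error):               # steps 3-4: diagnose and patch
            break                          # no fix available: stop and escalate
    return history                         # step 5 is the next loop iteration


# Simulated deployment that fails twice and then starts cleanly.
outcomes = iter(["KeyError: 'some.weight'", "ValueError: bad config", None])
history = debug_loop(run=lambda: next(outcomes), patch=lambda err: True)
print([(a.iteration, a.error) for a in history])
```

Capping the loop with `max_iterations` and stopping when `patch` returns False keeps the process from cycling on failures that need escalation rather than another patch.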
The guide covers five common error categories with real-world examples:
- **Weight key mismatches** (e.g.,
[vllm#39406](vllm-project/vllm#39406))
- **Quantized/unquantized layer confusion** (e.g.,
[sglang#18937](sgl-project/sglang#18937))
- **Missing architecture support** (e.g., `ministral3` not handled in
vLLM's `mistral3.py`)
- **Transformers version mismatches**
- **Kernel-level issues** (escalate to framework team)
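As a rough illustration of the diagnose step, the five categories above could be triaged with simple pattern matching on the error text. The category labels and regular expressions below are assumptions made for this sketch; they are not taken from the guide or from any framework's actual messages, and real logs will need patterns tuned by hand.

```python
import re
from typing import List, Tuple

# Hypothetical patterns, checked in order; the first match wins.
# More specific categories come before the generic KeyError check.
_PATTERNS: List[Tuple[str, str]] = [
    ("kernel", r"CUDA error|illegal memory access"),
    ("transformers_version", r"cannot import name .+ from 'transformers"),
    ("missing_architecture", r"architectures? .* (is|are) not supported|unsupported model type"),
    ("quantized_layer_confusion", r"weight_scale|input_scale"),
    ("weight_key_mismatch", r"KeyError|unexpected key|missing key"),
]


def classify_error(log_text: str) -> str:
    """Map error text to one of the five categories, or 'unknown'."""
    for category, pattern in _PATTERNS:
        if re.search(pattern, log_text, flags=re.IGNORECASE):
            return category
    return "unknown"


print(classify_error("ValueError: unsupported model type ministral3"))
print(classify_error("RuntimeError: CUDA error: an illegal memory access was encountered"))
```

A result of `kernel` would correspond to the escalation path, while the other categories point at framework source that can be patched and re-run.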
Motivated by deploying a Devstral-Small-2-24B NVFP4 checkpoint on vLLM,
where vLLM's `mistral3.py` didn't handle `ministral3` as a text backbone
model type.
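A quick pre-flight check for this kind of failure is to read the checkpoint's `config.json` and compare the text backbone's `model_type` against the types the framework is known to handle. The sketch below assumes the common Hugging Face layout where a VLM config nests the language model under `text_config`; the example config and the `handled` set are illustrative placeholders, not vLLM's actual list.

```python
from typing import Any, Dict, Iterable


def text_backbone_type(config: Dict[str, Any]) -> str:
    """Return the text backbone's model_type, falling back to the top level."""
    text_config = config.get("text_config") or {}
    return text_config.get("model_type") or config.get("model_type", "")


def is_backbone_handled(config: Dict[str, Any], handled_types: Iterable[str]) -> bool:
    """True if the backbone type is in the caller-supplied set of handled types."""
    return text_backbone_type(config) in set(handled_types)


# Illustrative config, shaped like a VLM wrapper around a text backbone.
config = {"model_type": "mistral3", "text_config": {"model_type": "ministral3"}}
handled = {"mistral", "llama"}  # hypothetical: what a model file might handle

print(text_backbone_type(config))
print(is_backbone_handled(config, handled))
```

If the check fails, the framework's model file for the wrapper architecture is the first place to look when patching.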
### Testing
Validated end-to-end: NVFP4 quantization of Devstral-Small-2-24B → vLLM
deployment on B100 GPUs with the debug loop (3 iterations to get the
server running).
### Before your PR is "*Ready for review*"
- Is this change backward compatible?: N/A (documentation only)
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A (skill documentation)
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
N/A
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Added a deployment guide for unsupported models with an iterative "run
→ read error → diagnose → patch → re-run" troubleshooting workflow,
common failure categories, escalation criteria, and practical
remediation tips.
* Added post-quantization validation guidance and a lightweight script
to verify which layers are quantized vs excluded, plus recommendations
for addressing unexpected layers and MoE/VLM naming gaps.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
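The post-quantization check mentioned in the summary can be approximated from tensor names alone. The following is a minimal sketch and not the script added by this PR: it assumes quantized layers carry companion scale tensors with suffixes such as `.weight_scale` or `.input_scale`, which may not match every export format. The names could be taken from the keys of `weight_map` in a checkpoint's `model.safetensors.index.json`.

```python
from typing import Dict, Iterable, List

# Assumed naming convention for scale tensors; adjust to the actual checkpoint.
SCALE_SUFFIXES = (".weight_scale", ".weight_scale_2", ".input_scale")


def split_quantized(tensor_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group layer prefixes by whether any scale tensor accompanies the weight."""
    has_scale = set()
    has_weight = set()
    for name in tensor_names:
        for suffix in SCALE_SUFFIXES:
            if name.endswith(suffix):
                has_scale.add(name[: -len(suffix)])
                break
        else:
            # No scale suffix matched; record plain weights only.
            if name.endswith(".weight"):
                has_weight.add(name[: -len(".weight")])
    return {
        "quantized": sorted(has_weight & has_scale),
        "unquantized": sorted(has_weight - has_scale),
    }


names = [
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.0.mlp.up_proj.weight_scale",
    "model.layers.0.mlp.up_proj.input_scale",
    "lm_head.weight",
    "model.norm.weight",
]
report = split_quantized(names)
print(report["quantized"])
print(report["unquantized"])
```

The `unquantized` list will normally include norms and embeddings, so the useful signal is a linear, MoE expert, or vision-tower layer appearing in a group where it was not expected.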
---------
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

1 parent d45219b commit 9f8188d
File tree
5 files changed, +166 -0 lines changed
- .claude/skills
  - deployment
    - references
  - ptq
    - references
| File (in diff order) | Added lines (diff line numbers) | Additions |
|---|---|---|
| 1 | 225-228 | 4 |
| 2 | 1-70 | 70 |
| 3 | 116-119, 144 | 5 |
| 4 | 1-86 | 86 |
| 5 | 350 | 1 |