Commit dd8314b
## Summary
- **vLLM fix**: Add an EAGLE3 speculative decoding code path in
`_deploy_vllm_impl`. It loads the base model with `speculative_config`
instead of treating the unquantized EAGLE3 draft model as a quantized
model, which fails when it looks for a nonexistent `quantization_config`.
- **SGLang fix**: Add Nemotron-specific settings (the
`mamba_scheduler_strategy="extra_buffer"` kwarg and the
`SGLANG_ENABLE_SPEC_V2=1` environment variable) for the hybrid
Mamba+attention architecture; remove SGLang from the Nemotron EAGLE3
test backends because upstream SGLang does not support speculative
decoding with NemotronH.
- Verified vLLM fix on OCI-HSG cluster (job 3020228) — EAGLE3
speculative decoding generates correct output with TP=4
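
The vLLM code path described above can be sketched roughly as follows. This is a minimal illustration, not the code in `_deploy_vllm_impl`: the helper name `build_vllm_kwargs`, the placeholder model paths, and the default of 3 speculative tokens are all assumptions. The only parts taken from this commit are that the base model is loaded with a `speculative_config` and that the draft model is not routed through quantization handling.

```python
from typing import Any, Dict, Optional


def build_vllm_kwargs(
    base_model: str,
    draft_model: Optional[str] = None,
    tensor_parallel_size: int = 1,
    num_speculative_tokens: int = 3,  # assumed default, not from the commit
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for vllm.LLM(...)."""
    kwargs: Dict[str, Any] = {
        "model": base_model,
        "tensor_parallel_size": tensor_parallel_size,
    }
    if draft_model is not None:
        # EAGLE3 path: the draft model is attached to the base model via
        # speculative_config. No quantization method is set, because the
        # draft checkpoint is unquantized and has no quantization_config.
        kwargs["speculative_config"] = {
            "method": "eagle3",
            "model": draft_model,
            "num_speculative_tokens": num_speculative_tokens,
        }
    return kwargs


# Placeholder paths; TP=4 matches the cluster verification above.
example = build_vllm_kwargs(
    "path/to/base-model", "path/to/eagle3-draft", tensor_parallel_size=4
)
```

Under these assumptions, the resulting dictionary would be unpacked into the engine constructor, for example `vllm.LLM(**example)`.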
## Test plan
- [x] vLLM EAGLE3 deploy verified on cluster (4×GPU,
Nemotron-3-Nano-30B-A3B)
- [x] SGLang failure confirmed as an upstream limitation (NemotronH with
`extra_buffer` is not supported for speculative decoding)
- [x] Pre-commit checks pass
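
The SGLang side of the change can be sketched in the same spirit. The two settings (`mamba_scheduler_strategy="extra_buffer"` and `SGLANG_ENABLE_SPEC_V2=1`) come from this commit; the function names, the substring check on the model ID, and the backend names in the usage lines are hypothetical and only illustrate the idea of model-specific settings plus a filtered backend list.

```python
from typing import Any, Dict, List, Tuple


def is_nemotron(model_id: str) -> bool:
    """Hypothetical check; the real test may identify the model differently."""
    return "nemotron" in model_id.lower()


def sglang_settings(model_id: str) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Return (engine kwargs, environment variables) for an SGLang deploy."""
    kwargs: Dict[str, Any] = {}
    env: Dict[str, str] = {}
    if is_nemotron(model_id):
        # Settings named in the commit for the hybrid Mamba+attention model.
        kwargs["mamba_scheduler_strategy"] = "extra_buffer"
        env["SGLANG_ENABLE_SPEC_V2"] = "1"
    return kwargs, env


def eagle3_backends(model_id: str, backends: List[str]) -> List[str]:
    """Drop SGLang for Nemotron EAGLE3 tests (upstream limitation)."""
    if is_nemotron(model_id):
        return [b for b in backends if b != "sglang"]
    return list(backends)


# Illustrative model ID and backend names.
kwargs, env = sglang_settings("Nemotron-3-Nano-30B-A3B")
backends = eagle3_backends("Nemotron-3-Nano-30B-A3B", ["vllm", "sglang"])
```

Returning the environment variables as a separate dictionary keeps the helper free of side effects; a caller could apply them with `os.environ.update(env)` before launching the server.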
🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary by CodeRabbit
* **Tests**
  * Updated deployment configurations for improved Eagle3 speculative
    decoding support across vLLM and SGLang.
  * Enhanced quantization method selection logic for model deployments.
  * Added environment configuration for Nemotron model optimizations.
  * Refined backend compatibility settings in test deployment
    configurations.
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent eb5ed2d commit dd8314b
2 files changed
Lines changed: 43 additions & 22 deletions
Diff contents and file names were not captured; only the hunk line ranges remain.

| File | Original lines removed | New lines added |
|---|---|---|
| 1 of 2 | 312-320 (9 lines) | 312-332 (21 lines) |
| 1 of 2 | 350-361 (12 lines) | 362-380 (19 lines) |
| 2 of 2 | 643 (1 line) | 643-645 (3 lines) |