[codex] Fix vLLM layerwise reload alias buffers by samsja · Pull Request #2701 · PrimeIntellect-ai/prime-rl

samsja · 2026-06-03T17:06:15Z

Summary

Fix vLLM layerwise reload alias-buffer handling so Mamba/NemotronH reloads do not copy aliased buffers back over parameter storage.
Cover aliases to child-module parameters such as MambaMixer2.conv_weights -> conv1d.weight and skip alias buffers that vLLM intentionally omitted from restore metadata.
Update the weight-update monitoring workflow note to check inference-server /update_weights results, not only trainer-side broadcast logs.

Details

The vLLM API server /update_weights path reloads checkpoint-format weights layer by layer. NemotronH/Mamba can register buffers on a parent module that alias parameter storage owned by a child module. The previous alias handling only covered direct layer parameters and could either copy stale buffer data back over parameter storage or fail when vLLM omitted an alias buffer from the restored module state.

The patch compares aliased buffers against recursive parameter storage and captured kernel parameters, uses module registries instead of getattr, and skips absent alias buffers.
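The alias check described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/prime_rl/inference/patches.py: `ToyMixer`, `find_alias_buffers`, and `restore_buffers` are hypothetical names, the toy module only imitates the `conv_weights -> conv1d.weight` layout, and comparing storage pointers is one assumed way to detect that a buffer shares memory with a parameter.

```python
import torch
from torch import nn


class ToyMixer(nn.Module):
    """Hypothetical stand-in for a Mamba-style mixer (not vLLM's MambaMixer2)."""

    def __init__(self) -> None:
        super().__init__()
        self.conv1d = nn.Conv1d(4, 4, kernel_size=3, groups=4, bias=False)
        w = self.conv1d.weight  # shape (4, 1, 3)
        # Parent-level buffer that is a view of the child's parameter storage.
        self.register_buffer(
            "conv_weights", w.detach().view(w.size(0), w.size(2)), persistent=False
        )
        # Ordinary buffer with its own storage, for contrast.
        self.register_buffer("scale", torch.ones(4))


def storage_ptr(t: torch.Tensor) -> int:
    """Address of the underlying storage; views of one tensor share it."""
    return t.untyped_storage().data_ptr()


def find_alias_buffers(layer: nn.Module) -> dict[str, str]:
    """Map each aliasing buffer name to the parameter whose storage it shares."""
    # named_parameters() recurses by default, so child-owned parameters count.
    owners = {storage_ptr(p): name for name, p in layer.named_parameters()}
    aliases: dict[str, str] = {}
    # Read the buffer registry directly rather than using getattr.
    for name, buf in layer._buffers.items():
        if buf is None:
            continue
        owner = owners.get(storage_ptr(buf))
        if owner is not None:
            aliases[name] = owner
    return aliases


def restore_buffers(layer: nn.Module, saved: dict[str, torch.Tensor]) -> list[str]:
    """Copy saved buffer values back, skipping aliases and absent buffers."""
    aliases = find_alias_buffers(layer)
    restored: list[str] = []
    for name, value in saved.items():
        buf = layer._buffers.get(name)
        if buf is None or name in aliases:
            # Absent buffers have nothing to restore; copying into an alias
            # would overwrite freshly reloaded parameter data.
            continue
        with torch.no_grad():
            buf.copy_(value)
        restored.append(name)
    return restored
```

In this sketch, a stale snapshot of `conv_weights` is never written back, so the reloaded `conv1d.weight` survives, and a buffer missing from the registry is skipped instead of raising.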

Validation

UV_NO_SYNC=1 uv run pytest tests/unit/inference/test_vllm_reload_patches.py
UV_NO_SYNC=1 uv run ruff check src/prime_rl/inference/patches.py tests/unit/inference/test_vllm_reload_patches.py
Combined-stack SLURM job 23599, run with the external-LB config fix from #2705 ([codex] Fix external-LB inference config sizing), brought all 4 API servers to ready and completed rollouts with Error 0.0%. All 16 vLLM workers reloaded checkpoint-format weights, all four /update_weights calls returned 200 OK, inference resumed, and the trainer started step 1. The final log scan found no ERROR, Traceback, Fatal, ValueError, data_parallel_rank, conv_weights, or Internal Server Error failures.
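A log scan of this kind could look roughly like the sketch below. The marker list comes from the validation note above. The function names are hypothetical, and the access-log line shape (`"POST /update_weights HTTP/1.1" 200 OK`) is an assumption about how the inference server logs requests, so the regular expression may need adjusting for real logs.

```python
import re

# Failure markers taken from the validation note; matching is case-sensitive.
FAILURE_MARKERS = (
    "ERROR",
    "Traceback",
    "Fatal",
    "ValueError",
    "data_parallel_rank",
    "conv_weights",
    "Internal Server Error",
)

# Assumed access-log shape: "POST /update_weights HTTP/1.1" 200 OK
UPDATE_WEIGHTS_RE = re.compile(r'"POST /update_weights[^"]*" (\d{3})')


def scan_log(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, first matching marker, line) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in FAILURE_MARKERS:
            if marker in line:
                hits.append((lineno, marker, line.strip()))
                break
    return hits


def update_weights_statuses(text: str) -> list[int]:
    """Collect the HTTP status code of every /update_weights request in the log."""
    return [int(code) for code in UPDATE_WEIGHTS_RE.findall(text)]


def reload_looks_healthy(text: str, expected_calls: int) -> bool:
    """True if every expected /update_weights call returned 200 and no marker hit."""
    statuses = update_weights_statuses(text)
    return (
        len(statuses) == expected_calls
        and all(code == 200 for code in statuses)
        and not scan_log(text)
    )
```

Checking the inference-server log this way complements the trainer-side broadcast logs: a broadcast can succeed while an individual /update_weights call still fails on the server.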

…fter vLLM reload

vLLM 0.22's layerwise reload mis-loads exactly two NemotronH per-layer param families through the online-reload path -- mixer.D (Mamba SSD skip) and the MoE router's gate.e_score_correction_bias -- while loading all other weights correctly. mixer.D becomes non-deterministic garbage/inf (NaN logits) and the gate bias gets a wrong value (broken routing), so generations go to NaN after a weight update.

Restore both from the received broadcast (correct by definition) via each param's own weight_loader.

Also drop monkey_patch_vllm_layerwise_reload_alias_buffers: it crashes on vLLM 0.22 (AttributeError on the delattr'd conv_weights) and conv_weights is handled correctly by vLLM's native reload finalize.

Supersedes #2701.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
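The restore step named in this commit message might be sketched as follows. This is not the code from #2714: the two name suffixes come from the message, while `restore_from_broadcast`, `default_loader`, and the assumption that broadcast tensor names match the model's parameter names are hypothetical simplifications. The `weight_loader` attribute is looked up per parameter, with a plain copy as fallback.

```python
import torch
from torch import nn

# Suffixes of the two param families named in the commit message.
RESTORE_SUFFIXES = ("mixer.D", "gate.e_score_correction_bias")


def default_loader(param: torch.Tensor, loaded: torch.Tensor) -> None:
    """Fallback used when a parameter carries no weight_loader attribute."""
    with torch.no_grad():
        param.copy_(loaded)


def restore_from_broadcast(
    model: nn.Module, received: dict[str, torch.Tensor]
) -> list[str]:
    """Re-apply selected tensors from the received broadcast after a reload.

    Assumes (hypothetically) that keys in `received` match the names
    returned by model.named_parameters().
    """
    params = dict(model.named_parameters())
    restored: list[str] = []
    for name, tensor in received.items():
        if not name.endswith(RESTORE_SUFFIXES):
            continue  # every other weight is left as the reload produced it
        param = params.get(name)
        if param is None:
            continue  # name not present on this worker's model
        # Prefer the parameter's own loader so any custom handling still applies.
        loader = getattr(param, "weight_loader", default_loader)
        loader(param, tensor)
        restored.append(name)
    return restored
```

Using each parameter's own loader keeps any parameter-specific loading logic in one place, rather than duplicating it in the restore path.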

samsja added 2 commits June 3, 2026 22:35

fix vLLM layerwise reload alias buffers

c893e9f

fix vllm mamba layerwise reload aliases

305689d

samsja force-pushed the fix/vllm-mamba-reload-alias-buffers branch from dcc607a to 305689d Compare June 4, 2026 01:32

hallerite mentioned this pull request Jun 4, 2026

fix(inference): restore NemotronH mixer.D after vLLM 0.22 layerwise reload #2714

Merged

samsja wants to merge 2 commits into main from fix/vllm-mamba-reload-alias-buffers
