[codex] Fix vLLM layerwise reload alias buffers #2701

Draft
samsja wants to merge 2 commits into main from fix/vllm-mamba-reload-alias-buffers

@samsja (Member) commented Jun 3, 2026

Summary

  • Fix vLLM layerwise reload alias-buffer handling so Mamba/NemotronH reloads do not copy aliased buffers back over parameter storage.
  • Cover aliases to child-module parameters such as MambaMixer2.conv_weights -> conv1d.weight and skip alias buffers that vLLM intentionally omitted from restore metadata.
  • Update the weight-update monitoring workflow note to check inference-server /update_weights results, not only trainer-side broadcast logs.

Details

The vLLM API server /update_weights path reloads checkpoint-format weights layer by layer. NemotronH/Mamba can register buffers on a parent module that alias parameter storage owned by a child module. The previous alias handling only covered direct layer parameters and could either copy stale buffer data back over parameter storage or fail when vLLM omitted an alias buffer from the restored module state.

The patch compares aliased buffers against recursive parameter storage and captured kernel parameters, uses module registries instead of getattr, and skips absent alias buffers.
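The aliasing problem described above can be reproduced with a toy module. The sketch below is only an illustration, not the code in src/prime_rl/inference/patches.py: `ToyMixer`, `find_alias_buffers`, and `restore_buffers` are hypothetical names, and the restore step is a simplified stand-in for whatever vLLM's layerwise reload actually does. It shows a parent-level buffer that views a child module's parameter storage, detects the alias by comparing storage pointers against recursive parameters, reads buffers from the module registry rather than via getattr, and skips both alias buffers and buffers missing from the module.

```python
import torch
from torch import nn


class ToyMixer(nn.Module):
    """Hypothetical stand-in for a Mamba-style mixer (not vLLM's MambaMixer2)."""

    def __init__(self) -> None:
        super().__init__()
        self.conv1d = nn.Conv1d(4, 4, kernel_size=3, groups=4, bias=False)
        self.conv1d.weight.requires_grad_(False)
        weight = self.conv1d.weight  # shape (4, 1, 3)
        # Parent-level buffer that is a view of the child's parameter storage.
        self.register_buffer(
            "conv_weights", weight.view(weight.size(0), weight.size(2)), persistent=False
        )
        # Ordinary buffer that owns its own storage.
        self.register_buffer("scale", torch.ones(4))


def storage_ptr(t: torch.Tensor) -> int:
    # Address of the underlying storage, shared by a tensor and its views.
    return t.untyped_storage().data_ptr()


def find_alias_buffers(module: nn.Module) -> dict[str, str]:
    """Map each direct buffer name to the parameter whose storage it shares."""
    # named_parameters() recurses, so child-owned parameters are included.
    owners = {storage_ptr(p): name for name, p in module.named_parameters()}
    aliases = {}
    for name, buf in module._buffers.items():  # registry lookup, no getattr
        if buf is not None and storage_ptr(buf) in owners:
            aliases[name] = owners[storage_ptr(buf)]
    return aliases


def restore_buffers(module: nn.Module, saved: dict[str, torch.Tensor]) -> list[str]:
    """Copy saved buffer values back, skipping aliases and absent buffers."""
    aliases = find_alias_buffers(module)
    restored = []
    for name, value in saved.items():
        if name in aliases:
            continue  # copying would overwrite freshly loaded parameter data
        target = module._buffers.get(name)
        if target is None:
            continue  # buffer is absent from the module state
        target.copy_(value)
        restored.append(name)
    return restored


mixer = ToyMixer()
saved = {name: buf.clone() for name, buf in mixer._buffers.items()}
saved["missing"] = torch.zeros(1)  # simulates a buffer the module no longer has
with torch.no_grad():
    mixer.conv1d.weight.fill_(2.0)  # pretend new weights were just loaded
restored = restore_buffers(mixer, saved)
```

Without the alias check, copying the stale `conv_weights` snapshot back would silently revert `conv1d.weight` to its pre-reload values, because both tensors share one storage.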

Validation

  • UV_NO_SYNC=1 uv run pytest tests/unit/inference/test_vllm_reload_patches.py
  • UV_NO_SYNC=1 uv run ruff check src/prime_rl/inference/patches.py tests/unit/inference/test_vllm_reload_patches.py
  • Combined-stack SLURM job 23599, run with the external-LB config fix from #2705 ([codex] Fix external-LB inference config sizing): all 4 API servers became ready, rollouts completed with Error 0.0%, all 16 vLLM workers reloaded checkpoint-format weights, all 4 /update_weights calls returned 200 OK, inference resumed, and the trainer started step 1. The final log scan found no ERROR, Traceback, Fatal, ValueError, data_parallel_rank, conv_weights, or Internal Server Error failures.
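
The final log scan in the last bullet could be automated along the following lines. This is a minimal sketch, not the script used for job 23599: the marker list is taken from the bullet above, while `scan_log` and the sample log-line format are assumptions made for illustration.

```python
# Failure markers listed in the validation note above.
FAILURE_MARKERS = (
    "ERROR",
    "Traceback",
    "Fatal",
    "ValueError",
    "data_parallel_rank",
    "conv_weights",
    "Internal Server Error",
)


def scan_log(lines):
    """Summarize failure markers and /update_weights results in log lines."""
    failures = [ln for ln in lines if any(m in ln for m in FAILURE_MARKERS)]
    updates = [ln for ln in lines if "/update_weights" in ln]
    ok = [ln for ln in updates if "200 OK" in ln]
    return {
        "failures": failures,
        "update_calls": len(updates),
        "update_ok": len(ok),
    }


# Hypothetical sample lines; real inference-server logs may look different.
sample = [
    'INFO: 10.0.0.1:5000 - "POST /update_weights HTTP/1.1" 200 OK',
    'INFO: 10.0.0.2:5000 - "POST /update_weights HTTP/1.1" 500 Internal Server Error',
    "INFO: inference resumed",
]
report = scan_log(sample)
```

A run counts as clean only if `failures` is empty and `update_ok` equals `update_calls`, which checks the inference-server side rather than only the trainer-side broadcast logs.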

@samsja force-pushed the fix/vllm-mamba-reload-alias-buffers branch from dcc607a to 305689d on June 4, 2026, 01:32
hallerite added a commit that referenced this pull request Jun 4, 2026
…fter vLLM reload

vLLM 0.22's layerwise reload mis-loads exactly two NemotronH per-layer param
families through the online-reload path -- mixer.D (Mamba SSD skip) and the MoE
router's gate.e_score_correction_bias -- while loading all other weights
correctly. mixer.D becomes non-deterministic garbage/inf (NaN logits) and the
gate bias gets a wrong value (broken routing), so generations go to NaN after a
weight update. Restore both from the received broadcast (correct by definition)
via each param's own weight_loader.

Also drop monkey_patch_vllm_layerwise_reload_alias_buffers: it crashes on vLLM
0.22 (AttributeError on the delattr'd conv_weights) and conv_weights is handled
correctly by vLLM's native reload finalize. Supersedes #2701.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>