
[codex] Remove vLLM layerwise reload alias patch #2706

Draft
samsja wants to merge 1 commit into main from fix/remove-vllm-layerwise-reload-patch


@samsja (Member) commented Jun 4, 2026

Summary

  • Remove the prime-rl monkey patch for vLLM layerwise reload alias buffers.
  • Stop overriding vLLM's _copy_and_restore_kernel_tensors during the general plugin setup.
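To illustrate what "overriding" means here, below is a minimal pure-Python sketch of a module-level monkey patch and the identity check used later in Validation. Everything in it is hypothetical: the `layerwise` module is a stand-in built with `types.ModuleType`, the function bodies are placeholders, and `plugin_setup` / `reload_fn_is_original` are illustrative names, not prime-rl or vLLM code.

```python
import types

# Hypothetical stand-in for vLLM's layerwise reload module. The real module
# path and the body of _copy_and_restore_kernel_tensors are not shown here.
layerwise = types.ModuleType("layerwise")


def _copy_and_restore_kernel_tensors(layer, kernel_tensors):
    """Placeholder for vLLM's own implementation."""
    return "vllm"


layerwise._copy_and_restore_kernel_tensors = _copy_and_restore_kernel_tensors
# Keep a reference so we can later check that nothing replaced it.
ORIGINAL_FN = layerwise._copy_and_restore_kernel_tensors


def _alias_patch(layer, kernel_tensors):
    """Placeholder for the kind of override this PR removes."""
    return "prime-rl"


def plugin_setup(apply_alias_patch: bool) -> None:
    # Monkey patching rebinds the module attribute, so every later lookup of
    # layerwise._copy_and_restore_kernel_tensors resolves to the override.
    if apply_alias_patch:
        layerwise._copy_and_restore_kernel_tensors = _alias_patch


def reload_fn_is_original() -> bool:
    # Identity check: True only if the attribute is still vLLM's function.
    return layerwise._copy_and_restore_kernel_tensors is ORIGINAL_FN
```

The flag is only for illustration: per the Summary, the PR deletes the override from plugin setup rather than gating it, which corresponds to always taking the `apply_alias_patch=False` path.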

Details

vLLM 0.22.0 already registers MambaMixer2.conv_weights as a non-persistent buffer, and its layerwise reload metadata skips non-persistent buffers that alias parameter storage. Keeping the prime-rl patch conflicts with that behavior: the original kernel tensor list can still contain conv_weights, while the temporary restored layer intentionally does not.
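A minimal PyTorch sketch of the buffer pattern described above, under stated assumptions: `ToyMixer` is a hypothetical stand-in for MambaMixer2, and its `conv_weights` is assumed to be a view of the child parameter `conv1d.weight` registered with `persistent=False`. Such a buffer is absent from `state_dict()` and shares storage with the parameter. The helper `reload_buffer_names` is also hypothetical; it only approximates the skip rule stated here and is not vLLM's reload metadata code.

```python
import torch
from torch import nn


class ToyMixer(nn.Module):
    """Hypothetical stand-in for MambaMixer2 (not vLLM code)."""

    def __init__(self) -> None:
        super().__init__()
        # Depthwise conv: weight has shape (4, 1, 3).
        self.conv1d = nn.Conv1d(4, 4, kernel_size=3, groups=4, bias=False)
        # Assumed alias: a (4, 3) view sharing conv1d.weight's storage,
        # registered as non-persistent so it is excluded from state_dict().
        self.register_buffer(
            "conv_weights", self.conv1d.weight.detach().view(4, 3), persistent=False
        )
        # An ordinary persistent buffer with its own storage, for contrast.
        self.register_buffer("running_state", torch.zeros(4))


def reload_buffer_names(module: nn.Module) -> list[str]:
    """Hypothetical skip rule: drop non-persistent buffers that alias a parameter."""
    state_keys = set(module.state_dict().keys())
    # untyped_storage() is available in PyTorch 2.x.
    param_storages = {p.untyped_storage().data_ptr() for p in module.parameters()}
    kept = []
    for name, buf in module.named_buffers():
        non_persistent = name not in state_keys
        aliases_param = buf.untyped_storage().data_ptr() in param_storages
        if non_persistent and aliases_param:
            # Restoring the parameter restores this view too, so skip it.
            continue
        kept.append(name)
    return kept
```

Because the alias shares storage with `conv1d.weight`, writing to the parameter is immediately visible through `conv_weights`. This is why skipping the buffer during reload loses nothing.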

This returns the checkpoint-format /update_weights path to vLLM's own reload implementation.
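The conflict described in Details can be modeled in plain Python. This is a hypothetical sketch, not the removed patch or vLLM's reload code. It assumes the failure surfaces as an `AttributeError` when code walks the original kernel tensor list and expects every name, including `conv_weights`, on the restored layer. `ToyLayer`, `copy_back_all_original`, and `copy_back_consistent` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLayer:
    # Hypothetical: tensor name -> placeholder values for a layer's tensors.
    tensors: dict[str, list[float]] = field(default_factory=dict)


def copy_back_all_original(original_names: list[str], restored: ToyLayer) -> dict:
    # Models the conflicting assumption: every name captured from the
    # original layer must also exist on the temporary restored layer.
    out = {}
    for name in original_names:
        if name not in restored.tensors:
            raise AttributeError(f"restored layer has no tensor {name!r}")
        out[name] = restored.tensors[name]
    return out


def copy_back_consistent(
    original_names: list[str], restored: ToyLayer, skipped: set[str]
) -> dict:
    # Models the consistent rule: names the reload metadata intentionally
    # skipped (aliased non-persistent buffers) are neither expected nor copied.
    return {n: restored.tensors[n] for n in original_names if n not in skipped}
```

With `original_names = ["conv1d.weight", "conv_weights"]` and a restored layer that holds only `conv1d.weight`, the first function raises while the second returns just the parameter.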

Validation

  • UV_NO_SYNC=1 uv run ruff check src/prime_rl/inference/patches.py
  • Local vLLM reload probe after transformers_v5_compat():
    • verified layerwise._copy_and_restore_kernel_tensors remains the original vLLM function
    • verified MambaMixer2.__init__ registers conv_weights with persistent=False
    • verified vLLM's default reload metadata omits a non-persistent child-parameter alias buffer and restores the layer without AttributeError or parameter corruption
  • Runtime validation on the local stack with #2705 ([codex] Fix external-LB inference config sizing) plus this PR, Slurm job 23624:
    • all 4 API servers reached "Application startup complete"
    • orchestrator initialized NCCL with 4 servers, inference_world_size=16, gpus_per_server=4
    • all 16 inference workers logged "Reloading checkpoint-format weights with vLLM layerwise processing"
    • all 4 /update_weights admin requests returned 200 OK
    • none of "Enabled vLLM layerwise reload alias-buffer patch", "conv_weights", "AttributeError", "Internal Server Error", or "Traceback" appeared in the runtime logs
    • W&B run: https://wandb.ai/primeintellect/nemotron_sami_debug/runs/e2f73ef0996346cf9c4fe91732a4796f
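The log-based runtime checks listed above can be scripted. Below is a hypothetical sketch: the function name `check_logs` and its defaults (4 servers, 16 workers) are illustrative, and the matched strings are taken from the Validation list. It does not check the 200 OK responses, since the admin request log format is not shown here.

```python
from collections.abc import Iterable

# Strings from the Validation list; the helper itself is hypothetical.
FORBIDDEN = (
    "Enabled vLLM layerwise reload alias-buffer patch",
    "conv_weights",
    "AttributeError",
    "Internal Server Error",
    "Traceback",
)
STARTUP_MSG = "Application startup complete"
RELOAD_MSG = "Reloading checkpoint-format weights with vLLM layerwise processing"


def check_logs(lines: Iterable[str], n_servers: int = 4, n_workers: int = 16) -> list[str]:
    """Return a list of problems; an empty list means the logs match expectations."""
    problems = []
    startups = reloads = 0
    for i, line in enumerate(lines, start=1):
        startups += int(STARTUP_MSG in line)
        reloads += int(RELOAD_MSG in line)
        for needle in FORBIDDEN:
            if needle in line:
                problems.append(f"line {i}: found {needle!r}")
    if startups != n_servers:
        problems.append(f"expected {n_servers} startup lines, saw {startups}")
    if reloads != n_workers:
        problems.append(f"expected {n_workers} reload lines, saw {reloads}")
    return problems
```

Returning a list of problems rather than raising on the first one reports every missing or unexpected line from a single pass over the logs.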
