[DO NOT MERGE]: Test by S1ro1 · Pull Request #2675 · PrimeIntellect-ai/prime-rl

S1ro1 · 2026-05-30T01:44:55Z

Note

Low Risk
Scoped to vLLM LoRA profiling/warmup in inference patches; no changes to training, auth, or persisted data paths.

Overview
Adds a vLLM inference monkey patch so dummy LoRA warmup during profiling no longer crashes on FusedMoEWithLoRA (and related MoE LoRA layers) under expert parallelism.

The new monkey_patch_fused_moe_dummy_lora replaces LoRAModelManager.create_dummy_lora and is registered from transformers_v5_compat alongside the other vLLM patches. For FusedMoEWithLoRA, it limits packed “replacement” slices to the local EP experts and sets n_slices from that narrowed list so dummy weight creation does not index past module.lora_a_stacked (vLLM 0.21.0 bug). It also keeps explicit handling for FusedMoE3DWithLoRA and MoE PackedLoRALayerWeights packing.
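The narrowing step described above can be illustrated with a small, self-contained sketch. It does not use vLLM and is not the patch's actual code: the helper names (`local_expert_ids`, `narrow_slices`), the `(expert_id, projection)` slice representation, the projection names, and the contiguous, evenly divided expert partitioning are all assumptions made for illustration. The point is only that sizing `n_slices` from the local list keeps every index within the per-rank buffer.

```python
# Toy illustration only: plain Python, no vLLM imports.
# All names below are hypothetical and chosen for this sketch.

PROJECTIONS = ("gate_proj", "up_proj", "down_proj")  # assumed projection names


def local_expert_ids(num_experts: int, ep_size: int, ep_rank: int) -> range:
    """Return the expert ids owned by one EP rank.

    Assumes experts are split into contiguous, equally sized chunks.
    """
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    per_rank = num_experts // ep_size
    return range(ep_rank * per_rank, (ep_rank + 1) * per_rank)


def narrow_slices(global_slices, local_ids):
    """Keep only the packed slices whose expert lives on this rank."""
    local = set(local_ids)
    return [s for s in global_slices if s[0] in local]


# One slice per (expert, projection) pair across all 8 global experts.
global_slices = [(e, p) for e in range(8) for p in PROJECTIONS]

# Rank 1 of 4 owns experts 2 and 3 under the assumed partitioning.
local_slices = narrow_slices(global_slices, local_expert_ids(8, 4, 1))

# n_slices is derived from the narrowed list, not the global one.
n_slices = len(local_slices)

# Stand-in for a per-rank stacked buffer sized for local slices only.
stacked = [None] * n_slices
for i, (expert_id, proj) in enumerate(local_slices):
    stacked[i] = f"dummy_lora_a[{expert_id}].{proj}"
```

Iterating over `global_slices` instead would produce 24 indices against a 6-entry buffer, which is the kind of out-of-range access the Overview attributes to dummy weight creation under expert parallelism.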

Reviewed by Cursor Bugbot for commit e6010e3. Bugbot is set up for automated code reviews on this repo.

test

e6010e3

S1ro1 marked this pull request as draft May 31, 2026 10:30

S1ro1 wants to merge 1 commit into main from ep-lora
