[DO NOT MERGE]: Test #2675

Draft
S1ro1 wants to merge 1 commit into main from ep-lora

S1ro1 (Collaborator) commented May 30, 2026

Note

Low Risk
Scoped to vLLM LoRA profiling/warmup in inference patches; no changes to training, auth, or persisted data paths.

Overview
Adds a vLLM inference monkey patch so dummy LoRA warmup during profiling no longer crashes on FusedMoEWithLoRA (and related MoE LoRA layers) under expert parallelism.

The new monkey_patch_fused_moe_dummy_lora replaces LoRAModelManager.create_dummy_lora and is registered from transformers_v5_compat alongside the other vLLM patches. For FusedMoEWithLoRA, it restricts the packed "replacement" slices to the experts local to the current EP rank and derives n_slices from that narrowed list, so dummy weight creation no longer indexes past the end of module.lora_a_stacked (a bug in vLLM 0.21.0). It also keeps explicit handling for FusedMoE3DWithLoRA and MoE PackedLoRALayerWeights packing.
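To illustrate the idea only, here is a minimal, self-contained sketch of the narrowing step. It does not use vLLM: ToyMoELayer, ToyManager, patched_create_dummy_lora, and monkey_patch_toy_manager are hypothetical stand-ins. The even, contiguous split of experts across EP ranks and the one-buffer-per-local-expert layout are assumptions made for the example, not a description of the actual patch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMoELayer:
    """Hypothetical stand-in for a MoE LoRA layer; not a vLLM class."""
    num_global_experts: int
    ep_rank: int
    ep_size: int
    lora_a_stacked: list = field(init=False)

    def __post_init__(self):
        # Assumption: each EP rank allocates buffers only for its own experts.
        self.lora_a_stacked = [f"lora_a[{i}]" for i in self.local_expert_ids()]

    def local_expert_ids(self):
        # Assumption: experts are split evenly and contiguously across ranks.
        per_rank = self.num_global_experts // self.ep_size
        start = self.ep_rank * per_rank
        return range(start, start + per_rank)


class ToyManager:
    """Hypothetical stand-in for a LoRA model manager; not a vLLM class."""

    def create_dummy_lora(self, layer, replacements):
        # Unpatched toy behaviour: one slice per *global* expert, which
        # indexes past the local buffers when ep_size > 1.
        n_slices = len(replacements)
        return [layer.lora_a_stacked[i] for i in range(n_slices)]


def patched_create_dummy_lora(self, layer, replacements):
    # Keep only the replacements whose expert id lives on this EP rank.
    local = set(layer.local_expert_ids())
    narrowed = [(eid, name) for eid, name in replacements if eid in local]
    # Derive n_slices from the narrowed list, not from the global list.
    n_slices = len(narrowed)
    return [layer.lora_a_stacked[i] for i in range(n_slices)]


def monkey_patch_toy_manager():
    # Swap the method on the class so every instance picks up the fix.
    ToyManager.create_dummy_lora = patched_create_dummy_lora
```

In this toy, a layer with 8 global experts on rank 1 of 4 holds two buffers. The unpatched method raises IndexError when handed all 8 replacements, while the patched method returns only the two local slices.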

Reviewed by Cursor Bugbot for commit e6010e3. Bugbot is set up for automated code reviews on this repo.

@S1ro1 S1ro1 marked this pull request as draft May 31, 2026 10:30