Fix MoE models patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation. #1741

Merged
rkazants merged 14 commits into huggingface:main from popovaan:gpt_oss_bug
May 20, 2026
Conversation

@popovaan
Collaborator

@popovaan popovaan commented May 15, 2026

What does this PR do?

Fixes CVS-186759

Before submitting

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@popovaan popovaan changed the title from "Fix GPT-OSS MoE patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation." to "Fix MoE models patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation." May 18, 2026
@popovaan popovaan requested a review from rkazants May 18, 2026 09:32
@popovaan popovaan marked this pull request as ready for review May 18, 2026 09:32
@popovaan
Collaborator Author

@rkazants @echarlaix @IlyasMoutawwakil please review.

@IlyasMoutawwakil
Member

Hi, can you please explain the reason for the rewrite? I assume from the title that it's a specific pattern for OpenVINO op fusion?
By the way, you don't have to patch/overwrite the batched_mm impl; we can simply register a new impl for OV export purposes and use it by default (with set_experts_implementation).

@rkazants
Collaborator

Hi @echarlaix, @IlyasMoutawwakil,

Can you please review it? This is quite an urgent fix.

Thanks,
Roman

@popovaan
Collaborator Author

popovaan commented May 19, 2026

Hi, can you please explain the reason for the rewrite? I assume from the title that it's a specific pattern for OpenVINO op fusion?

Hi! Yes, this is exactly the reason for the rewrite. OpenVINO's MoE optimizations expect a specific MoE pattern, which is fused into an optimized operation; otherwise we see quite a large performance regression.

By the way, you don't have to patch/overwrite the batched_mm impl; we can simply register a new impl for OV export purposes and use it by default (with set_experts_implementation).

Sure, I will use this approach.

@popovaan
Collaborator Author

@IlyasMoutawwakil
When I try to add an OV-specific MoE implementation to ALL_EXPERTS_FUNCTIONS._global_mapping and then call patcher._model.set_experts_implementation("ov_batched_mm"), I get the following error:

  File "/home/panas/venv/new_py3_12/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1937, in get_correct_experts_implementation
    raise ValueError(message)
ValueError: Specified `experts_implementation="ov_batched_mm"` is not supported. The only possible arguments are `experts_implementation="eager"`, `"experts_implementation=grouped_mm"` and `"experts_implementation=batched_mm"`.

What is the correct way to register the new MoE impl to avoid this error?

Collaborator

@echarlaix echarlaix left a comment

Thanks for the addition @popovaan

Comment thread optimum/exporters/openvino/model_patcher.py
Comment on lines +211 to +222
def patch_batched_mm(patcher):
    from transformers.integrations.moe import ALL_EXPERTS_FUNCTIONS

    patcher.original_gpt_oss_forward = ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"]
    ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"] = batched_mm_experts_forward_patched
    patcher._model.set_experts_implementation("batched_mm")


def set_original_batched_mm(patcher):
    from transformers.integrations.moe import ALL_EXPERTS_FUNCTIONS

    ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"] = patcher.original_gpt_oss_forward
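
The two helpers above save the stock entry, overwrite it, and restore it later. A minimal sketch of that save/override/restore pattern written as a context manager is given here, using a plain dict as a stand-in for the registry. The names `EXPERTS_REGISTRY`, `override_entry`, and `export_friendly_forward` are hypothetical and are not part of transformers or optimum-intel.

```python
from contextlib import contextmanager

# Stand-in for a global implementation registry (hypothetical, not the transformers object).
EXPERTS_REGISTRY = {"batched_mm": lambda x: ("stock", x)}


def export_friendly_forward(x):
    # Placeholder for an export-specific experts implementation.
    return ("patched", x)


@contextmanager
def override_entry(registry, key, replacement):
    """Temporarily replace registry[key], restoring the original on exit."""
    original = registry[key]
    registry[key] = replacement
    try:
        yield original
    finally:
        # Restore even if the export raises, so global state does not leak.
        registry[key] = original


with override_entry(EXPERTS_REGISTRY, "batched_mm", export_friendly_forward):
    inside = EXPERTS_REGISTRY["batched_mm"](1)  # uses the replacement

after = EXPERTS_REGISTRY["batched_mm"](1)  # stock entry is back
```

The `try`/`finally` is the point of the sketch: overwriting a shared registry entry affects every model in the process, so the restore should run even when the export fails.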
Collaborator

why not

ALL_EXPERTS_FUNCTIONS.register("ov_batched_mm", batched_mm_experts_forward_patched)

Collaborator Author

When I set it like this:

ALL_EXPERTS_FUNCTIONS.register("ov_batched_mm", batched_mm_experts_forward_patched)
patcher._model.set_experts_implementation("ov_batched_mm")

It results in an error:

ValueError: Specified `experts_implementation="ov_batched_mm"` is not supported. The only possible arguments are `experts_implementation="eager"`, `"experts_implementation=grouped_mm"` and `"experts_implementation=batched_mm"`.

Collaborator

@popovaan could it come from https://github.com/huggingface/transformers/blob/v5.0.0/src/transformers/modeling_utils.py#L1932, where the possible experts implementations are hardcoded to ["eager", "grouped_mm", "batched_mm"], when we could check ALL_EXPERTS_FUNCTIONS instead? cc @IlyasMoutawwakil

Collaborator Author

Yes, the error comes from this line.
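
The mismatch discussed in this thread can be illustrated with a toy example: a name is added to a registry, but the validator consults a separate hardcoded list and therefore still rejects it. Everything below (`REGISTRY`, `HARDCODED`, `validate_hardcoded`, `validate_dynamic`) is a hypothetical stand-in and does not reproduce the transformers code.

```python
# Toy registry of implementations; values are irrelevant for the validation logic.
REGISTRY = {"eager": None, "grouped_mm": None, "batched_mm": None}

# A separate, fixed allow-list that is never updated when the registry grows.
HARDCODED = ["eager", "grouped_mm", "batched_mm"]


def validate_hardcoded(name):
    """Reject any name outside the fixed allow-list."""
    if name not in HARDCODED:
        raise ValueError(f"experts_implementation={name!r} is not supported")
    return name


def validate_dynamic(name, registry=REGISTRY):
    """Accept any name currently present in the registry."""
    if name not in registry:
        raise ValueError(f"experts_implementation={name!r} is not supported")
    return name


# Registering a custom implementation only changes the registry.
REGISTRY["ov_batched_mm"] = None

try:
    validate_hardcoded("ov_batched_mm")
    hardcoded_ok = True
except ValueError:
    hardcoded_ok = False  # the fixed list does not know about the new name

dynamic_ok = validate_dynamic("ov_batched_mm") == "ov_batched_mm"
```

In this sketch, registration succeeds in both cases; only the validator that reads the registry accepts the new name.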

Collaborator

what do you think of patching get_correct_experts_implementation ?

Collaborator Author

Yes, that worked. Done.
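
One way such a patch could look is sketched below on a toy class. `ToyModel` and `allow_extra_implementation` are hypothetical; the method names mirror those mentioned in this thread, but their signatures here are assumptions and may differ from the real transformers methods. The actual change merged in this PR is not shown on this page.

```python
from contextlib import contextmanager


class ToyModel:
    """Stand-in for a model class; not the transformers implementation."""

    _SUPPORTED = ("eager", "grouped_mm", "batched_mm")

    def get_correct_experts_implementation(self, name):
        if name not in self._SUPPORTED:
            raise ValueError(f"experts_implementation={name!r} is not supported")
        return name

    def set_experts_implementation(self, name):
        self.experts_implementation = self.get_correct_experts_implementation(name)


@contextmanager
def allow_extra_implementation(cls, extra_name):
    """Temporarily let the validator accept one additional implementation name."""
    original = cls.get_correct_experts_implementation

    def patched(self, name):
        if name == extra_name:
            return name
        # Defer to the original check for every other name.
        return original(self, name)

    cls.get_correct_experts_implementation = patched
    try:
        yield
    finally:
        cls.get_correct_experts_implementation = original


model = ToyModel()
with allow_extra_implementation(ToyModel, "ov_batched_mm"):
    model.set_experts_implementation("ov_batched_mm")
```

Outside the `with` block the original validator is back in place, so the extra name is rejected again.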

Collaborator

thanks a lot

Member

Hi, yes, I hardcoded the list of supported impls long ago, and it only got fixed (became dynamically checked) in the latest transformers version 🥲

Comment on lines +512 to +515
from transformers.models.mixtral.modeling_mixtral import MixtralExperts

self.original_moe_forward = MixtralExperts.forward
MixtralExperts.forward = lfm2_moe_experts_forward
Collaborator

Suggested change
from transformers.models.mixtral.modeling_mixtral import MixtralExperts
self.original_moe_forward = MixtralExperts.forward
MixtralExperts.forward = lfm2_moe_experts_forward
self._model.set_experts_implementation("ov_batched_mm")

Collaborator Author

@popovaan popovaan May 20, 2026

Setting a custom MoE implementation with patcher._model.set_experts_implementation("ov_batched_mm") doesn't work. As I mentioned in the previous comment, it fails with this error:

ValueError: Specified `experts_implementation="ov_batched_mm"` is not supported. The only possible arguments are `experts_implementation="eager"`, `"experts_implementation=grouped_mm"` and `"experts_implementation=batched_mm"`.

Looks like it's due to this check that expects specifically one of ["eager", "grouped_mm", "batched_mm"]: https://github.com/huggingface/transformers/blob/08810b1e278938278c50153ee1edfd7a20a759da/src/transformers/modeling_utils.py#L1932

@popovaan popovaan requested a review from echarlaix May 20, 2026 07:54
@popovaan popovaan requested a review from regisss May 20, 2026 10:22
@popovaan
Collaborator Author

@regisss could you please review?

Comment thread optimum/exporters/openvino/model_patcher.py Outdated
Comment thread optimum/exporters/openvino/model_patcher.py Outdated
from transformers.models.phimoe.modeling_phimoe import PhimoeExperts

self.original_moe_forward = PhimoeExperts.forward
PhimoeExperts.forward = lfm2_moe_experts_forward
Collaborator

@echarlaix echarlaix May 20, 2026

Shouldn't we do the same here? (use ov_batched_mm or similar)

Collaborator

This can be done in a follow-up PR.

Collaborator

@echarlaix echarlaix left a comment

Thanks a lot for iterating @popovaan

@rkazants rkazants merged commit 140e49a into huggingface:main May 20, 2026
49 of 51 checks passed
5 participants