Fix MoE models patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation. by popovaan · Pull Request #1741 · huggingface/optimum-intel

popovaan · 2026-05-15T15:30:59Z

What does this PR do?

Fixes CVS-186759

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests? - Will be covered in #1651 ("Added a new test to check whether some transformations is applied during conversion or completion").

HuggingFaceDocBuilderDev · 2026-05-15T15:33:16Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

popovaan · 2026-05-18T15:42:52Z

@rkazants @echarlaix @IlyasMoutawwakil please review.

IlyasMoutawwakil · 2026-05-18T23:10:00Z

Hi, can you please explain the reason for the rewrite? I assume from the title that it's a specific pattern for OpenVINO op fusion?
By the way, you don't have to patch/overwrite the batched_mm impl; we can simply register a new impl for OV export purposes and use it by default (with set_experts_implementation).

rkazants · 2026-05-19T07:05:59Z

Hi @echarlaix, @IlyasMoutawwakil,

Can you please review it? This is quite urgent fix.

Thanks,
Roman

popovaan · 2026-05-19T07:33:08Z

> hi can you please explain the reason for the rewrite ? i assume from the title that its a specific pattern for openvino op fusion ?

Hi! Yes, this is exactly the reason for the rewrite. OpenVINO MoE optimizations expect a specific MoE pattern, which is fused into an optimized operation; otherwise we get a quite large performance regression.

> btw you dont have to patch/overwrite the batched_mm impl, we can simply register a new impl for ov export purposes and use it by default (with set_experts_impl)

Sure, I will use this approach.
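The thread does not spell out the exact graph pattern that OpenVINO matches. As a generic illustration only of why the same MoE math can show up as different graphs, here is a pure-Python sketch of two equivalent formulations: a loop over the selected experts, and a dense version that evaluates every expert and zeroes out the unselected ones. All names (`moe_loop`, `moe_dense`, `matvec`) are hypothetical and are not taken from transformers, optimum-intel, or OpenVINO.

```python
def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def moe_loop(x, experts, routing):
    """Visit only the selected experts, one at a time."""
    out = [0.0] * len(experts[0])
    for expert_idx, gate in routing.items():
        y = matvec(experts[expert_idx], x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


def moe_dense(x, experts, routing):
    """Evaluate every expert, then combine with gates (0 if unselected)."""
    gates = [routing.get(e, 0.0) for e in range(len(experts))]
    all_out = [matvec(weight, x) for weight in experts]
    return [
        sum(gates[e] * all_out[e][j] for e in range(len(experts)))
        for j in range(len(all_out[0]))
    ]


# Three toy 2x2 experts; the router picked experts 0 and 2 for this token.
x = [1.0, 2.0]
experts = [
    [[1, 0], [0, 1]],
    [[2, 0], [0, 2]],
    [[0, 1], [1, 0]],
]
routing = {0: 0.5, 2: 0.5}

loop_out = moe_loop(x, experts, routing)
dense_out = moe_dense(x, experts, routing)
```

Both functions return the same values, but an exporter tracing them would record very different operation sequences, which is why a pattern-matching fusion pass can succeed on one formulation and miss the other.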

popovaan · 2026-05-19T08:08:29Z

@IlyasMoutawwakil
When I try to add an OV-specific MoE implementation to ALL_EXPERTS_FUNCTIONS._global_mapping and then call patcher._model.set_experts_implementation("ov_batched_mm"), I get the following error:

  File "/home/panas/venv/new_py3_12/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1937, in get_correct_experts_implementation
    raise ValueError(message)
ValueError: Specified `experts_implementation="ov_batched_mm"` is not supported. The only possible arguments are `experts_implementation="eager"`, `"experts_implementation=grouped_mm"` and `"experts_implementation=batched_mm"`.

What is the correct way to register the new MoE impl to avoid this error?

echarlaix

Thanks for the addition @popovaan

echarlaix · 2026-05-19T17:17:31Z

+def patch_batched_mm(patcher):
+    from transformers.integrations.moe import ALL_EXPERTS_FUNCTIONS
+
+    patcher.original_gpt_oss_forward = ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"]
+    ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"] = batched_mm_experts_forward_patched
+    patcher._model.set_experts_implementation("batched_mm")
+
+
+def set_original_batched_mm(patcher):
+    from transformers.integrations.moe import ALL_EXPERTS_FUNCTIONS
+
+    ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"] = patcher.original_gpt_oss_forward


why not

ALL_EXPERTS_FUNCTIONS.register("ov_batched_mm", batched_mm_experts_forward_patched)

When I set it like this:

ALL_EXPERTS_FUNCTIONS.register("ov_batched_mm", batched_mm_experts_forward_patched)
patcher._model.set_experts_implementation("ov_batched_mm")

It results in error:

ValueError: Specified `experts_implementation="ov_batched_mm"` is not supported. The only possible arguments are `experts_implementation="eager"`, `"experts_implementation=grouped_mm"` and `"experts_implementation=batched_mm"`.

@popovaan could it come from https://github.com/huggingface/transformers/blob/v5.0.0/src/transformers/modeling_utils.py#L1932, where the possible experts implementations are set to ["eager", "grouped_mm", "batched_mm"], when we could check ALL_EXPERTS_FUNCTIONS instead? cc @IlyasMoutawwakil

Yes, the error comes from this line.

What do you think of patching get_correct_experts_implementation?

Yes, that worked. Done.

Thanks a lot.

Hi, yes, I hardcoded the list of supported impls long ago, and it only got fixed / became dynamically checked in the latest transformers version 🥲

It was fixed in huggingface/transformers#45577.
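The failure discussed in this thread can be reproduced in miniature. The sketch below is a toy model of the situation, not the transformers code: `EXPERTS_REGISTRY`, `ToyModel`, `validate_hardcoded`, and `validate_dynamic` are all hypothetical names. It shows that registering a new implementation has no effect while the validator checks a fixed list, and that swapping in a validator that consults the registry resolves it.

```python
from typing import Callable, Dict

# Hypothetical registry of expert implementations (name -> callable).
EXPERTS_REGISTRY: Dict[str, Callable[[float], float]] = {
    "eager": lambda x: x,
    "grouped_mm": lambda x: x,
    "batched_mm": lambda x: x,
}


def validate_hardcoded(name: str) -> str:
    """Validator that ignores the registry and checks a fixed list."""
    if name not in ["eager", "grouped_mm", "batched_mm"]:
        raise ValueError(f'experts_implementation="{name}" is not supported')
    return name


def validate_dynamic(name: str) -> str:
    """Validator that accepts anything currently registered."""
    if name not in EXPERTS_REGISTRY:
        raise ValueError(f'experts_implementation="{name}" is not supported')
    return name


class ToyModel:
    # Looked up on the class at call time, so it can be replaced later.
    validate = staticmethod(validate_hardcoded)

    def set_experts_implementation(self, name: str) -> None:
        self.experts_implementation = self.validate(name)


# Registering a new implementation is not enough on its own ...
EXPERTS_REGISTRY["ov_batched_mm"] = lambda x: 2 * x
model = ToyModel()
try:
    model.set_experts_implementation("ov_batched_mm")
    rejected = False
except ValueError:
    rejected = True

# ... until the validator itself is patched to consult the registry.
ToyModel.validate = staticmethod(validate_dynamic)
model.set_experts_implementation("ov_batched_mm")
```

In the toy version the first call raises `ValueError` and the second succeeds, which mirrors the sequence reported above: registration alone failed, and patching the validation step made the custom name acceptable.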

echarlaix · 2026-05-19T17:17:56Z

+            from transformers.models.mixtral.modeling_mixtral import MixtralExperts
+
+            self.original_moe_forward = MixtralExperts.forward
+            MixtralExperts.forward = lfm2_moe_experts_forward


Suggested change

from transformers.models.mixtral.modeling_mixtral import MixtralExperts

self.original_moe_forward = MixtralExperts.forward
MixtralExperts.forward = lfm2_moe_experts_forward
self._model.set_experts_implementation("ov_batched_mm")

Setting a custom MoE implementation with patcher._model.set_experts_implementation("ov_batched_mm") doesn't work. As I mentioned in the previous comment, it fails with this error:

ValueError: Specified `experts_implementation="ov_batched_mm"` is not supported. The only possible arguments are `experts_implementation="eager"`, `"experts_implementation=grouped_mm"` and `"experts_implementation=batched_mm"`.

Looks like it's due to this check that expects specifically one of ["eager", "grouped_mm", "batched_mm"]: https://github.com/huggingface/transformers/blob/08810b1e278938278c50153ee1edfd7a20a759da/src/transformers/modeling_utils.py#L1932
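The diffs in this review save the original `forward` on the patcher before overwriting it on the class. A minimal sketch of that save-and-restore idea, written as a context manager, might look like the following. `ToyExperts`, `export_friendly_forward`, and `patched_forward` are hypothetical stand-ins and do not reflect the actual patcher classes in optimum-intel.

```python
from contextlib import contextmanager


class ToyExperts:
    """Stand-in for a model's experts module."""

    def forward(self, x):
        return ("original", x)


def export_friendly_forward(self, x):
    """Replacement forward used only while exporting."""
    return ("patched", x)


@contextmanager
def patched_forward(cls, new_forward):
    """Swap cls.forward for new_forward, restoring it on exit."""
    original = cls.forward
    cls.forward = new_forward
    try:
        yield
    finally:
        # Runs even if the export step raises, so the class is not left patched.
        cls.forward = original


experts = ToyExperts()
with patched_forward(ToyExperts, export_friendly_forward):
    during = experts.forward(1)
after = experts.forward(1)
```

Because the patch is applied to the class rather than to an instance, every existing and future instance sees the replacement while the block is active, which is why restoring the original afterwards matters if the model is reused outside the export.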

popovaan · 2026-05-20T10:35:03Z

@regisss could you please review?

echarlaix · 2026-05-20T14:41:10Z

+            from transformers.models.phimoe.modeling_phimoe import PhimoeExperts
+
+            self.original_moe_forward = PhimoeExperts.forward
+            PhimoeExperts.forward = lfm2_moe_experts_forward


Shouldn't we do the same here? (use ov_batched_mm or similar)

Can be done in a follow-up PR.

echarlaix

Thanks a lot for iterating @popovaan

popovaan added 6 commits May 15, 2026 17:48

Change MoE patching for gpt_oss.

6ce30d7

Remove not needed changes.

a21f0b2

Remove not needed changes.

93f1ab0

Minor fix.

73328d0

Fix qwen3_moe.

041f524

Code style.

c3fb666

popovaan changed the title ~~Fix GPT-OSS MoE patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation.~~ Fix MoE models patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation. May 18, 2026

popovaan requested a review from rkazants May 18, 2026 09:32

popovaan marked this pull request as ready for review May 18, 2026 09:32

Fix other topologies.

55b404f

Skip gpt_oss export test.

abbd714

rkazants approved these changes May 19, 2026

rkazants requested review from IlyasMoutawwakil and echarlaix May 19, 2026 07:05

echarlaix reviewed May 19, 2026

popovaan requested a review from echarlaix May 20, 2026 07:54

popovaan added 2 commits May 20, 2026 10:10

Comment.

c6f4f32

Merge remote-tracking branch 'upstream/main' into gpt_oss_bug

4467952

popovaan requested a review from regisss May 20, 2026 10:22

echarlaix reviewed May 20, 2026

Comment thread optimum/exporters/openvino/model_patcher.py Outdated

echarlaix reviewed May 20, 2026

Comment thread optimum/exporters/openvino/model_patcher.py Outdated

Patch get_correct_experts_implementation.

6f864d2

echarlaix reviewed May 20, 2026

echarlaix approved these changes May 20, 2026

popovaan added 3 commits May 20, 2026 16:43

Code style.

024c549

Add transformers version check.

a596014

Remove unpatching.

533528b

rkazants merged commit 140e49a into huggingface:main May 20, 2026
49 of 51 checks passed

5 participants
