Fix MoE models patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation.#1741
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
@rkazants @echarlaix @IlyasMoutawwakil please review. |
|
hi can you please explain the reason for the rewrite ? i assume from the title that its a specific pattern for openvino op fusion ? |
|
Hi @echarlaix, @IlyasMoutawwakil, Can you please review it? This is quite urgent fix. Thanks, |
Hi! Yes, this is exactly the reason for the rewrite. OpenVINO MoE optimizations expect a specific MoE pattern which is fused in a optimized operation, otherwise we have a quite large performance regression.
Sure, I will use this approach. |
|
@IlyasMoutawwakil What is the correct way to register the new MoE impl to avoid this error? |
| def patch_batched_mm(patcher): | ||
| from transformers.integrations.moe import ALL_EXPERTS_FUNCTIONS | ||
|
|
||
| patcher.original_gpt_oss_forward = ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"] | ||
| ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"] = batched_mm_experts_forward_patched | ||
| patcher._model.set_experts_implementation("batched_mm") | ||
|
|
||
|
|
||
| def set_original_batched_mm(patcher): | ||
| from transformers.integrations.moe import ALL_EXPERTS_FUNCTIONS | ||
|
|
||
| ALL_EXPERTS_FUNCTIONS._global_mapping["batched_mm"] = patcher.original_gpt_oss_forward |
There was a problem hiding this comment.
why not
ALL_EXPERTS_FUNCTIONS.register("ov_batched_mm", batched_mm_experts_forward_patched)
There was a problem hiding this comment.
When I set it like this:
ALL_EXPERTS_FUNCTIONS.register("ov_batched_mm", batched_mm_experts_forward_patched)
patcher._model.set_experts_implementation("ov_batched_mm")
It results in error:
ValueError: Specified `experts_implementation="ov_batched_mm"` is not supported. The only possible arguments are `experts_implementation="eager"`, `"experts_implementation=grouped_mm"` and `"experts_implementation=batched_mm"`.
There was a problem hiding this comment.
@popovaan could it come from https://github.com/huggingface/transformers/blob/v5.0.0/src/transformers/modeling_utils.py#L1932 where possible experts implementation are set to ["eager", "grouped_mm", "batched_mm"] when we could check ALL_EXPERTS_FUNCTIONS instead ? cc @IlyasMoutawwakil
There was a problem hiding this comment.
Yes, the error comes from this line.
There was a problem hiding this comment.
what do you think of patching get_correct_experts_implementation ?
There was a problem hiding this comment.
Yes, that worked. Done.
There was a problem hiding this comment.
hi yes i hardcoded the list of supported impls long ago and it only got fixed / became dynamically checked in last transformers version 🥲
| from transformers.models.mixtral.modeling_mixtral import MixtralExperts | ||
|
|
||
| self.original_moe_forward = MixtralExperts.forward | ||
| MixtralExperts.forward = lfm2_moe_experts_forward |
There was a problem hiding this comment.
| from transformers.models.mixtral.modeling_mixtral import MixtralExperts | |
| self.original_moe_forward = MixtralExperts.forward | |
| MixtralExperts.forward = lfm2_moe_experts_forward | |
| self._model.set_experts_implementation("ov_batched_mm") |
There was a problem hiding this comment.
Setting custom MoE implementation patcher._model.set_experts_implementation("ov_batched_mm") doesn't work, as I mentioned in the previous comment it fails with this error:
ValueError: Specified `experts_implementation="ov_batched_mm"` is not supported. The only possible arguments are `experts_implementation="eager"`, `"experts_implementation=grouped_mm"` and `"experts_implementation=batched_mm"`.
Looks like it's due to this check that expects specifically one of ["eager", "grouped_mm", "batched_mm"]: https://github.com/huggingface/transformers/blob/08810b1e278938278c50153ee1edfd7a20a759da/src/transformers/modeling_utils.py#L1932
|
@regisss could you please review? |
| from transformers.models.phimoe.modeling_phimoe import PhimoeExperts | ||
|
|
||
| self.original_moe_forward = PhimoeExperts.forward | ||
| PhimoeExperts.forward = lfm2_moe_experts_forward |
There was a problem hiding this comment.
shouldn't we do the same here ? (use ov_batched_mm or similar)
There was a problem hiding this comment.
can be done in a following PR
What does this PR do?
Fixes CVS-186759
Before submitting