add dispatch_ffn_combine_bf16 kernel for deepep #410

Open
zuje123 wants to merge 6 commits into sgl-project:main from zuje123:br_kernel_dispatch_ffn_combine_bf16
Collaborator

@zuje123 zuje123 commented Mar 27, 2026

DeepEp supports dispatch_ffn_combine_bf16, which is designed to improve the performance of certain deep learning models by fusing operations without quantization.

Based on dispatch_ffn_combine_bf16, the following adaptations were made:

  1. Changed the parameter types of weight1, weight2, scale1, and scale2 from TensorList to Tensor.
  2. Set cce-auto-sync=off and resolved synchronization issues.
  3. Fixed the issue where moe_init_routing_v2 encountered the error "VEC supports illegal configurations in commands" when bs=0.
  4. Added validation for HCCL_BUFFSIZE.
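
As a rough illustration of item 4, the sketch below shows the kind of check that validating HCCL_BUFFSIZE implies. It is not the code in this PR: the helper name `check_hccl_buffsize`, the `required_mb` threshold, and the fallback of 200 MB when the variable is unset are all assumptions made for the example, and the kernel's real check and required size may differ.

```python
import os

# Assumption for this sketch: the size is expressed in MB and falls back to
# 200 MB when the variable is unset. Verify against the HCCL documentation.
DEFAULT_HCCL_BUFFSIZE_MB = 200


def check_hccl_buffsize(required_mb, env=None):
    """Return the configured buffer size in MB, or raise if it is too small.

    `required_mb` is a hypothetical threshold supplied by the caller; the
    value the fused kernel actually needs is not stated in this PR.
    """
    env = os.environ if env is None else env
    raw = env.get("HCCL_BUFFSIZE")
    if raw is None:
        size_mb = DEFAULT_HCCL_BUFFSIZE_MB
    else:
        try:
            size_mb = int(raw)
        except ValueError:
            raise ValueError(
                f"HCCL_BUFFSIZE must be an integer number of MB, got {raw!r}"
            ) from None
    if size_mb < required_mb:
        raise ValueError(
            f"HCCL_BUFFSIZE is {size_mb} MB but at least {required_mb} MB is required"
        )
    return size_mb
```

Calling `check_hccl_buffsize(required_mb)` before launching the kernel would fail early with a readable message instead of failing inside the communication buffer setup.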

Tested with Qwen3-30B-A3B:
[image: precision_test]

@gemini-code-assist
Contributor

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.
