add fp4 matmul kernels for deepseek v4 flash #867

Open
sywangyi wants to merge 5 commits into huggingface:main from sywangyi:deepseek_v4_fp4
Conversation

@sywangyi
Contributor

Verified on XPU.

sywangyi added 2 commits May 18, 2026 16:11
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@sywangyi sywangyi requested review from danieldk and drbh as code owners May 18, 2026 08:13
@sywangyi
Contributor Author

@IlyasMoutawwakil, please help review.

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@IlyasMoutawwakil
Member

Shouldn't they be W4A8?

@sywangyi
Contributor Author

sywangyi commented May 19, 2026

No, the activations stay 16-bit, because `fp8_act_quant` is not called on the activations in the FP4 kernel path.

@sywangyi
Contributor Author

To do FP8 × FP4, the activations cannot be used as-is. They must first be quantized, typically per-token or per-block, together with the corresponding scale factors. In the MoE grouped path, there is already additional overhead from routing, sorting, the activation function, and the second projection, so activation quantization, dequantization, and scale movement are not free. FP8 activations only provide a clear speedup when the backend fuses activation quantization and GEMM efficiently. DeepGEMM does that; this Triton fallback does not.
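To make the extra work concrete, here is a minimal sketch of per-block activation quantization in plain Python. It is not the `fp8_act_quant` implementation from this PR: the helper names, the block size of 128, and the simplified E4M3 rounding (4 significant bits, subnormals ignored) are all assumptions chosen for illustration.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Approximate an FP8 E4M3 cast: clamp, then keep 4 significant bits.

    Simplification (assumption): subnormals and NaN handling are ignored.
    """
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    return math.ldexp(round(mantissa * 16) / 16, exponent)


def quantize_per_block(row, block_size=128):
    """Quantize one token's activations block by block.

    Returns the quantized values plus one scale per block; the scales must
    travel with the data into the GEMM, which is part of the overhead.
    """
    values, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        values.extend(round_to_e4m3(v / scale) for v in block)
    return values, scales


def dequantize_per_block(values, scales, block_size=128):
    """Undo the scaling; the rounding error from the FP8 cast remains."""
    return [v * scales[i // block_size] for i, v in enumerate(values)]
```

Each token row needs an absolute-max reduction, a division, and a rounding pass per block before the matmul can start. That is the cost a fused backend hides and an unfused fallback pays separately.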

@IlyasMoutawwakil
Member

Yes, exactly, and that's why we do it in the batched and grouped FP8 paths as well, so why not in these FP4 ones? 😅 For me this is less a matter of choice than of staying as close as possible to the original DeepSeek V4 implementation.

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Comment thread finegrained-fp8/torch-ext/finegrained_fp8/batched.py Outdated
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2 participants