Commit 18665da

[Cherry-Pick][Op][Optimization]Kernel fusion: cast+sigmoid+bias+noauxtc(#7777) (#7832)
* [Op][Optimization]Kernel fusion: cast+sigmoid+bias+noauxtc (#7777) [Cherry-Pick]
* Kernel fusion for blackwell and deepgemm backend in non-EPLB scenarios
* fix hard code in ep.py
1 parent 88a7479 commit 18665da

18 files changed

Lines changed: 1386 additions & 15 deletions

custom_ops/gpu_ops/cpp_extensions.cc

Lines changed: 11 additions & 0 deletions
@@ -693,6 +693,15 @@ std::vector<paddle::Tensor> NoauxTc(paddle::Tensor& scores,
                                     bool renormalize,
                                     float routed_scaling_factor);
 
+std::vector<paddle::Tensor> grouped_topk(
+    paddle::Tensor& gating_output,
+    paddle::Tensor& e_score_correction_bias,
+    int n_group,
+    int topk_group,
+    int topk,
+    bool renormalize,
+    float routed_scaling_factor);
+
 std::vector<paddle::Tensor> NoauxTcRedundant(
     paddle::Tensor& scores,
     paddle::Tensor& scores_with_bias,
@@ -1696,6 +1705,8 @@ PYBIND11_MODULE(fastdeploy_ops, m) {
 
   m.def("noaux_tc", &NoauxTc, "noaux_tc for Deepseekv3 MoE compute");
 
+  m.def("grouped_topk", &grouped_topk, "fused grouped topk for MoE routing");
+
   m.def("noaux_tc_redundant",
         &NoauxTcRedundant,
         "noaux_tc_redundant for MoE compute");

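Note on the new `grouped_topk` declaration: the diff shows only the signature, so the routing semantics are not visible here. The sketch below is a single-token CPU reference written under the assumption that the op follows DeepSeek-V3-style "noaux_tc" routing (sigmoid over the gate logits, add the correction bias for selection only, score each group by its two best biased experts, keep `topk_group` groups, then pick `topk` experts). The name `grouped_topk_reference`, the `RoutingResult` struct, and the `1e-20` epsilon are hypothetical and are not part of the commit; the real op takes `paddle::Tensor` inputs and runs as a fused GPU kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical CPU reference for ONE token. Not the fused kernel from the
// commit; the semantics are assumed to match DeepSeek-V3-style routing.
struct RoutingResult {
  std::vector<int> expert_ids;  // selected experts, best biased score first
  std::vector<float> weights;   // routing weights for those experts
};

// Preconditions (assumed): num_experts % n_group == 0,
// topk_group <= n_group, topk <= topk_group * (num_experts / n_group).
RoutingResult grouped_topk_reference(const std::vector<float>& gating_output,
                                     const std::vector<float>& bias,
                                     int n_group, int topk_group, int topk,
                                     bool renormalize,
                                     float routed_scaling_factor) {
  const int num_experts = static_cast<int>(gating_output.size());
  const int group_size = num_experts / n_group;

  // Step 1: sigmoid, then add the correction bias (used for selection only).
  std::vector<float> scores(num_experts), biased(num_experts);
  for (int e = 0; e < num_experts; ++e) {
    scores[e] = 1.0f / (1.0f + std::exp(-gating_output[e]));
    biased[e] = scores[e] + bias[e];
  }

  // Step 2: group score = sum of the two largest biased scores in the group.
  std::vector<float> group_scores(n_group);
  for (int g = 0; g < n_group; ++g) {
    std::vector<float> members(biased.begin() + g * group_size,
                               biased.begin() + (g + 1) * group_size);
    std::sort(members.begin(), members.end(), std::greater<float>());
    group_scores[g] = members[0] + (group_size > 1 ? members[1] : 0.0f);
  }

  // Step 3: keep the topk_group best groups.
  std::vector<int> groups(n_group);
  std::iota(groups.begin(), groups.end(), 0);
  std::stable_sort(groups.begin(), groups.end(), [&](int a, int b) {
    return group_scores[a] > group_scores[b];
  });
  groups.resize(topk_group);

  // Step 4: among experts in the kept groups, take topk by biased score.
  std::vector<int> candidates;
  for (int g : groups)
    for (int e = g * group_size; e < (g + 1) * group_size; ++e)
      candidates.push_back(e);
  std::stable_sort(candidates.begin(), candidates.end(), [&](int a, int b) {
    return biased[a] > biased[b];
  });
  candidates.resize(topk);

  // Step 5: weights come from the UNBIASED scores, optionally renormalized,
  // then scaled by routed_scaling_factor.
  RoutingResult out;
  out.expert_ids = candidates;
  float sum = 0.0f;
  for (int e : candidates) {
    out.weights.push_back(scores[e]);
    sum += scores[e];
  }
  for (float& w : out.weights) {
    if (renormalize) w /= (sum + 1e-20f);
    w *= routed_scaling_factor;
  }
  return out;
}
```

A reference like this is mainly useful for checking a fused kernel's output on small inputs. For example, with 8 experts in 4 groups, an expert with the highest individual score can still be dropped when its group as a whole ranks below the `topk_group` cutoff.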