Commit 514ed5c

[Cherry-Pick][Op][Optimization]Kernel fusion: cast+sigmoid+bias+noauxtc(#7777) (#7818)
* [Op][Optimization]Kernel fusion: cast+sigmoid+bias+noauxtc (#7777)

  [Cherry-Pick][Op][Optimization]Kernel fusion: cast+sigmoid+bias+noauxtc (#7777)

* Bug fixes and modifications to the fused kernel switch.

* fix replicated env args
1 parent d71bdda commit 514ed5c

7 files changed

Lines changed: 1319 additions & 21 deletions

custom_ops/gpu_ops/cpp_extensions.cc

Lines changed: 11 additions & 0 deletions
@@ -691,6 +691,15 @@ std::vector<paddle::Tensor> NoauxTc(paddle::Tensor& scores,
                                     bool renormalize,
                                     float routed_scaling_factor);
 
+std::vector<paddle::Tensor> grouped_topk(
+    paddle::Tensor& gating_output,
+    paddle::Tensor& e_score_correction_bias,
+    int n_group,
+    int topk_group,
+    int topk,
+    bool renormalize,
+    float routed_scaling_factor);
+
 std::vector<paddle::Tensor> FusedCastSigmoidBias(const paddle::Tensor& input,
                                                  const paddle::Tensor& bias,
                                                  std::string cast_type);
@@ -1704,6 +1713,8 @@ PYBIND11_MODULE(fastdeploy_ops, m) {
 
   m.def("noaux_tc", &NoauxTc, "noaux_tc for Deepseekv3 MoE compute");
 
+  m.def("grouped_topk", &grouped_topk, "fused grouped topk for MoE routing");
+
   m.def("fused_cast_sigmoid_bias",
         &FusedCastSigmoidBias,
         "Fused cast+sigmoid+bias for MoE gating scores",
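
Note: this file only declares and binds `grouped_topk`; the kernel itself is in another file of the commit. As a reading aid, here is a minimal single-token CPU sketch of what a fused sigmoid + bias + grouped top-k routing step might compute. It assumes DeepSeek-V3-style group-limited routing: each group is scored by the sum of its two largest bias-corrected scores, and the final weights come from the uncorrected sigmoid scores. The names `grouped_topk_reference` and `RoutingResult` are hypothetical, and the real kernel's tie-breaking, output layout, and dtype handling may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical CPU reference for ONE token; not the FastDeploy GPU kernel.
struct RoutingResult {
  std::vector<int> expert_ids;  // selected experts, highest biased score first
  std::vector<float> weights;   // routing weight for each selected expert
};

RoutingResult grouped_topk_reference(const std::vector<float>& gating_logits,
                                     const std::vector<float>& bias,
                                     int n_group, int topk_group, int topk,
                                     bool renormalize,
                                     float routed_scaling_factor) {
  const int n_experts = static_cast<int>(gating_logits.size());
  assert(n_group > 0 && n_experts % n_group == 0);
  assert(static_cast<int>(bias.size()) == n_experts);
  const int group_size = n_experts / n_group;
  assert(topk_group <= n_group && topk <= topk_group * group_size);

  // Sigmoid scores, plus a bias-corrected copy used only for selection.
  std::vector<float> scores(n_experts), biased(n_experts);
  for (int e = 0; e < n_experts; ++e) {
    scores[e] = 1.0f / (1.0f + std::exp(-gating_logits[e]));
    biased[e] = scores[e] + bias[e];
  }

  // Assumption: a group's score is the sum of its two largest biased scores.
  std::vector<std::pair<float, int>> group_scores;
  for (int g = 0; g < n_group; ++g) {
    std::vector<float> s(biased.begin() + g * group_size,
                         biased.begin() + (g + 1) * group_size);
    std::sort(s.begin(), s.end(), std::greater<float>());
    group_scores.push_back({s[0] + (group_size > 1 ? s[1] : 0.0f), g});
  }
  std::stable_sort(group_scores.begin(), group_scores.end(),
                   [](const std::pair<float, int>& a,
                      const std::pair<float, int>& b) {
                     return a.first > b.first;
                   });

  // Only experts inside the best topk_group groups remain candidates.
  std::vector<int> candidates;
  for (int i = 0; i < topk_group; ++i) {
    const int g = group_scores[i].second;
    for (int e = g * group_size; e < (g + 1) * group_size; ++e) {
      candidates.push_back(e);
    }
  }
  std::stable_sort(candidates.begin(), candidates.end(),
                   [&](int a, int b) { return biased[a] > biased[b]; });
  candidates.resize(topk);

  // Weights use the unbiased scores, optionally renormalized, then scaled.
  RoutingResult out;
  out.expert_ids = candidates;
  float total = 0.0f;
  for (int e : candidates) {
    out.weights.push_back(scores[e]);
    total += scores[e];
  }
  for (float& w : out.weights) {
    if (renormalize) w /= total;
    w *= routed_scaling_factor;
  }
  return out;
}
```

A reference like this is mainly useful for checking a fused kernel's output against a slow but readable implementation on small inputs.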
