Commit a742d4b

Enable fused softmax/sigmoid + topk path for 1024 experts

Measurements show that the fused path delivers better performance when the number of experts is 1024:

1 token + 1024 experts: average uplift ~3%
64 tokens + 1024 experts: average uplift ~6%
128 tokens + 1024 experts: average uplift ~7%
256 tokens + 1024 experts: average uplift ~45%

Current MoE models do not yet support as many as 1024 experts. However, when customers compare performance at 1024 experts, this optimization yields better numbers.

Signed-off-by: LiJianyu <jianyu.li@intel.com>

1 parent 3c03f84 commit a742d4b

1 file changed

Lines changed: 4 additions & 0 deletions

csrc/moe/topk.cpp

@@ -738,6 +738,10 @@ void topk_gating_kernel_launcher(
       LAUNCH_TOPK(
           512, WARPS_PER_TB, BYTES_PER_LDG_POWER_OF_2, ScoringFuncParam);
       break;
+    case 1024:
+      LAUNCH_TOPK(
+          1024, WARPS_PER_TB, BYTES_PER_LDG_POWER_OF_2, ScoringFuncParam);
+      break;
     case 192:
       LAUNCH_TOPK(
           192, WARPS_PER_TB, BYTES_PER_LDG_MULTIPLE_64, ScoringFuncParam);
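
For readers unfamiliar with this dispatch pattern, the change can be pictured as a `switch` that maps a runtime expert count onto a compile-time template argument. The following is a minimal, self-contained sketch, not the code in `csrc/moe/topk.cpp`: `launch_fused`, `dispatch_topk`, and the string return values are hypothetical stand-ins, and it assumes that expert counts without a matching `case` fall through to a non-fused default path, which this diff does not show.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the fused softmax/sigmoid + topk kernel launch.
// The expert count is a template parameter, so each supported count gets
// its own specialized instantiation at compile time.
template <int NUM_EXPERTS>
std::string launch_fused() {
  return "fused<" + std::to_string(NUM_EXPERTS) + ">";
}

// Hypothetical dispatcher: turns a runtime expert count into one of the
// compile-time instantiations above. Only counts listed as a `case` can
// reach the fused path.
std::string dispatch_topk(int num_experts) {
  assert(num_experts > 0);
  switch (num_experts) {
    case 512:
      return launch_fused<512>();
    case 1024:  // the kind of case this commit adds
      return launch_fused<1024>();
    case 192:
      return launch_fused<192>();
    default:
      // Assumed fallback for counts with no fused instantiation.
      return "unfused";
  }
}
```

In this sketch, `dispatch_topk(1024)` selects the `launch_fused<1024>` instantiation, while a count with no case, such as 2048, takes the assumed fallback. Adding a supported count therefore costs one extra `case` and one extra template instantiation.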
