Commit e4831e4
[NPU] Add NPU Fused MoE kernel (#1183)
## Motivation
This PR ports `fused_moe.py` and `fused_moe_kernels.py` to an NPU-affine
implementation while preserving the original math. The computational
definition is unchanged: forward remains `W1 (gate/up) -> SwiGLU -> W2
-> token-weighted gather`, and backward still follows `dA' = dO @ W2^T`
to produce `d_pre_act / dS / dW2 / dX / dW1`.
The main changes are execution-strategy optimizations for NPU.
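
For readers unfamiliar with the forward definition quoted above, the following is a minimal plain-Python reference sketch of `W1 (gate/up) -> SwiGLU -> W2 -> token-weighted gather`. It is not the NPU kernel from this commit. The function names (`moe_forward`, `matvec`, `silu`), the argument names, and the assumption that `W1` stores the gate rows first and the up rows second are all illustrative and may not match the layout used in `fused_moe.py`.

```python
import math


def silu(v):
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def moe_forward(x, w1, w2, topk_ids, topk_weights):
    """Reference MoE forward (illustrative shapes, not the kernel's layout).

    x:            [T][H]       token hidden states
    w1:           [E][2*I][H]  gate rows followed by up rows (assumed layout)
    w2:           [E][H][I]    down projection per expert
    topk_ids:     [T][K]       selected expert index per token
    topk_weights: [T][K]       routing weight per selected expert
    """
    out = []
    for t, token in enumerate(x):
        acc = [0.0] * len(token)
        for e, s in zip(topk_ids[t], topk_weights[t]):
            pre_act = matvec(w1[e], token)  # W1: fused gate/up projection
            inter = len(pre_act) // 2
            gate, up = pre_act[:inter], pre_act[inter:]
            act = [silu(g) * u for g, u in zip(gate, up)]  # SwiGLU
            y = matvec(w2[e], act)  # W2: down projection
            # token-weighted gather: accumulate expert outputs scaled by s
            acc = [a + s * yi for a, yi in zip(acc, y)]
        out.append(acc)
    return out
```

A fused kernel computes the same result but groups tokens by expert and tiles the matrix products. That is the kind of execution-strategy change this commit describes, and it leaves the math above unchanged.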
## Note: Use the Skill
For this fused_moe kernel migration, we followed the skill document from
#1197.
## Testing Done
- Hardware Type: Ascend 910B2
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence
🤖 Generated with: [cursor](https://cursor.com/).

1 parent dcd404b commit e4831e4
5 files changed
Lines changed: 1166 additions & 5 deletions
File tree
- benchmark/scripts
- src/liger_kernel/ops/backends/_ascend/ops
- test/transformers