optimize group_norm for ASCEND_NPU #1154
1. Main Optimization Points and Analysis
1.1 Forward Pass: Single Kernel → Two‑Kernel Split
Original Version
Optimized Version
Why This Optimization
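As a rough illustration of what a two-kernel forward split can look like, the sketch below separates group normalization into a statistics pass and a normalization pass. It is plain Python rather than the PR's Triton code, and the names `group_stats` and `group_normalize`, the flat channel-major layout, and the `eps` default are all assumptions made for this example, not taken from the PR.

```python
import math


def group_stats(x, num_groups, eps=1e-5):
    """Pass 1 (hypothetical): one (mean, rstd) pair per group of one sample.

    x is a flat list of length C * HW laid out channel-major, so each group
    is a contiguous, equally sized slice of it.
    """
    group_size = len(x) // num_groups
    stats = []
    for g in range(num_groups):
        chunk = x[g * group_size:(g + 1) * group_size]
        mean = sum(chunk) / group_size
        var = sum((v - mean) ** 2 for v in chunk) / group_size
        stats.append((mean, 1.0 / math.sqrt(var + eps)))
    return stats


def group_normalize(x, stats, weight, bias, num_groups):
    """Pass 2 (hypothetical): y = (x - mean) * rstd * weight[c] + bias[c]."""
    num_channels = len(weight)
    hw = len(x) // num_channels
    channels_per_group = num_channels // num_groups
    y = []
    for i, v in enumerate(x):
        c = i // hw  # channel that element i belongs to
        mean, rstd = stats[c // channels_per_group]
        y.append((v - mean) * rstd * weight[c] + bias[c])
    return y
```

Once the statistics are stored, the second pass only needs the per-group `(mean, rstd)` pair and the per-channel affine parameters, so each pass can be tiled and scheduled independently of the other.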
1.2 Adaptive Block Size Selection
Original Version
Optimized Version
Why This Optimization
1.3 Small‑Size Fusion Path (Single‑task Kernel)
Why This Optimization
1.4 Parameter Gradient Computation: Atomic Operations → Explicit Partial Reduction
Original Version
Optimized Version
Why This Optimization
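The general idea behind replacing atomic accumulation with an explicit partial reduction can be sketched as follows. This is a plain-Python illustration, not the PR's kernel: the function name `param_grads_partial`, the row-by-channel layout of `dy` and `x_hat`, and the strided assignment of rows to tasks are assumptions chosen for the example.

```python
def param_grads_partial(dy, x_hat, num_tasks):
    """Weight/bias gradients via per-task partial buffers (hypothetical sketch).

    dy and x_hat are lists of rows with one value per channel:
        dW[c] = sum_r dy[r][c] * x_hat[r][c]
        dB[c] = sum_r dy[r][c]
    """
    num_rows, num_channels = len(dy), len(dy[0])
    # One private row per task, so no two tasks ever write the same slot
    # and no atomic add is required.
    dw_partial = [[0.0] * num_channels for _ in range(num_tasks)]
    db_partial = [[0.0] * num_channels for _ in range(num_tasks)]
    for task in range(num_tasks):
        # Strided row assignment: an assumed scheduling choice.
        for r in range(task, num_rows, num_tasks):
            for c in range(num_channels):
                dw_partial[task][c] += dy[r][c] * x_hat[r][c]
                db_partial[task][c] += dy[r][c]
    # Second step: reduce over the task axis in a fixed order.
    dw = [sum(p[c] for p in dw_partial) for c in range(num_channels)]
    db = [sum(p[c] for p in db_partial) for c in range(num_channels)]
    return dw, db
```

Because every task owns its own buffer row and the final sum runs in a fixed order, the result does not depend on the order in which tasks finish. The cost is an extra buffer of `num_tasks * num_channels` elements per gradient and one additional reduction step.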
1.5 Parallel Strategy Adjustment
Original Version
Optimized Version
Why This Optimization
c9d43b2 to f9b96e4
a1b9562 to 3719207
@Tcc0403, could you help review my code?
8872b43 to 99b4e13



Summary
Testing Done
- `make test` to ensure correctness
- `make checkstyle` to ensure code style
- `make test-convergence` to ensure convergence