feat(ascend): add 9 Ascend operator kernels #47
Open
zhangyue207 wants to merge 14 commits into master from
Conversation
Force-pushed bf9e4b1 to 7398f9f
added 11 commits on April 15, 2026 13:34
- Add AclTensorCache for descriptor reuse across operator calls
- Rename ToAclDtype/IsIntegerDtype to toAclDtype/isIntegerDtype (camelCase)
- Extend WorkspacePool with multi-slot support and a capture-mode assertion
- Optimize Gemm kernel with executor/scalar caching
- Add CacheKey hash support for operator instance caching
- Fix generate_wrappers.py argument ordering and formatting
- Rename skip_unsupported_dtypes fixture; add get_npu_stream utility
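The descriptor-reuse and CacheKey hashing ideas above can be sketched generically. This is a minimal illustration, not the PR's actual code: `FakeDesc`, `CacheKey`, and `AclTensorCacheSketch` are hypothetical stand-ins for the real `aclTensor*` handles and the PR's `AclTensorCache`; only the shape+dtype keying pattern is taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an aclTensor* descriptor handle.
struct FakeDesc {
    std::vector<int64_t> shape;
    int dtype;
};

// CacheKey: identifies a descriptor by shape + dtype so identical
// descriptors can be reused across operator calls instead of re-created.
struct CacheKey {
    std::vector<int64_t> shape;
    int dtype;
    bool operator==(const CacheKey& o) const {
        return dtype == o.dtype && shape == o.shape;
    }
};

struct CacheKeyHash {
    size_t operator()(const CacheKey& k) const {
        size_t h = std::hash<int>{}(k.dtype);
        for (int64_t d : k.shape)  // simple hash-combine over dimensions
            h ^= std::hash<int64_t>{}(d) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
        return h;
    }
};

class AclTensorCacheSketch {
public:
    // Returns the cached descriptor for (shape, dtype), creating it on first use.
    FakeDesc* getOrCreate(const CacheKey& key) {
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, FakeDesc{key.shape, key.dtype}).first;
        return &it->second;  // element references stay valid across rehash
    }
    size_t size() const { return cache_.size(); }

private:
    std::unordered_map<CacheKey, FakeDesc, CacheKeyHash> cache_;
};
```

Repeated lookups with the same key return the same descriptor, which is the property that lets a kernel skip descriptor creation on hot dispatch paths.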
Add base classes: Cast, Cat, Linear, Matmul (replaces MatMul), Mul, PagedAttention, SiluAndMul. Rename AddRmsNorm params to match CANN convention (x1/x2/gamma/y_out/x_out). Remove verbose doc comments from FlashAttention, ReshapeAndCache, RotaryEmbedding base classes (implementation details belong in kernels).
Add ACLNN-based implementations for: Add, Cast, Cat, CausalSoftmax, FlashAttention, Linear, Matmul, Mul, RmsNorm, RotaryEmbedding, ReshapeAndCache (+ v2), Swiglu, SiluAndMul. All kernels use AclTensorCache for descriptor reuse and WorkspacePool for device memory management. Executor instances are cached with aclSetAclOpExecutorRepeatable for repeat dispatch.
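The executor-caching pattern described above (cache an executor per input signature, mark it repeatable, reuse it on later dispatches) can be sketched as follows. This is an assumption-laden illustration: `FakeExecutor` and `ExecutorCacheSketch` are hypothetical; the comment marks where the real code would call `aclSetAclOpExecutorRepeatable` on an `aclOpExecutor*`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical executor handle standing in for aclOpExecutor*.
struct FakeExecutor {
    bool repeatable = false;
    int plan_count = 0;  // counts how many times expensive planning ran
};

class ExecutorCacheSketch {
public:
    // Key encodes op name + input shapes/dtypes; planning runs once per key.
    FakeExecutor& get(const std::string& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            FakeExecutor ex;
            ex.plan_count = 1;     // expensive GetWorkspaceSize/planning step
            ex.repeatable = true;  // real code: aclSetAclOpExecutorRepeatable(...)
            it = cache_.emplace(key, ex).first;
        }
        return it->second;  // cache hit: reuse without re-planning
    }

private:
    std::unordered_map<std::string, FakeExecutor> cache_;
};
```

With this shape, the second dispatch of `Add` on identical inputs pays only a hash lookup rather than a full planning pass.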
Add alternative implementations with registries:
- AddRmsNorm: decomposed (0), fused aclnnAddRmsNorm (1), custom AscendC (2)
- RmsNorm: ACLNN (0), custom AscendC (1)
- RotaryEmbedding: ACLNN (0), ATB Rope (1)
- ReshapeAndCache: ACLNN (0), ScatterPaKvCache (1), ATB (2)
- Swiglu: decomposed (0), fused aclnnSwiGlu (1)
- SiluAndMul: fused aclnnSwiGlu (0), registry (1)
- PagedAttention: ATB (0)
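The integer-indexed registries above can be sketched with a small dispatch table. This is a hypothetical minimal version (`ImplRegistrySketch` and the string-returning `Impl` type are invented for illustration); only the id-to-implementation mapping, e.g. RmsNorm id 0 → ACLNN and id 1 → custom AscendC, comes from the commit message.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// One registry per operator: impl id (0, 1, 2, ...) -> dispatch function.
using Impl = std::function<std::string()>;

class ImplRegistrySketch {
public:
    void add(int id, Impl impl) { impls_[id] = std::move(impl); }

    std::string dispatch(int id) const {
        auto it = impls_.find(id);
        assert(it != impls_.end() && "unknown impl id");
        return it->second();  // run the selected implementation
    }

private:
    std::map<int, Impl> impls_;
};
```

A caller (or an environment variable) selects the implementation by id, which makes A/B-comparing the fused, decomposed, and custom-kernel variants of the same operator straightforward.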
Standalone AscendC kernel project with CMake build system. Includes op_host tiling, op_kernel device code, precision tests, and msprof benchmarks for both operators.
Add new tests: Cast, Cat, E2E Layer, FlashAttention, Linear, Matmul, Mul, PagedAttention, ReshapeAndCache, RotaryEmbedding, SiluAndMul. Update existing tests with NPU stream handling and Ascend-specific parametrization.
- C1: auto-format all C++ files with clang-format (25 files)
- C4: lowercase assert messages, remove trailing periods (10 messages)
- G4: backtick-fence identifiers in comments (causal_softmax)
- P5: add blank lines before return statements (generate_wrappers.py)
- C4: lowercase assert message starts (workspace_pool_, rms_norm, rotary_embedding)
- C4: remove trailing period from workspace_pool_ assert
- C9: add blank line between SlotKey struct members
- G4: backtick-fence identifiers in comments across 12 files
- G4: backtick-fence identifiers in assert messages (flash_attention, rotary_embedding)
- P1: remove duplicate `import re` in generate_wrappers.py
- P4: add blank lines around control flow in test_flash_attention.py
- C4: lowercase "rope" in ATB assert messages
- G4: backtick-fence `VariantPack`, `rotaryCoeff`, `sparseMode`, `hostData`
- G4: backtick-fence identifiers in Python test comments
- P4: add blank line before `if` in test_rms_norm_precision.py
Force-pushed 3f43d57 to be48553
added 3 commits on April 15, 2026 15:08
… loading
- Delete `test_rms_norm_precision.py` (duplicate of `tests/test_rms_norm.py`)
- Delete `run_rms_norm_precision_report.py` (another copy with a hardcoded path)
- Unify `test_add_rms_norm.py` to use `import ascend_kernel` instead of manual ctypes loading
Add, RmsNorm, Swiglu, Matmul, CausalSoftmax, AddRmsNorm, ReshapeAndCache, RotaryEmbedding, FlashAttention.