[WIP]add Prefill causal conv1d by AndyKong2020 · Pull Request #418 · sgl-project/sgl-kernel-npu

AndyKong2020 · 2026-04-01T02:40:01Z

No description provided.

- Add conv_state update verification with 100% exact match
- Update tolerance standard to match ops-transformer (atol=1e-2, rtol=1e-3)
- Add median statistics and a detailed precision summary
- Test results: 95.14% of output elements match, 100% of state elements match
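A minimal sketch of the kind of precision summary described above, in pure Python. It assumes the tolerance test follows the torch.isclose convention |out - ref| <= atol + rtol * |ref| with atol=1e-2 and rtol=1e-3; whether ops-transformer applies exactly this formula is an assumption, and precision_summary / exact_match_ratio are hypothetical helpers, not the test code in this PR.

```python
import statistics

ATOL = 1e-2  # absolute tolerance from the commit message
RTOL = 1e-3  # relative tolerance from the commit message


def precision_summary(out, ref, atol=ATOL, rtol=RTOL):
    """Compare two flat lists of floats element-wise (hypothetical helper).

    An element passes if |out - ref| <= atol + rtol * |ref|, the same
    convention torch.isclose uses (assumed, not confirmed by the PR).
    """
    if len(out) != len(ref):
        raise ValueError("out and ref must have the same length")
    abs_err = [abs(o - r) for o, r in zip(out, ref)]
    passed = sum(e <= atol + rtol * abs(r) for e, r in zip(abs_err, ref))
    return {
        "match_ratio": passed / len(ref),
        "max_abs_err": max(abs_err),
        "median_abs_err": statistics.median(abs_err),
    }


def exact_match_ratio(state, ref_state):
    """conv_state is compared for exact equality rather than with a tolerance."""
    if len(state) != len(ref_state):
        raise ValueError("state and ref_state must have the same length")
    same = sum(s == r for s, r in zip(state, ref_state))
    return same / len(ref_state)


if __name__ == "__main__":
    out = [1.000, 2.005, 3.200, -0.5]
    ref = [1.000, 2.000, 3.000, -0.5]
    print(precision_summary(out, ref))
    print(exact_match_ratio([0.0, 1.0], [0.0, 1.0]))
```

Reporting the median alongside the maximum error helps separate a handful of outliers from a systematic drift, which a single pass/fail ratio hides.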

- Change bias, num_accepted_tokens, query_start_loc to optional parameters
- Add wrapper to handle None values by converting them to empty tensors
- Update test to use simplified call without empty tensors

Now users can call torch.ops.npu.causal_conv1d_update(x, weight, conv_state, ...) instead of passing empty tensors for unused parameters.

- Implement tiling caching mechanism similar to lightning_indexer
- Add CausalConv1dUpdateTilingKey struct and hash function
- Support up to 256 cached tiling configurations (MAX_CAPTURE_NUM)
- Add graph mode detection via TORCH_NPU_COMPILE_ENABLE env var
- Split tiling data population into a separate populate_tiling_data function
- Use at::from_blob to reference cached tiling in graph mode

This fixes tiling issues in graph mode when using torch.compile: cached tiling data is reused instead of creating a new tensor on each execution, which is incompatible with graph capture.
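The caching idea can be sketched in Python: a hashable key built from the shape-dependent inputs maps to tiling data that is computed once and then reused, so every graph replay sees the same object. The key fields, the TilingCache class, the accepted values of TORCH_NPU_COMPILE_ENABLE, and raising an error when the cache is full are all assumptions for illustration; the PR's C++ implementation (which wraps the cached data with at::from_blob) is not reproduced here.

```python
import os
from dataclasses import dataclass

MAX_CAPTURE_NUM = 256  # capacity in this commit (a later commit raises it to 1024)


@dataclass(frozen=True)
class TilingKey:
    """Hypothetical key fields; the real CausalConv1dUpdateTilingKey may differ."""
    batch: int
    dim: int
    seq_len: int
    width: int
    dtype: str


def is_graph_mode():
    # Assumption: any value other than unset, "" or "0" enables graph mode.
    return os.environ.get("TORCH_NPU_COMPILE_ENABLE", "0") not in ("", "0")


def populate_tiling_data(key):
    # Hypothetical stand-in for the real tiling computation.
    return (key.batch, key.dim, key.seq_len, key.width)


class TilingCache:
    def __init__(self, capacity=MAX_CAPTURE_NUM):
        self.capacity = capacity
        self._entries = {}

    def get(self, key):
        if not is_graph_mode():
            # Eager mode: computing fresh tiling on every call is acceptable.
            return populate_tiling_data(key)
        cached = self._entries.get(key)
        if cached is not None:
            # Graph mode: return the exact same object the captured graph references.
            return cached
        if len(self._entries) >= self.capacity:
            # Assumed overflow policy; the PR does not say what happens here.
            raise RuntimeError(f"tiling cache full ({self.capacity} entries)")
        tiling = populate_tiling_data(key)
        self._entries[key] = tiling
        return tiling
```

A frozen dataclass provides __eq__ and __hash__ automatically, which plays the role of the hash function the PR adds for its key struct.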

- Replace x.is_npu() with x.device().type() == c10::DeviceType::PrivateUse1
- The is_npu() method is not available in the ATen C++ API
- Use proper device type comparison for NPU detection
- Compilation now succeeds with build.sh -a kernels

Increased the maximum capture number (MAX_CAPTURE_NUM) from 256 to 1024 to align with lightning_indexer. Removed the unused global tiling buffer and refactored the tiling data population and retrieval logic for improved performance.

gemini-code-assist · 2026-04-01T02:40:05Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

AndyKong2020 and others added 11 commits March 24, 2026 09:16

[]

998f8e2

Fix device detection for graph mode tiling cache

4554864


[feat]Increase MAX_CAPTURE_NUM and refactor tiling logic

52cd6c8


[rm]remove useless comments

5d284af

chore: stop tracking local env file

9a96268

fix: correct causal conv1d tiling split

3614c8d

feat: add prefill causal_conv1d low-level op

f05d583

fix: restore portable CANN path detection

e322278


AndyKong2020 wants to merge 11 commits into sgl-project:main from AndyKong2020:prefill-causal-conv1d-v2
