[WIP] add Prefill causal conv1d #418

Open
AndyKong2020 wants to merge 11 commits into sgl-project:main from AndyKong2020:prefill-causal-conv1d-v2

@AndyKong2020
Contributor

No description provided.

AndyKong2020 and others added 11 commits March 24, 2026 09:16
- Add conv_state update verification with 100% exact match
- Update tolerance standard to match ops-transformer (atol=1e-2, rtol=1e-3)
- Add median statistics and a detailed precision summary
- Test results: 95.14% output elements match, 100% state elements match
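
The tolerance bullets above (atol=1e-2, rtol=1e-3, a match percentage, a median statistic) can be illustrated with a small pure-Python sketch. It assumes the usual `allclose`-style criterion, `|actual - expected| <= atol + rtol * |expected|`; the PR's test script is not shown here, so the function names and the returned fields are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def is_close(actual: float, expected: float,
             atol: float = 1e-2, rtol: float = 1e-3) -> bool:
    """allclose-style check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def precision_summary(actual: Sequence[float], expected: Sequence[float],
                      atol: float = 1e-2, rtol: float = 1e-3) -> Dict[str, float]:
    """Summarize agreement between two flat, equally sized sequences."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    if not actual:
        raise ValueError("cannot summarize empty sequences")
    abs_errors = [abs(a - e) for a, e in zip(actual, expected)]
    matched = sum(is_close(a, e, atol, rtol) for a, e in zip(actual, expected))
    return {
        "match_ratio": matched / len(actual),          # fraction within tolerance
        "max_abs_error": max(abs_errors),
        "median_abs_error": statistics.median(abs_errors),
    }


def exact_match_ratio(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Fraction of bit-for-bit equal elements, as used for the state check."""
    if len(actual) != len(expected) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == e for a, e in zip(actual, expected)) / len(actual)
```

Under this reading, the output tensor would be judged with `precision_summary`, while the updated conv_state would be judged with `exact_match_ratio`, where only a value of 1.0 counts as a pass.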
- Change bias, num_accepted_tokens, query_start_loc to optional parameters
- Add wrapper to handle None values by converting to empty tensors
- Update test to use simplified call without empty tensors

Now users can call:
  torch.ops.npu.causal_conv1d_update(x, weight, conv_state, ...)
instead of requiring empty tensors for unused parameters.
- Implement tiling caching mechanism similar to lightning_indexer
- Add CausalConv1dUpdateTilingKey struct and hash function
- Support up to 256 cached tiling configurations (MAX_CAPTURE_NUM)
- Add graph mode detection via TORCH_NPU_COMPILE_ENABLE env var
- Split tiling data population into separate populate_tiling_data function
- Use at::from_blob to reference cached tiling in graph mode

This fixes tiling issues in graph mode when using torch.compile
by reusing cached tiling data instead of creating new tensors
for each execution, which is incompatible with graph capture.
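
The caching idea in this commit can be sketched in Python as follows. The PR implements it in C++ (`CausalConv1dUpdateTilingKey`, a hash function, `populate_tiling_data`), so this only illustrates the pattern. The key fields, the tiling formula, the behavior when the cache is full, and the value `"1"` for `TORCH_NPU_COMPILE_ENABLE` are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Tuple

MAX_CAPTURE_NUM = 256  # limit named in this commit message


@dataclass(frozen=True)
class TilingKey:
    """Hypothetical key fields; a frozen dataclass is hashable by value."""
    batch: int
    dim: int
    width: int
    state_len: int


def populate_tiling_data(key: TilingKey) -> Tuple[int, ...]:
    """Hypothetical tiling computation: split the batch over up to 8 blocks."""
    block_dim = max(1, min(key.batch, 8))
    rows_per_block = -(-key.batch // block_dim)  # ceiling division
    return (block_dim, rows_per_block, key.dim, key.width, key.state_len)


class TilingCache:
    """Returns the same tiling object for every call with an equal key."""

    def __init__(self, capacity: int = MAX_CAPTURE_NUM) -> None:
        self._capacity = capacity
        self._entries: Dict[TilingKey, Tuple[int, ...]] = {}

    def get(self, key: TilingKey) -> Tuple[int, ...]:
        cached = self._entries.get(key)
        if cached is not None:
            return cached  # reuse: nothing new is allocated on this path
        if len(self._entries) >= self._capacity:
            # Assumption: the source does not say what happens when full.
            raise RuntimeError("tiling cache is full")
        tiling = populate_tiling_data(key)
        self._entries[key] = tiling
        return tiling

    def __len__(self) -> int:
        return len(self._entries)


def graph_mode_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Assumption: the variable is set to "1" to enable graph mode."""
    return environ.get("TORCH_NPU_COMPILE_ENABLE") == "1"
```

The property that matters for graph capture is that a repeated shape returns the already stored object instead of building a new one, which is the role `at::from_blob` over the cached buffer plays in the C++ code.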
- Replace x.is_npu() with x.device().type() == c10::DeviceType::PrivateUse1
- is_npu() method is not available in ATen C++ API
- Use proper device type comparison for NPU detection
- Compilation now succeeds with build.sh -a kernels
Increased the maximum capture number from 256 to 1024 to align with lightning_indexer. Removed unused global tiling buffer and refactored tiling data population and retrieval logic for improved performance.