[WIP] add Prefill causal conv1d #418
Open
AndyKong2020 wants to merge 11 commits into sgl-project:main
- Add conv_state update verification with 100% exact match
- Update tolerance standard to match ops-transformer (atol=1e-2, rtol=1e-3)
- Add median statistic and detailed precision summary
- Test results: 95.14% of output elements match, 100% of state elements match
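The precision check described in this commit can be sketched roughly as follows. This is a minimal illustration and not the test code from the PR: the function names are hypothetical, and it assumes the two tolerances are combined in the common `|actual - expected| <= atol + rtol * |expected|` form, which may differ from what ops-transformer does.

```python
from statistics import median


def precision_summary(actual, expected, atol=1e-2, rtol=1e-3):
    """Summarize element-wise agreement between two flat sequences of floats.

    Assumption: an element matches when
    |actual - expected| <= atol + rtol * |expected|.
    """
    if len(actual) != len(expected) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(a - e) for a, e in zip(actual, expected)]
    matches = [err <= atol + rtol * abs(e) for err, e in zip(abs_errors, expected)]
    return {
        "match_ratio": sum(matches) / len(matches),
        "max_abs_err": max(abs_errors),
        "median_abs_err": median(abs_errors),
    }


def exact_match_ratio(actual, expected):
    """Fraction of elements that are bit-for-bit equal (used here for state checks)."""
    if len(actual) != len(expected) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == e for a, e in zip(actual, expected)) / len(actual)


if __name__ == "__main__":
    out = precision_summary([1.0, 2.0, 3.5, 4.0], [1.0, 2.005, 3.0, 4.0])
    print(out)
    print(exact_match_ratio([0.5, 0.25], [0.5, 0.25]))
```

Reporting the median alongside the maximum error helps distinguish a few large outliers from a systematic drift across the whole output.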
- Change bias, num_accepted_tokens, query_start_loc to optional parameters
- Add wrapper to handle None values by converting them to empty tensors
- Update test to use the simplified call without empty tensors

Users can now call torch.ops.npu.causal_conv1d_update(x, weight, conv_state, ...) without passing empty tensors for unused parameters.
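The None-to-empty-tensor wrapper pattern from this commit might look something like the sketch below. It is an illustration only: the wrapper name, the argument order after `conv_state`, and the `kernel` and `make_empty` parameters are assumptions made so the example runs without torch or an NPU. In a real binding, `make_empty` would likely be something like `lambda: torch.empty(0, device=x.device)`.

```python
def causal_conv1d_update_wrapper(
    x,
    weight,
    conv_state,
    bias=None,
    num_accepted_tokens=None,
    query_start_loc=None,
    *,
    kernel,
    make_empty=list,
):
    """Forward to `kernel`, replacing each None optional with an empty placeholder.

    `kernel` stands in for the underlying op that requires every argument to be
    present; `make_empty` builds the placeholder (an empty list in this sketch).
    """
    def or_empty(value):
        # Only None is replaced; falsy-but-present values are passed through.
        return make_empty() if value is None else value

    return kernel(
        x,
        weight,
        conv_state,
        or_empty(bias),
        or_empty(num_accepted_tokens),
        or_empty(query_start_loc),
    )


if __name__ == "__main__":
    # A fake kernel that just echoes its arguments, to show what it receives.
    def fake_kernel(*args):
        return args

    print(causal_conv1d_update_wrapper("x", "w", "state", kernel=fake_kernel))
```

Keeping the conversion in one wrapper means callers never construct placeholder tensors themselves, and the kernel signature stays unchanged.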
- Implement tiling caching mechanism similar to lightning_indexer
- Add CausalConv1dUpdateTilingKey struct and hash function
- Support up to 256 cached tiling configurations (MAX_CAPTURE_NUM)
- Add graph mode detection via TORCH_NPU_COMPILE_ENABLE env var
- Split tiling data population into separate populate_tiling_data function
- Use at::from_blob to reference cached tiling in graph mode

This fixes tiling issues in graph mode when using torch.compile. Cached tiling data is reused instead of creating new tensors on each execution, which is incompatible with graph capture.
- Replace x.is_npu() with x.device().type() == c10::DeviceType::PrivateUse1
- is_npu() method is not available in ATen C++ API
- Use proper device type comparison for NPU detection
- Compilation now succeeds with build.sh -a kernels
Increased the maximum capture number from 256 to 1024 to align with lightning_indexer. Removed unused global tiling buffer and refactored tiling data population and retrieval logic for improved performance.
No description provided.