Commit a6ebe8a

Runtime dispatch: recurrent (T=1) vs chunked (T>1) inside triton_op
Move decode/prefill dispatch inside the chunk_gated_delta_rule triton_op instead of using torch.cond at model level. This follows the same pattern as the SDPA triton_op (pow2/non-pow2 dispatch) and avoids torch.cond incompatibility with AOTI's FunctionalTensor pipeline.

Changes:
- chunk_gated_delta_rule.py: Add fused recurrent Triton kernel for T=1, refactor chunked pipeline into _launch_chunked(), dispatch via Python if inside the @triton_op wrapper
- model.py: Remove torch.cond from GatedDeltaNet.forward(), call triton_op directly (dispatch is internal)
- export.py: Single-method export with dynamic seq_len dim
- main.cpp: Fix create_text_llm_runner API signature
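
The dispatch pattern described above can be illustrated with a minimal pure-Python sketch. It is not the code from this commit: `gated_scan`, `_launch_recurrent`, and the scalar recurrence `state = g * state + x` are hypothetical stand-ins for the Triton kernels and the real gated delta rule, and only the name `_launch_chunked` comes from the commit message. The point is just that one entry point branches on the sequence length T internally, so the caller never needs a model-level conditional.

```python
from typing import List, Tuple


def _launch_recurrent(x: float, g: float, state: float) -> Tuple[List[float], float]:
    """Hypothetical stand-in for the fused recurrent kernel (decode, T == 1)."""
    new_state = g * state + x  # one gated update, no chunking overhead
    return [new_state], new_state


def _launch_chunked(
    xs: List[float], gs: List[float], state: float, chunk_size: int = 4
) -> Tuple[List[float], float]:
    """Hypothetical stand-in for the chunked pipeline (prefill, T > 1)."""
    out: List[float] = []
    for start in range(0, len(xs), chunk_size):
        chunk = zip(xs[start:start + chunk_size], gs[start:start + chunk_size])
        for x, g in chunk:
            state = g * state + x
            out.append(state)
    return out, state


def gated_scan(
    xs: List[float], gs: List[float], state: float = 0.0
) -> Tuple[List[float], float]:
    """Single entry point: a plain Python `if` on T picks the code path."""
    if len(xs) != len(gs):
        raise ValueError("xs and gs must have the same length")
    if len(xs) == 1:
        return _launch_recurrent(xs[0], gs[0], state)
    return _launch_chunked(xs, gs, state)


# Prefill two tokens, then decode one; the caller uses the same function for both.
_, carried = gated_scan([1.0, 2.0], [0.5, 0.5])
decoded, _ = gated_scan([3.0], [0.5], carried)
```

In this sketch, prefilling a prefix and then decoding one token gives the same final state as running the whole sequence through the chunked path, which is the property that lets both paths sit behind one call site.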
1 parent 5465d8b commit a6ebe8a

4 files changed: +272 -202 lines changed