
Update gluon #3114

Open
fsx950223 wants to merge 10 commits into main from update_gluon


@fsx950223
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

fsx950223 and others added 9 commits April 29, 2026 08:14
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Made-with: Cursor
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Made-with: Cursor
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Made-with: Cursor
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
fsx950223 requested review from a team and Copilot on May 11, 2026 03:40
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
|---|---|
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| ci:atom | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| ci:atom_full | ATOM accuracy suite for PR and main models from ATOM models_accuracy.json |
| ci:vllm | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| ci:all | All standard extended tests (excludes ci:atom_full) |

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or with gh pr edit 3114 --add-label <label>.


Copilot AI left a comment


Pull request overview

This PR updates the Gluon-based paged-attention decode implementation, including AOT compilation metadata and PS (partitioned softmax) reduction paths, to better support FP8 workflows and configurable output/temporary dtypes.
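The PS reduction path merges per-partition results produced by the split-context decode kernel. The sketch below assumes the common flash-decoding style scheme, in which each context partition stores its maximum logit, its exp-sum, and a locally normalized partial output; `reduce_partitions` is a hypothetical pure-Python helper illustrating the math, not the FlyDSL, C++, or Gluon kernel in this PR.

```python
import math
from typing import List, Sequence


def reduce_partitions(
    max_logits: Sequence[float],
    exp_sums: Sequence[float],
    partial_outs: Sequence[Sequence[float]],
) -> List[float]:
    """Merge per-partition attention results for one query head (hypothetical helper).

    Assumed per-partition inputs, for partition p over its context slice:
      max_logits[p]   = m_p, the largest attention score in the slice
      exp_sums[p]     = sum_j exp(s_j - m_p)
      partial_outs[p] = sum_j exp(s_j - m_p) * v_j / exp_sums[p]
    """
    global_max = max(max_logits)
    # Rescale every partition's exp-sum to the shared global max so the
    # weights are comparable; exp(m - global_max) <= 1, so it cannot overflow.
    weights = [s * math.exp(m - global_max) for m, s in zip(max_logits, exp_sums)]
    total = sum(weights)
    head_size = len(partial_outs[0])
    out = [0.0] * head_size
    for w, partial in zip(weights, partial_outs):
        scale = w / total
        for d in range(head_size):
            out[d] += scale * partial[d]
    return out
```

Because each partition only needs to export three small quantities, partitions can be computed independently and merged afterwards; the result equals a single softmax over the full context.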

Changes:

  • Add an OUTPUT_DTYPE constexpr, plumbed through the Gluon decode kernels and the wrapper, and adjust the default temporary-output dtype for FP8 queries.
  • Update PS reduce FlyDSL kernel codegen to newer FlyDSL APIs (fx.*, const_expr) and adjust reduction fallback ordering.
  • Extend Gluon AOT compile signature/function-keying to include output and temporary output dtypes.
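Including the output and temporary-output dtypes in the AOT key matters because each dtype combination produces different generated code. A minimal sketch, assuming a string-based naming scheme; `aot_func_name`, its field prefixes, and the `fp32` temporary dtype are hypothetical and are not the scheme used in pa_decode_gluon_aot.py.

```python
def aot_func_name(
    base: str,
    query_dtype: str,
    kv_dtype: str,
    output_dtype: str,
    tmp_output_dtype: str,
    head_size: int,
) -> str:
    """Build a cache key / symbol name for one AOT-compiled specialization.

    Hypothetical naming scheme: every compile-time parameter that changes the
    generated kernel must appear here, or two different specializations would
    share one name and one cached binary.
    """
    parts = [
        base,
        f"q{query_dtype}",
        f"kv{kv_dtype}",
        f"o{output_dtype}",
        f"t{tmp_output_dtype}",
        f"h{head_size}",
    ]
    return "_".join(parts)


# A tiny in-memory registry standing in for an on-disk AOT cache.
compiled: dict = {}
for out_dtype in ("bf16", "fp16"):
    key = aot_func_name("pa_decode", "fp8", "fp8", out_dtype, "fp32", 128)
    compiled.setdefault(key, f"<binary for {key}>")
```

If `output_dtype` were dropped from the key, both loop iterations would produce the same name, `compiled` would hold a single entry, and the fp16 caller would silently receive the bf16 binary.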

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| op_tests/triton_tests/test_pa_decode_gluon.py | Runs more local test entrypoints when invoked as a script (__main__). |
| csrc/cpp_itfs/pa_gluon_aot/pa_decode_gluon_aot.py | Adds output/temporary dtype specialization to AOT compile signature and naming. |
| aiter/ops/triton/gluon/pa_decode_gluon.py | Adds OUTPUT_DTYPE to kernels/wrappers, updates FP8 query handling, and refactors PS reduce fallback logic/FlyDSL kernel. |


Comment on lines 5125 to +5152
@@ -5166,8 +5146,34 @@ def _paged_attention_decode_v2_reduce_kernel_wrapper(
 head_size=head_size,
 context_partition_num=context_partition_num,
 )
 return
-except ImportError:
+except Exception:
 pass
 try:
 if CXX_PS_REDUCE_AVAILABLE:
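The hunk appears to replace `except ImportError:` with `except Exception:` around the first reduce path, before the `CXX_PS_REDUCE_AVAILABLE` path is tried. The ordered-fallback pattern can be sketched as follows; `run_with_fallback` and the backend names are hypothetical, and the wrapper's actual ordering is only partly visible in this hunk.

```python
from typing import Callable, List, Tuple


def run_with_fallback(
    backends: List[Tuple[str, Callable[[], None], bool]],
) -> str:
    """Try each (name, launch_fn, available) backend in order (hypothetical helper).

    Catching Exception rather than only ImportError means a backend that
    imports fine but fails at compile or launch time also falls through
    to the next one, instead of aborting the whole reduce.
    """
    errors: List[str] = []
    for name, launch, available in backends:
        if not available:
            continue
        try:
            launch()
            return name
        except Exception as exc:  # broad on purpose; see docstring
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("no reduce backend succeeded: " + "; ".join(errors))
```

The trade-off of the broad catch is that genuine bugs in an earlier backend are hidden behind a successful fallback, which is why this sketch records each failure and reports all of them if every backend fails.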


2 participants