Update gluon by fsx950223 · Pull Request #3114 · ROCm/aiter

fsx950223 · 2026-05-11T03:40:50Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: fsx950223 <fsx950223@outlook.com> Made-with: Cursor

Signed-off-by: fsx950223 <fsx950223@outlook.com> Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-05-11T03:40:59Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or with `gh pr edit 3114 --add-label <label>`.

Copilot

Pull request overview

This PR updates the Gluon-based paged-attention decode implementation, including AOT compilation metadata and PS (partitioned softmax) reduction paths, to better support FP8 workflows and configurable output/temporary dtypes.

Changes:

- Add an `OUTPUT_DTYPE` constexpr plumbed through the Gluon decode kernels and the wrapper, and adjust temporary output dtype defaults for FP8 queries.
- Update the PS reduce FlyDSL kernel codegen to newer FlyDSL APIs (`fx.*`, `const_expr`) and adjust the reduction fallback ordering.
- Extend the Gluon AOT compile signature and function keying to include the output and temporary output dtypes.
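
To illustrate why the output and temporary output dtypes belong in the AOT function key, here is a minimal sketch. It is not the PR's code: the dtype strings, `default_tmp_output_dtype`, and `aot_func_name` are hypothetical names, the bf16 default for FP8 queries is inferred from the commit title "Use bf16 temporary output for fp8 decode", and the non-FP8 default shown here is an assumption.

```python
from typing import Optional

# Hypothetical dtype tags; the real code may use torch or Triton dtypes instead.
FP8_DTYPES = {"fp8_e4m3", "fp8_e5m2"}


def default_tmp_output_dtype(query_dtype: str, output_dtype: str) -> str:
    """Pick a temporary (pre-reduction) output dtype.

    Assumption: FP8 queries use a bf16 temporary buffer; every other
    query dtype reuses the final output dtype.
    """
    if query_dtype in FP8_DTYPES:
        return "bf16"
    return output_dtype


def aot_func_name(
    base: str,
    query_dtype: str,
    output_dtype: str,
    tmp_output_dtype: Optional[str] = None,
) -> str:
    """Build a cache key that changes whenever any specialized dtype changes."""
    if tmp_output_dtype is None:
        tmp_output_dtype = default_tmp_output_dtype(query_dtype, output_dtype)
    parts = [base, f"q{query_dtype}", f"o{output_dtype}", f"t{tmp_output_dtype}"]
    return "_".join(parts)
```

If the key omitted either dtype, two calls that differ only in output precision could resolve to the same compiled artifact, which is the kind of collision that keying on the dtypes avoids.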

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `op_tests/triton_tests/test_pa_decode_gluon.py` | Runs more local test entrypoints when invoked as a script (`__main__`). |
| `csrc/cpp_itfs/pa_gluon_aot/pa_decode_gluon_aot.py` | Adds output/temporary dtype specialization to AOT compile signature and naming. |
| `aiter/ops/triton/gluon/pa_decode_gluon.py` | Adds `OUTPUT_DTYPE` to kernels/wrappers, updates FP8 query handling, and refactors PS reduce fallback logic/FlyDSL kernel. |

@@ -5166,8 +5146,34 @@ def _paged_attention_decode_v2_reduce_kernel_wrapper(
                head_size=head_size,
                context_partition_num=context_partition_num,
            )
-            return
-        except ImportError:
+        except Exception:
+            pass
+        try:
+            if CXX_PS_REDUCE_AVAILABLE:


fsx950223 and others added 9 commits April 29, 2026 08:14

store

81ab1ee

fix(pa): Preserve fp32 temporary decode output

6d9cfaf

Signed-off-by: fsx950223 <fsx950223@outlook.com> Made-with: Cursor

update unit test

b93afff

fix(pa): Pass decode output dtype explicitly

8a5b37f

Signed-off-by: fsx950223 <fsx950223@outlook.com> Made-with: Cursor

fix(pa): Use bf16 temporary output for fp8 decode

3643d1c

Signed-off-by: fsx950223 <fsx950223@outlook.com> Made-with: Cursor

Merge branch 'main' into update_gluon

b5d955f

fix head size 64 query load issue

87f890b

fix aot

7d3336b

fix(gluon): FP8 query quant, PS kernel names, FlyDSL reduce

196a2f8

Signed-off-by: fsx950223 <fsx950223@outlook.com> Co-authored-by: Cursor <cursoragent@cursor.com>

fsx950223 requested review from a team and Copilot May 11, 2026 03:40

fsx950223 added the ci:atom label May 11, 2026

Copilot started reviewing on behalf of fsx950223 May 11, 2026 03:41

Copilot AI reviewed May 11, 2026

Merge branch 'main' into update_gluon

eddb579

fsx950223 wants to merge 10 commits into `main` from `update_gluon`
