Pull requests: ROCm/aiter
Pull requests list
perf: eliminate end_sync in custom allreduce by delaying input tensor release until next sync
#4084
opened Jul 5, 2026 by
jpy794
1 task done
fix: synchronize custom collectives before return
#4082
opened Jul 5, 2026 by
jpy794
1 task done
process group timeout from 600s to 1200s
#4081
opened Jul 5, 2026 by
benenzhu
Contributor
1 task
[OPUS] Absorb module_rmsnorm_quant into the opus rmsnorm module
#4080
opened Jul 4, 2026 by
carlushuang
Collaborator
[opus] backport #4056: gate TDM/named-barrier on clang>=22 for ROCm 7.0/7.1 (release/v0.1.17)
#4078
opened Jul 4, 2026 by
carlushuang
Collaborator
[fmoe][gfx950] Add GLM-5.2 FP8 MXFP8 (per_1x32) MoE tuned configs
#4074
opened Jul 3, 2026 by
zejunchen-zejun
Contributor
1 task
[gfx1250][optimization] qk norm rope quant optimization for gfx1250
#4073
opened Jul 3, 2026 by
jli-melchior
Contributor
1 task
[Bugfix][Build] Grouped MoE build should respect GPU_ARCHS
#4072
opened Jul 3, 2026 by
simondanielsson
1 task
fix max_fp8 from 240 to 448 for gfx950
#4070
opened Jul 3, 2026 by
fangche123
Contributor
1 task
bf16 asm mha: enable doubleq and kv reverse to improve perf
#4068
opened Jul 3, 2026 by
tingchen988
Contributor
1 task
feat(attention): head-dim-tiled Triton flash attention for ViT (gfx1151)
#4065
opened Jul 2, 2026 by
carlushuang
Collaborator
docs(python): condense verbose comments (comments-only, no code change)
#4062
opened Jul 2, 2026 by
carlushuang
Collaborator
Draft
docs(csrc): condense verbose comments (comments-only, no code change)
#4061
opened Jul 2, 2026 by
carlushuang
Collaborator
Draft
[OPUS] RMSNorm backend using opus to reduce compile time (and keep feature / performance)
ci:all
#4059
opened Jul 2, 2026 by
carlushuang
Collaborator
[Triton][GDN] Add in-place state scatter + h output to VK chunk
ci:mi300x
ci:sglang
#4058
opened Jul 2, 2026 by
hsthe29
[Triton][GDN] Support V-major (hvk) state layout in decode kernel
#4057
opened Jul 2, 2026 by
hsthe29
[MoE] Optimize Qwen3.5-397B PTPC FP8 MoE performance for batch sizes 64 and 128
#4053
opened Jul 2, 2026 by
apinge
1 task
Gluon Fused Dynamic mxfp4 Quant Moe Sort for gfx1250
#4049
opened Jul 1, 2026 by
amd-jrosas
1 task done
[triton] Optimized Unified Attention for Gemma-4-31b
#4044
opened Jul 1, 2026 by
a-sidorova
1 task
[Perf] opt deepseek v4 fp8 quant (fused compress attn and fused_qk_norm_rope_group_quant)
#4043
opened Jul 1, 2026 by
yzhou103
Contributor
1 of 3 tasks
[FlyDSL] jagged_dense_bmm_broadcast_add (jdbba)
#4042
opened Jul 1, 2026 by
anhminhnguyenhoang
Contributor
Draft