Attention Perf: Transpose blocked K right before QK instead of pre-transposing before the kernel by AmesingFlank · Pull Request #2374 · pytorch/helion

AmesingFlank · 2026-05-09T03:42:31Z

Optimization found by Claude by comparing the current Helion kernel with this reference implementation.

Previously, the Helion-generated kernel reshaped and transposed K before the kernel: k_view = k_in.reshape([-1, n_dim, head_dim]).transpose(0, 2, 1), producing shape [B*H, D, S]. The QK matmul (jax.lax.dot_general) then used the standard contraction dimension_numbers=(((2,), (1,)), ((0,), (0,))), which contracts Q's dim 2 (D) against K's dim 1 (D) and treats dim 0 (B*H) as the batch dimension.
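A minimal sketch of the old layout in plain JAX (not the Helion-generated Pallas kernel), assuming K arrives as [B, H, S, D] and that n_dim is the sequence length S. The sizes are hypothetical; the point is only how the pre-transposed [B*H, D, S] view pairs with the dimension numbers quoted above.

```python
import jax
import jax.numpy as jnp
from jax import lax

# Hypothetical sizes for illustration: batch B, heads H, sequence S, head dim D.
B, H, S, D = 2, 4, 128, 64

key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
q_in = jax.random.normal(key_q, (B, H, S, D), dtype=jnp.float32)
k_in = jax.random.normal(key_k, (B, H, S, D), dtype=jnp.float32)

# Old layout: flatten batch*heads, then transpose K to [B*H, D, S] before the kernel.
q_view = q_in.reshape((-1, S, D))                        # [B*H, S, D]
k_view = k_in.reshape((-1, S, D)).transpose(0, 2, 1)     # [B*H, D, S]

# Contract q dim 2 (D) with k dim 1 (D); dim 0 (B*H) is a batch dimension on both sides.
scores = lax.dot_general(
    q_view, k_view, dimension_numbers=(((2,), (1,)), ((0,), (0,)))
)
print(scores.shape)  # (8, 128, 128)
```

The output layout of dot_general is batch dims first, then the remaining Q dims, then the remaining K dims, which gives [B*H, S, S] attention scores.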

This causes suboptimal DMA patterns: with K stored in HBM as [B*H, D, S], the pipeline's sequential block reads along the sequence dimension turn into strided, non-contiguous memory accesses.
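To make the contiguity argument concrete, here is a simplified model. It assumes plain row-major storage of a single (batch*head) slice and counts how many separate contiguous runs one sequence-dimension block touches. Real TPU HBM layouts are tiled, so this only illustrates the direction of the effect, not actual DMA descriptors; contiguous_runs and the sizes are hypothetical.

```python
import numpy as np

def contiguous_runs(offsets):
    """Count maximal runs of consecutive flat offsets (one run ~ one linear copy)."""
    offsets = np.sort(np.asarray(offsets).ravel())
    return 1 + int(np.count_nonzero(np.diff(offsets) != 1))

S, D, block_s = 128, 64, 32  # hypothetical sizes
s0 = 32                      # start of the second sequence block

# Flat row-major element offsets for one (batch*head) slice in each layout.
idx_sd = np.arange(S * D).reshape(S, D)  # K kept as [S, D]
idx_ds = np.arange(D * S).reshape(D, S)  # K pre-transposed to [D, S]

# The same logical block: sequence positions s0..s0+block_s, all head dims.
block_new = idx_sd[s0:s0 + block_s, :]
block_old = idx_ds[:, s0:s0 + block_s]

print(contiguous_runs(block_new))  # 1
print(contiguous_runs(block_old))  # 64
```

In the [S, D] layout the block is one span of block_s * D elements; in the [D, S] layout it is D short runs of block_s elements, each separated by a stride of S.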

This PR instead keeps K contiguous in its [B*H, S, D] layout and transposes each loaded K block only right before the QK matmul. On top of the previous PR (#2373), this improves throughput from 633 to 652 TFLOPs.
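A rough sketch of the new data flow in plain JAX, under stated assumptions: this is not the Helion-generated kernel, qk_blocked and BLOCK_K are hypothetical, and plain XLA will not reproduce the DMA behavior. It only shows K staying [B*H, S, D], each block being read contiguously and transposed right before the dot, and the result matching an unblocked Q @ K^T.

```python
import jax
import jax.numpy as jnp
from jax import lax

BH, S, D, BLOCK_K = 8, 128, 64, 32  # hypothetical sizes

key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
q = jax.random.normal(key_q, (BH, S, D), dtype=jnp.float32)
k = jax.random.normal(key_k, (BH, S, D), dtype=jnp.float32)  # K stays [B*H, S, D]

def qk_blocked(q, k, block_k=BLOCK_K):
    """Compute Q @ K^T block by block, transposing each K block just before the dot."""
    blocks = []
    for start in range(0, k.shape[1], block_k):
        k_blk = k[:, start:start + block_k, :]   # contiguous [B*H, block_k, D] slice
        k_blk_t = k_blk.transpose(0, 2, 1)       # [B*H, D, block_k], transposed late
        s_blk = lax.dot_general(
            q, k_blk_t, dimension_numbers=(((2,), (1,)), ((0,), (0,)))
        )                                        # [B*H, S, block_k]
        blocks.append(s_blk)
    return jnp.concatenate(blocks, axis=2)       # [B*H, S, S]

scores = qk_blocked(q, k)
reference = jnp.einsum("bsd,btd->bst", q, k)
print(scores.shape, bool(jnp.allclose(scores, reference, rtol=1e-4, atol=1e-3)))
```

The explicit transpose could also be folded into the contraction, e.g. dimension_numbers=(((2,), (2,)), ((0,), (0,))) on the untransposed block; the PR text does not say which form the generated kernel uses. For scale, 652 / 633 ≈ 1.03, roughly a 3% throughput gain.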


AmesingFlank added a commit that referenced this pull request May 9, 2026: 4577603 "Attention Perf: Transpose blocked K right before QK instead of pre-transposing before the kernel" (stack-info: PR: #2374, branch: AmesingFlank/stack/50)

AmesingFlank force-pushed the AmesingFlank/stack/50 branch from 0bdb0b9 to 4577603 May 9, 2026 03:42

AmesingFlank force-pushed the AmesingFlank/stack/49 branch from 1b0111f to cdfb1ce May 9, 2026 03:42

AmesingFlank mentioned this pull request May 9, 2026 in "Attention Perf: Multiply Q in-loop to avoid memory spillage" #2373 (Merged)

meta-cla bot added the CLA Signed label May 9, 2026

AmesingFlank marked this pull request as draft May 9, 2026 04:08

AmesingFlank changed the base branch from AmesingFlank/stack/49 to main May 9, 2026 04:08

AmesingFlank force-pushed the AmesingFlank/stack/50 branch from 4577603 to a2abca6 May 9, 2026 04:08

AmesingFlank changed the base branch from main to AmesingFlank/stack/49 May 9, 2026 04:08

AmesingFlank marked this pull request as ready for review May 9, 2026 04:08

AmesingFlank added a commit that referenced this pull request May 9, 2026: 6ad5991 "Attention Perf: Transpose blocked K right before QK instead of pre-transposing before the kernel" (stack-info: PR: #2374, branch: AmesingFlank/stack/50)

AmesingFlank marked this pull request as draft May 9, 2026 05:16

AmesingFlank changed the base branch from AmesingFlank/stack/49 to main May 9, 2026 05:16

AmesingFlank force-pushed the AmesingFlank/stack/50 branch from a2abca6 to 6ad5991 May 9, 2026 05:16

AmesingFlank changed the base branch from main to AmesingFlank/stack/49 May 9, 2026 05:16

AmesingFlank marked this pull request as ready for review May 9, 2026 05:16

jansel approved these changes May 9, 2026


AmesingFlank changed the base branch from AmesingFlank/stack/49 to main May 11, 2026 16:44

AmesingFlank changed the base branch from main to AmesingFlank/stack/49 May 11, 2026 16:45

Commit f9e9453: "Attention Perf: Transpose blocked K right before QK instead of pre-transposing before the kernel" (stack-info: PR: #2374, branch: AmesingFlank/stack/50)

AmesingFlank marked this pull request as draft May 11, 2026 16:45

AmesingFlank changed the base branch from AmesingFlank/stack/49 to main May 11, 2026 16:45

AmesingFlank force-pushed the AmesingFlank/stack/50 branch from 6ad5991 to f9e9453 May 11, 2026 16:45

AmesingFlank marked this pull request as ready for review May 11, 2026 16:45

AmesingFlank merged commit 2e0e236 into main May 11, 2026
23 checks passed

AmesingFlank merged 1 commit into main from AmesingFlank/stack/50