[LinalgExt] Rewriter for Torch::HigherOrderFlexAttentionOp -> LinalgExt::OnlineAttentionOp#23292

Merged
keshavvinayak01 merged 7 commits into main from personal/users/keshavvinayak01/linalgext-torch-rewrite-flexattention
Apr 22, 2026

@keshavvinayak01 (Contributor) commented Jan 27, 2026

Following the discussion from #22441.

I ran the entire flex_attention_hop implementation with randomised input tensors through aot.export (also see llvm/torch-mlir#4366) and compared the results against eager mode. I noticed no accuracy loss on CPU.

Test: Torch ops test PR (iree-org/iree-test-suites#149)

@MaheshRavishankar (Collaborator) commented

@keshavvinayak01 moving PRs around makes it hard to track what is new and what has been up for a while. It is disruptive for reviewers. Can we keep this a bit more stable?

@keshavvinayak01 (Contributor Author) commented

@Groverkss Could you close reviews on this?

@keshavvinayak01 keshavvinayak01 marked this pull request as draft April 8, 2026 23:50
Convert torch.hop_flex_attention -> iree_linalg_ext.online_attention
with inlined score/mask modification functions. The mask_mod and
score_mod function bodies are inlined directly into the score
modification region (no func.call, no separate mask tensor), enabling
fusion during attention decomposition and proper tiling.

Also fixes:
- IndexOp verifier to accept OnlineAttentionOp as parent
- OnlineAttentionOp::build scale/mask parameter swap
- applyPostQKMatmulElementwise to convert iree_linalg_ext.index ->
  linalg.index when cloning the score region during decomposition

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
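
The inlining described in the commit message above can be illustrated with a small reference model in plain Python. This is only a sketch of the semantics, not the rewriter itself: `attention_with_inline_mods`, `causal`, and `relative_bias` are hypothetical names, the callbacks use simplified signatures without batch and head indices, and the order of operations (scale, then score_mod, then mask_mod) is an assumption. The point is that the mask is evaluated per score element from its indices, so no separate mask tensor is ever built.

```python
import math


def attention_with_inline_mods(q, k, v, scale, score_mod, mask_mod):
    """Single-head attention over lists of float vectors.

    score_mod(score, q_idx, kv_idx) -> float and mask_mod(q_idx, kv_idx) -> bool
    are hypothetical, simplified signatures (batch and head indices omitted).
    """
    out = []
    for i, q_row in enumerate(q):
        scores = []
        for j, k_row in enumerate(k):
            # Scaled dot product for one (query, key) pair.
            s = scale * sum(a * b for a, b in zip(q_row, k_row))
            # Both modifications run inline on the single score element.
            s = score_mod(s, i, j)
            if not mask_mod(i, j):
                s = -math.inf
            scores.append(s)
        row_max = max(scores)
        if row_max == -math.inf:
            # Every key is masked for this query; this sketch emits zeros.
            out.append([0.0] * len(v[0]))
            continue
        # Numerically stable softmax over the modified scores.
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_row[d] for w, v_row in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out


def causal(q_idx, kv_idx):
    """Example mask_mod: a query may only attend to earlier or equal positions."""
    return kv_idx <= q_idx


def relative_bias(score, q_idx, kv_idx):
    """Example score_mod: penalise distant positions."""
    return score - 0.1 * abs(q_idx - kv_idx)


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
result = attention_with_inline_mods(
    q, k, v, 1.0 / math.sqrt(2), relative_bias, causal
)
```

With the causal mask, the first query can only see the first key, so its output row is exactly the first value row.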
@keshavvinayak01 keshavvinayak01 force-pushed the personal/users/keshavvinayak01/linalgext-torch-rewrite-flexattention branch from 507e0ef to 8efbb1d Compare April 9, 2026 18:46
keshavvinayak01 and others added 4 commits April 9, 2026 18:59
Compute scale = rsqrt(head_dim) at runtime via tensor.dim + math.rsqrt
when the scale is not a constant float, instead of requiring a static
head dimension.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
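
The scale fallback in the commit message above can be sketched in plain Python as follows. `resolve_scale` is a hypothetical helper, not part of the PR; reading `len(query[0])` stands in for `tensor.dim` on the head dimension, and `1.0 / math.sqrt(...)` stands in for `math.rsqrt`.

```python
import math


def resolve_scale(query, scale=None):
    """Return the explicit scale if given, else 1/sqrt(head_dim).

    `query` is a list of rows; the head dimension is the row length,
    read from the data at run time rather than from a static shape.
    """
    if scale is not None:
        return float(scale)
    head_dim = len(query[0])
    return 1.0 / math.sqrt(head_dim)


default_scale = resolve_scale([[0.0] * 4])
explicit_scale = resolve_scale([[0.0] * 4], scale=0.25)
```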
@keshavvinayak01 keshavvinayak01 changed the title from "[LinalgExt] Rewriter for Torch::HigherOrderFlexAttentionOp -> LinalgExt::AttentionOp" to "[LinalgExt] Rewriter for Torch::HigherOrderFlexAttentionOp -> LinalgExt::OnlineAttentionOp" Apr 10, 2026
@keshavvinayak01 keshavvinayak01 marked this pull request as ready for review April 10, 2026 00:15
@keshavvinayak01 (Contributor Author) left a comment

Re-opening with the rewriter converting directly to online_attention instead of AttentionOp. We might also use this torch op in fusilli, so cc @sjain-stanford; pulling you in for review.

cc @MaheshRavishankar @Groverkss

@MaheshRavishankar (Collaborator) left a comment

Overall looks fine to me.

@rsuderman, can you review this PR and its follow-ups if no one else gets to them?

@keshavvinayak01 keshavvinayak01 merged commit 4bd3742 into main Apr 22, 2026
64 of 66 checks passed
@keshavvinayak01 keshavvinayak01 deleted the personal/users/keshavvinayak01/linalgext-torch-rewrite-flexattention branch April 22, 2026 16:33
benvanik pushed a commit that referenced this pull request Apr 24, 2026
…xt::OnlineAttentionOp (#23292)

Rewriter pattern for torch.hop_flex_attention -> iree_linalg_ext.online_attention

I ran the entire flex_attention_hop implementation with randomised input
tensors through aot.export (also see llvm/torch-mlir#4366) and compared
the results against eager mode. I noticed no accuracy loss on CPU.

Test: [Torch ops test PR](iree-org/iree-test-suites#149)

---------

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
jerryyin pushed a commit that referenced this pull request May 7, 2026
RattataKing pushed a commit to RattataKing/iree that referenced this pull request May 27, 2026