[LinalgExt] Rewriter for Torch::HigherOrderFlexAttentionOp -> LinalgExt::OnlineAttentionOp by keshavvinayak01 · Pull Request #23292 · iree-org/iree

keshavvinayak01 · 2026-01-27T06:55:47Z

Following the discussion from #22441.

I ran the entire flex_attention_hop implementation with randomised input tensors (see also llvm/torch-mlir#4366) through aot.export and compared the results against eager mode. I observed no accuracy loss on CPU.

Test: Torch ops test PR (iree-org/iree-test-suites#149)

MaheshRavishankar · 2026-01-28T18:12:56Z

@keshavvinayak01 moving PRs around makes it hard to track what is new and what has been up for a while. It is disruptive for reviewers. Can we keep this a bit more stable?

keshavvinayak01 · 2026-03-30T22:40:47Z

@Groverkss Could you close reviews on this?

Convert torch.hop_flex_attention -> iree_linalg_ext.online_attention with inlined score/mask modification functions.

The mask_mod and score_mod function bodies are inlined directly into the score modification region (no func.call, no separate mask tensor), enabling fusion during attention decomposition and proper tiling.

Also fixes:
- IndexOp verifier to accept OnlineAttentionOp as parent
- OnlineAttentionOp::build scale/mask parameter swap
- applyPostQKMatmulElementwise to convert iree_linalg_ext.index -> linalg.index when cloning the score region during decomposition

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
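To make the inlining described in this commit concrete, here is a minimal pure-Python sketch of online (tiled) attention where `score_mod` and `mask_mod` are applied per score element inside the tile loop, rather than through a separately materialised mask tensor. It is an illustration only, not the lowering in this PR: the function name `flex_attention_online`, the single batch/head slice, the `tile` parameter, and applying `score_mod` before `mask_mod` are all assumptions made for the example. The callback argument order follows PyTorch's flex attention convention, `score_mod(score, b, h, q_idx, kv_idx)` and `mask_mod(b, h, q_idx, kv_idx)`.

```python
import math


def flex_attention_online(q, k, v, scale, score_mod=None, mask_mod=None, tile=2):
    """Hypothetical helper for one (batch, head) slice.

    q is [M][D], k is [N][D], v is [N][Dv], all nested Python lists.
    """
    neg_inf = float("-inf")
    out = []
    for qi, q_row in enumerate(q):
        run_max, run_sum = neg_inf, 0.0  # running softmax statistics
        acc = [0.0] * len(v[0])          # unnormalised output row
        for start in range(0, len(k), tile):
            # Scores for this KV tile, with both modifications applied inline.
            scores = []
            for ki in range(start, min(start + tile, len(k))):
                s = scale * sum(a * b for a, b in zip(q_row, k[ki]))
                if score_mod is not None:
                    s = score_mod(s, 0, 0, qi, ki)
                if mask_mod is not None and not mask_mod(0, 0, qi, ki):
                    s = neg_inf  # masked-out position (assumed ordering)
                scores.append(s)
            new_max = max(run_max, max(scores))
            if new_max == neg_inf:
                continue  # everything seen so far is masked out
            # Rescale what was accumulated under the old running max.
            correction = math.exp(run_max - new_max)
            run_sum *= correction
            acc = [a * correction for a in acc]
            for offset, s in enumerate(scores):
                p = math.exp(s - new_max)
                run_sum += p
                acc = [a + p * x for a, x in zip(acc, v[start + offset])]
            run_max = new_max
        out.append([a / run_sum if run_sum > 0.0 else 0.0 for a in acc])
    return out


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]


def causal(b, h, q_idx, kv_idx):
    return kv_idx <= q_idx


def rel_bias(score, b, h, q_idx, kv_idx):
    return score + 0.1 * (kv_idx - q_idx)


tiled = flex_attention_online(q, k, v, 2 ** -0.5, rel_bias, causal, tile=2)
whole = flex_attention_online(q, k, v, 2 ** -0.5, rel_bias, causal, tile=3)
```

In this sketch the tiled and untiled runs should agree up to floating-point rounding, and under the causal mask the first query row only sees the first key, so its output is the first value row.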

Compute scale = rsqrt(head_dim) at runtime via tensor.dim + math.rsqrt when the scale is not a constant float, instead of requiring a static head dimension.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
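The scale fallback in this commit can be illustrated in a few lines of Python. This is a sketch of the arithmetic only, not the MLIR the pass emits; `resolve_scale` is a hypothetical name, and the assumption that the head dimension is the last axis of the query shape is made for the example.

```python
import math


def resolve_scale(scale, query_shape):
    """Return the softmax scale, deriving it from the head dimension if unset.

    Hypothetical helper: assumes head_dim is the last axis of query_shape.
    """
    if scale is not None:
        return float(scale)          # an explicit scale is used as-is
    head_dim = query_shape[-1]       # read from the shape at runtime
    return 1.0 / math.sqrt(head_dim)  # rsqrt(head_dim)


default_scale = resolve_scale(None, (2, 4, 128, 64))
explicit_scale = resolve_scale(0.5, (2, 4, 128, 64))
```

Reading the dimension from the shape at call time is what removes the need for a statically known head dimension.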

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>

keshavvinayak01

Re-opening with the rewriter converting to online_attention directly instead of AttentionOp. We might also use this torch op in fusilli, so cc @sjain-stanford: pulling you in for reviews.

cc @MaheshRavishankar @Groverkss

MaheshRavishankar

Overall looks fine to me.

@rsuderman can you review this PR and its follow-ups if no one else gets to it?

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>

…xt::OnlineAttentionOp (#23292)

Rewriter pattern for torch.hop_flex_attention -> iree_linalg_ext.online_attention

I ran the entire flex_attention_hop implementation with randomised input tensors, (Also see llvm/torch-mlir#4366) through aot.export and compared against eager mode, and I noticed no accuracy losses (On CPU)

Test: [Torch ops test PR](iree-org/iree-test-suites#149)

---------

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

This was referenced Jan 27, 2026

[LinalgExt] Rewriter for Torch::HigherOrderFlexAttentionOp -> LinalgExt::AttentionOp #22769

Closed

Added generated test files for FlexAttention iree-org/iree-test-suites#149

Draft

keshavvinayak01 requested a review from Groverkss January 28, 2026 05:03

keshavvinayak01 marked this pull request as draft April 8, 2026 23:50

keshavvinayak01 force-pushed the personal/users/keshavvinayak01/linalgext-torch-rewrite-flexattention branch from 507e0ef to 8efbb1d Compare April 9, 2026 18:46

keshavvinayak01 and others added 4 commits April 9, 2026 18:59

Remove unnecessary DerefineOp wrapping for optional results

d9dd45e

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>

Simplify init fills and result type handling

74d0099

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>

Use !torch.none for unused optional results in tests

85d2e6b

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>

keshavvinayak01 changed the title ~~[LinalgExt] Rewriter for Torch::HigherOrderFlexAttentionOp -> LinalgExt::AttentionOp~~ [LinalgExt] Rewriter for Torch::HigherOrderFlexAttentionOp -> LinalgExt::OnlineAttentionOp Apr 10, 2026

keshavvinayak01 marked this pull request as ready for review April 10, 2026 00:15

keshavvinayak01 requested review from IanWood1, MaheshRavishankar, Max191 and hanhanW as code owners April 10, 2026 00:16

keshavvinayak01 commented Apr 10, 2026

MaheshRavishankar approved these changes Apr 14, 2026

Comment thread compiler/plugins/input/Torch/InputConversion/ConvertTorchUnstructuredToLinalgExt.cpp Outdated

Return SmallVector<Value> from inlineTorchFunction

22363ad

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>

keshavvinayak01 requested a review from rsuderman April 15, 2026 17:52

Fix expected error message in invalid.mlir test

002778f

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>

keshavvinayak01 merged commit 4bd3742 into main Apr 22, 2026
64 of 66 checks passed

keshavvinayak01 deleted the personal/users/keshavvinayak01/linalgext-torch-rewrite-flexattention branch April 22, 2026 16:33

keshavvinayak01 mentioned this pull request Apr 24, 2026

[SDPA][hipDNN] generate_stats=true not supported iree-org/fusilli#275

Open

keshavvinayak01 merged 7 commits into main from personal/users/keshavvinayak01/linalgext-torch-rewrite-flexattention
