Arm backend: Add static cache integration test with llama #18404
xingguo01 wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18404
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 6 Pending, 3 Unrelated Failures as of commit c175435 with merge base e638059.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @digantdesai, this PR touches code outside the Arm backend and needs a Meta review. Thanks!
Change-Id: I881fa107f43c9682c18480d01996a5795ae7f086
Signed-off-by: Xingguo Li <xingguo.li@arm.com>
de5d980 to c175435
Pull request overview
This PR adds an Arm backend integration test that exercises HuggingFace LLaMA StaticCache lowering, and adjusts backend transforms/passes to better support LLaMA-style attention and graph-signature constraints during lowering.
Changes:
- Extend SDPA decomposition to handle LLaMA-style GQA (Q heads != KV heads) and refactor SDPA graph-copying helpers.
- Add a new HuggingFace StaticCache-based LLaMA INT TOSA integration test.
- Fix bias placeholder insertion ordering for rewritten convs to satisfy constant-vs-user-input placeholder constraints.
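
On the last point, a minimal sketch of how such an ordering constraint can be satisfied in a torch.fx graph (the helper name, the `user_input_names` argument, and the omitted graph-signature bookkeeping are assumptions for illustration, not the actual pass):

```python
import torch


def insert_synthetic_bias(
    graph: torch.fx.Graph, bias_name: str, user_input_names: set[str]
) -> torch.fx.Node:
    """Insert a synthetic bias placeholder before the first user-input
    placeholder, so constant placeholders stay grouped ahead of user inputs.
    A real pass would also register the constant tensor and update the
    exported program's graph signature."""
    anchor = next(
        n
        for n in graph.nodes
        if n.op == "placeholder" and n.name in user_input_names
    )
    with graph.inserting_before(anchor):
        return graph.placeholder(bias_name)
```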
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| backends/transforms/decompose_sdpa.py | Refactors SDPA decomposition and adds a GQA wrapper for LLaMA-style head mismatch. |
| backends/arm/test/models/test_llama.py | Adds HF StaticCache LLaMA test and tweaks existing LLaMA TOSA pipeline settings. |
| backends/arm/_passes/rewrite_conv_pass.py | Ensures synthetic bias placeholders are inserted before user-input placeholders. |
        custom_path="llama_tosa_fb_int",
    -   run_on_tosa_ref_model=False,  # Just want to write TOSA FB to disk
    +   run_on_tosa_ref_model=True,  # Just want to write TOSA FB to disk
        use_to_edge_transform_and_lower=True,
        frobenius_threshold=None,
        cosine_threshold=None,
The inline comment says this test “Just want to write TOSA FB to disk”, but run_on_tosa_ref_model is now True (and the explicit serialize stage was removed). Either update the comment to match the new behavior (running the TOSA ref model) or set run_on_tosa_ref_model=False if the intent is still artifact-only.
    @staticmethod
    def _extract_input_tensors(node: torch.fx.Node) -> tuple[object, ...]:
        def _extract_arg_value(arg):
            if isinstance(arg, torch.fx.Node):
                if "val" not in arg.meta:
                    raise RuntimeError(
                        f"Missing meta['val'] for SDPA arg node: {arg.name}"
                    )
                return arg.meta["val"]
            return arg

        return tuple(_extract_arg_value(arg) for arg in node.args)
_extract_input_tensors only walks node.args and ignores node.kwargs. For aten.scaled_dot_product_attention it’s common for attn_mask / dropout_p / is_causal / scale to be provided as kwargs, so the make_fx trace here can silently use defaults and decompose the wrong computation. Consider canonicalizing the SDPA call into a full positional arg list (q,k,v,attn_mask,dropout_p,is_causal,scale) by merging args+kwargs+defaults, and use that both for tracing and for the later scale adjustment (including handling scale passed positionally).
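
For illustration, a sketch of such a canonicalization (argument names and defaults are taken from the ATen `scaled_dot_product_attention` schema; `enable_gqa` is left out, and extraction of `meta["val"]` from any fx.Node values would still be applied afterwards):

```python
import torch

# Positional order and defaults of aten.scaled_dot_product_attention
# (enable_gqa omitted for brevity).
_SDPA_ARG_NAMES = ("query", "key", "value", "attn_mask", "dropout_p", "is_causal", "scale")
_SDPA_DEFAULTS = {"attn_mask": None, "dropout_p": 0.0, "is_causal": False, "scale": None}


def canonicalize_sdpa_args(node: torch.fx.Node) -> tuple:
    """Merge positional args, kwargs and schema defaults into one full
    positional tuple (query, key, value, attn_mask, dropout_p, is_causal, scale)."""
    merged: dict = dict(_SDPA_DEFAULTS)
    merged.update(zip(_SDPA_ARG_NAMES, node.args))  # positional arguments
    merged.update(node.kwargs)                      # keyword arguments
    return tuple(merged[name] for name in _SDPA_ARG_NAMES)
```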
    for decomposed_node in decomposed_module.graph.nodes:
        node.meta["nn_module_stack"] = decomposed_node.meta.get("nn_module_stack")
        if decomposed_node.op == "placeholder":
            continue
In _copy_decomposed_graph, nn_module_stack metadata is being written onto the original SDPA node (which is erased) instead of propagating from the original node to the decomposed nodes / copied subgraph nodes. This likely drops nn_module_stack on the new nodes and breaks downstream tooling relying on that metadata. The direction should match other decomposition passes (e.g., set decomposed_node.meta["nn_module_stack"] = node.meta.get("nn_module_stack") before node_copy, or set it on subgraph_node after copying).
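
A sketch of the propagation direction described above (the `graph`, `value_remap`, and copy-loop shape are assumptions about the surrounding pass, with `value_remap` assumed to be pre-seeded so placeholders map back to the original SDPA args):

```python
for decomposed_node in decomposed_module.graph.nodes:
    if decomposed_node.op == "placeholder":
        continue
    # Propagate metadata from the original SDPA node onto the node being
    # copied; node_copy copies meta, so the new node inherits it.
    decomposed_node.meta["nn_module_stack"] = node.meta.get("nn_module_stack")
    subgraph_node = graph.node_copy(decomposed_node, lambda n: value_remap[n])
    value_remap[decomposed_node] = subgraph_node
```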
    Hk = k.shape[1]
    if Hq != Hk:
        # LLaMA-style GQA: tile K and V heads to match Q
        assert Hq % Hk == 0, f"GQA mismatch: Hq={Hq}, Hk={Hk}"
Using a bare assert for the GQA head-ratio check makes this validation disappear under Python -O and can turn a shape mismatch into harder-to-debug downstream errors during tracing. Prefer raising a RuntimeError / ValueError with the same message so it is always enforced.
Suggested change:

    -   assert Hq % Hk == 0, f"GQA mismatch: Hq={Hq}, Hk={Hk}"
    +   if Hq % Hk != 0:
    +       raise ValueError(f"GQA mismatch: Hq={Hq}, Hk={Hk}")
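
A minimal sketch of the head tiling the GQA wrapper performs, assuming a `[batch, heads, seq, head_dim]` layout with heads on dim 1 (the function name is illustrative):

```python
import torch


def tile_kv_heads(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
    """Repeat K/V heads so they match the number of Q heads (LLaMA-style GQA)."""
    Hq, Hk = q.shape[1], k.shape[1]
    if Hq == Hk:
        return k, v
    if Hq % Hk != 0:
        raise ValueError(f"GQA mismatch: Hq={Hq}, Hk={Hk}")
    n_rep = Hq // Hk
    # repeat_interleave keeps each KV head adjacent to its group of Q heads,
    # matching the grouping used by HF's repeat_kv helper.
    return k.repeat_interleave(n_rep, dim=1), v.repeat_interleave(n_rep, dim=1)
```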
        custom_path="llama_tosa_fb",
    -   run_on_tosa_ref_model=False,  # Just want to write TOSA FB to disk
    +   run_on_tosa_ref_model=True,  # Just want to write TOSA FB to disk
        use_to_edge_transform_and_lower=True,
        transform_passes=[InsertInt32CastsAfterInt64PlaceholdersPass()],
The inline comment says this test “Just want to write TOSA FB to disk”, but run_on_tosa_ref_model is now True (and the explicit serialize stage was removed). Either update the comment to match the new behavior (running the TOSA ref model) or set run_on_tosa_ref_model=False if the intent is still artifact-only.
    )(*input_tensors)

    with graph.inserting_before(node):
        name_to_input_tensor_map = {}
Is this just a refactor?
digantdesai left a comment:
Does it work on the tosa ref model though? Just curious.
Add static cache integration tests in llama
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell