Qualcomm AI Engine Direct - LLM multi-batch quantization and evaluation by DannyYuyang-quic · Pull Request #20488 · pytorch/executorch

DannyYuyang-quic · 2026-06-24T15:40:30Z

Summary

Support multi-batch calibration
Support multi-batch evaluation
Add CLI flag to specify batch size during calibration
Fix the shape of attention mask (Optimize runtime graph)
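As a rough illustration of the calibration items above, the sketch below groups tokenized calibration samples into batches of a size given on the command line. It is not taken from this PR: the flag name `--calibration_batch_size`, the helper `iter_batches`, and the right-padding with `pad_id` are all assumptions made for the example.

```python
import argparse
from typing import Iterator, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Calibration batching sketch")
    # Hypothetical flag name; the flag added by the PR may be spelled differently.
    parser.add_argument("--calibration_batch_size", type=int, default=1)
    return parser


def iter_batches(
    samples: List[List[int]], batch_size: int, pad_id: int = 0
) -> Iterator[List[List[int]]]:
    """Yield lists of up to `batch_size` samples, right-padded to equal length."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        width = max(len(s) for s in chunk)
        # Padding strategy is an assumption; a real pipeline may pad differently.
        yield [s + [pad_id] * (width - len(s)) for s in chunk]


args = build_parser().parse_args(["--calibration_batch_size", "2"])
samples = [[1, 2, 3], [4, 5], [6]]
batches = list(iter_batches(samples, args.calibration_batch_size))
```

With three samples and a batch size of 2, the last batch holds the single leftover sample, so a caller should not assume every batch is full.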

Details on the attention mask shape fix (optimize runtime graph)

Without the head dimension in the attention mask, the model had to broadcast the mask from shape [1, 1, S] to [1, 1, 1, S]. This introduced an unnecessary extra view_copy node in the runtime graph.

With the head dimension included in the mask, no broadcast is needed and the redundant view_copy node is removed.
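The shape reasoning can be checked with a small, shape-only sketch in plain Python. It assumes attention scores laid out as [batch, heads, query_len, S], which is a hypothetical layout chosen for illustration, and it treats a rank mismatch as the point where a graph would need an explicit reshape. Whether a view_copy node is actually emitted depends on the model code and the lowering passes, which this sketch does not model.

```python
from itertools import zip_longest
from typing import Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Right-aligned broadcasting of two shapes (NumPy/PyTorch-style rule)."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {tuple(a)} and {tuple(b)} do not broadcast")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def needs_reshape(mask_shape: Sequence[int], scores_shape: Sequence[int]) -> bool:
    """True when the mask rank differs from the scores rank."""
    return len(mask_shape) != len(scores_shape)


S = 8
scores = (1, 4, 1, S)    # assumed layout: [batch, heads, query_len, S]
old_mask = (1, 1, S)     # mask without the head dimension
new_mask = (1, 1, 1, S)  # mask with the head dimension

same_result = broadcast_shape(old_mask, scores) == broadcast_shape(new_mask, scores)
old_needs_reshape = needs_reshape(old_mask, scores)
new_needs_reshape = needs_reshape(new_mask, scores)
```

Both masks lead to the same result shape; only the rank-3 mask needs to be brought up to rank 4 first, which is the step the fix avoids by building the mask with the head dimension from the start.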

Test plan

ExampleLLMScript
TestExampleMultimodalityScript


pytorch-bot · 2026-06-24T15:40:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20488

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

[ROCm] MI350 CI runner label rename: rebase PRs using old linux.rocm.gpu.gfx950.* labels

This comment was automatically generated by Dr. CI and updates every 15 minutes.

DannyYuyang-quic · 2026-06-24T15:41:18Z

@pytorchbot label "release notes: qualcomm"

DannyYuyang-quic · 2026-06-24T15:43:28Z

@psiddh Hi, this PR mainly adds support for multi-batch quantization and evaluation.
Please take a look. Thanks!

cc: @shewu-quic, @haowhsu-quic

Qualcomm AI Engine Direct - support multi-batch calibration

85cafbc

Summary: - Support multi-batch calibration - Support multi-batch evaluation - Add CLI flag to specify batch size during calibration - Fix the shape of attention mask

DannyYuyang-quic requested review from abhinaykukkadapu and psiddh as code owners June 24, 2026 15:40

meta-cla Bot added the CLA Signed label Jun 24, 2026

pytorch-bot Bot added the release notes: qualcomm label Jun 24, 2026

DannyYuyang-quic wants to merge 1 commit into pytorch:main from CodeLinaro:dev1/danny/support_llm_multi_batch_quantization
