
Qualcomm AI Engine Direct - LLM multi-batch quantization and evaluation#20488

Open
DannyYuyang-quic wants to merge 1 commit into
pytorch:mainfrom
CodeLinaro:dev1/danny/support_llm_multi_batch_quantization


@DannyYuyang-quic

Contributor

Summary

  • Support multi-batch calibration
  • Support multi-batch evaluation
  • Add CLI flag to specify batch size during calibration
  • Fix the shape of attention mask (Optimize runtime graph)
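A minimal sketch of how a batch-size flag could drive multi-batch calibration. This is not the PR's implementation: the flag name `--calib_batch_size`, the helpers `iter_batches` and `pad_batch`, and the right-padding with `pad_id=0` are all assumptions made for illustration.

```python
import argparse
from typing import Iterator, List, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Calibration batching sketch")
    # Hypothetical flag name; the flag added by the PR may be spelled differently.
    parser.add_argument("--calib_batch_size", type=int, default=1)
    return parser


def iter_batches(
    samples: Sequence[List[int]], batch_size: int
) -> Iterator[List[List[int]]]:
    """Yield consecutive groups of at most batch_size token sequences."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad every sequence to the longest length in the batch (assumed scheme)."""
    width = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--calib_batch_size", "2"])

# Toy token-id sequences standing in for calibration prompts.
samples = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10], [11]]
batches = [pad_batch(b) for b in iter_batches(samples, args.calib_batch_size)]
```

With a batch size of 2, the five toy samples are grouped into three batches, the last of which holds the single leftover sample.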

Details on fixing the shape of the attention mask (optimize runtime graph)

When the attention mask lacked the head dimension, the model had to broadcast it from shape [1, 1, S] to [1, 1, 1, S]. This introduced an extra, unnecessary view_copy node in the runtime graph.
image
With the head dimension included, no broadcast is needed and the redundant view_copy node is removed.
image
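The shape reasoning above can be sketched in plain Python. This only models the trailing-dimension alignment rule used by NumPy- and PyTorch-style broadcasting; it does not reproduce the export pipeline or the view_copy node itself. The scores shape `(1, H, 1, S)` and the values `H = 8`, `S = 16` are assumptions chosen for illustration.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def broadcast_shape(a: Shape, b: Shape) -> Shape:
    """Return the broadcast of two shapes, aligning trailing dimensions."""
    rank = max(len(a), len(b))
    # Left-padding the shorter shape with 1s is the implicit rank expansion.
    a_pad = (1,) * (rank - len(a)) + tuple(a)
    b_pad = (1,) * (rank - len(b)) + tuple(b)
    out = []
    for x, y in zip(a_pad, b_pad):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension stretches to match the other operand.
        out.append(y if x == 1 else x)
    return tuple(out)


def needs_rank_expansion(mask: Shape, scores: Shape) -> bool:
    """True when the mask has fewer dimensions than the scores it is applied to."""
    return len(mask) < len(scores)


H, S = 8, 16             # hypothetical head count and sequence length
scores = (1, H, 1, S)    # assumed layout: [batch, heads, query, key]
mask_3d = (1, 1, S)      # mask without the head dimension
mask_4d = (1, 1, 1, S)   # mask with the head dimension

result_3d = broadcast_shape(mask_3d, scores)
result_4d = broadcast_shape(mask_4d, scores)
```

Both masks broadcast to the same result, but only the rank-3 mask requires a rank change first, which is the step the description associates with the extra view_copy node.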

Test plan

  • ExampleLLMScript
  • TestExampleMultimodalityScript

@pytorch-bot

pytorch-bot Bot commented Jun 24, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20488

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the "CLA Signed" label Jun 24, 2026
@DannyYuyang-quic

Contributor Author

@pytorchbot label "release notes: qualcomm"

pytorch-bot Bot added the "release notes: qualcomm" label Jun 24, 2026
@DannyYuyang-quic

Contributor Author

@psiddh Hi, this PR mainly adds support for multi-batch quantization and evaluation.
Please take a look. Thanks!

cc: @shewu-quic, @haowhsu-quic
