Add cuBLAS mm_out shim to eliminate libtorch runtime dependency #19360
digantdesai wants to merge 1 commit into main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19360
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Cancelled Job, 4 Unrelated Failures as of commit b6b4ad7 with merge base 8ae05c2.
NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Implements `aoti_torch_cuda_mm_out` as a thin cuBLAS wrapper in the ExecuTorch AOTI CUDA shims. When Inductor picks cuBLAS over Triton templates for `aten::mm` (`F.linear`), the compiled `.so` requires this symbol at runtime. Without this shim, it resolves from `libtorch_cuda.so`, pulling in the full libtorch runtime.

In practice, Inductor's autotune on A100 picks Triton templates for the Qwen3.5 MoE dense projections (bf16 [M,2048]x[2048,N]), so the shim is not exercised for this model. It serves as a safety net for models or shapes where cuBLAS wins the autotune, ensuring fully libtorch-free AOTI CUDA deployment in all cases.

Co-authored-by: Claude <noreply@anthropic.com>
Force-pushed from 7316ecf to b6b4ad7 (Compare)
Can you help me update the title and summary a little bit? One thing: our CUDA backend never depends on libtorch, but the current wording sounds like we are depending on it.