[voxtral_tts] enable MLX backend #19177
Dr. CI (updated automatically every 15 minutes): as of commit ba5b038 with merge base f3e49ff, 1 new failure and 2 unrelated broken-trunk failures; rebase onto the `viable/strict` branch to avoid the broken-trunk failures. 1 SEV was active at the time. Artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19177.
Commits
- Adds Apple Silicon MLX export and runner wiring for Voxtral TTS while keeping codec lowering portable for waveform correctness. (Made-with: Cursor)
- Clarify that the runner exposes streaming flags for MLX builds, while this branch only reports offline MLX performance because codec decoding still falls back to portable CPU. (Made-with: Cursor)
- Keep the advanced-indexing fix reviewable by isolating gather permutation logic, tightening MLX test availability checks, and updating the Qwen MoE CI cast budget to match the corrected graph. (Made-with: Cursor)
Excluded per PR feedback; parity testing will be handled separately.
The original intro said weights are loaded directly from safetensors, which was ambiguous: at inference the C++ runner loads .pte files, not safetensors. The claim is now localized to export.
Move MLX index regression coverage into the consolidated op tests and document why F.unfold needs PyTorch-compatible gather ordering. Co-authored-by: Cursor <cursoragent@cursor.com>
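The gather-ordering requirement mentioned above can be illustrated with plain PyTorch. This is a hedged sketch, not the backend's actual lowering code: it shows the `F.unfold`/im2col conv rewrite the codec relies on, where `F.unfold` flattens each patch in a fixed (channel, kernel-row, kernel-col) order, so any gather-based re-implementation must permute indices in exactly that order to stay numerically equivalent.

```python
import torch
import torch.nn.functional as F

# Illustrative only: conv2d expressed as im2col (F.unfold) + matmul.
x = torch.randn(1, 3, 8, 8)   # N, C, H, W
w = torch.randn(4, 3, 3, 3)   # out_channels, C, kH, kW

ref = F.conv2d(x, w)          # direct convolution, shape (1, 4, 6, 6)

# F.unfold yields (1, C*kH*kW, L) = (1, 27, 36) with PyTorch's patch ordering;
# a gather lowering with a different permutation would silently corrupt audio.
cols = F.unfold(x, kernel_size=3)
out = (w.view(4, -1) @ cols).view(1, 4, 6, 6)

assert torch.allclose(ref, out, atol=1e-4)
```

If the gather permutation diverges from this ordering, shapes still match, so only waveform-level checks (like the parity tests in this PR) catch the error.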
Summary
Enable Voxtral TTS on the ExecuTorch MLX backend for Apple Silicon.
This PR adds MLX export support for the LM/flow methods and the codec decoder, a `make voxtral_tts-mlx` target and CMake preset wiring, README instructions, a one-shot MLX E2E script, and MLX parity/regression tests. The native MLX codec fix is in the backend advanced-indexing lowering used by the codec's `F.unfold`/im2col conv rewrite.
The shared runner exposes `--streaming` and `--speaker` for MLX builds. Offline MLX synthesis is validated here with the native MLX codec artifact; streaming uses the same codec artifact and avoids the old portable CPU codec fallback. `--backend mlx --qlinear-codec` is still rejected because MLX codec quantization is not yet validated.
Benchmark
Apple Silicon MLX benchmark using bf16 + 4w linear + 8w embedding export, with native MLX LM/flow and native MLX codec:
Average generation RTF:
0.811337(0.761774warm-run average). Average process wall:3.82s(3.14swarm-run average). WAV quality check: peak0.42575, clipped samples0. Apple Speech transcribed the benchmark WAV asHello how are you today.Test plan
- `conda run -n et-mlx python -m pytest -q examples/models/voxtral_tts/test_mlx_parity.py`: 8 passed in 395.04s
- `PYTHONPATH=/Users/younghan conda run -n et-mlx python -m pytest -q backends/mlx/test/test_runtime_ops.py`: 2 passed in 4.82s
- Codec export with `--backend mlx --dtype bf16 --export-target codec` produced `/tmp/voxtral_tts_mlx_native_codec_256/codec_decoder.pte` (289.2 MB)
- End-to-end run with `/tmp/voxtral_tts_mlx_final/model.pte` and `/tmp/voxtral_tts_mlx_native_codec_256/codec_decoder.pte`: average generation RTF 0.811337, average process wall 3.82s, runner output `FINAL Hello how are you today`
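The benchmark metrics above (RTF, WAV peak, clipped samples) can be reproduced with stdlib-only helpers. This is a minimal sketch, not code from the repo; it assumes mono 16-bit PCM output and the usual definition RTF = synthesis wall time / audio duration (lower is faster). Function names are illustrative.

```python
import array
import wave

def real_time_factor(gen_seconds: float, wav_path: str) -> float:
    """RTF = synthesis wall time / output audio duration (lower is faster)."""
    with wave.open(wav_path, "rb") as w:
        audio_seconds = w.getnframes() / w.getframerate()
    return gen_seconds / audio_seconds

def wav_quality(wav_path: str) -> tuple[float, int]:
    """Return (peak amplitude normalized to [0, 1], clipped-sample count).

    Assumes mono 16-bit PCM; samples at either rail (+/-32767, -32768)
    are counted as clipped.
    """
    with wave.open(wav_path, "rb") as w:
        samples = array.array("h", w.readframes(w.getnframes()))
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    clipped = sum(1 for s in samples if abs(s) >= 32767)
    return peak, clipped
```

With these definitions, a 3.82s wall time producing ~4.7s of audio would give the reported RTF of roughly 0.81, and a clean waveform reports clipped samples 0.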