Skip to content

Fix Voxtral TTS tokenizer dependency contract#633

Merged
lucasnewman merged 12 commits intoBlaizzy:mainfrom
lyonsno:fix/voxtral-tts-tokenizer-contract-pr
Apr 22, 2026
Merged

Fix Voxtral TTS tokenizer dependency contract#633
lucasnewman merged 12 commits intoBlaizzy:mainfrom
lyonsno:fix/voxtral-tts-tokenizer-contract-pr

Conversation

@lyonsno
Copy link
Copy Markdown
Contributor

@lyonsno lyonsno commented Apr 5, 2026

Summary

  • Remove silent fallback prompt path in Voxtral TTS that produced garbage audio when mistral-common[audio] was unavailable — replace with a clear RuntimeError
  • Add mistral-common[audio] to the tts extra so the speech tokenizer is available in clean installs
  • Add focused tests covering both the dependency declaration and the explicit runtime failure mode

Motivation

Voxtral TTS looked broken in clean .[tts] environments not because the model was bad, but because the integration silently fell back to a manual prompt-construction path when encode_speech_request(...) was unavailable. That fallback accepted the wrong prompt shape and produced garbage audio with no error. This makes the contract explicit.

Test plan

  • python -m unittest mlx_audio.tts.tests.test_voxtral_tts -v — both tests pass
  • Human smoke: BF16 and 4-bit Voxtral quants sound correct on Apple Silicon once the Mistral speech-tokenizer path is available
  • Rebased on current origin/main — clean single commit

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this to test_models.py

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved into test_models.py in 042355c.

Blaizzy
Blaizzy previously requested changes Apr 7, 2026
Copy link
Copy Markdown
Owner

@Blaizzy Blaizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just have nit

@lyonsno
Copy link
Copy Markdown
Contributor Author

lyonsno commented Apr 7, 2026

Moved the tests back to a standalone file — when the test class lives in test_models.py, the module-level mlx/numpy/misaki imports mean the contract test can't run in a minimal environment (it dies with ModuleNotFoundError: No module named 'mlx' before unittest even discovers the class). The standalone file only needs stdlib at import time, which is the whole point of a packaging-contract test.

@lucasnewman lucasnewman dismissed Blaizzy’s stale review April 20, 2026 19:03

Change implemented.

Comment thread mlx_audio/tts/tests/test_voxtral_tts.py Outdated
@@ -0,0 +1,35 @@
import tomllib
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyonsno Can we avoid adding a new dependency here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucasnewman I don’t think we can safely avoid it in this PR. Voxtral TTS already depends on SpeechRequest / encode_speech_request from mistral-common[audio], and the old no-extra fallback was the path producing bad
audio. This change is just making the .[tts] install contract match the actual runtime requirement for Voxtral, and mistral-common[audio] is already present in our stt, sts, and all extras. If we want a
smaller install surface, I’d rather handle that as a follow-up by splitting Voxtral into its own extra instead of keeping a broken fallback in tts.

@lucasnewman lucasnewman self-requested a review April 20, 2026 19:13
@lucasnewman lucasnewman self-requested a review April 21, 2026 18:59
@lucasnewman
Copy link
Copy Markdown
Collaborator

@lyonsno Looks like the TTS tests are failing even though the mistral common package is being installed. Can you take a look?

lyonsno and others added 3 commits April 21, 2026 19:23
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lyonsno
Copy link
Copy Markdown
Contributor Author

lyonsno commented Apr 22, 2026

I Took a look. The TTS failure was on an earlier PR head; I pushed follow-up fixes afterward, and the current head (af37164) is green now. Latest CI is passing.

Copy link
Copy Markdown
Collaborator

@lucasnewman lucasnewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@lucasnewman lucasnewman merged commit c64890e into Blaizzy:main Apr 22, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants