add MOSS-TTS by Li-dongyang · Pull Request #586 · Blaizzy/mlx-audio

Li-dongyang · 2026-03-17T04:27:49Z

Context

Add support for MOSS-TTS.

Description

This model belongs to the MOSS-TTS family and corresponds to the moss_tts_delay variant, whose model_type is set to moss_tts_delay.

The model supports plain text input as well as a structured conversation payload (generation and continuation modes) and optional reference-audio voice cloning.

Checklist

Tests added/updated
Documentation updated
Issue referenced (no related issue)

lucasnewman

Thanks for the PR! Please see the comments as it needs some work before we can merge it.

Li-dongyang · 2026-03-24T12:34:57Z

Hi @lucasnewman, I think everything is ready now. We’ve also prepared the pre-quantized weights of the TTS backbone on Hugging Face here: https://huggingface.co/mlx-community/MOSS-TTS-8B-8bit
Could you please take another look and review the code when you have a chance? Thank you.

akdeb · 2026-03-25T19:28:51Z

@Li-dongyang Is there a way to support https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect with MLX?

Li-dongyang · 2026-03-26T03:04:22Z

> @Li-dongyang Is there a way to support https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect with MLX?

Hi @akdeb, thank you for your support.
Because the two models' architecture, inference-time behavior, and chat template are exactly the same, I think this PR can be reused directly.

IMO the thing that needs to change is the sampling parameters (see the docs).

Li-dongyang · 2026-04-09T05:45:31Z

Hi @lucasnewman, I think everything is ready now. Could you please take another look? Thank you.

gaoyang07 · 2026-04-10T10:04:47Z

Hi @Blaizzy @lucasnewman, we are from the OpenMOSS team, and the PR is ready for merging. Would you mind reviewing it again? Thanks❤️~

Blaizzy · 2026-04-14T11:06:55Z

Hey @gaoyang07 @Li-dongyang

Thanks for the awesome contribution!

Could you please run pre-commit run --all and commit the changes?

Also please resolve the conflicts.

Note ⚠️: Ignore this test: Docs / docs-required-for-user-facing-changes (pull_request)

Li-dongyang · 2026-04-16T09:26:04Z

Hi @Blaizzy, thanks for the review. We’ve updated the PR to address all of the requested changes. If everything looks good, we’d appreciate a merge. Thank you!

Li-dongyang · 2026-04-20T03:27:38Z

@Blaizzy @lucasnewman The PR is ready for merging. Would you mind reviewing it again?

Li-dongyang · 2026-04-24T12:45:50Z

@Blaizzy The PR is ready for merging. I have resolved the conflict. Would you mind reviewing it again?

Blaizzy · 2026-04-24T13:03:27Z

Thanks @Li-dongyang!

@lucasnewman could you please review and clean up this PR so we can land it? 👌🏽

Blaizzy · 2026-04-24T13:15:13Z

Sorry for the delay. We’ve been swamped and the project is going through some major changes.

Given the size and number of changes, reviewing comment-by-comment will take a while. It’ll be faster if we just address all the comments directly on our end rather than going back and forth.

We’ll take care of it and follow up once it’s done.

lucasnewman · 2026-04-24T15:56:24Z

-        card.data.tags = tags
-        card.data.library_name = "mlx-audio"
+        setattr(card.data, "tags", tags)
+        setattr(card.data, "library_name", "mlx-audio")


What's the point of these changes? They seem functionally identical without context...
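For readers following along, here is a minimal sketch of why the two forms in the diff end up in the same state. It uses `types.SimpleNamespace` as a stand-in for `card.data` (an assumption; the real object is a model-card data container), and the tag values are made up.

```python
from types import SimpleNamespace

# Stand-ins for card.data (hypothetical; not the real model-card object).
direct = SimpleNamespace()
via_setattr = SimpleNamespace()

tags = ["mlx", "tts"]  # illustrative values only

# Plain attribute assignment, as on the removed lines.
direct.tags = tags
direct.library_name = "mlx-audio"

# setattr with a literal attribute name, as on the added lines.
setattr(via_setattr, "tags", tags)
setattr(via_setattr, "library_name", "mlx-audio")

# Both objects now hold exactly the same attributes and values.
same_state = vars(direct) == vars(via_setattr)
print(same_state)
```

With a literal attribute name, `setattr` performs the same assignment at runtime. The one difference that comes to mind is that static type checkers generally do not inspect `setattr` calls, though the PR does not say whether that motivated the change.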

lucasnewman · 2026-04-24T15:57:51Z

            continue
-        weights.update(mx.load(wf))
+        loaded = mx.load(wf)
+        if isinstance(loaded, tuple):


In what situation is this a tuple?
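As background to the question: as far as I know, `mx.load` is documented to return an extra metadata dictionary only when called with `return_metadata=True`, which the call in this diff does not pass. The sketch below shows the defensive pattern the diff appears to be adding, using plain dicts in place of MLX arrays; `merge_loaded` is a hypothetical helper name, not something from the PR.

```python
def merge_loaded(weights: dict, loaded) -> dict:
    """Merge a load result into `weights`.

    Accepts either a plain {name: array} dict or an (arrays, metadata)
    tuple, which is the shape assumed here for a load call made with
    metadata enabled.
    """
    if isinstance(loaded, tuple):
        # Keep only the arrays; the metadata dict is ignored.
        loaded = loaded[0]
    weights.update(loaded)
    return weights


# Plain dicts stand in for the real weight arrays.
weights = {}
merge_loaded(weights, {"a.weight": 1})
merge_loaded(weights, ({"b.weight": 2}, {"format": "mlx"}))
print(sorted(weights))
```

If the loader is never called with metadata enabled, the tuple branch is dead code and the original one-line `weights.update(...)` is sufficient.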

lucasnewman · 2026-04-24T16:00:23Z

-    # Handle model_path attribute if needed
-    if hasattr(model_config, "model_path"):
-        model_config.model_path = model_path
+    if hasattr(model_class, "ModelConfig"):


What's the intention here? If model_path doesn't exist you could just add it to your config, right?

lucasnewman · 2026-04-24T16:01:09Z

 from pydantic import BaseModel

 from mlx_audio.audio_io import read as audio_read
+from mlx_audio.audio_io import sf_read, sf_write


These are just aliases for audio_io.read and audio_io.write which are already imported fwiw.

lucasnewman · 2026-04-24T16:04:31Z

    model: str
-    input: str
+    input: str | SpeechConversation | None = None
+    conversation: SpeechConversation | None = None


I don't think we should have both conversation and input here -- it should just be packed in the input when it's a conversation.

The conversation parameter is also non-standard with respect to OpenAI's API. The parameter itself is an ID to a persisted object in the API standard, which we don't have support for.
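A minimal sketch of the single-field shape suggested here, where `input` carries either plain text or the conversation. It uses standard-library dataclasses rather than the pydantic models in server.py, and the `Turn` and `SpeechConversation` fields, along with `to_conversation`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Turn:
    # Hypothetical fields; the PR's real conversation schema is not shown here.
    role: str
    text: str


@dataclass
class SpeechConversation:
    turns: List[Turn] = field(default_factory=list)


@dataclass
class SpeechRequest:
    model: str
    # One field carries either plain text or a structured conversation.
    input: Union[str, SpeechConversation]


def to_conversation(request: SpeechRequest) -> SpeechConversation:
    """Normalize `input` so downstream code only handles conversations."""
    if isinstance(request.input, str):
        return SpeechConversation(turns=[Turn(role="user", text=request.input)])
    return request.input


plain = SpeechRequest(model="moss_tts_delay", input="Hello there")
normalized = to_conversation(plain)
print(normalized.turns[0].text)
```

Normalizing at the edge keeps the request schema to one field while letting the model code handle a single type.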

lucasnewman · 2026-04-24T16:06:21Z

+    return lambda p, m: base_requirements(p, m) and bool(mixed_predicate(p, m))
+
+
+def _is_moss_tts_export_exempt(path: str) -> bool:


We really don't want model-specific exemptions like this -- it could randomly break other models that happen to use the same module keys. A better solution would be a mapping of parameters -> precision that could be read in generically, or it may be better to create a conversion script for your model and just upload the converted weights instead of modifying this file to handle model-specific mixed quants.
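One way the generic "parameters -> precision" mapping could look is sketched below, under stated assumptions: the glob patterns, the bit widths, and the convention that `None` means "leave unquantized" are all made up for illustration, and nothing here is the project's actual conversion API.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, Optional


def make_precision_lookup(
    precision_map: Dict[str, Optional[int]],
    default_bits: Optional[int],
) -> Callable[[str], Optional[int]]:
    """Build a lookup from module path to bit width.

    `precision_map` maps glob patterns to bits; the first matching pattern
    wins, and None means "do not quantize this module".
    """

    def lookup(path: str) -> Optional[int]:
        for pattern, bits in precision_map.items():
            if fnmatchcase(path, pattern):
                return bits
        return default_bits

    return lookup


# Hypothetical mapping, e.g. as it might be read from a model's config file.
precision_map = {
    "*.embed_tokens": None,  # keep embeddings in full precision
    "lm_head*": 8,           # higher precision for the output head
}
lookup = make_precision_lookup(precision_map, default_bits=4)

print(lookup("model.embed_tokens"))
print(lookup("lm_head"))
print(lookup("model.layers.0.mlp.up_proj"))
```

Because the mapping lives in data rather than in the shared conversion code, a model can declare its own mixed-precision layout without adding model-specific branches that might match other models' module keys.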

lucasnewman · 2026-04-24T16:08:12Z

            q_group_size=None,
            q_bits=None,
            q_mode="affine",
+            q_group_size=None,


These are double-defined here, hence the broken tests. Please verify all tests pass.
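To illustrate why a duplicate like this breaks the whole test run rather than a single test: assuming the lines in the diff are keyword arguments to a call, Python rejects the repetition when the file is compiled, so the module cannot even be imported. The call name `convert` below is a placeholder.

```python
# A call that repeats a keyword argument, mirroring the duplicated
# q_group_size line in the diff (the callee name is a placeholder).
snippet = "convert(q_group_size=None, q_bits=None, q_mode='affine', q_group_size=None)"

try:
    # compile() only parses the source; convert() is never called or looked up.
    compile(snippet, "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(rejected)
```

Since the error is raised at compile time, every test that imports the affected module fails at collection.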

… bitwise alignment with main branch

…ests

…input handling in server.py

… in README

Li-dongyang · 2026-04-27T11:59:22Z

Hi @lucasnewman, thank you for the detailed review. We’ve addressed all the issues you raised earlier and refined the convert-related functionality based on your suggestions. Could you please take another look?

lucasnewman · 2026-04-30T16:40:30Z

Thanks for the contribution! I'm going to close this in favor of #691, which supports both the delay and local transformer versions with about half the code changes, and will let us add the dialogue model shortly afterwards.

Li-dongyang changed the title ~~add mlx support for moss-tts~~ add MOSS-TTS Mar 17, 2026