add MOSS-TTS #586

Closed
Li-dongyang wants to merge 12 commits into Blaizzy:main from OpenMOSS:rxli/add-moss-tts

Conversation

@Li-dongyang

Context

Add support for MOSS-TTS.

Description

This model belongs to the MOSS-TTS family and corresponds to the moss_tts_delay variant, whose model_type is set to moss_tts_delay.

The model supports plain text input as well as a structured conversation payload (generation and continuation modes) and optional reference-audio voice cloning.

Checklist

  • Tests added/updated
  • Documentation updated
  • Issue referenced (no related issue)

@Li-dongyang Li-dongyang changed the title from "add mlx support for moss-tts" to "add MOSS-TTS" on Mar 17, 2026
Comment thread mlx_audio/codec/models/moss_audio_tokenizer/moss_audio_tokenizer.py Outdated
Comment thread mlx_audio/tts/models/moss_tts/moss_tts.py Outdated
Comment thread mlx_audio/tts/models/moss_tts/moss_tts.py Outdated
Comment thread mlx_audio/tts/models/moss_tts/qwen3.py Outdated
Comment thread mlx_audio/tts/models/moss_tts/README.md
Comment thread mlx_audio/utils.py Outdated
Comment thread mlx_audio/tts/seedtts_eval.py Outdated
Comment thread mlx_audio/convert.py Outdated
Comment thread mlx_audio/server.py Outdated
Collaborator

@lucasnewman lucasnewman left a comment

Thanks for the PR! Please see the comments as it needs some work before we can merge it.

@Li-dongyang
Author

Hi @lucasnewman, I think everything is ready now. We’ve also prepared the pre-quantized weights of the TTS backbone on Hugging Face here: https://huggingface.co/mlx-community/MOSS-TTS-8B-8bit
Could you please take another look and review the code when you have a chance? Thank you.

@akdeb

akdeb commented Mar 25, 2026

@Li-dongyang Is there a way to support https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect with MLX?

@Li-dongyang
Author

> @Li-dongyang Is there a way to support https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect with MLX?

Hi @akdeb, thank you for the support.
Because the architecture, inference-time behavior, and chat template are exactly the same, I think this PR can be reused directly.

IMO the thing that needs to change is the sampling parameters (see the docs).

@Li-dongyang
Author

Hi @lucasnewman , I think everything is ready now. Could you please take another look? Thank you.

@gaoyang07

Hi @Blaizzy @lucasnewman, we are from the OpenMOSS team, and the PR is ready for merging. Would you mind reviewing it again? Thanks❤️~

@Blaizzy
Owner

Blaizzy commented Apr 14, 2026

Hey @gaoyang07 @Li-dongyang

Thanks for the awesome contribution!

Could you please run pre-commit run --all and commit the changes?

Also please resolve the conflicts.

Note ⚠️: Ignore this test: Docs / docs-required-for-user-facing-changes (pull_request)

@Li-dongyang
Author

Hi @Blaizzy, thanks for the review. We’ve updated the PR to address all of the requested changes. If everything looks good, we’d appreciate a merge. Thank you!

@Li-dongyang
Author

@Blaizzy @lucasnewman The PR is ready for merging. Would you mind reviewing it again?

@Li-dongyang
Author

@Blaizzy The PR is ready for merging. I have resolved the conflict. Would you mind reviewing it again?

@Blaizzy
Owner

Blaizzy commented Apr 24, 2026

Thanks @Li-dongyang!

@lucasnewman could you please review and clean up this PR so we can land it? 👌🏽

@Blaizzy
Owner

Blaizzy commented Apr 24, 2026

Sorry for the delay. We’ve been swamped and the project is going through some major changes.

Given the size and number of changes, reviewing comment-by-comment will take a while. It’ll be faster if we just address all the comments directly on our end rather than going back and forth.

We’ll take care of it and follow up once it’s done.

Comment thread mlx_audio/server.py
Comment thread mlx_audio/convert.py Outdated
- card.data.tags = tags
- card.data.library_name = "mlx-audio"
+ setattr(card.data, "tags", tags)
+ setattr(card.data, "library_name", "mlx-audio")
Collaborator

What's the point of these changes? They seem functionally identical without context...
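
For context on the question above: with a literal attribute name, `setattr(obj, "name", value)` and `obj.name = value` go through the same attribute-setting path. A minimal sketch, using a hypothetical `CardData` stand-in rather than the real model-card object:

```python
class CardData:
    """Hypothetical stand-in for the model card's data object."""

    def __init__(self):
        self.assigned = []

    def __setattr__(self, name, value):
        # Record every attribute write so both styles can be compared.
        if name != "assigned":
            self.assigned.append(name)
        object.__setattr__(self, name, value)


direct = CardData()
direct.tags = ["mlx"]
direct.library_name = "mlx-audio"

via_setattr = CardData()
setattr(via_setattr, "tags", ["mlx"])
setattr(via_setattr, "library_name", "mlx-audio")

# Both objects end up with identical state and identical write history.
same = vars(direct) == vars(via_setattr)
```

`setattr` is mainly useful when the attribute name is only known at runtime, which is not the case in the diff under review.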

Comment thread mlx_audio/convert.py Outdated
  continue
- weights.update(mx.load(wf))
+ loaded = mx.load(wf)
+ if isinstance(loaded, tuple):
Collaborator

In what situation is this a tuple?
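
As far as I know, `mx.load` returns an `(arrays, metadata)` tuple only when it is called with `return_metadata=True`; the default call returns the arrays alone. A framework-free sketch of normalizing either shape follows. The helper name `split_loaded` is hypothetical, and plain dicts stand in for MLX arrays:

```python
from typing import Any, Dict, Tuple, Union

Weights = Dict[str, Any]
Metadata = Dict[str, str]


def split_loaded(
    loaded: Union[Weights, Tuple[Weights, Metadata]]
) -> Tuple[Weights, Metadata]:
    """Return (weights, metadata) whether or not metadata was requested."""
    if isinstance(loaded, tuple):
        weights, metadata = loaded
        return weights, metadata
    return loaded, {}


# Merge two "files": one loaded without metadata, one loaded with it.
weights: Weights = {}
for loaded in ({"a.weight": 1}, ({"b.weight": 2}, {"format": "mlx"})):
    part, _meta = split_loaded(loaded)
    weights.update(part)
```

If the call site never passes `return_metadata=True`, the tuple branch is unreachable and the original one-line `weights.update(...)` is enough.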

Comment thread mlx_audio/convert.py Outdated
# Handle model_path attribute if needed
if hasattr(model_config, "model_path"):
model_config.model_path = model_path
if hasattr(model_class, "ModelConfig"):
Collaborator

What's the intention here? If model_path doesn't exist you could just add it to your config, right?

Comment thread mlx_audio/server.py Outdated
from pydantic import BaseModel

from mlx_audio.audio_io import read as audio_read
from mlx_audio.audio_io import sf_read, sf_write
Collaborator

These are just aliases for audio_io.read and audio_io.write which are already imported fwiw.

Comment thread mlx_audio/server.py Outdated
  model: str
- input: str
+ input: str | SpeechConversation | None = None
+ conversation: SpeechConversation | None = None
Collaborator

I don't think we should have both conversation and input here -- it should just be packed in the input when it's a conversation.

The conversation parameter is also non-standard with respect to OpenAI's API. The parameter itself is an ID to a persisted object in the API standard, which we don't have support for.
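
One way to read this suggestion is a single `input` field that accepts either a plain string or a structured payload. A rough sketch using standard-library dataclasses instead of pydantic; `ConversationTurn`, its fields, and `normalize_input` are hypothetical names, not the PR's actual schema:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ConversationTurn:
    """Hypothetical message shape; the real SpeechConversation may differ."""

    role: str
    text: str


@dataclass
class SpeechRequest:
    model: str
    # A plain string and a structured conversation share one field.
    input: Union[str, List[ConversationTurn]]


def normalize_input(request: SpeechRequest) -> List[ConversationTurn]:
    """Wrap plain text as a single user turn so downstream code sees one shape."""
    if isinstance(request.input, str):
        return [ConversationTurn(role="user", text=request.input)]
    return list(request.input)


plain = normalize_input(SpeechRequest(model="example-model", input="Hello"))
turns = [ConversationTurn("user", "Hi"), ConversationTurn("assistant", "Hello")]
structured = normalize_input(SpeechRequest(model="example-model", input=turns))
```

Keeping one field avoids having to define which of `input` and `conversation` wins when both are supplied.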

Comment thread mlx_audio/convert.py Outdated
return lambda p, m: base_requirements(p, m) and bool(mixed_predicate(p, m))


def _is_moss_tts_export_exempt(path: str) -> bool:
Collaborator

@lucasnewman lucasnewman Apr 24, 2026

We really don't want model-specific exemptions like this -- it could randomly break other models that happen to use the same module keys. A better solution would be a mapping of parameters -> precision that could be read in generically, or it may be better to create a conversion script for your model and just upload the converted weights instead of modifying this file to handle model-specific mixed quants.
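
The generic mapping suggested here could look roughly like the following: glob patterns over parameter paths map to a bit width, with `None` meaning "leave unquantized". This is only a sketch; the pattern syntax, the `make_quant_predicate` helper, and the return convention (`False` or a dict of settings) are assumptions, not mlx-audio's actual interface.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, Optional, Union

PrecisionMap = Dict[str, Optional[int]]
Decision = Union[bool, Dict[str, int]]


def make_quant_predicate(
    precision: PrecisionMap, default_bits: int, group_size: int = 64
) -> Callable[[str], Decision]:
    """Build a path -> decision function from a pattern -> bits mapping."""

    def predicate(path: str) -> Decision:
        # First matching pattern wins, so list specific patterns first.
        for pattern, bits in precision.items():
            if fnmatchcase(path, pattern):
                if bits is None:
                    return False  # keep this parameter unquantized
                return {"bits": bits, "group_size": group_size}
        return {"bits": default_bits, "group_size": group_size}

    return predicate


# Hypothetical mapping, e.g. loaded from a JSON file shipped with the model.
predicate = make_quant_predicate(
    {"*.embed_tokens": None, "*.lm_head": 8}, default_bits=4
)
```

Because the mapping is data rather than code, a model can ship its own exemptions without the shared conversion path knowing anything about that model.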

Comment thread mlx_audio/tts/tests/test_convert.py Outdated
q_group_size=None,
q_bits=None,
q_mode="affine",
q_group_size=None,
Collaborator

These are double-defined here, hence the broken tests. Please verify all tests pass.
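
For reference, Python rejects a call that repeats a keyword argument at compile time, so a test module containing this mistake fails before any test runs. A small demonstration; `convert` here is only a placeholder name inside a string and is never called:

```python
source = (
    'convert(q_group_size=None, q_bits=None, q_mode="affine", q_group_size=None)'
)

try:
    compile(source, "<example>", "eval")
    error = None
except SyntaxError as exc:
    # Raised while compiling, before any code executes.
    error = exc
```

Removing the second `q_group_size=None` is enough to make the call compile again.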

@Li-dongyang
Author

Li-dongyang commented Apr 27, 2026

Hi @lucasnewman, thank you for the detailed review. We’ve addressed all the issues you raised earlier and refined the convert-related functionality based on your suggestions. Could you please take another look?

@lucasnewman
Collaborator

Thanks for the contribution! I'm going to close this in favor of #691, which supports both the delay and local transformer versions with about half the code changes, and will let us add the dialogue model shortly afterwards.

5 participants