add MOSS-TTS#586
Conversation
lucasnewman
left a comment
There was a problem hiding this comment.
Thanks for the PR! Please see the comments as it needs some work before we can merge it.
b5dc8a2 to
c34a6b2
Compare
|
Hi @lucasnewman, I think everything is ready now. We’ve also prepared the pre-quantized weights of the TTS backbone on Hugging Face here: https://huggingface.co/mlx-community/MOSS-TTS-8B-8bit |
|
@Li-dongyang Is there a way to support https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect with MLX? |
Hi @akdeb Thank you for your love IMO the thing that needs to be changed is sampling parameter. docs |
|
Hi @lucasnewman , I think everything is ready now. Could you please take another look? Thank you. |
|
Hi @Blaizzy @lucasnewman , we are from OpenMOSS team, and the PR is ready for merging. Would you mind reviewing it again? Thanks❤️~ |
|
Thanks for the awesome contribution! Could you please run Also please resolve the conflicts. Note |
f4f3c07 to
c04ac73
Compare
|
@Blaizzy Hi, Thanks for the review. We’ve updated the PR to address all of the requested changes. If everything looks good, we’d appreciate a merge. Thank you! |
|
@Blaizzy @lucasnewman The PR is ready for merging. Would you mind reviewing it again? |
|
@Blaizzy The PR is ready for merging. I have resolved the conflict. Would you mind reviewing it again? |
|
Thanks @Li-dongyang! @lucasnewman could you please review and cleanup this PR so we can land it 👌🏽 |
|
Sorry for the delay. We’ve been swamped and the project is going through some major changes. Given the size and number of changes, reviewing comment-by-comment will take a while. It’ll be faster if we just address all the comments directly on our end rather than going back and forth. We’ll take care of it and follow up once it’s done. |
| card.data.tags = tags | ||
| card.data.library_name = "mlx-audio" | ||
| setattr(card.data, "tags", tags) | ||
| setattr(card.data, "library_name", "mlx-audio") |
There was a problem hiding this comment.
What's the point of these changes? They seem functionally identical without context...
| continue | ||
| weights.update(mx.load(wf)) | ||
| loaded = mx.load(wf) | ||
| if isinstance(loaded, tuple): |
There was a problem hiding this comment.
In what situation is this a tuple?
| # Handle model_path attribute if needed | ||
| if hasattr(model_config, "model_path"): | ||
| model_config.model_path = model_path | ||
| if hasattr(model_class, "ModelConfig"): |
There was a problem hiding this comment.
What's the intention here? If model_path doesn't exist you could just add it to your config, right?
| from pydantic import BaseModel | ||
|
|
||
| from mlx_audio.audio_io import read as audio_read | ||
| from mlx_audio.audio_io import sf_read, sf_write |
There was a problem hiding this comment.
These are just aliases for audio_io.read and audio_io.write which are already imported fwiw.
| model: str | ||
| input: str | ||
| input: str | SpeechConversation | None = None | ||
| conversation: SpeechConversation | None = None |
There was a problem hiding this comment.
I don't think we should have both conversation and input here -- it should just be packed in the input when it's a conversation.
The conversation parameter is also non-standard with respect to OpenAI's API. The parameter itself is an ID to a persisted object in the API standard, which we don't have support for.
| return lambda p, m: base_requirements(p, m) and bool(mixed_predicate(p, m)) | ||
|
|
||
|
|
||
| def _is_moss_tts_export_exempt(path: str) -> bool: |
There was a problem hiding this comment.
We really don't want model-specific exemptions like this -- it could randomly break other models that happen to use the same module keys. A better solution would be a mapping of parameters -> precision that could be read in generically, or it may be be better to create a conversion script for your model and just upload the converted weights instead of modifying this file to handle model-specific mixed quants.
| q_group_size=None, | ||
| q_bits=None, | ||
| q_mode="affine", | ||
| q_group_size=None, |
There was a problem hiding this comment.
These are double-defined here, hence the broken tests. Please verify all tests pass.
… bitwise alignment with main branch
…input handling in server.py
bf26080 to
706efbb
Compare
|
Hi @lucasnewman, thank you for the detailed review. We’ve addressed all the issues you raised earlier and refined the convert-related functionality based on your suggestions. Could you please take another look? |
|
Thanks for the contribution! I'm going to close this in favor of #691, which supports both the delay and local transformer versions with about half the code changes, and will let us add the dialogue model shortly afterwards. |
Context
Add support for MOSS-TTS.
Description
This model belongs to the MOSS-TTS family and corresponds to the moss_tts_delay variant, whose model_typeis set to
moss_tts_delay.The model supports plain text input as well as a structured conversation payload (generation and continuation modes) and optional reference-audio voice cloning.
Checklist