Problem
Currently, both Brazilian Portuguese and European Portuguese are merged
under a single pt language code. This causes unpredictable dialect output
during inference: even when a strong Brazilian Portuguese reference audio is
provided, the model sometimes generates European Portuguese phonetics instead.
The two variants are phonetically distinct in ways that are immediately
noticeable to native speakers:
- Palatalization of /t/ and /d/ before /i/ (e.g., "tia" → /tʃia/ in BR,
/tia/ in PT)
- Pre-consonant /r/ realization (often guttural in BR, depending on region;
typically tapped in PT)
- Unstressed vowels and vocalic rhythm (fully pronounced in BR, heavily
reduced in PT)
- Syllable-final /l/ → /w/ vocalization (BR only)
Use Case
I am building a video dubbing pipeline (English → Brazilian Portuguese)
where consistent pt-BR output per segment is a hard requirement.
The current ambiguity makes the model unreliable for this use case
without additional post-processing workarounds.
Brazil has 200M+ Portuguese speakers and is the largest Portuguese-speaking
market in the world. A dedicated pt-BR tag would make MOSS-TTS significantly
more useful for a large and underserved developer audience.
Request
For MOSS-TTS 2.0, please consider:
- Separate language codes: pt-BR for Brazilian Portuguese and pt-PT for
European Portuguese, following the BCP 47 convention (ISO 639-1 language
code plus ISO 3166-1 region code) already used by most TTS systems.
- Dedicated training data per variant: ensuring the model has sufficient
pt-BR speech data so the dialect is reliably reproduced without depending
solely on reference audio to infer the variant.
- Reference audio + language tag combination: when both are provided,
the explicit tag should take precedence over the dialect inferred from
the audio.
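To illustrate the requested precedence rule, here is a minimal sketch in Python. It is not MOSS-TTS code: the names normalize_tag, resolve_dialect, and SUPPORTED_VARIANTS are hypothetical, and the fallback to a generic pt tag is an assumption about how backward compatibility might be kept.

```python
from typing import Optional

# Hypothetical set of region-specific tags this request asks for.
SUPPORTED_VARIANTS = {"pt-BR", "pt-PT"}


def normalize_tag(tag: str) -> str:
    """Normalize casing and separator: lowercase language, uppercase region."""
    parts = tag.replace("_", "-").split("-")
    lang = parts[0].lower()
    if len(parts) == 1:
        return lang
    return f"{lang}-{parts[1].upper()}"


def resolve_dialect(explicit_tag: Optional[str],
                    inferred_tag: Optional[str] = None) -> str:
    """Pick the dialect to synthesize.

    The explicit tag wins whenever it names a supported variant.
    The dialect inferred from reference audio is only a fallback.
    If neither is region-specific, return the generic "pt" (assumed
    to preserve the current behavior).
    """
    for candidate in (explicit_tag, inferred_tag):
        if candidate is None:
            continue
        tag = normalize_tag(candidate)
        if tag in SUPPORTED_VARIANTS:
            return tag
    return "pt"


if __name__ == "__main__":
    # Explicit pt-BR overrides a reference audio that sounds European.
    print(resolve_dialect("pt-BR", inferred_tag="pt-PT"))
    # A bare "pt" defers to whatever the reference audio suggests.
    print(resolve_dialect("pt", inferred_tag="pt-PT"))
```

With a rule like this, a dubbing pipeline could pass pt-BR once per job and get the same variant for every segment, regardless of how the reference audio is classified.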
Current Workaround (insufficient)
Using a reference audio with strong BR phonetics helps but does not guarantee
consistent output. The model still occasionally falls back to European
Portuguese, which is not acceptable for production dubbing workflows.
Thank you for the great work on MOSS-TTS — the token-level duration control
and Apache 2.0 license make it the most promising open-source TTS for this
use case. Looking forward to 2.0!