Hey, I tested Moss-TTS-Nano, especially the voice cloning part.
A few things I couldn't figure out from the docs:
- whether a transcription of the source audio can be passed in somewhere (it seems not?)
- what the expected source audio length is. I tried clips of 2s, 3s, 6s, 10s, and 30s; only the 3s clip gave somewhat decent results, the others all produced garbage.
- how to cache a voice profile so it can be reused across multiple generations.
Thanks!