This example demonstrates how to use the TextToSpeech (TTS) feature of MaIN.NET. It shows how to set up simple speech generation using the Kokoro model with preset voices.
To run the example, you need to download the Kokoro TTS model. The model must be in ONNX format.
It can be acquired here: GET KOKORO.
Voices can be downloaded from the original Kokoro repository on HuggingFace: GET VOICES.
The model needs to be placed in the configured Models directory. Voices can be stored anywhere, but their path MUST be set in the example code.
The TTS feature is under ongoing development. Some parts will be refactored in the near future.
Some approaches will be changed completely to match the desired MaIN approach (VoiceService is a great example).
public class ChatWithTextToSpeechExample : IExample
{
    // Directory containing the downloaded Kokoro voice files.
    private const string VoicePath = "<your-path-to-voices>";

    public async Task Start()
    {
        Console.WriteLine("ChatWithTextToSpeech is running! Put on your headphones and press any key.");
        Console.ReadKey();

        // Tell VoiceService where the voice files are stored.
        VoiceService.SetVoicesPath(VoicePath);

        // Load two preset voices and blend them into one.
        var voice = VoiceService.GetVoice("af_heart")
            .MixWith(VoiceService.GetVoice("bf_emma"));

        // Generate the poem and vocalize it with Kokoro; the final `true` enables playback.
        await AIHub.Chat().WithModel("gemma2:2b")
            .WithMessage("Generate a 4 sentence poem.")
            .Speak(new TextToSpeechParams("kokoro:82m", voice, true))
            .CompleteAsync(interactive: true);

        Console.WriteLine("Done!");
        Console.ReadKey();
    }
}

- Set Voices Path: required for the time being. Manually sets the directory where voice files are stored.
- Voice Service: static utility class. Works as a temporary bridge to make certain features possible. It is mainly used for `GetVoice()` voice loading and the `MixWith()` extension method, which allows voice mixing.*
- Speak: core of the TTS functionality. Vocalizes each message returned by the model, in this case a 4-sentence poem. Requires `TextToSpeechParams`, which holds essentially all the "moving parts" of TTS. It consists of 3 parameters:
  - `model`: the model name, similar to the `WithModel()` parameter.
  - `voice`: the `Voice` instance loaded in the previous step.
  - `playback`: a boolean that specifies whether the generated audio should be played back via the system audio driver. This parameter is optional and defaults to `false`.

Generated TTS audio (apart from the optional playback) is stored in the `Message` class, in the `Speech` byte array property.
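Because the audio ends up in the `Speech` byte array, it can be persisted as well as played back. The following is a minimal sketch of that idea, not part of MaIN.NET: `SpeechFileWriter` and `SaveAsync` are hypothetical names, and the helper deliberately takes a plain `byte[]` so it makes no assumptions about how the `Message` is obtained. The audio encoding of the bytes is not stated here, so the choice of file extension is left to the caller.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper (not part of MaIN.NET): persists generated speech bytes.
public static class SpeechFileWriter
{
    // Writes the audio bytes to outputPath and returns the number of bytes written.
    // Returns 0 without touching the disk when no audio was produced.
    public static async Task<int> SaveAsync(byte[]? speech, string outputPath)
    {
        if (speech is null || speech.Length == 0)
        {
            return 0;
        }

        // Create the target directory if it does not exist yet.
        var directory = Path.GetDirectoryName(Path.GetFullPath(outputPath));
        if (!string.IsNullOrEmpty(directory))
        {
            Directory.CreateDirectory(directory);
        }

        await File.WriteAllBytesAsync(outputPath, speech);
        return speech.Length;
    }
}
```

To use it, pass the `Speech` array of a returned message together with a target path of your choice. The bytes are written exactly as received, with no conversion.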
Requirements:
- Kokoro model and voices downloaded
- Any audio device present
- MaIN.NET framework properly configured
*This feature, as well as parts of the TTS code, was heavily inspired by Lyrcaxis' project KokoroSharp. Please check out their work and give them a star.
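How `MixWith()` combines two voices internally is not described here. One plausible approach, sketched below purely as an illustration, is an element-wise weighted average of the two voices' style embeddings. The sketch assumes each voice can be represented as a flat `float[]` of equal length. `VoiceMixSketch`, `Mix`, and the `firstWeight` parameter are hypothetical and do not reflect the actual MaIN.NET or KokoroSharp API.

```csharp
using System;

// Hypothetical illustration of voice mixing (not the MaIN.NET implementation).
public static class VoiceMixSketch
{
    // Blends two equally sized embeddings; firstWeight is the share of the first voice (0..1).
    public static float[] Mix(float[] first, float[] second, float firstWeight = 0.5f)
    {
        if (first.Length != second.Length)
        {
            throw new ArgumentException("Voice embeddings must have the same length.");
        }
        if (firstWeight < 0f || firstWeight > 1f)
        {
            throw new ArgumentOutOfRangeException(nameof(firstWeight));
        }

        var mixed = new float[first.Length];
        for (var i = 0; i < mixed.Length; i++)
        {
            // Weighted average of the corresponding components.
            mixed[i] = firstWeight * first[i] + (1f - firstWeight) * second[i];
        }
        return mixed;
    }
}
```

With the default weight, both voices contribute equally. A weight of `1f` returns a copy of the first voice unchanged.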