📢 Text To Speech Example

This example demonstrates how to use the TextToSpeech feature of MaIN.NET. It shows how to set up simple speech generation using the Kokoro model with preset voices.

🚀 Quick Start

To run the example, you need the Kokoro TTS model downloaded. The model must be in ONNX format.
It can be acquired here: GET KOKORO.
Voices can be downloaded from the original Kokoro repository on HuggingFace: GET VOICES.
The model must be placed in the configured Models directory. Voices can be stored anywhere, but their path MUST be set in the example code.

⚠️⚠️⚠️ IMPORTANT ⚠️⚠️⚠️
The TTS feature is under active development. Some parts will be refactored in the near future.
Some approaches will change completely to match the intended MaIN approach. (VoiceService is a good example.)

📝 Code Example

public class ChatWithTextToSpeechExample : IExample
{
    private const string VoicePath = "<your-path-to-voices>";
    
    public async Task Start()
    {
        Console.WriteLine("ChatWithTextToSpeech is running! Put on your headphones and press any key.");
        Console.ReadKey();
        
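        // Tell VoiceService where the downloaded voice files are stored.
        VoiceService.SetVoicesPath(VoicePath);
        // Load two preset voices and mix them into a single voice.
        var voice = VoiceService.GetVoice("af_heart")
            .MixWith(VoiceService.GetVoice("bf_emma"));
        
        // Generate the poem and vocalize it with Kokoro; `true` enables playback.
        await AIHub.Chat().WithModel("gemma2:2b")
            .WithMessage("Generate a 4 sentence poem.")
            .Speak(new TextToSpeechParams("kokoro:82m", voice, true))
            .CompleteAsync(interactive: true);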

        Console.WriteLine("Done!");
        Console.ReadKey();
    }
}

🔹 How It Works

  1. Set Voices Path → required for the time being. Manually sets the directory where voice files are stored.
  2. Voice Service → a static utility class that works as a temporary bridge to make certain features possible. It is mainly used for loading voices with GetVoice() and for the MixWith() extension method, which allows voice mixing.*
  3. Speak → the core of the TTS functionality. Vocalizes each message returned by the model, in this case a 4-sentence poem. It requires a TextToSpeechParams object, which holds all the "moving parts" of TTS. It consists of 3 parameters:
  • model - the model name, similar to the WithModel() parameter
  • voice - the Voice instance loaded in the previous step
  • playback - a boolean that specifies whether the generated audio should be played back via the system audio driver. This parameter is optional and defaults to false.
    Independently of the optional playback, the generated TTS audio is stored in the Speech byte array property of the Message class.
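
Because the generated audio ends up in the Speech byte array, it can be saved or passed along once the chat completes. The sketch below is only an illustration and not part of MaIN.NET: SpokenMessage is a hypothetical stand-in for the Message class (only the Speech property comes from the description above), and SpeechExporter and the output path are made up for this example. The container format of the bytes is not stated here, so treating them as a ready-to-play audio file is an assumption.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in for MaIN.NET's Message class; only the Speech
// property is taken from the description above.
public sealed class SpokenMessage
{
    public byte[] Speech { get; set; } = Array.Empty<byte>();
}

// Hypothetical helper, not a MaIN.NET type.
public static class SpeechExporter
{
    // Writes the generated audio bytes to disk.
    // Returns false when the message carries no audio, true otherwise.
    public static async Task<bool> SaveSpeechAsync(SpokenMessage message, string outputPath)
    {
        if (message.Speech.Length == 0)
            return false;

        // Assumption: the bytes already form a complete audio file.
        await File.WriteAllBytesAsync(outputPath, message.Speech);
        return true;
    }
}
```

With a real chat result, the same idea would apply to whichever Message instance carries the Speech bytes, for example `await SpeechExporter.SaveSpeechAsync(message, "poem.wav")`, where the file name and extension are placeholders.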

📋 Prerequisites

  • Kokoro model and voices downloaded
  • Any audio device present
  • MaIN.NET framework properly configured

*This feature, as well as parts of the TTS code, was heavily inspired by KokoroSharp, a project by Lyrcaxis. Please check out their work and give them a ⭐
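
To give a rough picture of what voice mixing can mean, the sketch below blends two voice embeddings with a weighted element-wise average, which is one common way to combine voices. Whether MixWith() works exactly like this is not stated here. VoiceMixSketch, Mix, and the weightA parameter are hypothetical names for illustration, and representing a voice as a flat float array is an assumption.

```csharp
using System;

// Hypothetical illustration only; not the MaIN.NET or KokoroSharp API.
public static class VoiceMixSketch
{
    // Blends two equally sized embeddings element by element.
    // weightA is the share of the first voice; the second gets 1 - weightA.
    public static float[] Mix(float[] voiceA, float[] voiceB, float weightA = 0.5f)
    {
        if (voiceA.Length != voiceB.Length)
            throw new ArgumentException("Voice embeddings must have the same length.");
        if (weightA < 0f || weightA > 1f)
            throw new ArgumentOutOfRangeException(nameof(weightA));

        var mixed = new float[voiceA.Length];
        for (var i = 0; i < mixed.Length; i++)
            mixed[i] = weightA * voiceA[i] + (1f - weightA) * voiceB[i];
        return mixed;
    }
}
```

With the default weight, both voices contribute equally; moving weightA toward 1 makes the result sound closer to the first voice under this model.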