Add speed priors, text-length embedding selection, and README improvements
Adopted from PR #9 by @BenjaminKobjolke:
- Speed priors from model config.json for per-voice speed correction
- Text-length-based embedding row selection for ONNX2 models (varied prosody)
- Accept both named voices and raw expr-voice-* identifiers via API
- Added .gitattributes for line ending normalization
README updates:
- Added emojis to all section headers matching Chatterbox README style
- Removed Raspberry Pi 4 references (unsupported)
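The three API-facing changes in the commit message above (speed priors, text-length-based embedding row selection, and accepting both named and raw voice identifiers) could look roughly like the following. This is a minimal sketch, not the code from the PR: the function names, the clamping rule `min(len(text), rows - 1)`, the multiplicative use of the prior, and the alias table are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def select_embedding_row(text: str, voice_embeddings: Sequence[Sequence[float]]) -> Sequence[float]:
    """Pick one style-embedding row based on text length (assumed rule)."""
    if not voice_embeddings:
        raise ValueError("voice_embeddings must contain at least one row")
    # Assumption: longer texts index later rows, clamped to the last row.
    row_index = min(len(text), len(voice_embeddings) - 1)
    return voice_embeddings[row_index]


def apply_speed_prior(requested_speed: float, voice: str, speed_priors: Dict[str, float]) -> float:
    """Scale the requested speed by a per-voice prior (assumed multiplicative)."""
    # Voices without a prior are left unchanged (factor 1.0).
    return requested_speed * speed_priors.get(voice, 1.0)


def resolve_voice(name: str, aliases: Dict[str, str]) -> str:
    """Accept either a raw expr-voice-* identifier or a named voice."""
    if name.startswith("expr-voice-"):
        return name
    if name in aliases:
        return aliases[name]
    raise ValueError(f"Unknown voice: {name}")
```

In the real server the priors would come from the model's `config.json`, and the embeddings would likely be array rows rather than Python lists; the lookup logic is the part being illustrated here.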
README.md: 36 additions & 49 deletions
@@ -2,7 +2,7 @@
 **Self-host the ultra-lightweight [KittenTTS model](https://github.com/KittenML/KittenTTS) with this enhanced API server. Now supports all 7 KittenTTS models, from the tiny 15M Nano to the 80M Mini, with hot-swappable model switching, an intuitive Web UI, a flexible API, large text processing for audiobooks, and high-performance GPU acceleration.**

-This server provides a robust, user-friendly, and powerful interface for the KittenTTS engine family, an open-source, realistic text-to-speech system. This project significantly enhances the original model by adding a full-featured server, an easy-to-use UI, and an optimized inference pipeline for hardware ranging from NVIDIA GPUs to CPUs and even the Raspberry Pi 5 (RP5) and Raspberry Pi 4 (RP4).
+This server provides a robust, user-friendly, and powerful interface for the KittenTTS engine family, an open-source, realistic text-to-speech system. This project significantly enhances the original model by adding a full-featured server, an easy-to-use UI, and an optimized inference pipeline for hardware ranging from NVIDIA GPUs to CPUs and even the Raspberry Pi 5.
@@ -21,30 +21,30 @@ This server provides a robust, user-friendly, and powerful interface for the Kit
 ---

-## What's New
+## 🆕 What's New

-### Complete KittenTTS model family support (new)
+### 🎯 Complete KittenTTS model family support (new)

 - Added full support for **all 7 KittenTTS models** across three model sizes (Nano, Micro, Mini) and two generations (v0.1/v0.2 and v0.8).
 - Models range from the ultra-compact **15M-parameter Nano** to the high-quality **80M-parameter Mini**, all running on ONNX for maximum portability.
 - v0.8 models feature improved expressivity, quantized INT8 variants for minimal footprint, and named voices (Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo).

-### Hot-swappable model switching (new)
+### 🔁 Hot-swappable model switching (new)

 - Added a new **model selector** dropdown at the top of the Web UI.
 - All 7 models are **hot-swappable**: select from the dropdown, click "Apply & Restart", and the backend automatically downloads (if needed), unloads the current model, and loads your choice. No server restart required.
 - A **progress modal** with real-time status updates shows download and loading progress, so you always know what's happening.
 - Models are **downloaded automatically** from Hugging Face on first use and cached locally in the project's `model_cache` directory for instant subsequent loads.
 - **Cancellation support** - if you change your mind during a download, select a different model and the current load is cancelled automatically.

-### Named voices for all models (new)
+### 🎤 Named voices for all models (new)

 - All models now use **human-friendly voice names** instead of technical identifiers.
 - The voice dropdown automatically updates when you switch models.

-### Complete model lineup
+### ✅ Complete model lineup

 **You now have access to the entire KittenTTS family:**
@@ -62,7 +62,7 @@ This server provides a robust, user-friendly, and powerful interface for the Kit
 ---

-## Overview: Enhanced KittenTTS Generation
+## 🗣️ Overview: Enhanced KittenTTS Generation

 The [KittenTTS model by KittenML](https://github.com/KittenML/KittenTTS) provides a foundation for generating high-quality speech from models smaller than 80MB. This project elevates that foundation into a production-ready service by providing a robust [FastAPI](https://fastapi.tiangolo.com/) server that makes KittenTTS significantly easier to use, more powerful, and drastically faster.
@@ -77,17 +77,15 @@ We solve the complexity of setting up and running the model by offering:
 * **Cross-platform support** for Windows and Linux, with clear setup instructions.
 * **Docker support** for easy, reproducible containerized deployment.

-## Raspberry Pi & Edge Device Support
+## 🍓 Raspberry Pi & Edge Device Support

 The ultra-lightweight nature of the KittenTTS model and the efficiency of this server make it a perfect candidate for running on single-board computers (SBCs) and other edge devices.

-* **Raspberry Pi 5 (RP5):** Confirmed to run with **excellent performance**. The server is fast and responsive, easily handling requests from other devices on the same local network (LAN). This makes it ideal for local network services, home automation, and other DIY projects.
-
-* **Raspberry Pi 4 (RP4):** Testing is currently in progress. Not working on the 32-bit Raspberry Pi OS.
+* ✅ **Raspberry Pi 5 (RP5):** Confirmed to run with **excellent performance**. The server is fast and responsive, easily handling requests from other devices on the same local network (LAN). This makes it ideal for local network services, home automation, and other DIY projects.

 To install, simply follow the standard **Linux installation guide** provided in this README.

-## GPU Acceleration included
+## 🔥 GPU Acceleration included

 A standout feature of this server is the implementation of **high-performance GPU acceleration**, a capability not available in the original KittenTTS project. While the base model is CPU-only, this server unlocks the full potential of your hardware.
@@ -97,7 +95,7 @@ A standout feature of this server is the implementation of **high-performance GP
 This enhancement transforms KittenTTS from a lightweight-but-modest engine into a high-speed synthesis powerhouse.

-## Alternative to Piper TTS
+## 🔄 Alternative to Piper TTS

 The [KittenTTS model](https://github.com/KittenML/KittenTTS) serves as an excellent alternative to [Piper TTS](https://github.com/rhasspy/piper) for fast generation on limited compute and edge devices like Raspberry Pi 5.
@@ -112,34 +110,34 @@ While KittenTTS provides the ultra-lightweight foundation, this server transform
 Perfect for users seeking Piper's offline capabilities with better performance on limited hardware and modern server infrastructure.

-## Key Features of This Server
+## ✨ Key Features of This Server

-* **Multi-Model Support:** All 7 KittenTTS models (Nano, Micro, Mini across v0.1/v0.2/v0.8) with hot-swappable switching from the UI.
-* **Automatic Model Download:** Models are downloaded from Hugging Face on first use and cached locally.
-* **True GPU Acceleration:** Full support for **NVIDIA (CUDA)** via an optimized `onnxruntime-gpu` pipeline with I/O Binding for maximum performance.
-* **Large Text & Audiobook Generation:**
+* 🔁 **Multi-Model Support:** All 7 KittenTTS models (Nano, Micro, Mini across v0.1/v0.2/v0.8) with hot-swappable switching from the UI.
+* 📦 **Automatic Model Download:** Models are downloaded from Hugging Face on first use and cached locally.
+* ⚡ **True GPU Acceleration:** Full support for **NVIDIA (CUDA)** via an optimized `onnxruntime-gpu` pipeline with I/O Binding for maximum performance.
+* 📚 **Large Text & Audiobook Generation:**
     * Automatically handles long texts by intelligently splitting them based on sentence boundaries.
     * Processes each chunk individually and seamlessly concatenates the resulting audio.
     * **Ideal for audiobooks** - paste entire books and get professional-quality audio.
-* **Modern Web Interface:**
+* 🖥️ **Modern Web Interface:**
     * Intuitive UI for text input, model selection, voice selection, and parameter adjustment.
     * Real-time waveform visualization of generated audio.
     * Progress modal for model downloads with real-time status updates.
-* **Named Voices:**
+* 🎤 **Named Voices:**
     * Up to 8 named voices per model (4 male, 4 female).
     * Voice list updates automatically when switching models.
-* **Dual API Endpoints:**
+* ⚙️ **Dual API Endpoints:**
     * A primary `/tts` endpoint offering full control over all generation parameters.
     * An OpenAI-compatible `/v1/audio/speech` endpoint for seamless integration into existing workflows.
-* **Easy Configuration:**
+* 🔧 **Easy Configuration:**
     * All settings are managed through a single `config.yaml` file.
     * The server automatically creates a default config on the first run.
-* **UI State Persistence:** The web interface remembers your last-used text, voice, and settings to streamline your workflow.
-* **Docker Support:** Easy, reproducible deployment for both CPU and GPU via Docker Compose.
+* 💾 **UI State Persistence:** The web interface remembers your last-used text, voice, and settings to streamline your workflow.
+* 🐳 **Docker Support:** Easy, reproducible deployment for both CPU and GPU via Docker Compose.
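
The "Large Text & Audiobook Generation" feature in the list above (split on sentence boundaries, synthesize each chunk, concatenate the audio) can be sketched as follows. This is an illustrative outline under stated assumptions, not the server's actual splitter: the regex, the `max_chars` limit, and the `synthesize` callback are hypothetical stand-ins.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    # Deliberately simple rule: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            # A single sentence longer than max_chars is kept whole.
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(
    text: str,
    synthesize: Callable[[str], List[float]],
    max_chars: int = 200,
) -> List[float]:
    """Synthesize each chunk separately and concatenate the audio samples."""
    audio: List[float] = []
    for chunk in split_into_chunks(text, max_chars):
        audio.extend(synthesize(chunk))
    return audio
```

Splitting only at sentence ends keeps each chunk prosodically complete, which is why the joins between chunks tend to be less audible than with fixed-size character windows.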

 ---

-## System Prerequisites
+## 🔩 System Prerequisites

 * **Operating System:** Windows 10/11 (64-bit) or Linux (Debian/Ubuntu recommended).
 * **Python:** Version 3.10 or later.
@@ -149,14 +147,13 @@ Perfect for users seeking Piper's offline capabilities with better performance o
     * **Linux:** `sudo apt install espeak-ng`
 * **Raspberry Pi:**
     * Raspberry Pi 5
-    * Raspberry Pi 4
 * **(For GPU Acceleration):**
     * An **NVIDIA GPU** with CUDA support.
 * **(For Linux Only):**
     * `libsndfile1`: Audio library needed by `soundfile`. Install via `sudo apt install libsndfile1`.
     * `ffmpeg`: For robust audio operations. Install via `sudo apt install ffmpeg`.

-## Installation and Setup
+## 💻 Installation and Setup

 This project uses specific dependency files and a clear process to ensure a smooth, one-command installation for your hardware.
The first time you start the server, it will automatically download the default KittenTTS Nano model (~25MB) from Hugging Face. This is a one-time process. Subsequent launches will be instant. Additional models are downloaded automatically when selected from the Web UI.
@@ -301,11 +298,11 @@ The first time you start the server, it will automatically download the default
 4. **To stop the server:** Press `CTRL+C` in the terminal.

-### **Raspberry Pi 4 & 5 Installation (CPU-Only)**
+### **Raspberry Pi 5 Installation (CPU-Only)**

-KittenTTS runs excellently on Raspberry Pi devices, making it ideal for local network services and DIY projects. However, installation requirements vary significantly between Pi models due to CPU architecture differences.
+KittenTTS runs excellently on Raspberry Pi 5, making it ideal for local network services and DIY projects.

-#### **Raspberry Pi 5 - Full Support**
+#### **Raspberry Pi 5 - Full Support ✅**

 **Raspberry Pi 5 works out-of-the-box** with the standard Linux installation guide above. No special steps required!
@@ -336,17 +333,7 @@ python server.py
 > **Important:** During the `pip install -r requirements.txt` step, some Python packages (especially audio processing libraries like `librosa`, `praat-parselmouth`, and others) may need to be compiled from source on ARM architecture. This process can take **15-30 minutes** depending on your SD card speed and system load. This is normal - let it complete without interruption.

-#### **Raspberry Pi 4 - Limited Support**
-
-**Raspberry Pi 4 support is currently in development** due to complex dependency compilation issues on 32-bit ARM architecture.
-
-**For Raspberry Pi 4 Users:**
-We recommend upgrading to **64-bit Raspberry Pi OS** if possible, as this significantly improves compatibility with modern Python packages. For users requiring 32-bit support, please check our [GitHub Issues](https://github.com/devnen/Kitten-TTS-Server/issues) for the latest progress updates and community-contributed solutions.
-
-**Alternative Recommendation:**
-For the best Raspberry Pi TTS experience, we strongly recommend using a **Raspberry Pi 5** with the standard 64-bit OS, which provides excellent performance and full compatibility.
-
-## Docker Installation
+## 🐳 Docker Installation

 Run Kitten-TTS-Server easily using Docker. The recommended method uses Docker Compose, which is pre-configured for both CPU and NVIDIA GPU deployment.
If `CUDA available:` prints `True`, your GPU setup is working correctly

-## Usage Guide
+## 💡 Usage Guide

 ### Generate Your First Audio
@@ -456,7 +443,7 @@ If `CUDA available:` prints `True`, your GPU setup is working correctly
 5. Click **"Generate Speech"**. The server will process the entire text and stitch the audio together seamlessly.
 6. Download your complete audiobook file.

-## API Documentation
+## 📖 API Documentation

 The server exposes two main endpoints for TTS. See `http://localhost:8005/docs` for an interactive playground.
@@ -502,7 +489,7 @@ Use this for drop-in compatibility with scripts expecting OpenAI's TTS API struc
 * `POST /restart_server`: Triggers an async model hot-swap based on current config.
 * `POST /api/cancel-loading`: Cancels an in-progress model download/load.
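
As a rough illustration of calling these endpoints from a script, the sketch below builds (but does not send) requests with Python's standard library. The base URL comes from the `/docs` address mentioned in the README; the JSON field names follow OpenAI's speech API, and the `model` value `"kitten-tts"` is a placeholder assumption, since the diff does not show the server's request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8005"  # port taken from the README's /docs URL


def build_speech_request(text: str, voice: str, response_format: str = "wav") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/audio/speech endpoint."""
    payload = {
        "model": "kitten-tts",  # placeholder; the accepted value is an assumption
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_admin_request(path: str) -> urllib.request.Request:
    """Build a body-less POST for /restart_server or /api/cancel-loading."""
    return urllib.request.Request(f"{BASE_URL}{path}", data=b"", method="POST")
```

To actually send a request, pass the result to `urllib.request.urlopen(...)` and write the response bytes to a file; check the server's interactive docs for the exact fields it accepts.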

-## Configuration
+## ⚙️ Configuration

 All server settings are managed in the `config.yaml` file. It's created automatically on first launch if it doesn't exist.
@@ -513,7 +500,7 @@ All server settings are managed in the `config.yaml` file. It's created automati
 * `generation_defaults.speed`: Default speech speed (1.0 is normal).
 * `audio_output.format`: Default audio format (`wav`, `mp3`, `opus`).
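
The dotted names above (`generation_defaults.speed`, `audio_output.format`) suggest a nested mapping once `config.yaml` is parsed. A minimal sketch of looking up and validating such keys, assuming the file has already been loaded into a Python dict; the helper names and the fallback values are hypothetical, not the server's own code:

```python
from typing import Any, Dict

ALLOWED_FORMATS = {"wav", "mp3", "opus"}  # formats listed in the README


def get_setting(config: Dict[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Walk a nested dict using a dotted key such as 'audio_output.format'."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def validate_audio_format(config: Dict[str, Any]) -> str:
    """Return the configured audio format, rejecting unsupported values."""
    # Falling back to "wav" is an assumption made for this sketch.
    fmt = get_setting(config, "audio_output.format", "wav")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported audio format: {fmt!r}")
    return fmt
```

For example, `get_setting({"generation_defaults": {"speed": 1.2}}, "generation_defaults.speed", 1.0)` returns the configured value, and a missing key falls back to the supplied default.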

-## Troubleshooting
+## 🛠️ Troubleshooting

 * **Phonemizer / eSpeak Errors:**
     * This is the most common issue. Ensure you have installed **eSpeak NG** correctly for your OS and **restarted your terminal** afterward. The server includes auto-detection logic for common install paths.
@@ -529,16 +516,16 @@ All server settings are managed in the `config.yaml` file. It's created automati
     * Try clearing the `model_cache` directory and restarting.
     * Large models (Mini 0.1 at ~170MB) may take several minutes on slower connections.

-## Acknowledgements & Credits
+## 🙏 Acknowledgements & Credits

 * **Core Model:** This project is powered by the **[KittenTTS model](https://github.com/KittenML/KittenTTS)** created by **[KittenML](https://github.com/KittenML)**. Our work adds a high-performance server and UI layer on top of their excellent lightweight model.
 * **UI Inspiration:** The UI/server architecture is inspired by our previous work on the [Chatterbox-TTS-Server](https://github.com/devnen/Chatterbox-TTS-Server).

-## License
+## 📄 License

 This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.

-## Contributing
+## 🤝 Contributing

 Contributions, issues, and feature requests are welcome! Please feel free to open an issue or submit a pull request.