Commit cdece6b

committed
release v1.1.0
1 parent 6f1df37 commit cdece6b

7 files changed

Lines changed: 337 additions & 153 deletions

README.md

Lines changed: 114 additions & 77 deletions
@@ -2,7 +2,12 @@
22

33
# LightTTS
44

5-
**LightTTS** is a lightweight and high-performance text-to-speech (TTS) inference and service framework based on Python. It is built around the [cosyvoice](https://github.com/FunAudioLLM/CosyVoice) model and based on the [lightllm](https://github.com/ModelTC/lightllm), with optimizations to support fast, scalable, and service-ready TTS deployment.
5+
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
6+
[![Docker](https://img.shields.io/badge/docker-ready-brightgreen.svg)](https://hub.docker.com/r/lighttts/light-tts)
7+
8+
**⚡ Lightning-Fast Text-to-Speech Inference & Service Framework**
9+
10+
**LightTTS** is a lightweight and high-performance text-to-speech (TTS) inference and service framework based on Python. It supports **CosyVoice2** and **CosyVoice3** models, built upon the [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) architecture and [LightLLM](https://github.com/ModelTC/lightllm) framework, with optimizations to support fast, scalable, and service-ready TTS deployment.
611

712
---
813

@@ -19,18 +24,18 @@
1924

2025
### Installation
2126

22-
- Installing with Docker
27+
- (Option 1, Recommended) Run with Docker
2328
```bash
24-
# The easiest way to install Lightllm is by using the official image. You can directly pull and run the official image
25-
docker pull lighttts/light-tts:v1.0
29+
# The easiest way to install LightTTS is by using the official image. You can directly pull and run the official image
30+
docker pull lighttts/light-tts:latest
2631

2732
# Or you can manually build the image
28-
docker build -t light-tts:v1.0 .
33+
docker build -t light-tts:latest .
2934

3035
# Run the image
31-
docker run -it --gpus all -p 8080:8080 --shm-size 4g -v your_local_path:/data/ light-tts:v1.0 /bin/bash
36+
docker run -it --gpus all -p 8080:8080 --shm-size 4g -v your_local_path:/data/ light-tts:latest /bin/bash
3237

33-
- Installing from Source
38+
- (Option 2) Install from Source
3439

3540
```bash
3641
# Clone the repo
@@ -41,12 +46,11 @@
4146
# git submodule update --init --recursive
4247
4348
# (Recommended) Create a new conda environment
44-
conda create -n light-tts python=3.10 -y
49+
conda create -n light-tts python=3.10
4550
conda activate light-tts
4651
47-
# pynini is required by WeTextProcessing, use conda to install it as it can be executed on all platforms.
48-
conda install -y -c conda-forge pynini==2.1.5
49-
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
52+
# Install dependencies (We use the latest torch==2.9.1, but other versions are also compatible)
53+
pip install -r requirements.txt
5054
5155
# If you encounter sox compatibility issues
5256
# ubuntu
@@ -55,23 +59,25 @@
5559
sudo yum install sox sox-devel
5660
```
5761

58-
### Model download
62+
### Model Download
5963

60-
We now only support CosyVoice2 model.
64+
We now support CosyVoice2 and CosyVoice3 models.
6165

6266
```python
63-
# SDK model download
67+
# ModelScope SDK model download
6468
from modelscope import snapshot_download
69+
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
6570
snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
6671
snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
67-
```
68-
```python
69-
# Git model download; make sure git lfs is installed
70-
mkdir -p pretrained_models
71-
git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B
72-
git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
72+
73+
# For overseas users, HuggingFace SDK model download
74+
from huggingface_hub import snapshot_download
75+
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
76+
snapshot_download('FunAudioLLM/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
77+
snapshot_download('FunAudioLLM/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
7378
```
7479

80+
(The ttsfrd package is already installed in the Docker image. If you are using the Docker image, you can skip this installation.)
7581
For better text normalization performance, you can optionally install the ttsfrd package and unzip its resources. This step is not required — if skipped, the system will fall back to WeTextProcessing by default.
7682

7783
```bash
@@ -80,78 +86,109 @@ unzip resource.zip -d .
8086
pip install ttsfrd_dependency-0.1-py3-none-any.whl
8187
pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl
8288
```
83-
📝 This setup instruction is based on the original guide from the [CosyVoice repository](https://github.com/FunAudioLLM/CosyVoice).
8489

8590
### Start the Model Service
8691

92+
**Note:** It is recommended to enable the `load_trt` parameter for acceleration. The default flow precision is fp16 for CosyVoice2 and fp32 for CosyVoice3.
93+
94+
**For CosyVoice2:**
95+
96+
```bash
97+
python -m light_tts.server.api_server --model_dir ./pretrained_models/CosyVoice2-0.5B
98+
```
99+
100+
**For CosyVoice3:**
101+
102+
```bash
103+
python -m light_tts.server.api_server --model_dir ./pretrained_models/Fun-CosyVoice3-0.5B
104+
```
105+
106+
**With custom data type** (float32, bfloat16, or float16; default: float16):
107+
87108
```bash
88-
# It is recommended to enable the load_trt parameter for acceleration.
89-
# token2wav: The default is fp16 mode for CosyVoice2 and fp32 mode for CosyVoice3.
90-
python -m light_tts.server.api_server --model_dir ./pretrained_models/CosyVoice2-0.5B-latest
109+
# Use float32 for better accuracy or float16 for faster speed
110+
python -m light_tts.server.api_server --model_dir ./pretrained_models/Fun-CosyVoice3-0.5B --data_type float32
91111
```
92112

93-
- max_total_token_num: llm arg, the total token nums the gpu and model can support, equals = `max_batch * (input_len + output_len)`
94-
- max_req_total_len: llm arg, the max value for `req_input_len + req_output_len`, 32768 is set here because the `max_position_embeddings` of the llm part is 32768
95-
- There are many other parameters that can be viewed in `light_tts/server/api_cli.py`
113+
**Available Parameters:**
96114

97-
Wait for a while, this service will be started. The default startup is localhost:8080.
115+
The default values are usually the fastest and generally do not need to be adjusted. If you need to customize them, please refer to the following parameter descriptions:
116+
- `load_trt`: Whether to load the flow_decoder in TensorRT mode (default: True).
117+
- `data_type`: The data type for LLM inference (default: float16)
118+
- `load_jit`: Whether to load the flow_encoder in JIT mode (default: False).
119+
- `max_total_token_num`: LLM arg, total token count the GPU and model can support = `max_batch * (input_len + output_len)` (default: 64 * 1024)
120+
- `max_req_total_len`: LLM arg, maximum value for `req_input_len + req_output_len` (default: 32768, matches `max_position_embeddings`)
121+
- `graph_max_len_in_batch`: Maximum sequence length for CUDA graph capture in decoding stage (default: 32768)
122+
- `graph_max_batch_size`: Maximum batch size for CUDA graph capture in decoding stage (default: 16)
123+
124+
For more parameters, see `light_tts/server/api_cli.py`
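
As a rough illustration of the `max_total_token_num` formula above, the sketch below inverts it to estimate how many requests of a given size fit in the token budget. The helper name `max_concurrent_requests` and the 200/824 token lengths are hypothetical, chosen only for the arithmetic; the two defaults are the ones listed in the parameter descriptions.

```python
# Defaults as stated in the parameter list above.
DEFAULT_MAX_TOTAL_TOKEN_NUM = 64 * 1024   # 65536
DEFAULT_MAX_REQ_TOTAL_LEN = 32768


def max_concurrent_requests(max_total_token_num: int, input_len: int, output_len: int) -> int:
    """Largest max_batch with max_batch * (input_len + output_len) <= max_total_token_num.

    Hypothetical helper for illustration; it is not part of LightTTS.
    """
    per_request = input_len + output_len
    if per_request <= 0:
        raise ValueError("input_len + output_len must be positive")
    return max_total_token_num // per_request


# Hypothetical workload: 200 input tokens and 824 output tokens per request.
typical = max_concurrent_requests(DEFAULT_MAX_TOTAL_TOKEN_NUM, 200, 824)

# Worst case: every request uses the full max_req_total_len.
worst_case = max_concurrent_requests(DEFAULT_MAX_TOTAL_TOKEN_NUM, DEFAULT_MAX_REQ_TOTAL_LEN, 0)

print(typical, worst_case)
```

Under these assumptions, the default budget covers 64 requests of 1024 tokens each, but only 2 requests that use the full `max_req_total_len`.
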
125+
126+
Wait for the service to initialize. The default address is `http://localhost:8080`.
98127

99128
### Request Examples
100129

101-
When your service is started, you can call the service through the http API. We support three modes: non-streaming, streaming and bi-streaming.
102-
103-
- non-streaming and streaming. You can also use `test/test_zero_shot.py`, which can print information such as rtf and ttft.
104-
105-
106-
```python
107-
import requests
108-
import time
109-
import soundfile as sf
110-
import numpy as np
111-
import os
112-
import threading
113-
import json
114-
115-
url = "http://localhost:8080/inference_zero_shot"
116-
path = "cosyvoice/asset/zero_shot_prompt.wav" # wav file path
117-
prompt_text = "希望你以后能够做的比我还好呦。"
118-
tts_text = "收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。"
119-
stream = True # Whether to use streaming inference
120-
files = {
121-
"prompt_wav": ("sample.wav", open(path, "rb"), "audio/wav")
122-
}
123-
data = {
124-
"tts_text": tts_text,
125-
"prompt_text": prompt_text,
126-
"stream": stream
127-
}
128-
response = requests.post(url, files=files, data=data, stream=True)
129-
sample_rate = 24000
130-
131-
audio_data = bytearray()
132-
try:
133-
for chunk in response.iter_content(chunk_size=4096):
134-
if chunk:
135-
audio_data.extend(chunk)
136-
except Exception as e:
137-
print(f"Exception: {e}")
138-
print(f"Error: {response.status_code}, {response.text}")
139-
return
140-
audio_np = np.frombuffer(audio_data, dtype=np.int16)
141-
if response.status_code == 200:
142-
output_wav = f"./outs/output{'_stream' if stream else ''}_{index}.wav"
143-
sf.write(output_wav, audio_np, samplerate=sample_rate, subtype="PCM_16")
144-
print(f"saved as {output_wav}")
145-
else:
146-
print("Error:", response.status_code, response.text)
147-
```
130+
Once the service is running, you can interact with it through the HTTP API. We support three modes: **non-streaming**, **streaming**, and **bi-streaming**.
131+
132+
- **Non-streaming and Streaming**: Use `test/test_zero_shot.py` for examples, which prints metrics such as RTF (Real-Time Factor) and TTFT (Time To First Token)
133+
- **Bi-streaming**: Uses WebSocket interface. See usage examples in `test/test_bistream.py`
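
For readers who want a quick client without opening the test scripts, here is a minimal sketch of a non-streaming or streaming request. The endpoint (`/inference_zero_shot`), the form fields (`tts_text`, `prompt_text`, `stream`, `prompt_wav`), and the 24000 Hz 16-bit PCM output format are taken from the example this commit removes, so treat them as assumptions that may not hold for v1.1.0. The helper names `save_pcm16_wav` and `synthesize` are hypothetical.

```python
import wave

SAMPLE_RATE = 24000  # sample rate used in the removed README example (assumption)


def save_pcm16_wav(pcm_bytes: bytes, path: str, sample_rate: int = SAMPLE_RATE) -> int:
    """Write raw 16-bit mono PCM bytes to a WAV file and return the sample count."""
    usable = len(pcm_bytes) - (len(pcm_bytes) % 2)  # drop a trailing odd byte, if any
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono (assumption based on the old example)
        wav.setsampwidth(2)        # 2 bytes per sample = int16
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes[:usable])
    return usable // 2


def synthesize(tts_text: str, prompt_text: str, prompt_wav_path: str, out_path: str,
               stream: bool = False,
               url: str = "http://localhost:8080/inference_zero_shot") -> int:
    """Send one zero-shot request and save the returned audio; returns the sample count."""
    import requests  # third-party; imported here so save_pcm16_wav works without it

    with open(prompt_wav_path, "rb") as prompt_file:
        files = {"prompt_wav": ("sample.wav", prompt_file, "audio/wav")}
        data = {"tts_text": tts_text, "prompt_text": prompt_text, "stream": stream}
        response = requests.post(url, files=files, data=data, stream=True)
    response.raise_for_status()  # fail loudly on HTTP errors instead of saving garbage

    audio = bytearray()
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:
            audio.extend(chunk)
    return save_pcm16_wav(bytes(audio), out_path)
```

A call might look like `synthesize("Hello there.", "<transcript of the prompt audio>", "cosyvoice/asset/zero_shot_prompt.wav", "output.wav", stream=True)`, where the prompt path is the one used in the removed example and `prompt_text` should match what is spoken in that file.
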
134+
135+
## 📊 Performance Benchmarks
136+
137+
We have conducted performance benchmarks on different GPU configurations to demonstrate the throughput and latency characteristics of LightTTS in streaming mode.
138+
139+
Model: `Fun-CosyVoice3-0.5B-2512`, data type: `float16`
140+
141+
### NVIDIA GeForce RTX 4090D
142+
non-stream: `test/test_zs.py`
143+
144+
|num_workers|cost time 50%|cost time 90%|cost time 99%|rtf 50%|rtf 90%|rtf 99%|avg rtf|total_cost_time|qps|
145+
|------|------|------|------|------|------|------|------|------|------|
146+
|1|0.61|1.09|1.51|0.13|0.16|0.22|0.13|33.95|1.47|
147+
|2|0.8|1.24|1.71|0.15|0.22|0.25|0.16|21.46|2.33|
148+
|4|1.02|1.88|2.27|0.22|0.29|0.38|0.23|15.31|3.27|
149+
|8|1.76|2.36|3.48|0.33|0.49|0.62|0.36|12.18|4.1|
150+
151+
stream: `test/test_zs_stream.py`
152+
153+
|num_workers|cost time 50%|cost time 90%|cost time 99%|ttft 50%|ttft 90%|ttft 99%|rtf 50%|rtf 90%|rtf 99%|avg rtf|total_cost_time|qps|
154+
|------|------|------|------|------|------|------|------|------|------|------|------|------|
155+
|1|1.01|2.15|2.82|0.33|0.34|0.9|0.21|0.25|0.34|0.22|60.13|0.83|
156+
|2|1.83|3.56|5.16|0.93|1.53|2.3|0.34|0.63|0.81|0.4|52.47|0.95|
157+
|4|3.43|5.76|7.31|2.62|4.37|5.8|0.7|1.28|2.16|0.81|48.74|1.03|
158+
|8|7.27|10.01|10.45|6.4|8.55|9.03|1.28|2.67|3.66|1.57|47.37|1.06|
159+
160+
### NVIDIA GeForce RTX 5090
161+
non-stream
162+
163+
|num_workers|cost time 50%|cost time 90%|cost time 99%|rtf 50%|rtf 90%|rtf 99%|avg rtf|total_cost_time|qps|
164+
|------|------|------|------|------|------|------|------|------|------|
165+
|1|0.51|0.81|1.61|0.11|0.13|0.23|0.11|28.9|1.73|
166+
|2|0.64|1.1|1.48|0.13|0.16|0.26|0.13|17.54|2.85|
167+
|4|0.87|1.28|1.68|0.17|0.23|0.36|0.18|11.45|4.37|
168+
|8|1.32|1.86|2.14|0.25|0.4|0.6|0.29|8.97|5.57|
169+
170+
stream
171+
172+
|num_workers|cost time 50%|cost time 90%|cost time 99%|ttft 50%|ttft 90%|ttft 99%|rtf 50%|rtf 90%|rtf 99%|avg rtf|total_cost_time|qps|
173+
|------|------|------|------|------|------|------|------|------|------|------|------|------|
174+
|1|0.76|1.41|2.27|0.28|0.3|0.31|0.16|0.18|0.22|0.16|44.06|1.13|
175+
|2|1.45|2.34|3.46|0.74|1.28|1.75|0.27|0.45|0.7|0.3|38.82|1.29|
176+
|4|2.9|4.04|4.7|2.16|3.03|3.4|0.5|1.04|1.51|0.61|37.75|1.32|
177+
|8|5.78|7.74|8.49|5.01|6.73|7.35|1.03|2.09|2.85|1.22|37.67|1.33|
148178

149-
- bi-streaming. We use the websocket interface implementation, and we can find usage examples in `test/test_bistream.py`.
179+
**Metrics Explanation:**
180+
- **num_workers**: Number of concurrent workers
181+
- **cost time**: Total request processing time in seconds (50th/90th/99th percentile)
182+
- **ttft**: Time to First Token in seconds (50th/90th/99th percentile)
183+
- **rtf**: Real-Time Factor (50th/90th/99th percentile)
184+
- **avg rtf**: Average Real-Time Factor
185+
- **total_cost_time**: Total benchmark duration in seconds
186+
- **qps**: Queries Per Second
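
To make the metric definitions concrete, the sketch below computes RTF, percentiles, and QPS from per-request timings. It assumes the usual definition of RTF (processing time divided by the duration of the generated audio) and a nearest-rank percentile; the benchmark scripts may compute percentiles differently. All function names and the sample timings are hypothetical.

```python
import statistics


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-Time Factor: processing time divided by the generated audio duration."""
    return processing_seconds / audio_seconds


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100 (one of several conventions)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100) in integer math
    return ordered[rank - 1]


def qps(num_requests: int, total_cost_time: float) -> float:
    """Queries per second over the whole benchmark run."""
    return num_requests / total_cost_time


# Hypothetical (processing_seconds, audio_seconds) pairs for three requests.
samples = [(0.5, 4.0), (1.0, 4.0), (1.5, 3.0)]
rtfs = [rtf(cost, audio) for cost, audio in samples]

print("rtf 50%:", percentile(rtfs, 50))
print("rtf 90%:", percentile(rtfs, 90))
print("avg rtf:", statistics.mean(rtfs))
print("qps:", round(qps(50, 33.95), 2))
```

An RTF below 1 means audio is generated faster than it plays back. The last line uses 50 requests only because the `qps` and `total_cost_time` columns in the tables appear consistent with roughly 50 requests per run; that count is inferred from the numbers, not stated in the README.
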
150187

151188
## License
152189

153190
This repository is released under the [Apache-2.0](LICENSE) license.
154191

155192
### Third-Party Code Attribution
156193

157-
This project includes code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) (Copyright Alibaba, Inc. and its affiliates), which is also licensed under Apache-2.0. The CosyVoice code is located in the `cosyvoice/` directory and has been integrated and modified as part of Light TTS. See [NOTICE](NOTICE) file for complete attribution details.
194+
This project includes code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) (Copyright Alibaba, Inc. and its affiliates), which is also licensed under Apache-2.0. The CosyVoice code is located in the `cosyvoice/` directory and has been integrated and modified as part of LightTTS. See the [NOTICE](NOTICE) file for complete attribution details.

light_tts/common/all_kernel_configs/triton_flashdecoding/{head_dim=64,kv_head_num=2,out_dtype=torch.float16,q_head_num=14}_NVIDIA_GeForce_RTX_5090.json

Lines changed: 0 additions & 1 deletion
This file was deleted.
