Commit 8e48675

Merge pull request #22 from zyumo777/main
docs: Add Piper TTS installation guide AND docs: Add Fire Red ASR model usage tutorial
2 parents 7657a86 + 1f2390a commit 8e48675

File tree

4 files changed: +200, -0 lines changed

docs/user-guide/backend/asr.md

Lines changed: 47 additions & 0 deletions
@@ -85,6 +85,53 @@ uv add onnxruntime-gpu==1.17.1 sherpa-onnx==1.10.39+cuda -f https://k2-fsa.githu
2. Place the model files in the project's `models` directory
3. Modify the relevant configuration of `sherpa_onnx_asr` according to the instructions in `conf.yaml`

### Using the Fire Red ASR Model

[Fire Red ASR](https://github.com/FireRedTeam/FireRedASR) is a high-quality Chinese-English speech recognition model that is also supported in sherpa-onnx. Compared to the default SenseVoiceSmall model, Fire Red ASR performs better in mixed Chinese-English scenarios.

#### Recommended For
- Users who need high-quality mixed Chinese-English recognition
- Users with high accuracy requirements
- Configuration difficulty: simple

#### Downloading the Model

First, make sure `huggingface_hub` is installed so that the model can be downloaded from the command line:

```sh
uv add huggingface_hub
```

Download the model with the huggingface CLI:

```sh
uv run hf download csukuangfj/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16 --local-dir models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16
```

#### Configuration

Configure the Fire Red ASR model in `conf.yaml`:

```yaml
asr_config:
  asr_model: 'sherpa_onnx_asr'

  sherpa_onnx_asr:
    model_type: 'fire_red_asr'

    fire_red_asr_encoder: './models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx'
    fire_red_asr_decoder: './models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx'
    tokens: './models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt'

    num_threads: 4
    provider: 'cpu' # Options: 'cpu' or 'cuda'
    use_itn: False
```

:::info
If you use CUDA inference, it is recommended to download the fp16 version of the model for better results. Simply replace `encoder.int8.onnx` and `decoder.int8.onnx` in the configuration above with the corresponding fp16 files.
:::
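A mistyped model path is the most common reason a sherpa-onnx config fails at startup, so it can help to check that every configured file actually exists before launching. The following is a minimal illustrative sketch; the `check_model_files` helper is not part of the project:

```python
from pathlib import Path

def check_model_files(paths):
    """Return the subset of configured model file paths that do not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]

# Paths mirror the fire_red_asr entries in conf.yaml above.
model_dir = "./models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16"
missing = check_model_files([
    f"{model_dir}/encoder.int8.onnx",
    f"{model_dir}/decoder.int8.onnx",
    f"{model_dir}/tokens.txt",
])
if missing:
    print("Missing model files:", missing)
```

Running this from the project root reports any file that the download step did not place where the config expects it.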
## `fun_asr` (Local)

[FunASR](https://github.com/modelscope/FunASR?tab=readme-ov-file) is a fundamental end-to-end speech recognition toolkit from ModelScope that supports various ASR models. Among them, the SenseVoiceSmall model from Alibaba's [FunAudioLLM](https://github.com/FunAudioLLM/SenseVoice) performs well in both quality and speed.

docs/user-guide/backend/tts.md

Lines changed: 56 additions & 0 deletions
@@ -20,6 +20,62 @@ sherpa-onnx is a powerful inference engine that supports multiple TTS models (including Me
For GPU inference (CUDA only), please refer to [CUDA Inference](/docs/user-guide/backend/asr#cuda-推理)
:::

## Piper TTS (Local & Lightweight and Fast)
Piper is a fast, local neural text-to-speech system that supports multiple languages and voices. It uses pre-trained ONNX models and can synthesize speech in real time on a CPU.

### Installation Steps
1. Install piper-tts:
```sh
uv pip install piper-tts
```

2. Download model files:
   - Piper needs trained ONNX model files to generate speech
   - **Recommended models**
     - `zh_CN-huayan-medium` - Chinese (Mandarin)
     - `en_US-lessac-medium` - English
     - `ja_JP-natsuya-medium` - Japanese

   - **Download methods**
     - Method 1: Manual download
       - Chinese model: [https://huggingface.co/csukuangfj/vits-piper-zh_CN-huayan-medium/tree/main](https://huggingface.co/csukuangfj/vits-piper-zh_CN-huayan-medium/tree/main)
       - Other models: search for "piper" on [Hugging Face](https://huggingface.co/models) or train your own
     - Method 2: Automatic download via command (not recommended)
     ```sh
     python -m piper.download_voices zh_CN-huayan-medium
     ```

   - **File placement**
     - Download both the `.onnx` and `.onnx.json` files to the `models/piper/` directory

3. Configure in `conf.yaml`:
```yaml
piper_tts:
  model_path: "models/piper/zh_CN-huayan-medium.onnx" # ONNX model file path
  speaker_id: 0 # Speaker ID for multi-speaker models (use 0 for single-speaker models)
  length_scale: 1.0 # Speech rate (1.0 = normal, >1.0 = slower, <1.0 = faster)
  noise_scale: 0.667 # Audio variation level (0.0-1.0)
  noise_w: 0.8 # Speaking-style variation level (0.0-1.0)
  volume: 1.0 # Volume (0.0-1.0)
  normalize_audio: true # Whether to normalize the audio
  use_cuda: false # Whether to use GPU acceleration (requires CUDA support)
```
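Among these parameters, `length_scale` is the one you will adjust most often: it multiplies the predicted phoneme durations, so values above 1.0 slow speech down and values below 1.0 speed it up. A tiny illustration of the effect on output length (the `scaled_duration` helper is purely illustrative, not a Piper API):

```python
def scaled_duration(base_seconds: float, length_scale: float) -> float:
    """Approximate output duration: length_scale multiplies phoneme
    durations, so >1.0 means slower speech and longer audio."""
    return base_seconds * length_scale

# A clip that takes 2.0 s at normal speed (length_scale = 1.0):
print(scaled_duration(2.0, 1.25))  # slower speech, longer audio
print(scaled_duration(2.0, 0.8))   # faster speech, shorter audio
```

Small adjustments (roughly 0.8 to 1.3) usually keep the voice natural; extreme values distort prosody.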
4. Set `tts_model: piper_tts` in `conf.yaml`

### Features
- ✅ Completely local, no internet connection required
- ✅ Real-time CPU inference, fast
- ✅ Supports multiple languages and voices
- ✅ Optional GPU acceleration
- ✅ Small model files, easy to deploy

:::tip
For more model options, visit the [Piper voice samples page](https://rhasspy.github.io/piper-samples/) to preview and download models for different languages and voices.
:::
## pyttsx3 (Lightweight and Fast)
A simple, easy-to-use local TTS engine that uses the system's default speech synthesizer. We use `py3-tts` instead of the better-known `pyttsx3`, because `pyttsx3` appears unmaintained and failed to run on the test computer.
2581

i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/asr.md

Lines changed: 47 additions & 0 deletions
@@ -120,6 +120,53 @@ If you want to try other speech recognition models:
3. Modify the relevant configuration of `sherpa_onnx_asr` according to the instructions in `conf.yaml`

### Using the Fire Red ASR Model

[Fire Red ASR](https://github.com/FireRedTeam/FireRedASR) is a high-quality Chinese-English speech recognition model that is also supported in sherpa-onnx. Compared to the default SenseVoiceSmall model, Fire Red ASR performs better in mixed Chinese-English scenarios.

#### Recommended For
- Users who need high-quality mixed Chinese-English recognition
- Users with high requirements for recognition accuracy
- Configuration difficulty: simple

#### Downloading the Model

First, ensure `huggingface_hub` is installed to use the command line for downloading models:

```sh
uv add huggingface_hub
```

Use the huggingface CLI to download the model:

```sh
uv run hf download csukuangfj/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16 --local-dir models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16
```

#### Configuration

Configure the Fire Red ASR model in `conf.yaml`:

```yaml
asr_config:
  asr_model: 'sherpa_onnx_asr'

  sherpa_onnx_asr:
    model_type: 'fire_red_asr'

    fire_red_asr_encoder: './models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx'
    fire_red_asr_decoder: './models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx'
    tokens: './models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt'

    num_threads: 4
    provider: 'cpu' # Options: 'cpu' or 'cuda'
    use_itn: False
```

:::info
If you're using CUDA inference, it's recommended to download the fp16 version of the model for better results. Replace `encoder.int8.onnx` and `decoder.int8.onnx` with their fp16 counterparts in the configuration above.
:::
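Since switching to CUDA only changes the precision suffix in the encoder/decoder filenames, those paths can be built from a single precision setting. A minimal sketch, assuming the fp16 files follow the same `encoder.<precision>.onnx` naming as the int8 ones (verify against the downloaded repo); the `model_files` helper is illustrative, not part of the project:

```python
def model_files(model_dir: str, precision: str = "int8") -> dict:
    """Build the three file paths used by the fire_red_asr config.
    precision: 'int8' for CPU, 'fp16' for CUDA inference."""
    return {
        "fire_red_asr_encoder": f"{model_dir}/encoder.{precision}.onnx",
        "fire_red_asr_decoder": f"{model_dir}/decoder.{precision}.onnx",
        "tokens": f"{model_dir}/tokens.txt",  # tokens file is shared by both precisions
    }

paths = model_files("./models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16", "fp16")
print(paths["fire_red_asr_encoder"])
```

Keeping this mapping in one place avoids editing three config lines by hand each time you change the inference provider.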
169+
123170
## `fun_asr` (Local)
124171

125172
[FunASR](https://github.com/modelscope/FunASR?tab=readme-ov-file) is a fundamental end-to-end speech recognition toolkit from ModelScope that supports various ASR models. Among them, Alibaba's [FunAudioLLM](https://github.com/FunAudioLLM/SenseVoice) SenseVoiceSmall model performs well in both performance and speed.

i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md

Lines changed: 50 additions & 0 deletions
@@ -19,6 +19,56 @@ sherpa-onnx is a powerful inference engine that supports multiple TTS models (in
For GPU inference (CUDA only), please refer to [CUDA Inference](/docs/user-guide/backend/asr#cuda-inference).
:::

## Piper TTS (Local & Lightweight and Fast)
Piper is a fast, local neural text-to-speech system that supports multiple languages and voices. It uses pre-trained ONNX models and can achieve real-time speech synthesis on CPU.

### Installation Steps
1. Install piper-tts:
```sh
uv pip install piper-tts
```

2. Download model files:
   - Piper requires trained ONNX model files for speech generation
   - **Recommended models**:
     - `zh_CN-huayan-medium` - Chinese (Mandarin)
     - `en_US-lessac-medium` - English
     - `ja_JP-natsuya-medium` - Japanese

   - **Download methods**:
     - Method 1: Manual download
       - Chinese model: [https://huggingface.co/csukuangfj/vits-piper-zh_CN-huayan-medium/tree/main](https://huggingface.co/csukuangfj/vits-piper-zh_CN-huayan-medium/tree/main)
       - Other models: Search "piper" on [Hugging Face](https://huggingface.co/models) or train your own
     - Method 2: Automatic download via command (not recommended)
     ```sh
     python -m piper.download_voices zh_CN-huayan-medium
     ```

   - **File placement**:
     - Download both `.onnx` and `.onnx.json` files to the `models/piper/` directory

3. Configure in `conf.yaml`:
```yaml
piper_tts:
  model_path: "models/piper/zh_CN-huayan-medium.onnx" # ONNX model file path
  speaker_id: 0 # Speaker ID for multi-speaker models (use 0 for single-speaker models)
  length_scale: 1.0 # Speech rate control (1.0 = normal, >1.0 = slower, <1.0 = faster)
  noise_scale: 0.667 # Audio variation level (0.0-1.0)
  noise_w: 0.8 # Speaking style variation level (0.0-1.0)
  volume: 1.0 # Volume (0.0-1.0)
  normalize_audio: true # Whether to normalize audio
  use_cuda: false # Whether to use GPU acceleration (requires CUDA support)
```
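A Piper voice only loads if both the `.onnx` model and its `.onnx.json` sidecar sit next to each other, so a quick pairing check can save a confusing startup failure. A small illustrative sketch; the `voice_is_complete` helper is not part of the project:

```python
from pathlib import Path

def voice_is_complete(model_path: str) -> bool:
    """True if both the .onnx model and its .onnx.json config are present."""
    model = Path(model_path)
    config = Path(model_path + ".json")  # e.g. zh_CN-huayan-medium.onnx.json
    return model.is_file() and config.is_file()

print(voice_is_complete("models/piper/zh_CN-huayan-medium.onnx"))
```

If this prints `False` after a manual download, the most likely cause is that only one of the two files was fetched from Hugging Face.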
4. Set `tts_model: piper_tts` in `conf.yaml`

### Features
- ✅ Completely local, no internet connection required
- ✅ Real-time CPU inference, fast speed
- ✅ Supports multiple languages and voices
- ✅ Supports GPU acceleration (optional)
- ✅ Small model files, easy to deploy

:::tip
For more model options, visit the [Piper Voice Samples page](https://rhasspy.github.io/piper-samples/) to listen to and download models for different languages and voices.
:::
## pyttsx3 (Lightweight and Fast)
A simple and easy-to-use local TTS engine that uses the system's default speech synthesizer. We use `py3-tts` instead of the more famous `pyttsx3` because `pyttsx3` seems unmaintained and failed to run on the test computer.
