- Add --yes/-y to train.py to auto-confirm download/prepare (no EOF in non-interactive runs)
- Add --copy-only to run_mfa_alignment_prepared.sh to copy alignment from cache only (fixes Step 5 with 100k+ files via find -print0)
- Dataset manager: run MFA with --copy-only when alignment cache exists; require JSONs for 'prepared'; skip corpus creation when WAV+LAB already exist
- QUICKSTART: AMD GPU (ROCm) section, note about uv run reverting to CUDA wheel
- SSH multiplexing rule for ai-tools LXC
Made-with: Cursor
QUICKSTART.md: 69 additions & 3 deletions
@@ -83,16 +83,82 @@ To train for **UK English** (British phoneme set and viseme mapping):
The UK recipe uses `training/configs/viseme_map_en_uk_mfa.json`, which maps the UK MFA phone set (IPA-style symbols) to the same 15 visemes. When prompted to download/prepare data, answer **`y`**; alignment will run with the UK model.
## 4c. Full training (production ONNX)
The quick recipes (4 and 4b) use **dev-clean** only and produce a small ONNX suitable for testing. For a **production-quality** model you need to train on the **full** LibriSpeech training sets and then export to ONNX.
**Data:** Full training uses LibriSpeech **train-clean-100** (~6GB), **train-clean-360** (~23GB), and **train-other-500** (~30GB). The first time you run, the script will prompt to download and prepare these; preparation (WAV + MFA alignment) takes a long time per split. **GPU optional:** the recipes default to `device = "cpu"` so training runs without a GPU; for much faster training set `[hardware] device = "cuda"` in the recipe (or `mps` on Apple Silicon).
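As a sketch of the recipe settings this implies (the `splits` key is quoted later in this guide; its exact location in the recipe file is an assumption), the full-data configuration would look roughly like:

```toml
# Sketch of a fragment of training/recipes/tcn_config.toml; other keys omitted.
# The three full LibriSpeech training splits described above:
splits = ["train-clean-100", "train-clean-360", "train-other-500"]

[hardware]
device = "cpu"  # default; "cuda" (NVIDIA or ROCm) or "mps" (Apple Silicon) is much faster
```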
**US English (full):**
```bash
uv run python training/train.py --config training/recipes/tcn_config.toml
```
When prompted to download and prepare missing datasets, answer **`y`**. Each split (train-clean-100, train-clean-360, train-other-500) will be downloaded, converted, and aligned with MFA (US) in turn. Training runs for up to 100 epochs with early stopping.

**UK English (full):**

```bash
uv run python training/train.py --config training/recipes/tcn_full_uk.toml
```
Answer **`y`** when asked to download and prepare datasets. Alignment will use the UK dictionary.
**Export to ONNX:** After training, export the best checkpoint so the realtime harness and C# app can use it:
```bash
uv run python training/tools/export_onnx.py --list
uv run python training/tools/export_onnx.py --run <run_name> --checkpoint best
```
`--list` shows available runs under `training/runs/`. Use the run name (e.g. `tcn_full_uk_2026-02-21_12-00-00`) with `--run`. The export writes to `export/<run_name>/` (`model.onnx` and `config.json`). The realtime script and C# app pick the newest `export/*/model.onnx` by default.
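That "newest export" default can be approximated with a one-liner (a sketch, not the scripts' actual code):

```shell
# Pick the most recently modified exported model, mirroring the
# "newest export/*/model.onnx" default described above (sketch only).
latest=$(ls -t export/*/model.onnx 2>/dev/null | head -n 1)
echo "Using model: ${latest:-<none found>}"
```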
**Smaller full run:** To try full training with less data, edit the recipe and set e.g. `splits = ["train-clean-100"]` (100h only). Use `training/recipes/tcn_config.toml` (US) or `training/recipes/tcn_full_uk.toml` (UK).
## 5. Optional: use GPU
Edit the recipe (e.g. `training/recipes/tcn_quick_laptop.toml` or `tcn_config.toml`) and set:
```toml
[hardware]
device = "cuda"  # NVIDIA GPU, or AMD GPU with ROCm (same API)
# device = "mps" # Apple Silicon
```
If CUDA/ROCm/MPS isn’t available, the trainer falls back to CPU and logs a warning.
### 5b. AMD GPU (ROCm)
The default `uv sync` installs PyTorch built for **NVIDIA CUDA**. On a machine with an **AMD GPU** (e.g. Radeon RX 7700/7800, Navi 32), you need PyTorch built for **ROCm** so that `torch.cuda.is_available()` returns `True` (ROCm exposes the same `torch.cuda` API).
**1. Ensure the GPU is visible**
- Kernel driver: `/dev/kfd` and `/dev/dri/renderD*` should exist (amdgpu driver).
- Your user must be in the `render` (and usually `video`) group so the process can open those devices:
`groups` should list `render`; if not, add with `sudo usermod -aG render,video $USER` and log in again.
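The two checks above can be scripted (a sketch; device paths follow the amdgpu driver layout described here):

```shell
# Verify the amdgpu device nodes exist and the user is in the render group.
ls -l /dev/kfd /dev/dri/renderD* 2>/dev/null || echo "amdgpu device nodes not found"
if id -nG | tr ' ' '\n' | grep -qx render; then
  echo "in render group"
else
  echo "not in render group: sudo usermod -aG render,video \$USER, then log in again"
fi
```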
**2. Install PyTorch with ROCm**
From the project root, override the default torch/torchaudio with the ROCm wheels. Use the index that matches your ROCm version (see [PyTorch get-started](https://pytorch.org/get-started/locally/) and choose Linux → Pip → ROCm). Example for ROCm 6.3:
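The elided example presumably looks like the following (index URL per PyTorch's ROCm wheel hosting; treat the exact form as an assumption and prefer the selector on the get-started page):

```shell
# Replace the CUDA builds of torch/torchaudio with ROCm 6.3 builds
# from PyTorch's wheel index (adjust rocm6.3 to your ROCm version).
uv pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
```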
If your distro uses a different ROCm version, use the matching index (e.g. `rocm5.6`, `rocm6.2`). Python 3.13 may not have ROCm wheels on all indices; if so, try the [AMD ROCm docs](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/) or PyTorch “Previous versions” for a compatible wheel.
**3. Use the GPU in training**
In the recipe set `device = "cuda"` (same as for NVIDIA). Then run training as usual; the trainer will use the AMD GPU via ROCm.
**Verify:** `uv run python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else '')"` should print `True` and the GPU name.
**Note:** If you use `uv` and install ROCm via `uv pip install ... --index-url ...rocm6.3`, then `uv run` will re-sync from the lock file and can revert to the default CUDA wheel. To keep using the GPU, run training with the venv Python directly, e.g. `.venv/bin/python training/train.py --config ...`, or a wrapper script that calls `.venv/bin/python`.
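One way to build such a wrapper (the file name `train_rocm.sh` is hypothetical):

```shell
# Write a wrapper that calls the venv interpreter directly, so `uv run`
# never gets a chance to re-sync the environment back to the CUDA wheels.
cat > train_rocm.sh <<'EOF'
#!/usr/bin/env sh
exec .venv/bin/python training/train.py "$@"
EOF
chmod +x train_rocm.sh
```

Then run e.g. `./train_rocm.sh --config training/recipes/tcn_config.toml`.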
`uv run python training/train.py --config training/recipes/tcn_quick_laptop.toml`
**Full training (production ONNX):** See [QUICKSTART.md](QUICKSTART.md) section 4c. Use `training/recipes/tcn_config.toml` (US) or `training/recipes/tcn_full_uk.toml` (UK), then export with `training/tools/export_onnx.py --run <run_name> --checkpoint best`.
This project uses the [LibriSpeech ASR corpus](https://openslr.org/12/) (CC BY 4.0 license).