Built for PiperTTS (custom model training) with a simple file structure:
dataset.zip/
├── metadata.csv
└── wavs/
    ├── <name>_processed<index>.wav
    └── ...

Works with LJSpeech-compatible TTS engines too!
metadata.csv looks like:
wav_filename|transcript
wavs/<name>_processed<index>.wav|<transcript>
...

Tested on Ubuntu Server 22.04 (Python 3.10.12) and Ubuntu Desktop 24.04 (Python 3.12.3). Should run fine on Debian-based systems.
No (official) Windows support, sorry!
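
The metadata.csv layout shown above is plain pipe-delimited text, so Python's standard csv module can read it. Here is a minimal sketch, assuming the file is UTF-8 and begins with the wav_filename|transcript header line exactly as shown. read_metadata is a hypothetical helper name, not something this repo ships:

```python
import csv


def read_metadata(path):
    """Return a list of (wav_path, transcript) tuples from metadata.csv."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE leaves quotation marks inside transcripts untouched.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if row == ["wav_filename", "transcript"]:
                continue  # skip the header line (assumed to be present)
            if len(row) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 fields, got {len(row)}"
                )
            rows.append((row[0], row[1]))
    return rows
```

A transcript that itself contains a | character would trip the two-field check, which is probably what you want to catch before training.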
Clone it:
git clone https://github.com/DominicTWHV/LJSpeech_Dataset_Generator.git
Setup:
cd LJSpeech_Dataset_Generator
chmod +x pipeline.sh
Run it:
./pipeline.sh
Then hop onto the Gradio WebUI on port 7860. The server listens on all interfaces (0.0.0.0:7860) by default. For local use, connect at https://127.0.0.1:7860/.
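
If the page does not load, it can help to check whether anything is listening on the port at all. A rough sketch using only the standard library; is_port_open is a hypothetical helper, and the defaults simply mirror the address and port mentioned above:

```python
import socket


def is_port_open(host="127.0.0.1", port=7860, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling is_port_open() with no arguments checks 127.0.0.1:7860. It only tells you that something accepts TCP connections there, not that it is the Gradio app.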
Move dataset.zip to your training directory:
mv /output/dataset.zip /path/to/training/dir
cd /path/to/training/dir
unzip dataset.zip

Or just download it via the WebUI.
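
After unzipping, a quick sanity check is to confirm that every wav listed in metadata.csv actually exists on disk. This is a sketch under the assumption that the paths in metadata.csv are relative to the directory holding metadata.csv, as in the layout above. missing_wavs is a hypothetical helper, not part of the pipeline:

```python
from pathlib import Path


def missing_wavs(dataset_dir):
    """Return the wav paths listed in metadata.csv that are not on disk."""
    root = Path(dataset_dir)
    missing = []
    text = (root / "metadata.csv").read_text(encoding="utf-8")
    for line in text.splitlines():
        wav, sep, _transcript = line.partition("|")
        if not sep or wav == "wav_filename":
            continue  # skip blank or malformed lines and the header
        if not (root / wav).is_file():
            missing.append(wav)
    return missing
```

An empty list from missing_wavs("/path/to/training/dir") means every referenced file was found.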
File permission issues? Missing files? Check script permissions or background processes.
Still stuck? Feel free to drop a note in the issues tab.