VoiceTransor is an open-source speech-to-text and text assistant.
It provides a simple workflow to transcribe audio locally using Whisper, process text with AI, and export results in multiple formats.
Features:
- Import audio files
- Local transcription with Whisper (supports resume from interruption)
- AI-powered text processing
- Export results as TXT / PDF
- Cross-platform: Windows, macOS
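The "resume from interruption" feature can be pictured with a minimal sketch. This is not VoiceTransor's actual implementation: the function name `transcribe_resumable`, the checkpoint file, and the idea of splitting audio into ordered chunks are all assumptions made for illustration. The `transcribe_chunk` callable stands in for whatever Whisper call does the real work.

```python
import json
from pathlib import Path
from typing import Callable, List


def transcribe_resumable(
    chunks: List[str],
    transcribe_chunk: Callable[[str], str],
    checkpoint: Path,
) -> List[str]:
    """Transcribe chunks in order, saving progress after each one.

    Hypothetical helper: `chunks` are identifiers (e.g. file paths) of
    audio segments, and `transcribe_chunk` is any function that turns
    one segment into text.
    """
    # Load results saved by an earlier, interrupted run (if any).
    done: List[str] = []
    if checkpoint.exists():
        done = json.loads(checkpoint.read_text(encoding="utf-8"))

    # Skip the chunks that already have a saved result.
    for chunk in chunks[len(done):]:
        done.append(transcribe_chunk(chunk))
        # Persist after every chunk so an interruption loses at most one.
        checkpoint.write_text(
            json.dumps(done, ensure_ascii=False), encoding="utf-8"
        )
    return done
```

Calling the function again with the same checkpoint path picks up at the first chunk that has no saved text, so work completed before the interruption is not repeated.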
Requirements:
- Python 3.10+
- FFmpeg installed and available in PATH
- Virtual environment (recommended)
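The prerequisites above can be checked with a few lines of standard-library Python. This is a sketch rather than part of the project; `check_environment` is a hypothetical name, and the virtual environment test only recognizes `venv`/`virtualenv` style environments (a conda environment would be reported as inactive).

```python
import shutil
import sys


def check_environment() -> dict:
    """Return a pass/fail flag for each prerequisite."""
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),
        # shutil.which searches PATH the same way the shell would.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "virtualenv_active": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```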
VoiceTransor uses Ollama for local AI-powered text processing. Because the model runs on your own machine, your text is never sent to a cloud service.
Quick Setup:
- Windows: Run `scripts\setup\install_ollama.bat` from project root (automatic installation)
- Manual installation: Download from ollama.com/download
- Start service: Run `ollama serve` in a terminal
- Download a model: Run `ollama pull llama3.1:8b`
For detailed setup instructions, see OLLAMA_SETUP_GUIDE.md (a Chinese version is also available).
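To confirm that the service is running and a model has been downloaded, a sketch like the following can query Ollama's `/api/tags` endpoint. It assumes Ollama is listening on its default address, `http://localhost:11434`; the function names are hypothetical and not part of VoiceTransor.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

# Assumption: Ollama's default address. Adjust if you changed it.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_local_models(
    base_url: str = OLLAMA_URL, timeout: float = 3.0
) -> Optional[List[str]]:
    """Return installed model names, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama is not reachable; is `ollama serve` running?")
    else:
        print("Installed models:", ", ".join(models) or "(none)")
```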
Recommended Models:
- `llama3.1:8b` - English (default, ~4.7GB)
- `qwen2.5:7b` - Balanced Chinese/English (~4.4GB)
- `gemma2:9b` - High quality (~5.4GB)
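As an illustration of what AI-powered text processing with one of these models might look like, the sketch below sends a transcript and an instruction to Ollama's `/api/generate` endpoint. How VoiceTransor builds its prompts is not documented here, so `build_request`, `process_text`, and the prompt layout are assumptions; only the endpoint, the request fields, and the `response` field of the reply come from Ollama's REST API.

```python
import json
import urllib.request


def build_request(text: str, instruction: str, model: str = "llama3.1:8b") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        # Hypothetical prompt layout: instruction first, then the transcript.
        "prompt": f"{instruction}\n\n{text}",
        # Ask for one complete JSON reply instead of a token stream.
        "stream": False,
    }


def process_text(
    text: str,
    instruction: str,
    model: str = "llama3.1:8b",
    base_url: str = "http://localhost:11434",  # assumed default address
    timeout: float = 120.0,
) -> str:
    """Send the text to a local Ollama model and return its reply."""
    body = json.dumps(build_request(text, instruction, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For mixed Chinese/English transcripts, passing `model="qwen2.5:7b"` would select the balanced model from the list above, provided it has been pulled first.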
System Requirements:
- GPU mode: NVIDIA GPU with 8GB+ VRAM (recommended)
- CPU mode: 16GB+ RAM (slower but works)
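One way to see which mode a machine is likely to fall into is to ask `nvidia-smi` for the total memory of each GPU. The sketch below is an assumption-laden helper, not project code: the function names are hypothetical, and the default threshold of 8000 MiB is chosen loosely because cards sold as 8GB can report slightly less than 8192 MiB. It does not check system RAM for CPU mode.

```python
import shutil
import subprocess
from typing import List


def parse_vram_mib(smi_output: str) -> List[int]:
    """Parse one memory value (MiB) per line, skipping unreadable lines."""
    values = []
    for line in smi_output.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def query_vram_mib() -> List[int]:
    """Return total VRAM per NVIDIA GPU in MiB, or [] if none is found."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    return parse_vram_mib(out)


def suggest_mode(vram_mib: List[int], min_mib: int = 8000) -> str:
    """Suggest "gpu" if any card meets the threshold, else "cpu"."""
    return "gpu" if any(v >= min_mib for v in vram_mib) else "cpu"


if __name__ == "__main__":
    print(suggest_mode(query_vram_mib()))
```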
Clone the repository and install the dependencies:

    git clone https://github.com/leonshen/VoiceTransor.git
    cd VoiceTransor
    pip install -r requirements.txt

If you want Whisper to use an NVIDIA GPU on Windows:
- Uninstall any existing CPU-only PyTorch wheels inside your virtualenv:
    pip uninstall torch torchvision torchaudio -y
- Install the matching CUDA wheels (the example below uses CUDA 12.1; pick the build that matches your driver):
    pip install torch==2.3.0+cu121 torchvision==0.18.0+cu121 torchaudio==2.3.0+cu121 --index-url https://download.pytorch.org/whl/cu121
- Verify CUDA is detected before launching VoiceTransor:
    python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
  The command should print True and a CUDA version string.
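The same check can drive device selection in code. The sketch below falls back to the CPU when PyTorch is missing or reports no usable GPU. `pick_device` is a hypothetical helper; passing its result to a model loader (for example the `device` argument of `whisper.load_model` in the openai-whisper package) is an assumption about which Whisper package is in use.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed at all: only the CPU path is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```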
Make sure the virtual environment is activated, then run:

    python -m app.main

For support or collaboration: voicetransor@gmail.com
MIT License. See LICENSE for details.