VoiceTransor

VoiceTransor is an open-source speech-to-text and text-processing assistant.
It transcribes audio locally with Whisper, processes the transcript with a local AI model, and exports the result as TXT or PDF.

✨ Features

Import audio files
Local transcription with Whisper (supports resume from interruption)
AI-powered text processing
Export results as TXT / PDF
Cross-platform: Windows, macOS

🚀 Installation

Prerequisites

Python 3.10+
FFmpeg installed and available in PATH
Virtual environment (recommended)
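
The prerequisites above can be checked with a short script before installing anything. This is a minimal sketch and not part of VoiceTransor itself; the function name check_prerequisites is hypothetical, and the virtual-environment test assumes a standard venv or virtualenv layout.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites list


def check_prerequisites():
    """Report whether each prerequisite appears to be satisfied."""
    return {
        "python": sys.version_info >= MIN_PYTHON,
        # shutil.which searches PATH the same way the shell would.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Inside a venv/virtualenv, sys.prefix differs from sys.base_prefix.
        "virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A "missing" result for virtualenv is only a warning, since the virtual environment is recommended rather than required.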

Configure AI Text Processing

VoiceTransor uses Ollama for local AI-powered text processing. Because the model runs on your own machine, your text is never sent to a cloud service.

Quick Setup:

Windows: Run scripts\setup\install_ollama.bat from the project root (automatic installation)
Manual installation: Download from ollama.com/download
Start service: Run ollama serve in a terminal
Download a model: Run ollama pull llama3.1:8b
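
To confirm that the service is running and the model was downloaded, you can query Ollama's local HTTP API. The sketch below assumes Ollama's default address (http://localhost:11434) and its /api/tags endpoint, which lists installed models; the helper names are hypothetical and VoiceTransor may perform this check differently.

```python
import http.client
import json
import urllib.request

# Ollama's default local address; adjust if your install uses another port.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(payload):
    """Extract model names from the JSON returned by /api/tags."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def list_ollama_models(base_url=OLLAMA_URL, timeout=3.0):
    """Return installed model names, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections and timeouts;
        # ValueError covers a response that is not valid JSON.
        return None


if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama is not reachable; start it with: ollama serve")
    elif "llama3.1:8b" not in models:
        print("Service is up; fetch the default model with: ollama pull llama3.1:8b")
    else:
        print("Ollama is ready:", ", ".join(models))
```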

For detailed setup instructions, see OLLAMA_SETUP_GUIDE.md (a Chinese version is also available).

Recommended Models:

llama3.1:8b - English (default, ~4.7GB)
qwen2.5:7b - Balanced Chinese/English (~4.4GB)
gemma2:9b - High quality (~5.4GB)

System Requirements:

GPU mode: NVIDIA GPU with 8GB+ VRAM (recommended)
CPU mode: 16GB+ RAM (slower but works)

Setup

```
git clone https://github.com/leonshen/VoiceTransor.git
cd VoiceTransor
pip install -r requirements.txt
```

Windows GPU (CUDA) setup

If you want Whisper to use an NVIDIA GPU on Windows:

Uninstall any existing CPU-only PyTorch wheels inside your virtualenv:
```
pip uninstall torch torchvision torchaudio -y
```

Install the matching CUDA wheels (the example below uses CUDA 12.1; pick the build that matches your driver):
```
pip install torch==2.3.0+cu121 torchvision==0.18.0+cu121 torchaudio==2.3.0+cu121 --index-url https://download.pytorch.org/whl/cu121
```

Verify CUDA is detected before launching VoiceTransor:
```
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```
The command should print True followed by a CUDA version string (for example, True 12.1 with the wheels above).
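
The same check can drive a CPU fallback in code. This is a minimal sketch: pick_device is a hypothetical helper, and how VoiceTransor actually passes the device to Whisper is not shown in this README.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Whisper device:", pick_device())
```

Falling back to "cpu" keeps transcription working on machines without a supported GPU, at the cost of speed.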

Run

Make sure the virtual environment is activated, then start the app:

```
python -m app.main
```

📧 Contact

For support or collaboration: voicetransor@gmail.com

📜 License

MIT License. See LICENSE for details.
