Welcome to VoiceTransor! This guide will help you get started with the application.
VoiceTransor is a speech-to-text application that:
- Converts audio files to text using AI (Whisper)
- Processes text with AI assistance (summarize, translate, etc.)
- Exports results as TXT or PDF files
- Works offline once the models are downloaded (your data stays on your computer)
VoiceTransor needs FFmpeg to process audio files.
Option 1: Automatic (Recommended)
- Download from: https://www.gyan.dev/ffmpeg/builds/
- Choose "ffmpeg-release-essentials.zip"
- Extract to C:\ffmpeg
- Add to PATH:
  - Open "Environment Variables" (search in Start menu)
  - Under "System variables", find "Path"
  - Click "Edit" → "New"
  - Add: C:\ffmpeg\bin
  - Click "OK" on all windows
- Restart VoiceTransor
Option 2: Package Manager
# Using Chocolatey
choco install ffmpeg
# Using Scoop
scoop install ffmpeg
# Using Homebrew (recommended)
brew install ffmpeg
# Verify installation
ffmpeg -version
# Ubuntu/Debian
sudo apt install ffmpeg
# Fedora
sudo dnf install ffmpeg
# Arch Linux
sudo pacman -S ffmpeg
Ollama enables AI-powered text processing (summarize, translate, etc.) without sending data to the cloud.
Windows:
- Run install_ollama.bat (included with VoiceTransor)
- Or download from: https://ollama.com/download
macOS:
# Download from: https://ollama.com/download
# Or use Homebrew
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
After installing:
- Start the Ollama service:
  ollama serve
- Download a model (first time only):
  # For English
  ollama pull llama3.1:8b
  # For Chinese/English
  ollama pull qwen2.5:7b
- Models are about 4-5GB each and are stored in:
  - Windows: %USERPROFILE%\.ollama\models
  - macOS/Linux: ~/.ollama/models
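To confirm that a model finished downloading, one option is to look for it in the output of `ollama list`. The sketch below assumes the `ollama` CLI is on PATH and that the model name is the first column of that listing. The function name `has_model` is made up for this example and is not part of VoiceTransor or Ollama.

```shell
# has_model NAME: succeed if NAME (e.g. llama3.1:8b) appears in `ollama list`.
# Assumption: the model name is the first column, after one header row.
has_model() {
  ollama list 2>/dev/null | awk 'NR > 1 { print $1 }' | grep -Fxq -- "$1"
}

if has_model "llama3.1:8b"; then
  echo "llama3.1:8b is installed"
else
  echo "llama3.1:8b not found; run: ollama pull llama3.1:8b"
fi
```

If the Ollama service is not running, the listing is empty and the check reports the model as missing, so start the service first.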
- Launch VoiceTransor
- Check FFmpeg (first time)
  - Try importing an audio file
  - If you see "ffprobe not found", install FFmpeg (see above)
- Import Audio
  - Click "Import Audio"
  - Supported formats: WAV, MP3, M4A, FLAC, OGG, AAC
- Transcribe
  - Click "Transcribe to Text"
  - Choose settings:
    - Model: tiny (fast) → base (balanced) → small (accurate)
    - Device:
      - auto - Let the app choose
      - cuda - NVIDIA GPU (fastest)
      - mps - Apple Silicon GPU
      - cpu - Works on any computer (slower)
    - Language: Auto-detect or select a specific language
- Wait for Transcription
  - The first run downloads the model (~140MB - ~960MB, depending on the model chosen)
  - The progress bar shows the ETA
  - Transcription can resume if interrupted
- Save or Process
  - Save as TXT
  - Or use "Run Text Operation" with Ollama
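If a file is in a format outside the supported list, FFmpeg can convert it before import. The sketch below is only an illustration: it assumes formats can be told apart by file extension (the guide does not say how the app detects them), and both the function name `is_supported` and the file `interview.wma` are hypothetical.

```shell
# is_supported FILE: succeed if FILE has one of the extensions listed above.
is_supported() {
  # Lower-case the extension so "TALK.MP3" is treated like "talk.mp3".
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|mp3|m4a|flac|ogg|aac) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical input file; replace with your own.
input="interview.wma"

if is_supported "$input"; then
  echo "$input can be imported directly"
else
  # -ar sets the sample rate and -ac the channel count (standard ffmpeg options).
  echo "Convert first: ffmpeg -i \"$input\" -ar 16000 -ac 1 \"${input%.*}.wav\""
fi
```

The suggested command writes a 16 kHz mono WAV, which is the audio format Whisper models work with internally, so no quality useful to transcription is lost.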
- Audio Quality
  - Clear audio = better transcription
  - Reduce background noise
  - Use a good microphone
- Model Selection
  - tiny: Fast, good for clear speech
  - base: Best balance (recommended)
  - small: Most accurate, slower
- Device Selection
  - A GPU can be roughly 10-50x faster than CPU, depending on hardware and model size
  - The first transcription is slower (model loading)
  - Subsequent ones are faster
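If you are unsure which device your machine supports, a quick check from a terminal can help. This is a rough sketch, not the logic behind the app's auto setting (which this guide does not describe). The function name `suggest_device` is hypothetical. It assumes that a working `nvidia-smi` means a usable NVIDIA GPU, and that an arm64 Mac means Apple Silicon.

```shell
# suggest_device: print cuda, mps, or cpu based on what this machine reports.
suggest_device() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo cuda   # NVIDIA driver tools are installed and respond
  elif [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    echo mps    # macOS on Apple Silicon
  else
    echo cpu    # fallback that works everywhere
  fi
}

echo "Suggested device: $(suggest_device)"
```

Run it in macOS Terminal, a Linux shell, or Git Bash/WSL on Windows. A terminal running under Rosetta may report x86_64 and fall through to cpu.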
- Summarize
  - Select the "Summarize" preset
  - Or write a custom prompt
- Translate
  - Select the "Translate to Chinese" preset
  - Or specify a target language
- Custom Operations
  - Write what you want in the Prompt panel
  - Examples:
    - "Extract action items from this meeting"
    - "Create a blog post from this transcript"
    - "Fix grammar and spelling errors"
- Ctrl/Cmd + +/-: Zoom in/out
- Ctrl/Cmd + 0: Reset zoom
- Go to View → Language
- Available: English, 简体中文
- Changes apply immediately
Cause: FFmpeg is not installed or not in PATH
Solution:
- Install FFmpeg (see "Before You Start" section)
- Make sure to add FFmpeg to PATH
- Restart VoiceTransor
- Test: Open a terminal and run: ffmpeg -version
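The error mentions ffprobe specifically, so it is worth checking both tools, since they ship together but are separate executables. A minimal sketch for macOS, Linux, or Git Bash follows; the function name `check_tools` is made up for this example.

```shell
# check_tools: report whether ffmpeg and ffprobe can be found on PATH.
# Returns non-zero if either one is missing.
check_tools() {
  missing=0
  for tool in ffmpeg ffprobe; do
    if path=$(command -v "$tool"); then
      echo "$tool: $path"
    else
      echo "$tool: NOT FOUND on PATH"
      missing=1
    fi
  done
  return "$missing"
}

check_tools || echo "Install FFmpeg or fix PATH, then restart VoiceTransor."
```

On Windows Command Prompt, `where ffmpeg` and `where ffprobe` perform the same lookup.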
Cause: Ollama service is not started
Solution:
- Open terminal/command prompt
- Run: ollama serve
- Keep the terminal open
- Try text operation again
Alternative: Set Ollama to start automatically:
- Windows: Create scheduled task
- macOS: Add to login items
- Linux: Use systemd service
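To see whether the service is actually reachable before retrying a text operation, you can query Ollama's local HTTP API. The sketch below assumes the default address http://localhost:11434 and that `curl` is installed. The function name `ollama_is_up` is hypothetical.

```shell
# ollama_is_up [BASE_URL]: succeed if an Ollama server answers at BASE_URL.
# 11434 is Ollama's default port; /api/tags returns the installed models.
ollama_is_up() {
  base_url=${1:-http://localhost:11434}
  curl -fsS --max-time 3 "$base_url/api/tags" >/dev/null 2>&1
}

if ollama_is_up; then
  echo "Ollama is running"
else
  echo "Ollama is not reachable; start it with: ollama serve"
fi
```

On Linux systems installed with the official script, the service is usually registered with systemd under the name `ollama`, in which case `sudo systemctl enable --now ollama` starts it at boot. Treat that unit name as an assumption and confirm it with `systemctl status ollama`.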
Possible causes:
- Using CPU instead of GPU
  - Solution: Select the cuda or mps device
- Large audio file
  - Solution: Split into smaller files
- Computer resources busy
  - Solution: Close other applications
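For the large-file case, FFmpeg's segment muxer can cut a recording into fixed-length pieces without re-encoding. This is a sketch under assumptions: the function name `split_audio`, the 10-minute default, and the `_partNNN` naming are choices made for this example, not something VoiceTransor requires.

```shell
# split_audio INPUT [SECONDS]: split INPUT into chunks of SECONDS (default 600).
# Set DRY_RUN=1 to print the ffmpeg command instead of running it.
split_audio() {
  input=$1
  seconds=${2:-600}          # default: 10-minute chunks
  base=${input%.*}           # file name without its extension
  ext=${input##*.}           # extension, reused for the output files
  # -f segment selects the segment muxer; -c copy avoids re-encoding.
  set -- ffmpeg -i "$input" -f segment -segment_time "$seconds" \
         -c copy "${base}_part%03d.${ext}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Preview the command for a hypothetical file before running it for real.
DRY_RUN=1 split_audio meeting.mp3 600
```

Because the audio is copied rather than re-encoded, cuts land on the nearest frame boundary, so chunk lengths are approximate. A sentence may be split across two chunks, which can affect transcription at the boundaries.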
Cause: Network issues or insufficient disk space
Solution:
- Check internet connection
- Ensure 5GB+ free disk space
- Check firewall settings
- Try again later
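The disk space check can be done from a terminal on macOS or Linux. The sketch below reads the free space on the filesystem that holds your home directory, where Ollama stores its models by default. The function name `free_gb` is made up for this example, and the 5 GB threshold comes from the list above.

```shell
# free_gb [PATH]: print whole gigabytes free on the filesystem holding PATH.
free_gb() {
  # -P forces one line per filesystem; -k reports sizes in 1024-byte blocks.
  df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1048576) }'
}

need=5   # from the guide: at least 5 GB free
have=$(free_gb "${HOME:-.}")
if [ "$have" -ge "$need" ]; then
  echo "OK: ${have} GB free"
else
  echo "Low disk space: ${have} GB free, ${need} GB recommended"
fi
```

On Windows, the free space shown for the drive in File Explorer gives the same information.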
Windows:
- Right-click → "Run as Administrator"
- Allow in Windows Defender
macOS:
- Go to System Preferences → Security & Privacy (on macOS 13 and later: System Settings → Privacy & Security)
- Click "Open Anyway"
Possible causes:
- Wrong language selected
  - Solution: Use "Auto" or select the correct language
- Poor audio quality
  - Solution: Use clearer audio
- Wrong model size
  - Solution: Try a larger model (small instead of tiny)
Minimum:
- OS: Windows 10 / macOS 10.15 / Linux
- RAM: 4GB
- Disk: 2GB free space
- Processor: Any modern CPU
Recommended:
- RAM: 8GB+ (16GB for Ollama)
- GPU: NVIDIA GPU with 4GB+ VRAM or Apple Silicon
- Disk: 10GB free space (for models)
If you encounter issues:
- Check this guide - Most common issues are covered
- Check logs - Look for error messages in the app
- GitHub Issues - https://github.com/leonshen/VoiceTransor/issues
- Email Support - voicetransor@gmail.com
- Whisper Models: https://github.com/openai/whisper
- Ollama: https://ollama.com
- FFmpeg: https://ffmpeg.org
Thank you for using VoiceTransor! 🎉