Whisper is a local speech recognition service that converts audio to text for VoiceMode using OpenAI's Whisper model. It provides offline STT capabilities with various model sizes to balance speed and accuracy.
# Install whisper service with default base model (includes Core ML on Apple Silicon!)
voicemode whisper service install
# Install with a different model
voicemode whisper service install --model large-v3
# List available models and their status
voicemode whisper model --all
# Switch to a different model (auto-installs if needed)
voicemode whisper model large-v2
# Start the service
voicemode whisper service start
Apple Silicon Bonus: On M1/M2/M3/M4 Macs, VoiceMode automatically downloads pre-built Core ML models for 2-3x faster performance. No Xcode or Python dependencies required!
Default endpoint: http://127.0.0.1:2022/v1
VoiceMode includes an installation tool that sets up Whisper.cpp automatically:
# Install with default base model (142MB) - good balance of speed and accuracy
voicemode whisper service install
# Install with a specific model
voicemode whisper service install --model small
This will:
- Clone and build Whisper.cpp with GPU support (if available)
- Download the specified model (default: base)
- On Apple Silicon: Automatically download pre-built Core ML models for 2-3x faster performance
- Create a start script with environment variable support
- Set up automatic startup (launchd on macOS, systemd on Linux)
# Install via Homebrew
brew install whisper.cpp
# Download model
mkdir -p ~/.voicemode/models/whisper
cd ~/.voicemode/models/whisper
curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v2.bin
# Clone and build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make
# Download model
mkdir -p ~/.voicemode/models/whisper
cd ~/.voicemode/models/whisper
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v2.bin
macOS:
- Xcode Command Line Tools (xcode-select --install), needed only for building whisper.cpp
- Homebrew (https://brew.sh)
- cmake (brew install cmake)
Note for Apple Silicon users: Core ML models are pre-built and downloaded automatically. No Xcode, PyTorch, or coremltools required!
Linux:
- Build essentials (sudo apt install build-essential on Ubuntu/Debian)
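Before building from source, it can help to confirm that the required tools are on your PATH. The following is a minimal sketch and not part of VoiceMode: the function name `missing_tools` is hypothetical, and the tool list in the example call (git, make, cmake, and a C compiler available as `cc`) is an assumption based on the manual build steps in this document.

```shell
# Hypothetical helper (not part of VoiceMode): print every command from the
# argument list that cannot be found on PATH, one name per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call; the tool list is an assumption for a manual whisper.cpp build.
# missing_tools git make cmake cc
```

Empty output means every listed tool was found. Any name that is printed needs to be installed first.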
On Apple Silicon Macs (M1/M2/M3/M4), VoiceMode automatically downloads pre-built Core ML models from Hugging Face for 2-3x faster transcription:
- Automatic: Core ML models download alongside regular models
- No Dependencies: No PyTorch, Xcode, or coremltools needed
- Pre-built: Models are pre-compiled and ready to use
- Performance: 2-3x faster than Metal acceleration alone
Core ML models are included automatically when available. The installation process handles this transparently.
| Model | Size | RAM Usage | Accuracy | Speed | Language Support |
|---|---|---|---|---|---|
| tiny | 39 MB | ~390 MB | Low | Fastest | Multilingual |
| tiny.en | 39 MB | ~390 MB | Low | Fastest | English only |
| base | 142 MB | ~500 MB | Fair | Fast | Multilingual |
| base.en | 142 MB | ~500 MB | Fair | Fast | English only |
| small | 466 MB | ~1 GB | Good | Moderate | Multilingual |
| small.en | 466 MB | ~1 GB | Good | Moderate | English only |
| medium | 1.5 GB | ~2.6 GB | Very Good | Slow | Multilingual |
| medium.en | 1.5 GB | ~2.6 GB | Very Good | Slow | English only |
| large-v1 | 2.9 GB | ~3.9 GB | Excellent | Slower | Multilingual |
| large-v2 | 2.9 GB | ~3.9 GB | Excellent | Slower | Multilingual (recommended) |
| large-v3 | 3.1 GB | ~3.9 GB | Best | Slowest | Multilingual |
| large-v3-turbo | 1.6 GB | ~2.5 GB | Very Good | Moderate | Multilingual |
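One way to read the table is to pick the largest model whose approximate RAM usage fits the memory you can spare. The sketch below encodes that heuristic using the RAM figures from the table. The function name `suggest_model` is hypothetical, the ranking is only one possible choice, and the `.en` variants, large-v1, large-v3, and large-v3-turbo are left out to keep it short.

```shell
# Hypothetical helper (not part of VoiceMode): suggest a multilingual model
# for a given RAM budget in MB, using the approximate figures from the table.
suggest_model() {
  local ram_mb="$1"
  if   [ "$ram_mb" -ge 3900 ]; then echo "large-v2"
  elif [ "$ram_mb" -ge 2600 ]; then echo "medium"
  elif [ "$ram_mb" -ge 1000 ]; then echo "small"
  elif [ "$ram_mb" -ge 500 ];  then echo "base"
  elif [ "$ram_mb" -ge 390 ];  then echo "tiny"
  else
    echo "less than 390 MB available; no model in the table fits" >&2
    return 1
  fi
}

# Example: a 2000 MB budget falls between small (~1 GB) and medium (~2.6 GB).
# suggest_model 2000
```

The result can then be passed to `voicemode whisper model`. Treat it as a starting point, since accuracy and speed needs matter as much as memory.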
# List all models with installation status
voicemode whisper model --all
# Show current active model
voicemode whisper model
# Switch to a model (auto-installs if not present)
voicemode whisper model small.en
# Switch model without auto-installing (fails if model not installed)
voicemode whisper model medium --no-install
# Switch model without restarting service
voicemode whisper model large-v2 --no-restart
Note: After changing the active model with --no-restart, restart the whisper service manually for the change to take effect.
Configure in ~/.voicemode/voicemode.env:
VOICEMODE_WHISPER_MODEL=large-v2
VOICEMODE_WHISPER_PORT=2022
VOICEMODE_WHISPER_THREADS= # Auto-detected if not set
VOICEMODE_WHISPER_LANGUAGE=auto
VOICEMODE_WHISPER_MODEL_PATH=~/.voicemode/models/whisper
Thread Configuration: By default, VoiceMode auto-detects the number of CPU cores and configures threads accordingly. You can override this by setting VOICEMODE_WHISPER_THREADS to a specific number.
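The override behaviour described above can be illustrated with a short sketch. This is not VoiceMode's actual detection code, which is not shown in this document. The function name `detect_threads` is hypothetical, and using `getconf _NPROCESSORS_ONLN` to count online CPU cores is an assumption about one reasonable way to do it on Linux and macOS.

```shell
# Hypothetical sketch: VoiceMode's real detection logic may differ.
# Prefer an explicit VOICEMODE_WHISPER_THREADS value; otherwise fall back to
# the number of online CPU cores, or to 1 if that cannot be determined.
detect_threads() {
  if [ -n "${VOICEMODE_WHISPER_THREADS:-}" ]; then
    printf '%s\n' "$VOICEMODE_WHISPER_THREADS"
  else
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}

# Example: compare the detected value with what you plan to configure.
# detect_threads
```

Running it with and without the variable set shows which value would win under this scheme.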
whisper-server \
--model models/ggml-large-v2.bin \
--host 127.0.0.1 \
--port 2022 \
--inference-path "/v1/audio/transcriptions" \
--threads 4 \
--processors 1 \
--convert \
--print-progress
Key options:
- --model: Path to model file
- --host: Server host (default: 127.0.0.1)
- --port: Server port (VoiceMode expects 2022)
- --inference-path: OpenAI-compatible endpoint path
- --threads: Number of threads for processing (auto-detected by VoiceMode)
- --processors: Number of parallel processors
- --convert: Convert audio to required format automatically (required for VoiceMode)
- --print-progress: Show transcription progress
Note: When using VoiceMode's managed service, threads are auto-detected based on your CPU cores. The --convert flag is required for VoiceMode to work correctly with various audio formats.
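With the server running on the inference path shown above, a quick manual request can confirm that it responds. The sketch below makes several assumptions: the helper names `transcription_url` and `transcribe_file` are hypothetical, the form fields `file` and `response_format` follow the whisper.cpp server example and may differ between versions, and the audio file name in the example call is a placeholder.

```shell
# Hypothetical helper: build the transcription URL from a base URL.
# With no argument (or an empty one) it uses the default local endpoint.
transcription_url() {
  local base="${1:-http://127.0.0.1:2022/v1}"
  printf '%s/audio/transcriptions\n' "${base%/}"
}

# Hypothetical helper: send an audio file as multipart form data.
# Requires a running server; the field names are assumptions (see above).
transcribe_file() {
  curl --silent --fail "$(transcription_url "${STT_BASE_URL:-}")" \
    -F "file=@$1" \
    -F "response_format=json"
}

# Example (sample.wav is a placeholder file name):
# transcribe_file sample.wav
```

If `STT_BASE_URL` is set, the request goes to that endpoint instead of the local default.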
# Start/stop service
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper.plist
# Enable/disable at startup
launchctl load -w ~/Library/LaunchAgents/com.voicemode.whisper.plist
launchctl unload -w ~/Library/LaunchAgents/com.voicemode.whisper.plist
# Check status
launchctl list | grep whisper
# Start/stop service
systemctl --user start whisper
systemctl --user stop whisper
# Enable/disable at startup
systemctl --user enable whisper
systemctl --user disable whisper
# Check status and logs
systemctl --user status whisper
journalctl --user -u whisper -f
Core ML provides 2-3x faster transcription than Metal acceleration alone on Apple Silicon Macs:
# Performance comparison
# CPU Only: ~1x baseline
# Metal: ~3-4x faster
# CoreML + Metal: ~8-12x faster
Core ML models are downloaded automatically when installing Whisper on Apple Silicon. No additional configuration needed.
The installation tool automatically detects and enables:
- Mac (Apple Silicon): Metal acceleration
- NVIDIA GPU: CUDA acceleration
- CPU: Optimized CPU builds
VoiceMode automatically detects Whisper when available:
- First: Checks for Whisper.cpp on http://127.0.0.1:2022/v1
- Fallback: Uses OpenAI API (requires OPENAI_API_KEY)
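The detection order above can be sketched as a small shell function. This is an illustration of the described behaviour, not VoiceMode's implementation. The names `local_whisper_up` and `choose_stt_base_url` are hypothetical, and treating any HTTP response within two seconds as "server is up" is an assumption.

```shell
LOCAL_WHISPER_URL="http://127.0.0.1:2022/v1"
OPENAI_API_URL="https://api.openai.com/v1"

# Hypothetical probe: succeeds if anything answers HTTP on the local endpoint.
# curl exits non-zero when the connection is refused or times out.
local_whisper_up() {
  curl --silent --output /dev/null --max-time 2 "$LOCAL_WHISPER_URL"
}

# Hypothetical sketch of the order described above: local Whisper first,
# then the OpenAI API if a key is available, otherwise report failure.
choose_stt_base_url() {
  if local_whisper_up; then
    printf '%s\n' "$LOCAL_WHISPER_URL"
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    printf '%s\n' "$OPENAI_API_URL"
  else
    echo "no local Whisper server and OPENAI_API_KEY is not set" >&2
    return 1
  fi
}

# Example:
# choose_stt_base_url
```

Running it shows which endpoint would be chosen under this scheme on the current machine.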
To use a different endpoint or force Whisper use:
export STT_BASE_URL=http://127.0.0.1:2022/v1
Or in MCP configuration:
"voicemode": {
...
"env": {
"STT_BASE_URL": "http://127.0.0.1:2022/v1"
}
}
For completely offline voice processing, combine Whisper with Kokoro:
export STT_BASE_URL=http://127.0.0.1:2022/v1 # Whisper for STT
export TTS_BASE_URL=http://127.0.0.1:8880/v1 # Kokoro for TTS
export TTS_VOICE=af_sky                        # Kokoro voice
- Check if port 2022 is already in use: lsof -i :2022
- Verify the model file exists at the configured path
- Check service logs for error messages
- Try a larger model (base → small → medium → large)
- Ensure audio input quality is good
- Set specific language instead of 'auto' if known
- Use a smaller model for better performance
- Consider English-only models (.en) for English content
- Enable GPU acceleration if available
- Verify adequate disk space (models range from 39 MB to 3.1 GB)
- Check network connectivity to Hugging Face
- Delete corrupted model files from ~/.voicemode/models/whisper/ and re-run the model command
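Several of the checks above come down to whether the model file is where the service expects it. The following sketch automates that one check. The function names are hypothetical, and the `ggml-<model>.bin` file naming is an assumption taken from the manual download commands in this document. A non-empty file is not proof of a complete download; the check only catches missing or zero-byte files.

```shell
# Hypothetical helper: expected path of a model file, assuming the
# ggml-<model>.bin naming and the default or configured model directory.
model_file_path() {
  local model="$1"
  local dir="${VOICEMODE_WHISPER_MODEL_PATH:-$HOME/.voicemode/models/whisper}"
  printf '%s/ggml-%s.bin\n' "${dir%/}" "$model"
}

# Hypothetical helper: succeeds only if the model file exists and is non-empty.
model_file_ok() {
  [ -s "$(model_file_path "$1")" ]
}

# Example:
# model_file_ok large-v2 || echo "missing or empty: $(model_file_path large-v2)"
```

If the check fails, deleting the file and re-running the model command, as described above, is the next step.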
# Check service status
voicemode whisper service status
# Monitor real-time processing
tail -f ~/.voicemode/services/whisper/logs/performance.log
# List available models
voicemode whisper model --all
- Models: ~/.voicemode/models/whisper/ or ~/.voicemode/services/whisper/models/
- Service Config: ~/.voicemode/services/whisper/config.json
- Model Preferences: ~/.voicemode/whisper-models.txt
- Logs: ~/.voicemode/services/whisper/logs/
- LaunchAgent (macOS): ~/Library/LaunchAgents/com.voicemode.whisper.plist
- Systemd Service (Linux): ~/.config/systemd/user/whisper.service