add examples/whisper.linux by nnnet · Pull Request #3656 · ggml-org/whisper.cpp

nnnet · 2026-02-08T14:28:48Z

PR: Add `examples/whisper.linux` — Voice typing for Linux desktop

Summary

Add a new example application for voice typing on Linux desktop. The app runs as a system tray icon, records speech from the microphone via pw-record/arecord, transcribes it using whisper-cli, and injects the resulting text at the cursor position via xdotool/wtype/clipboard.

Works on both X11 and Wayland (including GNOME, KDE, Sway)
Two input modes: hotkey (push-to-talk) and listen (wake word activation)
Two output modes: batch (transcribe all at once) and stream (transcribe per speech segment via VAD)
Voice commands: spoken words like "enter", "backspace", "tab" trigger key presses instead of being typed
Hallucination filter: rejects known training data leaks and checks speech rate sanity
Full system tray menu: language, model (with download), GPU, audio device, all settings
204 unit tests (all mocked, no hardware needed)
Install script: builds whisper-cli, downloads model, configures system

Features

Input / Output Modes

Two independent axes give 4 combinations:

| | batch | stream |
|---|---|---|
| hotkey | Record → transcribe all → inject | Each speech segment transcribed and injected live |
| listen | Wake word → accumulate text → inject on stop | Wake word → each segment injected live |
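
To illustrate what "per speech segment" means in stream mode, here is a minimal sketch of an energy-based segmenter. It is not the PR's `SimpleVAD`: the function name, the per-frame energy input, the threshold, and the 30 ms frame size are all assumptions made for illustration. The 150 ms trailing silence is the value mentioned in this PR's commit notes.

```python
def split_segments(frame_energies, threshold, frame_ms=30, trailing_silence_ms=150):
    """Return (start, end) frame ranges of speech; end is exclusive.

    Hypothetical helper: a segment closes once enough consecutive
    frames fall below the energy threshold.
    """
    # Number of consecutive silent frames that closes a segment.
    need = max(1, trailing_silence_ms // frame_ms)
    segments = []
    start = None   # index of the first speech frame of the open segment
    silent = 0     # consecutive silent frames seen inside the open segment
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= need:
                # Drop the trailing silent frames from the segment.
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:
        # Input ended while a segment was still open.
        segments.append((start, len(frame_energies) - silent))
    return segments
```

Under this sketch, stream mode would hand each closed segment to the transcriber as soon as it ends, while batch mode would transcribe the whole recording once after the user stops.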

Voice Commands

| Word (EN) | Word (RU) | Action |
|---|---|---|
| enter | энтер, ввод | Press Enter |
| backspace | бэкспейс, назад | Delete previous word |
| tab | таб, табуляция | Press Tab |
| escape, stop | эскейп, стоп | Press Escape |

Commands are matched fuzzily (similarity threshold 0.75), and the command list can be edited from the tray menu.
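
As a rough illustration of threshold-based fuzzy matching, the sketch below uses `difflib.SequenceMatcher` from the standard library. The PR description does not say which similarity measure `commands.py` uses, so the choice of `difflib`, the function name `match_command`, and the action names are assumptions; only the command words and the 0.75 threshold come from the description.

```python
from difflib import SequenceMatcher

# Command words from the table above; action names are placeholders.
COMMANDS = {
    "enter": "Return", "энтер": "Return", "ввод": "Return",
    "backspace": "BackSpace", "бэкспейс": "BackSpace", "назад": "BackSpace",
    "tab": "Tab", "таб": "Tab", "табуляция": "Tab",
    "escape": "Escape", "stop": "Escape", "эскейп": "Escape", "стоп": "Escape",
}
THRESHOLD = 0.75  # similarity threshold stated in the description


def match_command(word, commands=COMMANDS, threshold=THRESHOLD):
    """Return the action of the most similar command word, or None."""
    # Normalize case and drop punctuation that a transcript may attach.
    word = word.strip().lower().strip(".,!?")
    best_action, best_score = None, 0.0
    for candidate, action in commands.items():
        score = SequenceMatcher(None, word, candidate).ratio()
        if score > best_score:
            best_action, best_score = action, score
    return best_action if best_score >= threshold else None
```

With this measure, a slightly mis-transcribed word such as "entr" still maps to Enter, while an unrelated word such as "hello" falls below the threshold and is typed as ordinary text.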

Hallucination Filter

Two layers:

Pattern matching — rejects known whisper training data leaks ("subtitles by...", "спасибо за просмотр", etc.)
Speech rate check — rejects text that is impossibly long for the audio duration (max 5 words/sec, 25 chars/sec)
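
The two layers could be combined along the following lines. This is a sketch under assumptions rather than the PR's code: the function name and the pattern list are hypothetical (only the two quoted phrases and the 5 words/sec and 25 chars/sec limits come from the description).

```python
import re

# Only these two phrases are quoted in the description; a real list would be longer.
LEAK_PATTERNS = [
    re.compile(r"subtitles by", re.IGNORECASE),
    re.compile(r"спасибо за просмотр", re.IGNORECASE),
]
MAX_WORDS_PER_SEC = 5.0
MAX_CHARS_PER_SEC = 25.0


def is_hallucination(text, audio_seconds):
    """Return True if the transcript should be rejected."""
    text = text.strip()
    if not text:
        return False  # nothing to reject; the caller can skip empty output
    # Layer 1: known training-data leaks.
    if any(p.search(text) for p in LEAK_PATTERNS):
        return True
    # Layer 2: text cannot plausibly fit into the audio duration.
    if audio_seconds <= 0:
        return True
    words_per_sec = len(text.split()) / audio_seconds
    chars_per_sec = len(text) / audio_seconds
    return words_per_sec > MAX_WORDS_PER_SEC or chars_per_sec > MAX_CHARS_PER_SEC
```

For example, six words attributed to one second of audio exceed the 5 words/sec limit and would be dropped.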

Text Injection (Wayland + X11)

Fallback chain: wtype → ydotool (with evdev key name translation) → xdotool (via XWayland) → clipboard paste.
Non-ASCII text (e.g. Cyrillic) always uses clipboard paste to avoid encoding issues.
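
The selection logic can be sketched as below. This is an assumption-laden illustration, not the PR's `TextInjector`: `pick_backend` is a hypothetical name, and it only checks which tools are installed, whereas the real chain presumably also falls through when a tool fails at runtime.

```python
import shutil

# Order taken from the fallback chain described above.
BACKENDS = ["wtype", "ydotool", "xdotool"]


def pick_backend(text, which=shutil.which):
    """Return the name of the injection backend to try first.

    `which` is injectable so the function can be tested without
    any of the tools being installed.
    """
    # Non-ASCII text always goes through the clipboard.
    if not text.isascii():
        return "clipboard"
    for tool in BACKENDS:
        if which(tool):
            return tool
    # None of the typing tools is available.
    return "clipboard"
```

Passing `which` as a parameter keeps the function testable with a plain lambda, in the spirit of the PR's fully mocked test suite.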

System Tray

Full settings menu: language, model selection (with one-click download from Hugging Face), GPU device, audio device, input/output mode, wake word, silence timeout, voice commands editor.

Files

examples/whisper.linux/
  whisper-linux               # Launcher script
  install.sh                  # One-command install (deps, build, model, config)
  whisper-linux.desktop       # Desktop entry for autostart
  README.md                   # Documentation
  .gitignore                  # Exclude __pycache__, .pytest_cache
  app/                        # Python package
    __init__.py               # Re-exports public API
    __main__.py               # Entry point
    config.py                 # Config, AppState, constants, helpers
    audio.py                  # AudioRecorder, AudioStream, SimpleVAD
    transcriber.py            # Transcriber, WakeWordDetector
    injector.py               # TextInjector (xdotool/wtype/clipboard)
    commands.py               # VoiceCommands (fuzzy matching)
    tray.py                   # TrayIcon, system tray menu, settings UI
    app.py                    # WhisperLinuxApp, state machine, CLI
  tests/
    conftest.py               # Fixtures
    test_whisper_linux.py     # 204 unit tests (all mocked)
  run_tests.sh                # Run tests with one command

Dependencies

Python 3.10+, PyQt5
whisper-cli (built from this repo)
xdotool or wtype (text injection)
xclip or wl-copy (clipboard fallback)
arecord or pw-record (audio capture)

No additional Python packages beyond PyQt5.

Test plan

All 204 unit tests pass (python3 -m pytest tests/ -v)
Manual test: hotkey+batch mode (record → stop → text appears)
Manual test: hotkey+stream mode (speak → text appears per segment)
Manual test: listen+stream mode (wake word → dictate → text appears)
Manual test: voice commands (say "enter", "tab", "backspace")
Manual test: model download via tray menu
Manual test: keyboard shortcut setup (GNOME/KDE)
Test on X11
Test on Wayland (GNOME)
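
For readers unfamiliar with fully mocked tests, the following sketch shows the general shape such a test might take. `record_wav` is a hypothetical helper, not a function from this PR; the `arecord` flags used (`-q`, `-d`, `-f`, `-r`, `-c`) are standard ones.

```python
import subprocess
from unittest import mock


def record_wav(path, seconds, run=subprocess.run):
    """Hypothetical helper: record mono 16 kHz audio with arecord."""
    cmd = ["arecord", "-q", "-d", str(seconds), "-f", "S16_LE",
           "-r", "16000", "-c", "1", path]
    return run(cmd, check=True)


def test_record_wav_builds_arecord_command():
    # The fake replaces subprocess.run, so no microphone is touched.
    fake_run = mock.Mock()
    record_wav("/tmp/out.wav", 3, run=fake_run)
    args, kwargs = fake_run.call_args
    assert args[0][0] == "arecord"
    assert args[0][-1] == "/tmp/out.wav"
    assert "16000" in args[0]
    assert kwargs == {"check": True}
```

Because the subprocess call is replaced by a mock, a test like this runs on any machine, including CI hosts without audio hardware.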

- Add global hotkey (Ctrl+Super) via pynput with config support
- Fix display_server auto-detection (wayland → x11)
- Fix text injection: use xdotool type directly (no clipboard pollution), fall back to clipboard with both PRIMARY+CLIPBOARD selections
- Fix tray icon not updating from background threads (pyqtSignal instead of QTimer.singleShot)
- Add blinking tray icon for RECORDING/DICTATING states
- Fix wake word → DICTATING: save active window at wake time, play end signal on stop, start silence timer immediately
- Reduce VAD trailing silence 300ms → 150ms for faster response
- Redesign Settings menu: hotkey always active, wake word listening is an independent ON/OFF toggle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pynput's XRecord backend silently fails to receive keyboard events on this system. Switch to direct Xlib XRecord API which works reliably.

- Replace pynput keyboard.Listener with Xlib XRecord context
- Hotkey in listen mode now acts as manual dictation trigger: LISTENING → DICTATING on press, DICTATING → LISTENING on next press
- Wake word continues to work in parallel with hotkey

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
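
The hotkey behaviour described in this commit amounts to a small toggle table. The sketch below is illustrative only: the LISTENING and DICTATING transitions come from the commit message, while the `IDLE` state name and the IDLE/RECORDING toggle are assumptions about how hotkey mode behaves.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # assumed name for "not recording" in hotkey mode
    RECORDING = auto()
    LISTENING = auto()
    DICTATING = auto()


# Next state on a hotkey press.
HOTKEY_TRANSITIONS = {
    State.IDLE: State.RECORDING,       # assumption
    State.RECORDING: State.IDLE,       # assumption
    State.LISTENING: State.DICTATING,  # from the commit message
    State.DICTATING: State.LISTENING,  # from the commit message
}


def on_hotkey(state):
    """Return the state the app moves to when the hotkey is pressed."""
    return HOTKEY_TRANSITIONS[state]
```

Keeping the transitions in one table makes it easy to see that two presses of the hotkey always return the app to the state it started in.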

nnnet and others added 3 commits February 8, 2026 17:25

add examples/whisper.linux

d92c33b

nnnet wants to merge 3 commits into ggml-org:master from nnnet:whisper-linux
