| 1 | +# Whisper Transcriber |
| 2 | + |
| 3 | +Android floating overlay app for voice-to-text using a self-hosted [whisper.cpp](https://github.com/ggml-org/whisper.cpp) server. Tap the bubble, speak, and the transcription is typed directly into whatever text field you're using. |
| 4 | + |
| 5 | +Works over Tailscale / ZeroTier — just point it at your server's VPN IP. |
| 6 | + |
| 7 | +## How it works |
| 8 | + |
| 9 | +1. A floating bubble sits over all apps (like Messenger chat heads) |
| 10 | +2. Tap to start recording, tap again to stop |
| 11 | +3. Audio is sent to your whisper.cpp server via the OpenAI-compatible API (`/v1/audio/transcriptions`) |
| 12 | +4. Transcribed text is automatically typed into the focused input field (or copied to clipboard) |
| 13 | + |
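Step 3 can be tried by hand before installing the app. The sketch below is a minimal example, not the app's actual request: the `endpoint_url` and `transcribe` helpers are hypothetical names, the `file` and `response_format` form fields follow the OpenAI transcription API convention and are assumed to be accepted by your server build, and `recording.wav` stands in for any local audio file.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a manual smoke test; not part of this repository.

# Build the transcription endpoint from a base URL, dropping one trailing slash.
endpoint_url() {
  printf '%s/v1/audio/transcriptions\n' "${1%/}"
}

# POST an audio file as multipart/form-data and print the server's response.
# Assumption: the server accepts OpenAI-style "file" and "response_format" fields.
transcribe() {
  local base_url="$1" audio_file="$2"
  curl -sS -X POST "$(endpoint_url "$base_url")" \
    -F "file=@${audio_file}" \
    -F "response_format=json"
}

# Usage (assumes a server on the VPN IP from the Setup section):
#   transcribe http://10.147.20.13:8080 recording.wav
```

If the request fails with a 404, the server may expose transcription on a different path, which is worth checking against its own documentation.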
| 14 | +## Setup |
| 15 | + |
| 16 | +### Server |
| 17 | + |
| 18 | +Run [whisper.cpp server](https://github.com/ggml-org/whisper.cpp) on your machine: |
| 19 | + |
| 20 | +```bash |
| 21 | +./whisper-server -m models/ggml-base.en.bin --port 8080 |
| 22 | +``` |
| 23 | + |
| 24 | +### App |
| 25 | + |
| 26 | +1. Install the APK (grab from [Actions artifacts](../../actions) or build yourself) |
| 27 | +2. Open the app, go to **Settings**, enter your server URL (e.g. `http://10.147.20.13:8080`) |
| 28 | +3. Grant permissions when prompted: |
| 29 | + - **Microphone** — for recording audio |
| 30 | + - **Display over other apps** — for the floating bubble |
| 31 | + - **Notifications** — for the foreground service |
| 32 | +4. Enable the **Whisper Transcriber** accessibility service in Android Settings → Accessibility (needed to type into other apps' text fields) |
| 33 | +5. Tap **Start Overlay** — the floating bubble appears |
| 34 | + |
| 35 | +### Permissions |
| 36 | + |
| 37 | +| Permission | Why | |
| 38 | +|---|---| |
| 39 | +| `RECORD_AUDIO` | Capture voice from microphone | |
| 40 | +| `SYSTEM_ALERT_WINDOW` | Floating bubble overlay | |
| 41 | +| `FOREGROUND_SERVICE` | Keep the overlay alive | |
| 42 | +| `INTERNET` | Send audio to whisper server | |
| 43 | +| `POST_NOTIFICATIONS` | Foreground service notification (Android 13+) | |
| 44 | +| Accessibility Service | Type transcription into focused text fields | |
| 45 | + |
| 46 | +## Building |
| 47 | + |
| 48 | +### With Nix (CI uses this) |
| 49 | + |
| 50 | +```bash |
| 51 | +nix develop --command ./gradlew assembleDebug |
| 52 | +``` |
| 53 | + |
| 54 | +The `flake.nix` provides JDK 17 + Android SDK (platform 34, build-tools 34.0.0). |
| 55 | + |
| 56 | +### Without Nix |
| 57 | + |
| 58 | +Requires JDK 17 and Android SDK with platform 34: |
| 59 | + |
| 60 | +```bash |
| 61 | +export ANDROID_HOME=/path/to/android/sdk |
| 62 | +./gradlew assembleDebug |
| 63 | +``` |
| 64 | + |
| 65 | +APKs end up in `app/build/outputs/apk/`. |
| 66 | + |
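To locate and install a freshly built APK, something like the following may help. It is a sketch under assumptions: `list_apks` and `install_debug_apk` are hypothetical helper names, the debug APK is assumed to end in `-debug.apk` (the usual Android Gradle Plugin naming), and `adb` is assumed to be on `PATH` with a device attached.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of this repository.

# List every APK under the Gradle output directory (default path from this README).
list_apks() {
  local out_dir="${1:-app/build/outputs/apk}"
  find "$out_dir" -type f -name '*.apk' | sort
}

# Install the first debug APK found onto a connected device.
# Assumption: debug builds are named *-debug.apk and adb is available.
install_debug_apk() {
  local apk
  apk="$(list_apks "$1" | grep -m1 -- '-debug\.apk$')" || {
    echo "no debug APK found; run ./gradlew assembleDebug first" >&2
    return 1
  }
  adb install -r "$apk"
}

# Usage:
#   list_apks
#   install_debug_apk
```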
| 67 | +## CI |
| 68 | + |
| 69 | +GitHub Actions builds debug and release APKs on every push using a self-hosted NixOS runner. Artifacts are retained for 7 days; older ones are cleaned up automatically. |
| 70 | + |
| 71 | +## Project structure |
| 72 | + |
| 73 | +``` |
| 74 | +app/src/main/java/com/whispertranscriber/ |
| 75 | +├── MainActivity.kt # Home screen, nav, permissions |
| 76 | +├── audio/ |
| 77 | +│ └── AudioRecorder.kt # Mic recording → WAV conversion |
| 78 | +├── data/ |
| 79 | +│ ├── SettingsStore.kt # DataStore-backed preferences |
| 80 | +│ └── TranscriptionLog.kt # Transcription history (last 100) |
| 81 | +├── network/ |
| 82 | +│ └── WhisperApiClient.kt # OkHttp multipart POST to whisper server |
| 83 | +├── service/ |
| 84 | +│ ├── FloatingOverlayService.kt # Bubble UI + record/transcribe flow |
| 85 | +│ └── TranscriberAccessibilityService.kt # Types text into focused fields |
| 86 | +└── ui/ |
| 87 | + ├── LogScreen.kt # Transcription history viewer |
| 88 | + ├── SettingsScreen.kt # Server URL + audio quality config |
| 89 | + └── theme/Theme.kt # Material 3 theme |
| 90 | +``` |
| 91 | + |
| 92 | +### whisper-client (Rust crate) |
| 93 | + |
| 94 | +`whisper-client/` contains an async Rust library for calling a Whisper API with either API key auth or [Cashu](https://cashu.space) ecash payment (using [cdk](https://github.com/cashubtc/cdk) 0.8). This is a standalone library, not used by the Android app. |
| 95 | + |
| 96 | +```rust |
| 97 | +let client = WhisperClient::new("https://whisper.example.com".into()); |
| 98 | + |
| 99 | +// With API key |
| 100 | +let result = client.transcribe_with_key( |
| 101 | + "sk-...", audio_bytes, "recording.wav", TranscribeOptions::default() |
| 102 | +).await?; |
| 103 | + |
| 104 | +// With Cashu payment (10 sats/minute) |
| 105 | +let result = client.transcribe_with_cashu( |
| 106 | + &wallet, 10, audio_bytes, "recording.wav", TranscribeOptions::default() |
| 107 | +).await?; |
| 108 | + |
| 109 | +println!("{}", result.text); |
| 110 | +``` |
| 111 | + |
| 112 | +## Network notes |
| 113 | + |
| 114 | +- **HTTP** works out of the box to any IP (cleartext traffic is allowed via network security config) |
| 115 | +- **HTTPS with self-signed certs** works because the client trusts all certificates. This removes protection against man-in-the-middle attacks, so use it only on a private VPN (this is a private VPN tool, not a public app) |
| 116 | +- Works over **Tailscale**, **ZeroTier**, or any VPN — just use the VPN IP as the server URL |
| 117 | + |
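Before pointing the app at a server, reachability over the VPN can be checked from any machine on the same network. This is a rough sketch: `tls_flags` and `check_server` are hypothetical helpers, and `-k` tells curl to skip certificate verification, loosely mirroring the app's trust-all behaviour for self-signed certificates. Any HTTP status code printed means the server answered; which code the root path returns depends on the server and is not assumed here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of this repository.

# Print the extra curl flag for a server URL: -k (skip certificate
# verification) for https, an empty line for plain http.
tls_flags() {
  case "$1" in
    https://*) printf '%s\n' '-k' ;;
    *) printf '\n' ;;
  esac
}

# Print the HTTP status code returned by the server root, giving up after 5 s.
check_server() {
  local url="${1%/}"
  # The unquoted substitution is deliberate: it expands to nothing for http URLs.
  curl -sS --max-time 5 -o /dev/null -w '%{http_code}\n' $(tls_flags "$url") "$url/"
}

# Usage:
#   check_server http://10.147.20.13:8080
```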
| 118 | +## Tech stack |
| 119 | + |
| 120 | +- Kotlin + Jetpack Compose + Material 3 |
| 121 | +- OkHttp for network |
| 122 | +- DataStore for preferences |
| 123 | +- Target SDK 34, min SDK 26 |
| 124 | +- Gradle 8.5, AGP 8.2.2 |