Commit 0875eb6

1 parent 433a4c4 commit 0875eb6

1 file changed

Lines changed: 124 additions & 0 deletions

README.md

@@ -0,0 +1,124 @@
# Whisper Transcriber

Android floating overlay app for voice-to-text using a self-hosted [whisper.cpp](https://github.com/ggml-org/whisper.cpp) server. Tap the bubble, speak, and the transcription is typed directly into whatever text field you're using.

Works over Tailscale / ZeroTier: just point it at your server's VPN IP.

## How it works

1. A floating bubble sits over all apps (like Messenger chat heads)
2. Tap to start recording, tap again to stop
3. Audio is sent to your whisper.cpp server via the OpenAI-compatible API (`/v1/audio/transcriptions`)
4. Transcribed text is automatically typed into the focused input field (or copied to clipboard)
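
Step 3 amounts to a multipart/form-data POST. The sketch below, in Rust with only the standard library, shows roughly what such a request body looks like on the wire. It is not the app's code (the app uses OkHttp from Kotlin); the function names, the `file` field name, and the `audio/wav` content type are assumptions modeled on the OpenAI transcription API.

```rust
/// Builds a multipart/form-data body containing a single audio part.
/// ASSUMPTION: the field name `file` and content type `audio/wav` follow the
/// OpenAI transcription API; they are not taken from this project's source.
pub fn build_multipart_body(boundary: &str, filename: &str, wav: &[u8]) -> Vec<u8> {
    let mut body = Vec::with_capacity(wav.len() + 256);
    // Part header: boundary line, disposition, content type, then a blank line.
    body.extend_from_slice(format!("--{}\r\n", boundary).as_bytes());
    body.extend_from_slice(
        format!(
            "Content-Disposition: form-data; name=\"file\"; filename=\"{}\"\r\n",
            filename
        )
        .as_bytes(),
    );
    body.extend_from_slice(b"Content-Type: audio/wav\r\n\r\n");
    // Raw audio bytes, followed by the closing boundary.
    body.extend_from_slice(wav);
    body.extend_from_slice(format!("\r\n--{}--\r\n", boundary).as_bytes());
    body
}

/// Value for the request's Content-Type header; it must carry the same boundary.
pub fn content_type_header(boundary: &str) -> String {
    format!("multipart/form-data; boundary={}", boundary)
}
```

The boundary string is arbitrary but must not occur inside the audio data; HTTP clients such as OkHttp normally generate one automatically.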

## Setup

### Server

Run the [whisper.cpp server](https://github.com/ggml-org/whisper.cpp) on your machine:

```bash
./whisper-server -m models/ggml-base.en.bin --port 8080
```
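
Before configuring the app, it may help to confirm that the server's port accepts TCP connections from another machine on the VPN. The following is a minimal sketch using only the Rust standard library. `port_is_reachable` is a hypothetical helper, and a successful connection only shows that the port is open, not that the transcription endpoint responds correctly.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `host_port` (e.g. "10.147.20.13:8080")
/// succeeds within `timeout`. Hypothetical helper for illustration only.
pub fn port_is_reachable(host_port: &str, timeout: Duration) -> bool {
    match host_port.to_socket_addrs() {
        // A host name may resolve to several addresses; any success counts.
        Ok(mut addrs) => addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok()),
        // Unparseable input or failed resolution is treated as unreachable.
        Err(_) => false,
    }
}
```

If the check fails from the phone's side of the VPN but succeeds on the server itself, the server may be listening only on the loopback interface or a firewall may be blocking the port.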
23+
24+
### App
25+
26+
1. Install the APK (grab from [Actions artifacts](../../actions) or build yourself)
27+
2. Open the app, go to **Settings**, enter your server URL (e.g. `http://10.147.20.13:8080`)
28+
3. Grant permissions when prompted:
29+
- **Microphone** — for recording audio
30+
- **Display over other apps** — for the floating bubble
31+
- **Notifications** — for the foreground service
32+
4. Enable the **Whisper Transcriber** accessibility service in Android Settings → Accessibility (needed to type into other apps' text fields)
33+
5. Tap **Start Overlay** — the floating bubble appears

### Permissions

| Permission | Why |
|---|---|
| `RECORD_AUDIO` | Capture voice from microphone |
| `SYSTEM_ALERT_WINDOW` | Floating bubble overlay |
| `FOREGROUND_SERVICE` | Keep the overlay alive |
| `INTERNET` | Send audio to whisper server |
| `POST_NOTIFICATIONS` | Foreground service notification (Android 13+) |
| Accessibility Service | Type transcription into focused text fields |

## Building

### With Nix (CI uses this)

```bash
nix develop --command ./gradlew assembleDebug
```

The `flake.nix` provides JDK 17 + Android SDK (platform 34, build-tools 34.0.0).

### Without Nix

Requires JDK 17 and Android SDK with platform 34:

```bash
export ANDROID_HOME=/path/to/android/sdk
./gradlew assembleDebug
```

APKs end up in `app/build/outputs/apk/`.

## CI

GitHub Actions builds debug + release APKs on every push using a self-hosted NixOS runner. Artifacts are retained for 7 days; older ones are cleaned up automatically.

## Project structure

```
app/src/main/java/com/whispertranscriber/
├── MainActivity.kt                         # Home screen, nav, permissions
├── audio/
│   └── AudioRecorder.kt                    # Mic recording → WAV conversion
├── data/
│   ├── SettingsStore.kt                    # DataStore-backed preferences
│   └── TranscriptionLog.kt                 # Transcription history (last 100)
├── network/
│   └── WhisperApiClient.kt                 # OkHttp multipart POST to whisper server
├── service/
│   ├── FloatingOverlayService.kt           # Bubble UI + record/transcribe flow
│   └── TranscriberAccessibilityService.kt  # Types text into focused fields
└── ui/
    ├── LogScreen.kt                        # Transcription history viewer
    ├── SettingsScreen.kt                   # Server URL + audio quality config
    └── theme/Theme.kt                      # Material 3 theme
```
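
`AudioRecorder.kt` is described above as converting mic input to WAV. As a rough illustration of what that involves, this Rust sketch wraps raw 16-bit PCM in the standard 44-byte RIFF/WAVE header. It is not a port of the Kotlin source: the function name is hypothetical, and the 16 kHz mono parameters in the usage note are assumptions.

```rust
/// Wraps raw 16-bit little-endian PCM bytes in a 44-byte WAV (RIFF) header.
/// Hypothetical helper; the app's actual recorder is written in Kotlin.
pub fn pcm16_to_wav(pcm: &[u8], sample_rate: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align = channels * bits_per_sample / 8; // bytes per sample frame
    let byte_rate = sample_rate * u32::from(block_align); // bytes per second
    let data_len = pcm.len() as u32;

    let mut wav = Vec::with_capacity(44 + pcm.len());
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    wav.extend_from_slice(&channels.to_le_bytes());
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&byte_rate.to_le_bytes());
    wav.extend_from_slice(&block_align.to_le_bytes());
    wav.extend_from_slice(&bits_per_sample.to_le_bytes());
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    wav.extend_from_slice(pcm);
    wav
}
```

For example, `pcm16_to_wav(&samples, 16_000, 1)` would produce a mono 16 kHz file, assuming `samples` holds the recorded PCM bytes.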

### whisper-client (Rust crate)

`whisper-client/` contains an async Rust library for calling a Whisper API with either API key auth or [Cashu](https://cashu.space) ecash payment (using [cdk](https://github.com/cashubtc/cdk) 0.8). This is a standalone library, not used by the Android app.

```rust
let client = WhisperClient::new("https://whisper.example.com".into());

// With API key
let result = client.transcribe_with_key(
    "sk-...", audio_bytes, "recording.wav", TranscribeOptions::default()
).await?;

// With Cashu payment (10 sats/minute)
let result = client.transcribe_with_cashu(
    &wallet, 10, audio_bytes, "recording.wav", TranscribeOptions::default()
).await?;

println!("{}", result.text);
```

## Network notes

- **HTTP** works out of the box to any IP (cleartext traffic is allowed via network security config)
- **HTTPS with self-signed certs** works because the client trusts all certificates (this is a private VPN tool, not a public app)
- Works over **Tailscale**, **ZeroTier**, or any VPN: just use the VPN IP as the server URL
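
Whichever address is used, the request URL is the configured server URL plus the `/v1/audio/transcriptions` path. A small Rust sketch of that join, tolerant of a trailing slash or stray whitespace in the configured value, is shown here. `transcription_endpoint` is a hypothetical helper, and whether the app normalizes the URL this way is an assumption.

```rust
/// Joins a user-entered server URL with the transcription API path.
/// Hypothetical helper: trims surrounding whitespace and trailing slashes
/// so that "http://host:8080" and "http://host:8080/" give the same result.
pub fn transcription_endpoint(server_url: &str) -> String {
    let base = server_url.trim().trim_end_matches('/');
    format!("{}/v1/audio/transcriptions", base)
}
```

With the example address from the setup steps, `transcription_endpoint("http://10.147.20.13:8080/")` yields `http://10.147.20.13:8080/v1/audio/transcriptions`.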

## Tech stack

- Kotlin + Jetpack Compose + Material 3
- OkHttp for network
- DataStore for preferences
- Target SDK 34, min SDK 26
- Gradle 8.5, AGP 8.2.2
