124 changes: 124 additions & 0 deletions README.md
@@ -0,0 +1,124 @@
# Whisper Transcriber

Android floating overlay app for voice-to-text using a self-hosted [whisper.cpp](https://github.com/ggml-org/whisper.cpp) server. Tap the bubble, speak, and the transcription is typed directly into whatever text field you're using.

Works over Tailscale / ZeroTier — just point it at your server's VPN IP.

## How it works

1. A floating bubble sits over all apps (like Messenger chat heads)
2. Tap to start recording, tap again to stop
3. Audio is sent to your whisper.cpp server via the OpenAI-compatible API (`/v1/audio/transcriptions`)
4. Transcribed text is automatically typed into the focused input field (or copied to clipboard)
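Step 3 can be sketched as follows. This is a minimal illustration, not the app's actual code: the app uses OkHttp, while this sketch swaps in the JDK's `HttpURLConnection` to stay dependency-free. The function names `buildMultipartBody` and `transcribe` are hypothetical, and the `file` and `response_format` form fields are assumed from the OpenAI transcription API convention.

```kotlin
import java.io.ByteArrayOutputStream
import java.net.HttpURLConnection
import java.net.URI

// Hypothetical helper: assembles a multipart/form-data body by hand.
// Text fields come first, then the audio as a part named "file" (assumed field name).
fun buildMultipartBody(
    boundary: String,
    fileName: String,
    wavBytes: ByteArray,
    fields: Map<String, String> = emptyMap(),
): ByteArray {
    val out = ByteArrayOutputStream()
    fun write(s: String) = out.write(s.toByteArray(Charsets.UTF_8))

    for ((name, value) in fields) {
        write("--$boundary\r\n")
        write("Content-Disposition: form-data; name=\"$name\"\r\n\r\n")
        write("$value\r\n")
    }
    write("--$boundary\r\n")
    write("Content-Disposition: form-data; name=\"file\"; filename=\"$fileName\"\r\n")
    write("Content-Type: audio/wav\r\n\r\n")
    out.write(wavBytes)
    // Closing delimiter ends the multipart body.
    write("\r\n--$boundary--\r\n")
    return out.toByteArray()
}

// Hypothetical helper: POSTs the WAV bytes and returns the raw response body.
fun transcribe(serverUrl: String, wavBytes: ByteArray): String {
    val boundary = "----whisper" + System.nanoTime()
    val body = buildMultipartBody(
        boundary, "recording.wav", wavBytes, mapOf("response_format" to "json"),
    )
    val url = URI(serverUrl.trimEnd('/') + "/v1/audio/transcriptions").toURL()
    val conn = url.openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=$boundary")
    conn.outputStream.use { it.write(body) }
    return conn.inputStream.bufferedReader().use { it.readText() }
}
```

Calling `transcribe("http://10.147.20.13:8080", wavBytes)` would send the recording to the example server URL used in the setup steps; parsing the response is left out because the exact response shape depends on the server configuration.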

## Setup

### Server

Run [whisper.cpp server](https://github.com/ggml-org/whisper.cpp) on your machine:

```bash
./whisper-server -m models/ggml-base.en.bin --port 8080
```

### App

1. Install the APK (grab from [Actions artifacts](../../actions) or build yourself)
2. Open the app, go to **Settings**, enter your server URL (e.g. `http://10.147.20.13:8080`)
3. Grant permissions when prompted:
   - **Microphone**: for recording audio
   - **Display over other apps**: for the floating bubble
   - **Notifications**: for the foreground service
4. Enable the **Whisper Transcriber** accessibility service in Android Settings → Accessibility (needed to type into other apps' text fields)
5. Tap **Start Overlay** — the floating bubble appears

### Permissions

| Permission | Why |
|---|---|
| `RECORD_AUDIO` | Capture voice from microphone |
| `SYSTEM_ALERT_WINDOW` | Floating bubble overlay |
| `FOREGROUND_SERVICE` | Keep the overlay alive |
| `INTERNET` | Send audio to whisper server |
| `POST_NOTIFICATIONS` | Foreground service notification (Android 13+) |
| Accessibility Service | Type transcription into focused text fields |

## Building

### With Nix (CI uses this)

```bash
nix develop --command ./gradlew assembleDebug
```

The `flake.nix` provides JDK 17 + Android SDK (platform 34, build-tools 34.0.0).

### Without Nix

Requires JDK 17 and Android SDK with platform 34:

```bash
export ANDROID_HOME=/path/to/android/sdk
./gradlew assembleDebug
```

APKs end up in `app/build/outputs/apk/`.

## CI

GitHub Actions builds debug and release APKs on every push using a self-hosted NixOS runner. Artifacts are retained for 7 days; older ones are cleaned up automatically.

## Project structure

```
app/src/main/java/com/whispertranscriber/
├── MainActivity.kt                          # Home screen, nav, permissions
├── audio/
│   └── AudioRecorder.kt                     # Mic recording → WAV conversion
├── data/
│   ├── SettingsStore.kt                     # DataStore-backed preferences
│   └── TranscriptionLog.kt                  # Transcription history (last 100)
├── network/
│   └── WhisperApiClient.kt                  # OkHttp multipart POST to whisper server
├── service/
│   ├── FloatingOverlayService.kt            # Bubble UI + record/transcribe flow
│   └── TranscriberAccessibilityService.kt   # Types text into focused fields
└── ui/
    ├── LogScreen.kt                         # Transcription history viewer
    ├── SettingsScreen.kt                    # Server URL + audio quality config
    └── theme/Theme.kt                       # Material 3 theme
```
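The "Mic recording → WAV conversion" step in `AudioRecorder.kt` amounts to prefixing raw PCM samples with a 44-byte RIFF/WAVE header. The sketch below shows one way to do that; it is not the project's implementation. The function name `pcmToWav` is hypothetical, and the defaults (16 kHz, mono, 16-bit) are an assumption, since the README says audio quality is configurable.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: wraps raw little-endian PCM samples in a canonical 44-byte WAV header.
// Defaults (16 kHz, mono, 16-bit) are assumptions, not values taken from the app.
fun pcmToWav(
    pcm: ByteArray,
    sampleRate: Int = 16_000,
    channels: Int = 1,
    bitsPerSample: Int = 16,
): ByteArray {
    val blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    val byteRate = sampleRate * blockAlign          // bytes per second of audio

    val header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN)
    header.put("RIFF".toByteArray(Charsets.US_ASCII))
    header.putInt(36 + pcm.size)                    // file size minus the first 8 bytes
    header.put("WAVE".toByteArray(Charsets.US_ASCII))
    header.put("fmt ".toByteArray(Charsets.US_ASCII))
    header.putInt(16)                               // size of the fmt chunk for PCM
    header.putShort(1.toShort())                    // audio format 1 = uncompressed PCM
    header.putShort(channels.toShort())
    header.putInt(sampleRate)
    header.putInt(byteRate)
    header.putShort(blockAlign.toShort())
    header.putShort(bitsPerSample.toShort())
    header.put("data".toByteArray(Charsets.US_ASCII))
    header.putInt(pcm.size)                         // number of audio bytes that follow

    return header.array() + pcm
}
```

The resulting byte array can be uploaded as `recording.wav` without touching the filesystem.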

### whisper-client (Rust crate)

`whisper-client/` contains an async Rust library for calling a Whisper API with either API key auth or [Cashu](https://cashu.space) ecash payment (using [cdk](https://github.com/cashubtc/cdk) 0.8). This is a standalone library, not used by the Android app.

```rust
let client = WhisperClient::new("https://whisper.example.com".into());

// With API key
let result = client.transcribe_with_key(
    "sk-...", audio_bytes, "recording.wav", TranscribeOptions::default()
).await?;

// With Cashu payment (10 sats/minute)
let result = client.transcribe_with_cashu(
    &wallet, 10, audio_bytes, "recording.wav", TranscribeOptions::default()
).await?;

println!("{}", result.text);
```

## Network notes

- **HTTP** works out of the box to any IP (cleartext traffic is allowed via network security config)
- **HTTPS with self-signed certs** works because the client trusts all certificates. This disables certificate validation entirely, so use it only on a private VPN (this is a private VPN tool, not a public app)
- Works over **Tailscale**, **ZeroTier**, or any VPN — just use the VPN IP as the server URL
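For readers unfamiliar with what "trusts all certificates" means in practice, here is a rough sketch of how a trust-all TLS setup typically looks on the JVM. It is an assumption about the general technique, not a copy of the app's client code; `TrustAllManager` and `trustAllSslContext` are hypothetical names.

```kotlin
import java.security.SecureRandom
import java.security.cert.X509Certificate
import javax.net.ssl.SSLContext
import javax.net.ssl.TrustManager
import javax.net.ssl.X509TrustManager

// WARNING: accepts every certificate chain without checking it.
// Only acceptable on a network you fully control, such as a private VPN.
class TrustAllManager : X509TrustManager {
    override fun checkClientTrusted(chain: Array<out X509Certificate>?, authType: String?) {
        // Intentionally empty: no validation is performed.
    }

    override fun checkServerTrusted(chain: Array<out X509Certificate>?, authType: String?) {
        // Intentionally empty: self-signed and invalid certificates pass.
    }

    override fun getAcceptedIssuers(): Array<X509Certificate> = emptyArray()
}

// Hypothetical helper: builds an SSLContext backed by the trust-all manager.
fun trustAllSslContext(): SSLContext {
    val context = SSLContext.getInstance("TLS")
    context.init(null, arrayOf<TrustManager>(TrustAllManager()), SecureRandom())
    return context
}
```

A context like this removes protection against man-in-the-middle attacks, which is why the setup only makes sense when the VPN itself already authenticates and encrypts the link.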

## Tech stack

- Kotlin + Jetpack Compose + Material 3
- OkHttp for network
- DataStore for preferences
- Target SDK 34, min SDK 26
- Gradle 8.5, AGP 8.2.2