On-device transcription via sherpa-onnx + SAF/MediaStore + Bluetooth #365
Draft
julianjc84 wants to merge 12 commits into FossifyOrg:main
Squash-merge of upstream PR FossifyOrg#317 (28 commits) onto current main.
Adds a new :store Gradle module that abstracts recording I/O behind a unified RecordingStore interface, with two backends:
- MediaStore (default; no folder picker on first run)
- Storage Access Framework (lets the user save into any folder or document provider, including cloud / sync apps)
Switches Recording from path-based to URI-based and removes the old DocumentFile/path extension helpers. Includes the upstream test suite under store/src/androidTest.
Conflict resolved: kept main's commons 6.1.6 over the PR's 6.1.0. Verified ./gradlew :app:assembleDebug succeeds across Core/Foss/Gplay.
Original PR: FossifyOrg#317
Co-Authored-By: Adam Cigánek <adam.ciganek@proton.me>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
It also improves Android 12+ compatibility and reduces build warnings.
Adds a new :transcribe Gradle module wrapping sherpa-onnx (JitPack v1.12.40) with a Whisper-tiny multilingual int8 model downloaded on first use.
A new foreground TranscriptionService streams the recording through MediaCodec + a linear resampler to 16 kHz mono Float32, runs inference per 30 s chunk, and writes a JSON sidecar (.transcript.json) next to the audio via TranscriptStore (SAF + MediaStore parity).
In the player UI a transcript icon next to the title opens TranscriptDialog, which renders idle/busy/ready states and supports tap-segment-to-seek.
Progress is throttled to ≤4 events/sec so the foreground notification isn't re-posted thousands of times per second during model download.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
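The "linear resampler to 16 kHz mono" step can be illustrated with a minimal sketch. This is not the PR's Kotlin code: the function names `downmix_to_mono` and `resample_linear` are hypothetical, the audio is modelled as plain Python float lists, and no low-pass filter is applied before downsampling (an assumption made to keep the example short).

```python
def downmix_to_mono(interleaved, channels):
    """Average interleaved channel samples into one mono stream."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    frames = len(interleaved) // channels
    return [
        sum(interleaved[f * channels:(f + 1) * channels]) / channels
        for f in range(frames)
    ]


def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample mono float samples by linear interpolation.

    Hypothetical helper: no anti-aliasing filter is applied, so this is
    only a sketch of the interpolation step itself.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate      # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        if left >= last:
            # Past the final source sample: hold the last value.
            out.append(samples[last])
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out
```

For example, one second of 48 kHz stereo would first pass through `downmix_to_mono(pcm, 2)` and then `resample_linear(mono, 48_000)`, yielding 16,000 samples.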
…cribe
Adds a Transcription section in Settings with a model picker that lists all catalog entries with install state, size, and per-row download/delete actions, plus a language preference.
Downloads run through the existing foreground service via a new ACTION_DOWNLOAD_MODEL path so they survive backgrounding and reuse the cancellation/notification plumbing.
Transcripts now persist processing wall-clock time (processing_ms in the sidecar JSON), shown in the transcript dialog ready state. Adds a Re-transcribe button that overwrites the existing sidecar.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Wrap the model list in a ScrollView so all entries are reachable (matches how commons' own dialog_radio_group.xml works).
- Restructure each row so the action button sits on its own line, preventing the radio + name from being squeezed by long button labels.
- Pass forceFinished = true from the Completed/Failed/Cancelled subscribers so the row no longer races against the service clearing TranscriptionService.downloadingModelId in its finally block. This fixes the row briefly showing "0%" right after a download completes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Detection used to be a one-shot poll on lifecycle events, so plugging
or unplugging a BT headset while sitting on the recorder screen had no
visible effect until something else triggered a refresh.
- Register an AudioDeviceCallback on attach (unregister on destroy)
and refresh the BT tab on every onAudioDevicesAdded/Removed.
- Keep the mic selector row visible whenever recording is stopped
(instead of vanishing when no BT device is present) and render the
Bluetooth tab as dimmed + non-tappable when unavailable, with a
"Bluetooth · Not connected" label.
- When a BT mic is connected and BT_CONNECT permission is granted,
show the device productName ("Bluetooth · AirPods Pro") so the
user can confirm which headset is in use.
- If BT was selected and the device disconnects, fall back to Device
Mic automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
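The selector behaviour described in this commit can be summarised as a small pure function. The sketch below is only an illustration under stated assumptions: `MicSelectorState` and `resolve_mic_state` are hypothetical names, and the plain "Bluetooth" label used when BT_CONNECT is not granted is a guess, since the commit message does not say what is shown in that case.

```python
from dataclasses import dataclass
from typing import Optional

DEVICE_MIC = "device"
BLUETOOTH = "bluetooth"


@dataclass(frozen=True)
class MicSelectorState:
    selected: str            # which tab is effectively active
    bluetooth_enabled: bool  # False means dimmed and non-tappable
    bluetooth_label: str


def resolve_mic_state(selected: str,
                      bt_connected: bool,
                      bt_product_name: Optional[str],
                      has_connect_permission: bool) -> MicSelectorState:
    """Recompute the selector after an audio-device added/removed callback."""
    if not bt_connected:
        # Falls back to the device mic if Bluetooth had been selected.
        return MicSelectorState(DEVICE_MIC, False, "Bluetooth · Not connected")
    if has_connect_permission and bt_product_name:
        label = f"Bluetooth · {bt_product_name}"
    else:
        label = "Bluetooth"  # assumption: generic label without permission
    return MicSelectorState(selected, True, label)
```

Keeping the decision in one function means the lifecycle refresh and the device callback can both call it and always agree on the resulting state.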
…t toggle
Promotes transcript viewing from a popup dialog to a dedicated activity and gives the player a dedicated view onto transcribed recordings.
Recordings list (PlayerFragment):
- Add a two-tab segmented control at the top: "Audio" (current behaviour) and "Transcripts" (filtered to recordings with a sidecar JSON).
- "Has transcript" set is recomputed on a background thread when entering Transcripts mode and on every refresh while in it.
- In Transcripts mode, tap on a row opens the new TranscriptActivity; long-press shows transcript-flavoured CAB actions: Share transcript (text/plain), Share transcript JSON, Copy transcript, Delete transcript. The original Audio long-press menu is unchanged.
Transcript viewer (TranscriptActivity, replaces TranscriptDialog):
- Material toolbar with back arrow, in-toolbar SearchView, and overflow (Re-transcribe / Copy / Share JSON / Delete). Menu is inflated directly on MaterialToolbar: commons' setupTopAppBar does not register the toolbar as the support action bar, so onCreateOptionsMenu does not fire for an AppCompat menu attached that way.
- Search highlights all matches across segments via SpannableString (active match in primary colour, passive matches in semi-transparent primary), with prev/next chevrons and an "X / Y matches" counter.
- Self-contained MediaPlayer with mini play/pause + seekbar; segment tap seeks and plays. The fragment's player is paused before launching so the two players don't compete.
- A 200 ms tick syncs a playhead-segment highlight (tinted background + bold/primary timestamp) and auto-scrolls the active segment into view only when it has moved off-screen.
Plumbing:
- TranscriptStore.sidecarUri exposes the JSON URI for ACTION_SEND.
- TranscriptShare.kt builds the plain-text body and the share intents.
- New strings for tabs, share / copy / delete labels, and search UI.
- Manifest entry for TranscriptActivity (parented to MainActivity).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
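The playhead sync in the transcript viewer boils down to "which segment contains the current position, and is it still visible?". A minimal sketch of that lookup, assuming segments are sorted by start time; `active_segment_index` and `needs_scroll` are hypothetical names, not functions from this PR.

```python
from bisect import bisect_right


def active_segment_index(segment_starts_ms, position_ms):
    """Index of the last segment starting at or before the playhead.

    Returns -1 when the playhead is before the first segment.
    Assumes segment_starts_ms is sorted ascending.
    """
    return bisect_right(segment_starts_ms, position_ms) - 1


def needs_scroll(index, first_visible, last_visible):
    """True only when the active row has moved off-screen."""
    return index >= 0 and not (first_visible <= index <= last_visible)
```

Called from a periodic tick, the binary search keeps the lookup cheap even for long transcripts, and `needs_scroll` avoids fighting the user's own scrolling while the active row is still on screen.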
While transcribing, the activity now shows a "Elapsed: 00:42 · ETA: 03:24" line under the progress bar, ticking once per second. The ETA is a linear extrapolation from the consumer-side fraction (audio actually transcribed, not merely decoded), suppressed until fraction > 5% to avoid noise from chunk-boundary jitter, and only shown during the TRANSCRIBING phase (the model-download and decode/write phases are too short to extrapolate from). TranscriptionService exposes a transcriptionStartMs companion so the activity can reconstruct elapsed time even when opened mid-job.
Pipelined decode+transcribe in TranscriptionService:
- AudioDecoder runs on a worker thread, pushing PcmChunks into a small bounded queue (capacity 2 for one in-flight + one waiting).
- The recognizer drains the queue on the existing pipeline coroutine, so MediaCodec / extractor wait time on chunk N+1 overlaps with inference on chunk N.
- Progress is now emitted from the consumer side (chunk.endMs vs. the recording's known duration) so the bar tracks real transcription progress instead of running ahead of it.
- Cancellation propagates both ways: the producer polls isCancelled while waiting to enqueue, and any consumer error sets isCancelled so the producer drops out of decodeChunks cleanly. An EOF sentinel always unblocks the consumer's queue.take().
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
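The producer/consumer arrangement described above can be sketched with a bounded queue, a shared cancellation flag, and an end-of-stream sentinel. This is an illustrative Python analogue, not the service's Kotlin implementation: `run_pipeline`, `decode_chunks`, and `transcribe` are hypothetical names, and the 50 ms polling interval is an arbitrary choice.

```python
import queue
import threading

EOF = object()  # sentinel that tells the consumer the stream has ended


def run_pipeline(decode_chunks, transcribe, capacity=2, cancelled=None):
    """Decode on a worker thread while transcribing on the caller's thread."""
    cancelled = cancelled if cancelled is not None else threading.Event()
    chunks = queue.Queue(maxsize=capacity)
    producer_error = []

    def producer():
        try:
            for chunk in decode_chunks():
                # Poll the cancel flag while waiting for free queue space.
                while not cancelled.is_set():
                    try:
                        chunks.put(chunk, timeout=0.05)
                        break
                    except queue.Full:
                        continue
                if cancelled.is_set():
                    return
        except Exception as exc:
            producer_error.append(exc)
            cancelled.set()
        finally:
            # Deliver EOF so a consumer blocked on get() wakes up. If the
            # queue stays full after cancellation, the consumer will see
            # the flag on its next get() instead, so giving up is safe.
            while True:
                try:
                    chunks.put(EOF, timeout=0.05)
                    break
                except queue.Full:
                    if cancelled.is_set():
                        break

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    results = []
    try:
        while True:
            item = chunks.get()
            if item is EOF or cancelled.is_set():
                break
            results.append(transcribe(item))
    except BaseException:
        cancelled.set()  # a consumer failure stops the producer too
        raise
    finally:
        worker.join()
    if producer_error:
        raise producer_error[0]
    return results
```

With capacity 2, the decoder can work at most one chunk ahead of the chunk being transcribed, which bounds memory while still overlapping decode wait time with inference.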
The pipelined version posted progress only on each chunk's completion, which made the bar jump roughly every 5–10 s of wall time instead of moving continuously like the old decoder-driven progress did.
A 400 ms ticker thread now interpolates inside the current chunk using a rolling EMA of per-chunk wall time (seeded at 6 s, alpha 0.4) to estimate where we are between the chunk's start and end fraction.
The notification rebuild is skipped when the rounded % is unchanged so the foreground notif isn't churned several times per second.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
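The interpolation idea can be sketched as follows, using the seed (6 s) and alpha (0.4) stated in the commit message. The class and function names are hypothetical, and clamping the in-chunk estimate at the chunk's end fraction is an assumption about how overshoot is handled.

```python
class ChunkProgressEstimator:
    """Estimates progress inside the current chunk from past chunk times."""

    def __init__(self, seed_s=6.0, alpha=0.4):
        self.avg_chunk_s = seed_s  # rolling EMA of per-chunk wall time
        self.alpha = alpha

    def on_chunk_finished(self, wall_s):
        """Fold a completed chunk's wall time into the EMA."""
        self.avg_chunk_s = (self.alpha * wall_s
                            + (1.0 - self.alpha) * self.avg_chunk_s)

    def interpolate(self, start_fraction, end_fraction, elapsed_in_chunk_s):
        """Overall progress, given time spent so far in the current chunk."""
        if self.avg_chunk_s <= 0:
            return end_fraction
        # Assumption: never report past the chunk's own end fraction.
        within = min(max(elapsed_in_chunk_s / self.avg_chunk_s, 0.0), 1.0)
        return start_fraction + (end_fraction - start_fraction) * within


def percent_changed(last_percent, fraction):
    """True when the rounded percentage differs from the last one posted."""
    return round(fraction * 100) != last_percent
```

A ticker would call `interpolate` every 400 ms and rebuild the notification only when `percent_changed` returns true.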
Long-press a segment in TranscriptActivity to enter selection mode. Tap or long-press additional segments to toggle them, then copy or share the selection as "[mm:ss] text" lines via the toolbar.
The toolbar swaps to an X-nav + Copy / Share / Select all menu, with hardware back wired through OnBackPressedCallback to exit cleanly.
Row styling is unified into a single applyRowStyle helper that picks selection > playhead > none, so the playhead can pass under selected rows without disturbing them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
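Two pieces of this commit are easy to show in isolation: the "[mm:ss] text" export format and the selection > playhead > none priority. The sketch below assumes segments are `(start_ms, text)` pairs and that minutes are not split into hours; all names are hypothetical rather than taken from the PR.

```python
def format_timestamp(ms):
    """Format milliseconds as mm:ss (minutes are not wrapped into hours)."""
    total_s = ms // 1000
    return f"{total_s // 60:02d}:{total_s % 60:02d}"


def selection_as_text(segments, selected_indices):
    """Render the selected segments, in transcript order, one per line."""
    return "\n".join(
        f"[{format_timestamp(segments[i][0])}] {segments[i][1]}"
        for i in sorted(selected_indices)
    )


def row_style(is_selected, is_playhead):
    """Pick exactly one style per row: selection wins over playhead."""
    if is_selected:
        return "selection"
    if is_playhead:
        return "playhead"
    return "none"
```

Because `row_style` is a pure function of the two flags, a selected row keeps its selection styling while the playhead passes through it.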
…er-row menu
Drop the Audio/Transcripts toggle and the player-bar transcript button in favour of a single recordings list where each row carries its own transcript state inline. Rows with a transcript show an italic preview snippet and the transcript icon; rows without one show a dim "Transcribe" prompt. Tapping either opens the full transcript view, which already handles both idle and ready states.
Add a per-row 3-dot overflow menu for single-item operations (rename, open with, share/delete audio, copy/share/delete transcript) and trim the CAB to bulk-only operations (share, delete, delete transcript, select all). This removes the icon doubling that occurred when the CAB tried to host both audio and transcript actions for one selection.
Subscribe PlayerFragment to TranscriptionCompleted so the affected row's indicator refreshes in place once a transcription finishes, no longer requiring an app restart to see the new state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… full-screen view
Deleting a transcript from the row's 3-dot popup already refreshed the list because the adapter calls refreshListener.refreshRecordings(). Deleting from the toolbar inside TranscriptActivity skipped that path, so the row's indicator kept showing the stale preview snippet until the app was reopened.
Add Events.TranscriptDeleted, post it from TranscriptActivity's delete flow, and subscribe in PlayerFragment to recompute the preview map. The affected row's indicator now flips back to the "Transcribe" prompt in place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Draft for discussion, not yet ready to merge. Adds on-device speech-to-text
on top of two foundational changes:
- Commit 3dd79248: recordings via content URIs.
- Commits 66880eb5 + 5d177606: SCO mic with live detection.
- Commits e977cd16…23bb97aa: sherpa-onnx Whisper engine, model download manager, inline transcript indicator, full-screen transcript view with multi-segment selection, pipelined decode+transcribe with live ETA.
Based on upstream 1a2f0963.
Acknowledgements
Huge thanks to @FossifyOrg and the Fossify Voice Recorder maintainers and
contributors. This branch sits directly on top of main and reuses the
existing recorder/player/settings architecture throughout.
This branch also incorporates two open upstream PRs that I rebased and
resolved conflicts for:
- Squash-merged with the conflict resolution noted in 3dd79248.
- Authorship preserved in 66880eb5; live mic-detection UX added in 5d177606.
All transcription work on top is mine.