feat(frontend): local transcription in the browser #1770

Draft
surt91 wants to merge 121 commits into main from feat/local-asr

surt91 (Collaborator) commented May 13, 2026

No description provided.

Copilot AI review requested due to automatic review settings May 13, 2026 09:24

Copilot AI (Contributor) left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

ccthmanthey and others added 28 commits May 13, 2026 11:26
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lls)

Research for local browser-based Whisper speech recognition via Transformers.js.
Covers technology stack, feature landscape, architecture patterns, domain pitfalls,
and synthesized summary with roadmap implications.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two plans for Infrastructure & Backend Extension phase:
- Plan 01 (Wave 1): Walking skeleton with backend extension, i18n, Vite config, Transformers.js install, frontend recognition
- Plan 02 (Wave 2): Regression verification via E2E tests + Admin UI visual checkpoint
- SKELETON.md documents architectural decisions for the local transcription feature

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address 4 checker issues: add (RESOLVED) markers to RESEARCH.md open
questions, add INFRA-01 assetsInclude deviation note to Plan 01 Task 2,
strengthen verify with Vite build smoke test, fix VALIDATION.md task IDs
to reference Plan 01 for EXT-01/02/03.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create LocalTranscribeExtension with spec: name='transcribe-local',
  group='speech-to-text', type='other', defaultLanguage select (de/en)
- Add 5 unit tests verifying name, group, type, config, empty middlewares
- Register extension in ExtensionLibraryModule providers
- Add i18n entries in en/de for title, description, defaultLanguage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ension

- Install @huggingface/transformers@4.2.0 in frontend
- Add optimizeDeps.exclude for @huggingface/transformers (prevent WASM pre-bundling)
- Add worker.format: 'es' for ES module Web Workers
- Add COOP/COEP headers (credentialless) to Vite dev server
- Wire 'transcribe-local' in ChatInput.tsx voiceExtensions filter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
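
The Vite changes listed in the commit above might look roughly like the following. This is a sketch under assumptions, not the PR's actual `vite.config.ts`: the constant name `localAsrViteConfig` is hypothetical, and `same-origin` for the COOP header is an assumption (the commit message only names `credentialless`, which is a COEP value). In a real project the object would be passed to `defineConfig()` from `vite`; it is kept as a plain object here so the snippet has no dependencies.

```typescript
// Sketch of the Vite settings named in the commit message (hypothetical names).
const localAsrViteConfig = {
  optimizeDeps: {
    // Keep the package out of dependency pre-bundling so its WASM assets
    // are not processed by the optimizer.
    exclude: ['@huggingface/transformers'],
  },
  worker: {
    // Emit Web Workers as ES modules so they can use `import`.
    format: 'es' as const,
  },
  server: {
    headers: {
      // ASSUMPTION: 'same-origin' is the usual COOP value for cross-origin isolation.
      'Cross-Origin-Opener-Policy': 'same-origin',
      // From the commit message: COEP in credentialless mode.
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
};
```

These headers only cover the dev server; a production deployment would need to send equivalent headers from whatever serves the built frontend.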
- SUMMARY.md documenting 2 tasks, 8 files, 21min execution
- All acceptance criteria met, self-check passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…h COOP/COEP headers

- Backend: 225/225 tests pass, 0 failures
- E2E (Chromium): 30/33 pass; the 3 failures are a pre-existing REIS dependency issue
- No CORP-related blocking, no regressions from extension registration
- SUMMARY includes checkpoint state for human-verify Task 2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1 (Infrastructure & Backend Extension) verified and marked complete.
All 7 requirements (INFRA-01..04, EXT-01..03) satisfied.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two plans covering all 11 requirements (WORK-01..05, AUDIO-01..04, MODEL-01..02):
- Plan 01 (Wave 1): Whisper Web Worker + audio resampling utility
- Plan 02 (Wave 2): useLocalTranscribe hook + i18n keys

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…U detection

- Singleton TranscriberPipeline with null-coalescing assignment (??=) pattern
- WebGPU auto-detection with WASM fallback via navigator.gpu.requestAdapter
- Language mapping: de->german, en->english with english fallback
- Progress forwarding via postMessage for model download tracking
- Load/transcribe message handlers with typed message protocol
- fp16 dtype per D-02, onnx-community/whisper-base model per D-01
- 14 unit tests covering singleton, device detection, language mapping,
  load/transcribe flow, progress forwarding, and error handling
- resampleToMono16kHz converts MediaRecorder output to 16kHz mono Float32Array
- Uses browser-native OfflineAudioContext for sample rate conversion and mixing
- Proper AudioContext cleanup via finally block
- Returns .slice() copy to allow garbage collection of rendered buffer
- 7 unit tests covering return type, sample rate, duration calculation,
  cleanup on success/error, slice copy, and source connection
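
The worker bullets above (singleton via `??=`, WebGPU detection with WASM fallback, language mapping) can be sketched as follows. This is a minimal illustration, not the code in `whisper.worker.ts`: the helper names `toWhisperLanguage` and `detectDevice`, the `GpuLike` interface, and the loader parameter are hypothetical, and the model loader is injected so the snippet runs without Transformers.js.

```typescript
type AsrDevice = 'webgpu' | 'wasm';

// Minimal structural stand-in for `navigator.gpu` (hypothetical type),
// so the sketch also type-checks outside a browser.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

const WHISPER_LANGUAGES: Readonly<Record<string, string | undefined>> = {
  de: 'german',
  en: 'english',
};

// Map an ISO code to a Whisper language name, falling back to English.
function toWhisperLanguage(code: string): string {
  return WHISPER_LANGUAGES[code] ?? 'english';
}

// Prefer WebGPU when an adapter is available; otherwise fall back to WASM.
async function detectDevice(gpu: GpuLike | undefined): Promise<AsrDevice> {
  if (!gpu) return 'wasm';
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm';
  }
}

class TranscriberPipeline {
  private static instance: Promise<unknown> | undefined;

  // `??=` assigns only on the first call, so the loader runs at most once
  // and every caller shares the same in-flight promise.
  static getInstance(load: () => Promise<unknown>): Promise<unknown> {
    return (TranscriberPipeline.instance ??= load());
  }
}
```

In the worker, the loader would presumably wrap the Transformers.js `pipeline('automatic-speech-recognition', ...)` call, receiving the detected device and the dtype the commit names.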
ccthmanthey and others added 24 commits May 13, 2026 11:29
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two parallel plans: code cleanup (lint fixes, planning ref removal, JSDoc)
and documentation updates (whisper-base -> whisper-small q8 correction).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ocal transcription files

- Strip 6 planning reference suffixes (D-04, D-05, D-08, D-09, D-03, AUDIO-03) from useLocalTranscribe.ts
- Strip 2 planning reference suffixes (D-08, D-09) from whisper.worker.ts
- Strip 1 planning reference (D-04) from DownloadProgressBanner.tsx
- Fix import order, set-state-in-effect, and Prettier violations in DownloadProgressBanner.tsx
- Fix Prettier violations in PrivacyBadge.tsx (collapse multi-line span)
- Fix Prettier violations in whisper.worker.ts (arrow param parens, ternary collapse)
… transcription modules

- Add JSDoc to LocalTranscribeState, DownloadProgress, UseLocalTranscribeProps types
- Add JSDoc to useLocalTranscribe hook function with property-level docs on props
- Add JSDoc to LocalTranscribeButtonProps, DownloadProgressBannerProps interfaces
- Add JSDoc to resampleToMono16kHz exported function in audio-utils.ts
- Add JSDoc to WorkerMessageData interface in whisper.worker.ts
- All 84 frontend and 5 backend tests pass without regressions
- SUMMARY.md documenting 2 tasks: planning ref removal, lint fixes, JSDoc additions
- All 84 frontend + 5 backend tests pass, zero ESLint violations across 8 files
…whisper-small q8

- Update 7 occurrences across 5 sections to match shipped code
- Change model size references from ~140MB to ~240MB
- Update Key Decisions table with correct rationale and mark Implemented
- Documentation now matches actual code: onnx-community/whisper-small with q8 quantization
…e to whisper-small q8

- Update MODEL-01 description to whisper-small q8 (~240MB)
- Update Out of Scope table to reference whisper-small q8
- Fix additional Whisper-base reference in Multi-Speaker Diarization row
- SUMMARY.md documents 2 tasks aligning whisper model references
- PROJECT.md and REQUIREMENTS.md now match shipped whisper-small q8 code
Replace non-null assertion `workerRef.current!` with a null guard to
prevent TypeError crash when worker is null during component unmount
or timing race conditions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
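
A null guard of the kind this commit describes could look like the sketch below. The names `postToWorker`, `WorkerLike`, and `RefLike` are hypothetical stand-ins for the hook's real types; the point is only that the ref is read once and checked before use instead of being asserted with `!`.

```typescript
// Hypothetical minimal types standing in for Worker and a React ref.
interface WorkerLike {
  postMessage(message: unknown): void;
}
interface RefLike<T> {
  current: T | null;
}

// Returns false instead of throwing when the worker has already been torn down,
// for example during component unmount.
function postToWorker(workerRef: RefLike<WorkerLike>, message: unknown): boolean {
  const worker = workerRef.current;
  if (!worker) return false;
  worker.postMessage(message);
  return true;
}
```

Returning a boolean lets the caller decide how to react, for instance by not moving into a waiting state when nothing was sent.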
Add fallback resolution when recorder.state is not 'recording' to
prevent the promise from hanging indefinitely when MediaRecorder state
diverges from hook state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
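
The fallback described in this commit can be illustrated with a small sketch. It assumes a recorder with a `state` field and an `onstop` callback, as `MediaRecorder` has; the `RecorderLike` interface and the `stopRecorder` name are hypothetical and not taken from the PR.

```typescript
// Hypothetical structural subset of MediaRecorder used by this sketch.
interface RecorderLike {
  state: 'inactive' | 'recording' | 'paused';
  onstop: (() => void) | null;
  stop(): void;
}

function stopRecorder(recorder: RecorderLike): Promise<void> {
  return new Promise<void>((resolve) => {
    if (recorder.state !== 'recording') {
      // stop() is not called in this branch, so no stop event would arrive;
      // resolve right away instead of waiting forever.
      resolve();
      return;
    }
    recorder.onstop = () => resolve();
    recorder.stop();
  });
}
```

Without the early `resolve()`, a caller awaiting this promise would hang whenever the recorder had already stopped on its own.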
Prevent division by zero when samples array is empty, which would
produce NaN and bypass silence detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
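
To make the failure mode concrete, here is a sketch of an energy-based silence check with the empty-input guard. It assumes the check is RMS-based and uses a hypothetical threshold of 0.01; the commit message does not say how the PR computes the level, and `rootMeanSquare` and `isSilent` are illustrative names.

```typescript
// Root mean square of the samples; returns 0 for an empty buffer instead of 0 / 0 = NaN.
function rootMeanSquare(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumOfSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

// ASSUMPTION: 0.01 is an illustrative threshold, not the PR's value.
function isSilent(samples: Float32Array, threshold = 0.01): boolean {
  return rootMeanSquare(samples) < threshold;
}
```

Without the guard, an empty buffer would yield `NaN`, and since `NaN < threshold` is always `false`, the empty recording would be treated as non-silent.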
…ding

Prevent user from getting stuck in 'downloading' state when workerRef
is null by adding a null check before sending the load message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevent negative WARNING_THRESHOLD when maxSeconds < 15, which would
cause the timer to show in red from the start of recording.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reorder unmount cleanup to stop the MediaRecorder before calling
cleanup(), preventing lost audio chunks and incorrect error paths
from empty audioChunksRef.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 6 findings (2 critical, 4 warning) have been resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adjust line wrapping to satisfy prettier after the workerRef null
guard changed the variable name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Safety commit with milestone archives before REQUIREMENTS.md removal.
Archives ROADMAP, REQUIREMENTS, and MILESTONE-AUDIT to milestones/.
Updates PROJECT.md (full evolution review), ROADMAP.md (milestone grouping),
STATE.md (shipped status), and creates MILESTONES.md entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
REQUIREMENTS.md archived to milestones/v1.0-REQUIREMENTS.md.
Fresh REQUIREMENTS.md will be created for next milestone via /gsd-new-milestone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all 6 phase directories to milestones/v1.0-phases/.
Remove audit file from root (copy in milestones/).
Create RETROSPECTIVE.md with v1.0 milestone section.
Update STATE.md to shipped status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
We are using @huggingface/transformers only in the browser, so we will never need the sharp dependency. We override it to avoid a problem with `npm install` that appears in rare cases where a dependency of sharp (libvips) is already installed on the host system.
Copilot AI review requested due to automatic review settings May 13, 2026 09:43

Copilot AI (Contributor) left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@surt91 surt91 marked this pull request as draft May 13, 2026 09:51

3 participants